[jira] [Created] (SPARK-30762) Add dtype="float32" support to vector_to_array UDF
Liang Zhang created SPARK-30762: --- Summary: Add dtype="float32" support to vector_to_array UDF Key: SPARK-30762 URL: https://issues.apache.org/jira/browse/SPARK-30762 Project: Spark Issue Type: Story Components: MLlib Affects Versions: 3.0.0 Reporter: Liang Zhang Previous PR: [https://github.com/apache/spark/blob/master/python/pyspark/ml/functions.py]
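A minimal Scala sketch of the proposal, assuming the Spark 3.0 {{org.apache.spark.ml.functions.vector_to_array}} function; the dtype argument in the trailing comment is what this ticket proposes, not an API that exists at the time of filing.
{code:scala}
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(Tuple1(Vectors.dense(1.0, 2.0, 3.0))).toDF("vec")
// Current behavior: the ML vector is converted to array<double>.
df.select(vector_to_array($"vec").as("arr")).printSchema()
// Proposed by this ticket (hypothetical until merged):
// vector_to_array($"vec", "float32") would return array<float> instead,
// halving memory for callers that don't need double precision.
{code}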
[jira] [Resolved] (SPARK-30761) Nested pruning should not prune on required child outputs in Generate
[ https://issues.apache.org/jira/browse/SPARK-30761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-30761. - Resolution: Won't Fix > Nested pruning should not prune on required child outputs in Generate > - > > Key: SPARK-30761 > URL: https://issues.apache.org/jira/browse/SPARK-30761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Priority: Major > Fix For: 3.0.0 > > > We prune nested fields from Generate. If a child output is required by a top > operator of Generate, we should not prune nested fields on it. Otherwise, the > accessors in the top operator could be unresolved.
[jira] [Created] (SPARK-30761) Nested pruning should not prune on required child outputs in Generate
L. C. Hsieh created SPARK-30761: --- Summary: Nested pruning should not prune on required child outputs in Generate Key: SPARK-30761 URL: https://issues.apache.org/jira/browse/SPARK-30761 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: L. C. Hsieh Fix For: 3.0.0 We prune nested fields from Generate. If a child output is required by a top operator of Generate, we should not prune nested fields on it. Otherwise, the accessors in the top operator could be unresolved.
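A hedged repro sketch of the described scenario, with an assumed schema: the column feeding the Generate (explode) is also required by the operator above it, so pruning nested fields from that child output would leave the accessors unresolved.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

case class Item(a: Int, b: Int)
val df = Seq(Tuple1(Seq(Item(1, 2), Item(3, 4)))).toDF("items")

// `items` is both exploded and selected above the Generate, so nested
// pruning must keep `items` intact even though only `item.a` is read.
df.select($"items", explode($"items").as("item"))
  .select($"items", $"item.a")
  .show(false)
{code}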
[jira] [Commented] (SPARK-30711) 64KB JVM bytecode limit - janino.InternalCompilerException
[ https://issues.apache.org/jira/browse/SPARK-30711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033112#comment-17033112 ] Kazuaki Ishizaki commented on SPARK-30711: -- [~schreiber] Sorry, I made a mistake. This test case can pass with master and branch-2.4 on my end. I have one question. Which value do you set for {{spark.sql.codegen.fallback}}? The idea of the whole-stage codegen is to stop using the whole-stage codegen if the generated code is larger than 64KB. To do so, [this code|https://github.com/apache/spark/blob/branch-2.4/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L600-L607] catches the {{org.codehaus.janino.InternalCompilerException}} and tries to recompile the code in smaller pieces. > 64KB JVM bytecode limit - janino.InternalCompilerException > -- > > Key: SPARK-30711 > URL: https://issues.apache.org/jira/browse/SPARK-30711 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 > Environment: Windows 10 > Spark 2.4.4 > scalaVersion 2.11.12 > JVM Oracle 1.8.0_221-b11 >Reporter: Frederik Schreiber >Priority: Major > > Exception > {code:java} > ERROR CodeGenerator: failed to compile: > org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": > Code of method "processNext()V" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4" > grows beyond 64 KB ERROR CodeGenerator: failed to compile: > org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": > Code of method "processNext()V" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4" > grows beyond 64 KB org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method "processNext()V" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4" > grows beyond 64 KB at > org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382) at > org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237) at > org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465) > at > org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313) > at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235) > at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207) at > org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1290) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1372) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1369) > at > org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at > org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at > org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) at > org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at > org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:583) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3384) > at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2783) > at
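For reference, a short sketch of the knob the comment asks about, assuming a plain local session; {{spark.sql.codegen.fallback}} defaults to true, which is what allows the recovery path described above to kick in.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  // Default is true: if generated code fails to compile, Spark falls back
  // instead of failing the query. Setting false makes the janino error fatal.
  .config("spark.sql.codegen.fallback", "true")
  .getOrCreate()
{code}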
[jira] [Updated] (SPARK-30274) Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity
[ https://issues.apache.org/jira/browse/SPARK-30274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30274: -- Affects Version/s: 2.0.2 2.1.3 2.2.3 2.3.4 > Avoid BytesToBytesMap lookup hang forever when holding keys reaching max > capacity > - > > Key: SPARK-30274 > URL: https://issues.apache.org/jira/browse/SPARK-30274 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > BytesToBytesMap.append allows appending keys until the number of keys reaches > MAX_CAPACITY. But once the pointer array in the map holds MAX_CAPACITY > keys, the next call of lookup will hang forever.
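A toy sketch (not Spark's actual BytesToBytesMap code) of why a completely full open-addressing table makes lookup spin forever: probing stops only at an empty slot or a matching key, so a full table probed for an absent key never terminates.
{code:scala}
// 0L marks an empty slot in this toy table.
def lookup(table: Array[Long], key: Long): Int = {
  require(key != 0L)
  var pos = (key % table.length).toInt.abs
  while (true) {
    if (table(pos) == 0L) return -1    // empty slot: key is absent
    if (table(pos) == key) return pos  // hit
    pos = (pos + 1) % table.length     // linear probe; never exits when full
  }
  -1 // unreachable
}
{code}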
[jira] [Commented] (SPARK-24884) Implement regexp_extract_all
[ https://issues.apache.org/jira/browse/SPARK-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033097#comment-17033097 ] jiaan.geng commented on SPARK-24884: I'm working on it. > Implement regexp_extract_all > > > Key: SPARK-24884 > URL: https://issues.apache.org/jira/browse/SPARK-24884 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Nick Nicolini >Priority: Major > > I've recently hit many cases of regexp parsing where we need to match on > something that is always arbitrary in length; for example, a text block that > looks something like: > {code:java} > AAA:WORDS| > BBB:TEXT| > MSG:ASDF| > MSG:QWER| > ... > MSG:ZXCV|{code} > Where I need to pull out all values between "MSG:" and "|", which can occur > in each instance between 1 and n times. I cannot reliably use the existing > {{regexp_extract}} method since the number of occurrences is always > arbitrary, and while I can write a UDF to handle this, it'd be great if this > was supported natively in Spark. > Perhaps we can implement something like {{regexp_extract_all}}, as > [Presto|https://prestodb.io/docs/current/functions/regexp.html] and > [Pig|https://pig.apache.org/docs/latest/api/org/apache/pig/builtin/REGEX_EXTRACT_ALL.html] > have?
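A hedged sketch of the UDF workaround the reporter mentions, pending a native regexp_extract_all; the function name and the pattern below are made up for the demo.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, udf}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Return every capture-group-1 match of `pattern` found in `s`.
val regexpExtractAll = udf { (s: String, pattern: String) =>
  if (s == null) null else pattern.r.findAllMatchIn(s).map(_.group(1)).toSeq
}

val df = Seq("AAA:WORDS|MSG:ASDF|MSG:QWER|MSG:ZXCV|").toDF("text")
df.select(regexpExtractAll($"text", lit("MSG:([^|]*)\\|")).as("msgs")).show(false)
// msgs: [ASDF, QWER, ZXCV]
{code}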
[jira] [Updated] (SPARK-30274) Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity
[ https://issues.apache.org/jira/browse/SPARK-30274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30274: -- Affects Version/s: 2.4.4 > Avoid BytesToBytesMap lookup hang forever when holding keys reaching max > capacity > - > > Key: SPARK-30274 > URL: https://issues.apache.org/jira/browse/SPARK-30274 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4, 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > BytesToBytesMap.append allows appending keys until the number of keys reaches > MAX_CAPACITY. But once the pointer array in the map holds MAX_CAPACITY > keys, the next call of lookup will hang forever.
[jira] [Updated] (SPARK-29918) RecordBinaryComparator should check endianness when compared by long
[ https://issues.apache.org/jira/browse/SPARK-29918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29918: -- Affects Version/s: 2.4.4 > RecordBinaryComparator should check endianness when compared by long > > > Key: SPARK-29918 > URL: https://issues.apache.org/jira/browse/SPARK-29918 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: EdisonWang >Assignee: EdisonWang >Priority: Minor > Labels: correctness > Fix For: 2.4.5, 3.0.0 > > > If the architecture supports unaligned access or the offset is 8-byte aligned, > RecordBinaryComparator compares 8 bytes at a time by reading them as a > long. Otherwise, it compares byte by byte. > However, on a little-endian machine, the result of comparing by a long value > and comparing byte by byte may be different. If the architectures in a yarn > cluster are different (some are unaligned-access capable while others are not), then > the order of two records after sorting is undetermined, which will result > in the same problem as in https://issues.apache.org/jira/browse/SPARK-23207
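A self-contained demonstration of the mismatch, assuming a little-endian reading as on x86: comparing two byte arrays as longs can disagree with lexicographic byte-by-byte comparison.
{code:scala}
import java.nio.{ByteBuffer, ByteOrder}

val a = Array[Byte](0, 0, 0, 0, 0, 0, 0, 1)
val b = Array[Byte](1, 0, 0, 0, 0, 0, 0, 0)

def asLeLong(bytes: Array[Byte]): Long =
  ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong

// Byte by byte: a < b, since the first differing byte is 0 < 1.
// As little-endian longs: asLeLong(a) = 2^56, asLeLong(b) = 1, so a > b.
println(java.lang.Long.compareUnsigned(asLeLong(a), asLeLong(b))) // positive
{code}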
[jira] [Commented] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033096#comment-17033096 ] Dongjoon Hyun commented on SPARK-29042: --- Hi, [~viirya]. Could you update the `Affected Version` by checking at least `2.4.4` and `2.3.4`? > Sampling-based RDD with unordered input should be INDETERMINATE > --- > > Key: SPARK-29042 > URL: https://issues.apache.org/jira/browse/SPARK-29042 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 2.4.5, 3.0.0 > > > We have found and fixed the correctness issue when RDD output is > INDETERMINATE. One missing part is sampling-based RDDs. This kind of RDD is > order-sensitive to its input. A sampling-based RDD with unordered input > should be INDETERMINATE.
[jira] [Updated] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29042: -- Affects Version/s: 2.4.4 > Sampling-based RDD with unordered input should be INDETERMINATE > --- > > Key: SPARK-29042 > URL: https://issues.apache.org/jira/browse/SPARK-29042 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4, 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 2.4.5, 3.0.0 > > > We have found and fixed the correctness issue when RDD output is > INDETERMINATE. One missing part is sampling-based RDDs. This kind of RDD is > order-sensitive to its input. A sampling-based RDD with unordered input > should be INDETERMINATE.
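A hedged sketch of the hazard, assuming a spark-shell session: sampling is driven by the position of elements in the input iterator, so the same seed over a reordered input selects different rows.
{code:scala}
// Order after a shuffle is not guaranteed to repeat across recomputations.
val rdd = spark.sparkContext.parallelize(1 to 1000).repartition(4)
val sampled = rdd.sample(withReplacement = false, fraction = 0.1, seed = 42L)
// If a lost partition is recomputed and its input arrives in a different
// order, the same seed picks a different subset; hence INDETERMINATE.
sampled.count()
{code}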
[jira] [Updated] (SPARK-30274) Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity
[ https://issues.apache.org/jira/browse/SPARK-30274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30274: -- Labels: release-notes (was: ) > Avoid BytesToBytesMap lookup hang forever when holding keys reaching max > capacity > - > > Key: SPARK-30274 > URL: https://issues.apache.org/jira/browse/SPARK-30274 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > BytesToBytesMap.append allows appending keys until the number of keys reaches > MAX_CAPACITY. But once the pointer array in the map holds MAX_CAPACITY > keys, the next call of lookup will hang forever.
[jira] [Updated] (SPARK-30312) Preserve path permission when truncate table
[ https://issues.apache.org/jira/browse/SPARK-30312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30312: -- Labels: release-notes (was: ) > Preserve path permission when truncate table > > > Key: SPARK-30312 > URL: https://issues.apache.org/jira/browse/SPARK-30312 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > When Spark SQL truncates a table, it deletes the paths of the table/partitions, > then re-creates new ones. If custom permissions/ACLs are set on the paths, that > metadata will be deleted. > We should preserve the original permissions/ACLs if possible.
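A hedged sketch of the idea using the Hadoop FileSystem API (not Spark's actual fix, and the path is made up): remember the permission before the delete-and-recreate cycle that truncation performs, then put it back.
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val path = new Path("/warehouse/db/t1")          // hypothetical table location
val fs = path.getFileSystem(new Configuration())

val perm = fs.getFileStatus(path).getPermission  // capture before truncate
fs.delete(path, true)                            // truncate: drop the directory...
fs.mkdirs(path)                                  // ...and re-create it empty
fs.setPermission(path, perm)                     // restore the original permission
{code}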
[jira] [Updated] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29890: -- Labels: release-notes (was: ) > Unable to fill na with 0 with duplicate columns > --- > > Key: SPARK-29890 > URL: https://issues.apache.org/jira/browse/SPARK-29890 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.3 >Reporter: sandeshyapuram >Assignee: Terry Kim >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > Trying to fill out na values with 0. > {noformat} > scala> :paste > // Entering paste mode (ctrl-D to finish) > val parent = > spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc") > val c1 = parent.filter(lit(true)) > val c2 = parent.filter(lit(true)) > c1.join(c2, Seq("nums"), "left") > .na.fill(0).show{noformat} > {noformat} > 9/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: > error looking up the name of group 820818257: No such file or directory > org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could > be: abc, abc.; > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1246) > at > org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134) > ... 54 elided{noformat}
[jira] [Updated] (SPARK-30065) Unable to drop na with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-30065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30065: -- Labels: release-notes (was: ) > Unable to drop na with duplicate columns > > > Key: SPARK-30065 > URL: https://issues.apache.org/jira/browse/SPARK-30065 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > Trying to drop rows with null values fails even when no columns are > specified. This should be allowed: > {code:java} > scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2") > left: org.apache.spark.sql.DataFrame = [col1: string, col2: string] > scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2") > right: org.apache.spark.sql.DataFrame = [col1: string, col2: string] > scala> val df = left.join(right, Seq("col1")) > df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more > field] > scala> df.show > +----+----+----+ > |col1|col2|col2| > +----+----+----+ > | 1|null| 2| > | 3| 4|null| > +----+----+----+ > scala> df.na.drop("any") > org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could > be: col2, col2.; > at > org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240) > {code}
[jira] [Updated] (SPARK-28939) SQL configuration are not always propagated
[ https://issues.apache.org/jira/browse/SPARK-28939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28939: -- Labels: release-notes (was: ) > SQL configuration are not always propagated > --- > > Key: SPARK-28939 > URL: https://issues.apache.org/jira/browse/SPARK-28939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.4 >Reporter: Marco Gaido >Assignee: Marco Gaido >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > The SQL configurations are propagated to executors in order to be effective. > Unfortunately, in some cases we fail to propagate them, making them > ineffective. > The problem happens every time {{rdd}} or {{queryExecution.toRdd}} is used, > and this is pretty frequent in the codebase. > Please notice that there are 2 parts of this issue: > - when a user directly uses those APIs > - when Spark invokes them (e.g. throughout the ML lib and other usages, or the > {{describe}} method on the {{Dataset}} class)
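A hedged illustration of the gap, assuming a spark-shell session on an affected version: dropping to {{.rdd}} bypasses the propagation path, so {{SQLConf.get}} inside a task may see defaults rather than the session's values.
{code:scala}
import org.apache.spark.sql.internal.SQLConf

spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.range(10).rdd.mapPartitions { it =>
  // On affected versions this can read the default time zone, not "UTC",
  // because the session confs were never shipped with the task.
  val tz = SQLConf.get.sessionLocalTimeZone
  it.map(_ => tz)
}.distinct.collect()
{code}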
[jira] [Updated] (SPARK-28152) Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28152: -- Labels: release-notes (was: ) > Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect > - > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3, 3.0.0 >Reporter: Shiv Prashant Sood >Assignee: Shiv Prashant Sood >Priority: Minor > Labels: release-notes > Fix For: 3.0.0 > > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark data frames being > created with unintended types. The issue was observed when validating against > SQL Server. > Some example issues: > * Write from a df with a ShortType column results in a SQL table with column type > INTEGER as opposed to SMALLINT, thus a larger table than expected. > * Read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue with the read path. In the write path, the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But in > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'.
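A hedged sketch of the shape of the fix through the public JdbcDialect extension point (the real change is inside Spark's built-in MsSqlServerDialect): map ShortType to SMALLINT and FloatType to REAL instead of the generic defaults.
{code:scala}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, FloatType, ShortType}

object PatchedMsSqlDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")

  // Write path: choose the narrower SQL Server types for these Catalyst types.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case ShortType => Some(JdbcType("SMALLINT", java.sql.Types.SMALLINT))
    case FloatType => Some(JdbcType("REAL", java.sql.Types.REAL))
    case _         => None // fall back to the default mappings
  }
}

JdbcDialects.registerDialect(PatchedMsSqlDialect)
{code}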
[jira] [Updated] (SPARK-27812) kubernetes client import non-daemon thread which block jvm exit.
[ https://issues.apache.org/jira/browse/SPARK-27812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27812: -- Labels: release-notes (was: ) > kubernetes client import non-daemon thread which block jvm exit. > > > Key: SPARK-27812 > URL: https://issues.apache.org/jira/browse/SPARK-27812 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.3, 2.4.4 >Reporter: Henry Yu >Assignee: Igor Calabria >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > I tried spark-submit to k8s in cluster mode. The driver pod failed to exit because of > an OkHttp WebSocket non-daemon thread.
[jira] [Updated] (SPARK-21492) Memory leak in SortMergeJoin
[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-21492: -- Labels: release-notes (was: ) > Memory leak in SortMergeJoin > > > Key: SPARK-21492 > URL: https://issues.apache.org/jira/browse/SPARK-21492 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0 >Reporter: Zhan Zhang >Assignee: Yuanjian Li >Priority: Major > Labels: release-notes > Fix For: 2.4.5, 3.0.0 > > > In SortMergeJoin, if the iterator is not exhausted, there will be a memory leak > caused by the Sort. The memory is not released until the task ends, and cannot > be used by other operators, causing a performance drop or OOM.
[jira] [Updated] (SPARK-30755) Support Hive 1.2.1's Serde after making built-in Hive to 2.3
[ https://issues.apache.org/jira/browse/SPARK-30755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-30755: Target Version/s: 3.0.0 Description: {noformat} 2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe 2020-01-27 05:11:20.446 - stderr> at java.lang.ClassLoader.defineClass1(Native Method) 2020-01-27 05:11:20.446 - stderr> at java.lang.ClassLoader.defineClass(ClassLoader.java:756) 2020-01-27 05:11:20.446 - stderr> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 2020-01-27 05:11:20.446 - stderr> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) 2020-01-27 05:11:20.446 - stderr> at java.net.URLClassLoader.access$100(URLClassLoader.java:74) 2020-01-27 05:11:20.446 - stderr> at java.net.URLClassLoader$1.run(URLClassLoader.java:369) 2020-01-27 05:11:20.446 - stderr> at java.net.URLClassLoader$1.run(URLClassLoader.java:363) 2020-01-27 05:11:20.446 - stderr> at java.security.AccessController.doPrivileged(Native Method) 2020-01-27 05:11:20.446 - stderr> at java.net.URLClassLoader.findClass(URLClassLoader.java:362) 2020-01-27 05:11:20.446 - stderr> at java.lang.ClassLoader.loadClass(ClassLoader.java:418) 2020-01-27 05:11:20.446 - stderr> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) 2020-01-27 05:11:20.446 - stderr> at java.lang.ClassLoader.loadClass(ClassLoader.java:405) 2020-01-27 05:11:20.446 - stderr> at java.lang.ClassLoader.loadClass(ClassLoader.java:351) 2020-01-27 05:11:20.446 - stderr> at java.lang.Class.forName0(Native Method) 2020-01-27 05:11:20.446 - stderr> at java.lang.Class.forName(Class.java:348) 2020-01-27 05:11:20.446 - stderr> at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:119) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:104) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:267) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:208) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.scheduler.Task.run(Task.scala:117) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$6(Executor.scala:567) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1559) 2020-01-27 05:11:20.447 - stderr> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:570) 2020-01-27 05:11:20.447 - stderr> at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 2020-01-27 05:11:20.447 - stderr> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 2020-01-27 05:11:20.447 - stderr> at java.lang.Thread.run(Thread.java:748) 2020-01-27 05:11:20.447 - stderr> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe 2020-01-27 05:11:20.447 - stderr> at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 2020-01-27 05:11:20.447 - stderr> at java.lang.ClassLoader.loadClass(ClassLoader.java:418) 2020-01-27 05:11:20.447 - stderr> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) 2020-01-27 05:11:20.447 - stderr> at java.lang.ClassLoader.loadClass(ClassLoader.java:351) 2020-01-27 05:11:20.447 - stderr> ... 31 more {noformat} was: {noformat} 2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe 2020-01-27
[jira] [Updated] (SPARK-30755) Support Hive 1.2.1's Serde after making built-in Hive to 2.3
[ https://issues.apache.org/jira/browse/SPARK-30755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-30755: Priority: Blocker (was: Major) > Support Hive 1.2.1's Serde after making built-in Hive to 2.3 > > > Key: SPARK-30755 > URL: https://issues.apache.org/jira/browse/SPARK-30755 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Blocker > > {noformat} > 2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: > ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due > to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most > recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor > 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe > 2020-01-27 05:11:20.446 - stderr> at > java.lang.ClassLoader.defineClass1(Native Method) > 2020-01-27 05:11:20.446 - stderr> at > java.lang.ClassLoader.defineClass(ClassLoader.java:756) > 2020-01-27 05:11:20.446 - stderr> at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > 2020-01-27 05:11:20.446 - stderr> at > java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > 2020-01-27 05:11:20.446 - stderr> at > java.net.URLClassLoader.access$100(URLClassLoader.java:74) > 2020-01-27 05:11:20.446 - stderr> at > java.net.URLClassLoader$1.run(URLClassLoader.java:369) > 2020-01-27 05:11:20.446 - stderr> at > java.net.URLClassLoader$1.run(URLClassLoader.java:363) > 2020-01-27 05:11:20.446 - stderr> at > java.security.AccessController.doPrivileged(Native Method) > 2020-01-27 05:11:20.446 - stderr> at > java.net.URLClassLoader.findClass(URLClassLoader.java:362) > 2020-01-27 05:11:20.446 - stderr> at > java.lang.ClassLoader.loadClass(ClassLoader.java:418) > 2020-01-27 05:11:20.446 - stderr> at > sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > 2020-01-27 05:11:20.446 - stderr> at > java.lang.ClassLoader.loadClass(ClassLoader.java:405) > 2020-01-27 05:11:20.446 - stderr> at > java.lang.ClassLoader.loadClass(ClassLoader.java:351) > 2020-01-27 05:11:20.446 - stderr> at java.lang.Class.forName0(Native > Method) > 2020-01-27 05:11:20.446 - stderr> at > java.lang.Class.forName(Class.java:348) > 2020-01-27 05:11:20.446 - stderr> at > org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:119) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:104) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:267) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:208) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.scheduler.Task.doRunTask(Task.scala:144) > 2020-01-27 05:11:20.447 - stderr> at > 
org.apache.spark.scheduler.Task.run(Task.scala:117) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$6(Executor.scala:567) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1559) > 2020-01-27 05:11:20.447 - stderr> at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:570) > 2020-01-27 05:11:20.447 - stderr> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 2020-01-27 05:11:20.447 - stderr> at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 2020-01-27 05:11:20.447 - stderr> at java.lang.Thread.run(Thread.java:748) > 2020-01-27 05:11:20.447 - stderr> Caused by: > java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe > 2020-01-27 05:11:20.447 - stderr> at > java.net.URLClassLoader.findClass(URLClassLoader.java:382) > 2020-01-27 05:11:20.447 - stderr> at > java.lang.ClassLoader.loadClass(ClassLoader.java:418) > 2020-01-27 05:11:20.447 - stderr> at > sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > 2020-01-27
[jira] [Created] (SPARK-30760) Port `millisToDays` and `daysToMillis` on Java 8 time API
Maxim Gekk created SPARK-30760: -- Summary: Port `millisToDays` and `daysToMillis` on Java 8 time API Key: SPARK-30760 URL: https://issues.apache.org/jira/browse/SPARK-30760 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Currently, the `millisToDays` and `daysToMillis` methods of DateTimeUtils use the Java 7 (and earlier) time API. The implementation is based on the combined Julian + Gregorian calendar. To be consistent with other date-time functions, the methods need to be ported to the Java 8 time API.
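A hedged sketch of the Java 8 equivalents (assumed shapes, not DateTimeUtils' exact signatures): converting between epoch days and epoch milliseconds through LocalDate/Instant avoids the hybrid Julian+Gregorian java.util calendar.
{code:scala}
import java.time.{Instant, LocalDate, ZoneId}

def millisToDays(millis: Long, zone: ZoneId): Int =
  Instant.ofEpochMilli(millis).atZone(zone).toLocalDate.toEpochDay.toInt

def daysToMillis(days: Int, zone: ZoneId): Long =
  LocalDate.ofEpochDay(days.toLong).atStartOfDay(zone).toInstant.toEpochMilli

val utc = ZoneId.of("UTC")
assert(millisToDays(daysToMillis(18300, utc), utc) == 18300) // round-trips
{code}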
[jira] [Updated] (SPARK-30759) The cache in StringRegexExpression is not initialized for foldable patterns
[ https://issues.apache.org/jira/browse/SPARK-30759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30759: --- Priority: Minor (was: Major) > The cache in StringRegexExpression is not initialized for foldable patterns > --- > > Key: SPARK-30759 > URL: https://issues.apache.org/jira/browse/SPARK-30759 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: Screen Shot 2020-02-08 at 22.45.50.png > > > In the case of foldable patterns, the cache in StringRegexExpression should > be evaluated only once, but in fact the pattern is compiled every time. Here is an example: > {code:sql} > SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*'; > {code} > the code > https://github.com/apache/spark/blob/8aebc80e0e67bcb1aa300b8c8b1a209159237632/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L45-L48: > {code:scala} > // try cache the pattern for Literal > private lazy val cache: Pattern = pattern match { > case Literal(value: String, StringType) => compile(value) > case _ => null > } > {code} > The attached screenshot shows that a foldable expression doesn't fall into the > first case.
[jira] [Created] (SPARK-30759) The cache in StringRegexExpression is not initialized for foldable patterns
Maxim Gekk created SPARK-30759: -- Summary: The cache in StringRegexExpression is not initialized for foldable patterns Key: SPARK-30759 URL: https://issues.apache.org/jira/browse/SPARK-30759 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5, 3.0.0 Reporter: Maxim Gekk Attachments: Screen Shot 2020-02-08 at 22.45.50.png In the case of foldable patterns, the cache in StringRegexExpression should be evaluated only once, but in fact the pattern is compiled every time. Here is an example: {code:sql} SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*'; {code} the code https://github.com/apache/spark/blob/8aebc80e0e67bcb1aa300b8c8b1a209159237632/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L45-L48: {code:scala} // try cache the pattern for Literal private lazy val cache: Pattern = pattern match { case Literal(value: String, StringType) => compile(value) case _ => null } {code} The attached screenshot shows that a foldable expression doesn't fall into the first case.
[jira] [Updated] (SPARK-30759) The cache in StringRegexExpression is not initialized for foldable patterns
[ https://issues.apache.org/jira/browse/SPARK-30759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30759: --- Attachment: Screen Shot 2020-02-08 at 22.45.50.png > The cache in StringRegexExpression is not initialized for foldable patterns > --- > > Key: SPARK-30759 > URL: https://issues.apache.org/jira/browse/SPARK-30759 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maxim Gekk >Priority: Major > Attachments: Screen Shot 2020-02-08 at 22.45.50.png > > > In the case of foldable patterns, the cache in StringRegexExpression should > be evaluated only once, but in fact the pattern is compiled every time. Here is an example: > {code:sql} > SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*'; > {code} > the code > https://github.com/apache/spark/blob/8aebc80e0e67bcb1aa300b8c8b1a209159237632/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L45-L48: > {code:scala} > // try cache the pattern for Literal > private lazy val cache: Pattern = pattern match { > case Literal(value: String, StringType) => compile(value) > case _ => null > } > {code} > The attached screenshot shows that a foldable expression doesn't fall into the > first case.
[jira] [Commented] (SPARK-29292) Fix internal usages of mutable collection for Seq in 2.13
[ https://issues.apache.org/jira/browse/SPARK-29292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032923#comment-17032923 ] Sean R. Owen commented on SPARK-29292: -- This is a pretty good example of most of the changes: https://github.com/srowen/spark/commit/e0aacc173604daf972ff3f0f8949a6d3255e9f98 Note that it's not 100% up to date or complete, and does not by itself make this part work. See the parent JIRA for additional blockers. We will at least need Scala 2.13.2. > Fix internal usages of mutable collection for Seq in 2.13 > - > > Key: SPARK-29292 > URL: https://issues.apache.org/jira/browse/SPARK-29292 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > > Kind of related to https://issues.apache.org/jira/browse/SPARK-27681, but a > simpler subset. > In 2.13, a mutable collection can't be returned as a > {{scala.collection.Seq}}. It's easy enough to call .toSeq on these as that > still works on 2.12. > {code} > [ERROR] [Error] > /Users/seanowen/Documents/spark_2.13/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:467: > type mismatch; > found : Seq[String] (in scala.collection) > required: Seq[String] (in scala.collection.immutable) > {code}
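A minimal illustration of the 2.13 change behind the quoted compiler error: in 2.13, {{Seq}} aliases {{scala.collection.immutable.Seq}}, so a mutable buffer no longer satisfies a {{Seq[String]}} result type without {{.toSeq}}.
{code:scala}
import scala.collection.mutable.ArrayBuffer

def executorIds(): Seq[String] = {
  val buf = ArrayBuffer("exec-1", "exec-2") // names made up for the demo
  buf.toSeq // required on 2.13; also compiles (cheaply) on 2.12
}
{code}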
[jira] [Commented] (SPARK-30740) months_between wrong calculation
[ https://issues.apache.org/jira/browse/SPARK-30740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032922#comment-17032922 ] Maxim Gekk commented on SPARK-30740: This is because of the special *if* [https://github.com/apache/spark/blob/a3e3cfa03a18d31370acd9a10562ff5312bb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L603-L605], which was implemented to be compatible with Hive: [https://github.com/apache/hive/blob/287e5d5e4c43beb2bc84a80e342f897494e32c6c/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java#L133-L138] > months_between wrong calculation > > > Key: SPARK-30740 > URL: https://issues.apache.org/jira/browse/SPARK-30740 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: nhufas >Priority: Critical > > months_between is not calculating correctly for February. > Example: > > {{select }} > {{ months_between('2020-02-29','2019-12-29')}} > {{,months_between('2020-02-29','2019-12-30') }} > {{,months_between('2020-02-29','2019-12-31') }} > > will generate a result like this: > |2|1.96774194|2| > > For 2019-12-30 it is calculating wrong.
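A worked check of the fractional branch that the special *if* skips (formula paraphrased from the Hive-compatible behavior the comment links to): when the two day-of-month values differ and the dates are not both month-ends, the result is wholeMonths + (day1 - day2) / 31.0.
{code:scala}
// months_between('2020-02-29', '2019-12-30'): 2 whole months, day1 = 29, day2 = 30.
val result = 2 + (29 - 30) / 31.0
println(f"$result%.8f") // 1.96774194, matching the reported value
{code}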
[jira] [Commented] (SPARK-30758) Spark SQL can't display bracketed comments well in generated golden files
[ https://issues.apache.org/jira/browse/SPARK-30758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032895#comment-17032895 ] jiaan.geng commented on SPARK-30758: I'm working on it. > Spark SQL can't display bracketed comments well in generated golden files > - > > Key: SPARK-30758 > URL: https://issues.apache.org/jira/browse/SPARK-30758 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > Although Spark SQL supports bracketed comments, {{SQLQueryTestSuite}} > can't handle bracketed comments well, which leads to generated golden files that can't > display bracketed comments properly. > We can read the output of comments.sql > [https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/postgreSQL/comments.sql.out] > Such as: > > {code:java} > -- !query /* This is an example of SQL which should not execute: * select > 'multi-line' -- !query schema struct<> -- !query > output org.apache.spark.sql.catalyst.parser.ParseException > mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', > 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', > 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == /* This is an example of SQL which should not execute: ^^^ * select > 'multi-line' > -- !query */ SELECT 'after multi-line' AS fifth -- !query schema struct<> -- > !query output org.apache.spark.sql.catalyst.parser.ParseException > extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', > 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', > 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == */ ^^^ SELECT 'after multi-line' AS fifth > {code}
[jira] [Updated] (SPARK-30758) Spark SQL can't display bracketed comments well in generated golden files
[ https://issues.apache.org/jira/browse/SPARK-30758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-30758: --- Summary: Spark SQL can't display bracketed comments well in generated golden files (was: Spark SQL can't treat bracketed comments well and lead to generated golden files can't display bracketed comments well.) > Spark SQL can't display bracketed comments well in generated golden files > - > > Key: SPARK-30758 > URL: https://issues.apache.org/jira/browse/SPARK-30758 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > Although Spark SQL supports bracketed comments, {{SQLQueryTestSuite}} > can't handle bracketed comments well, which leads to generated golden files that can't > display bracketed comments properly. > We can read the output of comments.sql > [https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/postgreSQL/comments.sql.out] > Such as: > > {code:java} > -- !query /* This is an example of SQL which should not execute: * select > 'multi-line' -- !query schema struct<> -- !query > output org.apache.spark.sql.catalyst.parser.ParseException > mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', > 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', > 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == /* This is an example of SQL which should not execute: ^^^ * select > 'multi-line' > -- !query */ SELECT 'after multi-line' AS fifth -- !query schema struct<> -- > !query output org.apache.spark.sql.catalyst.parser.ParseException > extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', > 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', > 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == */ ^^^ SELECT 'after multi-line' AS fifth > {code}
[jira] [Created] (SPARK-30758) Spark SQL can't treat bracketed comments well and lead to generated golden files can't display bracketed comments well.
jiaan.geng created SPARK-30758: -- Summary: Spark SQL can't treat bracketed comments well and lead to generated golden files can't display bracketed comments well. Key: SPARK-30758 URL: https://issues.apache.org/jira/browse/SPARK-30758 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: jiaan.geng Although Spark SQL supports bracketed comments, {{SQLQueryTestSuite}} can't handle bracketed comments well, which leads to generated golden files that can't display bracketed comments properly. We can read the output of comments.sql [https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/postgreSQL/comments.sql.out] Such as: {code:java} -- !query /* This is an example of SQL which should not execute: * select 'multi-line' -- !query schema struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == /* This is an example of SQL which should not execute: ^^^ * select 'multi-line' -- !query */ SELECT 'after multi-line' AS fifth -- !query schema struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == */ ^^^ SELECT 'after multi-line' AS fifth {code}
[jira] [Commented] (SPARK-30740) months_between wrong calculation
[ https://issues.apache.org/jira/browse/SPARK-30740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032894#comment-17032894 ] Yuming Wang commented on SPARK-30740: - cc [~maxgekk] > months_between wrong calculation > > > Key: SPARK-30740 > URL: https://issues.apache.org/jira/browse/SPARK-30740 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: nhufas >Priority: Critical > > months_between is not calculating correctly for February. > Example: > > {{select }} > {{ months_between('2020-02-29','2019-12-29')}} > {{,months_between('2020-02-29','2019-12-30') }} > {{,months_between('2020-02-29','2019-12-31') }} > > will generate a result like this: > |2|1.96774194|2| > > For 2019-12-30 it is calculating wrong.
[jira] [Updated] (SPARK-28880) ANSI SQL: Nested bracketed comments
[ https://issues.apache.org/jira/browse/SPARK-28880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-28880: --- Description: Spark SQL supports these bracketed comments: *Case 1*: {code:sql} /* This is an example of SQL which should not execute: * select 'multi-line'; */ {code} *Case 2*: {code:sql} /* SELECT 'trailing' as x1; -- inside block comment */ {code} But Spark SQL does not support the nested bracketed comments shown below: *Case 3*: {code:sql} /* This block comment surrounds a query which itself has a block comment... SELECT /* embedded single line */ 'embedded' AS x2; */ {code} *Case 4*: {code:sql} SELECT -- continued after the following block comments... /* Deeply nested comment. This includes a single apostrophe to make sure we aren't decoding this part as a string. SELECT 'deep nest' AS n1; /* Second level of nesting... SELECT 'deeper nest' as n2; /* Third level of nesting... SELECT 'deepest nest' as n3; */ Hoo boy. Still two deep... */ Now just one deep... */ 'deeply nested example' AS sixth; {code} *bracketed comments* Bracketed comments are introduced by /* and end with */. [https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html] [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS] Feature ID: T351 was: We can not support these bracketed comments: *Case 1*: {code:sql} /* This is an example of SQL which should not execute: * select 'multi-line'; */ {code} *Case 2*: {code:sql} /* SELECT 'trailing' as x1; -- inside block comment */ {code} *Case 3*: {code:sql} /* This block comment surrounds a query which itself has a block comment... SELECT /* embedded single line */ 'embedded' AS x2; */ {code} *Case 4*: {code:sql} SELECT -- continued after the following block comments... /* Deeply nested comment. This includes a single apostrophe to make sure we aren't decoding this part as a string. SELECT 'deep nest' AS n1; /* Second level of nesting... SELECT 'deeper nest' as n2; /* Third level of nesting... SELECT 'deepest nest' as n3; */ Hoo boy. Still two deep... */ Now just one deep... */ 'deeply nested example' AS sixth; {code} *bracketed comments* Bracketed comments are introduced by /* and end with */. [https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html] [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS] Feature ID: T351 > ANSI SQL: Nested bracketed comments > --- > > Key: SPARK-28880 > URL: https://issues.apache.org/jira/browse/SPARK-28880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Spark SQL supports these bracketed comments: > *Case 1*: > {code:sql} > /* This is an example of SQL which should not execute: > * select 'multi-line'; > */ > {code} > *Case 2*: > {code:sql} > /* > SELECT 'trailing' as x1; -- inside block comment > */ > {code} > But Spark SQL does not support the nested bracketed comments shown below: > *Case 3*: > {code:sql} > /* This block comment surrounds a query which itself has a block comment... > SELECT /* embedded single line */ 'embedded' AS x2; > */ > {code} > *Case 4*: > {code:sql} > SELECT -- continued after the following block comments... > /* Deeply nested comment. >This includes a single apostrophe to make sure we aren't decoding this > part as a string. > SELECT 'deep nest' AS n1; > /* Second level of nesting... > SELECT 'deeper nest' as n2; > /* Third level of nesting... 
> SELECT 'deepest nest' as n3; > */ > Hoo boy. Still two deep... > */ > Now just one deep... > */ > 'deeply nested example' AS sixth; > {code} > *bracketed comments* > Bracketed comments are introduced by /* and end with */. > [https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html] > [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS] > Feature ID: T351
[jira] [Updated] (SPARK-28880) ANSI SQL: Nested bracketed comments
[ https://issues.apache.org/jira/browse/SPARK-28880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-28880: --- Summary: ANSI SQL: Nested bracketed comments (was: ANSI SQL: Bracketed comments) > ANSI SQL: Nested bracketed comments > --- > > Key: SPARK-28880 > URL: https://issues.apache.org/jira/browse/SPARK-28880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > We can not support these bracketed comments: > *Case 1*: > {code:sql} > /* This is an example of SQL which should not execute: > * select 'multi-line'; > */ > {code} > *Case 2*: > {code:sql} > /* > SELECT 'trailing' as x1; -- inside block comment > */ > {code} > *Case 3*: > {code:sql} > /* This block comment surrounds a query which itself has a block comment... > SELECT /* embedded single line */ 'embedded' AS x2; > */ > {code} > *Case 4*: > {code:sql} > SELECT -- continued after the following block comments... > /* Deeply nested comment. >This includes a single apostrophe to make sure we aren't decoding this > part as a string. > SELECT 'deep nest' AS n1; > /* Second level of nesting... > SELECT 'deeper nest' as n2; > /* Third level of nesting... > SELECT 'deepest nest' as n3; > */ > Hoo boy. Still two deep... > */ > Now just one deep... > */ > 'deeply nested example' AS sixth; > {code} > *bracketed comments* > Bracketed comments are introduced by /* and end with */. > [https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html] > [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS] > Feature ID: T351
[jira] [Issue Comment Deleted] (SPARK-30724) Support 'like any' and 'like all' operators
[ https://issues.apache.org/jira/browse/SPARK-30724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-30724: --- Comment: was deleted (was: I will investigate this feature.) > Support 'like any' and 'like all' operators > --- > > Key: SPARK-30724 > URL: https://issues.apache.org/jira/browse/SPARK-30724 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > In Teradata/Hive and PostgreSQL, the 'like any' and 'like all' operators are > mostly used when we are matching a text field against a number of patterns. For > example: > Teradata / Hive 3.0: > {code:sql} > --like any > select 'foo' LIKE ANY ('%foo%','%bar%'); > --like all > select 'foo' LIKE ALL ('%foo%','%bar%'); > {code} > PostgreSQL: > {code:sql} > -- like any > select 'foo' LIKE ANY (array['%foo%','%bar%']); > -- like all > select 'foo' LIKE ALL (array['%foo%','%bar%']); > {code}