[jira] [Created] (SPARK-30762) Add dtype="float32" support to vector_to_array UDF

2020-02-08 Thread Liang Zhang (Jira)
Liang Zhang created SPARK-30762:
---

 Summary: Add dtype="float32" support to vector_to_array UDF
 Key: SPARK-30762
 URL: https://issues.apache.org/jira/browse/SPARK-30762
 Project: Spark
  Issue Type: Story
  Components: MLlib
Affects Versions: 3.0.0
Reporter: Liang Zhang


Previous PR: 
[https://github.com/apache/spark/blob/master/python/pyspark/ml/functions.py]
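For context, a minimal sketch of how this could look from the Scala side, assuming a spark-shell session. {{vector_to_array}} exists in {{org.apache.spark.ml.functions}} in 3.0; the {{dtype}} argument in the commented line is the proposed addition, not an existing parameter:

{code:scala}
import spark.implicits._
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.ml.linalg.Vectors

val df = Seq(Tuple1(Vectors.dense(1.0, 2.0, 3.0))).toDF("features")

// Today: the vector column is converted to array<double>.
df.select(vector_to_array($"features").as("arr")).printSchema()

// Proposed (hypothetical signature): request a 32-bit result, i.e. array<float>.
// df.select(vector_to_array($"features", dtype = "float32").as("arr"))
{code}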

 

 






[jira] [Resolved] (SPARK-30761) Nested pruning should not prune on required child outputs in Generate

2020-02-08 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-30761.
-
Resolution: Won't Fix

> Nested pruning should not prune on required child outputs in Generate
> -
>
> Key: SPARK-30761
> URL: https://issues.apache.org/jira/browse/SPARK-30761
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: L. C. Hsieh
>Priority: Major
> Fix For: 3.0.0
>
>
> We prune nested fields from Generate. If a child output is required by an 
> operator above Generate, we should not prune nested fields on it. Otherwise, 
> the accessors in the operator above could become unresolved.






[jira] [Created] (SPARK-30761) Nested pruning should not prune on required child outputs in Generate

2020-02-08 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-30761:
---

 Summary: Nested pruning should not prune on required child outputs 
in Generate
 Key: SPARK-30761
 URL: https://issues.apache.org/jira/browse/SPARK-30761
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: L. C. Hsieh
 Fix For: 3.0.0


We prune nested fields from Generate. If a child output is required by an 
operator above Generate, we should not prune nested fields on it. Otherwise, the 
accessors in the operator above could become unresolved.






[jira] [Commented] (SPARK-30711) 64KB JVM bytecode limit - janino.InternalCompilerException

2020-02-08 Thread Kazuaki Ishizaki (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033112#comment-17033112
 ] 

Kazuaki Ishizaki commented on SPARK-30711:
--

[~schreiber] Sorry, I made a mistake. This test case passes with master and 
branch-2.4 on my end.

I have one question: which value do you set for {{spark.sql.codegen.fallback}}? 
The idea of whole-stage codegen is to stop using it if the generated code grows 
larger than 64KB. For that, [this 
code|https://github.com/apache/spark/blob/branch-2.4/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L600-L607]
 catches the {{org.codehaus.janino.InternalCompilerException}} and tries to 
recompile the code in smaller pieces.
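For reference, the flag can be inspected and set from a spark-shell session like this (a minimal sketch; {{spark.sql.codegen.fallback}} is the configuration named above):

{code:scala}
// Check whether the fallback is configured in this session.
spark.conf.getOption("spark.sql.codegen.fallback")

// Enable the fallback explicitly, so a query whose generated code fails to
// compile is retried without whole-stage codegen instead of failing.
spark.conf.set("spark.sql.codegen.fallback", "true")
{code}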

> 64KB JVM bytecode limit - janino.InternalCompilerException
> --
>
> Key: SPARK-30711
> URL: https://issues.apache.org/jira/browse/SPARK-30711
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
> Environment: Windows 10
> Spark 2.4.4
> scalaVersion 2.11.12
> JVM Oracle 1.8.0_221-b11
>Reporter: Frederik Schreiber
>Priority: Major
>
> Exception
> {code:java}
> ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": 
> Code of method "processNext()V" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4"
>  grows beyond 64 KBERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": 
> Code of method "processNext()V" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4"
>  grows beyond 64 KBorg.codehaus.janino.InternalCompilerException: Compiling 
> "GeneratedClass": Code of method "processNext()V" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4"
>  grows beyond 64 KB at 
> org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382) at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237) at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465)
>  at 
> org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
>  at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235) 
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207) at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1290)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1372)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1369)
>  at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at 
> org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:583)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) 
> at 
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296) 
> at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3384)
>  at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2783) 
> at 

[jira] [Updated] (SPARK-30274) Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30274:
--
Affects Version/s: 2.0.2
   2.1.3
   2.2.3
   2.3.4

> Avoid BytesToBytesMap lookup hang forever when holding keys reaching max 
> capacity
> -
>
> Key: SPARK-30274
> URL: https://issues.apache.org/jira/browse/SPARK-30274
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> BytesToBytesMap.append allows appending keys until the number of keys reaches 
> MAX_CAPACITY. But once the pointer array in the map holds MAX_CAPACITY keys, 
> the next call to lookup will hang forever.






[jira] [Commented] (SPARK-24884) Implement regexp_extract_all

2020-02-08 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033097#comment-17033097
 ] 

jiaan.geng commented on SPARK-24884:


I'm working on it.

> Implement regexp_extract_all
> 
>
> Key: SPARK-24884
> URL: https://issues.apache.org/jira/browse/SPARK-24884
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nick Nicolini
>Priority: Major
>
> I've recently hit many cases of regexp parsing where we need to match on 
> something that is always arbitrary in length; for example, a text block that 
> looks something like:
> {code:java}
> AAA:WORDS|
> BBB:TEXT|
> MSG:ASDF|
> MSG:QWER|
> ...
> MSG:ZXCV|{code}
> Where I need to pull out all values between "MSG:" and "|", which can occur 
> in each instance between 1 and n times. I cannot reliably use the existing 
> {{regexp_extract}} method since the number of occurrences is always 
> arbitrary, and while I can write a UDF to handle this it'd be great if this 
> was supported natively in Spark.
> Perhaps we can implement something like {{regexp_extract_all}} as 
> [Presto|https://prestodb.io/docs/current/functions/regexp.html] and 
> [Pig|https://pig.apache.org/docs/latest/api/org/apache/pig/builtin/REGEX_EXTRACT_ALL.html]
>  have?
>  
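Until a native function exists, a UDF along the lines the reporter mentions can serve as a workaround. A minimal sketch assuming a spark-shell session, using the column and pattern from the example above:

{code:scala}
import spark.implicits._
import org.apache.spark.sql.functions.udf

// Collect every group captured between "MSG:" and "|" into an array column.
val extractAllMsgs = udf { s: String =>
  if (s == null) Seq.empty[String]
  else "MSG:([^|]*)\\|".r.findAllMatchIn(s).map(_.group(1)).toSeq
}

Seq("AAA:WORDS|BBB:TEXT|MSG:ASDF|MSG:QWER|MSG:ZXCV|").toDF("text")
  .select(extractAllMsgs($"text").as("msgs"))
  .show(false)
{code}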






[jira] [Updated] (SPARK-30274) Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30274:
--
Affects Version/s: 2.4.4

> Avoid BytesToBytesMap lookup hang forever when holding keys reaching max 
> capacity
> -
>
> Key: SPARK-30274
> URL: https://issues.apache.org/jira/browse/SPARK-30274
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4, 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> BytesToBytesMap.append allows appending keys until the number of keys reaches 
> MAX_CAPACITY. But once the pointer array in the map holds MAX_CAPACITY keys, 
> the next call to lookup will hang forever.






[jira] [Updated] (SPARK-29918) RecordBinaryComparator should check endianness when compared by long

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29918:
--
Affects Version/s: 2.4.4

> RecordBinaryComparator should check endianness when compared by long
> 
>
> Key: SPARK-29918
> URL: https://issues.apache.org/jira/browse/SPARK-29918
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: EdisonWang
>Assignee: EdisonWang
>Priority: Minor
>  Labels: correctness
> Fix For: 2.4.5, 3.0.0
>
>
> If the architecture supports unaligned access or the offset is 8-byte aligned, 
> RecordBinaryComparator compares 8 bytes at a time by reading them as a long. 
> Otherwise, it compares byte by byte.
> However, on a little-endian machine, the result of comparing by a long value 
> and comparing byte by byte may differ. If the architectures in a YARN cluster 
> are different (some are unaligned-access capable while others are not), then 
> the order of two records after sorting is undetermined, which results in the 
> same problem as in https://issues.apache.org/jira/browse/SPARK-23207
>  
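A small self-contained illustration of the discrepancy described above (not Spark code, just plain Scala with java.nio):

{code:scala}
import java.nio.{ByteBuffer, ByteOrder}

// Two 8-byte records: byte-by-byte (lexicographic) comparison says a < b.
val a = Array[Byte](0, 0, 0, 0, 0, 0, 0, 1)
val b = Array[Byte](1, 0, 0, 0, 0, 0, 0, 0)

def asLong(bytes: Array[Byte], order: ByteOrder): Long =
  ByteBuffer.wrap(bytes).order(order).getLong

// Read as little-endian longs: a becomes 2^56 and b becomes 1, so a > b, which
// is the opposite ordering from the byte-by-byte comparison.
java.lang.Long.compareUnsigned(asLong(a, ByteOrder.LITTLE_ENDIAN),
                               asLong(b, ByteOrder.LITTLE_ENDIAN))   // > 0
{code}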






[jira] [Commented] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2020-02-08 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033096#comment-17033096
 ] 

Dongjoon Hyun commented on SPARK-29042:
---

Hi, [~viirya]. Could you update the `Affected Version` by checking at least 
`2.4.4` and `2.3.4`?

> Sampling-based RDD with unordered input should be INDETERMINATE
> ---
>
> Key: SPARK-29042
> URL: https://issues.apache.org/jira/browse/SPARK-29042
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 2.4.5, 3.0.0
>
>
> We have found and fixed correctness issues when RDD output is INDETERMINATE. 
> One missing part is sampling-based RDDs. This kind of RDD is sensitive to the 
> order of its input, so a sampling-based RDD with unordered input should be 
> INDETERMINATE.
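A minimal sketch of the scenario, assuming a spark-shell session: the input to {{sample}} comes out of a shuffle, so its order can change if the stage is recomputed, and the sampled output is therefore order sensitive.

{code:scala}
// repartition introduces a shuffle; the order in which rows arrive in each
// partition is not fixed across recomputations.
val shuffled = sc.parallelize(1 to 100000, 10).repartition(5)

// sample() consumes its seeded random draws in input order, so a recomputation
// over reordered input can select a different subset of rows.
val sampled = shuffled.sample(withReplacement = false, fraction = 0.1, seed = 42L)
sampled.count()
{code}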






[jira] [Updated] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29042:
--
Affects Version/s: 2.4.4

> Sampling-based RDD with unordered input should be INDETERMINATE
> ---
>
> Key: SPARK-29042
> URL: https://issues.apache.org/jira/browse/SPARK-29042
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4, 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: correctness
> Fix For: 2.4.5, 3.0.0
>
>
> We have found and fixed correctness issues when RDD output is INDETERMINATE. 
> One missing part is sampling-based RDDs. This kind of RDD is sensitive to the 
> order of its input, so a sampling-based RDD with unordered input should be 
> INDETERMINATE.






[jira] [Updated] (SPARK-30274) Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30274:
--
Labels: release-notes  (was: )

> Avoid BytesToBytesMap lookup hang forever when holding keys reaching max 
> capacity
> -
>
> Key: SPARK-30274
> URL: https://issues.apache.org/jira/browse/SPARK-30274
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> BytesToBytesMap.append allows appending keys until the number of keys reaches 
> MAX_CAPACITY. But once the pointer array in the map holds MAX_CAPACITY keys, 
> the next call to lookup will hang forever.






[jira] [Updated] (SPARK-30312) Preserve path permission when truncate table

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30312:
--
Labels: release-notes  (was: )

> Preserve path permission when truncate table
> 
>
> Key: SPARK-30312
> URL: https://issues.apache.org/jira/browse/SPARK-30312
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> When Spark SQL truncates a table, it deletes the paths of the table/partitions 
> and then re-creates new ones. If custom permissions/ACLs are set on the paths, 
> that metadata is lost.
> We should preserve the original permissions/ACLs if possible.
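A hedged sketch of the general idea using the Hadoop FileSystem API (not the actual Spark patch; the path below is hypothetical, and ACLs could be carried over the same way with the corresponding ACL calls):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.permission.FsPermission

val path = new Path("/tmp/some_table_dir")          // hypothetical table location
val fs = path.getFileSystem(new Configuration())

// Remember the permission before deleting, then re-apply it to the new dir.
val savedPerm: FsPermission = fs.getFileStatus(path).getPermission
fs.delete(path, true)
fs.mkdirs(path)
fs.setPermission(path, savedPerm)
{code}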






[jira] [Updated] (SPARK-29890) Unable to fill na with 0 with duplicate columns

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29890:
--
Labels: release-notes  (was: )

> Unable to fill na with 0 with duplicate columns
> ---
>
> Key: SPARK-29890
> URL: https://issues.apache.org/jira/browse/SPARK-29890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.3
>Reporter: sandeshyapuram
>Assignee: Terry Kim
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> Trying to fill out na values with 0.
> {noformat}
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> val parent = 
> spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
> val c1 = parent.filter(lit(true))
> val c2 = parent.filter(lit(true))
> c1.join(c2, Seq("nums"), "left")
> .na.fill(0).show{noformat}
> {noformat}
> 9/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: 
> error looking up the name of group 820818257: No such file or directory
> org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could 
> be: abc, abc.;
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1246)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)
>   ... 54 elided{noformat}
>  






[jira] [Updated] (SPARK-30065) Unable to drop na with duplicate columns

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30065:
--
Labels: release-notes  (was: )

> Unable to drop na with duplicate columns
> 
>
> Key: SPARK-30065
> URL: https://issues.apache.org/jira/browse/SPARK-30065
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> Trying to drop rows with null values fails even when no columns are 
> specified. This should be allowed:
> {code:java}
> scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
> left: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
> scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
> right: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
> scala> val df = left.join(right, Seq("col1"))
> df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more 
> field]
> scala> df.show
> ++++
> |col1|col2|col2|
> ++++
> |   1|null|   2|
> |   3|   4|null|
> ++++
> scala> df.na.drop("any")
> org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could 
> be: col2, col2.;
>   at 
> org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
> {code}






[jira] [Updated] (SPARK-28939) SQL configuration are not always propagated

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28939:
--
Labels: release-notes  (was: )

> SQL configuration are not always propagated
> ---
>
> Key: SPARK-28939
> URL: https://issues.apache.org/jira/browse/SPARK-28939
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.4
>Reporter: Marco Gaido
>Assignee: Marco Gaido
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> The SQL configurations are propagated to executors in order to be effective.
> Unfortunately, in some cases we fail to propagate them, making them 
> ineffective.
> The problem happens every time {{rdd}} or {{queryExecution.toRdd}} is used 
> (see the sketch below), and this is pretty frequent in the codebase.
> Please notice that there are 2 parts to this issue:
>  - when a user directly uses those APIs
>  - when Spark invokes them (e.g. throughout MLlib and other usages, or the 
> {{describe}} method on the {{Dataset}} class)
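For reference, the two entry points mentioned above look like this in user code (a minimal sketch assuming a spark-shell session):

{code:scala}
val df = spark.range(5).toDF("id")

// Public API: materializes the Dataset as an RDD[Row].
val viaPublicApi = df.rdd

// Internal path used throughout the codebase: RDD of InternalRow.
val viaInternal = df.queryExecution.toRdd
{code}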






[jira] [Updated] (SPARK-28152) Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28152:
--
Labels: release-notes  (was: )

> Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect
> -
>
> Key: SPARK-28152
> URL: https://issues.apache.org/jira/browse/SPARK-28152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3, 3.0.0
>Reporter: Shiv Prashant Sood
>Assignee: Shiv Prashant Sood
>Priority: Minor
>  Labels: release-notes
> Fix For: 3.0.0
>
>
>  ShortType and FloatType are not correctly mapped to the right JDBC types when 
> using the JDBC connector. This results in tables and Spark data frames being 
> created with unintended types. The issue was observed when validating against 
> SQL Server.
> Some example issues:
>  * A write from a df with a ShortType column results in a SQL table with 
> column type INTEGER as opposed to SMALLINT, thus a larger table than expected.
>  * A read results in a dataframe with type INTEGER as opposed to ShortType.
>  * FloatType has an issue on the read path. In the write path, the Spark data 
> type 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real', 
> but in the read path, when JDBC data types are converted to Catalyst data 
> types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' rather 
> than 'FloatType'.
>  
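A hedged sketch of the kind of mapping being asked for, expressed as a custom JDBC dialect registered from user code (not the actual MsSqlServerDialect change):

{code:scala}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

object SqlServerShortFloatDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:sqlserver")

  // Write path: ShortType -> SMALLINT, FloatType -> REAL.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case ShortType => Some(JdbcType("SMALLINT", Types.SMALLINT))
    case FloatType => Some(JdbcType("REAL", Types.REAL))
    case _         => None
  }

  // Read path: REAL -> FloatType (instead of DoubleType).
  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    if (sqlType == Types.REAL) Some(FloatType) else None
}

JdbcDialects.registerDialect(SqlServerShortFloatDialect)
{code}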






[jira] [Updated] (SPARK-27812) kubernetes client import non-daemon thread which block jvm exit.

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27812:
--
Labels: release-notes  (was: )

> kubernetes client import non-daemon thread which block jvm exit.
> 
>
> Key: SPARK-27812
> URL: https://issues.apache.org/jira/browse/SPARK-27812
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.3, 2.4.4
>Reporter: Henry Yu
>Assignee: Igor Calabria
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> I tried spark-submit to k8s in cluster mode. The driver pod failed to exit 
> because of an OkHttp WebSocket non-daemon thread.
>  






[jira] [Updated] (SPARK-21492) Memory leak in SortMergeJoin

2020-02-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-21492:
--
Labels: release-notes  (was: )

> Memory leak in SortMergeJoin
> 
>
> Key: SPARK-21492
> URL: https://issues.apache.org/jira/browse/SPARK-21492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
>Reporter: Zhan Zhang
>Assignee: Yuanjian Li
>Priority: Major
>  Labels: release-notes
> Fix For: 2.4.5, 3.0.0
>
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory 
> leak caused by the Sort. The memory is not released until the task ends, and 
> cannot be used by other operators, causing a performance drop or OOM.






[jira] [Updated] (SPARK-30755) Support Hive 1.2.1's Serde after making built-in Hive to 2.3

2020-02-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30755:

Target Version/s: 3.0.0
 Description: 
{noformat}
2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: 
ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due 
to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 1): 
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.defineClass1(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.defineClass(ClassLoader.java:756)
  2020-01-27 05:11:20.446 - stderr>  at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader$1.run(URLClassLoader.java:363)
  2020-01-27 05:11:20.446 - stderr>  at 
java.security.AccessController.doPrivileged(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.findClass(URLClassLoader.java:362)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  2020-01-27 05:11:20.446 - stderr>  at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:405)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName0(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName(Class.java:348)
  2020-01-27 05:11:20.446 - stderr>  at 
org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.(HiveFileFormat.scala:119)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:104)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:111)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:267)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:208)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.Task.run(Task.scala:117)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$6(Executor.scala:567)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1559)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:570)
  2020-01-27 05:11:20.447 - stderr>  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  2020-01-27 05:11:20.447 - stderr>  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2020-01-27 05:11:20.447 - stderr>  at java.lang.Thread.run(Thread.java:748)
  2020-01-27 05:11:20.447 - stderr> Caused by: 
java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
  2020-01-27 05:11:20.447 - stderr>  at 
java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  2020-01-27 05:11:20.447 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  2020-01-27 05:11:20.447 - stderr>  at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  2020-01-27 05:11:20.447 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  2020-01-27 05:11:20.447 - stderr>  ... 31 more
{noformat}


  was:

{noformat}
2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: 
ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due 
to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 1): 
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
  2020-01-27 

[jira] [Updated] (SPARK-30755) Support Hive 1.2.1's Serde after making built-in Hive to 2.3

2020-02-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30755:

Priority: Blocker  (was: Major)

> Support Hive 1.2.1's Serde after making built-in Hive to 2.3
> 
>
> Key: SPARK-30755
> URL: https://issues.apache.org/jira/browse/SPARK-30755
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Blocker
>
> {noformat}
> 2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: 
> ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due 
> to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 
> 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.defineClass1(Native Method)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.defineClass(ClassLoader.java:756)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.security.AccessController.doPrivileged(Native Method)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   2020-01-27 05:11:20.446 - stderr>  at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:405)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName0(Native 
> Method)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.Class.forName(Class.java:348)
>   2020-01-27 05:11:20.446 - stderr>  at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.(HiveFileFormat.scala:119)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:104)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:111)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:267)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:208)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.scheduler.Task.run(Task.scala:117)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$6(Executor.scala:567)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1559)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:570)
>   2020-01-27 05:11:20.447 - stderr>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   2020-01-27 05:11:20.447 - stderr>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   2020-01-27 05:11:20.447 - stderr>  at java.lang.Thread.run(Thread.java:748)
>   2020-01-27 05:11:20.447 - stderr> Caused by: 
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
>   2020-01-27 05:11:20.447 - stderr>  at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   2020-01-27 05:11:20.447 - stderr>  at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   2020-01-27 05:11:20.447 - stderr>  at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   2020-01-27 

[jira] [Created] (SPARK-30760) Port `millisToDays` and `daysToMillis` on Java 8 time API

2020-02-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30760:
--

 Summary: Port `millisToDays` and `daysToMillis` on Java 8 time API
 Key: SPARK-30760
 URL: https://issues.apache.org/jira/browse/SPARK-30760
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Currently, the `millisToDays` and `daysToMillis` methods of DateTimeUtils use the 
Java 7 (and earlier) time API. The implementation is based on the combined 
Julian + Gregorian calendar. To be consistent with other date-time functions, the 
methods need to be ported to the Java 8 time API.
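A minimal sketch of what the Java 8 time API equivalents could look like (not the actual DateTimeUtils signatures, which take the time zone in a different form):

{code:scala}
import java.time.{Instant, LocalDate, ZoneId}

// Epoch milliseconds -> days since 1970-01-01 in the given time zone.
def millisToDays(millis: Long, zone: ZoneId): Int =
  Instant.ofEpochMilli(millis).atZone(zone).toLocalDate.toEpochDay.toInt

// Days since 1970-01-01 -> epoch milliseconds of the day's start in the zone.
def daysToMillis(days: Int, zone: ZoneId): Long =
  LocalDate.ofEpochDay(days).atStartOfDay(zone).toInstant.toEpochMilli
{code}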






[jira] [Updated] (SPARK-30759) The cache in StringRegexExpression is not initialized for foldable patterns

2020-02-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30759:
---
Priority: Minor  (was: Major)

> The cache in StringRegexExpression is not initialized for foldable patterns
> ---
>
> Key: SPARK-30759
> URL: https://issues.apache.org/jira/browse/SPARK-30759
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: Screen Shot 2020-02-08 at 22.45.50.png
>
>
> In the case of foldable patterns, the cache in StringRegexExpression should 
> be evaluated once, but in fact the pattern is compiled every time. Here is an 
> example:
> {code:sql}
> SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*';
> {code}
> the code 
> https://github.com/apache/spark/blob/8aebc80e0e67bcb1aa300b8c8b1a209159237632/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L45-L48:
> {code:scala}
>   // try cache the pattern for Literal
>   private lazy val cache: Pattern = pattern match {
> case Literal(value: String, StringType) => compile(value)
> case _ => null
>   }
> {code}
> The attached screenshot shows that a foldable expression doesn't fall into 
> the first case.






[jira] [Created] (SPARK-30759) The cache in StringRegexExpression is not initialized for foldable patterns

2020-02-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30759:
--

 Summary: The cache in StringRegexExpression is not initialized for 
foldable patterns
 Key: SPARK-30759
 URL: https://issues.apache.org/jira/browse/SPARK-30759
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.5, 3.0.0
Reporter: Maxim Gekk
 Attachments: Screen Shot 2020-02-08 at 22.45.50.png

In the case of foldable patterns, the cache in StringRegexExpression should be 
evaluated once, but in fact the pattern is compiled every time. Here is an example:
{code:sql}
SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*';
{code}
the code 
https://github.com/apache/spark/blob/8aebc80e0e67bcb1aa300b8c8b1a209159237632/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L45-L48:
{code:scala}
  // try cache the pattern for Literal
  private lazy val cache: Pattern = pattern match {
case Literal(value: String, StringType) => compile(value)
case _ => null
  }
{code}
The attached screenshot shows that a foldable expression doesn't fall into the 
first case.
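A small illustration of why the {{Literal}} case alone is not enough (assuming a Spark REPL with catalyst on the classpath; the expressions below are built by hand only to show foldability):

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Concat, Literal}

// A pattern built from literals is foldable, but it is not itself a Literal,
// so the `case Literal(value: String, StringType)` above never matches it and
// the cache stays null.
val pattern = Concat(Seq(Literal("%SystemDrive%"), Literal("\\Users.*")))
pattern.foldable               // true
pattern.isInstanceOf[Literal]  // false
{code}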






[jira] [Updated] (SPARK-30759) The cache in StringRegexExpression is not initialized for foldable patterns

2020-02-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30759:
---
Attachment: Screen Shot 2020-02-08 at 22.45.50.png

> The cache in StringRegexExpression is not initialized for foldable patterns
> ---
>
> Key: SPARK-30759
> URL: https://issues.apache.org/jira/browse/SPARK-30759
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
> Attachments: Screen Shot 2020-02-08 at 22.45.50.png
>
>
> In the case of foldable patterns, the cache in StringRegexExpression should 
> be evaluated once, but in fact the pattern is compiled every time. Here is an 
> example:
> {code:sql}
> SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*';
> {code}
> the code 
> https://github.com/apache/spark/blob/8aebc80e0e67bcb1aa300b8c8b1a209159237632/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L45-L48:
> {code:scala}
>   // try cache the pattern for Literal
>   private lazy val cache: Pattern = pattern match {
> case Literal(value: String, StringType) => compile(value)
> case _ => null
>   }
> {code}
> The attached screenshot shows that a foldable expression doesn't fall into 
> the first case.






[jira] [Commented] (SPARK-29292) Fix internal usages of mutable collection for Seq in 2.13

2020-02-08 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032923#comment-17032923
 ] 

Sean R. Owen commented on SPARK-29292:
--

This is a pretty good example of most of the changes: 
https://github.com/srowen/spark/commit/e0aacc173604daf972ff3f0f8949a6d3255e9f98

Note that it's not 100% up to date or complete, and does not by itself make 
this part work.

See the parent JIRA for additional blockers. We will at least need Scala 2.13.2.

> Fix internal usages of mutable collection for Seq in 2.13
> -
>
> Key: SPARK-29292
> URL: https://issues.apache.org/jira/browse/SPARK-29292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
>
> Kind of related to https://issues.apache.org/jira/browse/SPARK-27681, but a 
> simpler subset. 
> In 2.13, a mutable collection can't be returned as a 
> {{scala.collection.Seq}}. It's easy enough to call .toSeq on these as that 
> still works on 2.12.
> {code}
> [ERROR] [Error] 
> /Users/seanowen/Documents/spark_2.13/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:467:
>  type mismatch;
>  found   : Seq[String] (in scala.collection) 
>  required: Seq[String] (in scala.collection.immutable) 
> {code}
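A minimal illustration of the change and the suggested fix (plain Scala, no Spark needed):

{code:scala}
import scala.collection.mutable.ArrayBuffer

// On 2.13, `Seq` means scala.collection.immutable.Seq, so a mutable buffer no
// longer satisfies it; on 2.12 it did.
def needsSeq(xs: Seq[String]): Int = xs.length

val buf = ArrayBuffer("a", "b")
// needsSeq(buf)     // compiles on 2.12, fails on 2.13 with the mismatch above
needsSeq(buf.toSeq)  // works on both 2.12 and 2.13
{code}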






[jira] [Commented] (SPARK-30740) months_between wrong calculation

2020-02-08 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032922#comment-17032922
 ] 

Maxim Gekk commented on SPARK-30740:


This is because of the special *if* 
[https://github.com/apache/spark/blob/a3e3cfa03a18d31370acd9a10562ff5312bb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L603-L605]
 which was implemented to be compatible with Hive: 
[https://github.com/apache/hive/blob/287e5d5e4c43beb2bc84a80e342f897494e32c6c/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMonthsBetween.java#L133-L138]

> months_between wrong calculation
> 
>
> Key: SPARK-30740
> URL: https://issues.apache.org/jira/browse/SPARK-30740
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: nhufas
>Priority: Critical
>
> months_between is not calculating correctly for February.
> Example:
>  
> {{select }}
> {{ months_between('2020-02-29','2019-12-29')}}
> {{,months_between('2020-02-29','2019-12-30') }}
> {{,months_between('2020-02-29','2019-12-31') }}
>  
> will generate a result like this 
> |2|1.96774194|2|
>  
> For 2019-12-30 the calculation is wrong.
>  
>  
>  






[jira] [Commented] (SPARK-30758) Spark SQL can't display bracketed comments well in generated golden files

2020-02-08 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032895#comment-17032895
 ] 

jiaan.geng commented on SPARK-30758:


I'm working on.

> Spark SQL can't display bracketed comments well in generated golden files
> -
>
> Key: SPARK-30758
> URL: https://issues.apache.org/jira/browse/SPARK-30758
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> Although Spark SQL supports bracketed comments, {{SQLQueryTestSuite}} can't 
> handle them well, so the generated golden files can't display bracketed 
> comments correctly.
> See the output of comments.sql:
> [https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/postgreSQL/comments.sql.out]
> Such as:
>  
> {code:java}
> -- !query/* This is an example of SQL which should not execute: * select 
> 'multi-line'-- !query schemastruct<>-- !query 
> outputorg.apache.spark.sql.catalyst.parser.ParseException
> mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
> 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
> 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 
> 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 
> 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, 
> pos 0)
> == SQL ==/* This is an example of SQL which should not execute:^^^ * select 
> 'multi-line'
> -- !query*/SELECT 'after multi-line' AS fifth-- !query schemastruct<>-- 
> !query outputorg.apache.spark.sql.catalyst.parser.ParseException
> extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
> 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
> 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 
> 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 
> 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, 
> pos 0)
> == SQL ==*/^^^SELECT 'after multi-line' AS fifth
> {code}
>  






[jira] [Updated] (SPARK-30758) Spark SQL can't display bracketed comments well in generated golden files

2020-02-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-30758:
---
Summary: Spark SQL can't display bracketed comments well in generated 
golden files  (was: Spark SQL can't treat bracketed comments well and lead to 
generated golden files can't display bracketed comments well.)

> Spark SQL can't display bracketed comments well in generated golden files
> -
>
> Key: SPARK-30758
> URL: https://issues.apache.org/jira/browse/SPARK-30758
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> Although Spark SQL supports bracketed comments, {{SQLQueryTestSuite}} can't 
> handle them well, so the generated golden files can't display bracketed 
> comments correctly.
> See the output of comments.sql:
> [https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/postgreSQL/comments.sql.out]
> Such as:
>  
> {code:java}
> -- !query/* This is an example of SQL which should not execute: * select 
> 'multi-line'-- !query schemastruct<>-- !query 
> outputorg.apache.spark.sql.catalyst.parser.ParseException
> mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
> 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
> 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 
> 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 
> 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, 
> pos 0)
> == SQL ==/* This is an example of SQL which should not execute:^^^ * select 
> 'multi-line'
> -- !query*/SELECT 'after multi-line' AS fifth-- !query schemastruct<>-- 
> !query outputorg.apache.spark.sql.catalyst.parser.ParseException
> extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
> 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
> 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 
> 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 
> 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, 
> pos 0)
> == SQL ==*/^^^SELECT 'after multi-line' AS fifth
> {code}
>  






[jira] [Created] (SPARK-30758) Spark SQL can't treat bracketed comments well and lead to generated golden files can't display bracketed comments well.

2020-02-08 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-30758:
--

 Summary: Spark SQL can't treat bracketed comments well and lead to 
generated golden files can't display bracketed comments well.
 Key: SPARK-30758
 URL: https://issues.apache.org/jira/browse/SPARK-30758
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: jiaan.geng


Although Spark SQL supports bracketed comments, {{SQLQueryTestSuite}} can't 
handle them well, so the generated golden files can't display bracketed comments 
correctly.

See the output of comments.sql:

[https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/postgreSQL/comments.sql.out]

Such as:

 
{code:java}
-- !query/* This is an example of SQL which should not execute: * select 
'multi-line'-- !query schemastruct<>-- !query 
outputorg.apache.spark.sql.catalyst.parser.ParseException
mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 
'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 
'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 
'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 
0)
== SQL ==/* This is an example of SQL which should not execute:^^^ * select 
'multi-line'

-- !query*/SELECT 'after multi-line' AS fifth-- !query schemastruct<>-- !query 
outputorg.apache.spark.sql.catalyst.parser.ParseException
extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 
'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 
'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 
'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 
0)
== SQL ==*/^^^SELECT 'after multi-line' AS fifth
{code}
 






[jira] [Commented] (SPARK-30740) months_between wrong calculation

2020-02-08 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032894#comment-17032894
 ] 

Yuming Wang commented on SPARK-30740:
-

cc [~maxgekk]

> months_between wrong calculation
> 
>
> Key: SPARK-30740
> URL: https://issues.apache.org/jira/browse/SPARK-30740
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: nhufas
>Priority: Critical
>
> months_between is not calculating correctly for February.
> Example:
>  
> {{select }}
> {{ months_between('2020-02-29','2019-12-29')}}
> {{,months_between('2020-02-29','2019-12-30') }}
> {{,months_between('2020-02-29','2019-12-31') }}
>  
> will generate a result like this 
> |2|1.96774194|2|
>  
> For 2019-12-30 the calculation is wrong.
>  
>  
>  






[jira] [Updated] (SPARK-28880) ANSI SQL: Nested bracketed comments

2020-02-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-28880:
---
Description: 
Spark SQL supports these bracketed comments:
 *Case 1*:
{code:sql}
/* This is an example of SQL which should not execute:
 * select 'multi-line';
 */
{code}
*Case 2*:
{code:sql}
/*
SELECT 'trailing' as x1; -- inside block comment
*/
{code}
But Spark SQL does not support the nested bracketed comments shown below:

*Case 3*:
{code:sql}
/* This block comment surrounds a query which itself has a block comment...
SELECT /* embedded single line */ 'embedded' AS x2;
*/
{code}
*Case 4*:
{code:sql}
SELECT -- continued after the following block comments...
/* Deeply nested comment.
   This includes a single apostrophe to make sure we aren't decoding this part 
as a string.
SELECT 'deep nest' AS n1;
/* Second level of nesting...
SELECT 'deeper nest' as n2;
/* Third level of nesting...
SELECT 'deepest nest' as n3;
*/
Hoo boy. Still two deep...
*/
Now just one deep...
*/
'deeply nested example' AS sixth;
{code}
*bracketed comments*
 Bracketed comments are introduced by /* and end with */. 

[https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html]

[https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS]
 Feature ID:  T351

  was:
We can not support these bracketed comments:
*Case 1*:
{code:sql}
/* This is an example of SQL which should not execute:
 * select 'multi-line';
 */
{code}
*Case 2*:
{code:sql}
/*
SELECT 'trailing' as x1; -- inside block comment
*/
{code}
*Case 3*:
{code:sql}
/* This block comment surrounds a query which itself has a block comment...
SELECT /* embedded single line */ 'embedded' AS x2;
*/
{code}
*Case 4*:
{code:sql}
SELECT -- continued after the following block comments...
/* Deeply nested comment.
   This includes a single apostrophe to make sure we aren't decoding this part 
as a string.
SELECT 'deep nest' AS n1;
/* Second level of nesting...
SELECT 'deeper nest' as n2;
/* Third level of nesting...
SELECT 'deepest nest' as n3;
*/
Hoo boy. Still two deep...
*/
Now just one deep...
*/
'deeply nested example' AS sixth;
{code}

 *bracketed comments*
 Bracketed comments are introduced by /* and end with */. 

[https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html]

[https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS]
 Feature ID:  T351


> ANSI SQL: Nested bracketed comments
> ---
>
> Key: SPARK-28880
> URL: https://issues.apache.org/jira/browse/SPARK-28880
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Spark SQL supports these bracketed comments:
>  *Case 1*:
> {code:sql}
> /* This is an example of SQL which should not execute:
>  * select 'multi-line';
>  */
> {code}
> *Case 2*:
> {code:sql}
> /*
> SELECT 'trailing' as x1; -- inside block comment
> */
> {code}
> But Spark SQL does not support the nested bracketed comments shown below:
> *Case 3*:
> {code:sql}
> /* This block comment surrounds a query which itself has a block comment...
> SELECT /* embedded single line */ 'embedded' AS x2;
> */
> {code}
> *Case 4*:
> {code:sql}
> SELECT -- continued after the following block comments...
> /* Deeply nested comment.
>This includes a single apostrophe to make sure we aren't decoding this 
> part as a string.
> SELECT 'deep nest' AS n1;
> /* Second level of nesting...
> SELECT 'deeper nest' as n2;
> /* Third level of nesting...
> SELECT 'deepest nest' as n3;
> */
> Hoo boy. Still two deep...
> */
> Now just one deep...
> */
> 'deeply nested example' AS sixth;
> {code}
> *bracketed comments*
>  Bracketed comments are introduced by /* and end with */. 
> [https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html]
> [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS]
>  Feature ID:  T351






[jira] [Updated] (SPARK-28880) ANSI SQL: Nested bracketed comments

2020-02-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-28880:
---
Summary: ANSI SQL: Nested bracketed comments  (was: ANSI SQL: Bracketed 
comments)

> ANSI SQL: Nested bracketed comments
> ---
>
> Key: SPARK-28880
> URL: https://issues.apache.org/jira/browse/SPARK-28880
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> We can not support these bracketed comments:
> *Case 1*:
> {code:sql}
> /* This is an example of SQL which should not execute:
>  * select 'multi-line';
>  */
> {code}
> *Case 2*:
> {code:sql}
> /*
> SELECT 'trailing' as x1; -- inside block comment
> */
> {code}
> *Case 3*:
> {code:sql}
> /* This block comment surrounds a query which itself has a block comment...
> SELECT /* embedded single line */ 'embedded' AS x2;
> */
> {code}
> *Case 4*:
> {code:sql}
> SELECT -- continued after the following block comments...
> /* Deeply nested comment.
>This includes a single apostrophe to make sure we aren't decoding this 
> part as a string.
> SELECT 'deep nest' AS n1;
> /* Second level of nesting...
> SELECT 'deeper nest' as n2;
> /* Third level of nesting...
> SELECT 'deepest nest' as n3;
> */
> Hoo boy. Still two deep...
> */
> Now just one deep...
> */
> 'deeply nested example' AS sixth;
> {code}
>  *bracketed comments*
>  Bracketed comments are introduced by /* and end with */. 
> [https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html]
> [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS]
>  Feature ID:  T351






[jira] [Issue Comment Deleted] (SPARK-30724) Support 'like any' and 'like all' operators

2020-02-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-30724:
---
Comment: was deleted

(was: I will investigate this feature.)

> Support 'like any' and 'like all' operators
> ---
>
> Key: SPARK-30724
> URL: https://issues.apache.org/jira/browse/SPARK-30724
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> In Teradata/Hive and PostgreSQL, the 'like any' and 'like all' operators are 
> mostly used when matching a text field against a number of patterns. For 
> example:
> Teradata / Hive 3.0:
> {code:sql}
> --like any
> select 'foo' LIKE ANY ('%foo%','%bar%');
> --like all
> select 'foo' LIKE ALL ('%foo%','%bar%');
> {code}
> PostgreSQL:
> {code:sql}
> -- like any
> select 'foo' LIKE ANY (array['%foo%','%bar%']);
> -- like all
> select 'foo' LIKE ALL (array['%foo%','%bar%']);
> {code}


