[jira] [Updated] (SPARK-31185) implement VarianceThresholdSelector
[ https://issues.apache.org/jira/browse/SPARK-31185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxin Gao updated SPARK-31185:
-------------------------------
    Issue Type: New Feature  (was: Bug)

> implement VarianceThresholdSelector
> -----------------------------------
>
>                 Key: SPARK-31185
>                 URL: https://issues.apache.org/jira/browse/SPARK-31185
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Assignee: Huaxin Gao
>            Priority: Major
>             Fix For: 3.1.0
>
> Implement a feature selector that removes all low-variance features.
> Features with a variance lower than the threshold will be removed. The
> default is to keep all features with non-zero variance, i.e. remove the
> features that have the same value in all samples.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
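The selection rule described in the ticket (keep features whose variance exceeds a threshold; the default of 0.0 drops exactly the constant columns) can be sketched outside Spark in a few lines of plain Python. This is an illustration of the rule, not Spark's actual implementation; the function name and sample data are made up for the example.

```python
from statistics import pvariance

def variance_threshold_select(rows, threshold=0.0):
    """Return the column indices whose population variance exceeds `threshold`.

    With the default threshold of 0.0 this removes exactly the features
    that have the same value in all samples.
    """
    cols = list(zip(*rows))  # transpose: one tuple per feature column
    return [i for i, col in enumerate(cols) if pvariance(col) > threshold]

rows = [
    [6.0, 9.0, 0.0, 7.0],
    [0.0, 9.0, 6.0, 0.0],
    [0.0, 9.0, 3.0, 0.0],
    [0.0, 9.0, 8.0, 5.0],
]
# Column 1 is constant (variance 0), so only it is dropped by default.
print(variance_threshold_select(rows))  # -> [0, 2, 3]
```

Raising the threshold drops additional low-variance columns, e.g. `variance_threshold_select(rows, threshold=7.0)` also removes column 0.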
[jira] [Assigned] (SPARK-31185) implement VarianceThresholdSelector
[ https://issues.apache.org/jira/browse/SPARK-31185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng reassigned SPARK-31185:
------------------------------------
    Assignee: Huaxin Gao

> implement VarianceThresholdSelector
> -----------------------------------
>
>                 Key: SPARK-31185
>                 URL: https://issues.apache.org/jira/browse/SPARK-31185
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Assignee: Huaxin Gao
>            Priority: Major
>
> Implement a feature selector that removes all low-variance features.
> Features with a variance lower than the threshold will be removed. The
> default is to keep all features with non-zero variance, i.e. remove the
> features that have the same value in all samples.
[jira] [Resolved] (SPARK-31185) implement VarianceThresholdSelector
[ https://issues.apache.org/jira/browse/SPARK-31185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng resolved SPARK-31185.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 27954
[https://github.com/apache/spark/pull/27954]

> implement VarianceThresholdSelector
> -----------------------------------
>
>                 Key: SPARK-31185
>                 URL: https://issues.apache.org/jira/browse/SPARK-31185
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Assignee: Huaxin Gao
>            Priority: Major
>             Fix For: 3.1.0
>
> Implement a feature selector that removes all low-variance features.
> Features with a variance lower than the threshold will be removed. The
> default is to keep all features with non-zero variance, i.e. remove the
> features that have the same value in all samples.
[jira] [Created] (SPARK-31214) Upgrade Janino to 3.1.2
Dongjoon Hyun created SPARK-31214:
-------------------------------------

             Summary: Upgrade Janino to 3.1.2
                 Key: SPARK-31214
                 URL: https://issues.apache.org/jira/browse/SPARK-31214
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 3.1.0
            Reporter: Dongjoon Hyun
[jira] [Assigned] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-31101:
-------------------------------------
    Assignee: Jungtaek Lim

> Upgrade Janino to 3.0.16
> ------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>
> We received reports of user queries failing because Janino throws an error
> while compiling the generated code. The issue is tracked at
> janino-compiler/janino#113; it contains the generated code, the symptom
> (error), and an analysis of the bug, so please refer to the link for more
> details.
> Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
> Janino to compile such queries correctly.
[jira] [Resolved] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31101.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 27932
[https://github.com/apache/spark/pull/27932]

> Upgrade Janino to 3.0.16
> ------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.1.0
>
> We received reports of user queries failing because Janino throws an error
> while compiling the generated code. The issue is tracked at
> janino-compiler/janino#113; it contains the generated code, the symptom
> (error), and an analysis of the bug, so please refer to the link for more
> details.
> Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
> Janino to compile such queries correctly.
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Description: 
We received reports of user queries failing because Janino throws an error
while compiling the generated code. The issue is tracked at
janino-compiler/janino#113; it contains the generated code, the symptom
(error), and an analysis of the bug, so please refer to the link for more
details.
Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
Janino to compile such queries correctly.

  was:
This issue is to track the effort of upgrading to Janino 3.1.1, which
contains the fix for [https://github.com/janino-compiler/janino/issues/113],
which we encountered in Spark, as well as
[https://github.com/janino-compiler/janino/issues/90], for which Josh filed
an issue; 3.1.1 seems to fix it.

> Upgrade Janino to 3.0.16
> ------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> We received reports of user queries failing because Janino throws an error
> while compiling the generated code. The issue is tracked at
> janino-compiler/janino#113; it contains the generated code, the symptom
> (error), and an analysis of the bug, so please refer to the link for more
> details.
> Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
> Janino to compile such queries correctly.
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Summary: Upgrade Janino to 3.0.16  (was: Upgrade Janino to 3.1.1)

> Upgrade Janino to 3.0.16
> ------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> This issue is to track the effort of upgrading to Janino 3.1.1, which
> contains the fix for [https://github.com/janino-compiler/janino/issues/113],
> which we encountered in Spark, as well as
> [https://github.com/janino-compiler/janino/issues/90], for which Josh filed
> an issue; 3.1.1 seems to fix it.
[jira] [Assigned] (SPARK-30541) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
[ https://issues.apache.org/jira/browse/SPARK-30541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-30541:
-------------------------------------
    Assignee: Gabor Somogyi

> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
> -------------------------------------------------------------------
>
>                 Key: SPARK-30541
>                 URL: https://issues.apache.org/jira/browse/SPARK-30541
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Gabor Somogyi
>            Priority: Blocker
>         Attachments: consoleText_NOK.txt, consoleText_OK.txt,
> unit-tests_NOK.log, unit-tests_OK.log
>
> The test suite has been failing intermittently as of now:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116862/testReport/]
>
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it
> is a sbt.testing.SuiteSelector)
>
> {noformat}
> Error Details
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 3939 times over 1.000122353532 minutes. Last failure message: KeeperErrorCode = AuthFailed for /brokers/ids.
> Stack Trace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 3939 times over 1.000122353532 minutes. Last failure message: KeeperErrorCode = AuthFailed for /brokers/ids.
>     at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>     at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:292)
>     at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>     at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>     at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>     at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>     at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>     at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>     at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>     at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>     at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /brokers/ids
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>     at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:554)
>     at kafka.zk.KafkaZkClient.getChildren(KafkaZkClient.scala:719)
>     at kafka.zk.KafkaZkClient.getSortedBrokerList(KafkaZkClient.scala:455)
>     at kafka.zk.KafkaZkClient.getAllBrokersInCluster(KafkaZkClient.scala:404)
>     at org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$setup$3(KafkaTestUtils.scala:293)
>     at org.scalatest.concurrent.Eventually.makeAValiantAttempt$1(Eventually.scala:395)
>     at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:409)
>     ... 20 more
> {noformat}
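The stack trace above shows the suite's `beforeAll` polling ZooKeeper through ScalaTest's `eventually`, which retries a block until it stops throwing or a timeout elapses. The retry-until-timeout pattern itself can be sketched in Python (an illustration of the pattern, not ScalaTest's implementation; the probe and its failure count are invented for the example):

```python
import time

def eventually(check, timeout=10.0, interval=0.05):
    """Retry `check` until it returns without raising or `timeout` elapses.

    Returns (result, number_of_attempts); re-raises the last failure
    once the deadline has passed, like ScalaTest's `eventually`.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            return check(), attempts
        except Exception:
            if time.monotonic() >= deadline:
                raise  # give up: surface the last failure to the caller
            time.sleep(interval)

# Example: a probe that fails twice before succeeding,
# like a broker that has not yet registered in ZooKeeper.
state = {"calls": 0}
def probe():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("brokers not registered yet")
    return "ok"

result, attempts = eventually(probe, timeout=5.0)
print(result, attempts)  # -> ok 3
```

When the condition never becomes true, as in the AuthFailed case above, the last exception escapes after the deadline, which is exactly the `TestFailedDueToTimeoutException` the suite reports.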
[jira] [Resolved] (SPARK-30541) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
[ https://issues.apache.org/jira/browse/SPARK-30541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-30541.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27877
[https://github.com/apache/spark/pull/27877]

> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
> -------------------------------------------------------------------
>
>                 Key: SPARK-30541
>                 URL: https://issues.apache.org/jira/browse/SPARK-30541
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Gabor Somogyi
>            Priority: Blocker
>             Fix For: 3.0.0
>
>         Attachments: consoleText_NOK.txt, consoleText_OK.txt,
> unit-tests_NOK.log, unit-tests_OK.log
>
> The test suite has been failing intermittently as of now:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116862/testReport/]
>
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it
> is a sbt.testing.SuiteSelector)
>
> {noformat}
> Error Details
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 3939 times over 1.000122353532 minutes. Last failure message: KeeperErrorCode = AuthFailed for /brokers/ids.
> Stack Trace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 3939 times over 1.000122353532 minutes. Last failure message: KeeperErrorCode = AuthFailed for /brokers/ids.
>     at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>     at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:292)
>     at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>     at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>     at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>     at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>     at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>     at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>     at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>     at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>     at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /brokers/ids
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>     at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:554)
>     at kafka.zk.KafkaZkClient.getChildren(KafkaZkClient.scala:719)
>     at kafka.zk.KafkaZkClient.getSortedBrokerList(KafkaZkClient.scala:455)
>     at kafka.zk.KafkaZkClient.getAllBrokersInCluster(KafkaZkClient.scala:404)
>     at org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$setup$3(KafkaTestUtils.scala:293)
>     at org.scalatest.concurrent.Eventually.makeAValiantAttempt$1(Eventually.scala:395)
>     at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:409)
>     ... 20 more
> {noformat}
[jira] [Created] (SPARK-31213) Arrange the configuration of Spark SQL
jiaan.geng created SPARK-31213:
----------------------------------

             Summary: Arrange the configuration of Spark SQL
                 Key: SPARK-31213
                 URL: https://issues.apache.org/jira/browse/SPARK-31213
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: jiaan.geng

The configuration of Spark SQL is a bit messy. Sorting it out can improve
readability.
[jira] [Resolved] (SPARK-31184) Support getTablesByType API of Hive Client
[ https://issues.apache.org/jira/browse/SPARK-31184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31184.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 27952
[https://github.com/apache/spark/pull/27952]

> Support getTablesByType API of Hive Client
> ------------------------------------------
>
>                 Key: SPARK-31184
>                 URL: https://issues.apache.org/jira/browse/SPARK-31184
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Xin Wu
>            Assignee: Xin Wu
>            Priority: Major
>             Fix For: 3.1.0
>
> Hive 2.3+ supports the getTablesByType API, which is a precondition for
> implementing SHOW VIEWS in HiveExternalCatalog. Currently, without this API,
> we cannot get Hive tables of type HiveTableType.VIRTUAL_VIEW directly.
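The point of getTablesByType is that the catalog can be asked for entries of one table type directly, instead of listing everything and filtering client-side. A hypothetical sketch of that idea in Python (the table data and function are invented; this is not Hive's or Spark's API):

```python
# Toy metastore listing: (database, table name, table type).
TABLES = [
    ("db1", "t1", "MANAGED_TABLE"),
    ("db1", "v1", "VIRTUAL_VIEW"),
    ("db1", "t2", "EXTERNAL_TABLE"),
    ("db1", "v2", "VIRTUAL_VIEW"),
]

def get_tables_by_type(db, table_type):
    """Return table names in `db` matching `table_type`, in listing order."""
    return [name for d, name, t in TABLES if d == db and t == table_type]

# SHOW VIEWS then reduces to asking only for the VIRTUAL_VIEW entries:
print(get_tables_by_type("db1", "VIRTUAL_VIEW"))  # -> ['v1', 'v2']
```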
[jira] [Assigned] (SPARK-31184) Support getTablesByType API of Hive Client
[ https://issues.apache.org/jira/browse/SPARK-31184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-31184:
-------------------------------------
    Assignee: Xin Wu

> Support getTablesByType API of Hive Client
> ------------------------------------------
>
>                 Key: SPARK-31184
>                 URL: https://issues.apache.org/jira/browse/SPARK-31184
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Xin Wu
>            Assignee: Xin Wu
>            Priority: Major
>
> Hive 2.3+ supports the getTablesByType API, which is a precondition for
> implementing SHOW VIEWS in HiveExternalCatalog. Currently, without this API,
> we cannot get Hive tables of type HiveTableType.VIRTUAL_VIEW directly.
[jira] [Updated] (SPARK-31209) Not compatible with new version of scalatest (3.1.0 and above)
[ https://issues.apache.org/jira/browse/SPARK-31209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-31209:
--------------------------------
    Affects Version/s:     (was: 2.4.5)
                       3.1.0

> Not compatible with new version of scalatest (3.1.0 and above)
> --------------------------------------------------------------
>
>                 Key: SPARK-31209
>                 URL: https://issues.apache.org/jira/browse/SPARK-31209
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: Tests
>    Affects Versions: 3.1.0
>            Reporter: Timothy Zhang
>            Priority: Major
>
> Since ScalaTest's style traits and classes were moved and renamed
> ([http://www.scalatest.org/release_notes/3.1.0]), there are compilation
> errors such as FunSpec not being found when the new version of scalatest is
> added as a library dependency.
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Affects Version/s: 2.0.2
                       2.1.3

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table temp1
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table temp1
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Description: 
This can be reproduced with the commands below:
{code}
beeline> create or replace temporary view temp1 as select 1
beeline> cache table temp1
beeline> create or replace temporary view temp1 as select 1, 2
beeline> cache table temp1
{code}
The cached RDD for the plan "select 1" stays in memory until the session
closes. This cached data can never be used, since the view temp1 has been
replaced by another plan. It's a memory leak.
assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)

  was:
This can be reproduced with the commands below:
{code}
beeline> create or replace temporary view temp1 as select 1
beeline> cache table tempView
beeline> create or replace temporary view temp1 as select 1, 2
beeline> cache table tempView
{code}
The cached RDD for the plan "select 1" stays in memory until the session
closes. This cached data can never be used, since the view temp1 has been
replaced by another plan. It's a memory leak.
assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.3, 2.3.4, 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table temp1
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table temp1
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
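The shape of the leak is that the cache is keyed by the view's logical plan, so after CREATE OR REPLACE the old plan's entry is no longer reachable through the view name but is also never evicted. A toy Python model of that bookkeeping (the dicts and function names are invented; this is not Spark's CacheManager):

```python
views = {}   # view name -> current logical plan (modeled as a string)
cache = {}   # logical plan -> materialized data

def create_or_replace_view(name, plan):
    # The old plan's cache entry (if any) is NOT dropped here -- the bug.
    views[name] = plan

def cache_table(name):
    plan = views[name]
    cache.setdefault(plan, f"materialized<{plan}>")

create_or_replace_view("temp1", "select 1")
cache_table("temp1")
create_or_replace_view("temp1", "select 1, 2")
cache_table("temp1")

# Both plans are now cached, but only the second is reachable via temp1:
print(sorted(cache))  # -> ['select 1', 'select 1, 2']
```

This mirrors the two asserts in the report: lookups for both "select 1" and "select 1, 2" still succeed, even though the first can never be hit again through the view.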
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Affects Version/s: 2.2.3

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.3, 2.3.4, 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table tempView
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table tempView
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Affects Version/s: 2.3.4

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4, 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table tempView
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table tempView
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Affects Version/s: 2.4.5

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table tempView
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table tempView
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Commented] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064093#comment-17064093 ]

Dongjoon Hyun commented on SPARK-30494:
---------------------------------------

Hi, [~cltlfcjin]. How about older versions, 2.3.4 and 2.4.5?

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table tempView
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table tempView
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Commented] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type
[ https://issues.apache.org/jira/browse/SPARK-31212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064017#comment-17064017 ] Maxim Gekk commented on SPARK-31212: The isLeapYear() function in 2.4 assumes Proleptic Gregorian calendar: https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L600-L602 but actually Spark 2.4 is based on the hybrid calendar Julian+Gregorian as we can see at https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L513-L517 It means the following functions in DateTimeUtils return incorrect results for dates before Gregorian cutover days: # getQuarter # splitDate # getMonth # getDayOfMonth # firstDayOfMonth # dateAddMonths # stringToTimestamp # stringToDate # monthsBetween # getLastDayOfMonth /cc [~cloud_fan] [~hyukjin.kwon] > Failure of casting the '1000-02-29' string to the date type > --- > > Key: SPARK-31212 > URL: https://issues.apache.org/jira/browse/SPARK-31212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 >Reporter: Maxim Gekk >Priority: Major > > The '1000-02-29' is valid date in the Julian calendar used in Spark 2.4.5 for > dates before 1582-10-15 but casting the string to the date type fails: > {code:scala} > scala> val df = > Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date")) > df: org.apache.spark.sql.DataFrame = [date: date] > scala> df.show > ++ > |date| > ++ > |null| > ++ > {code} > Creating a dataset from java.sql.Date w/ the same input string works > correctly: > {code:scala} > scala> val df2 = > Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date")) > df2: org.apache.spark.sql.DataFrame = [date: date] > scala> df2.show > +--+ > | date| > +--+ > |1000-02-29| > +--+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
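The calendar mismatch Maxim describes comes down to two different leap-year rules. A minimal sketch (not Spark's code) of why '1000-02-29' parses under the hybrid Julian calendar but fails the proleptic-Gregorian check:

```scala
// Julian rule (what the hybrid calendar applies before the 1582-10-15 cutover):
// every year divisible by 4 is a leap year.
def isJulianLeapYear(year: Int): Boolean = year % 4 == 0

// Proleptic Gregorian rule (what isLeapYear in branch-2.4's DateTimeUtils assumes):
// century years must also be divisible by 400.
def isGregorianLeapYear(year: Int): Boolean =
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0

// 1000 is a leap year only under the Julian rule, so '1000-02-29' is a valid
// date in the hybrid calendar but is rejected by the Gregorian leap-year check.
println(isJulianLeapYear(1000))    // true
println(isGregorianLeapYear(1000)) // false
```

Any function that mixes the two rules (parsing with one, field extraction with the other) will misbehave for dates before the Gregorian cutover, which is exactly the list of DateTimeUtils functions above.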
[jira] [Created] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type
Maxim Gekk created SPARK-31212: -- Summary: Failure of casting the '1000-02-29' string to the date type Key: SPARK-31212 URL: https://issues.apache.org/jira/browse/SPARK-31212 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5 Reporter: Maxim Gekk The '1000-02-29' is valid date in the Julian calendar used in Spark 2.4.5 for dates before 1582-10-15 but casting the string to the date type fails: {code:scala} scala> val df = Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date")) df: org.apache.spark.sql.DataFrame = [date: date] scala> df.show ++ |date| ++ |null| ++ {code} Creating a dataset from java.sql.Date w/ the same input string works correctly: {code:scala} scala> val df2 = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date")) df2: org.apache.spark.sql.DataFrame = [date: date] scala> df2.show +--+ | date| +--+ |1000-02-29| +--+ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31199) Separate connection timeout and idle timeout for netty
[ https://issues.apache.org/jira/browse/SPARK-31199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runnings updated SPARK-31199: - Description: io.connectionTimeout is only used as the timeout for connection setup, while io.idleTimeout controls how long an apparently idle connection is kept before being killed ([https://github.com/apache/spark/pull/5584]). These two timeouts can be quite different: connectionTimeout can be short, to fail fast on connection problems when hitting heavily loaded or unstable nodes, while the idle timeout relates to business/function performance, which is more complicated. was: io.connectionTimeout only used for connection timeout for connection setup while io.idleTimeout is used to control how long to kill the connection if it seems to be idle([https://github.com/apache/spark/pull/5584]) These 2 timeouts could be quite different and shorten connectiontimeout could help fast fail the connection related problem in some cases like when doing shuffle, we could fast fail the task and retry. > Separate connection timeout and idle timeout for netty > -- > > Key: SPARK-31199 > URL: https://issues.apache.org/jira/browse/SPARK-31199 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: runnings >Priority: Major > > io.connectionTimeout is only used as the timeout for connection setup, while > io.idleTimeout controls how long an apparently idle connection is kept before > being killed ([https://github.com/apache/spark/pull/5584]) > > These two timeouts can be quite different: connectionTimeout can be short, to > fail fast on connection problems when hitting heavily loaded or unstable > nodes, while the idle timeout relates to business/function performance, which > is more complicated. 
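The split being requested is easiest to see as configuration. A sketch of a spark-defaults.conf fragment under the proposed separation; the idle-timeout key is the one this ticket proposes (hypothetical until it lands), and the values are purely illustrative:

```properties
# Fail fast when a connection cannot even be established, to route
# around overloaded or unstable nodes quickly:
spark.shuffle.io.connectionTimeout   15s

# Proposed separate knob (hypothetical): how long an established
# connection may sit idle before being closed. Tied to workload
# behaviour, so typically much longer:
spark.shuffle.io.idleTimeout         120s
```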
[jira] [Commented] (SPARK-27097) Avoid embedding platform-dependent offsets literally in whole-stage generated code
[ https://issues.apache.org/jira/browse/SPARK-27097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063909#comment-17063909 ] angerszhu commented on SPARK-27097: --- [~irashid] To be honest, I hit this problem recently. [~dbtsai] I have a question. We run a self-developed thrift-server program that uses Spark as the compute engine, started with the JVM options below: {code} -Xmx64g -Djava.library.path=/home/hadoop/hadoop/lib/native -Djavax.security.auth.useSubjectCredsOnly=false -Dcom.sun.management.jmxremote.port=9021 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:MaxPermSize=1024m -XX:PermSize=256m -XX:MaxDirectMemorySize=8192m -XX:-TraceClassUnloading -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:+PrintTenuringDistribution -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -Xnoclassgc -XX:+PrintGCDetails -XX:+PrintGCDateStamps {code} With these options Platform.BYTE_ARRAY_OFFSET is 24, while in a normal Spark thrift server it is 16; the mismatch caused strange data corruption. After a few days of checking I traced the problem to Spark *codegen*, and this PR fixes it for us, but I cannot find evidence for why Platform.BYTE_ARRAY_OFFSET is 24 under the options above. Testing locally: with -XX:+UseCompressedOops (pointer compression on) the offset is 16; with -XX:-UseCompressedOops (pointer compression off) it is 24. That much is easy to understand, but I don't know why the options above yield 24, since I am no expert on the JVM internals involved. Could you give me some advice or pointers on how to find the root cause? > Avoid embedding platform-dependent offsets literally in whole-stage generated > code > -- > > Key: SPARK-27097 > URL: https://issues.apache.org/jira/browse/SPARK-27097 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.3, 2.2.3, 2.3.4, 2.4.0 >Reporter: Xiao Li >Assignee: Kris Mok >Priority: Critical > Labels: correctness > Fix For: 2.4.1 > > > Avoid embedding platform-dependent offsets literally in whole-stage generated > code. > Spark SQL performs whole-stage code generation to speed up query execution. > There are two steps to it: > Java source code is generated from the physical query plan on the driver. A > single version of the source code is generated from a query plan, and sent to > all executors. > It's compiled to bytecode on the driver to catch compilation errors before > sending to executors, but currently only the generated source code gets sent > to the executors. The bytecode compilation is for fail-fast only. > Executors receive the generated source code and compile to bytecode, then the > query runs like a hand-written Java program. > In this model, there's an implicit assumption about the driver and executors > being run on similar platforms. Some code paths accidentally embedded > platform-dependent object layout information into the generated code, such as: > {code:java} > Platform.putLong(buffer, /* offset */ 24, /* value */ 1); > {code} > This code expects a field to be at offset +24 of the buffer object, and sets > a value to that field. > But whole-stage code generation generally uses platform-dependent information > from the driver. 
If the object layout is significantly different on the > driver and executors, the generated code can be reading/writing to wrong > offsets on the executors, causing all kinds of data corruption. > One code pattern that leads to such problem is the use of Platform.XXX > constants in generated code, e.g. Platform.BYTE_ARRAY_OFFSET. > Bad: > {code:java} > val baseOffset = Platform.BYTE_ARRAY_OFFSET > // codegen template: > s"Platform.putLong($buffer, $baseOffset, $value);" > This will embed the value of Platform.BYTE_ARRAY_OFFSET on the driver into > the generated code. > {code} > Good: > {code:java} > val baseOffset = "Platform.BYTE_ARRAY_OFFSET" > // codegen template: > s"Platform.putLong($buffer, $baseOffset, $value);" > This will generate the offset symbolically -- Platform.putLong(buffer
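A sketch of the offset difference and the codegen fix. The header sizes below are typical 64-bit HotSpot layouts, and the likely answer to the 16-vs-24 question above is that HotSpot silently disables UseCompressedOops when the heap exceeds roughly 32 GB (as with -Xmx64g), even when -XX:+UseCompressedOops is passed; the `buffer` name is a hypothetical codegen variable:

```scala
// Why Platform.BYTE_ARRAY_OFFSET differs between JVMs (illustrative values,
// typical 64-bit HotSpot):
//   compressed oops on  -> 12-byte array header, padded       -> offset 16
//   compressed oops off -> 16-byte header + 4-byte length, padded -> offset 24
// HotSpot turns compressed oops off automatically for heaps over ~32 GB.
def byteArrayBaseOffset(compressedOops: Boolean): Int =
  if (compressedOops) 16 else 24

// The fix in this ticket: emit the symbolic constant, never the driver's
// numeric value, so each executor resolves its own offset.
val bad  = s"Platform.putLong(buffer, ${byteArrayBaseOffset(true)}, 1L);"
val good = "Platform.putLong(buffer, Platform.BYTE_ARRAY_OFFSET, 1L);"
println(bad)  // embeds the driver's 16 -- wrong on a 24-offset executor
println(good) // resolved independently on every executor
```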
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063851#comment-17063851 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 11:33 AM: --- In the end, I believe your problem was resolved by https://issues.apache.org/jira/browse/SPARK-30254 Before that fix, the issue existed. My build of the master branch contains this bug fix. was (Author: beliefer): At finally, I guess your problem resolved by https://issues.apache.org/jira/browse/SPARK-30254 Before this ticket, the issue exists. > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I tried to use LIKE with ESCAPE with Spark 3.0.0-preview2, but I found it > doesn't work in the cases below. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to be returned, but it isn't. I debugged into the code to > check the logical plan. > In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect.
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063851#comment-17063851 ] jiaan.geng commented on SPARK-31210: In the end, I believe your problem was resolved by https://issues.apache.org/jira/browse/SPARK-30254 Before that fix, the issue existed. > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I tried to use LIKE with ESCAPE with Spark 3.0.0-preview2, but I found it > doesn't work in the cases below. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to be returned, but it isn't. I debugged into the code to > check the logical plan. > In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect.
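The intended ESCAPE semantics can be sketched with a simple pattern-to-regex translation (this is not Spark's implementation; the bug above is the optimizer rewriting the escaped pattern into a StartsWith prefix match): `^%` denotes a literal `%`, so `100^%` must match exactly "100%", not every string with the prefix "100".

```scala
// Translate a SQL LIKE pattern with an ESCAPE character into a Java regex.
// An escaped character is always literal; unescaped '%' -> ".*", '_' -> ".".
def likeToRegex(pattern: String, escape: Char): String = {
  val sb = new StringBuilder("^")
  var i = 0
  while (i < pattern.length) {
    pattern(i) match {
      case `escape` if i + 1 < pattern.length =>
        // Escaped character: match it literally, whatever it is.
        sb.append(java.util.regex.Pattern.quote(pattern(i + 1).toString)); i += 2
      case '%' => sb.append(".*"); i += 1
      case '_' => sb.append("."); i += 1
      case c   => sb.append(java.util.regex.Pattern.quote(c.toString)); i += 1
    }
  }
  sb.append("$").toString
}

val re = likeToRegex("100^%", '^')   // '^%' is a literal percent sign
println("100%".matches(re))          // true
println("100 times".matches(re))     // false -- not a prefix match
```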
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063830#comment-17063830 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 10:40 AM: --- {code:java} scala> spark.sql("select * from test_table_like2 where subject like '100^%' escape '^'"); res3: org.apache.spark.sql.DataFrame = [subject: string] scala> res3.show(); +---+ |subject| +---+ | 100%| +---+ scala> spark.sql("select * from test_table_like2 where subject like '100%'"); res5: org.apache.spark.sql.DataFrame = [subject: string] scala> res5.show(); +--+ | subject| +--+ | 100 times| |1000 times| | 100%| +--+ {code} was (Author: beliefer): {code:java} scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'"); res3: org.apache.spark.sql.DataFrame = [subject: string] scala> res3.show(); +---+ |subject| +---+ | 100%| +---+ scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'"); res5: org.apache.spark.sql.DataFrame = [subject: string] scala> res5.show(); +--+ | subject| +--+ | 100 times| |1000 times| | 100%| +--+ {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. > In the logical plan, the LIKE is transformed as "StartsWith(subject#130, > 100^)". 
It looks incorrect.
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063849#comment-17063849 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 10:40 AM: --- The content of /home/test/test_table_like.json show below: {code:java} {"subject":"100 times"} {"subject":"1000 times"} {"subject":"100%"} {code} I follow your example and run it {code:java} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60) Type in expressions to have them evaluated. Type :help for more information. scala> val path = "/home/test/test_table_like.json" path: String = /home/test/test_table_like.json scala> val tmpdf = spark.read.json(path) tmpdf: org.apache.spark.sql.DataFrame = [subject: string] scala> tmpdf.createOrReplaceTempView("test_table_like") scala> spark.sql("select * from test_table_like where subject like '100%' order by 1") res1: org.apache.spark.sql.DataFrame = [subject: string] scala> res1.show() +--+ | subject| +--+ | 100 times| | 100%| |1000 times| +--+ {code} was (Author: beliefer): The content of /home/test/test_table_like.json show below: {code:java} {"subject":"100 times"} {"subject":"1000 times"} {"subject":"100%"} {code} I follow your example and run it {code:java} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60) Type in expressions to have them evaluated. Type :help for more information. 
scala> val path = "/home/test/test_table_like.json" path: String = /home/test/test_table_like.json scala> val tmpdf = spark.read.json(path) 20/03/21 18:33:56 ERROR LzoCodec: Failed to load/initialize native-lzo library tmpdf: org.apache.spark.sql.DataFrame = [subject: string] scala> tmpdf.createOrReplaceTempView("test_table_like") scala> spark.sql("select * from test_table_like where subject like '100%' order by 1") res1: org.apache.spark.sql.DataFrame = [subject: string] scala> res1.show() +--+ | subject| +--+ | 100 times| | 100%| |1000 times| +--+ {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. > In the logical plan, the LIKE is transformed as "StartsWith(subject#130, > 100^)". It looks it is incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063849#comment-17063849 ] jiaan.geng commented on SPARK-31210: The content of /home/test/test_table_like.json show below: {code:java} {"subject":"100 times"} {"subject":"1000 times"} {"subject":"100%"} {code} I follow your example and run it {code:java} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60) Type in expressions to have them evaluated. Type :help for more information. scala> val path = "/home/test/test_table_like.json" path: String = /home/test/test_table_like.json scala> val tmpdf = spark.read.json(path) 20/03/21 18:33:56 ERROR LzoCodec: Failed to load/initialize native-lzo library tmpdf: org.apache.spark.sql.DataFrame = [subject: string] scala> tmpdf.createOrReplaceTempView("test_table_like") scala> spark.sql("select * from test_table_like where subject like '100%' order by 1") res1: org.apache.spark.sql.DataFrame = [subject: string] scala> res1.show() +--+ | subject| +--+ | 100 times| | 100%| |1000 times| +--+ {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. 
> In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect.
[jira] [Created] (SPARK-31211) Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
Maxim Gekk created SPARK-31211: -- Summary: Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5 Key: SPARK-31211 URL: https://issues.apache.org/jira/browse/SPARK-31211 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Save valid date in Julian calendar by Spark 2.4.5 in a leap year, for instance 1000-02-29: {code} $ export TZ="America/Los_Angeles" {code} {code:scala} scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") scala> df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap") scala> val df = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date")) df: org.apache.spark.sql.DataFrame = [date: date] scala> df.show +--+ | date| +--+ |1000-02-29| +--+ scala> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap") {code} Load the parquet files back by Spark 3.1.0-SNAPSHOT: {code:scala} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231) Type in expressions to have them evaluated. Type :help for more information. 
scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show +--+ | date| +--+ |1000-03-06| +--+ scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true) scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show 20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap year at java.time.LocalDate.create(LocalDate.java:429) at java.time.LocalDate.of(LocalDate.java:269) at org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
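The six-day shift in the repro can be reproduced outside Spark: java.sql.Date and GregorianCalendar use the hybrid Julian+Gregorian calendar (Julian before 1582-10-15), while java.time uses the proleptic Gregorian calendar, so the same physical day carries two different labels. A minimal sketch:

```scala
import java.time.LocalDate
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Build Julian 1000-02-29 at midnight UTC via the hybrid calendar
// (valid there: 1000 is a Julian leap year).
val hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
hybrid.clear()
hybrid.set(1000, Calendar.FEBRUARY, 29)

// Relabel the same instant in the proleptic Gregorian calendar.
val epochDay  = Math.floorDiv(hybrid.getTimeInMillis, 86400000L)
val proleptic = LocalDate.ofEpochDay(epochDay)
println(proleptic) // 1000-03-06 -- the shift seen when Spark 3 reads the file
```

This is why reading without rebasing shows 1000-03-06, and why the rebase path fails: rebaseJulianToGregorianDays tries to build LocalDate.of(1000, 2, 29), which does not exist in the proleptic Gregorian calendar.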
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063743#comment-17063743 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 10:10 AM: --- {code:java} spark-sql> create table test_table_like2 (subject string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; Time taken: 1.455 seconds spark-sql> insert into test_table_like2 values ('100 times'), ('1000 times'), ('100%'); Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1/-ext-1/_temporary Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1 Time taken: 6.55 seconds spark-sql> select * from test_table_like2 where subject like '100^%' escape '^'; 100% {code} was (Author: beliefer): {code:java} spark-sql> create table test_table_like2 (subject string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; Time taken: 1.455 seconds spark-sql> insert into test_table_like2 values ('100 times'), ('1000 times'), ('100%'); Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1/-ext-1/_temporary Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1 Time taken: 6.55 seconds spark-sql> select * from test_table_like2 where subject like '100^%' escape '^'; 20/03/21 11:30:45 ERROR LzoCodec: Failed to load/initialize native-lzo library 100% {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. 
> The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. > In the logical plan, the LIKE is transformed as "StartsWith(subject#130, > 100^)". It looks it is incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063830#comment-17063830 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 10:10 AM: --- {code:java} scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'"); res3: org.apache.spark.sql.DataFrame = [subject: string] scala> res3.show(); +---+ |subject| +---+ | 100%| +---+ scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'"); res5: org.apache.spark.sql.DataFrame = [subject: string] scala> res5.show(); +--+ | subject| +--+ | 100 times| |1000 times| | 100%| +--+ {code} was (Author: beliefer): {code:java} scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'"); res3: org.apache.spark.sql.DataFrame = [subject: string] scala> res3.show(); 20/03/21 17:51:07 ERROR LzoCodec: Failed to load/initialize native-lzo library +---+ |subject| +---+ | 100%| +---+ scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'"); res5: org.apache.spark.sql.DataFrame = [subject: string] scala> res5.show(); +--+ | subject| +--+ | 100 times| |1000 times| | 100%| +--+ {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. 
> In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect.
[jira] [Updated] (SPARK-31199) Separate connection timeout and idle timeout for netty
[ https://issues.apache.org/jira/browse/SPARK-31199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runnings updated SPARK-31199: - Description: io.connectionTimeout only used for connection timeout for connection setup while io.idleTimeout is used to control how long to kill the connection if it seems to be idle([https://github.com/apache/spark/pull/5584]) These 2 timeouts could be quite different and shorten connectiontimeout could help fast fail the connection related problem in some cases like when doing shuffle, we could fast fail the task and retry. was: spark.shuffle.io.connectionTimeout only used for connection timeout for connection setup while spark.shuffle.io.idleTimeout is used to control how long to kill the connection if it seems to be idle([https://github.com/apache/spark/pull/5584]) These 2 timeouts could be quite different and shorten connectiontimeout could help fast fail the shuffle task in some cases > Separate connection timeout and idle timeout for netty > -- > > Key: SPARK-31199 > URL: https://issues.apache.org/jira/browse/SPARK-31199 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: runnings >Priority: Major > > io.connectionTimeout only used for connection timeout for connection setup > while io.idleTimeout is used to control how long to kill the connection if it > seems to be idle([https://github.com/apache/spark/pull/5584]) > > These 2 timeouts could be quite different and shorten connectiontimeout could > help fast fail the connection related problem in some cases like when doing > shuffle, we could fast fail the task and retry. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31199) Separate connection timeout and idle timeout for netty
[ https://issues.apache.org/jira/browse/SPARK-31199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runnings updated SPARK-31199: - Summary: Separate connection timeout and idle timeout for netty (was: Separate connection timeout and idle timeout for shuffle) > Separate connection timeout and idle timeout for netty > -- > > Key: SPARK-31199 > URL: https://issues.apache.org/jira/browse/SPARK-31199 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: runnings >Priority: Major > > spark.shuffle.io.connectionTimeout only used for connection timeout for > connection setup while spark.shuffle.io.idleTimeout is used to control how > long to kill the connection if it seems to be > idle([https://github.com/apache/spark/pull/5584]) > > These 2 timeouts could be quite different and shorten connectiontimeout could > help fast fail the shuffle task in some cases -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063744#comment-17063744 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 9:56 AM: -- {code:java} spark-sql> select * from test_table_like2 where subject like '100^%' escape '^' order by 1; 100% Time taken: 12.261 seconds, Fetched 1 row(s) {code} was (Author: beliefer): spark-sql> select * from test_table_like2 where subject like '100^%' escape '^' order by 1; 100% Time taken: 12.261 seconds, Fetched 1 row(s) > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. > In the logical plan, the LIKE is transformed as "StartsWith(subject#130, > 100^)". It looks it is incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063743#comment-17063743 ]

jiaan.geng edited comment on SPARK-31210 at 3/21/20, 9:56 AM:
--------------------------------------------------------------

{code:java}
spark-sql> create table test_table_like2 (subject string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Time taken: 1.455 seconds
spark-sql> insert into test_table_like2 values ('100 times'), ('1000 times'), ('100%');
Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1/-ext-1/_temporary
Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1
Time taken: 6.55 seconds
spark-sql> select * from test_table_like2 where subject like '100^%' escape '^';
20/03/21 11:30:45 ERROR LzoCodec: Failed to load/initialize native-lzo library
100%
{code}
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063742#comment-17063742 ]

jiaan.geng edited comment on SPARK-31210 at 3/21/20, 9:56 AM:
--------------------------------------------------------------

{code:java}
spark-sql> create or replace temporary view test_table_like as SELECT * FROM VALUES ('100 times'), ('1000 times'), ('100%') as test_table_like (subject);
Time taken: 0.143 seconds
spark-sql> select * from test_table_like where subject like '100^%' escape '^';
100%
Time taken: 0.132 seconds, Fetched 1 row(s)
{code}
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063830#comment-17063830 ]

jiaan.geng edited comment on SPARK-31210 at 3/21/20, 9:56 AM:
--------------------------------------------------------------

{code:java}
scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'");
res3: org.apache.spark.sql.DataFrame = [subject: string]

scala> res3.show();
20/03/21 17:51:07 ERROR LzoCodec: Failed to load/initialize native-lzo library
+-------+
|subject|
+-------+
|   100%|
+-------+

scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'");
res5: org.apache.spark.sql.DataFrame = [subject: string]

scala> res5.show();
+----------+
|   subject|
+----------+
| 100 times|
|1000 times|
|      100%|
+----------+
{code}
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063831#comment-17063831 ]

jiaan.geng commented on SPARK-31210:
------------------------------------

I cannot reproduce this. Could you post more detailed information?
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063830#comment-17063830 ]

jiaan.geng commented on SPARK-31210:
------------------------------------

scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'");
res3: org.apache.spark.sql.DataFrame = [subject: string]

scala> res3.show();
20/03/21 17:51:07 ERROR LzoCodec: Failed to load/initialize native-lzo library
+-------+
|subject|
+-------+
|   100%|
+-------+

scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'");
res5: org.apache.spark.sql.DataFrame = [subject: string]

scala> res5.show();
+----------+
|   subject|
+----------+
| 100 times|
|1000 times|
|      100%|
+----------+