[jira] [Updated] (SPARK-31185) implement VarianceThresholdSelector
[ https://issues.apache.org/jira/browse/SPARK-31185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxin Gao updated SPARK-31185:
-------------------------------
    Issue Type: New Feature  (was: Bug)

> implement VarianceThresholdSelector
> -----------------------------------
>
>                 Key: SPARK-31185
>                 URL: https://issues.apache.org/jira/browse/SPARK-31185
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Assignee: Huaxin Gao
>            Priority: Major
>             Fix For: 3.1.0
>
> Implement a feature selector that removes all low-variance features.
> Features with a variance lower than the threshold will be removed. The
> default is to keep all features with non-zero variance, i.e. remove the
> features that have the same value in all samples.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
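The selection rule described in the ticket (keep features whose variance exceeds a threshold; the default of 0.0 drops exactly the constant columns) can be sketched outside Spark in a few lines of plain Python. This is an illustration of the rule, not Spark's actual implementation; the function name and sample data are made up for the example.

```python
from statistics import pvariance

def variance_threshold_select(rows, threshold=0.0):
    """Return the column indices whose population variance exceeds `threshold`.

    With the default threshold of 0.0 this removes exactly the features
    that have the same value in all samples.
    """
    cols = list(zip(*rows))  # transpose: one tuple per feature column
    return [i for i, col in enumerate(cols) if pvariance(col) > threshold]

rows = [
    [6.0, 9.0, 0.0, 7.0],
    [0.0, 9.0, 6.0, 0.0],
    [0.0, 9.0, 3.0, 0.0],
    [0.0, 9.0, 8.0, 5.0],
]
# Column 1 is constant (variance 0), so only it is dropped by default.
print(variance_threshold_select(rows))  # -> [0, 2, 3]
```

Raising the threshold drops additional low-variance columns, e.g. `variance_threshold_select(rows, threshold=7.0)` also removes column 0.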
[jira] [Assigned] (SPARK-31185) implement VarianceThresholdSelector
[ https://issues.apache.org/jira/browse/SPARK-31185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng reassigned SPARK-31185:
------------------------------------
    Assignee: Huaxin Gao

> implement VarianceThresholdSelector
> -----------------------------------
>
>                 Key: SPARK-31185
>                 URL: https://issues.apache.org/jira/browse/SPARK-31185
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Assignee: Huaxin Gao
>            Priority: Major
>
> Implement a feature selector that removes all low-variance features.
> Features with a variance lower than the threshold will be removed. The
> default is to keep all features with non-zero variance, i.e. remove the
> features that have the same value in all samples.
[jira] [Resolved] (SPARK-31185) implement VarianceThresholdSelector
[ https://issues.apache.org/jira/browse/SPARK-31185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng resolved SPARK-31185.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 27954
[https://github.com/apache/spark/pull/27954]

> implement VarianceThresholdSelector
> -----------------------------------
>
>                 Key: SPARK-31185
>                 URL: https://issues.apache.org/jira/browse/SPARK-31185
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Assignee: Huaxin Gao
>            Priority: Major
>             Fix For: 3.1.0
>
> Implement a feature selector that removes all low-variance features.
> Features with a variance lower than the threshold will be removed. The
> default is to keep all features with non-zero variance, i.e. remove the
> features that have the same value in all samples.
[jira] [Created] (SPARK-31214) Upgrade Janino to 3.1.2
Dongjoon Hyun created SPARK-31214:
-------------------------------------

             Summary: Upgrade Janino to 3.1.2
                 Key: SPARK-31214
                 URL: https://issues.apache.org/jira/browse/SPARK-31214
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 3.1.0
            Reporter: Dongjoon Hyun
[jira] [Assigned] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-31101:
-------------------------------------
    Assignee: Jungtaek Lim

> Upgrade Janino to 3.0.16
> ------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>
> We received reports of user queries failing because Janino throws an error
> while compiling the generated code. The issue is tracked at
> janino-compiler/janino#113; it contains the generated code, the symptom
> (error), and an analysis of the bug, so please refer to the link for more
> details.
> Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
> Janino to compile such queries correctly.
[jira] [Resolved] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31101.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 27932
[https://github.com/apache/spark/pull/27932]

> Upgrade Janino to 3.0.16
> ------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.1.0
>
> We received reports of user queries failing because Janino throws an error
> while compiling the generated code. The issue is tracked at
> janino-compiler/janino#113; it contains the generated code, the symptom
> (error), and an analysis of the bug, so please refer to the link for more
> details.
> Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
> Janino to compile such queries correctly.
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Description: 
We received reports of user queries failing because Janino throws an error
while compiling the generated code. The issue is tracked at
janino-compiler/janino#113; it contains the generated code, the symptom
(error), and an analysis of the bug, so please refer to the link for more
details.
Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
Janino to compile such queries correctly.

  was:
This issue is to track the effort of upgrading to Janino 3.1.1, which
contains the fix for [https://github.com/janino-compiler/janino/issues/113],
which we encountered in Spark, as well as
[https://github.com/janino-compiler/janino/issues/90], for which Josh filed
an issue; 3.1.1 seems to fix it.

> Upgrade Janino to 3.0.16
> ------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> We received reports of user queries failing because Janino throws an error
> while compiling the generated code. The issue is tracked at
> janino-compiler/janino#113; it contains the generated code, the symptom
> (error), and an analysis of the bug, so please refer to the link for more
> details.
> Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
> Janino to compile such queries correctly.
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Summary: Upgrade Janino to 3.0.16  (was: Upgrade Janino to 3.1.1)

> Upgrade Janino to 3.0.16
> ------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> This issue is to track the effort of upgrading to Janino 3.1.1, which
> contains the fix for [https://github.com/janino-compiler/janino/issues/113],
> which we encountered in Spark, as well as
> [https://github.com/janino-compiler/janino/issues/90], for which Josh filed
> an issue; 3.1.1 seems to fix it.
[jira] [Assigned] (SPARK-30541) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
[ https://issues.apache.org/jira/browse/SPARK-30541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-30541:
-------------------------------------
    Assignee: Gabor Somogyi

> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
> -------------------------------------------------------------------
>
>                 Key: SPARK-30541
>                 URL: https://issues.apache.org/jira/browse/SPARK-30541
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Gabor Somogyi
>            Priority: Blocker
>         Attachments: consoleText_NOK.txt, consoleText_OK.txt,
> unit-tests_NOK.log, unit-tests_OK.log
>
> The test suite has been failing intermittently as of now:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116862/testReport/]
>
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it
> is a sbt.testing.SuiteSelector)
>
> {noformat}
> Error Details
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 3939 times over 1.000122353532 minutes. Last failure message: KeeperErrorCode = AuthFailed for /brokers/ids.
> Stack Trace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 3939 times over 1.000122353532 minutes. Last failure message: KeeperErrorCode = AuthFailed for /brokers/ids.
>     at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>     at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:292)
>     at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>     at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>     at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>     at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>     at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>     at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>     at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>     at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>     at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /brokers/ids
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>     at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:554)
>     at kafka.zk.KafkaZkClient.getChildren(KafkaZkClient.scala:719)
>     at kafka.zk.KafkaZkClient.getSortedBrokerList(KafkaZkClient.scala:455)
>     at kafka.zk.KafkaZkClient.getAllBrokersInCluster(KafkaZkClient.scala:404)
>     at org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$setup$3(KafkaTestUtils.scala:293)
>     at org.scalatest.concurrent.Eventually.makeAValiantAttempt$1(Eventually.scala:395)
>     at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:409)
>     ... 20 more
> {noformat}
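The stack trace above shows the suite's `beforeAll` polling ZooKeeper through ScalaTest's `eventually`, which retries a block until it stops throwing or a timeout elapses. The retry-until-timeout pattern itself can be sketched in Python (an illustration of the pattern, not ScalaTest's implementation; the probe and its failure count are invented for the example):

```python
import time

def eventually(check, timeout=10.0, interval=0.05):
    """Retry `check` until it returns without raising or `timeout` elapses.

    Returns (result, number_of_attempts); re-raises the last failure
    once the deadline has passed, like ScalaTest's `eventually`.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            return check(), attempts
        except Exception:
            if time.monotonic() >= deadline:
                raise  # give up: surface the last failure to the caller
            time.sleep(interval)

# Example: a probe that fails twice before succeeding,
# like a broker that has not yet registered in ZooKeeper.
state = {"calls": 0}
def probe():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("brokers not registered yet")
    return "ok"

result, attempts = eventually(probe, timeout=5.0)
print(result, attempts)  # -> ok 3
```

When the condition never becomes true, as in the AuthFailed case above, the last exception escapes after the deadline, which is exactly the `TestFailedDueToTimeoutException` the suite reports.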
[jira] [Resolved] (SPARK-30541) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
[ https://issues.apache.org/jira/browse/SPARK-30541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-30541.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27877
[https://github.com/apache/spark/pull/27877]

> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
> -------------------------------------------------------------------
>
>                 Key: SPARK-30541
>                 URL: https://issues.apache.org/jira/browse/SPARK-30541
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Gabor Somogyi
>            Priority: Blocker
>             Fix For: 3.0.0
>
>         Attachments: consoleText_NOK.txt, consoleText_OK.txt,
> unit-tests_NOK.log, unit-tests_OK.log
>
> The test suite has been failing intermittently as of now:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116862/testReport/]
>
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it
> is a sbt.testing.SuiteSelector)
>
> {noformat}
> Error Details
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 3939 times over 1.000122353532 minutes. Last failure message: KeeperErrorCode = AuthFailed for /brokers/ids.
> Stack Trace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 3939 times over 1.000122353532 minutes. Last failure message: KeeperErrorCode = AuthFailed for /brokers/ids.
>     at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
>     at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:292)
>     at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>     at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>     at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>     at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>     at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>     at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>     at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>     at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>     at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /brokers/ids
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>     at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:554)
>     at kafka.zk.KafkaZkClient.getChildren(KafkaZkClient.scala:719)
>     at kafka.zk.KafkaZkClient.getSortedBrokerList(KafkaZkClient.scala:455)
>     at kafka.zk.KafkaZkClient.getAllBrokersInCluster(KafkaZkClient.scala:404)
>     at org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$setup$3(KafkaTestUtils.scala:293)
>     at org.scalatest.concurrent.Eventually.makeAValiantAttempt$1(Eventually.scala:395)
>     at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:409)
>     ... 20 more
> {noformat}
[jira] [Created] (SPARK-31213) Arrange the configuration of Spark SQL
jiaan.geng created SPARK-31213:
----------------------------------

             Summary: Arrange the configuration of Spark SQL
                 Key: SPARK-31213
                 URL: https://issues.apache.org/jira/browse/SPARK-31213
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: jiaan.geng

The configuration of Spark SQL is a bit messy. Sorting it out can improve
readability.
[jira] [Resolved] (SPARK-31184) Support getTablesByType API of Hive Client
[ https://issues.apache.org/jira/browse/SPARK-31184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31184.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 27952
[https://github.com/apache/spark/pull/27952]

> Support getTablesByType API of Hive Client
> ------------------------------------------
>
>                 Key: SPARK-31184
>                 URL: https://issues.apache.org/jira/browse/SPARK-31184
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Xin Wu
>            Assignee: Xin Wu
>            Priority: Major
>             Fix For: 3.1.0
>
> Hive 2.3+ supports the getTablesByType API, which is a precondition for
> implementing SHOW VIEWS in HiveExternalCatalog. Currently, without this API,
> we cannot get Hive tables of type HiveTableType.VIRTUAL_VIEW directly.
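The point of getTablesByType is that the catalog can be asked for entries of one table type directly, instead of listing everything and filtering client-side. A hypothetical sketch of that idea in Python (the table data and function are invented; this is not Hive's or Spark's API):

```python
# Toy metastore listing: (database, table name, table type).
TABLES = [
    ("db1", "t1", "MANAGED_TABLE"),
    ("db1", "v1", "VIRTUAL_VIEW"),
    ("db1", "t2", "EXTERNAL_TABLE"),
    ("db1", "v2", "VIRTUAL_VIEW"),
]

def get_tables_by_type(db, table_type):
    """Return table names in `db` matching `table_type`, in listing order."""
    return [name for d, name, t in TABLES if d == db and t == table_type]

# SHOW VIEWS then reduces to asking only for the VIRTUAL_VIEW entries:
print(get_tables_by_type("db1", "VIRTUAL_VIEW"))  # -> ['v1', 'v2']
```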
[jira] [Assigned] (SPARK-31184) Support getTablesByType API of Hive Client
[ https://issues.apache.org/jira/browse/SPARK-31184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-31184:
-------------------------------------
    Assignee: Xin Wu

> Support getTablesByType API of Hive Client
> ------------------------------------------
>
>                 Key: SPARK-31184
>                 URL: https://issues.apache.org/jira/browse/SPARK-31184
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Xin Wu
>            Assignee: Xin Wu
>            Priority: Major
>
> Hive 2.3+ supports the getTablesByType API, which is a precondition for
> implementing SHOW VIEWS in HiveExternalCatalog. Currently, without this API,
> we cannot get Hive tables of type HiveTableType.VIRTUAL_VIEW directly.
[jira] [Updated] (SPARK-31209) Not compatible with new version of scalatest (3.1.0 and above)
[ https://issues.apache.org/jira/browse/SPARK-31209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-31209:
--------------------------------
    Affects Version/s:     (was: 2.4.5)
                       3.1.0

> Not compatible with new version of scalatest (3.1.0 and above)
> --------------------------------------------------------------
>
>                 Key: SPARK-31209
>                 URL: https://issues.apache.org/jira/browse/SPARK-31209
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: Tests
>    Affects Versions: 3.1.0
>            Reporter: Timothy Zhang
>            Priority: Major
>
> Since ScalaTest's style traits and classes were moved and renamed
> ([http://www.scalatest.org/release_notes/3.1.0]), there are compilation
> errors such as FunSpec not being found when the new version of scalatest is
> added as a library dependency.
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Affects Version/s: 2.0.2
                       2.1.3

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table temp1
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table temp1
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Description: 
This can be reproduced with the commands below:
{code}
beeline> create or replace temporary view temp1 as select 1
beeline> cache table temp1
beeline> create or replace temporary view temp1 as select 1, 2
beeline> cache table temp1
{code}
The cached RDD for the plan "select 1" stays in memory until the session
closes. This cached data can never be used, since the view temp1 has been
replaced by another plan. It's a memory leak.
assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)

  was:
This can be reproduced with the commands below:
{code}
beeline> create or replace temporary view temp1 as select 1
beeline> cache table tempView
beeline> create or replace temporary view temp1 as select 1, 2
beeline> cache table tempView
{code}
The cached RDD for the plan "select 1" stays in memory until the session
closes. This cached data can never be used, since the view temp1 has been
replaced by another plan. It's a memory leak.
assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.3, 2.3.4, 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table temp1
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table temp1
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
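The shape of the leak is that the cache is keyed by the view's logical plan, so after CREATE OR REPLACE the old plan's entry is no longer reachable through the view name but is also never evicted. A toy Python model of that bookkeeping (the dicts and function names are invented; this is not Spark's CacheManager):

```python
views = {}   # view name -> current logical plan (modeled as a string)
cache = {}   # logical plan -> materialized data

def create_or_replace_view(name, plan):
    # The old plan's cache entry (if any) is NOT dropped here -- the bug.
    views[name] = plan

def cache_table(name):
    plan = views[name]
    cache.setdefault(plan, f"materialized<{plan}>")

create_or_replace_view("temp1", "select 1")
cache_table("temp1")
create_or_replace_view("temp1", "select 1, 2")
cache_table("temp1")

# Both plans are now cached, but only the second is reachable via temp1:
print(sorted(cache))  # -> ['select 1', 'select 1, 2']
```

This mirrors the two asserts in the report: lookups for both "select 1" and "select 1, 2" still succeed, even though the first can never be hit again through the view.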
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Affects Version/s: 2.2.3

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.3, 2.3.4, 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table tempView
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table tempView
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Affects Version/s: 2.3.4

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4, 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table tempView
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table tempView
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Updated] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30494:
----------------------------------
    Affects Version/s: 2.4.5

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table tempView
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table tempView
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Commented] (SPARK-30494) Duplicates cached RDD when create or replace an existing view
[ https://issues.apache.org/jira/browse/SPARK-30494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064093#comment-17064093 ]

Dongjoon Hyun commented on SPARK-30494:
---------------------------------------

Hi, [~cltlfcjin]. How about older versions, 2.3.4 and 2.4.5?

> Duplicates cached RDD when create or replace an existing view
> -------------------------------------------------------------
>
>                 Key: SPARK-30494
>                 URL: https://issues.apache.org/jira/browse/SPARK-30494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> This can be reproduced with the commands below:
> {code}
> beeline> create or replace temporary view temp1 as select 1
> beeline> cache table tempView
> beeline> create or replace temporary view temp1 as select 1, 2
> beeline> cache table tempView
> {code}
> The cached RDD for the plan "select 1" stays in memory until the session
> closes. This cached data can never be used, since the view temp1 has been
> replaced by another plan. It's a memory leak.
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
> assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
[jira] [Commented] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type
[ https://issues.apache.org/jira/browse/SPARK-31212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064017#comment-17064017 ] Maxim Gekk commented on SPARK-31212: The isLeapYear() function in 2.4 assumes Proleptic Gregorian calendar: https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L600-L602 but actually Spark 2.4 is based on the hybrid calendar Julian+Gregorian as we can see at https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L513-L517 It means the following functions in DateTimeUtils return incorrect results for dates before Gregorian cutover days: # getQuarter # splitDate # getMonth # getDayOfMonth # firstDayOfMonth # dateAddMonths # stringToTimestamp # stringToDate # monthsBetween # getLastDayOfMonth /cc [~cloud_fan] [~hyukjin.kwon] > Failure of casting the '1000-02-29' string to the date type > --- > > Key: SPARK-31212 > URL: https://issues.apache.org/jira/browse/SPARK-31212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 >Reporter: Maxim Gekk >Priority: Major > > The '1000-02-29' is valid date in the Julian calendar used in Spark 2.4.5 for > dates before 1582-10-15 but casting the string to the date type fails: > {code:scala} > scala> val df = > Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date")) > df: org.apache.spark.sql.DataFrame = [date: date] > scala> df.show > ++ > |date| > ++ > |null| > ++ > {code} > Creating a dataset from java.sql.Date w/ the same input string works > correctly: > {code:scala} > scala> val df2 = > Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date")) > df2: org.apache.spark.sql.DataFrame = [date: date] > scala> df2.show > +--+ > | date| > +--+ > |1000-02-29| > +--+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
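The calendar mismatch Maxim describes comes down to two different leap-year rules. A minimal sketch (not Spark's code) of why '1000-02-29' parses under the hybrid Julian calendar but fails the proleptic-Gregorian check:

```scala
// Julian rule (what the hybrid calendar applies before the 1582-10-15 cutover):
// every year divisible by 4 is a leap year.
def isJulianLeapYear(year: Int): Boolean = year % 4 == 0

// Proleptic Gregorian rule (what isLeapYear in branch-2.4's DateTimeUtils assumes):
// century years must also be divisible by 400.
def isGregorianLeapYear(year: Int): Boolean =
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0

// 1000 is a leap year only under the Julian rule, so '1000-02-29' is a valid
// date in the hybrid calendar but is rejected by the Gregorian leap-year check.
println(isJulianLeapYear(1000))    // true
println(isGregorianLeapYear(1000)) // false
```

Any function that mixes the two rules (parsing with one, field extraction with the other) will misbehave for dates before the Gregorian cutover, which is exactly the list of DateTimeUtils functions above.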
[jira] [Created] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type
Maxim Gekk created SPARK-31212: -- Summary: Failure of casting the '1000-02-29' string to the date type Key: SPARK-31212 URL: https://issues.apache.org/jira/browse/SPARK-31212 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5 Reporter: Maxim Gekk The '1000-02-29' is valid date in the Julian calendar used in Spark 2.4.5 for dates before 1582-10-15 but casting the string to the date type fails: {code:scala} scala> val df = Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date")) df: org.apache.spark.sql.DataFrame = [date: date] scala> df.show ++ |date| ++ |null| ++ {code} Creating a dataset from java.sql.Date w/ the same input string works correctly: {code:scala} scala> val df2 = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date")) df2: org.apache.spark.sql.DataFrame = [date: date] scala> df2.show +--+ | date| +--+ |1000-02-29| +--+ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31199) Separate connection timeout and idle timeout for netty
[ https://issues.apache.org/jira/browse/SPARK-31199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runnings updated SPARK-31199: - Description: io.connectionTimeout is only used as the timeout for connection setup, while io.idleTimeout controls how long an apparently idle connection is kept before being killed ([https://github.com/apache/spark/pull/5584]). These two timeouts can be quite different: connectionTimeout can be short, to fail fast on connection problems when hitting heavily loaded or unstable nodes, while the idle timeout relates to business/function performance, which is more complicated. was: io.connectionTimeout only used for connection timeout for connection setup while io.idleTimeout is used to control how long to kill the connection if it seems to be idle([https://github.com/apache/spark/pull/5584]) These 2 timeouts could be quite different and shorten connectiontimeout could help fast fail the connection related problem in some cases like when doing shuffle, we could fast fail the task and retry. > Separate connection timeout and idle timeout for netty > -- > > Key: SPARK-31199 > URL: https://issues.apache.org/jira/browse/SPARK-31199 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: runnings >Priority: Major > > io.connectionTimeout is only used as the timeout for connection setup, while > io.idleTimeout controls how long an apparently idle connection is kept before > being killed ([https://github.com/apache/spark/pull/5584]) > > These two timeouts can be quite different: connectionTimeout can be short, to > fail fast on connection problems when hitting heavily loaded or unstable > nodes, while the idle timeout relates to business/function performance, which > is more complicated. 
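The split being requested is easiest to see as configuration. A sketch of a spark-defaults.conf fragment under the proposed separation; the idle-timeout key is the one this ticket proposes (hypothetical until it lands), and the values are purely illustrative:

```properties
# Fail fast when a connection cannot even be established, to route
# around overloaded or unstable nodes quickly:
spark.shuffle.io.connectionTimeout   15s

# Proposed separate knob (hypothetical): how long an established
# connection may sit idle before being closed. Tied to workload
# behaviour, so typically much longer:
spark.shuffle.io.idleTimeout         120s
```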
[jira] [Commented] (SPARK-27097) Avoid embedding platform-dependent offsets literally in whole-stage generated code
[ https://issues.apache.org/jira/browse/SPARK-27097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063909#comment-17063909 ] angerszhu commented on SPARK-27097: --- [~irashid] To be honest, I hit this problem recently. [~dbtsai] I have a question. We run a self-developed thrift-server program that uses Spark as the compute engine, started with the JVM options below: {code} -Xmx64g -Djava.library.path=/home/hadoop/hadoop/lib/native -Djavax.security.auth.useSubjectCredsOnly=false -Dcom.sun.management.jmxremote.port=9021 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:MaxPermSize=1024m -XX:PermSize=256m -XX:MaxDirectMemorySize=8192m -XX:-TraceClassUnloading -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:+PrintTenuringDistribution -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -Xnoclassgc -XX:+PrintGCDetails -XX:+PrintGCDateStamps {code} With these options Platform.BYTE_ARRAY_OFFSET is 24, while in a normal Spark thrift server it is 16; the mismatch caused strange data corruption. After a few days of checking I traced the problem to Spark *codegen*, and this PR fixes it for us, but I cannot find evidence for why Platform.BYTE_ARRAY_OFFSET is 24 under the options above. Testing locally: with -XX:+UseCompressedOops (pointer compression on) the offset is 16; with -XX:-UseCompressedOops (pointer compression off) it is 24. That much is easy to understand, but I don't know why the options above yield 24, since I am no expert on the JVM internals involved. Could you give me some advice or pointers on how to find the root cause? > Avoid embedding platform-dependent offsets literally in whole-stage generated > code > -- > > Key: SPARK-27097 > URL: https://issues.apache.org/jira/browse/SPARK-27097 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.3, 2.2.3, 2.3.4, 2.4.0 >Reporter: Xiao Li >Assignee: Kris Mok >Priority: Critical > Labels: correctness > Fix For: 2.4.1 > > > Avoid embedding platform-dependent offsets literally in whole-stage generated > code. > Spark SQL performs whole-stage code generation to speed up query execution. > There are two steps to it: > Java source code is generated from the physical query plan on the driver. A > single version of the source code is generated from a query plan, and sent to > all executors. > It's compiled to bytecode on the driver to catch compilation errors before > sending to executors, but currently only the generated source code gets sent > to the executors. The bytecode compilation is for fail-fast only. > Executors receive the generated source code and compile to bytecode, then the > query runs like a hand-written Java program. > In this model, there's an implicit assumption about the driver and executors > being run on similar platforms. Some code paths accidentally embedded > platform-dependent object layout information into the generated code, such as: > {code:java} > Platform.putLong(buffer, /* offset */ 24, /* value */ 1); > {code} > This code expects a field to be at offset +24 of the buffer object, and sets > a value to that field. > But whole-stage code generation generally uses platform-dependent information > from the driver. 
If the object layout is significantly different on the > driver and executors, the generated code can be reading/writing to wrong > offsets on the executors, causing all kinds of data corruption. > One code pattern that leads to such problem is the use of Platform.XXX > constants in generated code, e.g. Platform.BYTE_ARRAY_OFFSET. > Bad: > {code:java} > val baseOffset = Platform.BYTE_ARRAY_OFFSET > // codegen template: > s"Platform.putLong($buffer, $baseOffset, $value);" > This will embed the value of Platform.BYTE_ARRAY_OFFSET on the driver into > the generated code. > {code} > Good: > {code:java} > val baseOffset = "Platform.BYTE_ARRAY_OFFSET" > // codegen template: > s"Platform.putLong($buffer, $baseOffset, $value);" > This will generate the offset symbolically -- Platform.putLong(buffer
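A sketch of the offset difference and the codegen fix. The header sizes below are typical 64-bit HotSpot layouts, and the likely answer to the 16-vs-24 question above is that HotSpot silently disables UseCompressedOops when the heap exceeds roughly 32 GB (as with -Xmx64g), even when -XX:+UseCompressedOops is passed; the `buffer` name is a hypothetical codegen variable:

```scala
// Why Platform.BYTE_ARRAY_OFFSET differs between JVMs (illustrative values,
// typical 64-bit HotSpot):
//   compressed oops on  -> 12-byte array header, padded       -> offset 16
//   compressed oops off -> 16-byte header + 4-byte length, padded -> offset 24
// HotSpot turns compressed oops off automatically for heaps over ~32 GB.
def byteArrayBaseOffset(compressedOops: Boolean): Int =
  if (compressedOops) 16 else 24

// The fix in this ticket: emit the symbolic constant, never the driver's
// numeric value, so each executor resolves its own offset.
val bad  = s"Platform.putLong(buffer, ${byteArrayBaseOffset(true)}, 1L);"
val good = "Platform.putLong(buffer, Platform.BYTE_ARRAY_OFFSET, 1L);"
println(bad)  // embeds the driver's 16 -- wrong on a 24-offset executor
println(good) // resolved independently on every executor
```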
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063851#comment-17063851 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 11:33 AM: --- In the end, I believe your problem was resolved by https://issues.apache.org/jira/browse/SPARK-30254 Before that fix, the issue existed. My build of the master branch contains this bug fix. was (Author: beliefer): At finally, I guess your problem resolved by https://issues.apache.org/jira/browse/SPARK-30254 Before this ticket, the issue exists. > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I tried to use LIKE with ESCAPE with Spark 3.0.0-preview2, but I found it > doesn't work in the cases below. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to be returned, but it isn't. I debugged into the code to > check the logical plan. > In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect.
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063851#comment-17063851 ] jiaan.geng commented on SPARK-31210: In the end, I believe your problem was resolved by https://issues.apache.org/jira/browse/SPARK-30254 Before that fix, the issue existed. > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I tried to use LIKE with ESCAPE with Spark 3.0.0-preview2, but I found it > doesn't work in the cases below. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to be returned, but it isn't. I debugged into the code to > check the logical plan. > In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect.
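The intended ESCAPE semantics can be sketched with a simple pattern-to-regex translation (this is not Spark's implementation; the bug above is the optimizer rewriting the escaped pattern into a StartsWith prefix match): `^%` denotes a literal `%`, so `100^%` must match exactly "100%", not every string with the prefix "100".

```scala
// Translate a SQL LIKE pattern with an ESCAPE character into a Java regex.
// An escaped character is always literal; unescaped '%' -> ".*", '_' -> ".".
def likeToRegex(pattern: String, escape: Char): String = {
  val sb = new StringBuilder("^")
  var i = 0
  while (i < pattern.length) {
    pattern(i) match {
      case `escape` if i + 1 < pattern.length =>
        // Escaped character: match it literally, whatever it is.
        sb.append(java.util.regex.Pattern.quote(pattern(i + 1).toString)); i += 2
      case '%' => sb.append(".*"); i += 1
      case '_' => sb.append("."); i += 1
      case c   => sb.append(java.util.regex.Pattern.quote(c.toString)); i += 1
    }
  }
  sb.append("$").toString
}

val re = likeToRegex("100^%", '^')   // '^%' is a literal percent sign
println("100%".matches(re))          // true
println("100 times".matches(re))     // false -- not a prefix match
```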
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063830#comment-17063830 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 10:40 AM: --- {code:java} scala> spark.sql("select * from test_table_like2 where subject like '100^%' escape '^'"); res3: org.apache.spark.sql.DataFrame = [subject: string] scala> res3.show(); +---+ |subject| +---+ | 100%| +---+ scala> spark.sql("select * from test_table_like2 where subject like '100%'"); res5: org.apache.spark.sql.DataFrame = [subject: string] scala> res5.show(); +--+ | subject| +--+ | 100 times| |1000 times| | 100%| +--+ {code} was (Author: beliefer): {code:java} scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'"); res3: org.apache.spark.sql.DataFrame = [subject: string] scala> res3.show(); +---+ |subject| +---+ | 100%| +---+ scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'"); res5: org.apache.spark.sql.DataFrame = [subject: string] scala> res5.show(); +--+ | subject| +--+ | 100 times| |1000 times| | 100%| +--+ {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. > In the logical plan, the LIKE is transformed as "StartsWith(subject#130, > 100^)". 
It looks incorrect.
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063849#comment-17063849 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 10:40 AM: --- The content of /home/test/test_table_like.json show below: {code:java} {"subject":"100 times"} {"subject":"1000 times"} {"subject":"100%"} {code} I follow your example and run it {code:java} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60) Type in expressions to have them evaluated. Type :help for more information. scala> val path = "/home/test/test_table_like.json" path: String = /home/test/test_table_like.json scala> val tmpdf = spark.read.json(path) tmpdf: org.apache.spark.sql.DataFrame = [subject: string] scala> tmpdf.createOrReplaceTempView("test_table_like") scala> spark.sql("select * from test_table_like where subject like '100%' order by 1") res1: org.apache.spark.sql.DataFrame = [subject: string] scala> res1.show() +--+ | subject| +--+ | 100 times| | 100%| |1000 times| +--+ {code} was (Author: beliefer): The content of /home/test/test_table_like.json show below: {code:java} {"subject":"100 times"} {"subject":"1000 times"} {"subject":"100%"} {code} I follow your example and run it {code:java} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60) Type in expressions to have them evaluated. Type :help for more information. 
scala> val path = "/home/test/test_table_like.json" path: String = /home/test/test_table_like.json scala> val tmpdf = spark.read.json(path) 20/03/21 18:33:56 ERROR LzoCodec: Failed to load/initialize native-lzo library tmpdf: org.apache.spark.sql.DataFrame = [subject: string] scala> tmpdf.createOrReplaceTempView("test_table_like") scala> spark.sql("select * from test_table_like where subject like '100%' order by 1") res1: org.apache.spark.sql.DataFrame = [subject: string] scala> res1.show() +--+ | subject| +--+ | 100 times| | 100%| |1000 times| +--+ {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. > In the logical plan, the LIKE is transformed as "StartsWith(subject#130, > 100^)". It looks it is incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063849#comment-17063849 ] jiaan.geng commented on SPARK-31210: The content of /home/test/test_table_like.json show below: {code:java} {"subject":"100 times"} {"subject":"1000 times"} {"subject":"100%"} {code} I follow your example and run it {code:java} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60) Type in expressions to have them evaluated. Type :help for more information. scala> val path = "/home/test/test_table_like.json" path: String = /home/test/test_table_like.json scala> val tmpdf = spark.read.json(path) 20/03/21 18:33:56 ERROR LzoCodec: Failed to load/initialize native-lzo library tmpdf: org.apache.spark.sql.DataFrame = [subject: string] scala> tmpdf.createOrReplaceTempView("test_table_like") scala> spark.sql("select * from test_table_like where subject like '100%' order by 1") res1: org.apache.spark.sql.DataFrame = [subject: string] scala> res1.show() +--+ | subject| +--+ | 100 times| | 100%| |1000 times| +--+ {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. 
> In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect.
[jira] [Created] (SPARK-31211) Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
Maxim Gekk created SPARK-31211: -- Summary: Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5 Key: SPARK-31211 URL: https://issues.apache.org/jira/browse/SPARK-31211 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Save valid date in Julian calendar by Spark 2.4.5 in a leap year, for instance 1000-02-29: {code} $ export TZ="America/Los_Angeles" {code} {code:scala} scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") scala> df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap") scala> val df = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date")) df: org.apache.spark.sql.DataFrame = [date: date] scala> df.show +--+ | date| +--+ |1000-02-29| +--+ scala> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap") {code} Load the parquet files back by Spark 3.1.0-SNAPSHOT: {code:scala} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231) Type in expressions to have them evaluated. Type :help for more information. 
scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show +--+ | date| +--+ |1000-03-06| +--+ scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true) scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show 20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap year at java.time.LocalDate.create(LocalDate.java:429) at java.time.LocalDate.of(LocalDate.java:269) at org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
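The six-day shift in the repro can be reproduced outside Spark: java.sql.Date and GregorianCalendar use the hybrid Julian+Gregorian calendar (Julian before 1582-10-15), while java.time uses the proleptic Gregorian calendar, so the same physical day carries two different labels. A minimal sketch:

```scala
import java.time.LocalDate
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Build Julian 1000-02-29 at midnight UTC via the hybrid calendar
// (valid there: 1000 is a Julian leap year).
val hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
hybrid.clear()
hybrid.set(1000, Calendar.FEBRUARY, 29)

// Relabel the same instant in the proleptic Gregorian calendar.
val epochDay  = Math.floorDiv(hybrid.getTimeInMillis, 86400000L)
val proleptic = LocalDate.ofEpochDay(epochDay)
println(proleptic) // 1000-03-06 -- the shift seen when Spark 3 reads the file
```

This is why reading without rebasing shows 1000-03-06, and why the rebase path fails: rebaseJulianToGregorianDays tries to build LocalDate.of(1000, 2, 29), which does not exist in the proleptic Gregorian calendar.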
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063743#comment-17063743 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 10:10 AM: --- {code:java} spark-sql> create table test_table_like2 (subject string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; Time taken: 1.455 seconds spark-sql> insert into test_table_like2 values ('100 times'), ('1000 times'), ('100%'); Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1/-ext-1/_temporary Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1 Time taken: 6.55 seconds spark-sql> select * from test_table_like2 where subject like '100^%' escape '^'; 100% {code} was (Author: beliefer): {code:java} spark-sql> create table test_table_like2 (subject string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; Time taken: 1.455 seconds spark-sql> insert into test_table_like2 values ('100 times'), ('1000 times'), ('100%'); Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1/-ext-1/_temporary Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1 Time taken: 6.55 seconds spark-sql> select * from test_table_like2 where subject like '100^%' escape '^'; 20/03/21 11:30:45 ERROR LzoCodec: Failed to load/initialize native-lzo library 100% {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. 
> The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. > In the logical plan, the LIKE is transformed as "StartsWith(subject#130, > 100^)". It looks it is incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063830#comment-17063830 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 10:10 AM: --- {code:java} scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'"); res3: org.apache.spark.sql.DataFrame = [subject: string] scala> res3.show(); +---+ |subject| +---+ | 100%| +---+ scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'"); res5: org.apache.spark.sql.DataFrame = [subject: string] scala> res5.show(); +--+ | subject| +--+ | 100 times| |1000 times| | 100%| +--+ {code} was (Author: beliefer): {code:java} scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'"); res3: org.apache.spark.sql.DataFrame = [subject: string] scala> res3.show(); 20/03/21 17:51:07 ERROR LzoCodec: Failed to load/initialize native-lzo library +---+ |subject| +---+ | 100%| +---+ scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'"); res5: org.apache.spark.sql.DataFrame = [subject: string] scala> res5.show(); +--+ | subject| +--+ | 100 times| |1000 times| | 100%| +--+ {code} > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. 
> In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect.
[jira] [Updated] (SPARK-31199) Separate connection timeout and idle timeout for netty
[ https://issues.apache.org/jira/browse/SPARK-31199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runnings updated SPARK-31199: - Description: io.connectionTimeout only used for connection timeout for connection setup while io.idleTimeout is used to control how long to kill the connection if it seems to be idle([https://github.com/apache/spark/pull/5584]) These 2 timeouts could be quite different and shorten connectiontimeout could help fast fail the connection related problem in some cases like when doing shuffle, we could fast fail the task and retry. was: spark.shuffle.io.connectionTimeout only used for connection timeout for connection setup while spark.shuffle.io.idleTimeout is used to control how long to kill the connection if it seems to be idle([https://github.com/apache/spark/pull/5584]) These 2 timeouts could be quite different and shorten connectiontimeout could help fast fail the shuffle task in some cases > Separate connection timeout and idle timeout for netty > -- > > Key: SPARK-31199 > URL: https://issues.apache.org/jira/browse/SPARK-31199 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: runnings >Priority: Major > > io.connectionTimeout only used for connection timeout for connection setup > while io.idleTimeout is used to control how long to kill the connection if it > seems to be idle([https://github.com/apache/spark/pull/5584]) > > These 2 timeouts could be quite different and shorten connectiontimeout could > help fast fail the connection related problem in some cases like when doing > shuffle, we could fast fail the task and retry. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31199) Separate connection timeout and idle timeout for netty
[ https://issues.apache.org/jira/browse/SPARK-31199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runnings updated SPARK-31199: - Summary: Separate connection timeout and idle timeout for netty (was: Separate connection timeout and idle timeout for shuffle) > Separate connection timeout and idle timeout for netty > -- > > Key: SPARK-31199 > URL: https://issues.apache.org/jira/browse/SPARK-31199 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: runnings >Priority: Major > > spark.shuffle.io.connectionTimeout only used for connection timeout for > connection setup while spark.shuffle.io.idleTimeout is used to control how > long to kill the connection if it seems to be > idle([https://github.com/apache/spark/pull/5584]) > > These 2 timeouts could be quite different and shorten connectiontimeout could > help fast fail the shuffle task in some cases -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063744#comment-17063744 ] jiaan.geng edited comment on SPARK-31210 at 3/21/20, 9:56 AM: -- {code:java} spark-sql> select * from test_table_like2 where subject like '100^%' escape '^' order by 1; 100% Time taken: 12.261 seconds, Fetched 1 row(s) {code} was (Author: beliefer): spark-sql> select * from test_table_like2 where subject like '100^%' escape '^' order by 1; 100% Time taken: 12.261 seconds, Fetched 1 row(s) > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I try to use LIKE with ESCAPE for Spark 3.0.0-preview2. But I find in it > doesn't work in below cases. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to returned, but it doesn't. I debug into the code to > check the logical plan. > In the logical plan, the LIKE is transformed as "StartsWith(subject#130, > 100^)". It looks it is incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063743#comment-17063743 ]

jiaan.geng edited comment on SPARK-31210 at 3/21/20, 9:56 AM:
--------------------------------------------------------------

{code:java}
spark-sql> create table test_table_like2 (subject string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Time taken: 1.455 seconds
spark-sql> insert into test_table_like2 values ('100 times'), ('1000 times'), ('100%');
Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1/-ext-1/_temporary
Moved to trash: /home/xitong/hive/stagingdir_hive_2020-03-21_11-28-44_313_1325551588233295250-1
Time taken: 6.55 seconds
spark-sql> select * from test_table_like2 where subject like '100^%' escape '^';
20/03/21 11:30:45 ERROR LzoCodec: Failed to load/initialize native-lzo library
100%
{code}
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063742#comment-17063742 ]

jiaan.geng edited comment on SPARK-31210 at 3/21/20, 9:56 AM:
--------------------------------------------------------------

{code:java}
spark-sql> create or replace temporary view test_table_like as SELECT * FROM VALUES ('100 times'), ('1000 times'), ('100%') as test_table_like (subject);
Time taken: 0.143 seconds
spark-sql> select * from test_table_like where subject like '100^%' escape '^';
100%
Time taken: 0.132 seconds, Fetched 1 row(s)
{code}
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063830#comment-17063830 ]

jiaan.geng edited comment on SPARK-31210 at 3/21/20, 9:56 AM:
--------------------------------------------------------------

{code:java}
scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'");
res3: org.apache.spark.sql.DataFrame = [subject: string]

scala> res3.show();
20/03/21 17:51:07 ERROR LzoCodec: Failed to load/initialize native-lzo library
+-------+
|subject|
+-------+
|   100%|
+-------+

scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'");
res5: org.apache.spark.sql.DataFrame = [subject: string]

scala> res5.show();
+----------+
|   subject|
+----------+
| 100 times|
|1000 times|
|      100%|
+----------+
{code}
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063831#comment-17063831 ]

jiaan.geng commented on SPARK-31210:
------------------------------------

I cannot reproduce this. Could you post more detailed information?
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063830#comment-17063830 ]

jiaan.geng commented on SPARK-31210:
------------------------------------

scala> spark.sql("select * from xsql.test_table_like2 where subject like '100^%' escape '^'");
res3: org.apache.spark.sql.DataFrame = [subject: string]

scala> res3.show();
20/03/21 17:51:07 ERROR LzoCodec: Failed to load/initialize native-lzo library
+-------+
|subject|
+-------+
|   100%|
+-------+

scala> spark.sql("select * from xsql.test_table_like2 where subject like '100%'");
res5: org.apache.spark.sql.DataFrame = [subject: string]

scala> res5.show();
+----------+
|   subject|
+----------+
| 100 times|
|1000 times|
|      100%|
+----------+