[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747749#comment-16747749 ] Nikolay Izhikov commented on IGNITE-10314: -- https://github.com/apache/ignite/commit/015166804eb2d7fd596e8458a0eaf0a3c3f49edd - merged to master. [~ldz] Thank you for the contribution! > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray Liu >Assignee: Ray Liu >Priority: Critical > Fix For: 2.8 > > Time Spent: 673.5h > Remaining Estimate: 0h > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Use GridQueryTypeDescriptor to replace QueryEntity -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747673#comment-16747673 ] Ray Liu commented on IGNITE-10314: -- [~NIzhikov] Thanks for the review. I've fixed the formatting issue you mentioned and run the tests again; they are all green. https://ci.ignite.apache.org/viewLog.html?buildId=2857575 > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray Liu >Assignee: Ray Liu >Priority: Critical > Fix For: 2.8 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Use GridQueryTypeDescriptor to replace QueryEntity -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747446#comment-16747446 ] Nikolay Izhikov commented on IGNITE-10314: -- Hello, [~ldz] Sorry for the pause in the review. Looks good to me! I think this PR is ready to be merged. I left some very minor comments regarding the code formatting. Please fix them before the merge. Also, please make sure you use spaces instead of tabs for indentation: package.scala#sqlCacheName, package.scala#isValidSchema. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray Liu >Assignee: Ray Liu >Priority: Critical > Fix For: 2.8 > > Time Spent: 40m > Remaining Estimate: 0h > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Use GridQueryTypeDescriptor to replace QueryEntity -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730017#comment-16730017 ] Ray Liu commented on IGNITE-10314: -- [~NIzhikov] I have fixed the tests. Please check them at https://ci.ignite.apache.org/viewLog.html?buildId=2665712=queuedBuildOverviewTab > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray Liu >Assignee: Ray Liu >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Use GridQueryTypeDescriptor to replace QueryEntity -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729833#comment-16729833 ] Nikolay Izhikov commented on IGNITE-10314: -- [~ldz] It seems the last changes broke some tests: https://ci.ignite.apache.org/viewLog.html?buildId=2663454=queuedBuildOverviewTab Please fix the tests. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray Liu >Assignee: Ray Liu >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Use GridQueryTypeDescriptor to replace QueryEntity -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729376#comment-16729376 ] Ray Liu commented on IGNITE-10314: -- [~NIzhikov] I fixed most of the issues according to your comments, except for two. I left detailed reasons in the comments. Please check. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray Liu >Assignee: Ray Liu >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Use GridQueryTypeDescriptor to replace QueryEntity -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729304#comment-16729304 ] Nikolay Izhikov commented on IGNITE-10314: -- [~ldz] Thanks for the PR update! I left some comments in the PR; please check them. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray Liu >Assignee: Ray Liu >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Use GridQueryTypeDescriptor to replace QueryEntity -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728914#comment-16728914 ] Ray Liu commented on IGNITE-10314: -- Hello, [~NIzhikov] I have implemented the fix; please review and comment. The tests in TeamCity are all green. Here's the link: https://ci.ignite.apache.org/viewLog.html?buildId=2650060 > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray Liu >Assignee: Ray Liu >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723826#comment-16723826 ] Nikolay Izhikov commented on IGNITE-10314: -- Hello, [~ldz] Thanks for the PR update. I left a comment in your PR; please take a look. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723088#comment-16723088 ] Ignite TC Bot commented on IGNITE-10314: {panel:title=-- Run :: All: Possible Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1} {color:#d04437}_Licenses Headers_{color} [[tests 0 Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=2574960]] {panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=2536034buildTypeId=IgniteTests24Java8_RunAll] > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723013#comment-16723013 ] Dmitriy Pavlov commented on IGNITE-10314: - [~ldz] Thank you, I've retriggered a couple of suites on TC for PR 5598. License & headers were broken in master, so it is not related to the PR. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719740#comment-16719740 ] Ray commented on IGNITE-10314: -- [~NIzhikov] I have run the tests in TeamCity, and the results are green. Here's the link: https://ci.ignite.apache.org/viewLog.html?buildId=2535640=queuedBuildOverviewTab > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718702#comment-16718702 ] Ray commented on IGNITE-10314: -- [~NIzhikov] There was a bug in my code causing the error; I have now fixed it. Please review and comment. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718646#comment-16718646 ] Nikolay Izhikov commented on IGNITE-10314: -- [~ldz] Thanks for the update. I'll try to take a look shortly. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718610#comment-16718610 ] Ray commented on IGNITE-10314: -- [~NIzhikov] I have implemented refreshFields using the internal API, after Vladimir confirmed this approach on the dev list. But when running the tests in IgniteDataFrameSchemaSpec, I get an odd exception:
Exception in thread "main" java.lang.AssertionError: assertion failed: each serializer expression should contain at least one `BoundReference`
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:236)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:236)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:428)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:233)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
    at org.apache.ignite.spark.IgniteDataFrameSchemaSpec.beforeAll(IgniteDataFrameSchemaSpec.scala:122)
    at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
    at org.apache.ignite.spark.AbstractDataFrameSpec.beforeAll(AbstractDataFrameSpec.scala:39)
    at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
    at org.apache.ignite.spark.AbstractDataFrameSpec.org$scalatest$BeforeAndAfter$$super$run(AbstractDataFrameSpec.scala:39)
    at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
    at org.apache.ignite.spark.AbstractDataFrameSpec.run(AbstractDataFrameSpec.scala:39)
    at org.scalatest.junit.JUnitRunner.run(JUnitRunner.scala:99)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Can you take a look, please? I set a breakpoint in the refreshFields method, and the method itself works fine: the latest fields are in the map. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715271#comment-16715271 ] Nikolay Izhikov commented on IGNITE-10314: -- [~ldz] I left a comment in the PR. Also, please see my mail in the discussion thread. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712607#comment-16712607 ] Nikolay Izhikov commented on IGNITE-10314: -- [~ldz] Thanks for the PR. I will take a look in the next few days. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712313#comment-16712313 ] ASF GitHub Bot commented on IGNITE-10314: - GitHub user ldzhjn opened a pull request: https://github.com/apache/ignite/pull/5598
IGNITE-10314 Spark dataframe will get wrong schema if user executes add/drop column DDL
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/ldzhjn/ignite IGNITE-10314
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/ignite/pull/5598.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5598
commit bcdb69747f9a90fa0f2f9204708c576b72f1de94 Author: rayliu Date: 2018-11-22T06:44:01Z Fix bug
commit 8dcc21800a8eb7d1e731c362b4a53860274e85bd Author: rayliu Date: 2018-11-26T06:39:27Z Add tests
commit 1d58fdf068b14ac9847a825dca3464148f9b7596 Author: rayliu Date: 2018-11-26T06:53:16Z Optimize imports
commit 336a291dd5a330b5e1e824e807187ef1960bb843 Author: rayliu Date: 2018-11-26T07:01:47Z Merge remote-tracking branch 'upstream/master' into IGNITE-10314
commit 304695781ac356ec3f81325c5ce3b244b9f01c8b Author: rayliu Date: 2018-12-06T09:46:28Z Merge remote-tracking branch 'upstream/master' into IGNITE-10314
> Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712316#comment-16712316 ] Ray commented on IGNITE-10314: -- [~NIzhikov] I have implemented the fix for this issue; please review and comment. Some of the tests fail because of open bugs such as IGNITE-10585 and IGNITE-10569, which cause the thin JDBC driver to return a wrong table schema. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the latest schema using JDBC thin driver's column metadata call, then > update fields in QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693532#comment-16693532 ] Nikolay Izhikov commented on IGNITE-10314: -- [~ldz] Using SQL commands to get the schema makes sense to me. Looking forward to your contribution. Let me know if you need any help. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the schema using sql, get rid of QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691485#comment-16691485 ] Ray commented on IGNITE-10314: -- Currently, when a user performs add/remove column DDL, the QueryEntity does not change. This results in Spark getting the wrong schema, because Spark relies on QueryEntity to construct the data frame schema. According to [~vozerov]'s reply on the dev list ([http://apache-ignite-developers.2346864.n4.nabble.com/Schema-in-CacheConfig-is-not-updated-after-DDL-commands-Add-drop-column-Create-drop-index-td38002.html]), this behavior is by design, so I decided to fix this issue on the Spark side. I propose the following solution: instead of getting the schema from QueryEntity, get it with a SQL select command. [~NIzhikov], what do you think about this solution? If you think it is OK, I'll start implementing it. > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the schema using sql, get rid of QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
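The "local copy" problem described in the Analyse section above can be illustrated with a minimal sketch in plain Java. It does not use any Ignite class: Entity and Schema below are hypothetical stand-ins for QueryEntity and QuerySchema, and the only assumption carried over from the issue is that the schema keeps its own copy of the entity. DDL changes are applied to that copy, so a reader that builds its column list from the original entity keeps seeing the old columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StaleSchemaDemo {
    /** Hypothetical stand-in for a QueryEntity: table name plus column name -> type. */
    static final class Entity {
        final String table;
        final Map<String, String> fields;

        Entity(String table, Map<String, String> fields) {
            this.table = table;
            // Own map, so the entity never shares state with the caller.
            this.fields = new LinkedHashMap<>(fields);
        }

        /** Copy constructor: the copy gets its own, independent field map. */
        Entity(Entity other) {
            this(other.table, other.fields);
        }
    }

    /** Hypothetical stand-in for QuerySchema: holds a private copy of the entity. */
    static final class Schema {
        private final Entity copy;

        Schema(Entity original) {
            this.copy = new Entity(original);
        }

        // DDL operations touch only the private copy.
        void addColumn(String name, String type) { copy.fields.put(name, type); }
        void dropColumn(String name) { copy.fields.remove(name); }

        /** Columns as they are after all DDL changes. */
        List<String> columns() { return new ArrayList<>(copy.fields.keySet()); }
    }

    /** Columns seen by a reader that builds its schema from the original entity. */
    static List<String> columnsFromOriginal(Entity original) {
        return new ArrayList<>(original.fields.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ID", "java.lang.Long");
        fields.put("NAME", "java.lang.String");

        Entity original = new Entity("PERSON", fields);
        Schema schema = new Schema(original);

        schema.addColumn("AGE", "java.lang.Integer"); // like ALTER TABLE ... ADD COLUMN
        schema.dropColumn("NAME");                    // like ALTER TABLE ... DROP COLUMN

        System.out.println("from original entity: " + columnsFromOriginal(original)); // [ID, NAME]
        System.out.println("from live schema:     " + schema.columns());              // [ID, AGE]
    }
}
```

In this model the fix has to come from reading the column list from whatever holds the live state (here Schema.columns()) rather than from the original entity, which is the direction the solutions discussed in this thread take.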
[jira] [Commented] (IGNITE-10314) Spark dataframe will get wrong schema if user executes add/drop column DDL
[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691481#comment-16691481 ] ASF GitHub Bot commented on IGNITE-10314: - Github user ldzhjn closed the pull request at: https://github.com/apache/ignite/pull/5422 > Spark dataframe will get wrong schema if user executes add/drop column DDL > -- > > Key: IGNITE-10314 > URL: https://issues.apache.org/jira/browse/IGNITE-10314 > Project: Ignite > Issue Type: Bug > Components: spark >Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7 >Reporter: Ray >Assignee: Ray >Priority: Critical > Fix For: 2.8 > > > When user performs add/remove column in DDL, Spark will get the old/wrong > schema. > > Analyse > Currently Spark data frame API relies on QueryEntity to construct schema, but > QueryEntity in QuerySchema is a local copy of the original QueryEntity, so > the original QueryEntity is not updated when modification happens. > > Solution > Get the schema using sql, get rid of QueryEntity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)