[jira] [Comment Edited] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368823#comment-16368823 ]

Kazuaki Ishizaki edited comment on SPARK-23427 at 2/19/18 7:01 AM:
-------------------------------------------------------------------

I got the OOM with the same stack trace for both configurations when I ran this program using a 256GB heap. Of course, throwing an OOM is a problem in itself, so I will look into this. We would first like to confirm whether this issue depends on the option or not.

was (Author: kiszk):
I got the OOM with the same stack trace for both configurations when I ran this program using a 256GB heap.

> spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23427
>                 URL: https://issues.apache.org/jira/browse/SPARK-23427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>        Environment: SPARK 2.0 version
>           Reporter: Dhiraj
>           Priority: Critical
>
> We are facing an issue around the value of spark.sql.autoBroadcastJoinThreshold.
> With spark.sql.autoBroadcastJoinThreshold = -1 (disabled), we see driver memory usage stay flat.
> With any other value (10MB, 5MB, 2MB, 1MB, 10K, 1K) we see driver memory usage grow at a rate that depends on the size of the autoBroadcastJoinThreshold, until we get an OOM exception. The problem is that the memory used by auto-broadcast is not being freed in the driver.
> The application imports Oracle tables as master DataFrames, which are persisted. Each job applies a filter to these tables and then registers them as temp view tables. SQL queries are then used to process the data further. At the end, all the intermediate DataFrames are unpersisted.
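[Editor's note] For readers who land here with the same symptom, the configuration the reporter varies is set like this (a minimal Scala sketch; the app name is hypothetical):

{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch: -1 disables automatic broadcast joins entirely; any
// positive value is the maximum table size (in bytes) that Spark SQL will
// broadcast to the executors when planning a join.
val spark = SparkSession.builder()
  .appName("AutoBroadcastThresholdDemo") // hypothetical app name
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .getOrCreate()

// The same setting can also be changed at runtime for the current session:
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024) // default: 10MB
{code}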
[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368823#comment-16368823 ]

Kazuaki Ishizaki commented on SPARK-23427:
------------------------------------------

I got the OOM with the same stack trace for both configurations when I ran this program using a 256GB heap.

> spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23427
>                 URL: https://issues.apache.org/jira/browse/SPARK-23427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>        Environment: SPARK 2.0 version
>           Reporter: Dhiraj
>           Priority: Critical
>
> We are facing an issue around the value of spark.sql.autoBroadcastJoinThreshold.
> With spark.sql.autoBroadcastJoinThreshold = -1 (disabled), we see driver memory usage stay flat.
> With any other value (10MB, 5MB, 2MB, 1MB, 10K, 1K) we see driver memory usage grow at a rate that depends on the size of the autoBroadcastJoinThreshold, until we get an OOM exception. The problem is that the memory used by auto-broadcast is not being freed in the driver.
> The application imports Oracle tables as master DataFrames, which are persisted. Each job applies a filter to these tables and then registers them as temp view tables. SQL queries are then used to process the data further. At the end, all the intermediate DataFrames are unpersisted.
[jira] [Comment Edited] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368728#comment-16368728 ]

Kazuaki Ishizaki edited comment on SPARK-23427 at 2/19/18 12:49 AM:
--------------------------------------------------------------------

Thank you. I ran this program several times with a 64GB heap size. I saw the following OOM in both cases, `-1` and the default (`10 * 1024 * 1024`). I am now running the program with other heap sizes. Is this OOM what you are seeing? If not, I would appreciate it if you could upload the stack trace from when the OOM occurred.

{code}
[info] org.apache.spark.sql.MyTest *** ABORTED *** (2 hours, 14 minutes, 36 seconds)
[info]   java.lang.OutOfMemoryError:
[info]   at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
[info]   at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
[info]   at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
[info]   at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:136)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:131)
[info]   at scala.StringContext.standardInterpolator(StringContext.scala:125)
[info]   at scala.StringContext.s(StringContext.scala:95)
[info]   at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:199)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
[info]   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
[info]   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
[info]   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
[info]   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3295)
[info]   at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3033)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(MyTest.scala:87)
[info]   at org.apache.spark.sql.catalyst.plans.PlanTestBase$class.withSQLConf(PlanTest.scala:176)
[info]   at org.apache.spark.sql.MyTest.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(MyTest.scala:27)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase$class.withSQLConf(SQLTestUtils.scala:167)
[info]   at org.apache.spark.sql.MyTest.withSQLConf(MyTest.scala:27)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply$mcV$sp(MyTest.scala:65)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
...
{code}

was (Author: kiszk):
Thank you. I ran this program several times with a 64GB heap size. I saw the following OOM in both cases, `-1` and the default (`10 * 1024 * 1024`). I am now running the program with other heap sizes. Is this OOM what you are seeing? If not, I would appreciate it if you could upload the stack trace from when the OOM occurred.
{code:java}
[info] org.apache.spark.sql.MyTest *** ABORTED *** (2 hours, 14 minutes, 36 seconds)
[info]   java.lang.OutOfMemoryError:
[info]   at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
[info]   at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
[info]   at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
[info]   at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:136)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:131)
[info]   at scala.StringContext.standardInterpolator(StringContext.scala:125)
[info]   at scala.StringContext.s(StringContext.scala:95)
[info]   at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:199)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
[info]   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
[info]   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
[info]   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
[info]   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3295)
[info]   at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3033)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(MyTest.scala:87)
[info]   at org.apache.spark.sql.catalyst.plans.PlanTestBase$class.withSQLConf(PlanTest.scala:176)
[info]   at org.apache.spark.sql.MyTest.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(MyTest.scala:27)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase$class.withSQLConf(SQLTestUtils.scala:167)
[info]   at org.apache.spark.sql.MyTest.withSQLConf(MyTest.scala:27)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply$mcV$sp(MyTest.scala:65)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
...
{code}
[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368728#comment-16368728 ]

Kazuaki Ishizaki commented on SPARK-23427:
------------------------------------------

Thank you. I ran this program several times with a 64GB heap size. I saw the following OOM in both cases, `-1` and the default (`10 * 1024 * 1024`). I am now running the program with other heap sizes. Is this OOM what you are seeing? If not, I would appreciate it if you could upload the stack trace from when the OOM occurred.

{code:java}
[info] org.apache.spark.sql.MyTest *** ABORTED *** (2 hours, 14 minutes, 36 seconds)
[info]   java.lang.OutOfMemoryError:
[info]   at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
[info]   at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
[info]   at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
[info]   at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:136)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:131)
[info]   at scala.StringContext.standardInterpolator(StringContext.scala:125)
[info]   at scala.StringContext.s(StringContext.scala:95)
[info]   at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:199)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
[info]   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
[info]   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
[info]   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
[info]   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3295)
[info]   at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3033)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(MyTest.scala:87)
[info]   at org.apache.spark.sql.catalyst.plans.PlanTestBase$class.withSQLConf(PlanTest.scala:176)
[info]   at org.apache.spark.sql.MyTest.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(MyTest.scala:27)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase$class.withSQLConf(SQLTestUtils.scala:167)
[info]   at org.apache.spark.sql.MyTest.withSQLConf(MyTest.scala:27)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply$mcV$sp(MyTest.scala:65)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
[info]   at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65)
...
{code}

> spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23427
>                 URL: https://issues.apache.org/jira/browse/SPARK-23427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>        Environment: SPARK 2.0 version
>           Reporter: Dhiraj
>           Priority: Critical
>
> We are facing an issue around the value of spark.sql.autoBroadcastJoinThreshold.
> With spark.sql.autoBroadcastJoinThreshold = -1 (disabled), we see driver memory usage stay flat.
> With any other value (10MB, 5MB, 2MB, 1MB, 10K, 1K) we see driver memory usage grow at a rate that depends on the size of the autoBroadcastJoinThreshold, until we get an OOM exception. The problem is that the memory used by auto-broadcast is not being freed in the driver.
> The application imports Oracle tables as master DataFrames, which are persisted. Each job applies a filter to these tables and then registers them as temp view tables. SQL queries are then used to process the data further. At the end, all the intermediate DataFrames are unpersisted.
[jira] [Created] (SPARK-23462) Improve the error message in `StructType`
Xiao Li created SPARK-23462:
----------------------------

             Summary: Improve the error message in `StructType`
                 Key: SPARK-23462
                 URL: https://issues.apache.org/jira/browse/SPARK-23462
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Xiao Li

The error message {{s"""Field "$name" does not exist."""}} is thrown when looking up an unknown field in StructType. The error message should also include which columns/fields do exist in this struct.
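[Editor's note] One possible shape for the improved message (a sketch only, using a hypothetical helper, not the actual patch) would be:

{code:scala}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper sketching the suggested improvement: list the
// existing field names in the error instead of only the missing one.
def fieldOrError(schema: StructType, name: String): StructField =
  schema.fields.find(_.name == name).getOrElse {
    throw new IllegalArgumentException(
      s"""Field "$name" does not exist. Available fields: ${schema.fieldNames.mkString(", ")}""")
  }

val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)))

// Would throw: Field "age" does not exist. Available fields: id, name
// fieldOrError(schema, "age")
{code}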
[jira] [Updated] (SPARK-23462) Improve the error message in `StructType`
[ https://issues.apache.org/jira/browse/SPARK-23462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-23462:
----------------------------
    Labels: starter  (was: )

> Improve the error message in `StructType`
> ------------------------------------------
>
>                 Key: SPARK-23462
>                 URL: https://issues.apache.org/jira/browse/SPARK-23462
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>           Reporter: Xiao Li
>           Priority: Major
>             Labels: starter
>
> The error message {{s"""Field "$name" does not exist."""}} is thrown when looking up an unknown field in StructType. The error message should also include which columns/fields do exist in this struct.
[jira] [Updated] (SPARK-23461) vignettes should include model predictions for some ML models
[ https://issues.apache.org/jira/browse/SPARK-23461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-23461:
---------------------------------
    Description:
eg.
Linear Support Vector Machine (SVM) Classifier
h4. Logistic Regression
Tree - GBT, RF, DecisionTree
(and ALS was disabled)
By doing something like {{head(select(gmmFitted, "V1", "V2", "prediction"))}}

  was:
eg.
Linear Support Vector Machine (SVM) Classifier
h4. Logistic Regression
Tree
(and ALS was disabled)
By doing something like {{head(select(gmmFitted, "V1", "V2", "prediction"))}}

> vignettes should include model predictions for some ML models
> --------------------------------------------------------------
>
>                 Key: SPARK-23461
>                 URL: https://issues.apache.org/jira/browse/SPARK-23461
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.1, 2.3.0
>           Reporter: Felix Cheung
>           Priority: Major
>
> eg.
> Linear Support Vector Machine (SVM) Classifier
> h4. Logistic Regression
> Tree - GBT, RF, DecisionTree
> (and ALS was disabled)
> By doing something like {{head(select(gmmFitted, "V1", "V2", "prediction"))}}
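[Editor's note] The kind of prediction preview the issue asks for looks like this in the Scala ML API (a sketch with made-up data; the vignettes themselves are SparkR, where the equivalent is the quoted {{head(select(...))}} call):

{code:scala}
import org.apache.spark.ml.clustering.GaussianMixture
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("GmmPredictionPreview").getOrCreate() // hypothetical app name
import spark.implicits._

// Made-up two-column data standing in for the vignette's dataset.
val df = Seq((1.0, 2.0), (1.1, 2.1), (9.0, 8.0), (9.1, 8.2)).toDF("V1", "V2")
val assembled = new VectorAssembler()
  .setInputCols(Array("V1", "V2"))
  .setOutputCol("features")
  .transform(df)

val model = new GaussianMixture().setK(2).fit(assembled)

// Scala analogue of head(select(gmmFitted, "V1", "V2", "prediction")).
model.transform(assembled).select("V1", "V2", "prediction").show(6)
{code}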
[jira] [Commented] (SPARK-23423) Application declines any offers when killed+active executors reach spark.dynamicAllocation.maxExecutors
[ https://issues.apache.org/jira/browse/SPARK-23423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368594#comment-16368594 ]

Igor Berman commented on SPARK-23423:
-------------------------------------

[~skonto] I haven't managed to run with dynamic allocation on today; however, I attached details for the following situation: one of the executors failed while running without dynamic allocation. All parties (slave agent, master, and even the framework) seem to get all the updates, except MesosCoarseGrainedSchedulerBackend, even though TaskSchedulerImpl did get "Lost executor 15". See [^no-dyn-allocation-failed-no-statusUpdate.txt].

I'll enable dynamic allocation tomorrow. Meanwhile, do you think the attached behavior might be problematic?

> Application declines any offers when killed+active executors reach spark.dynamicAllocation.maxExecutors
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23423
>                 URL: https://issues.apache.org/jira/browse/SPARK-23423
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.2.1
>           Reporter: Igor Berman
>           Priority: Major
>             Labels: Mesos, dynamic_allocation
>        Attachments: no-dyn-allocation-failed-no-statusUpdate.txt
>
> Hi,
> Mesos version: 1.1.0
> I've noticed rather strange behavior of MesosCoarseGrainedSchedulerBackend when running on Mesos with dynamic allocation on and the number of max executors limited by spark.dynamicAllocation.maxExecutors.
> Suppose we have a long-running driver with a cyclic pattern of resource consumption (with some idle time in between); due to dynamic allocation it receives offers and then releases them after the current chunk of work is processed.
> At [https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L573] the backend compares numExecutors < executorLimit, where numExecutors is defined as slaves.values.map(_.taskIDs.size).sum, and slaves holds all slaves ever "met", i.e. both active and killed (see the comment at [https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L122]).
> On the other hand, the number of taskIds should be updated via statusUpdate, but suppose this update is lost (I actually don't see 'is now TASK_KILLED' in the logs); then this executor count might be wrong.
> I've created a test that "reproduces" this behavior; I'm not sure how good it is:
> {code:java}
> //MesosCoarseGrainedSchedulerBackendSuite
> test("max executors registered stops to accept offers when dynamic allocation enabled") {
>   setBackend(Map(
>     "spark.dynamicAllocation.maxExecutors" -> "1",
>     "spark.dynamicAllocation.enabled" -> "true",
>     "spark.dynamicAllocation.testing" -> "true"))
>   backend.doRequestTotalExecutors(1)
>   val (mem, cpu) = (backend.executorMemory(sc), 4)
>   val offer1 = createOffer("o1", "s1", mem, cpu)
>   backend.resourceOffers(driver, List(offer1).asJava)
>   verifyTaskLaunched(driver, "o1")
>   backend.doKillExecutors(List("0"))
>   verify(driver, times(1)).killTask(createTaskId("0"))
>   val offer2 = createOffer("o2", "s2", mem, cpu)
>   backend.resourceOffers(driver, List(offer2).asJava)
>   verify(driver, times(1)).declineOffer(offer2.getId)
> }{code}
> Workaround: don't set maxExecutors with dynamic allocation on.
> Please advise.
> Igor
> Tagging you since you were the last to touch this piece of code and can probably advise: [~vanzin], [~skonto], [~susanxhuynh]
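[Editor's note] If the diagnosis above is right (killed slaves' taskIDs still contributing to the executor count), the effect can be sketched in isolation as follows. This is an illustrative model only; `Slave` here is a simplified stand-in for the backend's private state, and `active` is a hypothetical flag:

{code:scala}
// Simplified stand-in for the backend's per-slave bookkeeping.
case class Slave(taskIDs: Set[String], active: Boolean = true)

val slaves = Map(
  "s1" -> Slave(taskIDs = Set("0"), active = false), // executor was killed, but the
                                                     // TASK_KILLED status update was lost
  "s2" -> Slave(taskIDs = Set.empty))

val executorLimit = 1 // spark.dynamicAllocation.maxExecutors

// Count as reported: killed slaves still contribute their stale taskIDs,
// so the backend believes it is at the limit and declines every new offer.
val numExecutors = slaves.values.map(_.taskIDs.size).sum

// Counting only slaves still considered active would accept the new offer.
val numLiveExecutors = slaves.values.filter(_.active).map(_.taskIDs.size).sum

assert(numExecutors >= executorLimit)    // offers declined
assert(numLiveExecutors < executorLimit) // offers would be accepted
{code}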
[jira] [Updated] (SPARK-23423) Application declines any offers when killed+active executors reach spark.dynamicAllocation.maxExecutors
[ https://issues.apache.org/jira/browse/SPARK-23423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Berman updated SPARK-23423:
--------------------------------
    Attachment: no-dyn-allocation-failed-no-statusUpdate.txt

> Application declines any offers when killed+active executors reach spark.dynamicAllocation.maxExecutors
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23423
>                 URL: https://issues.apache.org/jira/browse/SPARK-23423
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.2.1
>           Reporter: Igor Berman
>           Priority: Major
>             Labels: Mesos, dynamic_allocation
>        Attachments: no-dyn-allocation-failed-no-statusUpdate.txt
>
> Hi,
> Mesos version: 1.1.0
> I've noticed rather strange behavior of MesosCoarseGrainedSchedulerBackend when running on Mesos with dynamic allocation on and the number of max executors limited by spark.dynamicAllocation.maxExecutors.
> Suppose we have a long-running driver with a cyclic pattern of resource consumption (with some idle time in between); due to dynamic allocation it receives offers and then releases them after the current chunk of work is processed.
> At [https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L573] the backend compares numExecutors < executorLimit, where numExecutors is defined as slaves.values.map(_.taskIDs.size).sum, and slaves holds all slaves ever "met", i.e. both active and killed (see the comment at [https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L122]).
> On the other hand, the number of taskIds should be updated via statusUpdate, but suppose this update is lost (I actually don't see 'is now TASK_KILLED' in the logs); then this executor count might be wrong.
> I've created a test that "reproduces" this behavior; I'm not sure how good it is:
> {code:java}
> //MesosCoarseGrainedSchedulerBackendSuite
> test("max executors registered stops to accept offers when dynamic allocation enabled") {
>   setBackend(Map(
>     "spark.dynamicAllocation.maxExecutors" -> "1",
>     "spark.dynamicAllocation.enabled" -> "true",
>     "spark.dynamicAllocation.testing" -> "true"))
>   backend.doRequestTotalExecutors(1)
>   val (mem, cpu) = (backend.executorMemory(sc), 4)
>   val offer1 = createOffer("o1", "s1", mem, cpu)
>   backend.resourceOffers(driver, List(offer1).asJava)
>   verifyTaskLaunched(driver, "o1")
>   backend.doKillExecutors(List("0"))
>   verify(driver, times(1)).killTask(createTaskId("0"))
>   val offer2 = createOffer("o2", "s2", mem, cpu)
>   backend.resourceOffers(driver, List(offer2).asJava)
>   verify(driver, times(1)).declineOffer(offer2.getId)
> }{code}
> Workaround: don't set maxExecutors with dynamic allocation on.
> Please advise.
> Igor
> Tagging you since you were the last to touch this piece of code and can probably advise: [~vanzin], [~skonto], [~susanxhuynh]
[jira] [Resolved] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-23402.
-------------------------------
    Resolution: Cannot Reproduce

> Dataset write method not working as expected for postgresql database
> ----------------------------------------------------------------------
>
>                 Key: SPARK-23402
>                 URL: https://issues.apache.org/jira/browse/SPARK-23402
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.2.1
>        Environment: PostgreSQL: 9.5.8 (same issue on 10 as well)
>                     OS: CentOS 7 & Windows 7/8
>                     JDBC: 9.4-1201-jdbc41
>                     Spark: I executed on both 2.1.0 and 2.2.1
>                     Mode: Standalone
>                     OS: Windows 7
>           Reporter: Pallapothu Jyothi Swaroop
>           Priority: Major
>        Attachments: Emsku[1].jpg
>
> I am using the Dataset write method to insert data into an existing PostgreSQL table. For this I am using the write method with Append mode. While doing so I get an exception saying the table already exists, even though I specified Append mode.
> Strangely, when I change the options to SQL Server/Oracle, append mode works as expected.
>
> *Database Properties:*
> {code:java}
> destinationProps.put("driver", "org.postgresql.Driver");
> destinationProps.put("url", "jdbc:postgresql://127.0.0.1:30001/dbmig");
> destinationProps.put("user", "dbmig");
> destinationProps.put("password", "dbmig");
> {code}
>
> *Dataset Write Code:*
> {code:java}
> valueAnalysisDataset.write().mode(SaveMode.Append).jdbc(destinationDbMap.get("url"), "dqvalue", destinationdbProperties);
> {code}
>
> {code:java}
> Exception in thread "main" org.postgresql.util.PSQLException: ERROR: relation "dqvalue" already exists
>     at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2412)
>     at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2125)
>     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:297)
>     at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
>     at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
>     at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
>     at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
>     at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
>     at org.postgresql.jdbc.PgStatement.executeUpdate(PgStatement.java:244)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:806)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95)
>     at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:469)
>     at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
>     at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
>     at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:460)
>     at com.ads.dqam.action.impl.PostgresValueAnalysis.persistValueAnalysis(PostgresValueAnalysis.java:25)
>     at com.ads.dqam.action.AbstractValueAnalysis.persistAnalysis(AbstractValueAnalysis.java:81)
>     at com.ads.dqam.Analysis.doAnalysis(Analysis.java:32)
>     at com.ads.dqam.Client.main(Client.java:71)
> {code}
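[Editor's note] For readers reproducing this, the append path boils down to the following (a Scala sketch; the DataFrame and table contents are made up, only the connection details come from the report):

{code:scala}
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("JdbcAppendRepro").getOrCreate() // hypothetical app name

val props = new Properties()
props.put("driver", "org.postgresql.Driver")
props.put("user", "dbmig")
props.put("password", "dbmig")

// Made-up stand-in for valueAnalysisDataset.
val df = spark.range(10).toDF("id")

// SaveMode.Append should INSERT into the existing table; per the report,
// the Postgres path issues CREATE TABLE anyway and fails with
// ERROR: relation "dqvalue" already exists.
df.write
  .mode(SaveMode.Append)
  .jdbc("jdbc:postgresql://127.0.0.1:30001/dbmig", "dqvalue", props)
{code}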
[jira] [Commented] (SPARK-23053) taskBinarySerialization and task partitions calculated in DAGScheduler.submitMissingTasks should keep the same RDD checkpoint status
[ https://issues.apache.org/jira/browse/SPARK-23053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368560#comment-16368560 ]

Apache Spark commented on SPARK-23053:
--------------------------------------

User 'ivoson' has created a pull request for this issue: https://github.com/apache/spark/pull/20635

> taskBinarySerialization and task partitions calculated in DAGScheduler.submitMissingTasks should keep the same RDD checkpoint status
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23053
>                 URL: https://issues.apache.org/jira/browse/SPARK-23053
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.1.0
>           Reporter: huangtengfei
>           Assignee: huangtengfei
>           Priority: Major
>             Fix For: 2.2.2, 2.3.1, 2.4.0
>
> Suppose we run concurrent jobs using the same RDD, which is marked for checkpointing. If one job has finished and starts RDD.doCheckpoint while another job is submitted, then submitStage and submitMissingTasks are called. In [submitMissingTasks|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L961], taskBinaryBytes is serialized and the task partitions are calculated, and both are affected by the checkpoint status. If the former is computed before doCheckpoint finishes while the latter is computed after it finishes, then when the task runs, rdd.compute is called, and RDDs with a particular partition type, such as [MapWithStateRDD|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/rdd/MapWithStateRDD.scala], which cast the partition type, throw a ClassCastException because the partition parameter is actually a CheckpointRDDPartition.
> This error occurs because rdd.doCheckpoint runs in the same thread that called sc.runJob, while the task serialization happens in the DAGScheduler's event loop.
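[Editor's note] The underlying invariant (two pieces of derived state must come from a single consistent view of a value that another thread can mutate) can be illustrated generically. This is not the DAGScheduler code or the fix in the PR, just a sketch of the pattern:

{code:scala}
import java.util.concurrent.atomic.AtomicReference

// Toy model: "checkpointed" and "partitions" play the roles of the RDD's
// checkpoint status and its partitions array.
final case class RddState(checkpointed: Boolean, partitions: Seq[Int])

val state = new AtomicReference(RddState(checkpointed = false, partitions = Seq(0, 1, 2)))

// Racy: two separate reads. A concurrent doCheckpoint() may complete between
// them, pairing a pre-checkpoint serialization with post-checkpoint partitions.
def racy(): (Boolean, Seq[Int]) =
  (state.get.checkpointed, state.get.partitions)

// Consistent: one read, both values derived from the same snapshot, which is
// the property submitMissingTasks needs to preserve.
def consistent(): (Boolean, Seq[Int]) = {
  val snap = state.get
  (snap.checkpointed, snap.partitions)
}
{code}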
[jira] [Created] (SPARK-23461) vignettes should include model predictions for some ML models
Felix Cheung created SPARK-23461:
---------------------------------

             Summary: vignettes should include model predictions for some ML models
                 Key: SPARK-23461
                 URL: https://issues.apache.org/jira/browse/SPARK-23461
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 2.2.1, 2.3.0
            Reporter: Felix Cheung

eg.
Linear Support Vector Machine (SVM) Classifier
h4. Logistic Regression
Tree
(and ALS was disabled)
By doing something like {{head(select(gmmFitted, "V1", "V2", "prediction"))}}