[jira] [Commented] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364088#comment-16364088 ] Pallapothu Jyothi Swaroop commented on SPARK-23402:
---
OK, I will try with master. Do I need to build it locally, or where can I get Maven dependencies for the master branch?

> Dataset write method not working as expected for postgresql database
>
> Key: SPARK-23402
> URL: https://issues.apache.org/jira/browse/SPARK-23402
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.2.1
> Environment: PostgreSQL: 9.5.8 (same issue on 10 as well)
> OS: CentOS 7, Windows 7/8
> JDBC: 9.4-1201-jdbc41
> Spark: executed on both 2.1.0 and 2.2.1
> Mode: Standalone
> OS: Windows 7
> Reporter: Pallapothu Jyothi Swaroop
> Priority: Major
> Attachments: Emsku[1].jpg
>
> I am using Spark's Dataset write to insert data into an existing PostgreSQL table, with the write mode set to Append. Even so, I get an exception saying the table already exists, despite having specified Append mode. Strangely, when I point the same options at SQL Server or Oracle, Append mode works as expected.
>
> *Database Properties:*
> {{destinationProps.put("driver", "org.postgresql.Driver");}}
> {{destinationProps.put("url", "jdbc:postgresql://127.0.0.1:30001/dbmig");}}
> {{destinationProps.put("user", "dbmig");}}
> {{destinationProps.put("password", "dbmig");}}
>
> *Dataset Write Code:*
> {{valueAnalysisDataset.write().mode(SaveMode.Append).jdbc(destinationDbMap.get("url"), "dqvalue", destinationdbProperties);}}
>
> {{Exception in thread "main" org.postgresql.util.PSQLException: ERROR: relation "dqvalue" already exists}}
> {{ at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2412)}}
> {{ at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2125)}}
> {{ at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:297)}}
> {{ at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)}}
> {{ at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)}}
> {{ at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)}}
> {{ at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)}}
> {{ at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)}}
> {{ at org.postgresql.jdbc.PgStatement.executeUpdate(PgStatement.java:244)}}
> {{ at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:806)}}
> {{ at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95)}}
> {{ at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:469)}}
> {{ at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)}}
> {{ at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)}}
> {{ at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)}}
> {{ at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)}}
> {{ at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)}}
> {{ at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)}}
> {{ at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)}}
> {{ at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)}}
> {{ at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)}}
> {{ at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)}}
> {{ at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)}}
> {{ at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)}}
> {{ at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)}}
> {{ at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)}}
> {{ at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:460)}}
> {{ at com.ads.dqam.action.impl.PostgresValueAnalysis.persistValueAnalysis(PostgresValueAnalysis.java:25)}}
> {{ at com.ads.dqam.action.AbstractValueAnalysis.persistAnalysis(AbstractValueAnalysis.java:81)}}
> {{ at com.ads.dqam.Analysis.doAnalysis(Analysis.java:32)}}
> {{ at com.ads.dqam.Client.main(Client.java:71)}}

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
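For context on the stack trace above: in Spark 2.2.x, JdbcRelationProvider.createRelation only reaches JdbcUtils.createTable under SaveMode.Append when its table-existence probe reports that the table is missing. The sketch below is a simplified illustration of that decision (it is not Spark's actual code; the class and enum names are hypothetical), showing why hitting createTable for an existing table implies the existence probe returned false:

```java
// Simplified sketch (not Spark source) of the SaveMode.Append decision in
// JdbcRelationProvider.createRelation: CREATE TABLE is only attempted when
// the JDBC table-existence probe fails, so the PSQLException in the report
// implies JdbcUtils.tableExists() returned false for an existing table.
public class AppendDecision {
    enum Action { INSERT_INTO_EXISTING, CREATE_TABLE_THEN_INSERT }

    // tableExists stands in for the result of the JDBC probe
    // (JdbcUtils.tableExists in Spark's code).
    static Action appendAction(boolean tableExists) {
        return tableExists ? Action.INSERT_INTO_EXISTING
                           : Action.CREATE_TABLE_THEN_INSERT;
    }

    public static void main(String[] args) {
        System.out.println(appendAction(true));   // existing table: no CREATE
        System.out.println(appendAction(false));  // missing table: CREATE first
    }
}
```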
[jira] [Commented] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363749#comment-16363749 ] Pallapothu Jyothi Swaroop commented on SPARK-23402:
---
I am using version 2.2.1 in my project, so please check with 2.2.1.
[jira] [Commented] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363726#comment-16363726 ] Pallapothu Jyothi Swaroop commented on SPARK-23402:
---
[~mgaido] Please confirm: did the table already exist in the database? I get this issue only for tables that already exist in the schema. I also tried Postgres 10 with driver 42.2.1 on Windows 8, with no success.
[jira] [Comment Edited] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363555#comment-16363555 ] Pallapothu Jyothi Swaroop edited comment on SPARK-23402 at 2/14/18 6:48 AM:
---
[~kevinyu98] Thanks for checking again. I tested with 9.5.4 and Append mode works without an exception. I analyzed something that may be useful for you. Please look at this Scala file: https://github.com/apache/spark/blob/v2.2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala The statement that checks whether the table exists is the one that fails: val tableExists = JdbcUtils.tableExists(conn, options) I am not sure why it fails. I ran the table-exists SQL taken from the Postgres dialect directly against the database, and it executed successfully.
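The Postgres-dialect existence check mentioned in this comment can be paraphrased as follows. This is a sketch under assumptions, not Spark's source: in Spark 2.2.x the Postgres dialect's probe query has the form "SELECT 1 FROM <table> LIMIT 1", and JdbcUtils.tableExists treats any failure while executing the probe as "table does not exist" — so a probe that fails for an unrelated reason (permissions, search_path, identifier quoting) would make Append fall through to CREATE TABLE. The class name below is hypothetical.

```java
// Paraphrase (assumption, not Spark source) of the table-existence probe
// used with the Postgres dialect in Spark 2.2.x: build a cheap query
// against the target table; if executing it throws, the table is treated
// as absent.
public class TableExistsProbe {
    // Mirrors the shape of PostgresDialect.getTableExistsQuery(table).
    static String tableExistsQuery(String table) {
        return "SELECT 1 FROM " + table + " LIMIT 1";
    }

    public static void main(String[] args) {
        System.out.println(tableExistsQuery("dqvalue"));
    }
}
```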
[jira] [Commented] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363555#comment-16363555 ] Pallapothu Jyothi Swaroop commented on SPARK-23402:
---
Thanks for checking again. I tested with 9.5.4 and Append mode works without an exception. I analyzed something that may be useful for you. Please look at this Scala file: https://github.com/apache/spark/blob/v2.2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala The statement that checks whether the table exists is the one that fails: val tableExists = JdbcUtils.tableExists(conn, options) I am not sure why it fails. I ran the table-exists SQL taken from the Postgres dialect directly against the database, and it executed successfully.
[jira] [Commented] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363512#comment-16363512 ] Pallapothu Jyothi Swaroop commented on SPARK-23402:
---
[~kevinyu98] Did you create the table before executing the instructions above? The exception is thrown only when the table already exists in the database. Please run the statements again; you should hit the exception. Let me know whether the issue is replicated.
[jira] [Updated] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pallapothu Jyothi Swaroop updated SPARK-23402:
--
Description:

I am using the Spark Dataset write method to insert data into an existing PostgreSQL table, with the write mode set to SaveMode.Append. The write fails with a "table already exists" exception even though append mode was specified. Strangely, when I point the same code at SQL Server or Oracle, append mode works as expected.

*Database Properties:*
{{destinationProps.put("driver", "org.postgresql.Driver");}}
{{destinationProps.put("url", "jdbc:postgresql://127.0.0.1:30001/dbmig");}}
{{destinationProps.put("user", "dbmig");}}
{{destinationProps.put("password", "dbmig");}}

*Dataset Write Code:*
{{valueAnalysisDataset.write().mode(SaveMode.Append).jdbc(destinationDbMap.get("url"), "dqvalue", destinationdbProperties);}}

*Stack Trace:*
{{Exception in thread "main" org.postgresql.util.PSQLException: ERROR: relation "dqvalue" already exists
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2412)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2125)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:297)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
at org.postgresql.jdbc.PgStatement.executeUpdate(PgStatement.java:244)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:806)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:469)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:460)
at com.ads.dqam.action.impl.PostgresValueAnalysis.persistValueAnalysis(PostgresValueAnalysis.java:25)
at com.ads.dqam.action.AbstractValueAnalysis.persistAnalysis(AbstractValueAnalysis.java:81)
at com.ads.dqam.Analysis.doAnalysis(Analysis.java:32)
at com.ads.dqam.Client.main(Client.java:71)}}
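For context on why an Append write can end up issuing CREATE TABLE at all: the stack trace above shows the failure inside JdbcUtils$.createTable, called from JdbcRelationProvider.createRelation. Spark's JDBC provider first probes whether the target table exists and only then dispatches on the save mode; if the probe does not see the table, every mode tries to create it. The following is a simplified pure-Java model of that dispatch, with hypothetical enum and method names (the real logic is Scala code in JdbcRelationProvider.createRelation), not Spark's actual API:

```java
// Simplified model of Spark's JDBC save-mode dispatch (hypothetical names;
// the real logic lives in JdbcRelationProvider.createRelation).
public class JdbcSaveModeSketch {
    enum SaveMode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }
    enum Action { INSERT, CREATE_THEN_INSERT, DROP_CREATE_THEN_INSERT, FAIL, NOOP }

    // tableExists models the result of Spark's probe query against the target.
    static Action plan(boolean tableExists, SaveMode mode) {
        if (!tableExists) {
            // If the existence probe misses the table, EVERY mode creates it --
            // which is how a faulty probe turns Append into CREATE TABLE.
            return Action.CREATE_THEN_INSERT;
        }
        switch (mode) {
            case APPEND:    return Action.INSERT;
            case OVERWRITE: return Action.DROP_CREATE_THEN_INSERT;
            case IGNORE:    return Action.NOOP;
            default:        return Action.FAIL; // ERROR_IF_EXISTS
        }
    }

    public static void main(String[] args) {
        // Append against an existing table should only insert:
        System.out.println(plan(true, SaveMode.APPEND));   // INSERT
        // A failed existence check makes Append try to create the table,
        // which PostgreSQL then rejects with "relation already exists":
        System.out.println(plan(false, SaveMode.APPEND));  // CREATE_THEN_INSERT
    }
}
```

The behavior reported here is consistent with the probe returning false even though the relation exists, so the interesting question is why the probe fails against this PostgreSQL setup in particular.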
[jira] [Updated] (SPARK-23402) Dataset write method not working as expected for postgresql database
[ https://issues.apache.org/jira/browse/SPARK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pallapothu Jyothi Swaroop updated SPARK-23402:
--
Attachment: Emsku[1].jpg

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23402) Dataset write method not working as expected for postgresql database
Pallapothu Jyothi Swaroop created SPARK-23402:
--
Summary: Dataset write method not working as expected for postgresql database
Key: SPARK-23402
URL: https://issues.apache.org/jira/browse/SPARK-23402
Project: Spark
Issue Type: Bug
Components: Spark Core, SQL
Affects Versions: 2.2.1
Environment: PostgreSQL: 9.5.8 (same issue on 10); OS: CentOS 7 & Windows 7, 8; JDBC driver: 9.4-1201-jdbc41; Spark: reproduced on both 2.1.0 and 2.2.1; Mode: Standalone; OS: Windows 7
Reporter: Pallapothu Jyothi Swaroop

I am using the Spark Dataset write method to insert data into an existing PostgreSQL table, with the write mode set to SaveMode.Append. The write fails with {{PSQLException: ERROR: relation "dqvalue" already exists}}, raised from JdbcUtils$.createTable, even though append mode was specified. When I point the same code at SQL Server or Oracle, append mode works as expected. The full stack trace is included in the updated description above.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
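One way the existence probe can go wrong against PostgreSQL specifically, commonly reported in JDBC setups, is identifier case folding: PostgreSQL folds unquoted identifiers to lower case, while double-quoted identifiers keep their exact case, so a relation created with one quoting style may not be found when probed with another. Whether that is the root cause here is not established by this report; the sketch below only illustrates the folding rule itself, in pure Java with a hypothetical helper (not Spark or pgjdbc API):

```java
// Illustration of PostgreSQL identifier case folding (hypothetical helper,
// not part of Spark or the PostgreSQL JDBC driver).
public class PgIdentifierSketch {
    // Unquoted identifiers are folded to lower case by the server;
    // double-quoted identifiers are taken verbatim.
    static String effectiveName(String identifier) {
        if (identifier.length() >= 2
                && identifier.startsWith("\"") && identifier.endsWith("\"")) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(effectiveName("dqvalue"));     // dqvalue
        System.out.println(effectiveName("DQVALUE"));     // dqvalue  (folded)
        System.out.println(effectiveName("\"DQVALUE\"")); // DQVALUE  (kept verbatim)
        // So CREATE TABLE "DQVALUE" and a probe for dqvalue refer to
        // two different relations.
    }
}
```

If the target table was created outside Spark with quoted mixed-case names, probing it unquoted (or vice versa) would make the table look absent, producing exactly the "relation already exists" failure on the subsequent CREATE.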
[jira] [Comment Edited] (SPARK-16567) how to increase performance of rdbms dataframe.
[ https://issues.apache.org/jira/browse/SPARK-16567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379105#comment-15379105 ] Pallapothu Jyothi Swaroop edited comment on SPARK-16567 at 7/15/16 9:30 AM:
Thanks. What is user@, and how do I open an issue on user@?

was (Author: swaroopp): What is user@, and how do I open an issue on user@?

> how to increase performance of rdbms dataframe.
> ---
>
> Key: SPARK-16567
> URL: https://issues.apache.org/jira/browse/SPARK-16567
> Project: Spark
> Issue Type: Question
> Reporter: Pallapothu Jyothi Swaroop
> Priority: Critical
>
> Hello,
> How do I increase the performance of an RDBMS DataFrame? I need to perform a group-by on the fetched data. I did it like this:
> DataFrame jdbcDF = this.SQLCONTEXT.read().format("jdbc").options(options).load();
> (options is a map containing the DB configuration)
> DataFrame groupedDataFrame = jdbcDF.groupBy("UNQ_STR").count();
> How do I tune this?

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
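On the tuning question quoted above: a plain format("jdbc") load with only connection options reads the whole table through a single connection into a single partition, so the groupBy that follows has no parallelism to exploit. Spark's JDBC source documents partitionColumn/lowerBound/upperBound/numPartitions options that split the read into parallel range queries. A sketch of such an options map follows; the option keys are Spark's documented JDBC options, but the URL, table, column name, and bounds are hypothetical placeholders:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of JDBC options enabling a partitioned (parallel) read.
// Option keys are Spark's documented JDBC data source options; the URL,
// table, column, and bounds are hypothetical placeholders.
public class JdbcReadOptionsSketch {
    static Map<String, String> partitionedReadOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("url", "jdbc:oracle:thin:@//db-host:1521/orcl"); // placeholder URL
        options.put("dbtable", "SOURCE_TABLE");                      // placeholder table
        options.put("partitionColumn", "ID");  // must be numeric/date/timestamp
        options.put("lowerBound", "1");        // min value of the partition column
        options.put("upperBound", "1000000");  // max value of the partition column
        options.put("numPartitions", "8");     // parallel connections / read tasks
        options.put("fetchsize", "10000");     // rows per round trip (driver-dependent)
        return options;
    }

    public static void main(String[] args) {
        // With these options, SQLCONTEXT.read().format("jdbc").options(options).load()
        // issues 8 range queries over ID instead of one full-table scan.
        System.out.println(partitionedReadOptions().get("numPartitions")); // 8
    }
}
```

The map plugs into the existing code unchanged: this.SQLCONTEXT.read().format("jdbc").options(options).load().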
[jira] [Commented] (SPARK-16567) how to increase performance of rdbms dataframe.
[ https://issues.apache.org/jira/browse/SPARK-16567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379105#comment-15379105 ] Pallapothu Jyothi Swaroop commented on SPARK-16567:
---
What is user@, and how do I open an issue on user@?
[jira] [Updated] (SPARK-16567) how to increase performance of rdbms dataframe.
[ https://issues.apache.org/jira/browse/SPARK-16567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pallapothu Jyothi Swaroop updated SPARK-16567:
--
Summary: how to increase performance of rdbms dataframe. (was: How to parallelize RDBMS dataframe and perform group by.)
[jira] [Created] (SPARK-16567) How to parallelize RDBMS dataframe and perform group by.
Pallapothu Jyothi Swaroop created SPARK-16567:
--
Summary: How to parallelize RDBMS dataframe and perform group by.
Key: SPARK-16567
URL: https://issues.apache.org/jira/browse/SPARK-16567
Project: Spark
Issue Type: Question
Reporter: Pallapothu Jyothi Swaroop
Priority: Critical

Hello,

How do I increase the performance of an RDBMS DataFrame? I need to perform a group-by on the fetched data. I did it like this:

DataFrame jdbcDF = this.SQLCONTEXT.read().format("jdbc").options(options).load();
(options is a map containing the DB configuration)

DataFrame groupedDataFrame = jdbcDF.groupBy("UNQ_STR").count();

How do I tune this?
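For readers unsure what the groupBy("UNQ_STR").count() in the question computes, here is a local single-JVM model of the same aggregation using plain Java streams (this is not Spark API; Spark performs the equivalent aggregation distributed, with partial map-side aggregation and a shuffle on the grouping key):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Local model of jdbcDF.groupBy("UNQ_STR").count(): the number of rows
// per distinct value of the UNQ_STR column. Not Spark API.
public class GroupByCountSketch {
    static Map<String, Long> countByKey(List<String> unqStrColumn) {
        return unqStrColumn.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countByKey(List.of("a", "b", "a", "a", "c"));
        System.out.println(counts.get("a")); // 3
    }
}
```

Since the expensive part in Spark is the shuffle of the full column, the read-side parallelism (how many partitions the JDBC load produces) dominates end-to-end time for this query.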
[jira] [Created] (SPARK-16565) Implementation for processing 50-70 GB of data using Java
Pallapothu Jyothi Swaroop created SPARK-16565:
--
Summary: Implementation for processing 50-70 GB of data using Java
Key: SPARK-16565
URL: https://issues.apache.org/jira/browse/SPARK-16565
Project: Spark
Issue Type: Question
Environment: For development we use an i3 4-core processor, Windows 7, 8 GB RAM. For production we have a cluster with 4 nodes, each with an i5 4-core processor, CentOS, and 16 GB RAM.
Reporter: Pallapothu Jyothi Swaroop

Hello,

I need implementation and configuration steps for the following requirement: column-level analysis of RDBMS tables.

Steps:
Step 1: Load the required columns from a table in the RDBMS (Oracle).
Step 2: Group that data.
Step 3: Analyze the grouped data using UDFs.
Step 4: Persist the analyzed data to Hive or MongoDB (please advise which to choose).

I followed these steps, but they have performance issues:

1. Loaded the column data from the RDBMS into a DataFrame:
DataFrame jdbcDF = this.SQLCONTEXT.read().format("jdbc").options(options).load();
(options is a map containing the DB configuration)

2. Grouped that data:
DataFrame groupedDataFrame = jdbcDF.groupBy("UNQ_STR").count();

3. Applied the required analysis UDFs (such as length; I run 7 UDFs) via Spark SQL, which returns another DataFrame.

4. Saved the step-3 DataFrame to Hive or MongoDB. For Hive I used hivecontext.sql("insert ..."); for MongoDB I used the MongoSpark API.

Memory measurements: column A holds 50 GB (1 billion rows) before analysis; after analysis it can grow to 120-200 GB depending on the number of unique values.

Performance measurements for a single node with different components:
1. RDBMS to Spark to Hive -- 50 hrs
2. RDBMS to Spark to MongoDB -- 35 hrs
3. Sqoop to Hive to Spark to MongoDB -- 30 hrs

How do I increase performance on the cluster for this requirement? Please provide implementation steps; I need to process this data in 1 hour. I have been working on this for the last 45 days and cannot improve the performance. For testing I am using a 4-node cluster.

Thanks and kind regards.
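A first sanity check for a workload of this size is whether the data is split into enough partitions for the cluster at all: a common rule of thumb (not a Spark API, just sizing arithmetic) is to target input partitions of roughly 128 MB each, so 50 GB should be read as several hundred partitions rather than a handful. A back-of-envelope helper:

```java
// Back-of-envelope partition sizing (a rule of thumb, not a Spark API):
// aim for input partitions of ~128 MB so tasks are neither too large
// (spilling, stragglers) nor too numerous (scheduling overhead).
public class PartitionSizingSketch {
    static long suggestedPartitions(long inputBytes, long targetPartitionBytes) {
        // Ceiling division, with a floor of one partition.
        return Math.max(1, (inputBytes + targetPartitionBytes - 1) / targetPartitionBytes);
    }

    public static void main(String[] args) {
        long fiftyGb = 50L * 1024 * 1024 * 1024;
        long target = 128L * 1024 * 1024;
        // The 50 GB column in the report would call for ~400 read partitions,
        // e.g. via the JDBC numPartitions option or repartition().
        System.out.println(suggestedPartitions(fiftyGb, target)); // 400
    }
}
```

With a single-partition JDBC read, all three pipelines measured above are bottlenecked on one task regardless of cluster size, which is consistent with the reported 30-50 hour runtimes barely differing between a single node and the 4-node cluster.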