Paul Wu created SPARK-7804:
------------------------------

             Summary: Incorrect results from JDBCRDD -- one record repeatly and incorrect field value
                 Key: SPARK-7804
                 URL: https://issues.apache.org/jira/browse/SPARK-7804
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.1, 1.3.0
            Reporter: Paul Wu

The RDD returns only one record, repeated once for every row in the table, and a field value is repeated within that record.

I have a table like:

    attuid  name  email
    12      john  j...@appp.com
    23      tom   t...@appp.com
    34      tony  t...@appp.com

My code:

    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    String url = "....";
    java.util.Properties prop = new Properties();

    // A single partition whose WHERE clause ("1=1") matches every row.
    List<JDBCPartition> partitionList = new ArrayList<>();
    //int i;
    partitionList.add(new JDBCPartition("1=1", 0));

    // All three columns are read as nullable strings.
    List<StructField> fields = new ArrayList<StructField>();
    fields.add(DataTypes.createStructField("attuid", DataTypes.StringType, true));
    fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
    fields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
    StructType schema = DataTypes.createStructType(fields);

    JDBCRDD jdbcRDD = new JDBCRDD(sc.sc(),
            JDBCRDD.getConnector("oracle.jdbc.OracleDriver", url, prop),
            schema,
            " USERS",
            new String[]{"attuid", "name", "email"},
            new Filter[]{ },
            partitionList.toArray(new JDBCPartition[0])
    );

    System.out.println("count before to Java RDD=" + jdbcRDD.cache().count());
    JavaRDD<Row> jrdd = jdbcRDD.toJavaRDD();
    System.out.println("count=" + jrdd.count());

    // Print every field of every collected row, one value per line.
    List<Row> lr = jrdd.collect();
    for (Row r : lr) {
        for (int ii = 0; ii < r.length(); ii++) {
            System.out.println(r.getString(ii));
        }
    }

===========================
The result is:

    34
    34
    t...@appp.com
    34
    34
    t...@appp.com
    34
    34
    t...@appp.com
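
The symptom above (the last record showing up once per row) is what one would see if the iterator behind the RDD handed out the same mutable row object on every call and the caller stored those references without copying them. The report does not confirm that this is the cause, so the following is only a plain-Java sketch of that effect, with no Spark dependency. The class `RowReuseDemo`, its methods, and the `example.com` addresses are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RowReuseDemo {

    // Hypothetical stand-in for the three-row table in the report.
    static final String[][] TABLE = {
        {"12", "john", "john@example.com"},
        {"23", "tom", "tom@example.com"},
        {"34", "tony", "tony@example.com"},
    };

    /** Iterator that overwrites and returns the SAME array on every next(). */
    static Iterator<String[]> reusingIterator() {
        return new Iterator<String[]>() {
            private final String[] buffer = new String[3];
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < TABLE.length;
            }

            @Override
            public String[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Copy the next table row into the shared buffer.
                System.arraycopy(TABLE[pos++], 0, buffer, 0, buffer.length);
                return buffer;
            }
        };
    }

    /** Stores the returned references directly: every entry aliases one array. */
    static List<String[]> collectWithoutCopy() {
        List<String[]> rows = new ArrayList<>();
        Iterator<String[]> it = reusingIterator();
        while (it.hasNext()) {
            rows.add(it.next());
        }
        return rows;
    }

    /** Clones each row before storing it, so later overwrites do not leak in. */
    static List<String[]> collectWithCopy() {
        List<String[]> rows = new ArrayList<>();
        Iterator<String[]> it = reusingIterator();
        while (it.hasNext()) {
            rows.add(it.next().clone());
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("without copy:");
        for (String[] r : collectWithoutCopy()) {
            System.out.println(String.join(" ", r));
        }
        System.out.println("with copy:");
        for (String[] r : collectWithCopy()) {
            System.out.println(String.join(" ", r));
        }
    }
}
```

Without the copy, all three list entries point at one array holding the last row (34, tony); with the copy, the three distinct rows survive. If the same mechanism is at work in the report, copying each row before caching or collecting it would be the corresponding workaround, but that remains an assumption here.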