Hi, which of the following is the better approach for getting the count when the database table contains a very large number of rows?
final Dataset<Row> dataset = spark.read()
        .format("jdbc")
        .option("url", params.getJdbcUrl())
        .option("driver", params.getDriver())
        .option("dbtable", params.getSqlQuery())
        // .option("partitionColumn", hashFunction)
        // .option("lowerBound", 0)
        // .option("upperBound", 10)
        // .option("numPartitions", 10)
        // .option("oracle.jdbc.timezoneAsRegion", "false")
        .option("fetchSize", 100000)
        .load();

dataset.write().parquet(params.getPath());

// The target is to get the count of the persisted rows.

// Approach 1: get the count directly from the dataset.
// As I understand it, this count is translated to jdbcRdd.count,
// so it could run against the database a second time.
long count = dataset.count();

// Approach 2: read the saved parquet back and get the count from it.
long count = spark.read().parquet(params.getPath()).count();

Regards,
Rohit
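
P.S. In case it helps frame the question, here is a minimal sketch of a third variant I considered: caching the dataset before the write, so that count() should be served from the cached partitions instead of triggering a second JDBC scan. This assumes spark is a SparkSession, params is the same config object as above, and the fetched rows fit in executor memory/disk; MEMORY_AND_DISK is just one possible storage level.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.storage.StorageLevel;

// Cache before writing so both the write and the count reuse the
// rows fetched once over JDBC (assumption: they fit in executor
// memory/disk; spark and params are the same as in the snippet above).
final Dataset<Row> dataset = spark.read()
        .format("jdbc")
        .option("url", params.getJdbcUrl())
        .option("driver", params.getDriver())
        .option("dbtable", params.getSqlQuery())
        .option("fetchSize", 100000)
        .load()
        .persist(StorageLevel.MEMORY_AND_DISK());

dataset.write().parquet(params.getPath());
long count = dataset.count();   // served from the cached partitions
dataset.unpersist();            // release the cached data when done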