Please be aware that accumulators involve communication back to the driver and may not be efficient. I think the OP wants some way to extract the stats from the sql plan, if they are being stored in some internal data structure.
Regards
Sab

On 5 Nov 2016 9:42 p.m., "Deepak Sharma" <deepakmc...@gmail.com> wrote:

> Hi Rohit
> You can use accumulators and increment one on every record processed.
> At the end you can read the accumulator's value on the driver, which will
> give you the count.
>
> HTH
> Deepak
>
> On Nov 5, 2016 20:09, "Rohit Verma" <rohit.ve...@rokittech.com> wrote:
>
>> I am using spark to read from a database and write to hdfs as a parquet
>> file. Here is a code snippet.
>>
>> private long etlFunction(SparkSession spark) {
>>     spark.sqlContext().setConf("spark.sql.parquet.compression.codec", "SNAPPY");
>>     Properties properties = new Properties();
>>     properties.put("driver", "oracle.jdbc.driver");
>>     properties.put("fetchSize", "5000");
>>     Dataset<Row> dataset = spark.read().jdbc(jdbcUrl, query, properties);
>>     dataset.write().format("parquet").save("pdfs-path");
>>     return dataset.count();
>> }
>>
>> When I look at the spark ui, during the write I can see the stats of
>> records written, visible in the sql tab under the query plan.
>>
>> The count itself is a heavy task.
>>
>> Can someone suggest the best way to get the count in the most optimized
>> way?
>>
>> Thanks all..
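For reference, Deepak's accumulator suggestion could be sketched roughly as below. This is only an illustrative sketch, not the OP's actual code: the method name `etlWithAccumulator` and the `jdbcUrl`, `query`, and `outputPath` parameters are hypothetical stand-ins. It increments a `LongAccumulator` as each row flows through a `map` on the way to the parquet write, so no second scan of the JDBC source is needed for the count. Note two caveats: converting to an RDD gives up some Catalyst/Tungsten optimization, and accumulator updates inside transformations can over-count if tasks are retried, which is part of why accumulators "may not be efficient" or fully reliable here.

```java
import java.util.Properties;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.util.LongAccumulator;

public class EtlWithAccumulator {

    // Hypothetical sketch: count rows with an accumulator while writing,
    // instead of triggering a second full read via dataset.count().
    static long etlWithAccumulator(SparkSession spark, String jdbcUrl,
                                   String query, Properties properties,
                                   String outputPath) {
        // Driver-side accumulator; executors send increments back to it.
        LongAccumulator rowCount =
                spark.sparkContext().longAccumulator("rows written");

        Dataset<Row> dataset = spark.read().jdbc(jdbcUrl, query, properties);

        // Increment the accumulator as each row passes through the pipeline.
        // Dropping to the RDD API keeps the encoder handling simple, at the
        // cost of bypassing some DataFrame optimizations.
        JavaRDD<Row> countedRdd = dataset.javaRDD().map(row -> {
            rowCount.add(1L);
            return row;
        });
        Dataset<Row> counted =
                spark.createDataFrame(countedRdd, dataset.schema());

        counted.write().format("parquet").save(outputPath);

        // The accumulator value is only meaningful after the write action
        // has actually executed the pipeline.
        return rowCount.value();
    }
}
```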