Hi,
I want to know: when I create a Dataset by reading files from HDFS in Spark SQL,
like: Dataset<Row> user = spark.read().format("json").load(filePath), what
determines the number of partitions of the Dataset?
And what if filePath is a directory instead of a single file?
Also, why can't we get the number of partitions of a Dataset directly with
dataset.getNumPartitions()? Why must we convert the Dataset to an RDD to get
the partition count: dataset.rdd().getNumPartitions()?


Thanks