Hi,
I want to know: when I create a Dataset by reading files from HDFS in Spark SQL,
like: Dataset<Row> user = spark.read().format("json").load(filePath), what
determines the number of partitions of the Dataset?
And what if filePath is a directory instead of a single file?
Also, why can't we get the number of partitions of a Dataset directly with
dataset.getNumPartitions()? Why must we convert the Dataset to an RDD to get
the partition count: dataset.rdd().getNumPartitions()?


Thanks