Hello,
I am trying to combine several text files (each roughly a few hundred MB to
2-3 GB) into one big Parquet file. I am loading each of them and taking a
union, but this produces an enormous number of partitions, since union keeps
accumulating the partitions of its inputs.
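To illustrate, here is roughly the pattern I am describing (the paths and the
target partition count below are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("combine").getOrCreate()

// Load each file and union them one by one; every union adds the input's
// partitions, so the partition count grows with each file.
val files = Seq("/data/a.txt", "/data/b.txt", "/data/c.txt")
val combined = files
  .map(path => spark.read.text(path))
  .reduce(_ union _)

println(combined.rdd.getNumPartitions)  // grows with the number of files

// Coalescing before the write would cut the partition count without a full
// shuffle, but I am not sure it is the right approach here.
combined.coalesce(8).write.parquet("/data/combined.parquet")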
Hello everyone,
Generally speaking, I guess it's well known that DataFrames are much faster
than RDDs.
My question is: how do you go about transforming a DataFrame using map?
The DataFrame then gets converted into an RDD, so do you then convert the
result back into a DataFrame?
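To make the question concrete, here is a minimal sketch of the round-trip I
mean (the Person case class and the age + 1 transformation are just
illustrative):

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Long)

val spark = SparkSession.builder().appName("map-example").getOrCreate()
import spark.implicits._

val df = Seq(Person("alice", 30), Person("bob", 25)).toDF()

// Mapping over df.rdd drops back to an RDD[Row]...
val mapped = df.rdd.map(row =>
  Person(row.getAs[String]("name"), row.getAs[Long]("age") + 1))

// ...so the result has to be turned into a DataFrame again explicitly.
val backToDf = spark.createDataFrame(mapped)
backToDf.show()

As far as I can tell, the typed Dataset API avoids the explicit round-trip
(df.as[Person].map(...) stays a Dataset, and .toDF() converts back cheaply),
but I am not sure whether that is the recommended approach.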