I would like to understand whether there is any advantage, beyond API syntax,
to using the Hive/table API rather than the Dataset API in Spark SQL (v2.0).
Are there any additional optimizations, for example?
I'm mostly interested in partitioned Parquet tables stored on S3. Does it make
any difference if I'm equally comfortable with the Dataset API?

In general, our use case is to stream data into S3, partitioned by some
business keys (3 levels of nesting).
In addition, does the Hive API help in any way with the "small files" problem?
(I'm aware of coalesce.)

Thanks in advance

Sent from the Apache Spark User List mailing list archive at Nabble.com.
