Hi,

I'd like to understand whether there is any advantage, beyond API syntax, to using the Hive/table API rather than the Dataset API in Spark SQL (v2.0). Are there any additional optimizations, for example? I'm mostly interested in partitioned Parquet tables stored on S3. Is there any difference if I'm comfortable with the Dataset API too?
In general, our use case is to stream data into S3, partitioned by some business keys (three levels of nesting). In addition, does the Hive API somehow help with the "small files" problem? (I'm aware of coalesce.)

Thanks in advance
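
P.S. To make the small-files concern concrete, here is a rough, Spark-free sketch in plain Python of how I understand a partitioned write to behave: each writing task emits one file for every partition directory it holds rows for, so the file count grows with both the number of partitions and the number of tasks. The key names (region, customer, day), the sample values, and the helper functions are placeholders I made up for illustration, not our real schema and not any Spark API.

```python
from collections import namedtuple

# Hypothetical record with three business keys (names are illustrative only).
Event = namedtuple("Event", ["region", "customer", "day", "payload"])


def partition_path(event):
    """Hive-style directory for one record: one key=value level per key."""
    return f"region={event.region}/customer={event.customer}/day={event.day}"


def output_files(events, num_tasks):
    """Simulate a partitioned write: every task emits one file per
    partition directory for which it holds at least one row."""
    files = set()
    for i, event in enumerate(events):
        task_id = i % num_tasks  # round-robin assignment of rows to tasks
        files.add(f"{partition_path(event)}/part-{task_id:05d}.parquet")
    return sorted(files)


# 2 regions x 2 customers x 2 days = 8 partitions, 8 rows in each.
events = [
    Event(region, customer, day, payload=n)
    for region in ("eu", "us")
    for customer in ("c1", "c2")
    for day in ("2016-09-01", "2016-09-02")
    for n in range(8)
]

many = output_files(events, num_tasks=8)  # rows spread over 8 tasks
few = output_files(events, num_tasks=1)   # same rows "coalesced" to 1 task

print(len(many), "files with 8 tasks;", len(few), "files with 1 task")
```

In this toy setup the same 64 rows produce 64 files with eight tasks but only 8 files with one task, which is the effect I currently work around with coalesce. What I'd like to know is whether the Hive/table API does anything beyond that.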