How large are the files? A large number of (small) files will cause a significant delay before any tasks launch, because the driver first has to list every file and read its footer to plan the splits - try to compact them into as few, large files as possible.
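One common way to compact small ORC files is to read them once and rewrite them with fewer partitions. A minimal sketch (the paths, the `schema1` value, and the target partition count of 64 are placeholders, not taken from the original post):

```scala
// Hypothetical compaction job: read the many small ORC files once,
// then rewrite them as fewer, larger files in a new directory.
val df = spark.read.schema(schema1).orc("hdfs://test1")

df.coalesce(64)                      // pick a count that yields ~128-256 MB per file
  .write
  .orc("hdfs://test1_compacted")     // write to a new path, then swap directories
```

`coalesce` avoids a full shuffle, which is usually what you want for pure compaction; use `repartition` instead if the input partitions are badly skewed.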
Can you please share the full source code for both Hive and Spark, as well as the versions you are using?

> On 31.10.2018 at 18:23, gpatcham <gpatc...@gmail.com> wrote:
>
> When reading a large number of ORC files from HDFS under a directory, Spark
> doesn't launch any tasks for some amount of time, and I don't see any tasks
> running during that time. I'm using the below command to read ORC, and these
> spark.sql configs.
>
> What is Spark doing under the hood when spark.read.orc is issued?
>
> spark.read.schema(schame1).orc("hdfs://test1").filter("date >= 20181001")
> "spark.sql.orc.enabled": "true",
> "spark.sql.orc.filterPushdown": "true"
>
> Also, instead of directly reading the ORC files, I tried running a Hive query
> on the same dataset, but I was not able to push the filter predicate down.
> Where should I set the below configs?
> "hive.optimize.ppd": "true",
> "hive.optimize.ppd.storage": "true"
>
> What is the best way to read ORC files from HDFS, and which parameters
> should I tune?
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> ---------------------------------------------------------------------
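On the Hive side of the question: `hive.optimize.ppd` and `hive.optimize.ppd.storage` can be set per session (or permanently in `hive-site.xml`). A sketch of the session-level approach; these are the standard Hive property names, but their defaults vary across Hive versions:

```sql
-- Enable predicate pushdown for the current Hive session
SET hive.optimize.ppd=true;           -- push filter predicates into the query plan
SET hive.optimize.ppd.storage=true;   -- push predicates down to the storage handler
SET hive.optimize.index.filter=true;  -- also use ORC's built-in indexes to skip row groups
```

Note that even with pushdown enabled, you only see a benefit if the filtered column's min/max statistics in the ORC stripes actually allow data to be skipped, which depends on how the data was sorted when written.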