Hi Tapan,

I would suggest an architecture with separate storage and data serving layers. Spark is still the best fit for batch processing of data. So what I am suggesting here is: keep your data stored as-is in a raw HDFS layer, run your ELT in Spark on that raw data, and then store the processed/transformed data in a NoSQL database such as Cassandra to serve it back to you, since Cassandra can handle a large number of concurrent queries.
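A rough sketch of that pipeline in Spark (Scala) is below. It assumes the DataStax spark-cassandra-connector is on the classpath and a Spark 1.6-style SQLContext; the HDFS path, column names, Cassandra host, keyspace and table are placeholders for illustration only.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object RawToServingLayer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("RawToServingLayer")
      // hypothetical Cassandra contact point
      .set("spark.cassandra.connection.host", "cassandra-host")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // 1) Read the raw layer as-is from HDFS (hypothetical path and format)
    val raw = sqlContext.read.parquet("hdfs:///data/raw/orders")

    // 2) Run the ELT/transformation in Spark batch
    val transformed = raw
      .filter(raw("status") === "COMPLETED")
      .select("order_id", "customer_id", "amount", "order_date")

    // 3) Write the transformed data to Cassandra for low-latency serving
    transformed.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "serving", "table" -> "orders_by_customer"))
      .mode("append")
      .save()

    sc.stop()
  }
}

The Cassandra table would be modeled around your query patterns (for example partitioned by customer_id), so the serving layer can answer the ad hoc lookups without scanning the whole dataset.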
Thanks,
Deepak

On Wed, May 4, 2016 at 6:59 AM, Tapan Upadhyay <tap...@gmail.com> wrote:
> Hi,
>
> We are planning to move our adhoc queries from Teradata to Spark. We have a
> huge volume of queries during the day. What is the best way to go about it:
>
> 1) Read data directly from the Teradata DB using Spark JDBC
>
> 2) Import data using Sqoop EOD jobs into Hive tables stored as Parquet,
> and then run queries on the Hive tables using Spark SQL or the Spark Hive context
>
> Any other ways through which we can do it better/more efficiently?
>
> Please guide.
>
> Regards,
> Tapan

--
Thanks Deepak
www.bigdatabig.com
www.keosha.net