Hi Tapan
I would suggest an architecture where you have separate storage and data
serving layers.
Spark is still the best fit for batch processing of data.
So what I am suggesting here is: keep your data stored as-is in a raw HDFS
layer, run your ELT in Spark on this raw data, and then store the
processed/transformed data in a NoSQL DB such as Cassandra, which can serve
the data and handle a large number of queries for you.
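
Roughly like this (a minimal sketch only, assuming the DataStax
spark-cassandra-connector is on the classpath; the HDFS path, host,
keyspace, table, and column names are placeholders, not your actual
schema):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

// Assumes spark.cassandra.connection.host points at your Cassandra cluster.
val conf = new SparkConf()
  .setAppName("RawToServingLayer")
  .set("spark.cassandra.connection.host", "cassandra-host")  // placeholder

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// 1. Read the raw layer as-is from HDFS (placeholder path).
val raw = sqlContext.read.parquet("hdfs:///data/raw/transactions")

// 2. Transform/aggregate in Spark (example aggregation only).
val processed = raw
  .groupBy("account_id")
  .agg(sum("amount").as("total_amount"))

// 3. Write the result to Cassandra for the serving layer.
//    Keyspace and table are placeholders; create them in Cassandra first.
processed.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "serving", "table" -> "account_totals"))
  .mode("append")
  .save()

Your application queries then hit Cassandra directly, so Spark only does
the batch work and stays out of the path of the interactive queries.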

Thanks
Deepak

On Wed, May 4, 2016 at 6:59 AM, Tapan Upadhyay <tap...@gmail.com> wrote:

> Hi,
>
> We are planning to move our ad hoc queries from Teradata to Spark. We have
> a huge volume of queries during the day. What is the best way to go about it -
>
> 1) Read data directly from the Teradata DB using Spark JDBC
>
> 2) Import data using Sqoop via EOD jobs into Hive tables stored as Parquet,
> and then run queries on the Hive tables using Spark SQL or the Spark HiveContext.
>
> Any other ways through which we can do it better/more efficiently?
>
> Please guide.
>
> Regards,
> Tapan
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
