Hi Mahender,

Did you look at this? https://www.snappydata.io/blog/the-spark-database

That said, I believe most people handle this use case with either:
- Their favorite regular RDBMS (MySQL, PostgreSQL, Oracle, SQL Server, ...)
if the data is not too big
- Their favorite NoSQL storage (Cassandra, HBase) if the data is too big
and needs to be distributed

Spark generally makes it easy to query these other databases, so you can
still use it to run your analytics on top of them.
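
As a rough illustration of querying an external database from Spark, here is
a minimal PySpark sketch using the built-in JDBC data source. The helper name
read_jdbc_table, the URL, and the table name are made up for the example, and
it assumes the matching JDBC driver is already on Spark's classpath.

```python
def read_jdbc_table(spark, url, table, user, password):
    """Load one table from an external database as a Spark DataFrame.

    `spark` is an existing SparkSession. The function name and its
    arguments are hypothetical; only the reader calls are Spark's API.
    """
    return (
        spark.read.format("jdbc")      # Spark's built-in JDBC data source
        .option("url", url)            # e.g. "jdbc:postgresql://host:5432/dw"
        .option("dbtable", table)      # table name (or a subquery alias)
        .option("user", user)
        .option("password", password)
        .load()
    )
```

With a live session you would call something like
read_jdbc_table(spark, "jdbc:postgresql://host:5432/dw", "customer", user, pw)
and then query the resulting DataFrame as usual.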

Hive and Spark have been designed as OLAP tools, not OLTP.
I'm not sure which features you are looking for in your SCD, but they
probably won't be part of Spark's core design.
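
To make the SCD Type 2 requirement concrete: the usual idea is to close the
current version of a changed row and append a new version, so history is
kept. Below is a minimal sketch of that logic in plain Python (not Spark).
The function name and the valid_from / valid_to / is_current columns are
assumptions for the example, and it ignores rows deleted at the source.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Return a new dimension (list of dicts) with SCD Type 2 applied.

    dimension: existing rows, each with `key`, the tracked columns, and the
               hypothetical valid_from, valid_to, is_current columns.
    incoming:  latest source snapshot, each row with `key` and tracked columns.
    """
    result = [dict(row) for row in dimension]  # copy, leave the input untouched
    current = {row[key]: row for row in result if row["is_current"]}

    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # nothing changed, keep the current version open
        if old is not None:
            # Close the previous version instead of overwriting it.
            old["valid_to"] = today
            old["is_current"] = False
        # Append the new (or first) version as the current one.
        version = {key: new[key]}
        version.update({c: new[c] for c in tracked})
        version.update({"valid_from": today, "valid_to": None, "is_current": True})
        result.append(version)
    return result


dim = [{"id": 1, "city": "Paris", "valid_from": date(2018, 1, 1),
        "valid_to": None, "is_current": True}]
src = [{"id": 1, "city": "Lyon"}, {"id": 2, "city": "Nice"}]
updated = apply_scd2(dim, src, "id", ["city"], date(2018, 4, 4))
```

In Spark the same steps would typically be expressed as a join between the
current dimension and the incoming snapshot, followed by rewriting the
table, since there are no in-place row updates to rely on.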

Hope this helps,

Furcy



On 4 April 2018 at 11:29, Mahender Sarangam <mahender.bigd...@outlook.com>
wrote:

> Hi,
> Does anyone have a good architecture document or design principles for
> building a warehouse application using Spark?
>
> Is it better to create a Hive Context and perform the transformations in
> HQL, or to load the files directly into DataFrames and perform the
> transformations there?
>
> We need to implement SCD Type 2 in Spark. Is there a good
> document/reference for building Type 2 warehouse objects?
>
> Thanks in advance
>
> /Mahender
>
