Hi,

I just posted some material on using Spark with Oracle. If you want to do
distributed processing against a data warehouse of your choice, be it Oracle,
Hive, or BigQuery, in my experience it is best to create Spark DataFrames on
top of the underlying storage, either through JDBC or through a Spark API
(Hive or BigQuery).
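A minimal sketch of the JDBC route in PySpark, assuming a placeholder host,
service name, table, and credentials (none of these come from the thread; a
live SparkSession and Oracle instance, plus the Oracle JDBC driver on the
classpath, are needed to actually run the read):

```python
# Hypothetical Oracle connection details -- replace with your own.
jdbc_url = "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1"
connection_props = {
    "user": "scott",
    "password": "tiger",
    "driver": "oracle.jdbc.OracleDriver",
}

# With a running SparkSession, the DataFrame is created on top of the
# Oracle table without copying the data up front; Spark reads it lazily:
#
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("oracle-read").getOrCreate()
# df = spark.read.jdbc(url=jdbc_url, table="SALES",
#                      properties=connection_props)
# df.filter("AMOUNT > 100").show()
```

For Hive or BigQuery the same pattern applies, with `spark.table(...)` or the
respective connector's data source in place of the JDBC read.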

Your mileage may vary, as usual.

HTH




   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 23 Mar 2021 at 15:52, Harish Butani <rhbutani.sp...@gmail.com>
wrote:

> I have been developing 'Spark on Oracle', a project to provide better
> integration of Spark into an Oracle Data Warehouse. You can read about it
> at
> https://hbutani.github.io/spark-on-oracle/blog/Spark_on_Oracle_Blog.html
>
> The key features are Catalog Integration, translation and pushdown of
> Spark SQL to Oracle SQL/PL-SQL, Language Integration and Runtime
> Integration.
>
> These are provided as Spark extensions via a Catalog Plugin, v2
> DataSource, Logical and Physical Planner Rules, Parser Extension, automatic
> Function Registration, and Spark SQL Macros (a generic Spark capability we
> have developed).
>
> The vision is to enable Oracle customers to deploy Spark Applications that
> take full advantage of the data and capabilities of their Oracle Data
> Warehouse; and also make Spark cluster operations simpler and unified with
> their existing Oracle warehouse operations.
>
> Looking for suggestions and comments from the Spark community.
>
> regards,
> Harish Butani.
>
