Much as one accesses Oracle data through JDBC drivers, one can utilise the
power of Spark on Teradata via its JDBC driver.

I have seen connection examples in some articles, which suggests this
process is pretty mature.

My question is whether anyone has done this work, and how performance in
Spark compares with running the same code on Teradata itself. For example,
in Oracle one can force parallel reads by specifying numPartitions:

// Assumes HiveContext is an existing org.apache.spark.sql.hive.HiveContext
// instance and minID/maxID hold the min and max values of ID as strings
val s = HiveContext.read.format("jdbc").options(
       Map("url" -> _ORACLEserver,
       "dbtable" -> "(SELECT ID FROM scratchpad.dummy4)",
       "partitionColumn" -> "ID",   // column used to split the read into ranges
       "lowerBound" -> minID,
       "upperBound" -> maxID,
       "numPartitions" -> "5",      // five concurrent JDBC connections
       "user" -> _username,
       "password" -> _password)).load

As both Oracle and Teradata are data warehouses, this approach may well
work. The intention is to read from Teradata initially as a tactical step
and to use Hadoop/Hive/Spark as the strategic platform.
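
For the Teradata side, something along these lines should work (a sketch,
untested on my side: the host and database in the URL are placeholders, and
it assumes the Teradata JDBC driver jars, terajdbc4.jar and tdgssconfig.jar,
are on the classpath):

// Hypothetical Teradata server; substitute your own host and database
val _TERADATAserver = "jdbc:teradata://tdhost/DATABASE=scratchpad,DBS_PORT=1025"

val t = HiveContext.read.format("jdbc").options(
       Map("url" -> _TERADATAserver,
       "driver" -> "com.teradata.jdbc.TeraDriver",
       // Teradata requires an alias on derived tables
       "dbtable" -> "(SELECT ID FROM scratchpad.dummy4) AS t",
       "partitionColumn" -> "ID",
       "lowerBound" -> minID,
       "upperBound" -> maxID,
       "numPartitions" -> "5",
       "user" -> _username,
       "password" -> _password)).load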

Obviously the underlying tables will differ when reading from Hive compared
with Teradata. However, the SQL to fetch, slice and dice the data will be
similar.
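
For instance, once the DataFrame is loaded, the downstream code should be
identical whichever source it came from (illustrative query only):

// the same slice-and-dice SQL runs regardless of the JDBC source
s.registerTempTable("tmp")
HiveContext.sql("SELECT COUNT(*) FROM tmp WHERE ID > 100").show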

Let me know your thoughts

Thanks



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
