Much as one can access Oracle data over JDBC, one can bring the power of Spark to bear on Teradata via its JDBC driver.

I have seen this connection described in several articles, which suggests the process is fairly mature. My question is whether anyone has actually done this work, and how performance in Spark compares with running the same code on Teradata itself.

For example, with Oracle one can force parallel reads by setting numPartitions:

    val s = HiveContext.read.format("jdbc").options(
      Map("url"             -> _ORACLEserver,
          "dbtable"         -> "(SELECT ID FROM scratchpad.dummy4)",
          "partitionColumn" -> "ID",
          "lowerBound"      -> minID,
          "upperBound"      -> maxID,
          "numPartitions"   -> "5",
          "user"            -> _username,
          "password"        -> _password)).load()

Since both Oracle and Teradata are data warehouses, the same approach may well work. The intention is to read from Teradata initially as a tactical measure and to use Hadoop/Hive/Spark as the strategic platform. Obviously the underlying tables will differ between Hive and Teradata, but the SQL to fetch, slice and dice the data will be similar.

Let me know your thoughts.

Thanks
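For reference, a sketch of what the same partitioned read might look like against Teradata. The host, database, and bounds below are placeholders, not a tested setup; the driver class com.teradata.jdbc.TeraDriver and the jdbc:teradata:// URL form come from the Teradata JDBC driver, and the small helper only approximates how Spark's JDBC source turns lowerBound/upperBound/numPartitions into per-partition WHERE clauses:

```scala
// Sketch only: placeholder host, database, and bounds; adapt to your site.
object TeradataJdbcSketch {
  // JDBC options mirroring the Oracle example, but for Teradata.
  val teradataOptions: Map[String, String] = Map(
    "url"             -> "jdbc:teradata://tdhost/DATABASE=scratchpad", // placeholder host/DB
    "driver"          -> "com.teradata.jdbc.TeraDriver",
    "dbtable"         -> "(SELECT ID FROM scratchpad.dummy4) t",
    "partitionColumn" -> "ID",
    "lowerBound"      -> "0",   // placeholders; in practice derive from SELECT MIN(ID), MAX(ID)
    "upperBound"      -> "100",
    "numPartitions"   -> "5"
  )
  // The actual read would then be (with a HiveContext on the classpath):
  //   val df = HiveContext.read.format("jdbc").options(teradataOptions).load()

  // Rough approximation of how Spark splits [lower, upper] into numPartitions
  // WHERE clauses, one per concurrent JDBC connection to the warehouse.
  def partitionPredicates(col: String, lower: Long, upper: Long, n: Int): Seq[String] = {
    val stride = (upper - lower) / n
    (0 until n).map { i =>
      val lo = lower + i * stride
      if (i == 0)          s"$col < ${lo + stride}"
      else if (i == n - 1) s"$col >= $lo"
      else                 s"$col >= $lo AND $col < ${lo + stride}"
    }
  }
}
```

Each predicate becomes one Spark task reading its own slice of the table, so the concurrency Teradata sees is exactly numPartitions — which is what makes the comparison with running the SQL natively on Teradata interesting.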