Re: what is the best way to transfer data from RDBMS to spark?
Actually, Spark SQL provides a data source for this. Here is the relevant section from the documentation:

JDBC To Other Databases

Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD (https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.rdd.JdbcRDD). This is because the results are returned as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources. The JDBC data source is also easier to use from Java or Python as it does not require the user to provide a ClassTag. (Note that this is different than the Spark SQL JDBC server, which allows other applications to run queries using Spark SQL).

On Fri, Apr 24, 2015 at 6:27 PM, ayan guha guha.a...@gmail.com wrote:

What is the specific use case? I can think of a couple of ways (write to HDFS and then read from Spark, or stream data to Spark). Also, I have seen people using MySQL jars to bring data in. Essentially you want to simulate creation of an RDD.

On 24 Apr 2015 18:15, sequoiadb mailing-list-r...@sequoiadb.com wrote:

If I run Spark in stand-alone mode (not YARN mode), is there any tool like Sqoop that is able to transfer data from an RDBMS to Spark storage? Thanks

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

--
Best Regards,
Ayan Guha
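For anyone finding this thread later, here is a minimal sketch of what using the JDBC data source described above might look like from Python. It assumes the DataFrameReader API (`sqlContext.read`), which is available from Spark 1.4 onwards; on 1.3.x the documented equivalent was `sqlContext.load(source="jdbc", url=..., dbtable=...)`. The helper names, the URL, the table name and the driver class are all placeholders for illustration, not anything from this thread.

```python
def jdbc_options(url, table, driver=None):
    """Build the option map for Spark's JDBC data source.

    `url` and `dbtable` are the two required options; `driver` is the
    JDBC driver class name and is only needed when Spark cannot infer it.
    """
    opts = {"url": url, "dbtable": table}
    if driver is not None:
        opts["driver"] = driver
    return opts


def load_table(sql_context, url, table, driver=None):
    """Return a DataFrame backed by `table`.

    `sql_context` is assumed to be a SQLContext (or SparkSession) that
    exposes the `.read` DataFrameReader, i.e. Spark 1.4 or later.
    """
    reader = sql_context.read.format("jdbc")
    return reader.options(**jdbc_options(url, table, driver)).load()


# Hypothetical usage (placeholder host, database and table):
# df = load_table(sqlContext, "jdbc:mysql://dbhost:3306/shop", "orders",
#                 driver="com.mysql.jdbc.Driver")
# df.registerTempTable("orders")
```

The JDBC driver jar for the database still has to be on the classpath of the driver and the executors; this works the same way in stand-alone mode as under YARN.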
Re: what is the best way to transfer data from RDBMS to spark?
If your use case is mostly about querying the RDBMS and then bringing the results into Spark for analysis, then the Spark SQL JDBC data source API (http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/) is the best option. If your use case is to bring the entire data set into Spark, then you'll have to explore other options, which depend on the data type. For example, the Spark Redshift integration: http://spark-packages.org/package/databricks/spark-redshift

Best Regards,
Sujeevan. N
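As a rough illustration of the two cases above, the JDBC data source options can be shaped either way. This is only a sketch under assumptions: the helper names, URL, table and column are hypothetical, and whether a partitioned JDBC read is a good fit for a full-table load depends on the database and data volume. The option keys themselves (`url`, `dbtable`, `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are the ones documented for the JDBC data source.

```python
def query_options(url, query, alias="q"):
    """Options for pulling only the result of a query.

    `dbtable` accepts anything that is valid in a SQL FROM clause, so a
    parenthesised subquery with an alias is evaluated by the database
    and only its result is transferred.
    """
    return {"url": url, "dbtable": "({}) {}".format(query, alias)}


def partitioned_options(url, table, column, lower, upper, num_partitions):
    """Options for reading a whole table in parallel.

    `column` is assumed to be a numeric column. The bounds only decide
    how the range is split into partitions; they do not filter rows.
    """
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


# Hypothetical usage with a Spark 1.4+ SQLContext named sqlContext:
# opts = query_options("jdbc:postgresql://dbhost/shop",
#                      "SELECT id, total FROM orders WHERE total > 100")
# df = sqlContext.read.format("jdbc").options(**opts).load()
```

All four partitioning options have to be given together; with only `url` and `dbtable` the table is read through a single connection into one partition.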
what is the best way to transfer data from RDBMS to spark?
If I run Spark in stand-alone mode (not YARN mode), is there any tool like Sqoop that is able to transfer data from an RDBMS to Spark storage? Thanks
Re: what is the best way to transfer data from RDBMS to spark?
What is the specific use case? I can think of a couple of ways (write to HDFS and then read from Spark, or stream data to Spark). Also, I have seen people using MySQL jars to bring data in. Essentially you want to simulate creation of an RDD.