Hi Nithya,

I don't know of a specific way to connect Spark / Spark Streaming to VoltDB. In general, though, there are two ways you can push data from Spark / Spark Streaming to any external system.
1. Using the Hadoop filesystem interface: Both Spark and Spark Streaming can write to any Hadoop-compatible file system using Hadoop OutputFormat classes. If there is a VoltDB-to-Hadoop connector out there, it may provide the OutputFormat / InputFormat classes needed to write to / read from VoltDB from Spark. Take a look at the Spark documentation <http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html> on saving any RDD (the data abstraction in Spark) to a file using yourRDD.saveAsHadoopFile(....). This example shows how a custom Hadoop OutputFormat is used to write to Cassandra; for VoltDB it would be similar. On the Spark Streaming side, the same functionality is available on DStreams <http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html#dstreams> (the abstraction in Spark Streaming) through yourDStream.saveAsHadoopFiles(...).

2. Even if you don't have (1), no worries. As long as you have a Java way of pushing data into VoltDB, you can do something like the following.

  - DStream.foreachRDD() is a function that lets you do something with each RDD in the DStream (a DStream is like an infinite sequence of RDDs, representing a stream of data). Note that it is called foreach() in versions before Spark 0.9.
  - RDD.foreach() is a function that lets you execute something for each record in an RDD.

You can use both of these together to push every tuple in a stream to VoltDB, like this (Scala code):

    yourDStreamWithVoltDBData.foreachRDD(yourRdd => {
      // this is called for every RDD in the DStream
      yourRdd.foreach(yourRecord => {
        // do what you have to do for every record in the RDD,
        // e.g. write it to VoltDB with any Java library
      })
    })

Hope this helps.
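A common refinement of approach 2, described in the Spark Streaming programming guide, is to open one database connection per partition with RDD.foreachPartition() instead of one per record. The closure runs on the workers, and client connections generally cannot be serialized from the driver. The sketch below models that control flow with plain Scala collections so it runs without a cluster. A `Batch` stands in for one RDD (a sequence of partitions). `Sink` and `InMemorySink` are hypothetical stand-ins for a VoltDB Java client connection and are not VoltDB's actual API. In a real job, the two outer loops would be dstream.foreachRDD and rdd.foreachPartition.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a VoltDB client connection; NOT a real VoltDB API.
trait Sink {
  def insert(record: (String, Long)): Unit
  def close(): Unit
}

// In-memory Sink so this sketch runs without VoltDB (hypothetical helper).
final class InMemorySink(store: ArrayBuffer[(String, Long)]) extends Sink {
  def insert(record: (String, Long)): Unit = { store += record; () }
  def close(): Unit = ()
}

object PushPerPartition {
  // Models one RDD: a sequence of partitions, each a sequence of records.
  type Batch = Seq[Seq[(String, Long)]]

  // Mirrors dstream.foreachRDD(rdd => rdd.foreachPartition(...)):
  // one connection per partition, reused for every record in that partition.
  def pushStream(batches: Iterator[Batch], openSink: () => Sink): Unit =
    batches.foreach { batch =>            // like foreachRDD: once per batch
      batch.foreach { partition =>        // like foreachPartition: once per partition
        // In Spark this would run on the executor that owns the partition.
        val sink = openSink()
        try partition.foreach(sink.insert) // per-record write, as in approach 2
        finally sink.close()
      }
    }

  def main(args: Array[String]): Unit = {
    val store = ArrayBuffer.empty[(String, Long)]
    var opened = 0
    val openSink = () => { opened += 1; new InMemorySink(store) }
    val batches: Iterator[Batch] = Iterator(
      Seq(Seq("a" -> 1L, "b" -> 2L), Seq("c" -> 3L)), // batch 1: two partitions
      Seq(Seq("a" -> 4L))                             // batch 2: one partition
    )
    pushStream(batches, openSink)
    println(s"records=${store.size} connections=$opened") // prints records=4 connections=3
  }
}
```

Three connections are opened for four records: one per partition rather than one per record. That saving grows with partition size.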
TD

On Sun, Feb 9, 2014 at 9:22 PM, Nithya <nithya.narasim...@hp.com> wrote:
> Hi,
>
> I am very new to Spark. I want to persist the information collected over a
> window (say 5 seconds) through spark streaming in an in-memory datastore
> (voltdb). What is the best way to connect to this datastore from spark
> streaming?
>
> Thanks in advance.
>
> Nithya
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Connecting-to-an-inmemory-database-from-Spark-tp1343.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
