Hi Nithya,

I don't have a specific recipe for connecting Spark / Spark Streaming to
VoltDB. In general, there are two ways you can push data from Spark / Spark
Streaming to any external system.

1. Using the Hadoop Filesystem interface: Both Spark and Spark Streaming
can write to any Hadoop-compatible file system using Hadoop's
OutputFormat classes. If there is a VoltDB-to-Hadoop connector out there,
it may have the necessary OutputFormat / InputFormat classes to write to
/ read from VoltDB from Spark. Take a look at the Spark
documentation <http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html> on
saving any RDD (the data abstraction in Spark) to a file using
yourRDD.saveAsHadoopFile(...). This example shows how a custom Hadoop
format is used to write to Cassandra. For VoltDB it would be similar. From
the point of view of Spark Streaming, you can use the same functionality in
DStreams <http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html#dstreams> (the
abstraction in Spark Streaming), by using
yourDStream.saveAsHadoopFiles(...).
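
A minimal sketch of option 1, assuming a pair RDD of Hadoop Writables. It
uses Hadoop's stock TextOutputFormat purely as a stand-in; if a VoltDB
connector ships its own OutputFormat, that class would take its place. The
object name, output path and sample data are made up for illustration.

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark versions

object SaveWithOutputFormat {
  // Builds a small pair RDD, converts it to Hadoop Writables, and saves it
  // through an OutputFormat. TextOutputFormat is only a placeholder here.
  def save(sc: SparkContext, outputPath: String): Unit = {
    val pairs = sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))
      .map { case (k, v) => (new Text(k), new Text(v)) }
    pairs.saveAsHadoopFile[TextOutputFormat[Text, Text]](outputPath)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SaveWithOutputFormat").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try save(sc, args(0)) finally sc.stop()
  }
}
```

The output path must not already exist when the job runs.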

2. Even if you don't have (1), no worries. As long as you have a Java API
for pushing data into VoltDB, you can do something like the following.

- DStream.foreachRDD() is a function that allows you to do something for
each RDD in the DStream (a DStream is like an infinite sequence of RDDs,
representing a stream of data). (Note that it is foreach() in pre-0.9
Spark.)
- RDD.foreach() is a function that allows you to execute something for each
record in an RDD.

You can use both of these together to push every tuple in a stream to
VoltDB, like this (Scala code):

// foreachRDD is named foreach() in pre-0.9 Spark
yourDStreamWithVoltDBData.foreachRDD(yourRdd => {

   // this is called for every RDD in the stream

   yourRdd.foreach(yourRecord => {

       // do what you have to do for every record in the RDD
       // write to VoltDB with any Java library
   })
})
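
A possible variant of the pattern above, sketched under assumptions: it
swaps RDD.foreach() for RDD.foreachPartition() so that one connection is
opened per partition instead of one per record. RecordSink, PartitionWriter
and openSink are hypothetical names, not part of Spark or VoltDB; a real
implementation would wrap whatever Java client you use for VoltDB.

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical wrapper around a Java client for VoltDB (not a real API).
trait RecordSink extends java.io.Closeable {
  def write(record: String): Unit
}

object PartitionWriter {
  // Writes every record of one partition through a single sink,
  // closes the sink afterwards, and returns the number of records written.
  def writePartition(records: Iterator[String], openSink: () => RecordSink): Int = {
    val sink = openSink() // one connection per partition
    try {
      var written = 0
      records.foreach { record =>
        sink.write(record)
        written += 1
      }
      written
    } finally {
      sink.close()
    }
  }

  // Applies writePartition to every partition of every RDD in the stream.
  // openSink is shipped to the workers, so it must be serializable.
  def pushStream(stream: DStream[String], openSink: () => RecordSink): Unit =
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        writePartition(records, openSink)
        ()
      }
    }
}
```

The sink is created inside the partition function because a connection
object generally cannot be serialized and sent from the driver.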

Hope this helps.

TD


On Sun, Feb 9, 2014 at 9:22 PM, Nithya <nithya.narasim...@hp.com> wrote:

> Hi,
>
> I am very new to Spark. I want to persist the information collected over a
> window (say 5 seconds) through Spark Streaming in an in-memory datastore
> (VoltDB). What is the best way to connect to this datastore from Spark
> Streaming?
>
> Thanks in advance.
>
> Nithya
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Connecting-to-an-inmemory-database-from-Spark-tp1343.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
