Thanks Tathagata! I did mean using the transformation in the form of a UDF in Spark SQL. The function I envision works on individual records, as you described.
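To make that concrete, here is a rough sketch of what I have in mind, written against the Spark 1.3-era API (sqlContext.udf.register). The encrypt helper is hypothetical, with Base64 only standing in for a real cipher call:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object EncryptUdfSketch {
  // Hypothetical per-record transform; Base64 is a stand-in for a real
  // cipher (e.g. javax.crypto) -- this is NOT actual encryption.
  def encrypt(plaintext: String): String =
    java.util.Base64.getEncoder.encodeToString(plaintext.getBytes("UTF-8"))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("udf-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Register the record-level function so SQL queries can call it.
    sqlContext.udf.register("encrypt", encrypt _)

    sc.parallelize(Seq(("alice", "123-45-6789")))
      .toDF("name", "ssn").registerTempTable("customers")
    sqlContext.sql("SELECT name, encrypt(ssn) AS ssn_enc FROM customers")
      .collect().foreach(println)

    sc.stop()
  }
}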
On Fri, Aug 8, 2014 at 6:48 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> You can always define an arbitrary RDD-to-RDD function and use it from both
> Spark and Spark Streaming. For example:
>
> def myTransformation(rdd: RDD[X]): RDD[Y] = { .... }
>
> In Spark you can obviously apply it to an RDD. In Spark Streaming, you can
> apply it to the RDDs of a DStream with
>
> myDStream.transform(rdd => myTransformation(rdd))
>
> I am not sure what you mean by reusing that transformation through Spark
> SQL. Do you mean from a SQL query? In Spark SQL you can register a
> function that operates on each record (so a map-like function only), but
> not an arbitrary transformation on tables. That said, it is easy to mix
> Spark and Spark SQL together: you can do sqlContext.sql("sql query"), get
> back the result RDD, and then apply myTransformation to that RDD.
>
> Hope this clarifies things.
>
> TD
>
>
> On Fri, Aug 8, 2014 at 11:10 AM, Jeevak Kasarkod <jee...@gmail.com> wrote:
>
>> Is it possible to create custom transformations in Spark? For example,
>> data security transforms such as encrypt and decrypt. Ideally it's
>> something one would like to reuse across Spark Streaming, Spark SQL, and
>> core Spark.
>>

--
Cheers,
Jeevak
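For reference, a self-contained sketch of the reuse pattern from TD's reply, written against the Spark 1.3-era API: one myTransformation applied to a plain RDD, to a SQL query's result RDD, and to the RDDs of a DStream. The String element type and the socket source are assumptions for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReuseSketch {
  // One transformation, written once against RDDs (String stands in for
  // the X and Y of the RDD[X] => RDD[Y] example above).
  def myTransformation(rdd: RDD[String]): RDD[String] =
    rdd.map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("reuse-sketch").setMaster("local[2]"))

    // 1. Core Spark: apply it directly to an RDD.
    myTransformation(sc.parallelize(Seq("a", "b"))).collect().foreach(println)

    // 2. Spark SQL: run a query, pull the result back out as an RDD, and
    //    apply the same function to it.
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    sc.parallelize(Seq("ann", "bob")).map(Tuple1(_)).toDF("name")
      .registerTempTable("people")
    val names = sqlContext.sql("SELECT name FROM people").rdd.map(_.getString(0))
    myTransformation(names).collect().foreach(println)

    // 3. Spark Streaming: apply it to the RDDs of a DStream via transform.
    val ssc = new StreamingContext(sc, Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999) // assumed source
    lines.transform(rdd => myTransformation(rdd)).print()
    // ssc.start(); ssc.awaitTermination()  // start once a source is wired up

    sc.stop()
  }
}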