Thanks, Tathagata Das.

Currently, in my use case, I am planning to collect() all the data on the
driver and publish it to KairosDB, something like the sink code below.

I am not worried about the size of the data. If I am doing something this
simple, do I still need to worry about the instability of the API?

I really like the StreamSinkProvider pattern (the code looks neat). How will
this change with the new Spark 2.4? (My guess at the foreachBatch version is
below, after my current sink.)
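
For reference, here is roughly how I am wiring it up today via
StreamSinkProvider. Treat this as a sketch: KairosSinkProvider,
createKairosClient, the package name, and the "kairos.url" option are all
my own names, not anything standard.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

class KairosSinkProvider extends StreamSinkProvider {

    // Spark calls this when the streaming query starts; the fully
    // qualified class name of this provider is what goes into .format(...).
    override def createSink(
                             sqlContext: SQLContext,
                             parameters: Map[String, String],
                             partitionColumns: Seq[String],
                             outputMode: OutputMode): Sink = {
        // createKairosClient is my helper that builds the HttpClient
        // from the "kairos.url" option.
        KairosSink(sqlContext, parameters, partitionColumns, outputMode,
            createKairosClient(parameters("kairos.url")))
    }
}

which I then use like:

df.writeStream
    .format("com.example.KairosSinkProvider")   // hypothetical package
    .option("kairos.url", "http://localhost:8080")
    .start()

The sink itself: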

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.streaming.OutputMode
import org.kairosdb.client.HttpClient

import scala.collection.JavaConverters._

case class KairosSink(
                       sqlContext: SQLContext,
                       parameters: Map[String, String],
                       partitionColumns: Seq[String],
                       outputMode: OutputMode,
                       kairosClient: HttpClient) extends Sink {

    override def addBatch(batchId: Long, data: DataFrame): Unit = {
        // Pulls the entire micro-batch to the driver; fine for small batches.
        val rows = data.collect()
        // generateMetrics (a helper defined elsewhere in this class) maps
        // each Row to a (metricName, (String, Double)) pair; group the
        // values by metric name.
        val metrics: Map[String, Array[(String, Double)]] = rows
            .map(generateMetrics)
            .groupBy(_._1)
            .mapValues(_.map(_._2))
        val metricBuilder = createKairosMetricBuilder(metrics)
        try {
            val response = kairosClient.pushMetrics(metricBuilder)
            // KairosDB returns 204 No Content on success.
            if (response.getStatusCode != 204) {
                println(response.getErrors.asScala.mkString(", "))
            } else {
                println("kairos success")
            }
        } catch {
            case e: Exception => e.printStackTrace()
        }
    }

}
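
If foreachBatch lands in 2.4 the way you describe below, I am guessing my
addBatch body carries over almost unchanged, something like this (just my
understanding of the upcoming API, untested; metricsDF stands for my
streaming DataFrame and publishToKairos is a hypothetical helper holding
the collect/groupBy/pushMetrics logic from addBatch above):

metricsDF.writeStream
    .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
        // exactly the same logic as KairosSink.addBatch above
        publishToKairos(batchDF.collect())
    }
    .start()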



On Mon, Jun 25, 2018 at 2:33 PM Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> This interface is actually unstable. The v2 of the DataSource API is being
> designed right now and will be public and stable in a release or two. So,
> unfortunately, there is no stable interface right now that I can officially
> recommend.
>
> That said, you could always use the ForeachWriter interface (see
> DataStreamWriter.foreach).
> Also, in the next release you will have a foreachBatch interface that
> allows you to run custom operations on the output of each micro-batch,
> represented as a DataFrame (exactly the same as Sink.addBatch).
> Both of these should be useful for you until the interfaces are stabilized.
>
> On Mon, Jun 25, 2018 at 9:55 AM, subramgr <subramanian.gir...@gmail.com>
> wrote:
>
>> We are using Spark 2.3 and would like to know whether it is recommended
>> to create a custom KairosDBSink by implementing StreamSinkProvider.
>>
>> The interface is marked experimental and unstable.
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
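
PS: on the ForeachWriter suggestion, I think for my sink it would look
roughly like this (untested sketch; generateMetrics and
createKairosMetricBuilder are the same helpers used in my sink above, and
I buffer each partition's rows so I can push to KairosDB in one bulk call):

import org.apache.spark.sql.{ForeachWriter, Row}
import org.kairosdb.client.HttpClient

class KairosForeachWriter(kairosUrl: String) extends ForeachWriter[Row] {

    private var client: HttpClient = _
    private val buffer = scala.collection.mutable.ArrayBuffer.empty[Row]

    // Runs on the executor, once per partition per epoch.
    override def open(partitionId: Long, version: Long): Boolean = {
        client = new HttpClient(kairosUrl)
        true
    }

    // Just accumulate rows; the actual push happens in close().
    override def process(row: Row): Unit = buffer += row

    override def close(errorOrNull: Throwable): Unit = {
        if (errorOrNull == null && buffer.nonEmpty) {
            val metrics = buffer.map(generateMetrics)
                .groupBy(_._1)
                .mapValues(_.map(_._2).toArray)
                .toMap
            client.pushMetrics(createKairosMetricBuilder(metrics))
        }
        client.shutdown()
    }
}

used as: df.writeStream.foreach(new KairosForeachWriter(url)).start()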

-- 
Thanks & Regards,
Girish Subramanian
