Hello Kant, 

See if the examples in this blog explain how to deal with your particular
case:
https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
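
In short: wrap all the columns in a struct and pass that single struct
column to to_json, and note that the error you saw just means start() was
never called on the streaming query. A rough, untested sketch in Java
(assuming Spark 2.1+, and that KafkaSink is your own ForeachWriter<String>):

import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;

// Pack every column into one struct and serialize it, giving
// {"name":"hello","ratio":1.56,"count":34} for each row.
Dataset<Row> json = df2.select(
    to_json(struct(col("name"), col("ratio"), col("count"))).as("value"));

// A single string column converts to Dataset<String> directly.
Dataset<String> jsonStrings = json.as(Encoders.STRING());

// "Queries with streaming sources must be executed with
// writeStream.start()" means exactly that: call start().
StreamingQuery query = jsonStrings.writeStream()
    .foreach(new KafkaSink())
    .start();

query.awaitTermination();  // throws StreamingQueryException

If you really do want the outer "result" wrapper, alias a nested struct:
to_json(struct(struct(col("name"), col("ratio"), col("count")).as("result"))).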

Cheers
Jules 

Sent from my iPhone
Pardon the dumb thumb typos :)

> On May 30, 2017, at 7:31 PM, kant kodali <kanth...@gmail.com> wrote:
> 
> Hi All, 
> 
> I have a Dataset<Row> and I am trying to convert it into a Dataset<String>
> (JSON string) using Spark Structured Streaming. I have tried the following.
> 
> df2.toJSON().writeStream().foreach(new KafkaSink())
> This doesn't seem to work for the following reason. 
> 
> "Queries with streaming sources must be executed with writeStream.start()"
> 
> My dataframe looks like this:
> 
> name, ratio, count  // column names
> 
> "hello", 1.56, 34     
> 
> If I try to convert a Row into a JSON string, it results in something like
> this: {"key1": "name", "value1": "hello", "key2": "ratio", "value2": 1.56,
> "key3": "count", "value3": 34}, but what I need is something like this:
> { "result": {"name": "hello", "ratio": 1.56, "count": 34} }. However, I
> don't have a result column.
> 
> It looks like there are a couple of functions, to_json and json_tuple, but
> they seem to take only one Column as the first argument, so should I call
> to_json on every column? Also, how would I turn this into Dataset<String>?
> 
> Thanks!
