Hi, I have data as an RDD[(Long, String)], where the Long is a timestamp and the String is a JSON-encoded string. I want to infer the schema of the JSON and then run a SQL query on the data (no aggregates, just column selection and UDF application), while keeping the timestamp associated with each row of the result. I completely fail to see how that would be possible. Any suggestions?
I can't even see how I would get an RDD[(Long, Row)] so that I *might* be able to add the timestamp to the row after schema inference. Is there *any* way other than string-manipulating the JSON string and adding the timestamp to it?

Thanks,
Tobias