Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Michael Armbrust
Eventually it would be nice for us to have some sort of function that does the conversion you are talking about on a single column, but for now I usually hack it as you suggested: val withId = origRDD.map { case (id, str) => s"""{"id":$id, ${str.trim.drop(1)}""" } val table = sqlContext.jsonRDD(withId)
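The string hack above can be sketched in plain Scala without Spark, to make the mechanics visible: `drop(1)` strips the opening brace of the original JSON object so the new field can be spliced in front. The helper name `withIdField` is illustrative, not from the thread.

```scala
// Splice an "id" field into the front of a JSON object string.
// Assumes the input is a single JSON object starting with '{'.
object JsonIdHack {
  def withIdField(id: Long, json: String): String =
    s"""{"id":$id, ${json.trim.drop(1)}"""

  def main(args: Array[String]): Unit = {
    val merged = withIdField(42L, """{"a":1}""")
    println(merged) // {"id":42, "a":1}
  }
}
```

Note this is fragile by construction (it assumes well-formed, non-empty JSON objects), which is presumably why Michael calls it a hack.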

Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Tobias Pfeiffer
Hi Ayoub, thanks for your mail! On Thu, Jan 29, 2015 at 6:23 PM, Ayoub wrote: > > SQLContext and HiveContext have a "jsonRDD" method which accepts an > RDD[String], where each string is a JSON string, and returns a SchemaRDD; it > extends RDD[Row], which is the type you want. > > Afterwards you should be

Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Ayoub
…schema inference. Is there *any* > way other than string-manipulating the JSON string and adding the timestamp > to it? > > Thanks > Tobias > -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Re-SQL-query-over-Long-JSON-string-tuples-tp21419

SQL query over (Long, JSON string) tuples

2015-01-29 Thread Tobias Pfeiffer
Hi, I have data as RDD[(Long, String)], where the Long is a timestamp and the String is a JSON-encoded string. I want to infer the schema of the JSON and then run a SQL statement on the data (no aggregates, just column selection and UDF application), but still have the timestamp associated with each
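Putting the thread's pieces together, the pipeline Tobias describes might look roughly like the sketch below (Spark 1.2-era API, where jsonRDD takes an RDD[String] and returns a SchemaRDD; the table and column names are assumptions, and this is untested):

```scala
// Sketch: fold the timestamp into each JSON record, then let
// Spark SQL infer the schema over the combined records.
val data: org.apache.spark.rdd.RDD[(Long, String)] = ... // (timestamp, JSON)

// Splice a "timestamp" field into each JSON object (Michael's hack).
val withTs = data.map { case (ts, json) =>
  s"""{"timestamp":$ts, ${json.trim.drop(1)}"""
}

// jsonRDD infers the schema, including the injected timestamp column.
val table = sqlContext.jsonRDD(withTs)
table.registerTempTable("events")

// Plain column selection / UDFs; the timestamp rides along in each row.
val result = sqlContext.sql("SELECT timestamp, someField FROM events")
```

This avoids any join to reattach the timestamp afterwards, at the cost of the string manipulation Tobias was hoping to avoid.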