Thanks, I have seen this, but it doesn't cover my question. What I need is to read the JSON and include the raw JSON as part of my DataFrame.
On Friday, December 30, 2016 10:23 AM, Annabel Melongo <melongo_anna...@yahoo.com.INVALID> wrote:

Richard,

The documentation below shows how to create a SparkSession and how to load data programmatically:

Spark SQL and DataFrames - Spark 2.1.0 Documentation

On Thursday, December 29, 2016 5:16 PM, Richard Xin <richardxin...@yahoo.com.INVALID> wrote:

Say I have the following data in a file:

{"id":1234,"ln":"Doe","fn":"John","age":25}
{"id":1235,"ln":"Doe","fn":"Jane","age":22}

Java code snippet:

final SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("json_test");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext hc = new HiveContext(ctx.sc());
DataFrame df = hc.read().json("files/json/example2.json");

What I need is a DataFrame with the columns id, ln, fn, and age, as well as a raw_json string column.

Any advice on the best practice in Java?

Thanks,
Richard
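
For the question above (a DataFrame with id, ln, fn, and age plus the raw JSON string), one possible approach is sketched below. It is not from the thread. It assumes Spark 1.6.x, where DataFrameReader.text() and functions.get_json_object() are available, and an input file with one JSON object per line. The class name RawJsonExample and the method readWithRawJson are hypothetical names chosen for illustration. The idea is to read each line as plain text, keep that text as raw_json, and extract the individual fields from it.

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.get_json_object;

// Hypothetical helper, not part of the original thread. Assumes Spark 1.6.x.
public final class RawJsonExample {

    private RawJsonExample() {}

    // Accepts any SQLContext; a HiveContext works too because it extends SQLContext.
    public static DataFrame readWithRawJson(SQLContext sqlContext, String path) {
        // read().text() yields one row per input line in a single string column named "value".
        DataFrame lines = sqlContext.read().text(path);

        // Extract each field from the raw line and keep the untouched line as raw_json.
        // get_json_object returns strings, so numeric fields are cast explicitly.
        return lines.select(
                get_json_object(col("value"), "$.id").cast("long").as("id"),
                get_json_object(col("value"), "$.ln").as("ln"),
                get_json_object(col("value"), "$.fn").as("fn"),
                get_json_object(col("value"), "$.age").cast("int").as("age"),
                col("value").as("raw_json"));
    }
}
```

With the snippet from the original question, this would be called as RawJsonExample.readWithRawJson(hc, "files/json/example2.json"). The field list is spelled out by hand here, so the schema is not inferred the way read().json() infers it. On Spark 2.1 and later, from_json with an explicit schema may be a more convenient way to do the extraction step.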