Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Sam Elamin
Ah, if that's the case then you might need to define the schema beforehand. Either that, or if you want to infer it, ensure a JSON file exists with the right schema so Spark infers the right columns, essentially making both files one DataFrame, if that makes sense. On Tue, Feb 14, 2017 at 3:04 PM,
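Sam's suggestion of defining the schema up front can be sketched in plain Python (this is an illustration of the semantics, not Spark code; the `SCHEMA` list and file contents are assumptions): with a fixed column set, a record that lacks a column yields null instead of a missing column. In Spark this corresponds to passing an explicit `StructType` via `spark.read.schema(...).json(...)` rather than relying on per-file inference.

```python
import json

# Assumed full set of columns expected across all files.
SCHEMA = ["a", "b"]

def read_with_schema(lines, schema=SCHEMA):
    """Parse JSON lines, filling any column missing from a record with None,
    mirroring what Spark does when given an explicit schema instead of
    inferring one per file."""
    return [{col: json.loads(line).get(col) for col in schema} for line in lines]

rows = read_with_schema(['{"a": 1}', '{"b": 2}'])
# rows == [{"a": 1, "b": None}, {"a": None, "b": 2}]
```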

Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Aseem Bansal
Sorry if I trivialized the example. It is the same kind of file, and sometimes it could have "a", sometimes "b", sometimes both; I just don't know in advance. That is what I meant by missing columns. It would be good if I could read any of the JSON files and have Spark SQL give me, for json1.json:

a | b
1 |

Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Sam Elamin
I may be missing something super obvious here, but can't you combine them into a single DataFrame? A left join, perhaps? Try writing it in SQL, "select a from json1 and b from json2", then run explain to give you a hint on how to do it in code. Regards Sam On Tue, 14 Feb 2017 at 14:30, Aseem Bansal
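The "combine them into a single dataframe" idea amounts to a union by name: the combined table's columns are the union of both inputs, with gaps padded by null. Here is a plain-Python sketch of those semantics (an assumption about the intent, not Spark code; later Spark versions expose this natively as `unionByName` with `allowMissingColumns=True`, but that option postdates this thread):

```python
def union_by_name(rows1, rows2):
    """Combine two lists of dict-rows into one table whose columns are the
    union of the columns seen in either input; missing values become None,
    mirroring SQL NULL."""
    cols = sorted({c for row in rows1 + rows2 for c in row})
    return [{c: row.get(c) for c in cols} for row in rows1 + rows2]

combined = union_by_name([{"a": 1}], [{"b": 2}])
# combined == [{"a": 1, "b": None}, {"a": None, "b": 2}]
```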

Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Aseem Bansal
Say I have two files, each containing a single row: json1.json {"a": 1} json2.json {"b": 2} I read these JSON files into DataFrames using Spark's API, one at a time, so I have Dataset json1DF and Dataset json2DF. If I run "select a, b from __THIS__" in a SQLTransformer, then I will get an exception.
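The failure described here can be reproduced in miniature with plain Python (the helper names are hypothetical, not Spark API): selecting a column that no record defines is an error, which is what Spark's analyzer raises as an AnalysisException for "select a, b" over json1DF. A common workaround is to add any missing column as null, analogous to `df.withColumn("b", lit(None))`, before the SQLTransformer runs.

```python
def strict_select(rows, columns):
    """Mimic Spark's analyzer: selecting a column absent from every row fails."""
    known = {c for row in rows for c in row}
    missing = [c for c in columns if c not in known]
    if missing:
        raise KeyError(f"cannot resolve columns: {missing}")
    return [{c: row.get(c) for c in columns} for row in rows]

def with_null_columns(rows, columns):
    """Workaround: pad each row with None for any requested column it lacks,
    analogous to adding the column via withColumn(..., lit(None)) up front."""
    return [{**{c: None for c in columns}, **row} for row in rows]

json1_rows = [{"a": 1}]
try:
    strict_select(json1_rows, ["a", "b"])  # fails, like AnalysisException
except KeyError:
    pass
padded = strict_select(with_null_columns(json1_rows, ["a", "b"]), ["a", "b"])
# padded == [{"a": 1, "b": None}]
```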