Re: Spark SQL Nested Array of JSON with empty field

2016-06-04 Thread Ewan Leith
Spark's JSON reader is unforgiving of things like elements missing from some
JSON records, or mixed types.

If you want to pass invalid JSON files through Spark, you're best off doing an
initial parse with the Jackson APIs against a defined schema first. There you
can declare types like Option[String] where a column is optional, convert the
validated records back into strings, and then read those strings as a DataFrame.
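A minimal sketch of that pre-validation idea, using Python's stdlib `json` purely for illustration (the field names below are taken from the example later in this thread; Jackson with Scala case classes and Option[String] fields is the equivalent in Spark's own language):

```python
import json

# Fields every record must have; "middlename" is treated as optional,
# mirroring an Option[String] column in a Scala case class.
REQUIRED = ["firstname", "lastname", "address"]
OPTIONAL = ["middlename"]

def normalize(record_text):
    """Parse one JSON record, reject it if a required field is missing,
    and fill missing optional fields with None so every record has the
    same shape before it reaches Spark."""
    rec = json.loads(record_text)
    for field in REQUIRED:
        if field not in rec:
            raise ValueError("missing required field: " + field)
    for field in OPTIONAL:
        rec.setdefault(field, None)  # None serializes as JSON null
    return json.dumps(rec)

raw = '{"firstname": "Jack", "lastname": "Nelson", "address": {"state": "New York", "city": "New York"}}'
print(normalize(raw))  # now includes "middlename": null
```

The normalized strings can then be read back as a DataFrame, and the optional column is a proper nullable field rather than a missing one.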

Thanks,
Ewan

On 3 Jun 2016 22:03, Jerry Wong  wrote:
Hi,

I ran into a problem with a missing field in a nested JSON file when using
Spark SQL. For instance, the file contains these two records:

{
"firstname": "Jack",
"lastname": "Nelson",
"address": {
"state": "New York",
"city": "New York"
}
}{
"firstname": "Landy",
"middlename": "Ken",
"lastname": "Yong",
"address": {
"state": "California",
"city": "Los Angles"
}
}

I use Spark SQL to query the file like this:
val row = sqlContext.sql("SELECT firstname, middlename, lastname,
address.state, address.city FROM jsontable")
Spark raises an error on the first record: no "middlename".
How do I handle this case in Spark SQL?

Many thanks in advance!
Jerry




Re: Spark SQL Nested Array of JSON with empty field

2016-06-03 Thread Christian Hellström
If that's your JSON file, then the first problem is that it's incorrectly
formatted.

Apart from that you can just read the JSON into a DataFrame with
sqlContext.read.json() and then select directly on the DataFrame without
having to register a temporary table: jsonDF.select("firstname",
"address.state", ...). Works for me (with a properly formatted JSON
document). To make sure that your JSON is read correctly, check
jsonDF.printSchema. If the schema contains a _corrupt_record column (or
something similar), you know there's a problem with the JSON structure.
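On the formatting point: Spark's JSON reader expects JSON Lines, i.e. one complete object per line, whereas the example spreads each object over several lines and runs them together with `}{`. A quick stdlib check of the two records rewritten one-per-line (values copied verbatim from the example):

```python
import json

# JSON Lines: each record collapsed onto its own line.
lines = [
    '{"firstname": "Jack", "lastname": "Nelson", "address": {"state": "New York", "city": "New York"}}',
    '{"firstname": "Landy", "middlename": "Ken", "lastname": "Yong", "address": {"state": "California", "city": "Los Angles"}}',
]

# Every line must parse on its own; a line that doesn't ends up
# in Spark's _corrupt_record column instead of real columns.
records = [json.loads(line) for line in lines]
for rec in records:
    print(rec["firstname"], rec["address"]["city"])
```

With the file in this shape, the missing "middlename" in the first record is simply absent rather than a parse failure, and selecting it yields null.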

On 3 June 2016 at 21:31, Jerry Wong  wrote:

> Hi,
>
> I ran into a problem with a missing field in a nested JSON file when using
> Spark SQL. For instance, the file contains these two records:
>
> {
> "firstname": "Jack",
> "lastname": "Nelson",
> "address": {
> "state": "New York",
> "city": "New York"
> }
> }{
> "firstname": "Landy",
> "middlename": "Ken",
> "lastname": "Yong",
> "address": {
> "state": "California",
> "city": "Los Angles"
> }
> }
>
> I use Spark SQL to query the file like this:
> val row = sqlContext.sql("SELECT firstname, middlename, lastname,
> address.state, address.city FROM jsontable")
> Spark raises an error on the first record: no "middlename".
> How do I handle this case in Spark SQL?
>
> Many thanks in advance!
> Jerry
>
>
>


Spark SQL Nested Array of JSON with empty field

2016-06-03 Thread Jerry Wong
Hi,

I ran into a problem with a missing field in a nested JSON file when using
Spark SQL. For instance, the file contains these two records:

{
"firstname": "Jack",
"lastname": "Nelson",
"address": {
"state": "New York",
"city": "New York"
}
}{
"firstname": "Landy",
"middlename": "Ken",
"lastname": "Yong",
"address": {
"state": "California",
"city": "Los Angles"
}
}

I use Spark SQL to query the file like this:
val row = sqlContext.sql("SELECT firstname, middlename, lastname,
address.state, address.city FROM jsontable")
Spark raises an error on the first record: no "middlename".
How do I handle this case in Spark SQL?

Many thanks in advance!
Jerry