Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Oops,

  sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

This solved the issue. Noting it here since it is important for everyone.
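
For anyone hitting this later, a minimal sketch of where the setting goes
(Spark 1.2-era API; the SparkContext, path, and table name are illustrative):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[*]", "parquet-binary-as-string")
val sqlContext = new SQLContext(sc)

// Must be set before the Parquet files are read, so that binary
// columns are interpreted as UTF-8 strings instead of raw bytes.
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

val data = sqlContext.parquetFile("/path/to/parquet")
data.registerTempTable("myTable")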






Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Also, it looks like when I store the Strings in Parquet and try to fetch
them using Spark code, I get a ClassCastException.


Below is how my array of strings gets saved: each character's ASCII value
ends up in an array of ints.
res25: Array[Seq[String]] = Array(ArrayBuffer(Array(104, 116, 116, 112, 58,
47, 47, 102, 98, 46, 109, 101, 47, 51, 67, 111, 72, 108, 99, 101, 77, 103)),
ArrayBuffer(), ArrayBuffer(), ArrayBuffer(), ArrayBuffer(Array(104, 116,
116, 112, 58, 47, 47, 105, 110, 115, 116, 97, 103, 114, 97, 109, 46, 99,
111, 109, 47, 112, 47, 120, 84, 50, 51, 78, 76, 105, 85, 55, 102, 47)),
ArrayBuffer(), ArrayBuffer(Array(104, 116, 116, 112, 58, 47, 47, 105, 110,
115, 116, 97, 103, 114, 97, 109, 46, 99, 111, 109, 47, 112, 47, 120, 84, 50,
53, 72, 52, 111, 90, 95, 114, 47)), ArrayBuffer(Array(104, 116, 116, 112,
58, 47, 47, 101, 122, 101, 101, 99, 108, 97, 115, 115, 105, 102, 105, 101,
100, 97, 100, 115, 46, 99, 111, 109, 47, 47, 100, 101, 115, 99, 47, 106, 97,
105, 112, 117, 114, 47, 49, 48, 51, 54, 50, 50,
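
For reference, those inner int arrays are just ASCII code points (104, 116,
116, 112 is "http"), so they can be decoded back to strings like this (a
minimal sketch, assuming every value is a valid code point):

val encoded = Seq(104, 116, 116, 112)        // the first four values above
val decoded = encoded.map(_.toChar).mkString // => "http"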






Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Hi,

I am having a similar problem and tried your solution with Spark 1.2 built
with Hadoop.

I am saving objects to Parquet files where some fields are of type Array.

When I fetch them as below, I get:

 java.lang.ClassCastException: [B cannot be cast to java.lang.CharSequence



def fetchTags(rows: SchemaRDD) = {
  // Column 0 holds the array of strings.
  rows.flatMap(x => x.getAs[Buffer[CharSequence]](0).map(_.toString()))
}



The values I am fetching were stored as an Array of Strings. I have tried
replacing Buffer[CharSequence] with Array[String], Seq[String], and
Seq[Seq[Char]], but I still get errors.

Can you provide a clue?

Pankaj
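
For reference, the [B in the exception denotes a raw byte array, so one
workaround is to decode the bytes manually. A sketch, assuming the column
really does come back as Array[Byte] values when binaryAsString is not set:

import scala.collection.mutable.Buffer
import org.apache.spark.sql.SchemaRDD

def fetchTags(rows: SchemaRDD) = {
  // Each element arrives as raw UTF-8 bytes; decode it explicitly.
  rows.flatMap(x =>
    x.getAs[Buffer[Array[Byte]]](0).map(bytes => new String(bytes, "UTF-8")))
}

Setting spark.sql.parquet.binaryAsString to true, as noted at the top of
this thread, avoids the manual decoding entirely.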






Re: Reading nested JSON data with Spark SQL

2014-11-19 Thread Simone Franzini
This works great, thank you!

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini

On Wed, Nov 19, 2014 at 3:40 PM, Michael Armbrust wrote:

> You can extract the nested fields in sql: SELECT field.nestedField ...
>
> If you don't do that then nested fields are represented as rows within
> rows and can be retrieved as follows:
>
> t.getAs[Row](0).getInt(0)
>
> Also, I would write t.getAs[Buffer[CharSequence]](12) as
> t.getAs[Seq[String]](12) since we don't guarantee the return type will be
> a buffer.
>
>
> On Wed, Nov 19, 2014 at 1:33 PM, Simone Franzini wrote:
>
>> I have been using Spark SQL to read in JSON data, like so:
>> val myJsonFile = sqc.jsonFile(args("myLocation"))
>> myJsonFile.registerTempTable("myTable")
>> sqc.sql("mySQLQuery").map { row =>
>> myFunction(row)
>> }
>>
>> And then in myFunction(row) I can read the various columns with the
>> Row.getX methods. However, these methods only work for basic types (string,
>> int, ...).
>> I was having some trouble reading columns that are arrays or maps (i.e.
>> other JSON objects).
>>
>> I am now using Spark 1.2 from the Cloudera snapshot and I noticed that
>> there is a new method getAs. I was able to use it to read for example an
>> array of strings like so:
>> t.getAs[Buffer[CharSequence]](12)
>>
>> However, if I try to read a column with a nested JSON object like this:
>> t.getAs[Map[String, Any]](11)
>>
>> I get the following error:
>> java.lang.ClassCastException:
>> org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
>> scala.collection.immutable.Map
>>
>> How can I read such a field? Am I just missing something small or should
>> I be looking for a completely different alternative to reading JSON?
>>
>> Simone Franzini, PhD
>>
>> http://www.linkedin.com/in/simonefranzini
>>
>
>


Re: Reading nested JSON data with Spark SQL

2014-11-19 Thread Michael Armbrust
You can extract the nested fields in sql: SELECT field.nestedField ...

If you don't do that then nested fields are represented as rows within rows
and can be retrieved as follows:

t.getAs[Row](0).getInt(0)

Also, I would write t.getAs[Buffer[CharSequence]](12) as
t.getAs[Seq[String]](12) since we don't guarantee the return type will be a
buffer.
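
For example, given a record like {"field": {"nestedField": 1}}, the two
approaches look roughly like this (a sketch; the table and field names are
illustrative):

import org.apache.spark.sql.Row

// Option 1: flatten the struct in SQL.
val flat = sqlContext.sql("SELECT field.nestedField FROM myTable")

// Option 2: navigate the nested Row by position.
val nested = sqlContext.sql("SELECT field FROM myTable").map { t =>
  t.getAs[Row](0).getInt(0) // column 0 is the struct, field 0 the int
}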


On Wed, Nov 19, 2014 at 1:33 PM, Simone Franzini wrote:

> I have been using Spark SQL to read in JSON data, like so:
> val myJsonFile = sqc.jsonFile(args("myLocation"))
> myJsonFile.registerTempTable("myTable")
> sqc.sql("mySQLQuery").map { row =>
> myFunction(row)
> }
>
> And then in myFunction(row) I can read the various columns with the
> Row.getX methods. However, these methods only work for basic types (string,
> int, ...).
> I was having some trouble reading columns that are arrays or maps (i.e.
> other JSON objects).
>
> I am now using Spark 1.2 from the Cloudera snapshot and I noticed that
> there is a new method getAs. I was able to use it to read for example an
> array of strings like so:
> t.getAs[Buffer[CharSequence]](12)
>
> However, if I try to read a column with a nested JSON object like this:
> t.getAs[Map[String, Any]](11)
>
> I get the following error:
> java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
> scala.collection.immutable.Map
>
> How can I read such a field? Am I just missing something small or should I
> be looking for a completely different alternative to reading JSON?
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>


Reading nested JSON data with Spark SQL

2014-11-19 Thread Simone Franzini
I have been using Spark SQL to read in JSON data, like so:
val myJsonFile = sqc.jsonFile(args("myLocation"))
myJsonFile.registerTempTable("myTable")
sqc.sql("mySQLQuery").map { row =>
myFunction(row)
}

And then in myFunction(row) I can read the various columns with the
Row.getX methods. However, these methods only work for basic types (string,
int, ...).
I was having some trouble reading columns that are arrays or maps (i.e.
other JSON objects).

I am now using Spark 1.2 from the Cloudera snapshot and I noticed that
there is a new method getAs. I was able to use it to read for example an
array of strings like so:
t.getAs[Buffer[CharSequence]](12)

However, if I try to read a column with a nested JSON object like this:
t.getAs[Map[String, Any]](11)

I get the following error:
java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
scala.collection.immutable.Map

How can I read such a field? Am I just missing something small or should I
be looking for a completely different alternative to reading JSON?

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini
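

For context, a minimal end-to-end sketch of the pattern discussed in this
thread, using the Spark 1.2-era API (the path and field names are
illustrative):

import org.apache.spark.sql.{Row, SQLContext}

val sqc = new SQLContext(sc) // sc: an existing SparkContext

val myJsonFile = sqc.jsonFile("/path/to/data.json") // schema is inferred
myJsonFile.printSchema()                            // inspect the nesting
myJsonFile.registerTempTable("myTable")

// Nested JSON objects come back as Rows within Rows, not Maps.
val results = sqc.sql("SELECT field, tags FROM myTable").map { row =>
  val nested = row.getAs[Row](0)         // the nested object
  val tags   = row.getAs[Seq[String]](1) // an array of strings
  (nested.getInt(0), tags)
}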