spark sql: sqlContext.jsonFile date type detection and performance

2014-10-21 Thread tridib
Any help or comments?






Re: spark sql: sqlContext.jsonFile date type detection and performance

2014-10-21 Thread Yin Huai
Are there any specific issues you are facing?

Thanks,

Yin

On Tue, Oct 21, 2014 at 4:00 PM, tridib tridib.sama...@live.com wrote:

 Any help or comments?







Re: spark sql: sqlContext.jsonFile date type detection and performance

2014-10-21 Thread tridib
Yes, I am unable to get jsonFile() to detect the date type automatically
from the JSON data.






Re: Spark SQL : sqlContext.jsonFile date type detection and performance

2014-10-21 Thread Yin Huai
To add one more thing about question 1: once you get the SchemaRDD from
jsonFile/jsonRDD, you can use CAST(columnName as DATE) in your query to
cast the column type from StringType to DateType (the string format
should be yyyy-[m]m-[d]d, and you need to use a HiveContext). Here is a
code snippet that may help.

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val schemaRDD = hiveContext.jsonFile(...)
schemaRDD.registerTempTable("jsonTable")
hiveContext.sql("SELECT CAST(columnName as DATE) FROM jsonTable")
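
For example (this record is made up, not from your dataset), given a line like

{"columnName": "2014-10-21"}

the query above should return that column as a DateType value instead of a string.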

Thanks,

Yin

On Tue, Oct 21, 2014 at 8:00 PM, Yin Huai huaiyin@gmail.com wrote:

 Hello Tridib,

 I just saw this one.

 1. Right now, jsonFile and jsonRDD do not detect the date type.
 IntegerType, LongType, DoubleType, DecimalType, StringType, BooleanType,
 StructType, and ArrayType are detected automatically.
 2. The process of inferring the schema passes over the entire dataset once
 to determine it, so you will see a job launched. Applying a specific
 schema to a dataset does not have this cost (see the sketch below).
 3. It is hard to comment on this without seeing your implementation. For our
 built-in JSON support, jsonFile and jsonRDD provide a very convenient way
 to work with JSON datasets through SQL. You do not need to define the schema
 in advance; Spark SQL will automatically create the SchemaRDD for your
 dataset, and you can start to query it with SQL by simply registering the
 returned SchemaRDD as a temp table. Regarding the implementation, we use a
 high-performance JSON library (Jackson, https://github.com/FasterXML/jackson)
 to parse JSON records.
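
 Here is a minimal sketch of applying a specific schema, assuming the
 Spark 1.1-era Scala API (jsonFile also accepts an explicit StructType);
 the field names and path below are made up:

 import org.apache.spark.sql._
 val sqlContext = new SQLContext(sc)

 // Explicit schema; these field names are only an example.
 val schema = StructType(Seq(
   StructField("id", LongType, nullable = true),
   StructField("eventDate", StringType, nullable = true)))

 // With a schema supplied, jsonFile can skip the extra pass used for inference.
 val events = sqlContext.jsonFile("/path/to/events.json", schema)
 events.registerTempTable("events")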

 Thanks,

 Yin

 On Mon, Oct 20, 2014 at 10:56 PM, tridib tridib.sama...@live.com wrote:

 Hi Spark SQL team,
 I am trying to explore automatic schema detection for JSON documents. I
 have a few questions:
 1. What should the date format be for fields to be detected as the date type?
 2. Is automatic schema inference slower than applying a specific schema?
 3. At the moment I am parsing the JSON myself using a map function and
 creating a SchemaRDD from the parsed JavaRDD. Is there any performance
 impact from not using the built-in jsonFile()?

 Thanks
 Tridib









Spark SQL : sqlContext.jsonFile date type detection and performance

2014-10-20 Thread tridib
Hi Spark SQL team,
I am trying to explore automatic schema detection for JSON documents. I have
a few questions:
1. What should the date format be for fields to be detected as the date type?
2. Is automatic schema inference slower than applying a specific schema?
3. At the moment I am parsing the JSON myself using a map function and
creating a SchemaRDD from the parsed JavaRDD (sketched below). Is there any
performance impact from not using the built-in jsonFile()?
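
A rough sketch of the manual approach in question 3, assuming the Spark
1.1-era Scala API; the field names and values are placeholders and the
real parsing logic is omitted:

import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)

// Explicit schema; the field names are only placeholders.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("created", StringType, nullable = true)))

// Parse each line with your own code (any JSON library) and emit Rows...
val rows = sc.textFile("/path/to/data.json").map { line =>
  // ... real parsing of `line` would go here ...
  Row(1L, "2014-10-20") // placeholder values
}

// ...then turn the RDD[Row] into a SchemaRDD without schema inference.
val schemaRDD = sqlContext.applySchema(rows, schema)
schemaRDD.registerTempTable("events")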

Thanks
Tridib



