Spark SQL: sqlContext.jsonFile date type detection and performance
Any help or comments?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-sqlContext-jsonFile-date-type-detection-and-perforormance-tp16881p16939.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark SQL: sqlContext.jsonFile date type detection and performance
Are there any specific issues you are facing?

Thanks,
Yin

On Tue, Oct 21, 2014 at 4:00 PM, tridib <tridib.sama...@live.com> wrote:
> Any help? or comments?
Re: Spark SQL: sqlContext.jsonFile date type detection and performance
Yes, I am unable to get jsonFile() to detect the date type automatically from the JSON data.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-sqlContext-jsonFile-date-type-detection-and-perforormance-tp16881p16974.html
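A quick way to see the behavior described above (a minimal sketch against the Spark 1.1-era API discussed in this thread; the field name eventDate is a made-up example):

```scala
// Sketch: a date-formatted string in JSON is inferred as StringType,
// not DateType, by the built-in schema inference.
val json = sc.parallelize(Seq("""{"eventDate": "2014-10-21"}"""))
val schemaRDD = sqlContext.jsonRDD(json)
schemaRDD.printSchema()
// printSchema reports eventDate as a string field
```

This is why the CAST workaround in the next message is needed to get a DateType column out of JSON input.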
Re: Spark SQL: sqlContext.jsonFile date type detection and performance
One more thing about question 1: once you get the SchemaRDD from jsonFile/jsonRDD, you can use CAST(columnName AS DATE) in your query to cast the column from StringType to DateType (the string format should be yyyy-[m]m-[d]d, and you need to use a HiveContext). Here is a code snippet that may help:

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val schemaRDD = hiveContext.jsonFile(...)
    schemaRDD.registerTempTable("jsonTable")
    hiveContext.sql("SELECT CAST(columnName AS DATE) FROM jsonTable")

Thanks,
Yin

On Tue, Oct 21, 2014 at 8:00 PM, Yin Huai <huaiyin@gmail.com> wrote:
> Hello Tridib,
>
> I just saw this one.
>
> 1. Right now, jsonFile and jsonRDD do not detect the date type. IntegerType, LongType, DoubleType, DecimalType, StringType, BooleanType, StructType, and ArrayType are detected automatically.
> 2. Inferring the schema passes over the entire dataset once to determine the schema, so you will see a job launched for it. Applying a specific schema to a dataset does not have this cost.
> 3. It is hard to comment without seeing your implementation. For our built-in JSON support, jsonFile and jsonRDD provide a very convenient way to work with JSON datasets through SQL. You do not need to define the schema in advance; Spark SQL automatically creates the SchemaRDD for your dataset, and you can start querying it with SQL simply by registering the returned SchemaRDD as a temp table. Regarding the implementation, we use a high-performance JSON library (Jackson, https://github.com/FasterXML/jackson) to parse JSON records.
>
> Thanks,
> Yin
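Point 2 above (avoiding the inference pass by supplying a schema) can be sketched like this. This is a hedged example against the Spark 1.1-era API; the path and the field names claimDate and amount are hypothetical:

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql._

val hiveContext = new HiveContext(sc)

// Declare the schema up front instead of letting jsonFile infer it.
// claimDate stays a StringType here and is cast to DATE in SQL below.
val schema = StructType(Seq(
  StructField("claimDate", StringType, nullable = true),
  StructField("amount", DoubleType, nullable = true)))

// With an explicit schema, jsonFile does not need the extra pass
// over the dataset that schema inference requires.
val schemaRDD = hiveContext.jsonFile("/path/to/claims.json", schema)
schemaRDD.registerTempTable("claims")
hiveContext.sql("SELECT CAST(claimDate AS DATE), amount FROM claims")
```

The trade-off is the usual one: inference is convenient for exploration, while an explicit schema skips the extra scan and pins down the types you expect.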
Spark SQL: sqlContext.jsonFile date type detection and performance
Hi Spark SQL team,

I am trying to explore automatic schema detection for JSON documents. I have a few questions:

1. What should the date format be for a field to be detected as the date type?
2. Is automatic schema inference slower than applying a specific schema?
3. At the moment I am parsing the JSON myself using a map function and creating a SchemaRDD from the parsed JavaRDD. Is there any performance impact in not using the built-in jsonFile()?

Thanks,
Tridib

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-sqlContext-jsonFile-date-type-detection-and-perforormance-tp16881.html
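The hand-rolled route described in question 3 (parse records yourself, then attach a schema) can be sketched as follows. A minimal sketch assuming the Spark 1.1-era applySchema API, shown in Scala rather than Java for brevity; the input path, record layout, and field names are hypothetical, and a real implementation would use a JSON parser instead of split:

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql._

val hiveContext = new HiveContext(sc)

// Stand-in for your own parsing step: pretend each line is "id,name".
val rowRDD = sc.textFile("/path/to/records.txt").map { line =>
  val parts = line.split(",")
  Row(parts(0).trim.toInt, parts(1).trim)
}

val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)))

// applySchema turns an RDD[Row] plus a schema into a SchemaRDD,
// skipping inference entirely.
val schemaRDD = hiveContext.applySchema(rowRDD, schema)
schemaRDD.registerTempTable("people")
hiveContext.sql("SELECT id, name FROM people")
```

As Yin's reply notes, the built-in jsonFile/jsonRDD path uses Jackson internally, so a custom parser is unlikely to be faster unless it does meaningfully less work per record.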