How to use a symbol for a column in Spark SQL?
I have tried to use the where and filter functions on a SchemaRDD. I built a case class for the tuples/records in the table like this:

    case class Region(num: Int, str1: String, str2: String)

I also successfully created a SchemaRDD:

    scala> val results = sqlContext.sql("select * from region")
    results: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[99] at RDD at SchemaRDD.scala:103
    == Query Plan ==
    == Physical Plan ==
    ExistingRdd [num#0,str1#1,str2#2], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:208

But I cannot use a symbol in the where and filter functions. Here is the log:

    scala> results.where('num === 1)
    <console>:22: error: value === is not a member of Symbol
                  results.where('num === 1)
                                     ^

I don't know why. Any suggestions?

Thanks,
Tim
Re: How to use a symbol for a column in Spark SQL?
Thank you! I'm so stupid... this is the only thing I missed in the tutorial. orz

Thanks,
Tim

2014-12-04 16:49 GMT-06:00 Michael Armbrust <mich...@databricks.com>:

    You need to import sqlContext._

    On Thu, Dec 4, 2014 at 2:26 PM, Tim Chou <timchou@gmail.com> wrote: ...
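For anyone hitting the same error: in the Spark 1.x SchemaRDD API, the implicit conversions that give Scala Symbols the === operator come from the SQLContext. A minimal sketch of the working version (the sample data and table contents are made up for illustration):

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._   // brings the Symbol-based DSL into scope

    case class Region(num: Int, str1: String, str2: String)
    val rdd = sc.parallelize(Seq(Region(1, "a", "b"), Region(5, "c", "d")))
    rdd.registerTempTable("region")   // registerAsTable on pre-1.1 versions

    val results = sqlContext.sql("select * from region")
    results.where('num === 1).collect()   // compiles once the import is in scope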
How to create a new SchemaRDD that is not based on the original SparkPlan?
Hi All,

My question is about the lazy evaluation of SchemaRDDs, I guess. I know lazy evaluation is good; however, I still have this need.

For example, here is the first SchemaRDD, named results (select * from table where num > 1 and num < 4):

    results: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[59] at RDD at SchemaRDD.scala:103
    == Query Plan ==
    == Physical Plan ==
    Filter ((num#0 > 1) && (num#0 < 4))
     ExistingRdd [num#0,str1#1,str2#2], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:208

Then I create a second SchemaRDD from results with: select num, str1 from table

    results1: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[60] at RDD at SchemaRDD.scala:103
    == Query Plan ==
    == Physical Plan ==
    Project [num#0,str1#1]
     Filter ((num#0 > 1) && (num#0 < 4))
      ExistingRdd [num#0,str1#1,str2#2], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:208

Actually, I want the second RDD's plan to be based on results, not on the original table. How can I create a new SchemaRDD whose plan starts from the last RDD?

Thanks,
Tim
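A workaround worth trying (a sketch, not from this thread; the table name "filtered" is made up): the printed plan will still compose down to the base relation, but if you cache the first SchemaRDD and register it as its own table, later queries against that name reuse the materialized rows once the cache is populated, instead of re-running the filter over the base table:

    results.cache()                        // materialized on the first action
    results.registerTempTable("filtered")

    val results1 = sqlContext.sql("select num, str1 from filtered")
    results1.collect()   // reads the cached rows; the filter is not recomputed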
How does Spark SQL traverse the physical tree?
Hi All,

I'm learning the Spark SQL code, and I'm confused about how a SchemaRDD executes each operator. Tracing the code, I found that the toRdd function in QueryExecution is the entry point for running a query. toRdd runs the SparkPlan, which is a tree structure. However, I didn't find any iteration in the execute functions of the individual operators. It looks as if Spark SQL only runs the top node of the tree. I know that conclusion must be wrong, but which code have I missed?

Thanks,
Tim
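The short answer: the traversal is recursive rather than iterative. Each physical operator's execute() calls execute() on its children, so running the root of the SparkPlan tree transitively runs every node below it. A simplified, self-contained sketch of the idea (plain Scala stand-ins, not the real SparkPlan API):

    // Lists stand in for RDD[Row] so the sketch runs without Spark.
    trait Plan {
      def execute(): List[Int]
    }

    // Leaf node: produces rows directly, like ExistingRdd.
    case class Source(rows: List[Int]) extends Plan {
      def execute(): List[Int] = rows
    }

    // Unary node: the "traversal" is this call to child.execute().
    case class FilterOp(pred: Int => Boolean, child: Plan) extends Plan {
      def execute(): List[Int] = child.execute().filter(pred)
    }

    // Executing only the root still evaluates the whole tree:
    val plan = FilterOp(_ > 1, Source(List(1, 2, 3)))
    plan.execute()   // List(2, 3)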
Spark - How can I run MapReduce only on one partition of an RDD?
Hi All,

I use textFile to create an RDD, but I don't want to process the whole dataset in the RDD. For example, maybe I only want to process the data in the 3rd partition of the RDD. How can I do that? Here are the possible solutions I'm considering (see the sketch below):

1. Create multiple RDDs when reading the file.
2. Run the map/reduce functions on a specific partition of the RDD; however, I cannot find an appropriate function for this.

Thank you; waiting for your suggestions.

Best,
Tim
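One standard approach (a sketch; the input path is hypothetical) is mapPartitionsWithIndex, which passes the partition index to your function so you can emit nothing for every partition except the target. Spark still schedules a task per partition, but the skipped tasks finish immediately since their iterators are never consumed:

    // Partition indices are zero-based, so the 3rd partition is index 2.
    val rdd = sc.textFile("hdfs:///path/to/input")   // hypothetical path

    val thirdOnly = rdd.mapPartitionsWithIndex { (idx, iter) =>
      if (idx == 2) iter       // keep the 3rd partition
      else Iterator.empty      // skip everything else
    }
    thirdOnly.count()          // only partition 2 contributes data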
How to add elements into a map?
Here is the code I run in spark-shell:

    val table = sc.textFile(args(1))
    val histMap = collection.mutable.Map[Int, Int]()
    for (x <- table) {
      val tuple = x.split('|')
      histMap.put(tuple(0).toInt, 1)
    }

Why is histMap still empty? Is there something wrong with my code?

Thanks,
Fang
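The likely cause: the for-comprehension over an RDD compiles to a distributed foreach, so each executor mutates its own deserialized copy of histMap, and the driver's map is never updated. A sketch of the usual fix, assuming histMap is meant to hold per-key counts (if you only need each key mapped to 1, drop the reduceByKey and use distinct instead):

    // Build the result with RDD transformations, then collect to the driver.
    val table = sc.textFile(args(1))
    val histMap: Map[Int, Int] = table
      .map(_.split('|'))             // split each line on '|'
      .map(t => (t(0).toInt, 1))     // key on the first field
      .reduceByKey(_ + _)            // count occurrences per key
      .collect()                     // bring (key, count) pairs to the driver
      .toMap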