How to make a symbol for one column in Spark SQL.

2014-12-04 Thread Tim Chou
I have tried to use the where and filter functions on a SchemaRDD.

I have built a case class for the tuples/records in the table like this:
case class Region(num: Int, str1: String, str2: String)

I also successfully created a SchemaRDD:

scala> val results = sqlContext.sql("select * from region")
results: org.apache.spark.sql.SchemaRDD =
SchemaRDD[99] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
ExistingRdd [num#0,str1#1,str2#2], MapPartitionsRDD[4] at mapPartitions at
BasicOperators.scala:208

But I cannot use a symbol in the where and filter functions. Here is the log:

scala> results.where('num === 1)
<console>:22: error: value === is not a member of Symbol
              results.where('num === 1)
                                 ^

I don't know why.
Any suggestions?

Thanks,
Tim


Re: How to make a symbol for one column in Spark SQL.

2014-12-04 Thread Tim Chou
...

Thank you! I'm so stupid... This is the only thing I missed in the
tutorial... orz

Thanks,
Tim

2014-12-04 16:49 GMT-06:00 Michael Armbrust <mich...@databricks.com>:

 You need to import sqlContext._

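A minimal sketch of the fix (Spark 1.x SchemaRDD API; the file name
region.tbl and the pipe-delimited layout are assumptions, not from the
thread). Importing sqlContext._ brings in the implicit conversions that
turn a Scala Symbol like 'num into a column expression, which is what
provides the === method:

import org.apache.spark.sql.SQLContext

case class Region(num: Int, str1: String, str2: String)

val sqlContext = new SQLContext(sc)
import sqlContext._  // enables 'num === 1 and RDD-to-SchemaRDD conversion

// Hypothetical pipe-delimited input file.
val region = sc.textFile("region.tbl")
  .map(_.split('|'))
  .map(t => Region(t(0).toInt, t(1), t(2)))
region.registerTempTable("region")

val results = sqlContext.sql("select * from region")
results.where('num === 1).collect().foreach(println)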



How to create a new SchemaRDD which is not based on the original SparkPlan?

2014-12-03 Thread Tim Chou
Hi All,

My question is about the lazy execution mode of SchemaRDD, I guess. I know
lazy mode is good; however, I still have this need.

For example, here is the first SchemaRDD, named results (select * from
table where num > 1 and num < 4):

results: org.apache.spark.sql.SchemaRDD =
SchemaRDD[59] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Filter ((num#0 > 1) && (num#0 < 4))
 ExistingRdd [num#0,str1#1,str2#2], MapPartitionsRDD[4] at mapPartitions at
basicOperators.scala:208

Then I create the second SchemaRDD from results with: select num, str1 from table

results1: org.apache.spark.sql.SchemaRDD =
SchemaRDD[60] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [num#0,str1#1]
 Filter ((num#0 > 1) && (num#0 < 4))
  ExistingRdd [num#0,str1#1,str2#2], MapPartitionsRDD[4] at mapPartitions
at basicOperators.scala:208

Actually, I want the second RDD's plan to be based on results, not on the
original table.

How can I create a new SchemaRDD whose plan starts from the last RDD?

Thanks,
Tim
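A hedged sketch of one workaround (the table name src is hypothetical;
the columns follow the thread): register and cache the first result as its
own table, so later queries plan against the materialized intermediate
instead of re-deriving it from the base table:

val results = sqlContext.sql("select * from src where num > 1 and num < 4")
results.registerTempTable("filtered")
sqlContext.cacheTable("filtered")  // materializes the intermediate on first use

// This plan now starts from the cached "filtered" table.
val results1 = sqlContext.sql("select num, str1 from filtered")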


How does Spark SQL traverse the physical tree?

2014-11-24 Thread Tim Chou
Hi All,

I'm learning the code of Spark SQL.

I'm confused about how a SchemaRDD executes each operator.

I'm tracing the code. I found that the toRDD() function in QueryExecution
is the entry point for running a query. toRDD runs the SparkPlan, which is
a tree structure.

However, I didn't find any explicit traversal in the execute function of
the individual operators. It seems Spark SQL only runs the top node of
this tree.

I know this conclusion is wrong, but which code have I missed?

Thanks,
Tim
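For reference, the traversal is recursive rather than iterative: each
physical operator's execute() calls child.execute() to get the child's RDD
and then wraps it. A simplified sketch of the pattern (loosely modeled on
the Filter operator in Spark 1.x's basicOperators.scala, not the exact
source):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.expressions.{Expression, Row}
import org.apache.spark.sql.execution.{SparkPlan, UnaryNode}

// Each operator builds its RDD from its child's RDD, so calling execute()
// on the root node recursively materializes the whole tree as chained RDDs.
case class Filter(condition: Expression, child: SparkPlan) extends UnaryNode {
  override def output = child.output
  override def execute(): RDD[Row] =
    child.execute().mapPartitions { iter =>
      iter.filter(row => condition.eval(row).asInstanceOf[Boolean])
    }
}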


Spark- How can I run MapReduce only on one partition in an RDD?

2014-11-13 Thread Tim Chou
Hi All,

I use textFile to create an RDD. However, I don't want to process the whole
dataset in this RDD. For example, maybe I only want to work on the data in
the 3rd partition of the RDD.

How can I do it? Here are some possible solutions I'm thinking of:
1. Create multiple RDDs when reading the file.
2. Run map/reduce functions on a specific partition of the RDD.

However, I cannot find any appropriate function.

Thank you and wait for your suggestions.

Best,
Tim
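A hedged sketch with mapPartitionsWithIndex (the file name data.txt is
hypothetical; partition index 2 is the 3rd partition): the function sees
each partition's index, so you can keep one partition's iterator and emit
nothing for the others:

val rdd = sc.textFile("data.txt")

// Keep only partition 2; every other partition yields no elements.
val third = rdd.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 2) iter else Iterator.empty
}

// Any map/reduce now effectively runs over just that partition.
val totalChars = third.map(_.length).reduce(_ + _)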


How to add elements into map?

2014-11-07 Thread Tim Chou
Here is the code I run in spark-shell:

val table = sc.textFile(args(1))
val histMap = collection.mutable.Map[Int, Int]()
for (x <- table) {
  val tuple = x.split('|')
  histMap.put(tuple(0).toInt, 1)
}

Why is histMap still empty?
Is there something wrong with my code?

Thanks,
Fang
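A hedged explanation and sketch: the loop body runs in a closure on the
executors, so each executor mutates its own copy of histMap and the
driver's map is never updated. Building the result as an RDD aggregation
and collecting it back avoids the shared mutable state (args(1) is kept
from the thread; reduceByKey counts occurrences per key, assuming a
histogram is the intent):

val table = sc.textFile(args(1))

// Aggregate on the cluster, then bring the result back to the driver.
val histMap = table
  .map(line => (line.split('|')(0).toInt, 1))
  .reduceByKey(_ + _)
  .collectAsMap()  // a driver-side Map[Int, Int]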

