Can communication and computation be overlapped in Spark?

2014-05-24 Thread wxhsdp
Hi, all. fetch wait time: Time the task spent waiting for remote shuffle blocks. This only includes the time blocking on shuffle input data. For instance if block B is being fetched while the task is still not finished processing block A, it is not considered to be blocking on
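
To see how much of a task's shuffle read actually failed to overlap with computation, this metric can be logged per task. A minimal sketch, assuming the Spark 1.x listener API (where TaskMetrics.shuffleReadMetrics is an Option); the listener name is made up:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs, for each finished task, the shuffle-read time that blocked the task.
class FetchWaitLogger extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    for {
      metrics <- Option(taskEnd.taskMetrics)   // may be null for failed tasks
      shuffle <- metrics.shuffleReadMetrics
    } println(s"task ${taskEnd.taskInfo.taskId}: fetchWaitTime = ${shuffle.fetchWaitTime} ms")
  }
}

// Register before running jobs:
// sc.addSparkListener(new FetchWaitLogger)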

Issue with the parallelize method in SparkContext

2014-05-24 Thread Wisc Forum
Hi, dear user group: I recently tried to use the parallelize method of SparkContext to slice my original data into small pieces for further handling, something like the below: val partitionedSource = sparkContext.parallelize(seq, sparkPartitionSize) The size of my original testing data is 88
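
One thing to double-check: the second argument to parallelize is the number of slices (partitions), not a per-partition size. A minimal sketch with made-up values, showing how the collection gets sliced:

// The second argument is the number of partitions, not their size.
val seq = 1 to 88
val partitionedSource = sc.parallelize(seq, 4)

// glom() turns each partition into an array, making the slicing visible.
partitionedSource.glom().collect().foreach(p => println(p.mkString(",")))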

Re: Working with Avro Generic Records in the interactive scala shell

2014-05-24 Thread Josh Marcus
Jeremy, Just to be clear, are you assembling a jar with that class compiled (with its dependencies) and including the path to that jar on the command line in an environment variable (e.g. SPARK_CLASSPATH=path ./spark-shell)? --j On Saturday, May 24, 2014, Jeremy Lewi jer...@lewi.us wrote: Hi

Re: Working with Avro Generic Records in the interactive scala shell

2014-05-24 Thread Jeremy Lewi
Hi Josh, Thanks for the help. The class should be on the path on all nodes. Here's what I did: 1) I built a jar from my scala code. 2) I copied that jar to a location on all nodes in my cluster (/usr/local/spark) 3) I edited bin/compute-classpath.sh to add my jar to the class path. 4) I
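
For reference, once the Avro jars are on every node's classpath, reading GenericRecords from the shell usually looks like the sketch below (assumes avro-mapred is also on the classpath; the path and field name are placeholders):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

// Each record arrives as an AvroKey[GenericRecord]; the value side is unused.
val records = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
  AvroKeyInputFormat[GenericRecord]]("hdfs:///path/to/data.avro")

// Extract a plain field immediately, since GenericRecord itself is not serializable.
val names = records.map { case (k, _) => k.datum().get("name").toString }
names.take(5).foreach(println)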

Re: Using Spark to analyze complex JSON

2014-05-24 Thread Michael Armbrust
But going back to your presented pattern, I have a question. Say your data does have a fixed structure, but some of the JSON values are lists. How would you map that to a SchemaRDD? (I didn’t notice any list values in the CandyCrush example.) Take the likes field from my original example:
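
For what it's worth, list values map naturally to array-typed columns when the schema is inferred from case classes. A minimal sketch against the Spark 1.0-era API (createSchemaRDD / registerAsTable); the Profile class and its fields are made up:

import org.apache.spark.sql.SQLContext

case class Profile(name: String, likes: Seq[String])

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

val profiles = sc.parallelize(Seq(
  Profile("alice", Seq("chocolate", "spark")),
  Profile("bob", Seq("avro"))))
profiles.registerAsTable("profiles")

// The likes column is array-typed; each row returns it as a Seq.
sqlContext.sql("SELECT name, likes FROM profiles").collect().foreach(println)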

Re: Using Spark to analyze complex JSON

2014-05-24 Thread Mayur Rustagi
Hi Michael, Is the in-memory columnar store planned as part of SparkSQL? Also, will both the HiveQL and SQL parsers be kept updated? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Sun, May 25, 2014 at 2:44 AM, Michael Armbrust
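
(For context, SQLContext already exposes a columnar in-memory cache through cacheTable. A short sketch, reusing the hypothetical profiles table registered in the previous example:)

// cacheTable stores the table in SparkSQL's compressed, columnar in-memory format.
sqlContext.cacheTable("profiles")
sqlContext.sql("SELECT COUNT(*) FROM profiles").collect()  // served from the columnar cache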

Custom Accumulator: Type Mismatch Error

2014-05-24 Thread Muttineni, Vinay
Hello, I have been trying to implement a custom accumulator, as below: import org.apache.spark._ class VectorNew1(val data: Array[Double]) {} implicit object VectorAP extends AccumulatorParam[VectorNew1] { def zero(v: VectorNew1) = new VectorNew1(new
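
The snippet is cut off; a complete AccumulatorParam for this vector wrapper might look like the sketch below. A type mismatch at sc.accumulator(...) is often just the implicit not being in scope at the call site. The method bodies here are assumptions:

import org.apache.spark.AccumulatorParam

class VectorNew1(val data: Array[Double]) extends Serializable

implicit object VectorAP extends AccumulatorParam[VectorNew1] {
  // Identity element: a zero vector of the same length as the initial value.
  def zero(v: VectorNew1) = new VectorNew1(new Array[Double](v.data.length))

  // Element-wise sum; reuses the left operand's array.
  def addInPlace(a: VectorNew1, b: VectorNew1): VectorNew1 = {
    var i = 0
    while (i < a.data.length) { a.data(i) += b.data(i); i += 1 }
    a
  }
}

// Usage: the implicit VectorAP must be visible where the accumulator is created.
// val acc = sc.accumulator(new VectorNew1(new Array[Double](3)))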