Hi, all
fetch wait time:
* Time the task spent waiting for remote shuffle blocks. This only includes the time
* blocking on shuffle input data. For instance if block B is being fetched while the task is
* still not finished processing block A, it is not considered to be blocking on
Hi, dear user group:
I recently tried to use the parallelize method of SparkContext to slice my original
data into small pieces for further processing. Something like the following:
val partitionedSource = sparkContext.parallelize(seq, sparkPartitionSize)
The size of my original testing data is 88
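For reference, parallelize splits the input sequence into numSlices contiguous ranges of roughly equal size. A minimal sketch of that slicing in plain Scala (no Spark needed; the helper name slicePartitions is mine, not a Spark API):

```scala
// Split a sequence into n roughly equal contiguous slices, mirroring the
// positions-based slicing SparkContext.parallelize uses for its partitions.
def slicePartitions[T](seq: Seq[T], numSlices: Int): Seq[Seq[T]] = {
  require(numSlices > 0, "numSlices must be positive")
  val n = seq.length
  (0 until numSlices).map { i =>
    val start = (i * n) / numSlices       // integer division spreads any remainder
    val end   = ((i + 1) * n) / numSlices
    seq.slice(start, end)
  }
}

// 10 elements over 3 slices gives sizes 3, 3, 4
val parts = slicePartitions(1 to 10, 3)
```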
Jeremy,
Just to be clear, are you assembling a jar with that class compiled (with
its dependencies) and including the path to that jar on the command line in
an environment variable (e.g. SPARK_CLASSPATH=path ./spark-shell)?
--j
On Saturday, May 24, 2014, Jeremy Lewi jer...@lewi.us wrote:
Hi Josh,
Thanks for the help.
The class should be on the path on all nodes. Here's what I did:
1) I built a jar from my scala code.
2) I copied that jar to a location on all nodes in my cluster
(/usr/local/spark)
3) I edited bin/compute-classpath.sh to add my jar to the class path.
4) I
But going back to your presented pattern, I have a question. Say your data
does have a fixed structure, but some of the JSON values are lists. How
would you map that to a SchemaRDD? (I didn’t notice any list values in the
CandyCrush example.) Take the likes field from my original example:
Hi Michael,
Is the in-memory columnar store planned as part of SparkSQL ?
Also, will both the HiveQL and SQL parsers be kept updated?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Sun, May 25, 2014 at 2:44 AM, Michael Armbrust
Hello,
I have been trying to implement a custom accumulator as below:
import org.apache.spark._
class VectorNew1(val data: Array[Double]) {}
implicit object VectorAP extends AccumulatorParam[VectorNew1] {
def zero(v: VectorNew1) = new VectorNew1(new
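For completeness, AccumulatorParam requires addInPlace alongside zero. A self-contained sketch of the full contract (the trait below is a stand-in for Spark's org.apache.spark.AccumulatorParam so it compiles without a Spark dependency, and the element-wise-sum semantics are my assumption about the intended behavior):

```scala
// Stand-in for org.apache.spark.AccumulatorParam, just to show the contract.
trait AccumulatorParamLike[T] {
  def zero(initialValue: T): T
  def addInPlace(r1: T, r2: T): T
}

class VectorNew1(val data: Array[Double])

object VectorAP extends AccumulatorParamLike[VectorNew1] {
  // A zero vector of the same length as the initial value.
  def zero(v: VectorNew1): VectorNew1 =
    new VectorNew1(new Array[Double](v.data.length))

  // Element-wise sum, accumulated into r1's buffer as the name suggests.
  def addInPlace(r1: VectorNew1, r2: VectorNew1): VectorNew1 = {
    for (i <- r1.data.indices) r1.data(i) += r2.data(i)
    r1
  }
}
```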