Hi,
I have the following use case: assume my data is in e.g. HDFS, as a single
SequenceFile containing rows of CSV entries that I can split to build an RDD
of arrays of (smaller) strings.
What I want to do is build two RDDs, where the first RDD contains a subset of
columns and t
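For what it's worth, the column-projection step could be sketched like this (a minimal sketch: the column indices and helper names are hypothetical, and plain Python lists stand in for the RDD so it runs anywhere; with Spark you would pass the same functions to rdd.map):

```python
# Sketch of splitting CSV rows into two column subsets.
# Plain Python lists stand in for the RDD; in Spark the same
# functions would be applied with rdd.map(...).

def parse_row(line):
    """Split one CSV line into an array of strings."""
    return line.split(",")

def project(row, indices):
    """Keep only the columns at the given (hypothetical) indices."""
    return [row[i] for i in indices]

lines = ["a,1,x,9", "b,2,y,8"]           # stand-in for the file's rows
rows = [parse_row(l) for l in lines]     # rdd = sc.textFile(...).map(parse_row)

first = [project(r, [0, 1]) for r in rows]   # rdd.map(lambda r: project(r, [0, 1]))
second = [project(r, [2, 3]) for r in rows]  # rdd.map(lambda r: project(r, [2, 3]))

print(first)   # [['a', '1'], ['b', '2']]
print(second)  # [['x', '9'], ['y', '8']]
```

Note that mapping one parsed RDD twice avoids re-reading and re-splitting the file for each projection.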
Hi,
I am trying to upgrade from Spark v0.9.1 to v1.0.0 and am running into some
weird behavior.
When, in pyspark, I invoke
sc.textFile("hdfs://hadoop-ha01:/user/x/events_2.1").take(1) the
call crashes with the below stack trace.
The file resides in Hadoop 2.2; it is large event data,
A few questions about the resilience of the client side of Spark.
What would happen if the client process crashes? Can it reconstruct its state?
Suppose I just want to serialize it and reload it later; is this possible?
A more advanced use case: is there a way to move a SparkContext between
jvms/mac
I would check the DNS settings.
Akka seems to pick up its configuration from the FQDN on my system.
Sagi
From: Hahn Jiang [mailto:hahn.jiang@gmail.com]
Sent: Friday, April 11, 2014 10:56 AM
To: user
Subject: Error when I use spark-streaming
Hi all,
When I run Spark Streaming using NetworkWordCount in