Hi TD,
"You can always run two jobs on the same cached RDD, and they can run in
parallel (assuming you launch the 2 jobs from two different threads)"
Is this a correct way to launch jobs from two different threads?
val threadA = new Thread(new Runnable {
  def run() {
    for (i <- 0 until e
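For what it's worth, the two-thread pattern the quote describes can be sketched in plain Scala (all names hypothetical; in Spark each run() body would invoke an action such as rdd.count() on the shared cached RDD, which the simple sums stand in for here):

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Sketch of launching two "jobs" from two threads. In Spark, each run()
// would submit an action on the same cached RDD; plain sums stand in here.
object TwoJobs {
  def runBoth(): Int = {
    val results = new ConcurrentLinkedQueue[Long]()
    val threadA = new Thread(new Runnable {
      def run(): Unit = results.add((1L to 100L).sum)  // stand-in for job A, e.g. rdd.count()
    })
    val threadB = new Thread(new Runnable {
      def run(): Unit = results.add((1L to 50L).sum)   // stand-in for job B
    })
    threadA.start(); threadB.start()  // both jobs submitted concurrently
    threadA.join(); threadB.join()    // wait for both to finish
    results.size                      // number of completed jobs
  }

  def main(args: Array[String]): Unit =
    println(runBoth())  // prints 2
}
```

The key points are that each thread calls start() (not run()) and that the main thread joins both before reading results.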
Hi,
I just wrote an application that intends to submit its actions (jobs) via
independent threads, keeping in view the point: "Second, within each
Spark application, multiple “jobs” (Spark actions) may be running
concurrently if they were submitted by different threads", mentioned in:
https://spa
Hi,
I run into a Task not Serializable exception with the following code. When I
remove the threads and run, it works, but with threads I run into the Task
not Serializable exception.
object SparkKart extends Serializable {
  def parseVector(line: String): Vector[Double] = {
    DenseVector(line.split('
I could trace where the problem is: if I run without any threads, it works
fine. When I allocate threads, I run into the NotSerializable problem. But I
need to have threads in my code.
Any help please!!!
This is my code:
object SparkKart {
  def parseVector(line: String): Vector[Double] = {
    Dens
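A common cause of this exception is the closure handed to the RDD operation capturing a non-serializable enclosing instance (here, plausibly the threads' outer object). A plain-Scala sketch of the mechanism, under that assumption (all names hypothetical; no Spark needed to demonstrate it):

```scala
import java.io._

// Spark must serialize every closure it ships to executors. A closure that
// reads a field of its enclosing object captures that whole object; copying
// the needed value into a local val first avoids dragging the outer object in.
object ClosureDemo {
  class Outer {                                   // NOT Serializable, like a thread wrapper
    val factor = 3
    def badClosure: Int => Int = x => x * factor  // reads this.factor -> captures `this`
    def goodClosure: Int => Int = {
      val f = factor                              // copy to a local first
      x => x * f                                  // captures only an Int
    }
  }

  // Returns true iff the object survives Java serialization.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val o = new Outer
    println(serializes(o.badClosure))   // false: drags in the non-serializable Outer
    println(serializes(o.goodClosure))  // true: only an Int is captured
  }
}
```

The same fix applies inside a Runnable: bind whatever the RDD closure needs to local vals before calling the Spark action.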
Hi,
I have a file containing data in the following way:
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
Now I do the following:
val kPoints = data.takeSample(withReplacement = false, 4, 42).toArray
val thread1 = new Thread(new Runnable {
  def run() {
    v
Are you replicating any RDDs?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-Filesystem-closed-tp20150p21749.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi,
I have an HDFS file of size 598MB. I create an RDD over this file and cache
it in RAM on a 7-node cluster with 2G RAM each. I find that each partition
gets replicated three or even four times in the cluster, even though I don't
specify any replication in code. Total partitions are 5 for the RDD created, but cached partit
Hi,
My Spark cluster contains a mix of machines: Pentium 4, dual-core, and
quad-core. I am trying to run a character frequency count application. The
application contains several threads, each submitting a job (action) that
counts the frequency of a single character. But my problem is, I get
dif
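A plain-Scala sketch of the per-character-thread pattern described above (names hypothetical; in Spark each run() would instead submit an action such as lines.flatMap(_.toSeq).filter(_ == c).count() on a shared cached RDD):

```scala
import java.util.concurrent.ConcurrentHashMap

// One thread per character; each "job" counts that character's frequency.
object CharFreq {
  def countAll(text: String, chars: Seq[Char]): Map[Char, Long] = {
    val freq = new ConcurrentHashMap[Char, Long]()
    val threads = chars.map { c =>
      new Thread(new Runnable {
        // Stand-in for the Spark action each thread would submit.
        def run(): Unit = freq.put(c, text.count(_ == c).toLong)
      })
    }
    threads.foreach(_.start())  // submit all jobs concurrently
    threads.foreach(_.join())   // wait for every job to finish
    chars.map(c => c -> freq.get(c)).toMap
  }

  def main(args: Array[String]): Unit =
    println(countAll("abracadabra", Seq('a', 'b')))
}
```

On heterogeneous hardware the individual jobs will finish in different orders and take different times, but after join() the aggregated counts are deterministic.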
Hi,
Can someone please suggest a real-life application implemented in Spark
(e.g. gene sequencing) along the lines of the code below. Basically, the
application should have jobs submitted via as many threads as possible. I
need this kind of Spark application for benchmarking.
val threadA
Hi,
I have a doubt: assume that an RDD is stored across multiple nodes and
one of the nodes fails, so a partition is lost. Now, I know that when this
node is back, the lineage is used to recompute that lost
partition alone.
1) How does it get the source data (original data be
Hi,
I keep facing this error when I run my application:
java.io.IOException: Connection from s1/- closed +details
java.io.IOException: Connection from s1/:43741 closed
at
org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:9
When I increase the executor memory (spark.executor.memory), it runs
smoothly without any errors.
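For reference, a sketch of passing the larger executor heap at submit time rather than hard-coding it (the class name, jar, and 4g value are hypothetical placeholders):

```shell
# Hypothetical values: raise the executor heap at submit time
./bin/spark-submit \
  --class org.example.MyApp \
  --conf spark.executor.memory=4g \
  myapp.jar
```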
On Sat, Jan 24, 2015 at 9:29 PM, Rapelly Kartheek
wrote:
> Hi,
> While running spark application, I get the following Exception leading to
> several failed stages.
>
> Exception in thread "Thread-46" org.apache.sp
Hi,
While running spark application, I get the following Exception leading to
several failed stages.
Exception in thread "Thread-46" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most
recent failure: Lost task 0.3 in stage 11.0 (TID 262, s
-- Forwarded message --
From: Rapelly Kartheek
Date: Mon, Jan 19, 2015 at 3:03 PM
Subject: UnknownhostException : home
To: "user@spark.apache.org"
Hi,
I get the following exception when I run my application:
karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class
org.apache.
Hi,
This is what I am trying to do:
karthik@s4:~/spark-1.2.0$ SPARK_HADOOP_VERSION=2.3.0 sbt/sbt clean
Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
[info] Loading project definition from
/home/karthik/spark-1.2.0/project/project
C
The problem is that my network cannot access github.com for cloning
some dependencies, as GitHub is blocked in India. What are other
possible workarounds for this problem?
Thank you!
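Two commonly suggested workarounds, sketched below; both addresses are hypothetical placeholders, not real endpoints:

```shell
# Route git's HTTPS traffic through a proxy that can reach github.com:
git config --global http.proxy http://proxy.example.com:8080

# Or rewrite the blocked host to a reachable mirror of the same repos:
git config --global url."https://mirror.example.com/".insteadOf "https://github.com/"
```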
On Sun, Jan 4, 2015 at 9:45 PM, Rapelly Kartheek
wrote:
> Hi,
>
> I get the following error when I build sp
Hi,
I get the following error when I build spark-1.2.0 using sbt:
[error] Nonzero exit code (128): git clone
https://github.com/ScrapCodes/sbt-pom-reader.git
/home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader
[error] Use 'last' for the full log.
Any help please?
Thanks
--
V
Hi Deng,
Thank you. That works perfectly:)
Regards
Karthik.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-access-application-name-in-the-spark-framework-code-tp19719p19723.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Does the SparkContext exist when this part (AskDriverWithReply()) of the
scheduler code gets executed?
On Sun, Oct 12, 2014 at 1:54 PM, rapelly kartheek
wrote:
> Hi Sean,
> I tried even with sc as: sc.parallelize(data). But. I get the error: value
> sc not found.
>
> On Sun, Oct 12, 2014 at 1:47 PM
Hi Sean,
I tried even with sc as: sc.parallelize(data). But I get the error: value
sc not found.
On Sun, Oct 12, 2014 at 1:47 PM, sowen [via Apache Spark User List] <
ml-node+s1001560n16233...@n3.nabble.com> wrote:
> It is a method of the class, not a static method of the object. Since a
> Spark
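The distinction in the quoted answer, that parallelize is an instance method of SparkContext rather than a static member, can be illustrated with a plain-Scala analogy (class and method names hypothetical):

```scala
// parallelize lives on a SparkContext *instance*: without a constructed
// `sc` in scope, the name cannot resolve. Plain-Scala analogy:
class Ctx {
  def parallelizeLike(data: Seq[Int]): Seq[Int] = data  // instance method: needs `new Ctx`
}
object Ctx {
  def staticLike(): String = "no instance needed"       // object (static-like) method
}

object Demo {
  def main(args: Array[String]): Unit = {
    val sc = new Ctx                               // construct the instance first
    println(sc.parallelizeLike(Seq(1, 2, 3)).sum)  // prints 6
    println(Ctx.staticLike())
    // Ctx.parallelizeLike(Seq(1))  // would not compile: not a member of object Ctx
  }
}
```

In Spark the analogue of `new Ctx` is constructing (or being handed) a SparkContext; "value sc not found" simply means no such instance is in scope at that point.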
When I see the storage details of the RDD in the web UI, I find that each
block is replicated twice, and not on a single node; all the nodes in the
cluster are hosting some block or the other.
Why this difference? The trace of the replicate() method shows only one
node, but the web UI shows multiple nod
Thank you yuanbosoft.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-tp13343p13444.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Thank you Raymond and Tobias.
Yeah, I am very clear about what I was asking; I was talking about the
"replicated" RDD only. Now that I've got my understanding about jobs and
applications validated, I wanted to know if we can replicate an RDD and run
two jobs (that need the same RDD) of an application in par
Thank you Andrew for the updated link.
regards
Karthik
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Scheduling-in-spark-tp9035p9717.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Thank you so much for the link, Sujeet.
regards
Karthik
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Scheduling-in-spark-tp9035p9716.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.