Does fair scheduling preempt running tasks?

2014-02-11 Thread Mingyu Kim
Does fair scheduling in Spark (http://spark.incubator.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application) preempt running tasks if a job with higher priority is submitted? If not, is this part of the plan at some point? Thanks! Mingyu
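
For context, a minimal sketch of fair scheduling within one application, assuming Spark 0.9-era APIs; the master URL and pool name ("highPriority") are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://master:7077") // hypothetical master URL
      .setAppName("fair-demo")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Jobs submitted from the current thread are tagged with this pool.
    sc.setLocalProperty("spark.scheduler.pool", "highPriority")
    sc.parallelize(1 to 1000).count()

Note that this governs how free resources are shared among concurrently running jobs; the docs as of this thread do not describe preemption of already-running tasks, which is exactly what the question asks about.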

Clean up app metadata on worker nodes

2014-02-05 Thread Mingyu Kim
After creating a lot of Spark connections, work/app-* folders on worker nodes keep getting created without any clean-up being done. This becomes a particular problem when the Spark driver programs ship jars or files. Is there any way to garbage collect these without manually deleting them? Thanks
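
For later readers: newer Spark releases added worker-side cleanup settings. A sketch for spark-env.sh, assuming a version that supports the spark.worker.cleanup.* properties (they configure the standalone worker daemon, not the application):

    # spark-env.sh -- assumes a Spark release that supports spark.worker.cleanup.*;
    # the properties are read by the worker daemon at startup.
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"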

Row order of RDDs

2014-01-29 Thread Mingyu Kim
Here's my understanding of row order guarantees by RDD in the context of limit() and collect(). Can someone confirm this? * sparkContext.parallelize(myList) returns an RDD that may have a different row order than myList. * Every RDD loaded with the same file in HDFS (e.g. sparkContext.textFile("hdf
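
One way to probe these guarantees empirically (a minimal sketch in local mode; not an authoritative answer to the question):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits for sortByKey/keys

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("order-check"))
    val myList = Seq(3, 1, 2)

    // Does a parallelize/collect round trip preserve the driver-side order?
    val roundTripped = sc.parallelize(myList).collect().toSeq
    println(s"same order as myList? ${roundTripped == myList}")

    // After an explicit sort, collect() returns rows in the sorted order.
    val sorted = sc.parallelize(myList).map(x => (x, ())).sortByKey().keys.collect()
    println(sorted.mkString(", ")) // 1, 2, 3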

Re: Suggestion for ec2 script

2014-01-24 Thread Mingyu Kim
solution might be to create your own setup script to run on the instances after "start". Matei On Jan 24, 2014, at 2:19 PM, Mingyu Kim wrote: > Hi all, > > I found it confusing that "./spark-ec2 start" actually reinstalls the cluster, > which ends up wiping out all the configur

Suggestion for ec2 script

2014-01-24 Thread Mingyu Kim
Hi all, I found it confusing that "./spark-ec2 start" actually reinstalls the cluster, which ends up wiping out all the configurations. How about renaming "start" to "install" and adding a real light-weight "start" for frequently starting and stopping ec2 instances, mostly for cost reasons? The light-

Re: Is SparkContext.stop() optional or required?

2014-01-23 Thread Mingyu Kim
Matei On Jan 23, 2014, at 11:16 AM, Mingyu Kim wrote: > Hi all, > > How important is it to call stop() when the process which started the > SparkContext is dying anyways? Will I see resource leaks if I don't? > > Mingyu

Is SparkContext.stop() optional or required?

2014-01-23 Thread Mingyu Kim
Hi all, How important is it to call stop() when the process which started the SparkContext is dying anyways? Will I see resource leaks if I don't? Mingyu
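
For what it's worth, a defensive pattern that sidesteps the question is to stop the context explicitly on all exit paths; a minimal sketch:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("stop-demo"))
    try {
      val total = sc.parallelize(1 to 100).reduce(_ + _)
      println(total)
    } finally {
      sc.stop() // release executors and shut down cleanly even if the job throws
    }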

How to clean up jars on worker nodes

2014-01-21 Thread Mingyu Kim
Hi all, I'd like the added jars on worker nodes (i.e. SparkContext.addJar()) to be cleaned up on tear down. However, SparkContext.stop() doesn't seem to delete them. What would be the best way to clear them? Or, is there an easy way to add this functionality? Mingyu

Gathering exception stack trace

2014-01-20 Thread Mingyu Kim
Hi all, I'm having a hard time finding ways to report exceptions that happen during computation to the end user of a Spark system without having them ssh into the worker nodes or access the Spark UI. For example, if some exception happens in the code that runs on worker nodes (e.g. IllegalSt
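
One workaround, assuming you control the closures: catch exceptions inside the task and return them to the driver as ordinary data rather than task failures. A sketch (all names illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.util.{Failure, Success, Try}

    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("error-report"))
    val input = sc.parallelize(Seq("1", "2", "oops", "4"))

    // Failures become values, so collect() delivers them to the driver.
    val results = input.map(s => (s, Try(s.toInt))).collect()

    results.foreach {
      case (s, Success(n)) => println(s"$s -> $n")
      case (s, Failure(e)) => println(s"$s failed: $e") // e carries the full stack trace
    }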

master and scala-2.10 merge

2013-11-30 Thread Mingyu Kim
Hi, The scala-2.10 branch seems to have fallen out of sync with master. Can I request a merge with master? I would especially like to have the "job group" changes (https://github.com/apache/incubator-spark/pull/29 and https://github.com/apache/incubator-spark/pull/74). Also, is there any timeline

Re: Multiple SparkContexts in one JVM

2013-11-20 Thread Mingyu Kim
>> However, if you are just after spark query concurrency, spark 0.8 seems to be >> supporting concurrent (reentrant) requests to the same session >> (SparkContext). One should also be able to use the FAIR scheduler in this case, it >> seems (at least that's what I request). So

Re: Job cancellation

2013-11-20 Thread Mingyu Kim
https://github.com/apache/incubator-spark/pull/190. On Wed, Nov 20, 2013 at 3:39 AM, Mingyu Kim wrote: > Hi all, > > Cancellation seems to be supported at the application level. In other words, you > can call stop() on your instance of SparkContext in order to stop the > computation associated with the Spark

Job cancellation

2013-11-20 Thread Mingyu Kim
Hi all, Cancellation seems to be supported at the application level. In other words, you can call stop() on your instance of SparkContext in order to stop the computation associated with the SparkContext. Is there any way to cancel a job? (To be clear, a job is "a parallel computation consisting of mult
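
A sketch of per-job cancellation using setJobGroup/cancelJobGroup (the "job group" changes mentioned in the scala-2.10 merge thread above), assuming a Spark version that includes those methods:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("cancel-demo"))

    val worker = new Thread(new Runnable {
      def run() {
        // The group id is thread-local, so set it in the thread that submits the job.
        sc.setJobGroup("my-group", "long-running query")
        try {
          sc.parallelize(1 to 10).map { i => Thread.sleep(10000); i }.count()
        } catch {
          case e: Exception => println("job ended early: " + e.getMessage)
        }
      }
    })
    worker.start()

    Thread.sleep(2000) // let the job get going, then cancel everything in the group
    sc.cancelJobGroup("my-group")
    worker.join()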

Multiple SparkContexts in one JVM

2013-11-20 Thread Mingyu Kim
Hi all, I've been searching to find out the current status of multiple-SparkContext support in one JVM. I found https://groups.google.com/forum/#!topic/spark-developers/GLx8yunSj0A and https://groups.google.com/forum/#!topic/spark-users/cOYP96I668I. According to the threads, I should be able t
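
While multi-context support was unsettled, a common workaround was to share one SparkContext across threads; a minimal sketch:

    import org.apache.spark.{SparkConf, SparkContext}

    // Job submission on a single SparkContext is thread-safe.
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("shared-ctx"))

    val threads = (1 to 3).map { i =>
      new Thread(new Runnable {
        def run() {
          val n = sc.parallelize(1 to 1000 * i).count()
          println(s"thread $i counted $n rows")
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    sc.stop()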

Re: How to exclude a library from "sbt assembly"

2013-10-30 Thread Mingyu Kim
it seems >you can add the following into extraAssemblySettings: > >assemblyOption in assembly ~= { _.copy(includeScala = false) } > >Matei > >On O

How to exclude a library from "sbt assembly"

2013-10-30 Thread Mingyu Kim
Hi, In order to work around the library dependency problem, I'd like to build the spark jar such that it doesn't contain certain libraries. I will import the libraries separately and have them available at runtime. More specifically, I'd like to remove scala-2.9.3 out of the spark jar built by "sb
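
For reference, Matei's setting from the reply above, expanded into a fuller build fragment (a sketch assuming sbt-assembly 0.9.x-era keys):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.2")

    // build.sbt (or the extraAssemblySettings block in the Spark build)
    import AssemblyKeys._

    assemblySettings

    // Leave the Scala library out of the fat jar; provide it at runtime instead.
    assemblyOption in assembly ~= { _.copy(includeScala = false) }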

Re: Spark dependency library causing problems with conflicting versions at import

2013-10-08 Thread Mingyu Kim
Thanks for the response! I'll try out the 2.10 branch. That seems to be the best bet for now. Btw, how does updating the maven file do the private namespacing? We've been trying out jarjar (https://code.google.com/p/jarjar/), but as you mentioned, reflection has been biting us painfully so far. I'm no

Spark dependency library causing problems with conflicting versions at import

2013-10-07 Thread Mingyu Kim
Hi all, I'm trying to use spark in our existing code base. However, a lot of spark dependencies are not updated to the latest versions and they conflict with our versions of the libraries. Most notably scala-2.9.2 and scala-2.10.1. Have people run into these problems before? How did you work aroun

Re: Sort order of RDD rows

2013-10-03 Thread Mingyu Kim
't is when you change the RDD's partitioner, e.g. by doing sortByKey or groupByKey. It would definitely be good to document this more formally. Matei On Oct 3, 2013, at 3:33 PM, Mingyu Kim wrote: > Hi all, > > Is the sort order guaranteed if you apply operations like map(), f

Sort order of RDD rows

2013-10-03 Thread Mingyu Kim
Hi all, Is the sort order guaranteed if you apply operations like map(), filter() or distinct() after a sort in a distributed setting (run on a cluster of machines backed by HDFS)? In other words, does rdd.sortByKey().map() have the same sort order as rdd.sortByKey()? If so, is it documented somewhe
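
A small experiment illustrating the distinction drawn in the reply above (order survives map() after sortByKey(), but partitioner-changing operations make no guarantee); a sketch in local mode:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sort-order"))
    val rdd = sc.parallelize(Seq(3 -> "c", 1 -> "a", 2 -> "b"))

    // map() does not change the partitioner, so the sorted order is kept.
    println(rdd.sortByKey().map(_._1).collect().mkString(", "))         // 1, 2, 3

    // groupByKey() repartitions, so the output order is not guaranteed.
    println(rdd.sortByKey().groupByKey().keys.collect().mkString(", "))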