Java/Spark Library for interacting with Spark API
Hi,

Does anyone know of a Java/Scala library (not simply an HTTP library) for interacting with Spark through its REST/HTTP API? My "problem" is that interacting through REST involves a lot of work mapping the JSON to sensible Spark/Scala objects. As a simple example (not a requirement, just an illustration), I hope there is a library that allows something like this:

sparkHost("10.0.01").getApplications().first().getJobs().first().status

More broadly: is the REST API the only way to retrieve information from Spark from a different (JVM) process?

Regards, Hans

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Java-Spark-Library-for-interacting-with-Spark-API-tp26353.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
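Lacking such a library, the monitoring REST API (exposed under /api/v1 on the driver UI since Spark 1.4) can be consumed directly. A minimal hand-rolled sketch follows; the host/port and the naive regex-based field extraction are assumptions for illustration (a real client would map the JSON with a library such as Jackson or json4s):

```scala
// Sketch: polling Spark's monitoring REST API (/api/v1) from another JVM process.
// The driver UI address is a placeholder; the regex extraction stands in for
// proper JSON-to-object mapping, which is exactly the tedious part being asked about.
import scala.io.Source

object SparkRestSketch {
  val base = "http://10.0.0.1:4040/api/v1" // hypothetical driver UI address

  def get(path: String): String =
    Source.fromURL(base + path).mkString

  def main(args: Array[String]): Unit = {
    val apps = get("/applications")            // JSON array of applications
    // Crude extraction of the first application id.
    val appId = """"id"\s*:\s*"([^"]+)"""".r
      .findFirstMatchIn(apps)
      .map(_.group(1))
    appId.foreach { id =>
      println(get(s"/applications/$id/jobs"))  // each job carries a "status" field
    }
  }
}
```

This requires a live Spark application, so it is a sketch of the endpoint shape rather than something runnable standalone.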
Re: CREATE TABLE ignores database when using PARQUET option
I'm having the same problem. Did you solve this?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CREATE-TABLE-ignores-database-when-using-PARQUET-option-tp22824p24679.html
Scheduler delay vs. Getting result time
Hi,

In the Spark UI, under "Show additional metrics", there are two extra metrics you can show: 1) Scheduler delay and 2) Getting result time. When hovering over "Scheduler delay" it says (among other things): "…time to send task result from executor…". When hovering over "Getting result time": "Time that the driver spends fetching task results from workers." What is the difference between the two?

In my case I'm benchmarking with some sleep commands and returning some big arrays per task, to emulate execution time and network communication respectively. I can't see any "Getting result time" increases; they are simply 0 ms. I'm using a 'collect' command and can see the synthetic result arrays when I use a spark-shell.

Regards, Hans

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scheduler-delay-vs-Getting-result-time-tp23752.html
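The synthetic workload described above can be sketched as follows in a spark-shell; the task count, sleep duration, and array size are arbitrary assumptions:

```scala
// Sketch of the benchmark described above: each task sleeps to emulate
// compute time and returns a large array to emulate result traffic.
// Sizes/durations are arbitrary; `sc` is the spark-shell's SparkContext.
val numTasks = 16
val resultKB = 1024

val results = sc.parallelize(1 to numTasks, numTasks).map { i =>
  Thread.sleep(2000)                     // emulated execution time
  Array.fill(resultKB * 1024)(i.toByte)  // emulated large per-task result
}.collect()                              // driver fetches all task results
```

One possible explanation for the 0 ms readings, offered as a hypothesis: small task results are shipped back inline with the task status update, and only results above a size threshold are fetched separately by the driver, so "Getting result time" may stay at zero unless the per-task result is large enough to take the indirect path.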
Spark shell crumbles after memory is full
I'm running a query from the Big Data Benchmark, query 1B to be precise. When running this with Spark (1.3.1) + Mesos (0.21) in coarse-grained mode with 5 Mesos slaves, through a spark-shell, all is well. However, rerunning the query a few times:

scala> sqlContext.sql("SELECT pageURL, pageRank FROM rankings WHERE pageRank > 100").collect()

builds up loads of memory for the spark-shell process, up to the point that 19 GB (spark.driver.memory=30GB) is full, and then the same collect of the above query goes from approx. 10 s to 40+ s with obvious stalls (garbage collection). Am I doing something wrong? Why isn't Spark releasing the results' memory? I'm not saving them anywhere by using .collect, am I? I'm loading in the following file and then executing its _loadRankings_ method: http://pastebin.com/rzJmWDxJ

Hope someone can clarify this. PS: Java 1.7.0 is used; if more environment info is needed, please let me know.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-shell-crumbles-after-memory-is-full-tp23533.html
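One thing worth checking, as a hypothesis: the Scala REPL binds every evaluated expression to a fresh resN value, so each collected array stays reachable from the shell session and can never be garbage-collected. Retaining only a small summary instead keeps the driver heap flat; a sketch:

```scala
// In spark-shell, each expression result is bound to res0, res1, ... and thus
// stays reachable, so repeated collects accumulate on the driver heap.
// Binding only a small summary (here: the row count) lets the full array be GC'd.
val n = sqlContext.sql(
  "SELECT pageURL, pageRank FROM rankings WHERE pageRank > 100"
).collect().length   // the collect still happens; only the count is retained

println(s"rows: $n")
```

If measuring only execution time, `.count()` instead of `.collect()` avoids shipping the rows to the driver entirely.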
Spark on Mesos fine-grained - has one core less per executor
I'm doing a performance analysis of Spark on Mesos, and I can see that the coarse-grained backend simply launches tasks in waves the size of the number of cores available. But in fine-grained mode, it seems the Mesos executor takes 1 core for itself (so -1 core per Mesos slave). Shouldn't fine- and coarse-grained mode have the same behaviour?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Mesos-fine-grained-has-one-core-less-per-executor-tp23273.html
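For reference, the two backends being compared are selected via the spark.mesos.coarse property; a minimal sketch, where the master URL is a placeholder:

```shell
# Select the Mesos scheduling backend in Spark 1.x (master URL is a placeholder).
spark-shell --master mesos://10.0.0.1:5050 --conf spark.mesos.coarse=true   # coarse-grained
spark-shell --master mesos://10.0.0.1:5050 --conf spark.mesos.coarse=false  # fine-grained (1.x default)
```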
Re: Can't run spark-submit with an application jar on a Mesos cluster
Those are only the logs of the slaves at the Mesos level. I'm not sure from your reply whether you can ssh into a specific slave or not; if you can, you should look at the actual output of the application (Spark in this case) on a slave, e.g. in:

/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948/std{err,out}

The actual UUIDs and run number (in this example '4') in the path can differ from slave node to slave node. Look into those stderr and stdout files and you'll probably have your answer as to why it is failing.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22319.html
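Rather than guessing the UUIDs, the files can be located mechanically; a sketch, assuming the default Mesos work directory (/tmp/mesos, adjustable via the slave's --work_dir flag):

```shell
# Locate all executor stderr/stdout files under the Mesos slave work directory.
# /tmp/mesos is the default work_dir; adjust if the slave was started with --work_dir.
find /tmp/mesos/slaves -type f \( -name stderr -o -name stdout \)

# Then inspect the most recently written stderr first:
find /tmp/mesos/slaves -type f -name stderr | xargs ls -t | head -1 | xargs tail -n 50
```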
Re: Can't run spark-submit with an application jar on a Mesos cluster
Hi,

What do the Mesos slave logs say? Usually this gives a clear-cut error. They are probably local on a slave node. I'm not sure about your config, so I can't point you to a specific path; it might look something like:

/???/mesos/slaves/20150213-092641-84118794-5050-14978-S0/frameworks/20150329-232522-84118794-5050-18181-/executors/5/runs/latest/stderr

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22280.html
Recreating the Mesos/Spark paper's experiments
Hi all,

For my master's thesis I will be characterising the performance of two-level schedulers like Mesos, and after reading the paper https://www.cs.berkeley.edu/~alig/papers/mesos.pdf, where Spark is also introduced, I am wondering how some experiments and results came about. If this is not the place to ask these questions, or someone knows better places, please let me know.

I am wondering whether the experiment would show the same results with the current release of Spark, because in the macro-benchmarks (Fig. 5c) we can see 4 instances (though the text talks of 5 instances) of Spark applications being run. During one instance Spark seems to grow elastically, especially between [0,200] and [900,1100]. This would already be problematic to recreate in current Spark on Mesos, because once an application context starts, it 1) allocates all available nodes in the cluster and does not scale up or down during that application's lifetime, in coarse-grained mode, or 2) allocates all memory and does not release it, though it scales up and down with regard to CPUs, in fine-grained mode. Even fine-grained mode would not work well if there are other frameworks that need a lot of memory, because they simply wouldn't be able to allocate it: even during idle times of a Spark application, the cluster's memory is taken. Of course we could limit the memory usage, but this defeats the purpose of having Mesos.

Does someone know: 1) Was there a memory limit for Spark during the experiments in the paper (and was what is nowadays fine-grained mode chosen), so that other frameworks would also be able to run? Or 2) was the Spark architecture vastly different back then?

Any other remarks, even anecdotal, are very welcome.

Hans

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Recreating-the-Mesos-Spark-paper-s-experiments-tp22252.html