Java/Spark Library for interacting with Spark API

2016-02-28 Thread hbogert
Hi, 

Does anyone know of a Java/Scala library (not simply an HTTP library) for
interacting with Spark through its REST/HTTP API? My “problem” is that
interacting through REST involves a lot of work mapping the JSON to sensible
Spark/Scala objects.

As a simple example, I am hoping there is a library that allows me to do
something like this (not a requirement, only an example):

   sparkHost("10.0.01").getApplications().first().getJobs().first().status

More broadly, is the REST API the only way for a different (JVM) process to
retrieve information from Spark?
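
For what it's worth, this is roughly the kind of wrapper I have in mind, here
sketched on top of Spark's monitoring REST endpoints under /api/v1 (the client
class, the case classes, and the choice of json4s are my own assumptions, not
an existing library):

    import scala.io.Source
    import org.json4s._
    import org.json4s.native.JsonMethods._

    // Hypothetical minimal client for the monitoring API exposed by the
    // Spark UI / history server; unknown JSON fields are simply ignored.
    case class AppInfo(id: String, name: String)
    case class JobInfo(jobId: Int, status: String)

    class SparkRestClient(host: String, port: Int = 4040) {
      implicit val formats: Formats = DefaultFormats

      private def get(path: String): JValue =
        parse(Source.fromURL(s"http://$host:$port/api/v1$path").mkString)

      def applications(): List[AppInfo] =
        get("/applications").extract[List[AppInfo]]

      def jobs(appId: String): List[JobInfo] =
        get(s"/applications/$appId/jobs").extract[List[JobInfo]]
    }

    // Roughly the usage from the example above:
    // val client = new SparkRestClient("10.0.01")
    // val status = client.jobs(client.applications().head.id).head.status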

Regards,

Hans






Re: CREATE TABLE ignores database when using PARQUET option

2015-09-13 Thread hbogert
I'm having the same problem, did you solve this?






Scheduler delay vs. Getting result time

2015-07-09 Thread hbogert
Hi, 

In the Spark UI, under “Show additional metrics”, there are two extra
metrics you can show:
1) Scheduler delay
2) Getting result time

When hovering over “Scheduler Delay” it says (among other things):
“…time to send task result from executor…”

When hovering over “Getting result time” it says:
“Time that the driver spends fetching task results from workers.”

What are the differences between the two?

In my case I’m benchmarking with some sleep commands and by returning some big
arrays per task, to emulate execution time and network communication
respectively. I can’t see any “Getting result time” increase; it is simply
0 ms. I’m using a ‘collect’ command and can see the synthetic result arrays
when I use a spark-shell.
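
For reference, a sketch of the kind of synthetic workload I mean (the sleep
duration, array size, and partition count below are illustrative only):

    // Each task sleeps to emulate execution time and returns a large array
    // to emulate shipping a big result back to the driver:
    val results = sc.parallelize(1 to 100, numSlices = 100).map { i =>
      Thread.sleep(1000)               // emulate computation time
      Array.fill(1 << 20)(i.toByte)    // roughly 1 MB result per task
    }.collect()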

Regards,

Hans






Spark shell crumbles after memory is full

2015-06-29 Thread hbogert
I'm running a query from the BigDataBenchmark, query 1B to be precise.

When running this with Spark (1.3.1) + Mesos (0.21) in coarse-grained mode
with 5 Mesos slaves, through a spark-shell, all is well.
However, rerunning the query a few times:

    scala> sqlContext.sql("SELECT pageURL, pageRank FROM rankings WHERE pageRank > 100").collect

builds up loads of memory in the spark-shell process, up to the point that
19 GB (with spark.driver.memory=30GB) is full, and then the same collect of
the above query goes from approx. 10 s to 40+ s, with obvious stalls (garbage
collection).
Am I doing something wrong? Why isn't Spark releasing the results' memory?
I'm not saving them anywhere by using .collect, am I?
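
As an aside, one way I could imagine rerunning the query without the REPL
keeping a resN binding to every collected array (just a sketch; whether this
actually avoids the build-up is exactly what I'm unsure about):

    // Each bare collect in the spark-shell is bound to a fresh resN value,
    // so the driver keeps a reference to every returned Array[Row].
    // Scoping the result inside a loop body lets it become unreachable:
    for (_ <- 1 to 10) {
      val rows = sqlContext.sql(
        "SELECT pageURL, pageRank FROM rankings WHERE pageRank > 100").collect()
      println(rows.length)   // use the result, then let it go out of scope
    }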

I'm loading the following file and then executing its _loadRankings_ method:
http://pastebin.com/rzJmWDxJ


Hope someone can clarify this.


PS: Java 1.7.0 is used; if more environment info is needed, please let me know.






Spark on Mesos fine-grained - has one core less per executor

2015-06-11 Thread hbogert
I'm doing a performance analysis of Spark on Mesos, and I can see that the
coarse-grained backend simply launches tasks in waves sized to the number of
cores available. But in fine-grained mode the Mesos executor seems to take one
core for itself (so one core less per Mesos slave). Shouldn't fine-grained and
coarse-grained mode behave the same?
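
For reference, a sketch of how I am selecting the two backends for these runs
(spark.mesos.coarse is the documented switch for Spark on Mesos; the master URL
and app name are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://10.0.0.1:5050")    // placeholder Mesos master
      .setAppName("mesos-mode-comparison")   // placeholder app name
      .set("spark.mesos.coarse", "true")     // "false" (the default) => fine-grained mode
    val sc = new SparkContext(conf)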







Re: Can't run spark-submit with an application jar on a Mesos cluster

2015-03-31 Thread hbogert
Well, those are only the logs of the slaves at the Mesos level. I'm not sure
from your reply whether you can ssh into a specific slave or not; if you can,
you should look at the actual output of the application (Spark in this case)
on a slave, e.g. in

/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948/std{err,out}

The actual UUIDs and the run number (in this example '4') in the path can
differ from slave node to slave node.

Look into those stderr and stdout files and you'll probably find your answer
as to why it is failing.






Re: Can't run spark-submit with an application jar on a Mesos cluster

2015-03-29 Thread hbogert
Hi, 

What do the Mesos slave logs say? Usually these give a clear-cut error; they
are probably local on a slave node.

I'm not sure about your config, so I can't point you to a specific path.

It might look something like:

/???/mesos/slaves/20150213-092641-84118794-5050-14978-S0/frameworks/20150329-232522-84118794-5050-18181-/executors/5/runs/latest/stderr








Recreating the Mesos/Spark paper's experiments

2015-03-26 Thread hbogert
Hi all, 

For my master's thesis I will be characterising the performance of two-level
schedulers like Mesos, and after reading the paper
https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
where Spark is also introduced, I am wondering how some of the experiments and
results came about.
If this is not the place to ask these questions, or someone knows of a better
place, please let me know.

I am wondering whether the experiment would show the same results with the
current release of Spark, because in the macro-benchmarks (Fig. 5c) we can see
4 instances (though the text talks of 5 instances) of Spark applications being
run. During one instance Spark seems to grow elastically, especially between
[0,200] and [900,1100].

This alone would be problematic to recreate with current Spark on Mesos,
because once an application context starts, it either 1) allocates all
available nodes in the cluster and does not scale up or down during that
application's lifetime, in coarse-grained mode, or 2) allocates all memory and
does not release it, though it does scale up and down with regard to CPUs, in
fine-grained mode.

Even in fine-grained mode it would not work well if there are other frameworks
that need a lot of memory, because they simply wouldn't be able to allocate it:
even during idle times of a Spark application, the cluster's memory is taken.
Of course we could limit the memory usage (see the sketch below), but this
defeats the purpose of having Mesos.
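
By limiting the memory usage I mean capping the resources the Spark framework
claims up front, roughly along these lines (the values are placeholders, not
ones used in the paper):

    import org.apache.spark.SparkConf

    // Cap what the Spark framework holds on the cluster; in fine-grained mode
    // the executor memory is still held for the application's lifetime.
    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")  // placeholder per-slave executor memory
      .set("spark.cores.max", "8")         // placeholder cap on cores (coarse-grained mode)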

Does someone know:
1) Was there a memory limit for Spark during the experiments in the paper (and
was what is nowadays the fine-grained mode used), so that other frameworks
would also be able to run?
or
2) Was the Spark architecture vastly different back then?

Any other remarks, even anecdotal, are very welcome.

Hans


