Hi there,

We have a set of relatively lightweight jobs that we would like to run
repeatedly on our Spark cluster.

The situation is as follows. We have a reliable source of data, a Cassandra
database. One table contains time series data, which we would like to
analyse. To do so, we read a window of records, defined by a start timestamp
and an end timestamp, from the table and process all records in this
window. Since data is continuously added to the Cassandra table, we would
like to run the analysis job repeatedly, in fact every two seconds. Would
we have to keep re-submitting this Spark job every two seconds? What about
start-up times, if the expected job run time is just 2 seconds?

Or is there a way to define a long-running job by wrapping the Java job
we have in a while (true) { run job } loop?
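
To make the idea concrete, here is a minimal sketch of what we have in mind,
written in plain Java without any Spark calls. WindowJob, LongRunningDriver
and runRepeatedly are hypothetical names, not part of any library. We assume
that in the real application the Spark context would be created once outside
the loop and reused by every run, and that the job body would read the given
window from Cassandra. The sketch uses a scheduler instead of a bare
while (true) loop, and stops after a fixed number of runs only so that it
terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for our analysis of one time window [startMillis, endMillis). */
interface WindowJob {
    void run(long startMillis, long endMillis);
}

class LongRunningDriver {
    /**
     * Runs the job every periodMillis on consecutive, non-overlapping windows,
     * starting at firstStart, and returns after the given number of runs.
     */
    static void runRepeatedly(WindowJob job, long firstStart, long periodMillis, int runs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicLong windowStart = new AtomicLong(firstStart);
        CountDownLatch remaining = new CountDownLatch(runs);

        scheduler.scheduleAtFixedRate(() -> {
            if (remaining.getCount() == 0) {
                return; // all requested runs are done; ignore a late tick
            }
            long start = windowStart.getAndAdd(periodMillis);
            try {
                job.run(start, start + periodMillis);
            } catch (RuntimeException ex) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("window starting at " + start + " failed: " + ex);
            } finally {
                remaining.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            remaining.await(); // block the driver thread until all runs have finished
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

With a period of 2000 ms this would give one run per two-second window. A
production version would presumably drop the run limit and pick firstStart
so that each window is already complete when it is processed.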

What's more, the table contains data for several tenants (about 25) and from
several different sources. We would want to analyse these using one job for
each combination of tenant and source. This leads to quite a few jobs; we
expect something like 50.
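
As an illustration of the job-per-combination idea, the following sketch
enumerates tenant/source pairs and runs a placeholder analysis for each on a
small shared thread pool, so the number of threads stays well below the
number of jobs. It is plain Java with no Spark calls. TenantSourceJobs,
combinations and runAll are hypothetical names, and the "analysed ..." result
is only a stand-in for the real work. Whether submitting jobs from several
threads against one shared Spark context is the right approach for us is
exactly what we are unsure about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TenantSourceJobs {
    /** Builds one job key per (tenant, source) pair, for example "tenantA/sourceX". */
    static List<String> combinations(List<String> tenants, List<String> sources) {
        List<String> keys = new ArrayList<>();
        for (String tenant : tenants) {
            for (String source : sources) {
                keys.add(tenant + "/" + source);
            }
        }
        return keys;
    }

    /** Runs a placeholder analysis for every key on a fixed-size pool; results follow key order. */
    static List<String> runAll(List<String> keys, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String key : keys) {
                // Placeholder: the real task would analyse one window for this tenant and source.
                tasks.add(() -> "analysed " + key);
            }
            List<String> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the submitted tasks.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```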

We see that this is similar to Spark Streaming, but we're not sure whether
we should use it. If we had about 50 jobs running in parallel, we would
need 50 receivers, and each receiver would require 1 core. The jobs
themselves would also require quite a few cores, though we probably don't
need a core per job. Still, it easily adds up to 75 cores, which seems
quite a lot for the little processing we do.

I expect that repeatedly retrieving data from a database table and analysing
it with several jobs is a pretty standard situation in Spark applications,
but we couldn't find anything about this in the docs or on the internet.

Any ideas or hints would be very welcome.

Thanks a lot,

Stephan

-- 
Dr. Stephan Kepser | Senior IT-Consultant