Hi there,

We have a set of relatively lightweight jobs that we would like to run repeatedly on our Spark cluster.
The situation is as follows. We have a reliable source of data, a Cassandra database. One table contains time series data that we would like to analyse. To do so, we read a window of records, defined by a start timestamp and an end timestamp, from the table and process all records in this window.

Since data is continuously added to the Cassandra table, we would like to run the analysis job repeatedly, in fact every two seconds. Would we have to keep re-submitting this Spark job every two seconds? What about start-up times, if the expected job run time is just 2 seconds? Or is there a way to define a long-running job by wrapping our Java job in a while(true) { run job } loop?

What's more, the table contains data for several tenants (about 25) and from several different sources. We would like to analyse each combination of tenant and source with its own job. This leads to quite a few jobs; we expect something like 50.

We see that this is similar to Spark Streaming, but we're not sure whether we should use it. If we have about 50 jobs running in parallel, we would need 50 receivers, and each receiver would require 1 core. The jobs themselves would also require quite a few cores, though we probably don't need a core per job. Still, it easily adds up to 75 cores, which seems quite a lot for the little processing we do.

I expect that repeatedly retrieving data from a database table and analysing it with several jobs is a pretty standard situation in Spark applications, but I couldn't find anything about this in the docs or on the internet.

Any ideas or hints would be very welcome.

Thanks a lot,
Stephan

--
Dr.
Stephan Kepser | Senior IT-Consultant | codecentric AG | Merscheider Straße 1 | 42699 Solingen | Germany | www.codecentric.de
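
P.S. To make the while(true) idea from the question more concrete, here is a rough plain-Java sketch of the kind of long-running driver we have in mind. It is only an illustration under assumptions: the names RepeatedWindowJobs, Window, WindowJob, nextWindow and combinations are made up for this mail, the tenant and source names are placeholders, and the job body just prints the window where the real Cassandra read and Spark analysis would go. It uses a scheduled executor instead of a bare while(true) loop, and each window starts where the previous one ended so that no record should be read twice or skipped.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one long-running driver that re-runs a windowed
// analysis for every (tenant, source) pair at a fixed interval.
class RepeatedWindowJobs {

    /** Half-open time window [start, end) to read from the table. */
    static final class Window {
        final Instant start;
        final Instant end;

        Window(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Placeholder for the real analysis (an assumption, not a Spark API). */
    interface WindowJob {
        void run(String tenant, String source, Window window);
    }

    /** The next window starts exactly where the previous one ended. */
    static Window nextWindow(Instant previousEnd, Instant now) {
        return new Window(previousEnd, now);
    }

    /** Builds every tenant/source combination as a {tenant, source} pair. */
    static List<String[]> combinations(List<String> tenants, List<String> sources) {
        List<String[]> pairs = new ArrayList<>();
        for (String tenant : tenants) {
            for (String source : sources) {
                pairs.add(new String[] {tenant, source});
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String[]> pairs =
                combinations(List.of("tenantA", "tenantB"), List.of("source1", "source2"));

        // Stand-in for the real job: just report which window would be read.
        WindowJob job = (tenant, source, w) ->
                System.out.println(tenant + "/" + source + " window " + w.start + " .. " + w.end);

        // End of the last processed window; after start-up only the scheduler thread touches it.
        Instant[] previousEnd = {Instant.now()};

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            Window w = nextWindow(previousEnd[0], Instant.now());
            for (String[] pair : pairs) {
                job.run(pair[0], pair[1], w);
            }
            previousEnd[0] = w.end;
        }, 2, 2, TimeUnit.SECONDS);

        Thread.sleep(5_000); // let a couple of rounds run, then stop the demo
        scheduler.shutdown();
    }
}
```

In this sketch all combinations share one window per round and run one after another on a single thread. Whether they could instead be submitted concurrently from one driver is exactly the kind of thing we are unsure about.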