Re: Spark Pattern and Anti-Pattern

2016-02-02 Thread Lars Albertsson
Querying a service or a database from a Spark job is in most cases an anti-pattern, but there are exceptions. Jobs that rely on a live database become unstable and nondeterministic. The recommended pattern is to take regular dumps of the database to your cluster storage, e.g. HDFS, and join th
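
A rough sketch of the dump-and-join pattern described above, in PySpark. Everything named here is an assumption for illustration and not from the original message: the `dt=YYYY-MM-DD` directory layout, the Parquet format, the `user_id` join key, and both helper functions. The idea is that a job for a given run date always reads the same dated dump, so reruns see the same data instead of whatever the live database holds at that moment.

```python
from datetime import date


def snapshot_path(base_dir, available_dates, run_date):
    """Return the path of the newest dump dated on or before run_date.

    Assumes (hypothetically) that dumps are stored as <base_dir>/dt=YYYY-MM-DD.
    Picking the dump from the run date, not from "now", keeps reruns repeatable.
    """
    candidates = [d for d in available_dates if d <= run_date]
    if not candidates:
        raise ValueError("no dump available on or before %s" % run_date.isoformat())
    return "%s/dt=%s" % (base_dir, max(candidates).isoformat())


def enrich_events(spark, events_path, users_dump_path):
    """Join events against the dumped table instead of querying the database.

    `spark` is an existing SparkSession; the paths and the `user_id` column
    are hypothetical names chosen for this sketch.
    """
    events = spark.read.parquet(events_path)
    users = spark.read.parquet(users_dump_path)
    # Left join keeps events whose user is missing from the dump.
    return events.join(users, on="user_id", how="left")
```

For example, with dumps dated 2016-01-30, 2016-02-01 and 2016-02-03, a job for 2016-02-02 would read the 2016-02-01 dump every time it is run, regardless of when the rerun happens.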

Re: Spark Pattern and Anti-Pattern

2016-01-26 Thread Jörn Franke
Spark's best use cases are in-memory batch processing and machine learning. Connecting multiple different sources and destinations requires some thinking and probably more than Spark. Connecting Spark to a database makes sense only in very few cases. You will have huge performance issues due to th