
Re: Spark Pattern and Anti-Pattern

2016-02-02 Lars Albertsson

Querying a service or a database from a Spark job is in most cases an anti-pattern, but there are exceptions. Relying on a live database makes jobs unstable and nondeterministic. The recommended pattern is to take regular dumps of the database to your cluster storage, e.g. HDFS, and join th…
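A minimal sketch of the dump-and-join idea, written in plain Python rather than Spark so it runs without a cluster. It assumes the job joins its input against a dated dump of the database instead of querying the live database. The `dt=<date>` directory layout, the `users.csv` file, and the function names are hypothetical stand-ins. In a real job the dump would sit on HDFS and the join would be a Spark DataFrame join.

```python
import csv
import os
import tempfile

# Hypothetical layout: one directory per dump date (dt=YYYY-MM-DD),
# standing in for a date-partitioned path on cluster storage such as HDFS.
def write_snapshot(root, dump_date, rows):
    """Write one database dump (rows with 'user_id' and 'country') for a date."""
    path = os.path.join(root, f"dt={dump_date}")
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "users.csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user_id", "country"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_snapshot(root, dump_date):
    """Read the frozen dump for one date into a user_id -> country lookup."""
    with open(os.path.join(root, f"dt={dump_date}", "users.csv"), newline="") as f:
        return {row["user_id"]: row["country"] for row in csv.DictReader(f)}


def enrich_events(events, root, dump_date):
    """Inner-join events with the dump on user_id; no live database is queried."""
    users = load_snapshot(root, dump_date)
    return [
        {**event, "country": users[event["user_id"]]}
        for event in events
        if event["user_id"] in users
    ]


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    write_snapshot(root, "2016-02-01", [
        {"user_id": "u1", "country": "SE"},
        {"user_id": "u2", "country": "DE"},
    ])
    events = [{"user_id": "u1", "action": "click"}, {"user_id": "u3", "action": "view"}]
    print(enrich_events(events, root, "2016-02-01"))
```

Because the job is pinned to one dump date, rerunning it on the same inputs gives the same result even if the live database has changed in the meantime. That reproducibility is the property lost when a job queries the database directly.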

Re: Spark Pattern and Anti-Pattern

2016-01-26 Jörn Franke

Spark's best use cases are in-memory batch processing and machine learning. Connecting multiple different sources and destinations requires some thought and probably more than Spark alone. Connecting Spark to a database makes sense only in very few cases. You will have huge performance issues due to th…