Re: Experiences about NoSQL databases with Spark

Jörn Franke Sat, 28 Nov 2015 23:58:49 -0800

I would not use MongoDB because it does not fit well into the Spark or Hadoop 
architecture. You can use it if your data volume is very small and already 
preaggregated, but this is a very limited use case. For interactive queries you 
can use HBase, or future versions of Hive (if they use Tez > 0.8).
HBase with Phoenix, and Hive, offer standard SQL interfaces and can easily be 
integrated with a web interface.
With Hive you can already use the ORC and Parquet formats on HDFS today. They 
support storage indexes and bloom filters to accelerate your queries. You could 
also just use HDFS with these storage formats.
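
To illustrate why storage indexes and bloom filters speed up lookups, here is a 
minimal sketch in plain Python. It is not how ORC or Parquet implement these 
features; the names BloomFilter, Stripe, lookup and the user_ids column are 
hypothetical and only model the idea. Each block of rows keeps min/max 
statistics plus a bloom filter, so a reader can skip blocks that cannot contain 
the value it is looking for.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


class Stripe:
    """Hypothetical stand-in for one block of rows in a columnar file."""

    def __init__(self, user_ids):
        self.user_ids = list(user_ids)
        # "Storage index": min/max statistics kept per block.
        self.min_id = min(self.user_ids)
        self.max_id = max(self.user_ids)
        self.bloom = BloomFilter()
        for uid in self.user_ids:
            self.bloom.add(uid)

    def may_match(self, uid):
        # Cheap checks first: range test, then the bloom filter.
        if uid < self.min_id or uid > self.max_id:
            return False
        return self.bloom.might_contain(uid)


def lookup(stripes, uid):
    """Return (number of stripes actually scanned, whether uid was found)."""
    scanned = 0
    found = False
    for stripe in stripes:
        if not stripe.may_match(uid):
            continue  # skipped without reading the block's rows
        scanned += 1
        if uid in stripe.user_ids:
            found = True
    return scanned, found
```

With three blocks covering ids 0-99, 100-199 and 200-299, calling 
lookup(stripes, 150) only scans the middle block; the other two are excluded by 
their min/max statistics alone.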


Maybe you can elaborate on the data volumes and on the queries you want to run 
on the processed part? Is the processed data updated?

Depending on your use case and data, other options for interactive queries are 
Solr/Elasticsearch for text analytics and Titan for interactive graph 
queries (it supports HBase, among others, as the storage layer). Of course 
there are more (including commercial ones). Both offer REST interfaces and would 
be easy to integrate with a web application using JSON/D3.js. In some cases a 
relational database can make sense.
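
As a rough idea of what talking to such a REST interface looks like, here is a 
minimal sketch that builds (but does not send) an Elasticsearch-style search 
request with a terms aggregation. The host, the index name and the field names 
are assumptions made up for illustration, and the helper build_search_request is 
hypothetical; check the query against the version of the search engine you 
actually run.

```python
import json
import urllib.request


def build_search_request(base_url, index, text_field, text, group_field):
    """Build a POST request for a full-text match plus a terms aggregation."""
    body = {
        "size": 0,  # only the aggregation is wanted, not individual hits
        "query": {"match": {text_field: text}},
        "aggs": {"by_group": {"terms": {"field": group_field}}},
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index and fields, for illustration only.
request = build_search_request(
    "http://localhost:9200", "events", "message", "timeout", "host"
)
```

Passing the request to urllib.request.urlopen against a running cluster would 
return a JSON document that a web front end can feed to a charting library more 
or less directly.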



> On 24 Nov 2015, at 13:46, sparkuser2345 <hm.spark.u...@gmail.com> wrote:
> 
> I'm interested in knowing which NoSQL databases you use with Spark and what
> your experiences are. 
> 
> On a general level, I would like to use Spark streaming to process incoming
> data, fetch relevant aggregated data from the database, and update the
> aggregates in the DB based on the incoming records. The data in the DB
> should be indexed to be able to fetch the relevant data fast and to allow
> fast interactive visualization of the data. 
> 
> I've been reading about MongoDB+Spark and I've got the impression that there
> are some challenges in fetching data by indices and in updating documents,
> but things are moving so fast, so I don't know if these are relevant
> anymore. Do you find any benefit from using HBase with Spark as HBase is
> built on top of HDFS? 
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Experiences-about-NoSQL-databases-with-Spark-tp25462.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 

