Hello Vincent,

Cassandra may not fit the bill for me if I need to define my partition key and other indexes upfront. Is that right?
Hello Richard,

Let me evaluate Apache Ignite. I did evaluate it three months back, and at that time its connector for Apache Spark did not support Spark 2.0.

Another, more drastic, option may be to repartition the result to a single partition (though I have to be cautious not to run into heap issues if the result is too large to fit in one executor) and write it to a relational database such as MySQL or PostgreSQL. But I believe I can do the same with ElasticSearch too. A slightly overkill solution might be Spark to Kafka to ElasticSearch?

More thoughts are welcome, please.

Thanks,
Muthu

On Wed, Mar 15, 2017 at 4:53 AM, Richard Siebeling <rsiebel...@gmail.com> wrote:

> Maybe Apache Ignite does fit your requirements.
>
> On 15 March 2017 at 08:44, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>
>> Hi,
>> If queries are static and filters are on the same columns, Cassandra is a good option.
>>
>> On 15 March 2017 7:04 AM, "muthu" <bablo...@gmail.com> wrote:
>>
>> Hello there,
>>
>> I have one or more Parquet files to read and run some aggregate queries on using Spark DataFrames. I would like to find a reasonably fast datastore that allows me to write the results for subsequent (simpler) queries.
>> I did attempt to use ElasticSearch to write the query results using the ElasticSearch Hadoop connector, but I am running into connector write issues if there are too many Spark executors for ElasticSearch to handle.
>> In the schema sense, though, this seems a great fit, as ElasticSearch has smarts in place to discover the schema. Also, in the query sense, I can perform simple filters and sorts using ElasticSearch, and for more complex aggregates, Spark DataFrames can come to the rescue :).
>> Please advise on other possible datastores I could use.
>>
>> Thanks,
>> Muthu
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Fast-write-datastore-tp28497.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
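Muthu's fallback idea (collapse the aggregate result to a single partition so that only one writer talks to a relational database, then serve simple filter-and-sort queries from that database) can be sketched roughly as follows. This is a hypothetical illustration, not anyone's actual setup: Python's built-in `sqlite3` stands in for MySQL/PostgreSQL, a plain list of tuples stands in for the Spark result, and the table name `agg_results`, its columns, and the `batch_size` parameter are assumptions made up for the example. In PySpark the write side would typically be `repartition(1)` (or `coalesce(1)`) on the result DataFrame followed by `DataFrameWriter.jdbc`.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical shape of one aggregated row: (category, day, event_count, total_bytes).
Row = Tuple[str, str, int, int]


def write_single_writer(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 1000) -> int:
    """Write all rows through one connection, in fixed-size batches.

    This mimics a single-partition write: there is exactly one writer, so the
    database sees one stream of inserts no matter how many executors
    computed the aggregate. Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agg_results ("
        " category TEXT, day TEXT, event_count INTEGER, total_bytes INTEGER)"
    )
    # An index on the filter column keeps the later simple queries cheap.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_cat_day ON agg_results (category, day)")
    written = 0
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO agg_results VALUES (?, ?, ?, ?)", batch)
            written += len(batch)
            batch.clear()
    if batch:  # flush the final, partially filled batch
        conn.executemany("INSERT INTO agg_results VALUES (?, ?, ?, ?)", batch)
        written += len(batch)
    conn.commit()
    return written


def top_days(conn: sqlite3.Connection, category: str, limit: int) -> List[Tuple[str, int]]:
    """A 'simpler subsequent query': filter on one column, sort by another."""
    cur = conn.execute(
        "SELECT day, event_count FROM agg_results"
        " WHERE category = ? ORDER BY event_count DESC, day LIMIT ?",
        (category, limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Stand-in for the aggregate result computed by Spark.
    result = [
        ("web", "2017-03-13", 120, 5_000),
        ("web", "2017-03-14", 340, 9_100),
        ("api", "2017-03-14", 75, 1_200),
        ("web", "2017-03-15", 210, 7_300),
    ]
    print(write_single_writer(conn, result, batch_size=2))
    print(top_days(conn, "web", 2))
```

Running the script prints `4` followed by `[('2017-03-14', 340), ('2017-03-15', 210)]`. The single writer trades write parallelism for a bounded load on the database, which is the opposite trade-off from the many-executor ElasticSearch write; the caveat Muthu raises still applies, since one task has to process every row of the result.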