Re: OLAP query using spark dataframe with cassandra
You can also evaluate Stratio Sparkta. It is a real-time aggregation tool based on Spark Streaming. It can write to Cassandra and to other databases such as MongoDB and Elasticsearch. It is prepared to deploy these aggregations on Mesos, so it may fit your needs. There is no query layer yet that abstracts the analytics part in OLAP, but it is on the roadmap.

Disclaimer: I work on this product.
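To make the idea of real-time aggregation concrete, here is a minimal sketch in plain Python of folding micro-batches of events into running aggregates. It is not Sparkta's or Spark Streaming's API; the function name `update_aggregates`, the event keys and the batch contents are all made up for illustration.

```python
from collections import defaultdict


def update_aggregates(state, batch):
    """Fold one micro-batch of (key, value) events into running count/sum per key."""
    for key, value in batch:
        agg = state[key]
        agg["count"] += 1
        agg["sum"] += value
    return state


# Running state survives across batches, so each event is processed only once.
state = defaultdict(lambda: {"count": 0, "sum": 0.0})

# Two hypothetical micro-batches, as a streaming job might receive them.
batches = [
    [("page_a", 1.0), ("page_b", 2.0)],
    [("page_a", 3.0)],
]
for batch in batches:
    update_aggregates(state, batch)
```

In a real deployment the running state would be written to a store such as Cassandra after each batch, so that queries read the aggregates instead of the raw events.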
Re: OLAP query using spark dataframe with cassandra
Hi,

I'm in the same position right now: we are going to implement something like OLAP BI + machine learning explorations on the same cluster. Well, the question is quite ambivalent: on the one hand, we have terabytes of versatile data and need to build something like cubes (Hive and Hive on HBase are unsatisfactory). On the other hand, our users are accustomed to Tableau + Vertica.

So, right now I am considering the following choices:
1) Platfora (not free, I don't know the price right now) + Spark
2) AtScale + Tableau (not free, I don't know the price right now) + Spark
3) Apache Kylin (young project?) + Spark on YARN + Kafka + Flume + some storage
4) Apache Phoenix + Apache HBase + Mondrian + Spark on YARN + Kafka + Flume (has anybody used it in production?)
5) Spark + Tableau (cubes?)

For myself, I decided not to dive into Mesos. Cassandra is hard to configure; you'll have to dedicate a special employee to support it.

I'll be glad to hear other ideas and proposals, as we are at the beginning of the process too.

Sincerely yours, Tim Shenkao
Re: OLAP query using spark dataframe with cassandra
Some friends referred me to this thread about OLAP/Kylin and Spark. Here are my 2 cents.

If you are trying to set up OLAP, Apache Kylin should be a good option for you to evaluate. The project has been in development for more than 2 years and is about to graduate to an Apache Top-Level Project [1]. There are already many production deployments, including eBay, Exponential, JD.com, VIP.com and others; refer to the powered-by page [2].

Apache Kylin's Spark engine is also on the way; there is a discussion about tuning its performance [3].

A variety of clients can interact with Kylin using ANSI SQL, including Tableau, Zeppelin, Pentaho/Mondrian and Saiku/Mondrian, and Excel/Power BI support will roll out this week.

Apache Kylin is young but mature, with large-scale validation (one of the biggest cubes at eBay contains 85+ billion rows, and the 90th-percentile query latency on the production platform is a few seconds).

Streaming OLAP is coming in Kylin v2.0 with a pluggable architecture. There is already one real production case inside eBay; please refer to our design deck [4].

Everyone is welcome to join and contribute to Kylin as an OLAP engine for Big Data :-)

Please feel free to contact our community or me with any questions.

Thanks.

1. http://s.apache.org/bah
2. http://kylin.incubator.apache.org/community/poweredby.html
3. http://s.apache.org/lHA
4. http://www.slideshare.net/lukehan/1-apache-kylin-deep-dive-streaming-and-plugin-architecture-apache-kylin-meetup-shanghai
5. http://kylin.io

Best Regards!
Luke Han
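For readers new to cube engines, the following is a rough sketch in plain Python of the general idea behind cube pre-computation: aggregate the fact rows once for every combination of dimensions, so that a later query is a lookup rather than a scan. It is not Kylin's implementation or API; `build_cube`, the dimension names and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and measure for illustration only.
DIMENSIONS = ("country", "device")


def build_cube(rows, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions`."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


rows = [
    {"country": "US", "device": "mobile", "sales": 10.0},
    {"country": "US", "device": "desktop", "sales": 5.0},
    {"country": "DE", "device": "mobile", "sales": 7.0},
]
cube = build_cube(rows, DIMENSIONS, "sales")

# "Sales by country" is now a dictionary lookup, not a scan of the facts.
sales_by_country = cube[("country",)]
```

The trade-off is the usual one: the number of dimension combinations grows as 2^n with the number of dimensions, so build time and storage are paid up front in exchange for fast queries.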
Re: Re: OLAP query using spark dataframe with cassandra
Hi,

In my experience, I would recommend option 3), Apache Kylin, for your requirements. This suggestion is based on the open-source world.

As for Cassandra, I accept your point that it needs dedicated support. But the community is very open and convenient for getting prompt responses.

fightf...@163.com
Re: Re: OLAP query using spark dataframe with cassandra
Hi,

Have you ever considered Cassandra as a replacement? Our usage is almost the same as your engine's, e.g. using MySQL to store the initial aggregated data. Can you share more about your kind of cube queries? We are very interested in that architecture too :)

Best,
Sun.

fightf...@163.com

From: Andrés Ivaldi
Date: 2015-11-10 07:03
To: tsh
CC: fightf...@163.com; user; dev
Subject: Re: OLAP query using spark dataframe with cassandra

Hi,
I'm also considering something similar. Plain Spark is too slow for my case. A possible solution is to use Spark as a multiple-source connector and basic transformation layer, then persist the information (currently in an RDBMS). After that, our engine builds a kind of cube query, and the result is processed again by Spark, adding machine learning.
The missing part is replacing the RDBMS with something more suitable and scalable. We don't mind pre-processing the information if the queries are fast after pre-processing.

Regards

--
Ing. Ivaldi Andres
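The pattern discussed here (pre-aggregate first, persist into a relational store, then run cube-like queries against the small table) can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module as a stand-in for MySQL or another RDBMS; the table `daily_sales`, its columns, the helper `totals_by` and the sample figures are all invented.

```python
import sqlite3

# In-memory database standing in for the RDBMS that holds pre-aggregated data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_sales ("
    "day TEXT, region TEXT, product TEXT, total REAL, "
    "PRIMARY KEY (day, region, product))"
)

# Rows as an upstream batch job might emit them (hypothetical values).
pre_aggregated = [
    ("2015-11-08", "north", "widget", 120.0),
    ("2015-11-08", "south", "widget", 80.0),
    ("2015-11-09", "north", "widget", 100.0),
    ("2015-11-09", "north", "gadget", 40.0),
]
conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?)", pre_aggregated)


def totals_by(column):
    """Roll the pre-aggregated table up along one dimension."""
    allowed = {"day", "region", "product"}
    if column not in allowed:
        # Column names cannot be bound as SQL parameters, so whitelist them.
        raise ValueError(f"unknown dimension: {column}")
    query = (
        f"SELECT {column}, SUM(total) FROM daily_sales "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(query).fetchall()
```

For example, `totals_by("region")` rolls the table up by region. Because the table holds one row per (day, region, product) instead of one row per raw event, such queries stay cheap as long as the number of dimension combinations stays moderate.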
Re: OLAP query using spark dataframe with cassandra
Please consider using a NoSQL engine such as HBase.

Cheers
Re: OLAP query using spark dataframe with cassandra
Is there any distributor supporting these software components in combination? If not, and your core business is not software, then you may want to look for something else, because it might not make sense to build up internal know-how in all of these areas.

In any case, it all depends highly on your data and queries. You will have to do your own experiments.
Re: Re: OLAP query using spark dataframe with cassandra
Hi,

Thanks for the suggestion. We are currently evaluating and stress-testing Spark SQL on Cassandra while trying to define our business models. FWIW, the solution mentioned here is different from a traditional OLAP cube engine, right? So we are hesitating over the general direction for our OLAP architecture. We would be happy to hear more use cases from this community.

Best,
Sun.

fightf...@163.com
OLAP query using spark dataframe with cassandra
Hi, community

We are especially interested in this feature integration, following some slides from [1]. SMACK (Spark + Mesos + Akka + Cassandra + Kafka) seems a good implementation of the lambda architecture in the open-source world, especially for non-Hadoop cluster environments. As we see it, the advantages are:

1 the feasibility and scalability of the Spark DataFrame API, which also complements Apache Cassandra's native CQL features well.
2 both streaming and batch processing are available using the whole stack, cool.
3 we can achieve both compacity and usability for Spark with Cassandra, including seamless integration with job scheduling and resource management.

Our only concern is OLAP query performance, which is mainly affected by frequent aggregation work across large tables that grow daily, for both Spark SQL and Cassandra. I can see that the use case in [1] uses FiloDB to achieve columnar storage and query performance, but we know nothing more about it.

The question is: has anyone had such a use case, especially in a production environment? We would be interested in your architecture for designing this OLAP engine using Spark + Cassandra. How do you think this scenario compares with a traditional OLAP cube design, like Apache Kylin or Pentaho Mondrian?

Best Regards,
Sun.

[1] http://www.slideshare.net/planetcassandra/cassandra-summit-2014-interactive-olap-queries-using-apache-cassandra-and-spark

fightf...@163.com
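As background on why columnar storage matters for the aggregation concern raised above, here is a small plain-Python sketch contrasting a row-oriented layout with a column-oriented one. It says nothing about how FiloDB actually stores data; the helper `to_columns`, the field names and the sample values are assumptions made for illustration.

```python
# Row-oriented layout: each record carries every field.
rows = [
    {"day": "2015-11-08", "user": "u1", "bytes": 100},
    {"day": "2015-11-08", "user": "u2", "bytes": 50},
    {"day": "2015-11-09", "user": "u1", "bytes": 200},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: the aggregation walks every record, touching all fields.
total_from_rows = sum(record["bytes"] for record in rows)

# Column layout: the aggregation reads only the one column it needs.
total_from_columns = sum(columns["bytes"])
```

Both totals are identical; the difference is how much data has to be read to compute them, which is what makes columnar layouts attractive for OLAP-style scans over wide tables.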