So basically you will store those files in HDFS and use Spark to process them?
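
To make the question concrete, here is a rough sketch of the parsing step for pipe-separated lines like the ones Joaquin describes below. It is plain Python rather than Spark, the sample data and the parse_pipe_records name are made up for illustration, and the real files would have 500 fields per line instead of five.

```python
import csv
import io


def parse_pipe_records(text, wanted):
    """Yield the wanted 0-based field positions from each pipe-separated line."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    for row in reader:
        # Keep only the columns a given analysis needs.
        yield [row[i] for i in wanted]


# Hypothetical sample: three records with five fields each.
sample = "a1|b1|c1|d1|e1\na2|b2|c2|d2|e2\na3|b3|c3|d3|e3\n"
rows = list(parse_pipe_records(sample, wanted=[0, 3]))
print(rows)
```

In Spark I would expect the same idea to be a CSV read from an HDFS path with the separator set to "|", but I have not tried that against files like these.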

On Sun, Oct 23, 2016 at 6:03 PM, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:

> I think what Ali mentions is correct:
>
> If you need a lot of queries that require joins, or complex analytics of
> the kind that Cassandra isn't suited for, then HDFS / HBase may be better.
>
> We have files in which one line contains 500 fields (separated by pipe)
> and each of these fields is particularly important.
>
> Cassandra will not manage that since you will need 500 indexes. HDFS is
> the proper way.
>
> *From:* Welly Tambunan [mailto:if05...@gmail.com]
> *Sent:* 23 October 2016 10:19
> *To:* user@cassandra.apache.org
> *Subject:* Re: Hadoop vs Cassandra
>
> I like multi data centre resilience in Cassandra.
>
> I think that's plus one for Cassandra.
>
> Ali, complex analytics can be done in Spark, right?
>
> On 23 Oct 2016 4:08 p.m., "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>
> I would say it depends on your use case.
>
> If you need a lot of queries that require joins, or complex analytics of
> the kind that Cassandra isn't suited for, then HDFS / HBase may be better.
>
> If you can work with the Cassandra way of doing things (creating new
> tables for each query you'll need to do, duplicating data, doing extra
> writes for faster reads), then Cassandra should work for you. It is easier
> to set up and do dev ops with, in my experience.
>
> On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> I mean HDFS and HBase.
>>
>> On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> By Hadoop do you mean HDFS?
>>>
>>> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I read the following comparison between Hadoop and Cassandra. The
>>>> conclusion seems to be that we use Hadoop for the data lake (cold data)
>>>> and Cassandra for hot data (real-time data).
>>>>
>>>> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>>>>
>>>> My question is, can we just use Cassandra to rule them all?
>>>>
>>>> What we are trying to achieve is to minimize the moving parts in our
>>>> system.
>>>>
>>>> Any response would be really appreciated.
>>>>
>>>> Cheers
>>>>
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>

--
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>