removing dev@ from this conversation, as the thread is more appropriately for user@
On Mon, Jun 12, 2017 at 4:51 AM, Eduardo Alonso <eduardoalo...@stratio.com> wrote: > -Virtual tokens are not recommended when using SOLR or > cassandra-lucene-index. > > If you use your table schema you will not have any problem with partition > size because your table is *not* a WIDE row table (it does not have > clustering keys) > The limit for 1 record with those 15 or 20 columns must not be larger that > 100MB. You will have enough. > > Eduardo Alonso > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > 28224 Pozuelo de Alarcón, Madrid > Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd > <https://twitter.com/StratioBD>* > > 2017-06-12 12:36 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>: > > > And due to single table videos, maybe it will go with around 15,20 > > columns, then we need to also think very carefully about partition sizes > > also. > > > > On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ <nandanpriyadarshi298@gmail. > com> > > wrote: > > > >> Yes this is only Option I am also thinking like this as my second > >> options. Before this I was thinking to do denormalize table based on > search > >> columns, but due to partial search this will be not that effective. > >> > >> Now suppose , if we are going with this single table as videos. and > >> implemented with Solr/Lucene, then need to also care about num_tokens ? > >> > >> > >> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso < > >> eduardoalo...@stratio.com> wrote: > >> > >>> Using cassandra collections > >>> > >>> CREATE TABLE videos ( > >>> videoid uuid primary key, > >>> title text, > >>> actor list<text>, > >>> producer list<text>, > >>> release_date timestamp, > >>> description text, > >>> music text, > >>> etc... > >>> ); > >>> > >>> When using collection you need to take care of its length. Collections > >>> are designed to store > >>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_ > collections_c.html>only > >>> a small amount of data > >>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_ > collections_c.html> > >>> . > >>> 5/10 actors per movie is ok. > >>> > >>> > >>> Eduardo Alonso > >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta > >>> 28224 Pozuelo de Alarcón, Madrid > >>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // > *@stratiobd > >>> <https://twitter.com/StratioBD>* > >>> > >>> 2017-06-12 11:54 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>: > >>> > >>>> So In short we have to go with one single table as videos and put > >>>> primary key as videoid uuid. > >>>> But then how can we able to handle multiple actor name and producer > >>>> name. ? > >>>> > >>>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso < > >>>> eduardoalo...@stratio.com> wrote: > >>>> > >>>>> Yes, you are right. > >>>>> > >>>>> Table denormalization is useful just when you have unique primary > >>>>> keys, not your case. > >>>>> Denormalized tables are only different in its primary key, every > >>>>> denormalized table contains all the data (it just change how it is > >>>>> structured). So, if you need to index it, do it with just one table > (the > >>>>> one you showed us with videoid as the primary key is ok). > >>>>> > >>>>> Solr, Elastic and cassandra-lucene-index are both based on Lucene and > >>>>> all of them fulfill all your needs. > >>>>> > >>>>> Solr (in DSE) and cassandra-lucene-index > >>>>> <https://github.com/stratio/cassandra-lucene-index> are very well > >>>>> integrated with cassandra using its secondary index interface. If you > >>>>> choose elastic search you will need to code the integration (write > mutex, > >>>>> both cluster synchronization (imagine something written in cassandra > but > >>>>> failed to write in elastic)) > >>>>> > >>>>> I know i am not the most suitable to recommend you to use our product > >>>>> cassandra-lucene-index > >>>>> <https://github.com/stratio/cassandra-lucene-index> but it is open > >>>>> source, just take a look. > >>>>> > >>>>> Eduardo Alonso > >>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta > >>>>> 28224 Pozuelo de Alarcón, Madrid > >>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com > // *@stratiobd > >>>>> <https://twitter.com/StratioBD>* > >>>>> > >>>>> 2017-06-12 11:18 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com > >: > >>>>> > >>>>>> Hi Eduardo, > >>>>>> > >>>>>> And As we are trying to build an advanced search functionality in > >>>>>> which we can able to do partial search based on actor, producer, > director, > >>>>>> etc. columns. > >>>>>> So if we do denormalization of tables then we have to create tables > >>>>>> such as below :- > >>>>>> video_by_actor > >>>>>> video_by_producer > >>>>>> video_by_director > >>>>>> video_by_date > >>>>>> etc.. > >>>>>> By using denormalized, Cassandra only allows us to do equality > >>>>>> search, but for implementing Partial search we need to implement > solr on > >>>>>> all above tables. > >>>>>> > >>>>>> This is my thinking, but I think this will be not correct way to > >>>>>> implement Apache Solr on all tables. > >>>>>> > >>>>>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ < > >>>>>> nandanpriyadarshi...@gmail.com> wrote: > >>>>>> > >>>>>>> Hi Edurado, > >>>>>>> > >>>>>>> As you mentioned queries 1-6 , > >>>>>>> In this condition, we have to proceed with a table like as below :- > >>>>>>> create table videos ( > >>>>>>> videoid uuid primary key, > >>>>>>> title text, > >>>>>>> actor text, > >>>>>>> producer text, > >>>>>>> release_date timestamp, > >>>>>>> description text, > >>>>>>> music text, > >>>>>>> etc... > >>>>>>> ); > >>>>>>> This table will help to store video datas based on PK videoid and > >>>>>>> will give uniqeness due to uuid. > >>>>>>> But as we know , in one movie there are multiple actor, multiple > >>>>>>> producer, multiple music worked, So how can we store all these.. > Only one > >>>>>>> option will left as to use collection type columns. > >>>>>>> > >>>>>>> > >>>>>>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso < > >>>>>>> eduardoalo...@stratio.com> wrote: > >>>>>>> > >>>>>>>> TLDR shouldBe *PD > >>>>>>>> > >>>>>>>> Eduardo Alonso > >>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta > >>>>>>>> 28224 Pozuelo de Alarcón, Madrid > >>>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // > www.stratio.com > >>>>>>>> // *@stratiobd <https://twitter.com/StratioBD>* > >>>>>>>> > >>>>>>>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso < > >>>>>>>> eduardoalo...@stratio.com>: > >>>>>>>> > >>>>>>>>> Hi Nandan: > >>>>>>>>> > >>>>>>>>> So, your system must provide these queries: > >>>>>>>>> > >>>>>>>>> 1 - SELECT video FROM ... WHERE actor = '...'; > >>>>>>>>> 2 - SELECT video FROM ... WHERE producer = '...'; > >>>>>>>>> 3 - SELECT video FROM ... WHERE music = '...'; > >>>>>>>>> 4 - SELECT video FROM ... WHERE actor = '...' AND producer > ='...'; > >>>>>>>>> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...'; > >>>>>>>>> 6 - SELECT video WHERE title CONTAINS 'Harry'; > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> For queries 1-5 you can get them with just cassandra, > >>>>>>>>> denormalizing tables just the way your mentioned but without > solr, just > >>>>>>>>> cassandra (Indeed, just for equality clauses) > >>>>>>>>> > >>>>>>>>> video_by_actor; > >>>>>>>>> video_by_producer; > >>>>>>>>> video_by_music; > >>>>>>>>> video_by_actor_and_producer; > >>>>>>>>> video_by_actor_and_music; > >>>>>>>>> > >>>>>>>>> For queries number 6 you need a search engine. > >>>>>>>>> > >>>>>>>>> SOL > >>>>>>>>> ElasticSearch > >>>>>>>>> cassandra-lucene-index > >>>>>>>>> <https://github.com/stratio/cassandra-lucene-index> > >>>>>>>>> SASI > >>>>>>>>> <http://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/ > cql_commands/cqlCreateCustomIndex.html> > >>>>>>>>> > >>>>>>>>> I think, just for your query, the easiest way to get it is to > >>>>>>>>> build a SASI index. > >>>>>>>>> TLDR: I work for stratio in cassandra-lucene-index but for your > >>>>>>>>> basic query (only one dimension), SASI indexes will work for you. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Eduardo Alonso > >>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta > >>>>>>>>> 28224 Pozuelo de Alarcón, Madrid > >>>>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // > www.stratio.com > >>>>>>>>> // *@stratiobd <https://twitter.com/StratioBD>* > >>>>>>>>> > >>>>>>>>> 2017-06-12 9:50 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail. > com > >>>>>>>>> >: > >>>>>>>>> > >>>>>>>>>> But Condition is , I am working with Apache Cassandra Database > in > >>>>>>>>>> which I have to store my data into Cassandra and then have to > implement > >>>>>>>>>> partial search capability. > >>>>>>>>>> If we need to search based on full search primary key, then it > >>>>>>>>>> really best and easy to work with Cassandra , but in case of > flexible > >>>>>>>>>> search , I am getting confused. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin < > >>>>>>>>>> oskar.kjel...@gmail.com> wrote: > >>>>>>>>>> > >>>>>>>>>>> I haven't run solr with Cassandra myself. I just meant to run > >>>>>>>>>>> elasticsearch as a completely separate service and write there > as well. > >>>>>>>>>>> > >>>>>>>>>>> On 12 Jun 2017, at 09:45, @Nandan@ < > >>>>>>>>>>> nandanpriyadarshi...@gmail.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Do you mean to use Elastic Search with Cassandra? > >>>>>>>>>>> Even I am thinking to use Apache Solr With Cassandra. > >>>>>>>>>>> In that case I have to create distributed tables such as:- > >>>>>>>>>>> 1) video_by_title, video_by_actor, video_by_year etc.. > >>>>>>>>>>> 2) After creating Tables , will have to configure solr core on > >>>>>>>>>>> all tables. > >>>>>>>>>>> > >>>>>>>>>>> Is it like this ? > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin < > >>>>>>>>>>> oskar.kjel...@gmail.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Why not elasticsearch for this use case? It will make your > life > >>>>>>>>>>>> much simpler > >>>>>>>>>>>> > >>>>>>>>>>>> > On 12 Jun 2017, at 04:40, @Nandan@ < > >>>>>>>>>>>> nandanpriyadarshi...@gmail.com> wrote: > >>>>>>>>>>>> > > >>>>>>>>>>>> > Hi, > >>>>>>>>>>>> > > >>>>>>>>>>>> > Currently, I am working on data modeling for Video Company > in > >>>>>>>>>>>> which we have different types of users as well as different > user > >>>>>>>>>>>> functionality. > >>>>>>>>>>>> > But currently, my concern is about Search video module based > >>>>>>>>>>>> on different fields. > >>>>>>>>>>>> > > >>>>>>>>>>>> > Query patterns are as below:- > >>>>>>>>>>>> > 1) Select video by actor. > >>>>>>>>>>>> > 2) select video by producer. > >>>>>>>>>>>> > 3) select video by music. > >>>>>>>>>>>> > 4) select video by actor and producer. > >>>>>>>>>>>> > 5) select video by actor and music. > >>>>>>>>>>>> > > >>>>>>>>>>>> > Note: - In short, We want to establish an advanced search > >>>>>>>>>>>> module by which we can search by anyway and get the desired > results. > >>>>>>>>>>>> > > >>>>>>>>>>>> > During a search , we need partial search also such that if > >>>>>>>>>>>> any user can search "Harry" title, then we are able to give > them result as > >>>>>>>>>>>> all videos whose > >>>>>>>>>>>> > title contains "Harry" at any location. > >>>>>>>>>>>> > > >>>>>>>>>>>> > As per my ideas, I have to create separate tables such as > >>>>>>>>>>>> video_by_actor, video_by_producer etc.. and implement solr > query on all > >>>>>>>>>>>> tables. Otherwise, > >>>>>>>>>>>> > is there any others way by which we can implement this > search > >>>>>>>>>>>> module effectively. > >>>>>>>>>>>> > > >>>>>>>>>>>> > Please suggest. > >>>>>>>>>>>> > > >>>>>>>>>>>> > Best regards, > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > >