Re: Need guidance solrcloud shardings with date interval
Slight typo: formerly called “composite ID routing” should read formerly called “implicit routing”.

On Tue, Jul 25, 2017 at 9:57 AM, Walter Underwood wrote:
> You will need to configure manual shard routing (formerly called “composite
> ID routing”).
Re: Need guidance solrcloud shardings with date interval
Solr is not Oracle. Designs that might be great for Oracle can be terrible for Solr.

Solr really does not do this automatically, so you won’t find that. If your job is to find that feature, you will fail. If your job is “find or write the feature”, you will be writing it.

As I said before, you will need to write automation to create daily shards. You will need to configure manual shard routing (formerly called “composite ID routing”). Documents sent to Solr will need IDs that work with manual routing. You will need automation to delete old shards. You will also need to manage where the shards are created to keep load and disk usage distributed. If you want search to keep working after a failure, you will also need to create and delete additional shards as replicas.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jul 22, 2017, at 4:02 AM, m rehman kahloon wrote:
>
> Is there any way to perform this?
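A minimal sketch of the daily automation Walter describes, in Python. It only builds the Collections API URLs (CREATESHARD for today, DELETESHARD for the day that just fell out of a 10-day window); sending them and checking the responses is left out. The Solr URL, the collection name "events", the "day_YYYYMMDD" shard naming and the retention value are all hypothetical. It assumes the collection was created with the implicit router; as far as I know, DELETESHARD only removes shards that are inactive or have no hash range, which is the case for implicit-router shards.

```python
from __future__ import annotations

from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical values: replace with your own Solr URL, collection and retention.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "events"
RETENTION_DAYS = 10


def shard_name(day: date) -> str:
    """Name a daily shard after its calendar day, e.g. day_20170720."""
    return "day_" + day.strftime("%Y%m%d")


def collections_api_url(action: str, **params: str) -> str:
    """Build a Collections API URL; extra parameters are sorted for stable output."""
    query = urlencode([("action", action)] + sorted(params.items()))
    return f"{SOLR_BASE}/admin/collections?{query}"


def daily_rotation(today: date) -> list[str]:
    """Calls a daily job would issue: add today's shard, then drop the
    shard that has just fallen out of the retention window."""
    expired = today - timedelta(days=RETENTION_DAYS)
    return [
        collections_api_url("CREATESHARD", collection=COLLECTION,
                            shard=shard_name(today)),
        collections_api_url("DELETESHARD", collection=COLLECTION,
                            shard=shard_name(expired)),
    ]


if __name__ == "__main__":
    # Only prints the URLs; sending them and checking responses is up to you.
    for url in daily_rotation(date(2017, 7, 20)):
        print(url)
```

If the CREATESHARD call fails, that day's documents have no shard to go to, so a job like this needs monitoring and a retry, not just a cron entry.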
Re: Need guidance solrcloud shardings with date interval
If you decide to go with multiple collections and aliasing, this would be useful:
https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/

On Sat, Jul 22, 2017 at 10:37 AM, Shawn Heisey wrote:
> When you use the implicit router, you're completely in charge of
> sharding. There is no automation.
Re: Need guidance solrcloud shardings with date interval
On 7/22/2017 5:02 AM, m rehman kahloon wrote:
> but my R is to find a way, not to use shard name using loading
> time, solrcloud automatically load data into predefined shard/date specific
> shard.

The implicit router is the only one you can use when you're doing time interval sharding, because it's the only one that allows the addition of new shards after collection creation.

There is a reason that we are considering renaming the "implicit" router to "manual" instead.

https://issues.apache.org/jira/browse/SOLR-6630

When you use the implicit router, you're completely in charge of sharding. There is no automation. You must name the shards, create them, delete them, and inform Solr during indexing about which shard will get new documents. If you want to only query a subset of the shards in a collection, you are responsible for telling Solr that with the shards parameter on the query.

If you want to come up with a way to patch the Solr source code to add a new router that does automated time interval sharding, that will be welcome.

Thanks,
Shawn
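To make Shawn's list concrete, here is a small Python sketch of two of those steps: creating a collection with router.name=implicit and explicitly named shards, and restricting a query to some of them with the shards parameter. It only builds URLs; the host, collection, config set and shard names are made up for the example, and router.field is optional.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Hypothetical Solr location; collection, shard and config names are examples.
SOLR_BASE = "http://localhost:8983/solr"


def create_implicit_collection(name: str, shards: list[str], config: str,
                               route_field: str | None = None) -> str:
    """Collections API CREATE for the implicit (manual) router.
    Every shard has to be named here or added later with CREATESHARD."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("router.name", "implicit"),
        ("shards", ",".join(shards)),
        ("collection.configName", config),
    ]
    if route_field is not None:
        # Optional: take each document's target shard name from this field.
        params.append(("router.field", route_field))
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def select_from_shards(collection: str, q: str, shards: list[str]) -> str:
    """Query only the listed shards. With the implicit router Solr does not
    know which days a query needs; the client has to say so."""
    params = [("q", q), ("shards", ",".join(shards))]
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"
```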
Re: Need guidance solrcloud shardings with date interval
Hi Sir Walter,

Yes, you are right, I am trying to create a structure like Oracle partitioning: each day's partition would be one day's shard.

I have already created date-wise shards, and at loading time I use the specific shard name to load data.

But my requirement is to find a way to avoid using the shard name at loading time, so that SolrCloud automatically loads data into the predefined, date-specific shard.

Is there any way to do this?

Once again, thanks, Sir.

--
View this message in context: http://lucene.472066.n3.nabble.com/Need-guidance-solrcloud-shardings-with-date-interval-tp4346936p4347250.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need guidance solrcloud shardings with date interval
Thanks for your response. Actually, my per-day data size is very large, around 400 GB, which is why my plan is to use date intervals, with each shard representing a predefined date. Delete will not be possible.

I am looking for some way to do this automatically: at loading time no shard name is given, and each document is loaded automatically according to its date.

Thanks, still waiting for guidance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Need-guidance-solrcloud-shardings-with-date-interval-tp4346936p4347249.html
Sent from the Solr - User mailing list archive at Nabble.com.
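A sketch of one way to get "no shard name at loading time" with the implicit router: create the collection with router.field pointing at a field that holds the shard name, and have the indexing client fill that field from each document's date. The field names "timestamp" and "day_shard" and the "day_YYYYMMDD" naming are hypothetical, and it assumes dates are already in Solr's UTC "Z" form. The shard for each day still has to exist beforehand, and as far as I know Solr rejects a document whose router.field value is missing.

```python
from __future__ import annotations

import json

# Hypothetical field names. DATE_FIELD holds the document date in Solr's
# UTC form (2017-07-20T08:09:00Z); ROUTE_FIELD is the field the collection
# was created with as router.field, so its value must be an existing shard name.
DATE_FIELD = "timestamp"
ROUTE_FIELD = "day_shard"


def with_route_field(doc: dict) -> dict:
    """Return a copy of doc whose ROUTE_FIELD names the shard for its day."""
    day = doc[DATE_FIELD][:10]              # "2017-07-20"
    routed = dict(doc)
    routed[ROUTE_FIELD] = "day_" + day.replace("-", "")
    return routed


def update_body(docs: list[dict]) -> str:
    """JSON for a plain POST to /solr/<collection>/update: no shard name in
    the request, each document carries its own."""
    return json.dumps([with_route_field(d) for d in docs])
```

This moves the date-to-shard mapping into the client rather than into Solr, which is about as close to "automatic" as the implicit router gets.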
RE: Need guidance solrcloud shardings with date interval
Muhammad,

This sounds like it might be handled better by multiple collections rather than multiple "sub-collections". You could create a new collection for each date, all using the same common config set, and then create an alias that contains all of these collections. The alias will then function as your "collection", and the date-specific collections will function as your "sub-collections".

This is a supported scenario, and I agree with the others that playing around with specific shard placement and shards is a poor choice. One way you could do something similar is to limit the number of shards/replicas used for the date-specific collections.

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Thursday, July 20, 2017 1:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Need guidance solrcloud shardings with date interval

Well, you have a bad problem. You have a requirement that forces you to build an expensive, unreliable search system.
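A rough sketch of Dan's suggestion in Python: one collection per day sharing a single config set, plus an alias that is re-pointed at the newest N collections. Only the Collections API URLs are built (CREATE and CREATEALIAS); the names "events", "events_conf" and "events_YYYYMMDD", and the choice of 2 shards per day, are placeholders.

```python
from __future__ import annotations

from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical names: one collection per day, all sharing one config set,
# searched through a single alias.
SOLR_BASE = "http://localhost:8983/solr"
CONFIG_SET = "events_conf"
ALIAS = "events"


def day_collection(day: date) -> str:
    return "events_" + day.strftime("%Y%m%d")


def admin_url(params: list[tuple[str, str]]) -> str:
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def create_day_collection(day: date, num_shards: int = 2) -> str:
    """CREATE one date-specific collection; it uses normal hash routing,
    so nothing has to name a shard at indexing time."""
    return admin_url([
        ("action", "CREATE"),
        ("name", day_collection(day)),
        ("numShards", str(num_shards)),
        ("collection.configName", CONFIG_SET),
    ])


def repoint_alias(today: date, days: int) -> str:
    """CREATEALIAS over the newest `days` daily collections. Issuing it
    again with a new list replaces the alias, so searches against ALIAS
    always see the current window."""
    names = [day_collection(today - timedelta(days=i)) for i in range(days)]
    return admin_url([
        ("action", "CREATEALIAS"),
        ("name", ALIAS),
        ("collections", ",".join(names)),
    ])
```

Searches go to the alias; indexing would go to that day's collection by name. A collection that drops out of the window can then be removed with action=DELETE, which drops the whole collection instead of running a delete-by-query.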
Re: Need guidance solrcloud shardings with date interval
Well, you have a bad problem. You have a requirement that forces you to build an expensive, unreliable search system.

You need to do specific shard creation at specific times every day. What happens if that fails? Does search go down until it is fixed because all searches are going to a shard that doesn’t exist? Or do the documents get randomly sent to existing shards, so you need to search all the shards anyway? If docs are distributed, you’ll need to clean that day up with delete by query. You need to build that as part of failure recovery.

Does your code handle leap years for shard creation? Daylight saving time? How do you test that code?

You’ll be writing a lot of custom code that other people don’t need. If you are a consultant, this is great. For the customer, not so good.

Whoever wrote that requirement does not know very much about Solr. It sounds like they are trying to force RDBMS sharding onto Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jul 20, 2017, at 8:09 AM, rehman kahloon wrote:
>
> No, sir, it is our requirement to load data into a specific shard, and after
> the retention time we will delete that shard.
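On Walter's leap-year and daylight-saving questions: one way to sidestep both is to bucket documents by UTC calendar day and let the standard library do the date arithmetic, then test it on the awkward dates. A Python sketch, assuming timestamps arrive as ISO-8601 strings that carry an offset (or "Z"); naive timestamps are rejected rather than guessed. The function names are hypothetical.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone


def utc_day(ts: str) -> date:
    """UTC calendar day of an ISO-8601 timestamp that carries an offset or 'Z'.
    Bucketing by UTC day means local daylight-saving jumps never matter."""
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"     # fromisoformat() accepts 'Z' only from 3.11
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        # Without an offset the UTC day is ambiguous; refuse to guess.
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return parsed.astimezone(timezone.utc).date()


def days_in_range(first: date, last: date) -> list[date]:
    """Every calendar day from first to last inclusive (Feb 29 included)."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]
```

For example, utc_day("2016-02-29T23:30:00-05:00") is March 1 in UTC, so a document sent late on a leap day in US Eastern time would belong to the March 1 shard.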
Re: Need guidance solrcloud shardings with date interval
bq: that is our requirement to load data into specific shard and later after retention time we will delete that shard

Why is it necessary to delete a shard when deleting the old data by query removes it?

This sounds like an XY problem. Someone has "required" that you enforce data retention by deleting a shard, so you're asking about deleting shards, whereas the actual problem is to purge old data, which _could_ be accomplished by rotating shards, but is _also_ accomplished by just issuing a "delete all data more than 10 days old" query.

But if it's a requirement, see the reference guide for "implicit" routing in the document routing sections.

Best,
Erick

On Thu, Jul 20, 2017 at 8:15 AM, Susheel Kumar wrote:
> Agree. One should first try to measure the performance with the standard/common
> approach.
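For the delete-by-query route Erick describes, the request is small. A Python sketch that builds the URL and JSON body for a POST to /solr/<collection>/update; the field name "timestamp", the host and the use of commit=true are assumptions. NOW/DAY-10DAYS is Solr date math, and the closing "}" makes the upper bound exclusive.

```python
from __future__ import annotations

import json

# Hypothetical values: the date field name and the Solr location.
DATE_FIELD = "timestamp"
SOLR_BASE = "http://localhost:8983/solr"


def purge_request(collection: str, days: int) -> tuple[str, str]:
    """URL and JSON body for a delete-by-query that removes every document
    dated before midnight `days` days ago. NOW/DAY rounds down to the start
    of the current day (UTC unless a TZ parameter is given); the trailing
    '}' makes the upper bound exclusive."""
    query = f"{DATE_FIELD}:[* TO NOW/DAY-{days}DAYS}}"
    url = f"{SOLR_BASE}/{collection}/update?commit=true"
    body = json.dumps({"delete": {"query": query}})
    return url, body
```

With normal routing this one request reaches every shard, so no shard names are involved at all.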
Re: Need guidance solrcloud shardings with date interval
Hi Erick,

Thank you very much for your guidance.

No, sir, it is our requirement to load data into a specific shard, and after the retention time we will delete that shard.

Please share any manual sharding exercise document you have.

Second, is it possible for data to load automatically into a specific shard without using the shard name during loading?

Is there any Solr file where I can list all my shard names with their specific dates, so that when data comes in it is automatically loaded into the already-defined shard?

Once again, thank you very much.

Kind regards,
Muhammad Rehman Kahloon

Sent from Yahoo Mail for iPhone

On Thursday, July 20, 2017, 19:57, Erick Erickson wrote:
> Use the "implicit" router (being renamed "manual"). That takes the
> value of a particular field (_route_ by default) and sends docs to
> that exact shard.
Re: Need guidance solrcloud shardings with date interval
Agree. One should first try to measure the performance with the standard/common approach.

On Thu, Jul 20, 2017 at 11:00 AM, Walter Underwood wrote:
> I agree. Use the standard shard distribution and delete by query to remove
> older documents.
Re: Need guidance solrcloud shardings with date interval
I agree. Use the standard shard distribution and delete by query to remove older documents.

Much, much simpler and probably faster at query time.

I’m seeing a lot of e-mails about people trying to do fancy things with sharding before they’ve even tried and measured the performance.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jul 20, 2017, at 7:57 AM, Erick Erickson wrote:
>
> You should at least consider just using normal
> routing that distributes the data across all shards and then use
> delete-by-query to delete the data older than 10 days.
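For comparison, the "standard shard distribution" Walter recommends needs only one Collections API call up front. A Python sketch that builds it; 10 shards (one per node in the original question), replicationFactor=2 and the names are assumptions.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and names.
SOLR_BASE = "http://localhost:8983/solr"


def create_standard_collection(name: str, num_shards: int, replicas: int,
                               config: str) -> str:
    """CREATE with the default compositeId router: Solr hashes each document
    id to a shard, so every day's documents are spread over all shards and
    indexing requests never name a shard."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("router.name", "compositeId"),
        ("numShards", str(num_shards)),
        ("replicationFactor", str(replicas)),
        ("collection.configName", config),
    ]
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"
```

Retention is then handled by a periodic delete-by-query on the document date rather than by creating and deleting shards, and a query for "the last two days" still uses every node.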
Re: Need guidance solrcloud shardings with date interval
Use the "implicit" router (being renamed "manual"). That takes the value of a particular field (_route_ by default) and sends docs to that exact shard.

But I also question whether sharding on this scheme is a good idea. If you have an access pattern where most queries are for, say, the last two days, then all the work will be done on only 2 machines and all the rest will be idle. You should at least consider just using normal routing that distributes the data across all shards and then use delete-by-query to delete the data older than 10 days.

Best,
Erick

On Thu, Jul 20, 2017 at 12:51 AM, rehman kahloon wrote:
>
> Hi Sir,
> I took your ID from your document on SlideShare.
> I need your guidance on my plan. My target is to create sub-collections/shards
> within a collection.
> e.g.
> Currently I have 10 days of data and want to store the data for each
> date in a separate partition, like the Oracle partition concept (one table can
> have many partitions).
> The plan is to store each date's data on a separate node. There are 10 physical
> nodes in total; after 10 days, the 11th day's data would load on node1, and the
> existing (oldest) data would be purged and backed up.
> Please guide me on how I can do that using SolrCloud: 1 collection with
> unlimited sub-collections.
>
> Thank you very much in advance.
>
> Kind Regards,
> Muhammad Rehman Kahloon.
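With the implicit router, one way to tell Solr which shard gets the documents is the _route_ parameter on the update request. A minimal Python sketch that builds such a request (URL plus JSON body); the host, collection and shard names are placeholders, and nothing is sent.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

# Hypothetical Solr location; "events" and "day_20170720" are example names.
SOLR_BASE = "http://localhost:8983/solr"


def routed_update(collection: str, shard: str, docs: list[dict]) -> tuple[str, str]:
    """URL and JSON body that send docs to exactly one shard of a collection
    using the implicit router: the _route_ parameter names the shard."""
    url = f"{SOLR_BASE}/{collection}/update?{urlencode([('_route_', shard)])}"
    return url, json.dumps(docs)
```

Every document in such a request lands on the one named shard, so a batch that spans midnight would have to be split by the client first.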