Re: Need guidance solrcloud shardings with date interval

2017-07-25 Thread Erick Erickson
Slight typo:

formerly called “composite ID routing”
should read
formerly called “implicit routing”

On Tue, Jul 25, 2017 at 9:57 AM, Walter Underwood  wrote:
> Solr is not Oracle. Designs that might be great for Oracle can be terrible 
> for Solr.
>
> Solr really does not do this automatically, so you won’t find that. If your 
> job is to find that feature, you will fail. If your job is “find or write the 
> feature”, you will be writing it.
>
> As I said before, you will need to write automation to create daily shards. 
> You will need to configure manual shard routing (formerly called “composite 
> ID routing”). Documents sent to Solr will need IDs that work with manual 
> routing. You will need automation to delete old shards. You will also need to 
> manage where the shards are created to keep load and disk usage distributed. 
> If you want search to keep working after a failure, you will also need to 
> create and delete additional shards as replicas.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)


Re: Need guidance solrcloud shardings with date interval

2017-07-25 Thread Walter Underwood
Solr is not Oracle. Designs that might be great for Oracle can be terrible for 
Solr.

Solr really does not do this automatically, so you won’t find that. If your job 
is to find that feature, you will fail. If your job is “find or write the 
feature”, you will be writing it.

As I said before, you will need to write automation to create daily shards. You 
will need to configure manual shard routing (formerly called “composite ID 
routing”). Documents sent to Solr will need IDs that work with manual routing. 
You will need automation to delete old shards. You will also need to manage 
where the shards are created to keep load and disk usage distributed. If you 
want search to keep working after a failure, you will also need to create and 
delete additional shards as replicas.
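
A minimal sketch of the daily rotation described above, assuming a collection that already uses the implicit router. The host, the `shard_YYYYMMDD` naming scheme, and the helper names are hypothetical; only the `CREATESHARD` and `DELETESHARD` actions of the Collections API are taken from Solr itself. The functions only build the request URLs, so scheduling, retries, and replica placement are still left to you.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SOLR_ADMIN = "http://localhost:8983/solr/admin/collections"  # assumed host and port


def shard_name(day: date) -> str:
    """Hypothetical naming scheme: one shard per calendar day."""
    return "shard_" + day.strftime("%Y%m%d")


def create_shard_url(collection: str, day: date) -> str:
    # CREATESHARD is only accepted for collections that use the implicit router.
    params = {"action": "CREATESHARD", "collection": collection,
              "shard": shard_name(day)}
    return SOLR_ADMIN + "?" + urlencode(params)


def delete_shard_url(collection: str, day: date) -> str:
    params = {"action": "DELETESHARD", "collection": collection,
              "shard": shard_name(day)}
    return SOLR_ADMIN + "?" + urlencode(params)


def rotation_urls(collection: str, today: date, retention_days: int = 10):
    """URLs to create tomorrow's shard and to drop the shard that aged out."""
    tomorrow = today + timedelta(days=1)
    expired = today - timedelta(days=retention_days)
    return [create_shard_url(collection, tomorrow),
            delete_shard_url(collection, expired)]
```

Creating tomorrow's shard a day ahead gives the job time to fail and be retried before documents for that day arrive.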

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)





Re: Need guidance solrcloud shardings with date interval

2017-07-23 Thread Susheel Kumar
If you decide to go with multiple collections and aliasing, this would be
useful:

https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/




Re: Need guidance solrcloud shardings with date interval

2017-07-22 Thread Shawn Heisey
On 7/22/2017 5:02 AM, m rehman kahloon wrote:
> but my R is to find a way, not to use shard name using loading
> time,solrcloud automatically load data into predefined shard/date specific
> shard.

The implicit router is the only one you can use when you're doing time
interval sharding, because it's the only one that allows the addition of
new shards after collection creation.

There is a reason that we are considering renaming the "implicit" router
to "manual" instead.

https://issues.apache.org/jira/browse/SOLR-6630

When you use the implicit router, you're completely in charge of
sharding.  There is no automation.  You must name the shards, create
them, delete them, and inform Solr during indexing about which shard
will get new documents.  If you want to only query a subset of the
shards in a collection, you are responsible for telling Solr that with
the shards parameter on the query.
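
As a rough illustration of "informing Solr" at index and query time, the sketch below builds an update URL that names the target shard with the `_route_` parameter and a query URL restricted with the `shards` parameter. The host, collection name, and shard names are assumptions made up for the example.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # assumed host and port


def update_url(collection: str, shard: str) -> str:
    # _route_ names the shard that should receive the documents in this request.
    return f"{BASE}/{collection}/update?" + urlencode({"_route_": shard})


def select_url(collection: str, query: str, shards: list) -> str:
    # shards limits the query to the listed logical shards.
    params = {"q": query, "shards": ",".join(shards)}
    return f"{BASE}/{collection}/select?" + urlencode(params)
```

The caller still has to decide which shard name goes with which document, which is exactly the part Solr does not automate.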

If you want to come up with a way to patch the Solr source code to add a
new router that does automated time interval sharding, that will be welcome.

Thanks,
Shawn



Re: Need guidance solrcloud shardings with date interval

2017-07-22 Thread m rehman kahloon
Hi Sir Walter,

Yes, you are right, I am trying to create a structure like Oracle
partitioning: one shard per day, like one partition per day.

I have already created date-wise shards, and at load time I use the specific
shard name to load data.

But my requirement is to find a way to avoid giving the shard name at load
time, so that SolrCloud automatically loads data into the predefined,
date-specific shard.

Is there any way to do this?

Once again, thanks Sir.







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-guidance-solrcloud-shardings-with-date-interval-tp4346936p4347250.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need guidance solrcloud shardings with date interval

2017-07-22 Thread m rehman kahloon
Thanks for your response.

Actually, my data size per day is very large, around 400 GB. That is why I
plan to use date intervals, where each shard represents a predefined date.
Deleting will not be possible.

I am looking for some way to do this automatically: at load time I would not
give any shard name, and each document would be loaded according to its date.

Thanks, still waiting for guidance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-guidance-solrcloud-shardings-with-date-interval-tp4346936p4347249.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Davis, Daniel (NIH/NLM) [C]
Muhammad,

This sounds like it might be handled better by multiple collections rather than 
multiple "sub collections". Create a new collection for each date, all using 
the same common config set, and then create an alias that contains all of 
these collections. The alias will then function as your "collection", and 
the date-specific collections will function as your "sub-collections".
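
A small sketch of the alias step, assuming daily collections named `logs_YYYYMMDD` (a hypothetical scheme) and a local Solr host. It builds a `CREATEALIAS` request that points one alias at the most recent N daily collections; issuing it again with a new list redefines the alias.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/collections"  # assumed host and port


def collection_name(day: date, prefix: str = "logs_") -> str:
    """Hypothetical naming scheme: one collection per calendar day."""
    return prefix + day.strftime("%Y%m%d")


def create_alias_url(alias: str, newest: date, days: int = 10) -> str:
    # List the newest collection first, then walk back one day at a time.
    names = [collection_name(newest - timedelta(days=i)) for i in range(days)]
    params = {"action": "CREATEALIAS", "name": alias,
              "collections": ",".join(names)}
    return ADMIN + "?" + urlencode(params)
```

Queries then go to the alias, and dropping a day means updating the alias and deleting that one collection.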

This is a supported scenario, and I agree with the others that playing around 
with specific shard placement and shards is a poor choice.

One way you could do something similar is to limit the # of shards/replicas 
used for date-specific collections.

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH





Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Walter Underwood
Well, you have a bad problem. You have a requirement that forces you to build an 
expensive, unreliable search system.

You need to do specific shard creation at specific times every day. What 
happens if that fails? Does search go down until it is fixed because all 
searches are going to a shard that doesn’t exist? Or do the documents get 
randomly sent to existing shards, so you need to search all the shards anyway? 
If docs are distributed, you’ll need to clean that day up with delete by query. 
You need to build that as a failure recovery.

Does your code handle leap years for shard creation? Daylight saving time? How 
do you test that code?
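
One way to make the calendar questions testable, sketched under the assumption that shards are named after the UTC calendar day (the `shard_YYYYMMDD` scheme and both function names are hypothetical). Working in UTC sidesteps daylight saving time, and plain date arithmetic handles leap days, but both still deserve explicit test cases.

```python
from datetime import datetime, timedelta, timezone


def shard_for(ts: datetime) -> str:
    """Name of the shard for a document timestamp, by UTC calendar day."""
    if ts.tzinfo is None:
        # Naive timestamps are ambiguous around DST changes, so reject them.
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("shard_%Y%m%d")


def next_shard(ts: datetime) -> str:
    """Name of the shard for the day after the given timestamp."""
    return shard_for(ts + timedelta(days=1))
```

Useful cases to cover are February 28 in leap and non-leap years, December 31, and local timestamps that fall on a different UTC day.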

You’ll be writing a lot of custom code that other people don’t need. If you are 
a consultant, this is great. For the customer, not so good.

Whoever wrote that requirement does not know very much about Solr. It sounds 
like they are trying to force RDBMS sharding onto Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)





Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Erick Erickson
bq: that is our requirement to load data into specific shard and later
after retention time we will delete that shard

Why is it necessary to delete a shard when deleting the old data by
query removes it? This sounds like an XY problem. Someone has
"required" that you enforce data retention by deleting a shard so
you're asking about deleting shards. Whereas the problem is to purge
old data that _could_ be accomplished by rotating shards, but is
_also_ accomplished by just issuing a "delete all data more than 10
days old" query.
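
For comparison, the delete-by-query alternative is a single request body. The sketch below builds it as JSON, assuming each document has a date field called `timestamp` (a hypothetical name); the body would be POSTed to the collection's `/update` handler.

```python
import json


def delete_older_than(days: int, date_field: str = "timestamp") -> str:
    """JSON body for a delete-by-query of everything older than `days` days."""
    # Solr date math: NOW-10DAYS is ten days before the current time, and
    # the range [* TO X] matches every value up to and including X.
    query = f"{date_field}:[* TO NOW-{days}DAYS]"
    return json.dumps({"delete": {"query": query}})
```

Run once a day, this enforces the same retention window without creating or deleting any shards.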

But if it's a requirement, see the reference guide for "implicit"
routing in the document routing sections.

Best,
Erick



Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread rehman kahloon
Hi Erick,

Thank you very much for your guidance.

No sir, it is our requirement to load data into a specific shard and, after
the retention time, to delete that shard.

Please share any document you have on manual sharding.

Second, is it possible for data to load automatically into a specific shard
without using the shard name during loading?

Is there any Solr file where I can list all my shard names with their specific
dates, so that incoming data is automatically loaded into the listed shard?

Once again, thank you very much.

Kind regards,
Muhammad Rehman Kahloon


Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Susheel Kumar
Agree. One should first try to measure the performance with the
standard/common approach.



Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Walter Underwood
I agree. Use the standard shard distribution and delete by query to remove 
older documents.

Much, much simpler and probably faster at query time.

I’m seeing a lot of e-mails about people trying to do fancy things with 
sharding before they’ve even tried and measured the performance.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)





Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Erick Erickson
Use the "implicit" router (being renamed "manual"), which takes the
value of a particular field (_route_ by default) and sends docs to
that exact shard.
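
A sketch of what creating such a collection might look like through the Collections API, assuming a local Solr host, made-up shard names, and a hypothetical routing field `shard_s` whose value in each document is the name of the target shard. Only the URL is built here; config set, replication, and node placement parameters are left out.

```python
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/collections"  # assumed host and port


def create_collection_url(name: str, shards: list, route_field: str = None) -> str:
    params = {
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",   # manual routing: shards are named explicitly
        "shards": ",".join(shards),  # initial shard names
    }
    if route_field:
        # Optional: route each document by the value of this field.
        params["router.field"] = route_field
    return ADMIN + "?" + urlencode(params)
```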

But I also question whether sharding on this schema is a good idea. If
you have an access pattern where most queries are for, say, the last
two days then all the work will be done on only 2 machines and all the
rest will be idle. You should at least consider just using normal
routing that distributes the data across all shards and then use
delete-by-query to delete the data older than 10 days.

Best,
Erick

On Thu, Jul 20, 2017 at 12:51 AM, rehman kahloon
 wrote:
>
> Hi Sir,
> I took your ID from your document on SlideShare.
> I need your guidance on my plan. My target is to create sub-collections/shards
> within a collection.
> For example, I currently have 10 days of data and want to store the data for
> each date in a separate partition, like the Oracle partition concept (one
> table can have many partitions).
> The plan is to store each date's data on a separate node. There are 10
> physical nodes in total; after 10 days, the 11th day's data is loaded onto
> node1, and the existing data there (the oldest date) is backed up and purged.
> Please guide me on how I can do that using SolrCloud: 1 collection with
> unlimited sub-collections.
>
> Thank you very much in advance.
>
> Kind regards, Muhammad Rehman Kahloon.