Re: very slow frequent updates

Jeff Wartes Wed, 24 Feb 2016 10:08:39 -0800

I suspect your problem is the intersection of “very large document” and “high 
rate of change”. Either of those alone would be fine.
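
As a rough illustration of why the combination hurts, here is a back-of-the-envelope sketch using the figures from the original post (about 1 MB of content and about 1 KB of metadata per document, 100 documents per batch). It assumes, as a simplification, that the work per update is proportional to the size of the document that gets re-added; the constant and function names are made up for the example.

```python
# Figures taken from the original post: about 1 MB of content and
# about 1 KB of metadata per document, 100 documents per update batch.
CONTENT_BYTES = 1_000_000
METADATA_BYTES = 1_000
BATCH_SIZE = 100


def bytes_rewritten(doc_bytes: int, docs: int) -> int:
    """Every update re-adds the whole document, so the data rewritten
    grows with document size times the number of updated documents."""
    return doc_bytes * docs


# Price stored on the full book document: content + metadata is re-added.
full_docs = bytes_rewritten(CONTENT_BYTES + METADATA_BYTES, BATCH_SIZE)

# Price stored on a small document of its own: only the metadata is re-added.
small_docs = bytes_rewritten(METADATA_BYTES, BATCH_SIZE)

print(full_docs, small_docs, full_docs / small_docs)
```

Under that assumption the large documents cost roughly a thousand times more per price change, which is why either factor alone would be tolerable.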


You’re correct: if the field you need to search or sort by is the one with the
high change rate, you probably won’t be able to move it out of your index.

Perhaps you could work something out with join queries? So you have two kinds 
of documents - book content and book price - and your high-frequency change is 
limited to documents with very little data.
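
A minimal sketch of what that could look like, assuming hypothetical field names (`id` and `content` on the book documents; `book_id`, `type` and `price` on the small price documents) and hypothetical helper functions. It only builds the request data and uses Solr's `{!join from=... to=...}` query parser syntax; it does not talk to a server.

```python
from urllib.parse import urlencode


def price_update_doc(book_id: str, price: float) -> dict:
    """Build the small price document (hypothetical schema).

    Re-adding a document with the same id replaces the old one, so a
    price change only touches this tiny document, not the 1 MB book.
    """
    return {
        "id": f"price-{book_id}",
        "type": "price",
        "book_id": book_id,
        "price": price,
    }


def join_search_params(text_query: str, min_price: float, max_price: float) -> dict:
    """Full-text search on book documents, filtered through price documents."""
    return {
        "q": f"content:({text_query})",
        # Match price documents in the range, then map book_id -> id.
        "fq": f"{{!join from=book_id to=id}}price:[{min_price} TO {max_price}]",
        "wt": "json",
    }


doc = price_update_doc("book-42", 9.99)
params = join_search_params("history", 5, 20)
print(doc)
print(urlencode(params))
```

Note that, as far as I know, a plain join like this only filters the book documents by price. It does not by itself let you sort the books by a field that lives on the price documents, so it helps with the update cost more than with the sorting requirement.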





On 2/24/16, 4:01 AM, "roland.sz...@booknwalk.com on behalf of Szűcs Roland" 
<roland.sz...@booknwalk.com on behalf of szucs.rol...@bookandwalk.hu> wrote:

>I have already checked it in the ref. guide. It states that you cannot
>search in external fields:
>https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
>
>I am really curious whether my problem is an unusual one, or whether SOLR
>simply focuses on search rather than on end-to-end support. How does this
>approach work with 1 million documents whose prices change frequently?
>
>Thanks for your time,
>
>Roland
>
>2016-02-24 12:39 GMT+01:00 Stefan Matheis <matheis.ste...@gmail.com>:
>
>> Depending on which features you actually need, it might be worth a look
>> at "External File Fields", Roland?
>>
>> -Stefan
>>
>> On Wed, Feb 24, 2016 at 12:24 PM, Szűcs Roland
>> <szucs.rol...@bookandwalk.hu> wrote:
>> > Thanks for your help, Jeff.
>> >
>> > Can it work in a production environment? Imagine my customer initiates a
>> > query with 1,000 docs in the result set. I cannot use SOLR's pagination,
>> > because the field the sort is based on, for example the price, is not
>> > included in the schema. The customer wants the list in descending order
>> > of price.
>> >
>> > So I have to get all 1,000 doc ids from SOLR and look up their metadata
>> > in an SQL database, or in a cache in the best case. Is this what you
>> > suggested? Is it not too slow?
>> >
>> > Regards,
>> > Roland
>> >
>> > 2016-02-23 19:29 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:
>> >
>> >>
>> >> My suggestion would be to split your problem domain. Use Solr exclusively
>> >> for search - index the id and only those fields you need to search on.
>> >> Then use some other data store for retrieval. Get the ids from the Solr
>> >> results, and look them up in the data store to get the rest of your
>> >> fields. This allows you to keep your Solr docs as small as possible, and
>> >> you only need to update them when a *searchable* field changes.
>> >>
>> >> Every "update" in Solr is a delete/insert. Even the "atomic update"
>> >> feature is just a shortcut for that. It requires stored fields because
>> >> the data from the stored fields gets copied into the new insert.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 2/22/16, 12:21 PM, "Roland Szűcs" <roland.sz...@booknwalk.com> wrote:
>> >>
>> >> >Hi folks,
>> >> >
>> >> >We use SOLR 5.2.1. We have ebooks stored in SOLR. The majority of the
>> >> >fields, such as content, author and publisher, do not change at all.
>> >> >Only the price field changes frequently.
>> >> >
>> >> >We let customers run full-text searches, so we indexed the content
>> >> >field. Because of the frequency of the price updates we use the atomic
>> >> >update feature. As a requirement of atomic updates we have to store all
>> >> >the fields, even the content field, which is 1 MB per document and which
>> >> >we wanted only to index, not to store.
>> >> >
>> >> >When we tried to update 100 documents with atomic updates, it took about
>> >> >3 minutes. Given that our metadata is 1 KB per document and our content
>> >> >field is 1 MB per document, we use 1000 times more memory to accelerate
>> >> >the update process.
>> >> >
>> >> >I am almost 100% sure that we are doing something wrong.
>> >> >
>> >> >What is the best practice for frequent updates when 99% of a given
>> >> >document is constant forever?
>> >> >
>> >> >Thanks in advance
>> >> >
>> >> >--
>> >> >Roland Szűcs
>> >> >Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
>> >> ><https://bookandwalk.hu/>
>> >> >CEO Phone: +36 1 210 81 13
>> >> >Bookandwalk.hu <https://bokandwalk.hu/>
>> >>
>> >
>> >
>> >
>>
>
>
>
