Re: very slow frequent updates

2016-02-24 Thread Szűcs Roland
Thanks again, Jeff. I will check the documentation on join queries, because I
have never used them before.

Regards

Roland


Re: very slow frequent updates

2016-02-24 Thread Jeff Wartes

I suspect your problem is the intersection of “very large document” and “high 
rate of change”. Either of those alone would be fine.

You’re correct, if the thing you need to search or sort by is the thing with a 
high change rate, you probably aren’t going to be able to peel those things out 
of your index. 

Perhaps you could work something out with join queries? So you have two kinds 
of documents - book content and book price - and your high-frequency change is 
limited to documents with very little data.
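
As a rough sketch of that two-document idea (all field names below are
assumptions, not something from this thread): keep the big, rarely-changing
book document as it is, put the price in a tiny companion document, and join
at query time.

    # tiny price document, cheap to reindex on every price change
    {"id": "price-123", "doc_type": "price", "book_id": "book-123", "price": 4.99}

    # big book document, only reindexed when content or metadata change
    {"id": "book-123", "doc_type": "book", "title": "...", "content": "..."}

    # search the books, restrict by price through a join on the price documents
    q=content:gardening
    fq={!join from=book_id to=id}price:[0 TO 10]

Note that the standard join only filters; it does not make the price sortable
on the book documents by itself, so a price sort would still need something
extra (for example the external file field approach discussed elsewhere in
this thread).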


Re: very slow frequent updates

2016-02-24 Thread Szűcs Roland
I have already checked the ref. guide. It states that you cannot search on
external file fields:
https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
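
As far as I can tell from that page, the values can still be used in function
queries, so the price sort itself might be covered along these lines (just a
sketch, assuming the price were defined as an ExternalFileField named price):

    sort=field(price) desc
    # or feed it into boosting, e.g. with edismax: boost=field(price)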

I am really curious: is my problem unusual, or is it that Solr mainly focuses
on search rather than on end-to-end support? How does this approach work with
1 million documents whose prices change frequently?

Thanks for your time,

Roland

-- 
Szűcs Roland
Connect with me on Linkedin
CEO | Phone: +36 1 210 81 13
Bookandwalk.hu


Re: very slow frequent updates

2016-02-24 Thread Stefan Matheis
Depending on what features you actually need, it might be worth a look at
"External File Fields", Roland?

-Stefan



Re: very slow frequent updates

2016-02-24 Thread Szűcs Roland
Thanks for your help, Jeff.

Can it work in a production environment? Imagine a customer initiates a query
with 1,000 docs in the result set. I cannot use Solr's pagination, because the
field the sort is based on, for example the price, is not included in the
schema. The customer wants the list in descending order of price.

So I have to get all 1,000 doc ids from Solr and look up their metadata in an
SQL database, or at best in a cache. Is this the way you suggested? Isn't it
too slow?
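
That is, something along these lines (query, table, and column names are made
up for illustration):

    # 1) Solr: full-text search, return only the ids
    q=content:gardening&fl=id&rows=1000

    # 2) outside Solr: fetch metadata and do the price sort there, e.g.
    #    SELECT id, title, price FROM books WHERE id IN (...) ORDER BY price DESC LIMIT 20;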

Regards,
Roland

-- 
Szűcs Roland
Connect with me on Linkedin
CEO | Phone: +36 1 210 81 13
Bookandwalk.hu


Re: very slow frequent updates

2016-02-23 Thread Jeff Wartes

My suggestion would be to split your problem domain. Use Solr exclusively for
search - index the id and only those fields you need to search on. Then use
some other data store for retrieval. Get the ids from the Solr results, and
look them up in the data store to get the rest of your fields. This allows you
to keep your Solr docs as small as possible, and you only need to update them
when a *searchable* field changes.
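
As a rough sketch of that split, with made-up field names, the Solr side could
shrink to something like:

    <!-- schema.xml: only the id is stored; everything else is index-only -->
    <field name="id"      type="string"       indexed="true" stored="true"/>
    <field name="title"   type="text_general" indexed="true" stored="false"/>
    <field name="author"  type="text_general" indexed="true" stored="false"/>
    <field name="content" type="text_general" indexed="true" stored="false"/>
    <!-- price and the rest of the metadata live in the external store, keyed by id -->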

Every "update" in Solr is a delete/insert. Even the "atomic update" feature is
just a shortcut for that. It requires stored fields because the data from the
stored fields gets copied into the new insert.
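
To make that concrete with the price case from this thread (core name and
values made up), an atomic update like

    curl 'http://localhost:8983/solr/books/update' -H 'Content-Type: application/json' \
         -d '[{"id": "book-123", "price": {"set": 4.99}}]'

still makes Solr read every stored field of book-123, including the ~1 MB
content, rebuild the whole document, and reindex it - which is exactly why the
content field has to be stored and why those updates are expensive.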



very slow frequent updates

2016-02-22 Thread Roland Szűcs
Hi folks,

We use Solr 5.2.1 and store our ebooks in it. The majority of the fields, such
as content, author, and publisher, do not change at all; only the price field
changes frequently.

We let customers run full-text searches, so we indexed the content field.
Because the price updates are frequent, we use the atomic update feature.
Atomic updates require all fields to be stored, even the content field, which
is 1 MB per document and which we only wanted to index, not store.
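
In schema.xml terms, this is the difference between what we would prefer and
what the atomic update requirement forces on us (the field type here is only
an assumption):

    <!-- what we would prefer for the 1 MB content field: -->
    <field name="content" type="text_general" indexed="true" stored="false"/>
    <!-- what atomic updates force us to keep: -->
    <field name="content" type="text_general" indexed="true" stored="true"/>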

When we updated 100 documents with atomic updates, it took about 3 minutes.
Given that our metadata is about 1 KB per document and our content field is
about 1 MB per document, we are storing roughly 1,000 times more data than
what actually changes, just to keep the update process working.

I am almost 100% sure that we are doing something wrong.

What is the best practice for frequent updates when 99% of a given document
never changes?

Thanks in advance

-- 
Roland Szűcs
Connect with me on Linkedin
CEO | Phone: +36 1 210 81 13
Bookandwalk.hu