Re: highlighter, stored documents and performance

2019-03-21 Thread Erick Erickson
By and large, storing data will not affect search speed as much as you might 
think. Getting the top N results (say 10) doesn’t use stored data at all. It’s 
only _after_ that point that highlighting occurs on the 10 docs.
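
To make the two phases concrete, here is a minimal sketch of a request that asks for the top 10 hits and highlights only those. q, rows, hl, hl.fl, hl.snippets and hl.fragsize are standard Solr query parameters; the host, the core name "docs" and the field name "content" are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the host and core name ("docs") are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/docs/select"


def build_highlight_query(user_query, rows=10, field="content"):
    """Build a select URL that highlights only the returned page of hits."""
    params = {
        "q": user_query,
        "rows": rows,        # only this many docs are returned and highlighted
        "hl": "true",        # turn highlighting on
        "hl.fl": field,      # field to highlight; it must be stored
        "hl.snippets": 2,    # at most two snippets per document
        "hl.fragsize": 150,  # approximate snippet length in characters
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_highlight_query("running shoes")
print(url)
```

The highlighting cost in a request like this should scale with rows, not with the total number of documents in the collection.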

As far as needing the full doc, Jörn is right, it must be stored. Reconstructing 
a doc from what’s in the index is very expensive (think 10s of seconds at least 
per doc), and worse, the index is lossy. Say you stem and one of your words is 
‘running’. All that’s in the index is ‘run’, so using that to highlight, even 
if it were fast, wouldn’t be satisfactory.
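
The lossiness can be shown with a toy inverted index. This is only a sketch: toy_stem is a made-up suffix stripper, not what Solr's analyzers actually do, and build_index is a hypothetical helper. The point is that once terms are stemmed, the original surface form is gone.

```python
from collections import defaultdict


def toy_stem(token):
    """Toy stemmer for illustration only; not a real Solr analyzer."""
    for suffix in ("ning", "ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    """Map each stemmed term to the ids of the docs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[toy_stem(token)].add(doc_id)
    return index


docs = {1: "She was running late", 2: "He runs daily"}
index = build_index(docs)

# Both surface forms collapse to the same term, so the index alone cannot
# say whether a document originally contained "running" or "runs".
print(sorted(index["run"]))
print("running" in index)
```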

Best,
Erick




Re: highlighter, stored documents and performance

2019-03-21 Thread Jörn Franke
Hi,

Then you have to go for the full documents. I recommend reducing the number of 
returned results, using paging (if it is a web UI), and splitting the documents 
across several nodes (if the previous measures do not turn out to be successful).
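
For the paging part, a minimal sketch: Solr's start and rows parameters select one page of results, so only that page needs to be fetched and highlighted. The helper name page_params and the page size of 10 are assumptions for illustration.

```python
def page_params(page, page_size=10):
    """Return Solr paging parameters for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"start": page * page_size, "rows": page_size}


# First three pages of a hypothetical web UI showing 10 hits per page.
for page in range(3):
    print(page, page_params(page))
```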

Best regards 



RE: highlighter, stored documents and performance

2019-03-21 Thread Martin Frank Hansen (MHQ)
Hi Jörn,

Thanks for your answer.

Unfortunately, there is no summary included in the documents, and I would like 
it to work for all documents.

Best regards

Martin


Internal - KMD A/S



Re: highlighter, stored documents and performance

2019-03-21 Thread Jörn Franke
I don’t think so - to highlight any possible query you need the full document.

You could optimize it by storing only a subset of the document and highlighting 
only in this subset.

Alternatively, you can store a summary and show only the summary without 
highlighting.
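
A rough sketch of the "store only a subset" idea, done on the client side before indexing. Everything here is an assumption for illustration: the field names "content" and "content_hl", the 10,000 character cut-off, and a schema in which "content" is indexed but not stored while "content_hl" is a stored text field.

```python
MAX_HIGHLIGHT_CHARS = 10_000  # assumed cut-off; tune for your data


def to_solr_doc(doc_id, full_text, limit=MAX_HIGHLIGHT_CHARS):
    """Split a document into a searchable field and a smaller stored field.

    "content" and "content_hl" are hypothetical field names. The schema is
    assumed to define "content" as indexed but not stored, and "content_hl"
    as a stored text field that highlighting is pointed at.
    """
    return {
        "id": doc_id,
        "content": full_text,             # full text, used for matching
        "content_hl": full_text[:limit],  # leading portion, for highlighting
    }


doc = to_solr_doc("doc-1", "x" * 25_000)
print(len(doc["content"]), len(doc["content_hl"]))
```

The trade-off is that a document can still match on text past the cut-off, but no snippet can be produced for that part.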



highlighter, stored documents and performance

2019-03-21 Thread Martin Frank Hansen (MHQ)
Hi,

I am wondering how highlighting in Solr performs when the number of documents 
gets large.

Right now we have about 1 TB of data in all sorts of file types, and I was 
wondering how storing these documents within Solr (for highlighting purposes) 
will affect performance.

Is it possible to use highlighting without storing the documents?

Best regards

Martin




Internal - KMD A/S

Protection of your personal data is important to us. Here you can read KMD’s 
Privacy Policy <http://www.kmd.net/Privacy-Policy> outlining how we process 
your personal data.

Please note that this message may contain confidential information. If you have 
received this message by mistake, please inform the sender of the mistake by 
sending a reply, then delete the message from your system without making, 
distributing or retaining any copies of it. Although we believe that the 
message and any attachments are free from viruses and other errors that might 
affect the computer or it-system where it is received and read, the recipient 
opens the message at his or her own risk. We assume no responsibility for any 
loss or damage arising from the receipt or use of this message.