Re: optimize boosting parameters
We monitor the response time (Pingdom) of the page that uses these boosting parameters. Since the addition of these boosting parameters and an additional field to search on (which I will start a separate thread about on the mailing list), the page's average response time has increased by 1-2 seconds. Management has given feedback on this.

> If it does turn out to be the boosting (and IIRC the map function can be expensive), can you pre-compute some number of the boosts? Your requirements look like they can be computed at index time, then boost by just the value of the pre-computed field.

I have gone through the list of functions and the map function is the only one that can meet the requirements. Or is there a less expensive function that I missed? By "pre-compute some number", do you mean checking the value of P_SupplierResponseRate at the preparation stage, before indexing, and if the value = 3, specifying 'boost="0.4"' for that field of the document?

> BTW, boosts < 1.0 _reduce_ the score. I mention that just in case that’s a surprise ;)

Oh, it reduces the score?! Not increases (multiplies or adds to) the score by less than 1?

> You use termfreq, which changes of course, but 1> if your corpus is updated often enough, the termfreqs will be relatively stable. in that case you can pre-compute them too.

We do incremental indexing every half an hour on this collection, averaging 50K-100K documents per run. The collection has 7+ million documents, so the entire corpus does not get updated in every indexing run.

> 2> your problem statement has nothing to do with termfreq so why are you using it in the first place?

I read up on the termfreq function again. It returns the number of times the term appears in the field for that document, which does not really fit the requirements. Thank you for pointing it out. Should I use map instead?

Derek

On 8/12/2020 9:48 pm, Erick Erickson wrote:
Before worrying about it too much, exactly _how_ much has the performance changed?
I’ve just been in too many situations where there’s no objective measure of performance before and after, just someone saying “it seems slower”, and had those performance changes disappear when a rigorous test is done. Then spent a lot of time figuring out that the person reporting the problem hadn’t had coffee yet. Or the network was slow. Or….

If it does turn out to be the boosting (and IIRC the map function can be expensive), can you pre-compute some number of the boosts? Your requirements look like they can be computed at index time, then boost by just the value of the pre-computed field. BTW, boosts < 1.0 _reduce_ the score. I mention that just in case that’s a surprise ;) Of course that means that to change the boosting you need to re-index.

You use termfreq, which changes of course, but 1> if your corpus is updated often enough, the termfreqs will be relatively stable. In that case you can pre-compute them too. 2> your problem statement has nothing to do with termfreq, so why are you using it in the first place?

Best,
Erick

On Dec 8, 2020, at 12:46 AM, Radu Gheorghe wrote:
Hi Derek,

Ah, then my reply was completely off :) I don’t really see a better way. Maybe other than changing termfreq to field, if the numeric field has docValues? That may be faster, but I don’t know for sure.

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

On 8 Dec 2020, at 06:17, Derek Poh wrote:
Hi Radu

Apologies for not making myself clear. I would like to know if there is a simpler or more efficient way to craft the boosting parameters based on the requirements. For example, I am using the 'if', 'map' and 'termfreq' functions in the bf parameters. Is there a more efficient or simpler function that can be used instead? Or a more efficient way to craft the 'formula'?
On 7/12/2020 10:05 pm, Radu Gheorghe wrote:
Hi Derek,

It’s hard to tell whether your boosts can be made better without knowing your data and what users expect of it. Which is a problem in itself. I would suggest gathering judgements, like if a user queries for X, what doc IDs do you expect to get back? Once you have enough of these judgements, you can experiment with boosts and see how the query results change. There are measures such as nDCG (https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) that can help you measure that per query, and you can average this score across all your judgements to get an overall measure of how well you’re doing. Or even better, you can have something like Quaerite play with boost values for you: https://github.com/tballison/quaerite/blob/main/quaerite-examples/README.md#genetic-algorithms-ga-runga

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

On 7 Dec 2020
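Radu's nDCG pointer can be made concrete with a small sketch (graded relevance judgements, log2 discounting; the function names here are made up for illustration):

```python
import math

# Minimal nDCG sketch: `relevances` are the judged relevance grades of the
# returned documents, in ranked order. DCG discounts each grade by
# log2(rank + 1); nDCG divides by the DCG of the ideal (descending) ordering,
# so 1.0 means the ranking is as good as the judgements allow.

def dcg(relevances):
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Averaging this score over a set of judged queries gives the single "how are we doing" number mentioned above.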
Re: optimize boosting parameters
Hi Radu

Apologies for not making myself clear. I would like to know if there is a simpler or more efficient way to craft the boosting parameters based on the requirements. For example, I am using the 'if', 'map' and 'termfreq' functions in the bf parameters. Is there a more efficient or simpler function that can be used instead? Or a more efficient way to craft the 'formula'?

On 7/12/2020 10:05 pm, Radu Gheorghe wrote:
Hi Derek,

It’s hard to tell whether your boosts can be made better without knowing your data and what users expect of it. Which is a problem in itself. I would suggest gathering judgements, like if a user queries for X, what doc IDs do you expect to get back? Once you have enough of these judgements, you can experiment with boosts and see how the query results change. There are measures such as nDCG (https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) that can help you measure that per query, and you can average this score across all your judgements to get an overall measure of how well you’re doing. Or even better, you can have something like Quaerite play with boost values for you: https://github.com/tballison/quaerite/blob/main/quaerite-examples/README.md#genetic-algorithms-ga-runga

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

On 7 Dec 2020, at 10:51, Derek Poh wrote:
Hi

I have added the following boosting requirements to the search query of a page. Feedback from the monitoring team is that the overall response time of the page has increased since then. I am trying to find out if the added boosting parameters (below) could have contributed to the increase. The boosting is working as per requirements. May I know if the implemented boosting parameters can be enhanced or optimized further? Hopefully to improve the response time of the query and the page.

Requirements:
1. If P_SupplierResponseRate is:
   a. 3, boost by 0.4
   b. 2, boost by 0.2
2. If P_SupplierResponseTime is:
   a. 4, boost by 0.4
   b. 3, boost by 0.2
3. If P_MWSScore is:
   a. between 80-100, boost by 1.6
   b. between 60-79, boost by 0.8
4. If P_SupplierRanking is:
   a. 3, boost by 0.3
   b. 4, boost by 0.6
   c. 5, boost by 0.9
   d. 6, boost by 1.2

Boosting parameters implemented:
bf=map(P_SupplierResponseRate,3,3,0.4,0)
bf=map(P_SupplierResponseRate,2,2,0.2,0)
bf=map(P_SupplierResponseTime,4,4,0.4,0)
bf=map(P_SupplierResponseTime,3,3,0.2,0)
bf=map(P_MWSScore,80,100,1.6,0)
bf=map(P_MWSScore,60,79,0.8,0)
bf=if(termfreq(P_SupplierRanking,3),0.3,if(termfreq(P_SupplierRanking,4),0.6,if(termfreq(P_SupplierRanking,5),0.9,if(termfreq(P_SupplierRanking,6),1.2,0))))

I am using Solr 7.7.2

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons.
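Erick's suggestion elsewhere in this thread, pre-computing the boost at index time, could be sketched like this (the function and the field name P_PrecomputedBoost are hypothetical; the values and boosts follow the requirements above, and since multiple additive bf parameters simply sum, one pre-computed value can replace all seven bf's):

```python
# Sketch: compute the combined boost at the preparation stage, before indexing.
# The per-field thresholds and boost amounts come from the requirements in
# this thread. The result would be written to a numeric docValues field
# (P_PrecomputedBoost is a made-up name) and queried with a single
# bf=field(P_PrecomputedBoost).

def precomputed_boost(response_rate, response_time, mws_score, ranking):
    boost = 0.0
    boost += {3: 0.4, 2: 0.2}.get(response_rate, 0.0)            # P_SupplierResponseRate
    boost += {4: 0.4, 3: 0.2}.get(response_time, 0.0)            # P_SupplierResponseTime
    if 80 <= mws_score <= 100:                                   # P_MWSScore
        boost += 1.6
    elif 60 <= mws_score <= 79:
        boost += 0.8
    boost += {3: 0.3, 4: 0.6, 5: 0.9, 6: 1.2}.get(ranking, 0.0)  # P_SupplierRanking
    return boost
```

The trade-off Erick notes applies: changing the boost amounts then requires re-indexing.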
optimize boosting parameters
Hi

I have added the following boosting requirements to the search query of a page. Feedback from the monitoring team is that the overall response time of the page has increased since then. I am trying to find out if the added boosting parameters (below) could have contributed to the increase. The boosting is working as per requirements. May I know if the implemented boosting parameters can be enhanced or optimized further? Hopefully to improve the response time of the query and the page.

Requirements:
1. If P_SupplierResponseRate is:
   a. 3, boost by 0.4
   b. 2, boost by 0.2
2. If P_SupplierResponseTime is:
   a. 4, boost by 0.4
   b. 3, boost by 0.2
3. If P_MWSScore is:
   a. between 80-100, boost by 1.6
   b. between 60-79, boost by 0.8
4. If P_SupplierRanking is:
   a. 3, boost by 0.3
   b. 4, boost by 0.6
   c. 5, boost by 0.9
   d. 6, boost by 1.2

Boosting parameters implemented:
bf=map(P_SupplierResponseRate,3,3,0.4,0)
bf=map(P_SupplierResponseRate,2,2,0.2,0)
bf=map(P_SupplierResponseTime,4,4,0.4,0)
bf=map(P_SupplierResponseTime,3,3,0.2,0)
bf=map(P_MWSScore,80,100,1.6,0)
bf=map(P_MWSScore,60,79,0.8,0)
bf=if(termfreq(P_SupplierRanking,3),0.3,if(termfreq(P_SupplierRanking,4),0.6,if(termfreq(P_SupplierRanking,5),0.9,if(termfreq(P_SupplierRanking,6),1.2,0))))

I am using Solr 7.7.2
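As a possible simplification of the nested termfreq chain above: the P_SupplierRanking values are disjoint, so the same boost can be expressed with map alone and folded into one function. This is an untested sketch; whether it is actually cheaper than termfreq would need measuring:

```
bf=sum(map(P_SupplierRanking,3,3,0.3,0),map(P_SupplierRanking,4,4,0.6,0),map(P_SupplierRanking,5,5,0.9,0),map(P_SupplierRanking,6,6,1.2,0))
```

Because at most one of the four maps is non-zero for any document, the sum yields the same value as the if/termfreq cascade.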
Re: advice on whether to use stopwords for use case
Yes, the requirement (for now) is not to return any results. I think they may change the requirements, pending their return from the holidays.

> If so, then check for those words in the query before sending it to Solr.

That is what I think too. Thinking further, with stopwords there will still be results returned when the number of words in the search keywords is more than the stopwords.

On 1/10/2020 2:57 am, Walter Underwood wrote:
I’m not clear on the requirements. It sounds like the query “cigar” or “cuban cigar” should return zero results. Is that right? If so, then check for those words in the query before sending it to Solr. But the stopwords approach seems like the requirement is different. Could you give some examples?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Sep 30, 2020, at 11:53 AM, Alexandre Rafalovitch wrote:
You may also want to look at something like: https://docs.querqy.org/index.html
ApacheCon had (is having..) a presentation on it that seemed quite relevant to your needs. The videos should be live in a week or so.

Regards,
Alex.

On Tue, 29 Sep 2020 at 22:56, Alexandre Rafalovitch wrote:
I am not sure why you think stop words are your first choice. Maybe I misunderstand the question. I read it as: you need to completely exclude a set of documents that include specific keywords when called from a specific module.

If I wanted to differentiate the searches from a specific module, I would give that module a different end-point (Request Query Handler), instead of /select. So, /nocigs or whatever. Then, in that end-point, you could do all sorts of extra things, such as setting appends or even invariants parameters, which would include a filter query to exclude any documents matching specific keywords. I assume it is ok to return documents that are matching for other reasons.
Ideally, you would mark the cigs documents during indexing with a binary or enumeration flag, and then during search you just need to check against that flag. In that case, you could copyField your text and run it against something like https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#keep-word-filter combined with Shingles for multi-words. Or similar. And just transform it as index-only so that the result is basically a yes/no flag. A similar thing could be done with an UpdateRequestProcessor pipeline if you want to end up with a true boolean flag. The idea is the same: have an index-only flag that you lock in for any request from the specific module. Or even with something like ElevationSearchComponent. Same idea.

Hope this helps.

Regards,
Alex.

On Tue, 29 Sep 2020 at 22:28, Derek Poh wrote:
Hi

I have read in the mailing list that we should try to avoid using stop words. I have a use case where I would like to know if there are alternative solutions besides using stop words.

There is a business requirement to return zero results when the search keywords are cigarette-related and the search is coming from a particular module on our site. It does not apply to all searches from our site. There is a list of these cigarette-related words. This list contains single words, multiple words (electronic cigar), and multiple words with punctuation (e-cigarette case).

I am planning to copy to a different set of search fields, which will include the stopword filter in the index and query stages, for this module to use. For this use case, other than using stop words to handle it, is there any alternative solution?

Derek
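Walter's suggestion to check for the words before the request ever reaches Solr could be sketched like this (the phrase list is a stand-in for the one the business team provides, and the substring matching is an assumption):

```python
# Sketch: detect cigarette-related queries at the application layer; the
# module then returns zero results without querying Solr at all.
# Substring matching on the normalized query handles multi-word phrases
# ("electronic cigar") and punctuated ones ("e-cigarette case") uniformly,
# at the cost of also matching longer words that contain a listed term.

BLOCKED_PHRASES = ["cigar", "cigarette", "electronic cigar", "e-cigarette"]

def is_blocked(query):
    normalized = " ".join(query.lower().split())
    return any(phrase in normalized for phrase in BLOCKED_PHRASES)
```

Whole-word matching (tokenizing both the query and the phrases) would avoid over-blocking, but the point stands: the check happens before Solr, so the number of words in the query no longer matters.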
Re: advice on whether to use stopwords for use case
Hi Alex

The business requirement (for now) is not to return any result when the search keywords are cigarette-related. The business user team will provide the list of cigarette-related keywords. I will digest, explore and research your suggestions. Thank you.

On 30/9/2020 10:56 am, Alexandre Rafalovitch wrote:
I am not sure why you think stop words are your first choice. Maybe I misunderstand the question. I read it as: you need to completely exclude a set of documents that include specific keywords when called from a specific module.

If I wanted to differentiate the searches from a specific module, I would give that module a different end-point (Request Query Handler), instead of /select. So, /nocigs or whatever. Then, in that end-point, you could do all sorts of extra things, such as setting appends or even invariants parameters, which would include a filter query to exclude any documents matching specific keywords. I assume it is ok to return documents that are matching for other reasons.

Ideally, you would mark the cigs documents during indexing with a binary or enumeration flag, and then during search you just need to check against that flag. In that case, you could copyField your text and run it against something like https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#keep-word-filter combined with Shingles for multi-words. Or similar. And just transform it as index-only so that the result is basically a yes/no flag. A similar thing could be done with an UpdateRequestProcessor pipeline if you want to end up with a true boolean flag. The idea is the same: have an index-only flag that you lock in for any request from the specific module. Or even with something like ElevationSearchComponent. Same idea.

Hope this helps.

Regards,
Alex.

On Tue, 29 Sep 2020 at 22:28, Derek Poh wrote:
Hi

I have read in the mailing list that we should try to avoid using stop words. I have a use case where I would like to know if there are alternative solutions besides using stop words.

There is a business requirement to return zero results when the search keywords are cigarette-related and the search is coming from a particular module on our site. It does not apply to all searches from our site. There is a list of these cigarette-related words. This list contains single words, multiple words (electronic cigar), and multiple words with punctuation (e-cigarette case).

I am planning to copy to a different set of search fields, which will include the stopword filter in the index and query stages, for this module to use. For this use case, other than using stop words to handle it, is there any alternative solution?

Derek
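Alex's index-time flag idea could look roughly like this in the schema. This is a sketch: the field, type and source-field names (cigs_flag, cigs_words, P_ProductText) and the cigwords.txt file are all hypothetical; cigwords.txt would hold the cigarette-related terms, including shingled phrases like "electronic cigar":

```xml
<!-- Analysis chain that keeps ONLY cigarette-related terms. Shingles (2-3
     word combinations) let multi-word phrases survive the keep-word filter.
     A document whose copied text yields any token here is a "cigs" document. -->
<fieldType name="cigs_words" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true"/>
    <filter class="solr.KeepWordFilterFactory" words="cigwords.txt"/>
  </analyzer>
</fieldType>

<field name="cigs_flag" type="cigs_words" indexed="true" stored="false"/>
<copyField source="P_ProductText" dest="cigs_flag"/>
```

A dedicated /nocigs request handler, as suggested above, could then carry an exclusion such as fq=-cigs_flag:[* TO *] as an appends or invariants parameter, so requests from that module never see the flagged documents.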
advice on whether to use stopwords for use case
Hi

I have read in the mailing list that we should try to avoid using stop words. I have a use case where I would like to know if there are alternative solutions besides using stop words.

There is a business requirement to return zero results when the search keywords are cigarette-related and the search is coming from a particular module on our site. It does not apply to all searches from our site. There is a list of these cigarette-related words. This list contains single words, multiple words (electronic cigar), and multiple words with punctuation (e-cigarette case).

I am planning to copy to a different set of search fields, which will include the stopword filter in the index and query stages, for this module to use. For this use case, other than using stop words to handle it, is there any alternative solution?

Derek
combined multiple bf into a single bf
I have the following boost requirements using bf:

response_rate is 3, boost by ^0.6
response_rate is 2, boost by ^0.3
response_time is 4, boost by ^0.6
response_time is 3, boost by ^0.3

I am using a bf for each boost requirement:

bf=map(response_rate,3,3,0.6,0)&bf=map(response_rate,2,2,0.3,0)&bf=map(response_time,4,4,0.6,0)&bf=map(response_time,3,3,0.3,0)

I am trying to reduce the number of parameters in the query. Is it possible to combine them into 1 or 2 bf?

Running Solr 4.10.4.

Derek
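Since each bf parameter contributes additively to the score, the four map calls can be folded into a single bf with sum(), which should score identically (a sketch, untested on 4.10):

```
bf=sum(map(response_rate,3,3,0.6,0),map(response_rate,2,2,0.3,0),map(response_time,4,4,0.6,0),map(response_time,3,3,0.3,0))
```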
alternative suggestions on how to store product attributes in collection
Hi

I would like to know if there are suggestions on how I can handle my task below. Please pardon the lengthy description.

I need to store product attributes in a collection, attributes like Size, Color, Material etc. Each product can have up to a max of 5 attributes. Between products, the attributes can be different. Attributes can be added to and deleted from the source system.

A simple example of possible product attribute information:

Product | Attribute | Value
P1      | Size      | M
P1      | Size      | L
P1      | Color     | Red
P2      | Size      | M
P2      | Color     | Blue
P3      | Material  | Plastic
P4      | Amp       | 12

I have come up with 2 approaches:

1. If I store each attribute as a field in the collection, there will be a lot of fields to create. Furthermore, as attributes can be added and deleted, maintaining the attribute fields in Solr will be difficult. However, with a field per attribute, the product attribute facets will be easy and straightforward. Example:
Size facet: M - 2, L - 1
Color facet: Red - 1, Blue - 1

2. Another approach is to create only one field to store the attributes and attribute values of a product. This field will be multi-valued, and Solr does not need to bother with new or deleted attributes. E.g.
P_ProductAttribute for P1: Size-M Size-L Color-Red
However, the product attribute facet with this approach will require the UI to iterate through the facet and extract the attributes and their values to display as individual attribute facets on the search result page. E.g. the P_ProductAttribute facet:
Color-Blue > 1
Color-Red > 1
Size-L > 1
Size-M > 2

Any other suggestion on how I can approach this?

Regards,
Derek
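For what it's worth, the maintenance problem in approach 1 (creating and deleting a schema field per attribute) is commonly avoided with a dynamic field; the pattern name attr_* below is illustrative:

```xml
<!-- One pattern covers every attribute; adding or deleting an attribute in
     the source system needs no schema change, and per-attribute faceting
     (facet.field=attr_Size, facet.field=attr_Color, ...) still works. -->
<dynamicField name="attr_*" type="string" indexed="true" stored="true" multiValued="true"/>
```

Documents would then carry fields like attr_Size: ["M","L"] and attr_Color: ["Red"], keeping the straightforward facets of approach 1 without its field-maintenance burden.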
Re: TolerantUpdateProcessorFactory maxErrors=-1 issue
Hi Tomas

I moved TolerantUpdateProcessorFactory to the beginning of the chain and reloaded the collection. The indexing process still aborts.

On 22/9/2018 4:28 AM, Tomás Fernández Löbbe wrote:
Hi Derek, I suspect you need to move the TolerantUpdateProcessorFactory to the beginning of the chain.

On Thu, Sep 20, 2018 at 6:17 PM Derek Poh wrote:
Does anyone have any idea what could be the cause of this?

On 19/9/2018 11:40 AM, Derek Poh wrote:
In addition, I tried with maxErrors=3 and with only 1 error document, and the indexing process still gets aborted. Could it be the way I defined the TolerantUpdateProcessorFactory in solrconfig.xml?

On 18/9/2018 3:13 PM, Derek Poh wrote:
Hi

I am using CSV-formatted index updates to index a tab-delimited file. I have defined "TolerantUpdateProcessorFactory" with "maxErrors=-1" in the solrconfig.xml to skip any document update error and proceed to update the remaining documents without failing. However, it does not seem to be working: there is a document in the tab-delimited file with an additional number of fields, and this caused the indexing to abort instead.

This is how I start the indexing,

curl -o /apps/search/logs/indexing.log "http://localhost:8983/solr/$collection/update?update.chain=$updateChainName&commit=true&separator=%09&encapsulator=^&fieldnames=$fieldnames$splitOptions" --data-binary "@/apps/search/feed/$csvFilePath/$csvFileName" -H 'Content-type:application/csv'

This is how the TolerantUpdateProcessorFactory is defined in the solrconfig.xml,

P_SupplierId P_TradeShowId P_ProductId id id -1 43200 P_TradeShowOnlineEndDateUTC

Solr version is 6.6.2.

Derek
Re: TolerantUpdateProcessorFactory maxErrors=-1 issue
Does anyone have any idea what could be the cause of this?

On 19/9/2018 11:40 AM, Derek Poh wrote:
In addition, I tried with maxErrors=3 and with only 1 error document, and the indexing process still gets aborted. Could it be the way I defined the TolerantUpdateProcessorFactory in solrconfig.xml?

On 18/9/2018 3:13 PM, Derek Poh wrote:
Hi

I am using CSV-formatted index updates to index a tab-delimited file. I have defined "TolerantUpdateProcessorFactory" with "maxErrors=-1" in the solrconfig.xml to skip any document update error and proceed to update the remaining documents without failing. However, it does not seem to be working: there is a document in the tab-delimited file with an additional number of fields, and this caused the indexing to abort instead.

This is how I start the indexing,

curl -o /apps/search/logs/indexing.log "http://localhost:8983/solr/$collection/update?update.chain=$updateChainName&commit=true&separator=%09&encapsulator=^&fieldnames=$fieldnames$splitOptions" --data-binary "@/apps/search/feed/$csvFilePath/$csvFileName" -H 'Content-type:application/csv'

This is how the TolerantUpdateProcessorFactory is defined in the solrconfig.xml,

P_SupplierId P_TradeShowId P_ProductId id id -1 43200 P_TradeShowOnlineEndDateUTC

Solr version is 6.6.2.

Derek
Re: TolerantUpdateProcessorFactory maxErrors=-1 issue
In addition, I tried withmaxErrors=3 and with only 1error document, the indexing process still gets aborted. Could it be the way I defined the TolerantUpdateProcessorFactory in solrconfg.xml? On 18/9/2018 3:13 PM, Derek Poh wrote: Hi I am using CSV formatted indexupdates to index on tab delimited file. I have define "TolerantUpdateProcessorFactory" with "maxErrors=-1" in the solrconfig.xml to skip any document update error and proceed to update the remaining documents without failing. Howeverit does not seemto be workingas there is an document in the tab delimited file withadditional number of fields and this caused the indexing to abort instead. This is how I start the indexing, curl -o /apps/search/logs/indexing.log "http://localhost:8983/solr/$collection/update?update.chain=$updateChainName=true=%09=^=$fieldnames$splitOptions; --data-binary "@/apps/search/feed/$csvFilePath/$csvFileName" -H 'Content-type:application/csv' This is how the TolerantUpdateProcessorFactory is defined in the solrconfig.xml, P_SupplierId P_TradeShowId P_ProductId id id -1 43200 P_TradeShowOnlineEndDateUTC Solr version is 6.6.2. Derek -- CONFIDENTIALITY NOTICE This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons. -- CONFIDENTIALITY NOTICE This e-mail (including any attachments) may contain confidential and/or privileged information. 
TolerantUpdateProcessorFactory maxErrors=-1 issue
Hi
I am using CSV-formatted index updates to index a tab-delimited file. I have defined "TolerantUpdateProcessorFactory" with "maxErrors=-1" in solrconfig.xml, to skip any document update error and proceed to update the remaining documents without failing. However, it does not seem to be working: there is a document in the tab-delimited file with an additional number of fields, and this caused the indexing to abort instead.

This is how I start the indexing (the "&name" parts of the query string were eaten by the archive; the parameter names are reconstructed here assuming the standard CSV update parameters):

curl -o /apps/search/logs/indexing.log "http://localhost:8983/solr/$collection/update?update.chain=$updateChainName&commit=true&separator=%09&encapsulator=^&fieldnames=$fieldnames$splitOptions" --data-binary "@/apps/search/feed/$csvFilePath/$csvFileName" -H 'Content-type:application/csv'

This is how the TolerantUpdateProcessorFactory is defined in solrconfig.xml (the XML markup was stripped by the archive; only the element values survive):

P_SupplierId P_TradeShowId P_ProductId id id -1 43200 P_TradeShowOnlineEndDateUTC

Solr version is 6.6.2.
Derek
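For reference, the usual way to wire TolerantUpdateProcessorFactory into an update chain looks like the sketch below. The chain name and the companion processors here are illustrative, not Derek's actual configuration (whose markup the archive stripped); only the maxErrors value is taken from his post:

```xml
<updateRequestProcessorChain name="tolerant-chain">
  <!-- Must come before the processors whose failures it should tolerate. -->
  <processor class="solr.TolerantUpdateProcessorFactory">
    <!-- -1 means "unlimited": never abort the batch on a bad document. -->
    <int name="maxErrors">-1</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain is then selected per request with update.chain=tolerant-chain, as in the curl command above. Note that TolerantUpdateProcessorFactory can only absorb errors raised while processing individual documents inside the chain; an error raised earlier, while parsing the incoming CSV itself, would not pass through it.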
Re: change DocExpirationUpdateProcessorFactory deleteByQuery NOW parameter time zone
SG refers to Singapore, and the time is UTC+8. That means I need to set the P_TradeShowOnlineEndDate date in UTC instead of UTC+8 as a workaround.

On 31/8/2018 10:16 PM, Shawn Heisey wrote:
On 8/30/2018 7:26 PM, Derek Poh wrote:
Can the timezone of the NOW parameter in the deleteByQuery of the DocExpirationUpdateProcessorFactory be changed to my timezone? I am in SG and using Solr 6.5.1.
I do not know what SG is. The timezone cannot be changed. Solr *always* handles dates in UTC. You can assign a timezone when doing date math, but this is only used to determine when a new day or week starts -- the dates themselves will be in UTC.
Thanks, Shawn
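Since Solr always stores dates in UTC, the workaround Derek describes amounts to converting the Singapore-local expiration time to UTC before indexing. A minimal sketch of that conversion (the timestamp values are made up for illustration):

```python
from datetime import datetime, timezone, timedelta

SGT = timezone(timedelta(hours=8))  # Singapore time, UTC+8

def to_solr_utc(local_iso: str) -> str:
    """Convert a naive SGT timestamp to the UTC ISO-8601 form Solr expects."""
    local = datetime.fromisoformat(local_iso).replace(tzinfo=SGT)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# A show ending at midnight SGT is stored as 16:00 UTC the previous day.
print(to_solr_utc("2018-09-01T00:00:00"))  # 2018-08-31T16:00:00Z
```

With the field indexed in UTC, the factory's deleteByQuery against NOW (also UTC) expires documents at the intended local moment.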
change DocExpirationUpdateProcessorFactory deleteByQuery NOW parameter time zone
Hi
Can the timezone of the NOW parameter in the deleteByQuery of the DocExpirationUpdateProcessorFactory be changed to my timezone? I am in SG and using Solr 6.5.1. The timestamps of the entries in solr.log are in my timezone, but the NOW parameter of the deleteByQuery is in a different timezone (UTC?).

The deleteByQuery entry in solr.log (some "&name" parts of the params were eaten by the archive; they are reconstructed here assuming the standard distributed-update parameters):

2018-08-30 16:34:03.941 INFO (qtp834133664-3600) [c:exhibitor_product_2 s:shard1 r:core_node1 x:exhibitor_product_2_shard1_replica2] o.a.s.u.p.LogUpdateProcessorFactory [exhibitor_product_2_shard1_replica2] webapp=/solr path=/update params={update.distrib=FROMLEADER&_version_=-1610212229046599680&distrib.from=http://192.168.83.152:8983/solr/exhibitor_product_2_shard1_replica1/&wt=javabin&version=2}{deleteByQuery={!cache=false}P_TradeShowOnlineEndDate:[* TO 2018-08-30T08:34:06.804Z] (-1610212229046599680)} 0 23

DocExpirationUpdateProcessorFactory definition in solrconfig.xml (the XML markup was stripped by the archive; only the element values and a field-definition fragment survive):

P_SupplierId P_TradeShowId P_ProductId id id -1 86400 P_TradeShowOnlineEndDate stored="true" multiValued="false"/>

Derek
collections replicas still in Recovery Mode after restarting Solr
Hi
We have a setup of 2 servers running Solr 6.6.2 in production. There are 5 collections, all created as 1 shard x 2 replicas. 4 of the collections have this issue: a replica of each of these 4 collections is in recovery mode, and the affected replicas are all on the same server (node). I also noticed there is no leader indicated for these 4 collections in the Solr Admin UI. This is a screenshot of the Solr Admin: http://imagebucket.net/pmndqkijla5c/solr_admin.PNG

These are the commands I used to stop and start the Solr process:

bin/solr stop -p 8983
bin/solr start -cloud -p 8983 -s "/apps/search/solr-6.6.2/home" -z hktszk1:2181,hktszk2:2181,hktszk3:2181

May I know how I can bring up these replicas?
Derek
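One low-level way to confirm the "no leader" observation is to pull CLUSTERSTATUS from the Collections API (/solr/admin/collections?action=CLUSTERSTATUS&wt=json) and look for shards that have no active leader replica. A sketch of that check, run here against a made-up, abbreviated response rather than a live cluster:

```python
# Abbreviated, made-up CLUSTERSTATUS payload (normally fetched over HTTP).
# Leader replicas carry "leader": "true"; non-leaders lack the key.
status = {
    "cluster": {
        "collections": {
            "exhibitor_product": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "recovering"},
                            "core_node2": {"state": "down"},
                        }
                    }
                }
            }
        }
    }
}

def shards_without_leader(cluster_status):
    """Yield (collection, shard) pairs that have no active leader replica."""
    for coll, cdata in cluster_status["cluster"]["collections"].items():
        for shard, sdata in cdata["shards"].items():
            leaders = [r for r in sdata["replicas"].values()
                       if r.get("leader") == "true" and r.get("state") == "active"]
            if not leaders:
                yield coll, shard

print(list(shards_without_leader(status)))  # [('exhibitor_product', 'shard1')]
```

A replica usually cannot finish recovery while its shard has no leader to recover from, so leaderless shards are the first thing to look for.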
Re: How to find out which search terms have matches in a search
Hi Erik
I have explored the facet query but it does not really help. Thank you for your suggestion.

On 12/6/2018 7:49 PM, Erik Hatcher wrote:
Derek - One trick I like to do is try various forms of a query all in one go. With facet=on, you can (the "facet.query" parameter names were eaten by the archive and are restored here):

facet.query=big brown bear
facet.query=big brown
facet.query=brown bear
facet.query=big
facet.query=brown
facet.query=bear

The returned counts give you an indication of which queries matched docs in the result set, and which didn't. If you did this with q=*:* you'd see how each of those matched across the entire collection. Grouping and group.query could be used similarly. I've used facet.query to do some Venn diagramming of the overlap of search results like that. An oldie but a goodie: https://www.slideshare.net/lucenerevolution/hatcher-erik-rapid-prototyping-with-solr/12
4.10.4? woah
Erik Hatcher
Senior Solutions Architect, Lucidworks.com

On Jun 11, 2018, at 11:16 PM, Derek Poh wrote:
Hi
How can I find out which search terms have matches in a search? E.g. the search terms are "big brown bear", and only "big" and "brown" have matches in the search result. Can Solr return the information that "big" and "brown" have matches in the search result? I want to use this information to display on the search result page that "big" and "brown" have matches. Something like "big brown bear" (with the unmatched term struck through). Am using Solr 4.10.4.
Derek
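Erik's facet.query trick can be post-processed mechanically: send one facet.query per term alongside the main query, then treat any term whose count is non-zero as "matched". A sketch of both halves, using made-up facet counts rather than a live Solr response:

```python
def facet_params(q, terms):
    """Build the request parameters: the main query plus one facet.query per term."""
    params = [("q", q), ("facet", "on"), ("rows", "0")]
    params += [("facet.query", t) for t in terms]
    return params

def matched_terms(facet_counts):
    """Terms whose facet.query count is non-zero, i.e. terms that matched."""
    return [t for t, n in facet_counts.items() if n > 0]

params = facet_params("big brown bear", ["big", "brown", "bear"])
# Made-up counts, as if read from facet_counts/facet_queries in the response:
counts = {"big": 120, "brown": 37, "bear": 0}
print(matched_terms(counts))  # ['big', 'brown']
```

The matched-term list is exactly what Derek wants to render on the result page, with the remaining terms ("bear" here) struck through.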
Re: How to find out which search terms have matches in a search
Seems like the Highlight feature could help, but with some workaround. Will need to explore it more. Thank you.

On 12/6/2018 5:32 PM, Alessandro Benedetti wrote:
I would recommend looking into the Highlight feature [1]. There are a few implementations and they should all be right for your user requirement.
Regards
[1] https://lucene.apache.org/solr/guide/7_3/highlighting.html
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: How to find out which search terms have matches in a search
Sorry, I realized the strike-through on the term "bear" in "big brown bear" cannot be displayed accordingly in the mailing list. My aim is to have the search terms "big brown bear" displayed on the search result page with the term "bear" struck through, since it does not have a match in the search result.

On 12/6/2018 11:16 AM, Derek Poh wrote:
Hi
How can I find out which search terms have matches in a search? E.g. the search terms are "big brown bear", and only "big" and "brown" have matches in the search result. Can Solr return the information that "big" and "brown" have matches in the search result? I want to use this information to display on the search result page that "big" and "brown" have matches. Something like "big brown bear" (with "bear" struck through). Am using Solr 4.10.4.
Derek
How to find out which search terms have matches in a search
Hi
How can I find out which search terms have matches in a search? E.g. the search terms are "big brown bear", and only "big" and "brown" have matches in the search result. Can Solr return the information that "big" and "brown" have matches in the search result? I want to use this information to display on the search result page that "big" and "brown" have matches. Something like "big brown bear" (with the unmatched term struck through). Am using Solr 4.10.4.
Derek
edit gc parameters in solr.in.sh or solr?
Hi
From your experience, I would like to know: is it advisable to change the GC parameters in solr.in.sh or in the bin/solr script? The documentation says to edit solr.in.sh, but I would like to know which file you actually edit. I am using Solr 6.6.2 at the moment.
Regards, Derek
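The usual practice matches the documentation: put overrides in solr.in.sh, which bin/solr sources at startup, so the tuning survives upgrades that replace the bin/solr script itself. In the 6.x include file, GC flags go in the GC_TUNE variable; a sketch (the specific flags here are illustrative, not a recommendation):

```shell
# solr.in.sh -- sourced by bin/solr at startup; survives script upgrades.
# Override the default GC flags here rather than editing bin/solr itself.
GC_TUNE="-XX:+UseG1GC \
  -XX:MaxGCPauseMillis=250 \
  -XX:+ParallelRefProcEnabled"
```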
Re: ways to check if document is in a huge search result set
I see. Thank you.

On 9/13/2017 2:36 PM, Michael Kuhlmann wrote:
On 13.09.2017 at 04:04, Derek Poh wrote:
Hi Michael
"Then continue using binary search depending on the returned score values." May I know what you mean by using binary search?
An example algorithm is in the Java method java.util.Arrays::binarySearch. Or in more detail: https://en.wikipedia.org/wiki/Binary_search_algorithm
Best, Michael
Re: ways to check if document is in a huge search result set
Hi Michael
"Then continue using binary search depending on the returned score values." May I know what you mean by using binary search?

On 9/12/2017 3:08 PM, Michael Kuhlmann wrote:
So you're looking for a solution to validate the result output. You have two ways:
1. Assuming you're sorting by the default "score" sort option: find the result you're looking for by setting the fq filter clause accordingly, and add "score" to the fl field list. Then do the normal unfiltered search, still including "score", and start with page, let's say, 50,000. Then continue using binary search depending on the returned score values.
2. Set fl to return only the supplier id; then you'll probably be able to return several ten-thousand results at once. But be warned, the result position of these elements can vary with every single commit, especially when there are lots of documents with the same score value.
-Michael

On 12.09.2017 at 03:21, Derek Poh wrote:
Some additional information. I have a query from a user that a supplier's product(s) is not in the search result. I debugged by adding an fq on the supplier id to the query to verify the supplier's product is in the search result. The products do exist in the search result. I want to tell the user in which page of the search result the supplier's products appear. To do this I go through each page of the search result to find the supplier's products. It is still fine if the search result has a few hundred products, but it will be a chore if the result has thousands. In this case there are more than 100,000 products in the result. Any advice on easier ways to check which page a supplier's product or document appears in, in a search result?

On 9/11/2017 2:44 PM, Mikhail Khludnev wrote:
You can request facet field, query facet, filter or even explainOther.

On Mon, Sep 11, 2017 at 5:12 AM, Derek Poh <d...@globalsources.com> wrote:
Hi
I have a collection of product documents. Each product document has supplier information in it.
I need to check if a supplier's products are returned in a search result containing over 100,000 products, and in which page (assuming pagination is 20 products per page). It is time-consuming and "labour-intensive" to go through each page to look for the supplier's products. Would like to know if you guys have any better and easier ways to do this?
Derek
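Michael's suggestion, once rephrased, is: grab the target document's score (via the fq-filtered query with score in fl), then binary-search the score-sorted pages instead of walking them one by one. A sketch of the page search; the list of per-page minimum scores stands in for real paged Solr requests (rows=20, sorted by score desc), and every probe of the list corresponds to one request:

```python
def find_page(page_min_scores, target_score):
    """Binary-search pages by score.

    page_min_scores[i] is the lowest score on page i of the score-sorted
    result, so the list is non-increasing.  Returns the first page whose
    lowest score is <= target_score, i.e. the earliest page on which the
    target document can appear.
    """
    lo, hi = 0, len(page_min_scores) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if page_min_scores[mid] <= target_score:
            hi = mid          # target's score region starts here or earlier
        else:
            lo = mid + 1      # this page is still above the target's score
    return lo

# 5,000 pages (100,000 docs / 20 per page) of made-up descending scores:
scores = [10_000.0 - i * 2.0 for i in range(5_000)]
print(find_page(scores, target_score=9_001.0))  # 500
```

About a dozen probes locate the page, versus scanning all 5,000; as Michael warns, the answer is only stable between commits when many documents share the same score.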
Re: ways to check if document is in a huge search result set
Some additional information. I have a query from a user that a supplier's product(s) is not in the search result. I debugged by adding an fq on the supplier id to the query to verify the supplier's product is in the search result. The products do exist in the search result. I want to tell the user in which page of the search result the supplier's products appear. To do this I go through each page of the search result to find the supplier's products. It is still fine if the search result has a few hundred products, but it will be a chore if the result has thousands. In this case there are more than 100,000 products in the result. Any advice on easier ways to check which page a supplier's product or document appears in, in a search result?

On 9/11/2017 2:44 PM, Mikhail Khludnev wrote:
You can request facet field, query facet, filter or even explainOther.

On Mon, Sep 11, 2017 at 5:12 AM, Derek Poh <d...@globalsources.com> wrote:
Hi
I have a collection of product documents. Each product document has supplier information in it. I need to check if a supplier's products are returned in a search result containing over 100,000 products, and in which page (assuming pagination is 20 products per page). It is time-consuming and "labour-intensive" to go through each page to look for the supplier's products. Would like to know if you guys have any better and easier ways to do this?
Derek
ways to check if document is in a huge search result set
Hi
I have a collection of product documents. Each product document has supplier information in it. I need to check if a supplier's products are returned in a search result containing over 100,000 products, and in which page (assuming pagination is 20 products per page). It is time-consuming and "labour-intensive" to go through each page to look for the supplier's products. Would like to know if you guys have any better and easier ways to do this?
Derek
Re: different length/size of unique 'id' field value in a collection.
Hi Rick
My apologies, I did not make myself clear on the value of the fields. They are numbers; I used 'ts1', 'sup1' and 'pdt1' for simplicity and ease of understanding instead of the actual numbers.
You mentioned this design has the potential for (in error cases) concatenating id's incorrectly. Could you explain more on this?

On 5/22/2017 6:12 PM, Rick Leir wrote:
On 2017-05-22 02:25 AM, Derek Poh wrote:
Hi
Due to the source data structure, I need to concatenate the values of 2 fields ('supplier_id' and 'product_id') to form the unique 'id' of each document. However, there are cases where some documents only have the 'supplier_id' field. This results in some documents with a longer 'id' value (both 'supplier_id' and 'product_id') and some with a shorter one (only 'supplier_id'). Please refer to the simplified representation of the records below; the 3rd record only has a supplier id:

ts1 sup1 pdt1
ts1 sup1 pdt2
ts1 sup2
ts1 sup3 pdt3
ts1 sup4 pdt5
ts1 sup4 pdt6

I understand the unique 'id' is used during indexing to check whether a document already exists: create if it does not exist, else update if it does. Are there any implications if the unique 'id' field value is of different size/length among documents of a collection?
No.
Is it advisable to have such a design?
You need unique ID's. This design has the potential for (in error cases) concatenating id's incorrectly. It might be better to have ID's which are just a number. That said, my current project has ID's which are not just a number, YMMV.
cheers -- Rick
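Rick's warning about concatenating id's incorrectly usually comes down to collisions: when numeric ids of varying length are joined without a separator, two different (supplier_id, product_id) pairs can yield the same concatenated id, and the second document then silently overwrites the first. A small sketch with made-up ids:

```python
def make_id_bad(supplier_id, product_id=""):
    return supplier_id + product_id          # no separator: ambiguous

def make_id_good(supplier_id, product_id=""):
    return f"{supplier_id}-{product_id}"     # separator keeps pairs distinct

# Two different records collide without a separator...
assert make_id_bad("12", "34") == make_id_bad("123", "4")   # both "1234"
# ...but stay distinct with one:
assert make_id_good("12", "34") != make_id_good("123", "4")
```

So the varying id length itself is harmless, as Rick says; the risk is only in how the parts are joined, and any separator character that cannot appear inside the ids removes it.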
different length/size of unique 'id' field value in a collection.
Hi
Due to the source data structure, I need to concatenate the values of 2 fields ('supplier_id' and 'product_id') to form the unique 'id' of each document. However, there are cases where some documents only have the 'supplier_id' field. This results in some documents with a longer 'id' value (both 'supplier_id' and 'product_id') and some with a shorter one (only 'supplier_id'). Please refer to the simplified representation of the records below; the 3rd record only has a supplier id:

ts1 sup1 pdt1
ts1 sup1 pdt2
ts1 sup2
ts1 sup3 pdt3
ts1 sup4 pdt5
ts1 sup4 pdt6

I understand the unique 'id' is used during indexing to check whether a document already exists: create if it does not exist, else update if it does. Are there any implications if the unique 'id' field value is of different size/length among documents of a collection? Is it advisable to have such a design?
Derek
Re: 1 main collection or multiple smaller collections?
Richard
I am considering the same option as your suggestion: put them in 1 single collection of product documents, with each product doc containing the supplier info. In this option, the supplier info gets repeated in each of the supplier's product docs. I may be influenced by DB concepts; I guess it's a trade-off of this option.

On 4/28/2017 1:01 AM, Rick Leir wrote:
Does it make sense to use nested documents here? Products could be nested in a supplier document, perhaps. Alternately, consider de-normalizing "til it hurts". A product doc might be able to contain supplier info.

On April 27, 2017 8:50:59 AM EDT, Shawn Heisey <apa...@elyograg.org> wrote:
On 4/26/2017 11:57 PM, Derek Poh wrote:
There are some common fields between them. At the source data end (database), the supplier info and product info are updated separately. In this regard, I should separate them? If it's in 1 single collection, when there are updates to only the supplier info, the product info will be indexed again even though there are no updates to it. Is my reasoning valid?
On 4/27/2017 1:33 PM, Walter Underwood wrote:
Do they have the same fields or different fields? Are they updated separately or together? If they have the same fields and are updated together, I'd put them in the same collection. Otherwise, probably separate.
Walter's statements are right on the money, you just might need a little more detail. There are two critical details that decide whether you even CAN combine different data in a single index: one is that all types of records must use the same field (the uniqueKey field) to determine uniqueness, and the value of this field must be unique across the entire dataset. The other is that there SHOULD be a field with a name like "type" that your search client can use to differentiate the different kinds of documents. This type field is not necessary, but it does make things easier. Assuming you CAN combine documents, there is still the question of whether you SHOULD.
If the fields that you will commonly search are the same between the different kinds of documents, and if people want to be able to do one search and get more than one of the document types you are indexing, then it is something you should consider. If people will only ever search one type of document, you should probably keep them in separate indexes to keep things cleaner.
Thanks, Shawn
Re: 1 main collection or multiple smaller collections?
Hi Shawn
1 set of data is the suppliers' info and 1 set is the suppliers' products info. Users can either do a product search or a supplier search. 1 option I am thinking of is to put them in 1 single collection with each product as a document. Each product document will have the supplier info in it. Product id will be the uniqueKey field. With this option, the same supplier info will be in every product document of the supplier. A simplified example:

doc:
  product id: P1
  product description: XXX
  supplier id: S1
  supplier name: XXX
  supplier address: XXX
doc:
  product id: P2
  product description: XXXYYY
  supplier id: S1
  supplier name: XXX
  supplier address: XXX

I may be influenced by DB concepts. Is such a design logical?

On 4/27/2017 8:50 PM, Shawn Heisey wrote:
On 4/26/2017 11:57 PM, Derek Poh wrote:
There are some common fields between them. At the source data end (database), the supplier info and product info are updated separately. In this regard, I should separate them? If it's in 1 single collection, when there are updates to only the supplier info, the product info will be indexed again even though there are no updates to it. Is my reasoning valid?
On 4/27/2017 1:33 PM, Walter Underwood wrote:
Do they have the same fields or different fields? Are they updated separately or together? If they have the same fields and are updated together, I'd put them in the same collection. Otherwise, probably separate.
Walter's statements are right on the money, you just might need a little more detail. There are two critical details that decide whether you even CAN combine different data in a single index: one is that all types of records must use the same field (the uniqueKey field) to determine uniqueness, and the value of this field must be unique across the entire dataset. The other is that there SHOULD be a field with a name like "type" that your search client can use to differentiate the different kinds of documents. This type field is not necessary, but it does make things easier.
Assuming you CAN combine documents, there is still the question of whether you SHOULD. If the fields that you will commonly search are the same between the different kinds of documents, and if people want to be able to do one search and get more than one of the document types you are indexing, then it is something you should consider. If people will only ever search one type of document, you should probably keep them in separate indexes to keep things cleaner. Thanks, Shawn -- CONFIDENTIALITY NOTICE This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons.
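Shawn's two requirements (a shared uniqueKey with globally unique values, plus a type discriminator) can be sketched with documents like the following. The field names and the id-prefixing scheme here are illustrative, not from Derek's actual schema:

```json
[
  {"id": "supplier-S1", "type": "supplier",
   "supplier_name": "Acme Trading", "supplier_address": "123 Example Rd"},

  {"id": "product-P1", "type": "product",
   "product_description": "XXX", "supplier_id": "S1"},

  {"id": "product-P2", "type": "product",
   "product_description": "XXXYYY", "supplier_id": "S1"}
]
```

A product-only search would then just add fq=type:product, and a supplier search fq=type:supplier, against the one collection.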
Re: 1 main collection or multiple smaller collections?
There are some common fields between them. At the source data end (database), the supplier info and product info are updated separately. In this regard, should I separate them? If it's in one single collection, when there are updates to only the supplier info, the product info will be indexed again even though there are no updates to it. Is my reasoning valid?

On 4/27/2017 1:33 PM, Walter Underwood wrote:
Do they have the same fields or different fields? Are they updated separately or together? If they have the same fields and are updated together, I’d put them in the same collection. Otherwise, probably separate. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

On Apr 26, 2017, at 10:25 PM, Derek Poh <d...@globalsources.com> wrote:
Hi

I am planning for a migration of a legacy search engine to Solr. Basically the data can be categorised into supplier info, suppliers' product info and product category info. These sets of data are related to each other. The suppliers' product data, which is the largest, has around 300,000 records currently and is projected to increase. Should I put these data in one single collection or in separate collections - eg. 1 collection for supplier info, 1 collection for suppliers' product info and 1 collection for product categories info? What should I consider and plan for when deciding which option to take?

Derek
1 main collection or multiple smaller collections?
Hi

I am planning for a migration of a legacy search engine to Solr. Basically the data can be categorised into supplier info, suppliers' product info and product category info. These sets of data are related to each other. The suppliers' product data, which is the largest, has around 300,000 records currently and is projected to increase. Should I put these data in one single collection or in separate collections - eg. 1 collection for supplier info, 1 collection for suppliers' product info and 1 collection for product categories info? What should I consider and plan for when deciding which option to take?

Derek
Re: format data at source or format data during indexing?
Hi Alex

The business use case for the field is:
- exact match
- singular-plural stemming on each term in the field

Eg. a search for "dvd cases" must match "dvd case" and "dvds case". This is the field type currently and it satisfies the business use case. The one drawback of this is that I need to add words that cannot be singular-plural stemmed correctly by EnglishMinimalStemFilter to the 'plural_singular.txt' of StemmerOverrideFilter as and when users report those words.

<fieldType ... positionIncrementGap="100">
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^(.*)$" replacement="z01x $1 z01x" />
  ...
  <filter class="solr.StemmerOverrideFilterFactory" dictionary="plural_singular.txt" />
  ...
</fieldType>

I am wondering if it is advisable to let Solr append the code 'z01x' during indexing, or to append the code at the source data end and feed that to Solr. For the query aspect, I will let Solr append the code to the query search words.

On 3/30/2017 7:28 PM, Alexandre Rafalovitch wrote:
What's your actual business use case?

On 30 Mar 2017 1:53 AM, "Derek Poh" <d...@globalsources.com> wrote:
Hi Erick

So I could also not use the query analyzer stage to append the code to the search keyword, and have the front-end application append the code for every query it issues instead?

On 3/30/2017 12:20 PM, Erick Erickson wrote:
I generally prefer index-time work to query-time work on the theory that the index-time work is done once and the query-time work is done for each query. That said, for a corpus this size (and presumably without a large query rate) I doubt you'd be able to measure any difference. So basically choose the easiest to implement IMO. Best, Erick

On Wed, Mar 29, 2017 at 8:43 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
I am not sure I can tell how to decide on one or another. However, I wanted to mention that you also have the option of doing it in the UpdateRequestProcessor chain. That's still within Solr (and therefore is consistent with multiple clients feeding into Solr) but is before individual field processing (so will survive - for example - a copyField). Regards, Alex.
http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 29 March 2017 at 23:38, Derek Poh <d...@globalsources.com> wrote:
Hi

I need to create a field that will be prefixed and suffixed with the code 'z01x'. This field needs to have the code in the index and during query. I can either 1. have the source data of the field formatted with the code before indexing (outside Solr), and use a charFilter in the query stage of the field type to add the code during query, OR 2. use the charFilter before the tokenizer class during both the index and query analyzer stages of the field type. The collection has between 100k - 200k documents currently but it may increase in the future. The indexing time with option 2 and the current indexing time are almost the same, between 2-3 minutes. Which option would you advise?

Derek
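For reference, the singular-plural field type being discussed can be sketched roughly as below. Only the charFilter pattern/replacement, the stemmer override dictionary and positionIncrementGap="100" appear in the thread; the fieldType name and the tokenizer choice are assumptions:

```xml
<fieldType name="text_exact_plural" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- wrap the whole field value in the z01x sentinel code -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(.*)$" replacement="z01x $1 z01x"/>
    <!-- assumed tokenizer; the original is not visible in the thread -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- manual singular/plural mappings for words the stemmer gets wrong -->
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="plural_singular.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

StemmerOverrideFilter sits before EnglishMinimalStemFilter so that dictionary entries are marked as keywords and skipped by the stemmer that follows.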
Re: format data at source or format data during indexing?
Hi Alex

Thank you for pointing out the UpdateRequestProcessor option.

On 3/30/2017 11:43 AM, Alexandre Rafalovitch wrote:
I am not sure I can tell how to decide on one or another. However, I wanted to mention that you also have the option of doing it in the UpdateRequestProcessor chain. That's still within Solr (and therefore is consistent with multiple clients feeding into Solr) but is before individual field processing (so will survive - for example - a copyField). Regards, Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 29 March 2017 at 23:38, Derek Poh <d...@globalsources.com> wrote:
Hi

I need to create a field that will be prefixed and suffixed with the code 'z01x'. This field needs to have the code in the index and during query. I can either 1. have the source data of the field formatted with the code before indexing (outside Solr), and use a charFilter in the query stage of the field type to add the code during query, OR 2. use the charFilter before the tokenizer class during both the index and query analyzer stages of the field type. The collection has between 100k - 200k documents currently but it may increase in the future. The indexing time with option 2 and the current indexing time are almost the same, between 2-3 minutes. Which option would you advise?

Derek
Re: format data at source or format data during indexing?
Hi Erick

So I could also not use the query analyzer stage to append the code to the search keyword, and have the front-end application append the code for every query it issues instead?

On 3/30/2017 12:20 PM, Erick Erickson wrote:
I generally prefer index-time work to query-time work on the theory that the index-time work is done once and the query-time work is done for each query. That said, for a corpus this size (and presumably without a large query rate) I doubt you'd be able to measure any difference. So basically choose the easiest to implement IMO. Best, Erick

On Wed, Mar 29, 2017 at 8:43 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
I am not sure I can tell how to decide on one or another. However, I wanted to mention that you also have the option of doing it in the UpdateRequestProcessor chain. That's still within Solr (and therefore is consistent with multiple clients feeding into Solr) but is before individual field processing (so will survive - for example - a copyField). Regards, Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 29 March 2017 at 23:38, Derek Poh <d...@globalsources.com> wrote:
Hi

I need to create a field that will be prefixed and suffixed with the code 'z01x'. This field needs to have the code in the index and during query. I can either 1. have the source data of the field formatted with the code before indexing (outside Solr), and use a charFilter in the query stage of the field type to add the code during query, OR 2. use the charFilter before the tokenizer class during both the index and query analyzer stages of the field type. The collection has between 100k - 200k documents currently but it may increase in the future. The indexing time with option 2 and the current indexing time are almost the same, between 2-3 minutes. Which option would you advise?

Derek
format data at source or format data during indexing?
Hi

I need to create a field that will be prefixed and suffixed with the code 'z01x'. This field needs to have the code in the index and during query. I can either:

1. have the source data of the field formatted with the code before indexing (outside Solr), and use a charFilter in the query stage of the field type to add the code during query:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^(.*)$" replacement="z01x $1 z01x" />

OR

2. use the charFilter before the tokenizer class during both the index and query analyzer stages of the field type.

The collection has between 100k - 200k documents currently but it may increase in the future. The indexing time with option 2 and the current indexing time are almost the same, between 2-3 minutes. Which option would you advise?

Derek
Re: to handle expired documents: collection alias or delete by id query
Hi Tom

The moving alias design is interesting, will explore it. Regarding the method of creating the collection on a node for indexing only and adding replicas of it to other nodes for querying upon completion of indexing - am I right to say this is used in conjunction with a collection alias or the moving alias you mentioned?

On 3/24/2017 10:23 PM, Tom Evans wrote:
On Thu, Mar 23, 2017 at 6:10 AM, Derek Poh <d...@globalsources.com> wrote:
Hi

I have collections of products. I am doing indexing 3-4 times daily. Every day there are products that expire and I need to remove them from these collections daily. I can think of 2 ways to do this. 1. using a collection alias to switch between a main and temp collection: clear and index the temp collection, create an alias to the temp collection; clear and index the main collection, create an alias to the main collection. This way requires additional collections.

Another way of doing this is to have a moving alias (not constantly clearing the "temp" collection). If you reindex daily, your index would be called "products_mmdd" with an alias to "products". The advantage of this is that you can roll back to a previous version of the index if there are problems, and each index is guaranteed to be freshly created with no artifacts. The biggest consideration for me would be how long indexing your full corpus takes you. If you can do it in a small period of time, then full indexes would be preferable. If it takes a very long time, deleting is preferable. If you are doing a cloud setup, full indexes are even more appealing. You can create the new collection on a single node (even if sharded; just place each shard on the same node). This would only place the indexing cost on that one node, whilst other nodes would be unaffected by indexing degrading regular query response time. You also don't have to distribute the documents around the cluster.
There is no distributed indexing in Solr; each replica has to index each document again, even if it is not the leader. Once indexing is complete, you can expand the collection by adding replicas of that shard on other nodes - perhaps even removing it from the node that did the indexing. We have a node that solely does indexing; before the collection is queried for anything, it is added to the querying nodes. You can do this manually, or you can automate it using the collections API. Cheers, Tom
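Tom's moving-alias workflow can be sketched with the Collections API. The host, configset name, node names and shard/replica counts below are placeholders, and the commands are echoed rather than executed so the sequence is easy to see:

```shell
#!/bin/sh
# Sketch of a daily "moving alias" reindex against SolrCloud.
# SOLR, the configset and node names are assumptions for illustration.
SOLR="http://localhost:8983/solr"
NEW="products_$(date +%Y%m%d)"

# 1. create today's dated collection, placed on the indexing node only
echo "curl '${SOLR}/admin/collections?action=CREATE&name=${NEW}&numShards=1&replicationFactor=1&collection.configName=products_conf&createNodeSet=indexer-node:8983_solr'"

# 2. ... feed documents into ${NEW} and commit ...

# 3. once indexing is done, add replicas on the query nodes
echo "curl '${SOLR}/admin/collections?action=ADDREPLICA&collection=${NEW}&shard=shard1&node=query-node:8983_solr'"

# 4. atomically repoint the alias; queries against "products" now hit ${NEW}
echo "curl '${SOLR}/admin/collections?action=CREATEALIAS&name=products&collections=${NEW}'"
```

Because CREATEALIAS swaps atomically, the previous dated collection stays around for rollback until you DELETE it.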
Re: to handle expired documents: collection alias or delete by id query
Erick

Generally the products have a contracted date, but they could be extended and also expire prematurely. We will need additional processing to cater for these scenarios and update the 'expiry date' fields accordingly. Will go through the documentation again and see if it can fit our use case.

Thank you,
Derek

On 3/23/2017 11:12 PM, Erick Erickson wrote:
Have you considered using TTL (Time To Live)? You have to know at index time when the doc will expire. If you do, Solr will delete the doc for you when its life is over. See: https://lucidworks.com/2014/05/07/document-expiration/ Also the Ref guide: https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors#UpdateRequestProcessors-UpdateRequestProcessorFactories particularly DocExpirationUpdateProcessorFactory. Best, Erick

On Thu, Mar 23, 2017 at 5:28 AM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
Hi Derek,

There are both pros and cons for both approaches: 1. If you are doing full reindexing, the PRO is that you have a clean index all the time, and even if something goes wrong, you don't have to switch the alias to the updated index, so your users will not notice issues. The CON is that you are doing a full reindex every time, even when the amount of changes is minimal. Also, this approach is not real-time friendly if you plan to have more frequent update cycles. 2. If you delete in the existing index, you make minimal changes. But note that deleted docs are just flagged in the index as deleted and removed when segments are merged. This can result in skewed statistics and, if you have replicas and sort by score, can result in different ordering depending on the replicas' merge cycles. Using optimize after the update is done would solve this issue. In order to make the right decision, you have to look at the size of your collection, number of deleted items etc. You can even combine approaches, e.g. delete daily and do a full reindex once a week. HTH, Emir

On 23.03.2017 07:10, Derek Poh wrote:
Hi

I have collections of products. I am doing indexing 3-4 times daily. Every day there are products that expire and I need to remove them from these collections daily. I can think of 2 ways to do this. 1. using a collection alias to switch between a main and temp collection: clear and index the temp collection, create an alias to the temp collection; clear and index the main collection, create an alias to the main collection. This way requires additional collections. 2. get the list of expired products and send delete-by-id queries to the collections. Would like to get some advice on which way I should adopt?

Derek

-- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/
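For the TTL approach Erick mentions, the solrconfig.xml side might look like the following sketch; the field names, chain name and the 60-second sweep interval are illustrative, not taken from Derek's setup:

```xml
<updateRequestProcessorChain name="add-expiration" default="true">
  <!-- deletes docs whose expire_at timestamp has passed; sweep runs periodically -->
  <processor class="solr.DocExpirationUpdateProcessorFactory">
    <str name="expirationFieldName">expire_at</str>
    <!-- optional: clients may send a per-document TTL such as "+30DAYS" -->
    <str name="ttlFieldName">_ttl_</str>
    <long name="autoDeletePeriodSeconds">60</long>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

This fits Derek's caveat too: extending or prematurely expiring a product is just an update that rewrites expire_at, and the sweep picks it up.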
Re: to handle expired documents: collection alias or delete by id query
Hi Emir

Thank you for pointing out that deleted docs will still exist in the index till it is optimized, and that this will skew statistics. We do sort by score. These new collections are part of a new business initiative and we do not know yet what their size will be like. Will go ponder on your inputs. Thank you.

Derek

On 3/23/2017 8:28 PM, Emir Arnautovic wrote:
Hi Derek,

There are both pros and cons for both approaches: 1. If you are doing full reindexing, the PRO is that you have a clean index all the time, and even if something goes wrong, you don't have to switch the alias to the updated index, so your users will not notice issues. The CON is that you are doing a full reindex every time, even when the amount of changes is minimal. Also, this approach is not real-time friendly if you plan to have more frequent update cycles. 2. If you delete in the existing index, you make minimal changes. But note that deleted docs are just flagged in the index as deleted and removed when segments are merged. This can result in skewed statistics and, if you have replicas and sort by score, can result in different ordering depending on the replicas' merge cycles. Using optimize after the update is done would solve this issue. In order to make the right decision, you have to look at the size of your collection, number of deleted items etc. You can even combine approaches, e.g. delete daily and do a full reindex once a week. HTH, Emir

On 23.03.2017 07:10, Derek Poh wrote:
Hi

I have collections of products. I am doing indexing 3-4 times daily. Every day there are products that expire and I need to remove them from these collections daily. I can think of 2 ways to do this. 1. using a collection alias to switch between a main and temp collection: clear and index the temp collection, create an alias to the temp collection; clear and index the main collection, create an alias to the main collection. This way requires additional collections. 2. get the list of expired products and send delete-by-id queries to the collections. Would like to get some advice on which way I should adopt?

Derek
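On Emir's point about deleted documents skewing scores across replicas, a lighter alternative to a full optimize is a commit with expungeDeletes, which merges away segments with deleted docs. Host and collection name below are assumed:

```
curl "http://localhost:8983/solr/products/update?commit=true&expungeDeletes=true"
```

This is cheaper than optimize but still causes merge I/O, so it is usually run right after the daily delete pass rather than on every commit.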
to handle expired documents: collection alias or delete by id query
Hi

I have collections of products. I am doing indexing 3-4 times daily. Every day there are products that expire and I need to remove them from these collections daily. I can think of 2 ways to do this.

1. Using a collection alias to switch between a main and temp collection:
- clear and index the temp collection
- create an alias to the temp collection
- clear and index the main collection
- create an alias to the main collection
This way requires additional collections.

2. Get the list of expired products and send delete-by-id queries to the collections.

Would like to get some advice on which way I should adopt?

Derek
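Option 2 boils down to posting the expired ids in a single delete request to the update handler. The collection name and ids here are placeholders:

```json
{ "delete": ["P123", "P456", "P789"] }
```

Posted, for example, to /solr/products/update?commit=true with Content-Type: application/json. Batching all of a day's expired ids into one request keeps the commit count low.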
Re: Break up a supplier's documents (products) from dominating search result.
While testing with the group param (I think it applies to field collapse as well), I encountered a scenario where the number of suppliers in a result is less than the number of items to display per page (user-selected). Eg. products per page to display is 80. The search result has 182 matching products which belong to 13 suppliers. Grouping by supplier id with 1 product per supplier, only 13 products will be returned.

Initial query parameters:
start=0&rows=80&q=grout&fq=P_SupplierSource:(1)&group=true&group.field=P_SupplierId&group.format=simple

Issuing another query to get more products to fill up the page will not help, as there are no more suppliers - this returns no results:
start=80&rows=80&q=grout&fq=P_SupplierSource:(1)&group=true&group.field=P_SupplierId&group.format=simple

Any suggestions/advice on how to address this scenario?

On 11/29/2016 11:01 AM, Alexandre Rafalovitch wrote:
You can use expand and it will provide several documents per group (but in a different data structure in the response). Then it is up to you how to sequence or interleave the results in your UI. You do need to deal with edge cases, like what happens if you say 3 products per group but then one group has only one and you don't have enough items in the list, etc. Regards, Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 29 November 2016 at 12:56, Derek Poh <d...@globalsources.com> wrote:
Hi Walter

You used field collapsing for your case as well? For my case the search result page is a listing of products. There is an option to select the number of products to display per page. Let's say 40 products per page is selected. A search result has 100 matching products but these products belong to only 20 suppliers. The page will only display 20 products (1 product per supplier). We still need to fill up the remaining 20 empty product slots. How can I handle this scenario?

On 11/29/2016 8:26 AM, Walter Underwood wrote:
We had a similar feature in the Ultraseek search engine.
One of our customers was a magazine publisher, and they wanted the best hit from each magazine on the first page. I expect that field collapsing would work for this. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

On Nov 28, 2016, at 4:19 PM, Derek Poh <d...@globalsources.com> wrote:
Alex

Hope I understand what you meant by positive business requirements. With a few suppliers' products dominating the first page of a search result, the sales team will not be able to convince prospective or existing clients to sign up. They would like the results to feature other suppliers' products as well. In the extreme case, they were thinking of displaying the results in an order such as:

Supplier A product
Supplier B product
Supplier C product
Supplier A product
Supplier B product
Supplier C product
...

They are alright with implementing this logic on the first page only, with subsequent pages as per current logic, if it is not possible to implement it for the entire search result. Will take a look at Collapse and Expand to see if it can help.

On 11/28/2016 6:04 PM, Alexandre Rafalovitch wrote:
You have described your _negative_ business requirements, but not the _positive_ ones. So, it is hard to see what they want to happen. It is easy enough to promote or demote particular filter matches. But you want to partially limit them. On the first page? What about on the second? I suspect you would have to have a slightly different interface to do this effectively. And, most likely, using Collapse and Expand: https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results . Regards, Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 28 November 2016 at 20:09, Derek Poh <d...@globalsources.com> wrote:
Hi

We have a business requirement to break up a supplier's products from dominating the search result, so as to allow other suppliers' products in the search result to have exposure.
Business users are open to implementing this for the first page of the search result if it is not possible to apply it to the entire search result. From the sample keywords users have provided, I also discovered that most of the time a supplier's products that are listed consecutively in the result all have the same score. Any advice/suggestions on how I can do it? Please let me know if more information is required. Thank you.

Derek
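The Collapse and Expand approach Alex links to would look roughly like this as query parameters; the collapse field name comes from the thread, the query term and expand.rows are illustrative:

```
q=denim fabric
&fq={!collapse field=P_SupplierId}
&expand=true
&expand.rows=3
```

The main result list then holds one (top-scoring) product per supplier, and the separate "expanded" section of the response carries up to 3 more products per supplier, which the UI can interleave to fill the page - subject to the edge case Alex notes, where a supplier has fewer products than requested.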
Re: Break up a supplier's documents (products) from dominating search result.
Is there a way where we do not have to change the page UI? This is the search page for your reference. http://www.globalsources.com/gsol/GeneralManager?hostname=www.globalsources.com_search=on=search%2FProductSearchResults_search=off==PRODUCT=en=new=denim+fabric=en_id=300149681_id=23844==t=N=ProdSearch=GetPoint=DoFreeTextSearch_search=on_search=off=grid

On 11/29/2016 10:04 AM, Walter Underwood wrote: We used something like field collapsing, but it wasn't with Solr or Lucene. They had not been invented at the time. This was a feature of the Ultraseek engine from Infoseek, probably in 1997 or 1998. With field collapsing, you provide a link to show more results from that source. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

On Nov 28, 2016, at 5:56 PM, Derek Poh <d...@globalsources.com> wrote: Hi Walter, You used field collapsing for your case as well? For my case the search result page is a listing of products. There is an option to select the number of products to display per page. Let's say 40 products per page is selected. A search result has 100 matching products, but these products belong to only 20 suppliers. The page will then only display 20 products (1 product per supplier). We still need to fill up the remaining 20 empty product slots. How can I handle this scenario?

On 11/29/2016 8:26 AM, Walter Underwood wrote: We had a similar feature in the Ultraseek search engine. One of our customers was a magazine publisher, and they wanted the best hit from each magazine on the first page. I expect that field collapsing would work for this. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
Re: Break up a supplier's documents (products) from dominating search result.
Hi Walter, You used field collapsing for your case as well? For my case the search result page is a listing of products. There is an option to select the number of products to display per page. Let's say 40 products per page is selected. A search result has 100 matching products, but these products belong to only 20 suppliers. The page will then only display 20 products (1 product per supplier). We still need to fill up the remaining 20 empty product slots. How can I handle this scenario?

On 11/29/2016 8:26 AM, Walter Underwood wrote: We had a similar feature in the Ultraseek search engine. One of our customers was a magazine publisher, and they wanted the best hit from each magazine on the first page. I expect that field collapsing would work for this. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
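A rough sketch of how Collapse and Expand could fit this "one product per supplier, but still fill the page" requirement, assuming the supplier id field is P_SupplierId (the field name is taken from the faceting query elsewhere in this thread, so treat it as an assumption). Collapse keeps the top-scoring product per supplier; expand returns extra products per supplier in a separate "expanded" section that the page could use to fill remaining slots:

```text
q=denim fabric
fq={!collapse field=P_SupplierId}
expand=true
expand.rows=4
rows=40
```

With 100 matches from 20 suppliers, the main result list would contain 20 collapsed heads, and the expanded section up to 4 more products for each of those suppliers; the UI would need to read both sections to fill 40 slots.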
Re: Break up a supplier's documents (products) from dominating search result.
Alex, I hope I understand what you meant by positive business requirements. With a few suppliers' products dominating the first page of a search result, sales will not be able to convince prospective or existing clients to sign up. They would like the results to feature other suppliers' products as well. In the extreme case, they were thinking of displaying the results in this order: Supplier A product, Supplier B product, Supplier C product, Supplier A product, Supplier B product, Supplier C product, ... They are alright with implementing this logic on the first page only, with subsequent pages as per the current logic, if it is not possible to apply it to the entire search result. Will take a look at Collapse and Expand to see if it can help.

On 11/28/2016 6:04 PM, Alexandre Rafalovitch wrote: You have described your _negative_ business requirements, but not the _positive_ ones. So, it is hard to see what they want to happen. It is easy enough to promote or demote particular filter matches. But you want to partially limit them. On the first page? What about on the second? I suspect you would have to have a slightly different interface to do this effectively. And, most likely, using Collapse and Expand: https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results . Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 28 November 2016 at 20:09, Derek Poh <d...@globalsources.com> wrote: Hi, We have a business requirement to break up a supplier's products from dominating the search result, so as to allow other suppliers' products in the search result to have exposure. Business users are open to implementing this for the first page of the search result if it is not possible to apply it to the entire search result. From the sample keywords users have provided, I also discovered that most of the time a supplier's products that are listed consecutively in the result all have the same score. Any advice/suggestions on how I can do it? Please let me know if more information is required. Thank you. Derek
Break up a supplier's documents (products) from dominating search result.
Hi, We have a business requirement to break up a supplier's products from dominating the search result, so as to allow other suppliers' products in the search result to have exposure. Business users are open to implementing this for the first page of the search result if it is not possible to apply it to the entire search result. From the sample keywords users have provided, I also discovered that most of the time a supplier's products that are listed consecutively in the result all have the same score. Any advice/suggestions on how I can do it? Please let me know if more information is required. Thank you. Derek
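The A/B/C/A/B/C ordering described above is essentially a round-robin interleave over suppliers, which could also be applied client-side to just the first page of an ordinary ranked result. A minimal sketch (the function name and the supplier_id field are illustrative, not from the thread):

```python
from collections import OrderedDict, deque

def interleave_by_supplier(docs, key="supplier_id"):
    """Round-robin the ranked docs so consecutive results come from
    different suppliers where possible, preserving rank within each
    supplier and the rank order in which suppliers first appear."""
    groups = OrderedDict()  # supplier -> queue of its docs, in rank order
    for doc in docs:
        groups.setdefault(doc[key], deque()).append(doc)
    out = []
    while groups:
        # Take one doc from each supplier that still has docs left.
        for supplier in list(groups):
            out.append(groups[supplier].popleft())
            if not groups[supplier]:
                del groups[supplier]
    return out

docs = [
    {"id": 1, "supplier_id": "A"}, {"id": 2, "supplier_id": "A"},
    {"id": 3, "supplier_id": "B"}, {"id": 4, "supplier_id": "C"},
]
print([d["supplier_id"] for d in interleave_by_supplier(docs)])
# -> ['A', 'B', 'C', 'A']
```

Since the business users only want this on the first page, the client could fetch the first page of results as usual and reorder just those rows before rendering.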
Re: Split words with period in between into separate tokens
Why didn't I think of that! That's another alternative. Thank you for your suggestion, appreciate it.

On 10/13/2016 5:41 AM, Georg Sorst wrote: You could use a PatternReplaceCharFilter before your tokenizer to replace the dot with a space character.

Derek Poh <d...@globalsources.com> wrote on Wed., 12 Oct 2016 11:38: Seems like LetterTokenizerFactory tokenises/discards on numbers as well. The field does have values with numbers in them, therefore it is not applicable. Thank you.

On 10/12/2016 4:22 PM, Dheerendra Kulkarni wrote: You can use LetterTokenizerFactory instead. Regards, Dheerendra Kulkarni

On Wed, Oct 12, 2016 at 6:24 AM, Derek Poh <d...@globalsources.com> wrote: Hi, How can I split words with a period in between into separate tokens? E.g. "Co.Ltd" => "Co" "Ltd". I am using StandardTokenizerFactory, and it does not split on periods; dots that are not followed by whitespace are kept as part of the token, including Internet domain names. This is the field definition, synonyms="synonyms.txt" ignoreCase="true" expand="true"/> Solr version is 4.10.4. Derek
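A sketch of the char filter Georg suggests, placed before the tokenizer in the analyzer chain (the surrounding analyzer element is reconstructed here, since the original field definition is garbled in the archive):

```xml
<analyzer>
  <!-- Replace "." with a space before tokenizing, so "Co.Ltd"
       reaches the tokenizer as "Co Ltd" and splits into two tokens -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="\." replacement=" "/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
```

One caveat for a field that also contains numbers: this pattern would break "1.5" into "1" and "5" as well. A narrower pattern such as `(?&lt;=\p{Alpha})\.(?=\p{Alpha})` (replace a dot only when it sits between letters) might avoid that, though it would need testing against the actual field values.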
Re: Split words with period in between into separate tokens
Seems like LetterTokenizerFactory tokenises/discards on numbers as well. The field does have values with numbers in them, therefore it is not applicable. Thank you.

On 10/12/2016 4:22 PM, Dheerendra Kulkarni wrote: You can use LetterTokenizerFactory instead. Regards, Dheerendra Kulkarni

On Wed, Oct 12, 2016 at 6:24 AM, Derek Poh <d...@globalsources.com> wrote: Hi, How can I split words with a period in between into separate tokens? E.g. "Co.Ltd" => "Co" "Ltd". I am using StandardTokenizerFactory, and it does not split on periods; dots that are not followed by whitespace are kept as part of the token, including Internet domain names. This is the field definition, Solr version is 4.10.4. Derek
Re: Split words with period in between ("Co.Ltd") into separate tokens
Thank you for pointing out the flags. I set generateWordParts=1 and the term is split up.

On 10/12/2016 3:26 PM, Modassar Ather wrote: Hi, The flags set in your WordDelimiterFilterFactory definition are all 0. You can try with generateWordParts=1 and splitOnCaseChange=1 and see if it breaks the term up as per your requirement. You can also try with the other available flags enabled. Best, Modassar

On Wed, Oct 12, 2016 at 12:44 PM, Derek Poh <d...@globalsources.com> wrote: I tried adding the Word Delimiter Filter to the field, but it either does not process the term "Co.Ltd" or it truncates it away.

On 10/12/2016 8:54 AM, Derek Poh wrote: Hi, How can I split words with a period in between into separate tokens? E.g. "Co.Ltd" => "Co" "Ltd". I am using StandardTokenizerFactory, and it does not split on periods; dots that are not followed by whitespace are kept as part of the token, including Internet domain names. This is the field definition, Solr version is 4.10.4. Derek
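For reference, the fix Modassar describes amounts to flipping generateWordParts in the filter definition (the original definition in the thread has all flags at 0, which is why the term was dropped; the full element is reconstructed here from the surviving attribute fragments):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"  <!-- split "Co.Ltd" into "Co" and "Ltd" -->
        generateNumberParts="0"
        catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="0"/>
```

With every generate/catenate flag at 0 the filter emits no subword parts at all, which matches the "it truncates away the term" symptom reported earlier in the thread.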
Re: Split words with period in between ("Co.Ltd") into separate tokens
I tried adding the Word Delimiter Filter to the field, but it either does not process the term "Co.Ltd" or it truncates it away. generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>

On 10/12/2016 8:54 AM, Derek Poh wrote: Hi, How can I split words with a period in between into separate tokens? E.g. "Co.Ltd" => "Co" "Ltd". I am using StandardTokenizerFactory, and it does not split on periods; dots that are not followed by whitespace are kept as part of the token, including Internet domain names. This is the field definition, positionIncrementGap="100"> words="stopwords.txt" /> words="stopwords.txt" /> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> Solr version is 4.10.4. Derek
Split words with period in between into separate tokens
Hi, How can I split words with a period in between into separate tokens? E.g. "Co.Ltd" => "Co" "Ltd". I am using StandardTokenizerFactory, and it does not split on periods; dots that are not followed by whitespace are kept as part of the token, including Internet domain names. This is the field definition, positionIncrementGap="100"> words="stopwords.txt" /> words="stopwords.txt" /> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> Solr version is 4.10.4. Derek
display filter based on existence of facet
I have a couple of filters that are text-input based, where the user will input a value into the text boxes of these filters. The condition is that these filters will only be displayed if the facets exist in the search result. E.g. the Min Order Qty filter will be displayed if the Min Order Qty facet exists in the Solr result. To display this filter, I only need to 'know' there is a value to filter on. Currently all the possible terms and counts of the Min Order Qty field are returned for this facet. Any suggestions on how I can avoid the computation of the possible terms and their counts for the facet field, and hence reduce the computational time of the query? I just need to know there is 'a value to filter on'. These are the parameters of the query that is used to display the list of filters. group.field=P_SupplierId=true=true=0=0=coffee=P_SupplierSource:(1)=true=1=P_CNState=P_BusinessType=P_CombinedBusTypeFlat=P_CombinedCompCertFlat=P_CombinedExportCountryFlat=P_CombinedProdCertFlat=P_Country=P_CSFParticipant=P_FOBPriceMinFlag=P_FOBPriceMaxFlag=P_HasAuditInfo=P_HasCreditInfo=P_LeadTime=P_Microsite=P_MinOrderQty=P_MonthlyCapacityFlag=P_OEMServices=P_PSEParticipant=P_SupplierRanking=P_SupplierUpcomingTradeShow=P_YearsInBusiness=P_SmallOrderFlag Using Solr 4.10.4. Thank you, Derek
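One hedged option for the "I only need to know a value exists" case: Solr 6.2+ has facet.exists=true for exactly this, but on Solr 4.10.4 the closest approximation is to cap the number of buckets returned per field, e.g.:

```text
facet=true
facet.mincount=1
f.P_MinOrderQty.facet.limit=1
```

This guarantees at most one term comes back per field, which is enough to decide whether to show the filter. Note the caveat: facet.limit mostly reduces the response size; depending on the facet method, Solr may still count all terms internally, so whether this helps query time enough would need measuring.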
Re: moving leader to another replica of a collection?
Hi Shawn, Got it. Will delete all replicas on that server first before shutting down Solr on it. Thank you, Derek

On 7/11/2016 9:43 AM, Shawn Heisey wrote: On 7/10/2016 7:34 PM, Derek Poh wrote: I need to remove a server from the cluster of servers running Solr in my production environment. One of the collection's replicas is a leader on this server. The collection is set up as 1 shard with 5 replicas, with each replica residing on a physical server. How can I move or assign another replica as the leader on another server? Or should I just go ahead and stop the Solr process on this server, and Solr or ZooKeeper will elect another replica as leader?

If you shut down that Solr server, the remaining servers will elect a new leader. There is the preferred leader functionality, but this is really only something that's needed if you have a very large number of collections/shards and need to distribute the leader roles evenly among multiple servers. For a small number, having leaders concentrated on one server does not represent a performance problem. If the server will be permanently decommissioned, you should probably use DELETEREPLICA on the Collections API to remove all replicas on that server before shutting it down. That can also initiate leader election. Thanks, Shawn
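The DELETEREPLICA call Shawn mentions is a Collections API request of roughly this shape (the host, collection, shard, and replica names here are placeholders; the actual replica name for each shard can be read from the Cloud view of the admin UI or from CLUSTERSTATUS):

```text
http://host:8983/solr/admin/collections?action=DELETEREPLICA
    &collection=mycollection
    &shard=shard1
    &replica=core_node3
```

For a 1-shard, 5-replica collection, that is one call for the single shard's replica on the decommissioned server, after which the server can be stopped.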
moving leader to another replica of a collection?
Hi, This is my situation. I need to remove a server from the cluster of servers running Solr in my production environment. One of the collection's replicas is a leader on this server. The collection is set up as 1 shard with 5 replicas, with each replica residing on a physical server. How can I move or assign another replica as the leader on another server? Or should I just go ahead and stop the Solr process on this server, and Solr or ZooKeeper will elect another replica as leader? Derek
Re: Define search query parameters in Solr or let clients applicationscraft them?
Hi Scott, thank you for sharing your solution, appreciate it. To me, in terms of maintainability, I think it will be better to define all the parameters either at the client end or at the Solr end.

On 6/15/2016 9:47 AM, scott.chu wrote: In my case, I write an HTTP gateway between the application and the Solr engine. This existed long before I used Solr as the SE. Back then, I figured that one day I might replace our old SE, and that would cause two dilemmas: 1> If our applications directly call the API of THE search engine, then when we replace it with another SE, all the calling statements have to be rewritten. That would be a very hard job for us, especially as the number and size of applications get bigger. 2> We have applications written in different languages, and our system engineers from time to time need to manually test the status of the SE. Furthermore, we want to fix some default parameters in the gateway for simplicity and security reasons (e.g. shortening the size of the HTTP call; preventing db names, field names, etc. from being shown in the HTTP call). These considerations ended up in a gateway design. For your question, IMHO, I wouldn't define query parameters in Solr unless you think they WOULD BE GLOBALIZED. You can consider our solution. scott.chu, scott@udngroup.com 2016/6/15 (Wed)

- Original Message - From: Derek Poh To: solr-user Date: 2016/6/13 (Mon) 11:21 Subject: Define search query parameters in Solr or let clients applications craft them? Hi, Would like to get some advice: should the query parameters be defined in Solr, or should the client applications define and pass the query parameters to Solr? Regards, Derek
Re: Define search query parameters in Solr or let clients applications craft them?
Hi Emir Ya, guess one way is to implement a policy where new queries from the client applications have to be reviewed, coupled with periodic search log grooming as you have suggested. On 6/14/2016 4:12 PM, Emir Arnautovic wrote: Hi Derek, Unless you lock all your parameters, there will always be a chance of inefficient queries. The only way to fight that is to have full control of the Solr interface and provide some search API, or to do regular search log grooming. Emir On 14.06.2016 03:05, Derek Poh wrote: Hi Emir Thank you for pointing out the cons of defining them in the Solr config. One of the things I am worried about in letting the client applications define the parameters is that the developers will use or include unnecessary, wrong and resource-intensive parameters. On 6/13/2016 5:50 PM, Emir Arnautovic wrote: Hi Derek, Maybe I am looking at this from the perspective of someone who works with other people's setups, but I prefer when it is defined in the Solr configs: I can get a sense of the queries from looking at the configs, you have a mechanism to lock some parameters, updates are centralized... However, it does come with some cons: it is less expressive than what you can do in client code, you have to reload cores when you want to change it, and people tend to override it from the client so you get configs in two places. HTH, Emir On 13.06.2016 05:21, Derek Poh wrote: Hi Would like to get some advice on whether the query parameters should be defined in Solr, or whether the client applications should define and pass the query parameters to Solr? Regards, Derek
Re: Define search query parameters in Solr or let clients applications craft them?
Hi Emir Thank you for pointing out the cons of defining them in the Solr config. One of the things I am worried about in letting the client applications define the parameters is that the developers will use or include unnecessary, wrong and resource-intensive parameters. On 6/13/2016 5:50 PM, Emir Arnautovic wrote: Hi Derek, Maybe I am looking at this from the perspective of someone who works with other people's setups, but I prefer when it is defined in the Solr configs: I can get a sense of the queries from looking at the configs, you have a mechanism to lock some parameters, updates are centralized... However, it does come with some cons: it is less expressive than what you can do in client code, you have to reload cores when you want to change it, and people tend to override it from the client so you get configs in two places. HTH, Emir On 13.06.2016 05:21, Derek Poh wrote: Hi Would like to get some advice on whether the query parameters should be defined in Solr, or whether the client applications should define and pass the query parameters to Solr? Regards, Derek
Define search query parameters in Solr or let clients applications craft them?
Hi Would like to get some advice on whether the query parameters should be defined in Solr, or whether the client applications should define and pass the query parameters to Solr? Regards, Derek
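[Editor's note] The "mechanism to lock some parameters" Emir mentions in this thread refers to centralizing parameters in a request handler in solrconfig.xml. A minimal sketch; the handler name /products and the field names are illustrative assumptions, not taken from the thread. Parameters under defaults can be overridden by clients, while those under invariants cannot:

```xml
<!-- solrconfig.xml: a custom search handler with centralized parameters.
     Handler name and field names are illustrative assumptions. -->
<requestHandler name="/products" class="solr.SearchHandler">
  <lst name="defaults">       <!-- clients MAY override these -->
    <str name="defType">edismax</str>
    <str name="rows">20</str>
  </lst>
  <lst name="invariants">     <!-- clients can NOT override these -->
    <str name="fl">id,name,score</str>
    <str name="wt">json</str>
  </lst>
</requestHandler>
```

This is the middle ground between the two options in the thread: clients still craft queries, but the locked-down parts live in one place on the Solr side.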
Re: float or string type for a field with whole number and decimal number values?
Sorry about that. Thank you for your explanation. I still have some questions on using and setting up a collection alias for my current situation. I will start a new thread on this. On 5/31/2016 11:21 PM, Erick Erickson wrote: First, when changing the topic of the thread, please start a new thread. This is called "thread hijacking" and makes it difficult to find threads later. Collection aliasing does not do _anything_ about adding/deleting/whatever. It's just a way to do exactly what you want. Your clients point to mycollection. You use the CREATEALIAS command to point mycollection to mycollection_1. Thereafter you can do anything you want to mycollection_1 using either name. That is, you can address mycollection_1 explicitly. You can use mycollection. It doesn't matter. Then you can create mycollection_2. So far you can _only_ address mycollection_2 explicitly. You then use CREATEALIAS to point mycollection at mycollection_2. At that point, anybody using mycollection will start working with mycollection_2. Meanwhile, mycollection_1 is still addressable (presumably by the back end) by addressing it explicitly rather than through an alias. It has _not_ been changed in any way by creating the new alias. Best, Erick On Mon, May 30, 2016 at 11:15 PM, Derek Poh <d...@globalsources.com> wrote: Hi Erick Thank you for pointing out the sort behaviour of numbers in a string field. I did not think of that. Will use float. Would like to know how you guys would handle the usage of a collection alias in my case. I have a 'product' collection and I create a new collection 'product_tmp' for this field type change and index into it. I create an alias 'product' on this new 'product_tmp' collection. If I were to index to or delete documents from the 'product' collection, Solr will index on and delete from the 'product_tmp' collection, am I right? That means the 'product' collection cannot be used anymore?
Even if I were to create an alias 'product_old' on the 'product' collection; if I issue a delete of all documents or index on 'product_old', Solr will delete or index on the 'product_tmp' collection instead? My intention is to avoid having to update the client servers to point to the 'product_tmp' collection. On 5/31/2016 10:57 AM, Erick Erickson wrote: bq: Should I change the field type to "float" or "string"? I'd go with float. Let's assume you want to sort by this field. 10.00 sorts before 9.0 if you just use Strings. Plus floats are generally much more compact. bq: do I need to delete all documents in the index and do a full indexing That's the way I'd do it. You can always index to a _new_ collection (assuming SolrCloud) and use collection aliasing to switch your search all at once Best, Erick On Sun, May 29, 2016 at 12:56 AM, Derek Poh <d...@globalsources.com> wrote: I am using Solr 4.10.4. On 5/29/2016 3:52 PM, Derek Poh wrote: Hi I have a field that is of "int" type currently and its values are whole numbers. Due to a change in business requirement, this field will need to take in decimal numbers as well. This field is sorted on and filtered by range (field:[1 TO *]). Should I change the field type to "float" or "string"? For the change to take effect, do I need to delete all documents in the index and do a full indexing? Or can I just do a full indexing without the need to delete all documents first? Derek
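[Editor's note] Erick's aliasing flow above maps onto the Collections API's CREATEALIAS action. A minimal sketch of building that request; the base URL assumes a default local SolrCloud node, the collection names follow the thread, and build_createalias_url is an illustrative helper, not part of any Solr client library:

```python
# Sketch of repointing an alias between collections via the Solr
# Collections API (CREATEALIAS). SOLR_BASE is an assumption.
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local SolrCloud node

def build_createalias_url(alias, *collections):
    """Build the CREATEALIAS request that points `alias` at `collections`."""
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{SOLR_BASE}/admin/collections?{params}"

# Point the 'product' alias at the newly rebuilt collection. After this,
# clients querying 'product' actually hit 'product_tmp', while the old
# collection stays addressable by its real name.
print(build_createalias_url("product", "product_tmp"))
```

Issuing the same call again with a different target collection repoints the alias in one step, which is what lets the client servers keep using a single name.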
Re: float or string type for a field with whole number and decimal number values?
Hi Erick Thank you for pointing out the sort behaviour of numbers in a string field. I did not think of that. Will use float. Would like to know how you guys would handle the usage of a collection alias in my case. I have a 'product' collection and I create a new collection 'product_tmp' for this field type change and index into it. I create an alias 'product' on this new 'product_tmp' collection. If I were to index to or delete documents from the 'product' collection, Solr will index on and delete from the 'product_tmp' collection, am I right? That means the 'product' collection cannot be used anymore? Even if I were to create an alias 'product_old' on the 'product' collection; if I issue a delete of all documents or index on 'product_old', Solr will delete or index on the 'product_tmp' collection instead? My intention is to avoid having to update the client servers to point to the 'product_tmp' collection. On 5/31/2016 10:57 AM, Erick Erickson wrote: bq: Should I change the field type to "float" or "string"? I'd go with float. Let's assume you want to sort by this field. 10.00 sorts before 9.0 if you just use Strings. Plus floats are generally much more compact. bq: do I need to delete all documents in the index and do a full indexing That's the way I'd do it. You can always index to a _new_ collection (assuming SolrCloud) and use collection aliasing to switch your search all at once Best, Erick On Sun, May 29, 2016 at 12:56 AM, Derek Poh <d...@globalsources.com> wrote: I am using Solr 4.10.4. On 5/29/2016 3:52 PM, Derek Poh wrote: Hi I have a field that is of "int" type currently and its values are whole numbers. Due to a change in business requirement, this field will need to take in decimal numbers as well. This field is sorted on and filtered by range (field:[1 TO *]). Should I change the field type to "float" or "string"? For the change to take effect, do I need to delete all documents in the index and do a full indexing?
Or can I just do a full indexing without the need to delete all documents first? Derek
Re: float or string type for a field with whole number and decimal number values?
I am using Solr 4.10.4. On 5/29/2016 3:52 PM, Derek Poh wrote: Hi I have a field that is of "int" type currently and its values are whole numbers. stored="true" multiValued="false"/> Due to a change in business requirement, this field will need to take in decimal numbers as well. This field is sorted on and filtered by range (field:[1 TO *]). Should I change the field type to "float" or "string"? For the change to take effect, do I need to delete all documents in the index and do a full indexing? Or can I just do a full indexing without the need to delete all documents first? Derek
float or string type for a field with whole number and decimal number values?
Hi I have a field that is of "int" type currently and its values are whole numbers. stored="true" multiValued="false"/> Due to a change in business requirement, this field will need to take in decimal numbers as well. This field is sorted on and filtered by range (field:[1 TO *]). Should I change the field type to "float" or "string"? For the change to take effect, do I need to delete all documents in the index and do a full indexing? Or can I just do a full indexing without the need to delete all documents first? Derek
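[Editor's note] On Solr 4.x, the int-to-float change discussed here amounts to swapping the field's type in schema.xml. A hedged sketch; the field name min_order_qty is made up, since the opening of the original field definition was lost in the archive above, and tfloat is the stock Trie float type from the 4.x example schema:

```xml
<!-- schema.xml (Solr 4.x): before -->
<!-- <field name="min_order_qty" type="int" indexed="true" stored="true" multiValued="false"/> -->

<!-- after: a float type keeps numeric sorting and range filters
     (min_order_qty:[1 TO *]) working once decimal values arrive -->
<field name="min_order_qty" type="tfloat" indexed="true" stored="true" multiValued="false"/>

<!-- the stock type definition, as shipped in the 4.x example schema -->
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
```

As Erick notes later in the thread, the type change requires reindexing; indexing into a fresh collection and switching an alias avoids mixing old int-encoded and new float-encoded values.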
Re: Advice to add additional non-related fields to a collection or create a subset of it?
Mikhail It was caused by an endless loop in the page's code that is triggered only under certain conditions. On 5/11/2016 4:07 PM, Mikhail Khludnev wrote: On Wed, May 11, 2016 at 10:16 AM, Derek Poh <d...@globalsources.com> wrote: Hi Erick Yes we have identified and fixed the page's slow loading. Derek, Can you elaborate more? What did you fix? I was wondering if there are any best practices when it comes to deciding whether to create a single collection that stores all information in it or to create multiple sub-collections. I understand now it depends on the use-case. My apologies for not giving it much thought before asking the questions. Thank you for your patience. - Derek On 5/10/2016 12:10 PM, Erick Erickson wrote: Not quite sure where you are at with this. It sounds like your slow loading is fixed and was a coding issue on your part, that happens to us all. bq: Is it advisable to have as few queries to Solr in a page as possible? Of course it is advisable to have as few Solr queries executed to display a page as possible. Every one costs you at least _some_ turnaround time. You can mitigate this (assuming your Solr server isn't running flat out) by issuing the subsequent queries in parallel threads. But it's not really a question to me of advisability, it's a question of what your application needs to deliver. The use-case drives all. You can do some tricks like display partial pages and fill in the rest behind the scenes to display when your user clicks something and the like. bq: In my case, by denormalizing, that means putting the product and supplier information into one collection? The supplier information is stored but not indexed in the collection. It Depends(tm). If all you want to do is provide supplier information when people do product searches then stored-only is fine. If you want to perform queries like "show me all the products supplied by supplier X", then you need to index at least some values too.
Best, Erick On Sun, May 8, 2016 at 10:36 PM, Derek Poh <d...@globalsources.com> wrote: Hi Erick In my case, by denormalizing, that means putting the product and supplier information into one collection? The supplier information is stored but not indexed in the collection. We have identified that it was a combination of a loop and bad source data that caused an endless loop under a certain scenario. Is it advisable to have as few queries to Solr in a page as possible? On 5/6/2016 11:17 PM, Erick Erickson wrote: Denormalizing the data is usually the first thing to try. That's certainly the preferred option if it doesn't bloat the index unacceptably. But my real question is what have you done to try to figure out _why_ it's slow? Do you have some loop like for (each found document) extract all the supplier IDs and query Solr for them) ? That's a fundamental design decision that will be expensive. Have you examined the time each query takes to see if Solr is really the bottleneck or whether it's "something else"? Mind you, I have no clue what "something else" is here Do you ever return lots of rows (i.e. thousands)? Solr serves queries very quickly, so I'd concentrate on identifying what is slow before jumping to a solution Best, Erick On Wed, May 4, 2016 at 10:28 PM, Derek Poh <d...@globalsources.com> wrote: Hi We have a "product" collection and a "supplier" collection. The "product" collection contains product information and the "supplier" collection contains the products' supplier information. We have a subsidiary page that queries the "product" collection for the search. The displayed result includes product and supplier information. This page will query the "product" collection to get the matching product records. From this query a list of the matching products' supplier ids is extracted and used in a filter query against the "supplier" collection to get the necessary supplier information. The loading of this page is very slow, and it leads to timeouts at times as well.
Besides looking at tweaking the code of the page, we are also looking at what tweaking can be done on the Solr side. Reducing the number of queries generated by this page was one of the options to try. The main "product" collection is also used by our site's main search page and other subsidiary pages as well. So the query load on it is substantial. It has about 6.5 million documents and an index size of 38-39 GB. It is set up as 1 shard with 5 replicas. Each replica is on its own server. Total of 5 servers. There are other smaller collections with a similar 1 shard, 5 replicas setup residing on these servers as well. I am thinking of either 1. Index supplier information into the "product" collection. 2. Create another similar "product" collection for this page to use. This collection will have fewer product fields and will include the required supplier fields. But the number of documents in it will be the same as the main "product" collection.
Re: Advice to add additional non-related fields to a collection or create a subset of it?
Hi Erick Yes we have identified and fixed the page's slow loading. I was wondering if there are any best practices when it comes to deciding whether to create a single collection that stores all information in it or to create multiple sub-collections. I understand now it depends on the use-case. My apologies for not giving it much thought before asking the questions. Thank you for your patience. - Derek On 5/10/2016 12:10 PM, Erick Erickson wrote: Not quite sure where you are at with this. It sounds like your slow loading is fixed and was a coding issue on your part, that happens to us all. bq: Is it advisable to have as few queries to Solr in a page as possible? Of course it is advisable to have as few Solr queries executed to display a page as possible. Every one costs you at least _some_ turnaround time. You can mitigate this (assuming your Solr server isn't running flat out) by issuing the subsequent queries in parallel threads. But it's not really a question to me of advisability, it's a question of what your application needs to deliver. The use-case drives all. You can do some tricks like display partial pages and fill in the rest behind the scenes to display when your user clicks something and the like. bq: In my case, by denormalizing, that means putting the product and supplier information into one collection? The supplier information is stored but not indexed in the collection. It Depends(tm). If all you want to do is provide supplier information when people do product searches then stored-only is fine. If you want to perform queries like "show me all the products supplied by supplier X", then you need to index at least some values too. Best, Erick On Sun, May 8, 2016 at 10:36 PM, Derek Poh <d...@globalsources.com> wrote: Hi Erick In my case, by denormalizing, that means putting the product and supplier information into one collection? The supplier information is stored but not indexed in the collection.
We have identified that it was a combination of a loop and bad source data that caused an endless loop under a certain scenario. Is it advisable to have as few queries to Solr in a page as possible? On 5/6/2016 11:17 PM, Erick Erickson wrote: Denormalizing the data is usually the first thing to try. That's certainly the preferred option if it doesn't bloat the index unacceptably. But my real question is what have you done to try to figure out _why_ it's slow? Do you have some loop like for (each found document) extract all the supplier IDs and query Solr for them) ? That's a fundamental design decision that will be expensive. Have you examined the time each query takes to see if Solr is really the bottleneck or whether it's "something else"? Mind you, I have no clue what "something else" is here Do you ever return lots of rows (i.e. thousands)? Solr serves queries very quickly, so I'd concentrate on identifying what is slow before jumping to a solution Best, Erick On Wed, May 4, 2016 at 10:28 PM, Derek Poh <d...@globalsources.com> wrote: Hi We have a "product" collection and a "supplier" collection. The "product" collection contains product information and the "supplier" collection contains the products' supplier information. We have a subsidiary page that queries the "product" collection for the search. The displayed result includes product and supplier information. This page will query the "product" collection to get the matching product records. From this query a list of the matching products' supplier ids is extracted and used in a filter query against the "supplier" collection to get the necessary supplier information. The loading of this page is very slow, and it leads to timeouts at times as well. Besides looking at tweaking the code of the page, we are also looking at what tweaking can be done on the Solr side. Reducing the number of queries generated by this page was one of the options to try.
The main "product" collection is also used by our site's main search page and other subsidiary pages as well. So the query load on it is substantial. It has about 6.5 million documents and an index size of 38-39 GB. It is set up as 1 shard with 5 replicas. Each replica is on its own server. Total of 5 servers. There are other smaller collections with a similar 1 shard, 5 replicas setup residing on these servers as well. I am thinking of either 1. Index supplier information into the "product" collection. 2. Create another similar "product" collection for this page to use. This collection will have fewer product fields and will include the required supplier fields. But the number of documents in it will be the same as the main "product" collection. The index size will be smaller though. With either of the 2 options we do not need to query the "supplier" collection. So there is one less query and hopefully it will improve the performance of this page. What is the advice between the 2 options?
Re: Advice to add additional non-related fields to a collection or create a subset of it?
Hi Erick In my case, by denormalizing, that means putting the product and supplier information into one collection? The supplier information is stored but not indexed in the collection. We have identified that it was a combination of a loop and bad source data that caused an endless loop under a certain scenario. Is it advisable to have as few queries to Solr in a page as possible? On 5/6/2016 11:17 PM, Erick Erickson wrote: Denormalizing the data is usually the first thing to try. That's certainly the preferred option if it doesn't bloat the index unacceptably. But my real question is what have you done to try to figure out _why_ it's slow? Do you have some loop like for (each found document) extract all the supplier IDs and query Solr for them) ? That's a fundamental design decision that will be expensive. Have you examined the time each query takes to see if Solr is really the bottleneck or whether it's "something else"? Mind you, I have no clue what "something else" is here Do you ever return lots of rows (i.e. thousands)? Solr serves queries very quickly, so I'd concentrate on identifying what is slow before jumping to a solution Best, Erick On Wed, May 4, 2016 at 10:28 PM, Derek Poh <d...@globalsources.com> wrote: Hi We have a "product" collection and a "supplier" collection. The "product" collection contains product information and the "supplier" collection contains the products' supplier information. We have a subsidiary page that queries the "product" collection for the search. The displayed result includes product and supplier information. This page will query the "product" collection to get the matching product records. From this query a list of the matching products' supplier ids is extracted and used in a filter query against the "supplier" collection to get the necessary supplier information. The loading of this page is very slow, and it leads to timeouts at times as well.
Besides looking at tweaking the code of the page, we are also looking at what tweaking can be done on the Solr side. Reducing the number of queries generated by this page was one of the options to try. The main "product" collection is also used by our site's main search page and other subsidiary pages as well. So the query load on it is substantial. It has about 6.5 million documents and an index size of 38-39 GB. It is set up as 1 shard with 5 replicas. Each replica is on its own server. Total of 5 servers. There are other smaller collections with a similar 1 shard, 5 replicas setup residing on these servers as well. I am thinking of either 1. Index supplier information into the "product" collection. 2. Create another similar "product" collection for this page to use. This collection will have fewer product fields and will include the required supplier fields. But the number of documents in it will be the same as the main "product" collection. The index size will be smaller though. With either of the 2 options we do not need to query the "supplier" collection. So there is one less query and hopefully it will improve the performance of this page. What is the advice between the 2 options? Any other advice or options? Derek
Advice to add additional non-related fields to a collection or create a subset of it?
Hi We have a "product" collection and a "supplier" collection. The "product" collection contains product information and the "supplier" collection contains the products' supplier information. We have a subsidiary page that queries the "product" collection for its search. The displayed result includes product and supplier information. This page queries the "product" collection to get the matching product records. From this query, a list of the matching products' supplier ids is extracted and used in a filter query against the "supplier" collection to get the necessary supplier information. The loading of this page is very slow, and it leads to timeouts at times as well. Besides tweaking the code of the page, we are also looking at what tuning can be done on the Solr side. Reducing the number of queries generated by this page was one of the options to try. The main "product" collection is also used by our site's main search page and other subsidiary pages, so the query load on it is substantial. It has about 6.5 million documents and an index size of 38-39 GB. It is set up as 1 shard with 5 replicas, each replica on its own server, for a total of 5 servers. There are other smaller collections with a similar 1-shard, 5-replica setup residing on these servers as well. I am thinking of either 1. Indexing supplier information into the "product" collection, or 2. Creating another similar "product" collection for this page to use. This collection will have fewer product fields and will include the required supplier fields, but the number of documents in it will be the same as the main "product" collection. The index size will be smaller though. With either of the 2 options we do not need to query the "supplier" collection, so there is one less query and hopefully it will improve the performance of this page. What is the advice between the 2 options? Any other advice or options? 
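Option 1 (denormalizing supplier fields into each product document) could look like the sketch below. The supplier field names (supplierId, supplierName, supplierRating) are made up for illustration and are not from the actual schema:

```json
[
  {
    "P_ProductId": "1116393488",
    "P_VeryShortDescription": "Accione el banco",
    "supplierId": "S12345",
    "supplierName": "Example Supplier Co",
    "supplierRating": 4
  }
]
```

Each product document carries a copy of its supplier's fields, so one query returns everything the page displays. The trade-off is that when a supplier's information changes, all of that supplier's product documents have to be re-indexed, which matters if supplier data changes more often than the half-hourly incremental indexing can absorb.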
Derek -- CONFIDENTIALITY NOTICE This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons.
Re: make document with more matches rank higher with edismax parser?
Will try the "tie" parameter and see if it satisfies the business user requirements. Thank you. On 4/2/2016 7:15 AM, Alexandre Rafalovitch wrote: Have you tried the 'tie' parameter? https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Thetie%28TieBreaker%29Parameter Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 1 April 2016 at 14:03, Derek Poh <d...@globalsources.com> wrote: Hi Correct me if I am wrong, my understanding of the edismax parser is that it uses the max score of the matches in a doc. How do I make docs with more matches rank higher with edismax? These 2 docs are from the same query result and this is their order in the result. P_ProductId: 1116393488 P_CatConcatKeyword: Bancos del poder P_NewShortDescription: Accione el banco, 10,400mAh, 5.0V DC entran P_VeryShortDescription: Accione el banco score: 0.83850163 P_ProductId: 1124048475 P_CatConcatKeyword: Bancos del poder P_NewShortDescription: Banco del poder con el altavoz P_VeryShortDescription: Banco del poder score: 0.83850163 q=Bancos del poder qf=P_CatConcatKeyword^3.0 P_NewShortDescription^2.0 P_NewVeryShortDescription^1.0 From the debug info, both docs' max-score match is from the P_CatConcatKeyword field. Debug info of both docs attached. Comparing the field matches between both, the 2nd doc has more fields with matches. How can I make the 2nd doc rank higher based on this? 
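A rough sketch of how "tie" changes the ranking. With tie=0.0 (pure DisMax behaviour) only the best-scoring field counts; with tie between 0 and 1, the other matching fields contribute a fraction of their scores. The per-field numbers below are made up for illustration, not taken from the debug output:

```text
score = max(field scores) + tie * sum(scores of the other matching fields)

hypothetical per-field scores:
  doc A: P_CatConcatKeyword = 0.84  (only field that matches)
  doc B: P_CatConcatKeyword = 0.84, P_NewShortDescription = 0.30

tie=0.0 -> doc A: 0.84   doc B: 0.84                    (tied, as observed)
tie=0.5 -> doc A: 0.84   doc B: 0.84 + 0.5*0.30 = 0.99  (doc B now ranks higher)
```

So appending e.g. &tie=0.5 to the query (or setting it in the handler defaults) should break the tie in favour of the doc with more matching fields, while a small value like tie=0.1 keeps the max-field score dominant.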
make document with more matches rank higher with edismax parser?
Hi Correct me if I am wrong, my understanding of the edismax parser is that it uses the max score of the matches in a doc. How do I make docs with more matches rank higher with edismax? These 2 docs are from the same query result and this is their order in the result. P_ProductId: 1116393488 P_CatConcatKeyword: Bancos del poder P_NewShortDescription: Accione el banco, 10,400mAh, 5.0V DC entran P_VeryShortDescription: Accione el banco score: 0.83850163 P_ProductId: 1124048475 P_CatConcatKeyword: Bancos del poder P_NewShortDescription: Banco del poder con el altavoz P_VeryShortDescription: Banco del poder score: 0.83850163 q=Bancos del poder qf=P_CatConcatKeyword^3.0 P_NewShortDescription^2.0 P_NewVeryShortDescription^1.0 From the debug info, both docs' max-score match is from the P_CatConcatKeyword field. Debug info of both docs attached. Comparing the field matches between both, the 2nd doc has more fields with matches. How can I make the 2nd doc rank higher based on this?
1124048475 0.83850163 = (MATCH) sum of:
0.004233816 = (MATCH) sum of:
0.0019395099 = (MATCH) max of:
8.000289E-9 = (MATCH) weight(spp_keyword:banc^1.0E-5 in 6088628) [DefaultSimilarity], result of: 8.000289E-9 = score(doc=6088628,freq=1.0), product of: 1.74163E-9 = queryWeight, product of: 1.0E-5 = boost 9.187129 = idf(docFreq=1868, maxDocs=6717914) 1.8957282E-5 = queryNorm 4.5935645 = fieldWeight in 6088628, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 9.187129 = idf(docFreq=1868, maxDocs=6717914) 0.5 = fieldNorm(doc=6088628)
5.8594847E-4 = (MATCH) weight(P_NewShortDescription:banco in 6088628) [DefaultSimilarity], result of: 5.8594847E-4 = score(doc=6088628,freq=1.0), product of: 1.0539445E-4 = queryWeight, product of: 5.559576 = idf(docFreq=70312, maxDocs=6717914) 1.8957282E-5 = queryNorm 5.559576 = fieldWeight in 6088628, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.559576 = idf(docFreq=70312, maxDocs=6717914) 1.0 = fieldNorm(doc=6088628)
0.0012108017 = (MATCH) weight(P_VeryShortDescription:banco^2.0 in 6088628) [DefaultSimilarity], result of: 0.0012108017 = score(doc=6088628,freq=1.0), product of: 2.1425923E-4 = queryWeight, product of: 2.0 = boost 5.6511064 = idf(docFreq=64162, maxDocs=6717914) 1.8957282E-5 = queryNorm 5.6511064 = fieldWeight in 6088628, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.6511064 = idf(docFreq=64162, maxDocs=6717914) 1.0 = fieldNorm(doc=6088628)
0.0019395099 = (MATCH) weight(P_CatConcatKeyword:banco^3.0 in 6088628) [DefaultSimilarity], result of: 0.0019395099 = score(doc=6088628,freq=1.0), product of: 3.3211973E-4 = queryWeight, product of: 3.0 = boost 5.8397913 = idf(docFreq=53129, maxDocs=6717914) 1.8957282E-5 = queryNorm 5.8397913 = fieldWeight in 6088628, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.8397913 = idf(docFreq=53129, maxDocs=6717914) 1.0 = fieldNorm(doc=6088628)
4.8392292E-4 = (MATCH) max of:
3.6249184E-9 = (MATCH) weight(spp_keyword:del^1.0E-5 in 6088628) [DefaultSimilarity], result of: 3.6249184E-9 = score(doc=6088628,freq=1.0), product of: 1.1723361E-9 = queryWeight, product of: 1.0E-5 = boost 6.184094 = idf(docFreq=37653, maxDocs=6717914) 1.8957282E-5 = queryNorm 3.092047 = fieldWeight in 6088628, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 6.184094 = idf(docFreq=37653, maxDocs=6717914) 0.5 = fieldNorm(doc=6088628)
4.699589E-5 = (MATCH) weight(P_NewShortDescription:del in 6088628) [DefaultSimilarity], result of: 4.699589E-5 = score(doc=6088628,freq=1.0), product of: 2.9848188E-5 = queryWeight, product of: 1.5744972 = idf(docFreq=3782103, maxDocs=6717914) 1.8957282E-5 = queryNorm 1.5744972 = fieldWeight in 6088628, product of: 1.0 = tf(freq=1.0),
Re: Filter factory to reduce word from plural forms to singular forms correctly?
Hi Alex Can you advise how I can make use of copyField to handle this issue? NLP lemmatisation will be the last resort, subject to budget and the business users' decision. Derek On 3/1/2016 8:13 AM, Alexandre Rafalovitch wrote: On 29 February 2016 at 20:40, Derek Poh <d...@globalsources.com> wrote: Is there another filter factory that can reduce plural to singular correctly? English is not an easy language and most of the heuristic filters have issues. You could try copyField and multiple approaches. Or, if this is a really, really big issue for you, there are commercial companies that do NLP lemmatisation properly and integrate with Solr. But they are not cheap. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ 
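A sketch of the copyField approach Alex mentions: keep one analysed copy of the text with the minimal stemmer and another with the Porter stemmer, then search across both so a term matches if either analysis gets it right. The field and type names below are made up for illustration:

```xml
<!-- two analysed copies of the same source field (illustrative names) -->
<field name="keyword_min"    type="text_min_stem"    indexed="true" stored="false"/>
<field name="keyword_porter" type="text_porter_stem" indexed="true" stored="false"/>

<copyField source="P_CatConcatKeyword" dest="keyword_min"/>
<copyField source="P_CatConcatKeyword" dest="keyword_porter"/>

<fieldType name="text_min_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- handles plain plural 's' well: iphones -> iphone -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_porter_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- handles 'es' plurals well: boxes -> box, glasses -> glass -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Querying with e.g. qf=keyword_min keyword_porter then lets "boxes" match via the Porter copy ("box") while "iphones" still matches via the minimal-stem copy ("iphone"), at the cost of indexing the text twice.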
Re: Filter factory to reduce word from plural forms to singular forms correctly?
Hi Emir For my use case, it is to do an exact match (enclosing the search keyword in double quotes) on a search field. A search on "power banks" should return matches for "power bank" and "power banks", singular and plural forms. I will need to do further testing with PorterStemFilter to ensure it meets the business use case. On 2/29/2016 7:07 PM, Emir Arnautovic wrote: Hi Derek, Why does aggressive stemming worry you? You might have false positives, but that is desired behavior in most cases. In your case "iphone" documents will also be returned for an "iphon" query. Is this something that is not desired behavior? You can have more than one field if you want to prefer matches with exact wording, but that is unnecessary overhead in most cases. Regards, Emir 
Filter factory to reduce word from plural forms to singular forms correctly?
Hi I am using EnglishMinimalStemFilterFactory to reduce words in plural form to singular form. The filter factory is not reducing the plural form of 'es' to the singular form correctly. It is reducing correctly for the plural form of 's'. "boxes" is reduced to "boxe" instead of "box", "glasses" to "glasse" instead of "glass", etc. I tried with PorterStemFilterFactory; it is able to reduce the plural 'es' form to the singular form correctly. However it reduced "iphones" to "iphon" instead. Is there another filter factory that can reduce plural to singular correctly? The field type definition of the field. positionIncrementGap="100"> 
Re: "pf" not supported by edismax?
Hi Jack Sorry, I am confused. For my case, it seems that "pf" only works with dismax. with dismax: +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) *(spp_keyword_exact:dvd bracket)* with edismax: +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) () On 2/15/2016 1:26 PM, Jack Krupansky wrote: Maybe because the tokenized phrase produces only a single term it is ignored. In any case, it won't be a phrase. pf only does something useful for phrases. IOW, where a PhraseQuery can be generated. A PhraseQuery for more than a single term would never match when the field value is a single term. -- Jack Krupansky On Mon, Feb 15, 2016 at 12:11 AM, Derek Poh <d...@globalsources.com> wrote: It is using KeywordTokenizerFactory. Is it still considered tokenized? Here's the field definition: On 2/15/2016 12:43 PM, Jack Krupansky wrote: pf stands for phrase boosting, which implies tokenized text... spp_keyword_exact sounds like it is not tokenized. -- Jack Krupansky On Sun, Feb 14, 2016 at 10:08 PM, Derek Poh <d...@globalsources.com> wrote: Hi Correct me if I am wrong, edismax is an extension of dismax, so it will support "pf". But from my testing I noticed "pf" is not working with edismax. From the debug information of a query using "pf" with edismax, there is no phrase match for the "pf" field "spp_keyword_exact". If I change to dismax, it does a phrase match on the field. Is this normal? We are running Solr 4.10.4. Below are the queries and their debug information. 
Query using "pf" with edismax and the debug statement: http://hkenedcdg1.globalsources.com:8983/solr/product/select?q=dvd%20bracket=spp_keyword_exact=P_SPPKW,P_NewShortDescription.P_CatConCatKeyword,P_VeryShortDescription=spp_keyword_exact=query=edismax dvd bracket dvd bracket (+(DisjunctionMaxQuery((spp_keyword_exact:dvd)) DisjunctionMaxQuery((spp_keyword_exact:bracket))) ())/no_coord +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) () ExtendedDismaxQParser Query using "pf" with dismax and the debug statement: http://hkenedcdg1.globalsources.com:8983/solr/product/select?q=dvd%20bracket=spp_keyword_exact=P_SPPKW,P_NewShortDescription.P_CatConCatKeyword,P_VeryShortDescription=spp_keyword_exact=query=dismax dvd bracket dvd bracket (+(DisjunctionMaxQuery((spp_keyword_exact:dvd)) DisjunctionMaxQuery((spp_keyword_exact:bracket))) DisjunctionMaxQuery((spp_keyword_exact:dvd bracket)))/no_coord +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) (spp_keyword_exact:dvd bracket) DisMaxQParser Derek 
Re: "pf" not supported by edismax?
It is using KeywordTokenizerFactory. Is it still considered tokenized? Here's the field definition: type="gs_keyword_exact" multiValued="true"/> positionIncrementGap="100"> On 2/15/2016 12:43 PM, Jack Krupansky wrote: pf stands for phrase boosting, which implies tokenized text... spp_keyword_exact sounds like it is not tokenized. -- Jack Krupansky On Sun, Feb 14, 2016 at 10:08 PM, Derek Poh <d...@globalsources.com> wrote: Hi Correct me if I am wrong, edismax is an extension of dismax, so it will support "pf". But from my testing I noticed "pf" is not working with edismax. From the debug information of a query using "pf" with edismax, there is no phrase match for the "pf" field "spp_keyword_exact". If I change to dismax, it does a phrase match on the field. Is this normal? We are running Solr 4.10.4. Below are the queries and their debug information. Query using "pf" with edismax and the debug statement: http://hkenedcdg1.globalsources.com:8983/solr/product/select?q=dvd%20bracket=spp_keyword_exact=P_SPPKW,P_NewShortDescription.P_CatConCatKeyword,P_VeryShortDescription=spp_keyword_exact=query=edismax dvd bracket dvd bracket (+(DisjunctionMaxQuery((spp_keyword_exact:dvd)) DisjunctionMaxQuery((spp_keyword_exact:bracket))) ())/no_coord +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) () ExtendedDismaxQParser Query using "pf" with dismax and the debug statement: http://hkenedcdg1.globalsources.com:8983/solr/product/select?q=dvd%20bracket=spp_keyword_exact=P_SPPKW,P_NewShortDescription.P_CatConCatKeyword,P_VeryShortDescription=spp_keyword_exact=query=dismax dvd bracket dvd bracket (+(DisjunctionMaxQuery((spp_keyword_exact:dvd)) DisjunctionMaxQuery((spp_keyword_exact:bracket))) DisjunctionMaxQuery((spp_keyword_exact:dvd bracket)))/no_coord +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) (spp_keyword_exact:dvd bracket) DisMaxQParser Derek 
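To make Jack's point concrete: a field using KeywordTokenizerFactory still goes through analysis, but the entire input becomes a single token, so a multi-term phrase query can never line up against it. A sketch of what the admin Analysis screen would show (from memory, not captured output):

```text
input: dvd bracket

KeywordTokenizerFactory:   ["dvd bracket"]      one token; a 2-term
                                                PhraseQuery cannot match it

StandardTokenizerFactory:  ["dvd", "bracket"]   two tokens at positions 1,2;
                                                the phrase "dvd bracket" matches
```

This is why pf, which builds a PhraseQuery from the tokenized query terms, has nothing useful to do against a keyword-tokenized field, and why dismax and edismax can behave differently around the edge case.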
"pf" not supported by edismax?
Hi Correct me if I am wrong, edismax is an extension of dismax, so it will support "pf". But from my testing I noticed "pf" is not working with edismax. From the debug information of a query using "pf" with edismax, there is no phrase match for the "pf" field "spp_keyword_exact". If I change to dismax, it does a phrase match on the field. Is this normal? We are running Solr 4.10.4. Below are the queries and their debug information. Query using "pf" with edismax and the debug statement: http://hkenedcdg1.globalsources.com:8983/solr/product/select?q=dvd%20bracket=spp_keyword_exact=P_SPPKW,P_NewShortDescription.P_CatConCatKeyword,P_VeryShortDescription=spp_keyword_exact=query=edismax dvd bracket dvd bracket (+(DisjunctionMaxQuery((spp_keyword_exact:dvd)) DisjunctionMaxQuery((spp_keyword_exact:bracket))) ())/no_coord +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) () ExtendedDismaxQParser Query using "pf" with dismax and the debug statement: http://hkenedcdg1.globalsources.com:8983/solr/product/select?q=dvd%20bracket=spp_keyword_exact=P_SPPKW,P_NewShortDescription.P_CatConCatKeyword,P_VeryShortDescription=spp_keyword_exact=query=dismax dvd bracket dvd bracket (+(DisjunctionMaxQuery((spp_keyword_exact:dvd)) DisjunctionMaxQuery((spp_keyword_exact:bracket))) DisjunctionMaxQuery((spp_keyword_exact:dvd bracket)))/no_coord +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) (spp_keyword_exact:dvd bracket) DisMaxQParser Derek 
Re: implement exact match for one of the search fields only?
Hi Erick << The manual way of doing this would be to construct an elaborate query, like q=spp_keyword_exact:"dvd bracket" OR P_ShortDescription:(dvd bracket) OR NOTE: the parens are necessary or the last part of the above would be parsed as P_ShortDescription:dvd default_searchfield:bracket >> Your suggestion to construct the query like q=spp_keyword_exact:"dvd bracket" OR P_ShortDescription:(dvd bracket) OR does not fit into our current implementation. The front-end pages will only pass the "q=search keywords" in the query to solr. The list of search fields (qf) is pre-defined in solr. Do you have any alternatives to implement your suggestion without making changes to the front-end? On 1/29/2016 1:49 AM, Erick Erickson wrote: bq: if you are interested phrase query, you should use String field If you do this, you will NOT be able to search within the string. I.e. if the doc field is "my dog has fleas" you cannot match "dog has" with a string-based field. If you want to match the _entire_ string or you want prefix-only matching, then string might work, i.e. if you _only_ want to be able to match "my dog has fleas" "my dog*" but not "dog has fleas". On to the root question though. I really think you want to look at edismax. What you're trying to do is apply the same search term to individual fields. In particular, the pf parameter will automatically apply the search terms _as a phrase_ against the field specified, relieving you of having to enclose things in quotes. The manual way of doing this would be to construct an elaborate query, like q=spp_keyword_exact:"dvd bracket" OR P_ShortDescription:(dvd bracket) OR NOTE: the parens are necessary or the last part of the above would be parsed as P_ShortDescription:dvd default_searchfield:bracket And the =query trick will show you exactly how things are actually searched, it's invaluable. 
Best, Erick On Thu, Jan 28, 2016 at 5:08 AM, Mugeesh Husain wrote: Hi, if you are interested in phrase queries, you should use a String field instead of a text field in the schema; this will solve your problem. if you are missing anything else let share -- View this message in context: http://lucene.472066.n3.nabble.com/implement-exact-match-for-one-of-the-search-fields-only-tp4253786p4253827.html Sent from the Solr - User mailing list archive at Nabble.com. 
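One way to apply Erick's edismax suggestion without changing the front-end is to put the parameters into the request handler defaults in solrconfig.xml, so the pages keep sending only q=... while Solr supplies qf and pf. A sketch, assuming the handler name and qf field list here are illustrative rather than the actual config:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- fields searched with exact-or-partial matching -->
    <str name="qf">P_VeryShortDescription P_ShortDescription P_CatConcatKeyword</str>
    <!-- phrase boost against the exact-match field -->
    <str name="pf">spp_keyword_exact</str>
  </lst>
</requestHandler>
```

Note that pf is a boost, not a filter: documents whose spp_keyword_exact matches the whole query as a phrase score higher, but a phrase match there is not mandatory for a document to be returned.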
Re: implement exact match for one of the search fields only?
Hi Emir For the other search fields, if they have matches they should be returned. On 1/28/2016 8:17 PM, Emir Arnautovic wrote: Hi Derek, It is not clear what you are trying to achieve: "one of the search fields is an exact phrase match while the rest of the search fields can be exact or partial matches". What does "while" mean - does it have to match in the other fields as well, or should the result be scored better if it does, without a mandatory match? For exact match you can use the string type instead of text. For querying multiple fields you can take a look at the (e)dismax query parser. Regards, Emir 
Re: implement exact match for one of the search fields only?
Do you mean for the spp_keyword_exact field, I should use a String field with keyword tokenised and lowercase token filtered? On 1/28/2016 10:54 PM, Alessandro Benedetti wrote: I think you are overthinking the problem: I agree the described one is the most obvious solution in your case. Only addition is to use a keyword tokenised field type, lowercase token filtered if you want to be case-insensitive. Cheers On 28 January 2016 at 13:08, Mugeesh Husain wrote: Hi, if you are interested in phrase queries, you should use a String field instead of a text field in the schema; this will solve your problem. if you are missing anything else let share -- View this message in context: http://lucene.472066.n3.nabble.com/implement-exact-match-for-one-of-the-search-fields-only-tp4253786p4253827.html Sent from the Solr - User mailing list archive at Nabble.com. 
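Alessandro's suggestion (a keyword-tokenised, lowercase-filtered field type, rather than a plain solr.StrField) could look like the sketch below; the type name is made up for illustration:

```xml
<fieldType name="keyword_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- emits the entire field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- lowercases the token for case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spp_keyword_exact" type="keyword_exact" indexed="true" stored="true" multiValued="true"/>
```

Unlike solr.StrField, this type still runs an analyzer, so you get lowercasing (and could add trimming and similar filters) while the value is still matched as one whole token, i.e. "DVD Bracket" and "dvd bracket" match each other but "dvd" alone does not.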
implement exact match for one of the search fields only?
Hi First of all, sorry for the long post. How do I implement or structure the query such that one of the search fields is an exact phrase match while the rest of the search fields can be exact or partial matches? Is this possible? I have the following search fields - P_VeryShortDescription - P_ShortDescription - P_CatConcatKeyword - spp_keyword_exact For the spp_keyword_exact field, I want to apply an exact match to it. I have a document with the following information. If I search for 'dvd', this document should not match. However if I search for 'dvd bracket', this document should match. Right now when I search for 'dvd', it is not returned, which is correct. I want it to be returned when I search for 'dvd bracket' but it is not. I tried enclosing it in double quotes "dvd bracket" but it is not returned. Then again I can't enclose the search terms in double quotes "dvd bracket" as those documents with the words 'dvd' and 'bracket' in the other fields will not be matched, am I right? doc:
Re: implement exact match for one of the search fields only?
Hi Erick and all Yes, I am trying to apply the same search term to all the 4 search fields, and 1 of the search fields must be an exact match. You mentioned "In particular, the pf parameter will automatically apply the search terms _as a phrase_ against the field specified, relieving you of having to enclose things in quotes." I tried but it is not returning the document. http://hkenedcdg1.globalsources.com:8983/solr/product/select?q=dvd%20bracket=spp_keyword_exact=edismax=query=spp_keyword_exact=P_ProductId,spp_keyword_exact,P_SPPKW I may have misunderstood. On 1/29/2016 1:49 AM, Erick Erickson wrote: bq: if you are interested phrase query, you should use String field If you do this, you will NOT be able to search within the string. I.e. if the doc field is "my dog has fleas" you cannot match "dog has" with a string-based field. If you want to match the _entire_ string or you want prefix-only matching, then string might work, i.e. if you _only_ want to be able to match "my dog has fleas" "my dog*" but not "dog has fleas". On to the root question though. I really think you want to look at edismax. What you're trying to do is apply the same search term to individual fields. In particular, the pf parameter will automatically apply the search terms _as a phrase_ against the field specified, relieving you of having to enclose things in quotes. The manual way of doing this would be to construct an elaborate query, like q=spp_keyword_exact:"dvd bracket" OR P_ShortDescription:(dvd bracket) OR NOTE: the parens are necessary or the last part of the above would be parsed as P_ShortDescription:dvd default_searchfield:bracket And the =query trick will show you exactly how things are actually searched, it's invaluable. Best, Erick On Thu, Jan 28, 2016 at 5:08 AM, Mugeesh Husain wrote: Hi, if you are interested in phrase queries, you should use a String field instead of a text field in the schema; this will solve your problem. 
if you are missing anything else let share -- View this message in context: http://lucene.472066.n3.nabble.com/implement-exact-match-for-one-of-the-search-fields-only-tp4253786p4253827.html Sent from the Solr - User mailing list archive at Nabble.com. 
Re: StringIndexOutOfBoundsException using spellcheck and synonyms
Hi

Any advice on how to resolve or work around this issue?

On 11/17/2015 8:28 AM, Derek Poh wrote: Hi Scott, I am using Solr 4.10.4. On 11/16/2015 10:06 PM, Scott Stults wrote: Hi Derek, Could you please add what version of Solr you see this in? I didn't see a related Jira, so this might warrant a new one. k/r, Scott On Sun, Nov 15, 2015 at 11:01 PM, Derek Poh <d...@globalsources.com> wrote: Hi, I am using spellcheck and synonyms. I am getting "java.lang.StringIndexOutOfBoundsException: String index out of range: -1" for some keywords. I think I managed to narrow it down to the likely cause. I have this line of entry in the synonyms.txt file: body spray,cologne,parfum,parfume,perfume,purfume,toilette When I search for 'cologne' it will hit the exception. If I remove 'body spray' from the line, I will not hit the exception: cologne,parfum,parfume,perfume,purfume,toilette It seems like it could be due to multi-word terms in the synonyms file, but there are some multi-word keywords in the synonyms that do not have the issue. This line has the multi-word term "paint ball" in it; when I search for paintball or paintballs it does not hit the exception: paintball,paintballs,paint ball Any advice on how I can resolve this issue?
The field used for spellcheck: synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

Exception stacktrace:

2015-11-16T07:06:43,055 - ERROR [qtp744979286-193443:SolrException@142] - null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
    at java.lang.StringBuilder.replace(StringBuilder.java:266)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:722)

Derek
Re: StringIndexOutOfBoundsException using spellcheck and synonyms
Hi Scott, I am using Solr 4.10.4.

On 11/16/2015 10:06 PM, Scott Stults wrote: Hi Derek, Could you please add what version of Solr you see this in? I didn't see a related Jira, so this might warrant a new one. k/r, Scott

On Sun, Nov 15, 2015 at 11:01 PM, Derek Poh <d...@globalsources.com> wrote: Hi, I am using spellcheck and synonyms. I am getting "java.lang.StringIndexOutOfBoundsException: String index out of range: -1" for some keywords. I think I managed to narrow it down to the likely cause. I have this line of entry in the synonyms.txt file: body spray,cologne,parfum,parfume,perfume,purfume,toilette When I search for 'cologne' it will hit the exception. If I remove 'body spray' from the line, I will not hit the exception: cologne,parfum,parfume,perfume,purfume,toilette It seems like it could be due to multi-word terms in the synonyms file, but there are some multi-word keywords in the synonyms that do not have the issue. This line has the multi-word term "paint ball" in it; when I search for paintball or paintballs it does not hit the exception: paintball,paintballs,paint ball Any advice on how I can resolve this issue?
The field used for spellcheck: Exception stacktrace:

2015-11-16T07:06:43,055 - ERROR [qtp744979286-193443:SolrException@142] - null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
    at java.lang.StringBuilder.replace(StringBuilder.java:266)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:722)

Derek
StringIndexOutOfBoundsException using spellcheck and synonyms
Hi, I am using spellcheck and synonyms. I am getting "java.lang.StringIndexOutOfBoundsException: String index out of range: -1" for some keywords. I think I managed to narrow it down to the likely cause. I have this line of entry in the synonyms.txt file: body spray,cologne,parfum,parfume,perfume,purfume,toilette When I search for 'cologne' it will hit the exception. If I remove 'body spray' from the line, I will not hit the exception: cologne,parfum,parfume,perfume,purfume,toilette It seems like it could be due to multi-word terms in the synonyms file, but there are some multi-word keywords in the synonyms that do not have the issue. This line has the multi-word term "paint ball" in it; when I search for paintball or paintballs it does not hit the exception: paintball,paintballs,paint ball Any advice on how I can resolve this issue?

The field used for spellcheck: multiValued="true"/> positionIncrementGap="100"> words="stopwords.txt" /> words="stopwords.txt" /> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

Exception stacktrace:

2015-11-16T07:06:43,055 - ERROR [qtp744979286-193443:SolrException@142] - null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
    at java.lang.StringBuilder.replace(StringBuilder.java:266)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:722)

Derek
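Since the exception seems correlated with multi-word synonym entries, a quick way to audit a synonyms.txt for the lines that contain multi-word terms is sketched below. The sample lines are the ones quoted in this thread; for a real file, read the lines from disk instead (explicit-mapping lines using "=>" would need extra handling, which this sketch does not attempt):

```python
# Flag synonym lines that contain multi-word terms, since those appear
# correlated with the SpellCheckCollator exception described above.
sample = """\
body spray,cologne,parfum,parfume,perfume,purfume,toilette
paintball,paintballs,paint ball
cologne,parfum,parfume,perfume,purfume,toilette
"""

def multiword_terms(line: str) -> list[str]:
    """Return the terms on one equivalence-style synonyms.txt line that contain spaces."""
    return [t.strip() for t in line.split(",") if " " in t.strip()]

flagged = {line: multiword_terms(line)
           for line in sample.splitlines()
           if multiword_terms(line)}
```

Running this over the full synonyms file narrows down which entries to test against the spellcheck collation, rather than bisecting the file by hand.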
Re: 'missing content stream' issuing expungeDeletes=true
There are around 6+ million documents in the collection. Each document (or product record) is unique in the collection. When we found out the document has a docFreq of 2, we did a query on the document's product id and indeed 2 documents were returned. We suspect 1 of them is deleted but not removed from the index. We tried optimizing. Only 1 document is returned when we query again, and the document's docFreq is 1. We checked the source data and the document is not duplicated. It could be the way we index (full index every time) that results in this scenario of having 2 of the same document in the index.

On 9/2/2015 12:11 PM, Erick Erickson wrote: How many documents total in your corpus? And how many do you intend to have? My point is that if you are testing this with a small corpus, the results are very likely different than when you test on a reasonable corpus. So if you expect your "real" index will contain many more docs than what you're testing, this is likely a red herring. But something isn't making a lot of sense here. You say you've traced it to having a docFreq of 2 that changes to 1. But that means that the value is unique in your entire corpus, which kind of indicates you're trying to boost on unique values, which is unusual. If you're confident in your model though, the only way to guarantee what you want is to optimize/expungeDeletes. Best, Erick

On Tue, Sep 1, 2015 at 7:51 PM, Derek Poh <d...@globalsources.com> wrote: Erick, Yes, we see documents changing their position in the list due to having deleted docs. In our search result, we apply a higher boost (bq) to a group of matched documents to have them display at the top tier of the result. At times 1 or 2 of these documents are not returned in the top tier; they are relegated down to the lower tier of the result. We discovered that these documents have a lower score due to docFreq=2. After we do an optimize, these 1-2 documents are back in the top tier result order and their docFreq is 1.
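The score drop described above is consistent with Lucene's DefaultSimilarity idf formula, idf = 1 + ln(maxDocs / (docFreq + 1)): a deleted-but-not-expunged duplicate raises docFreq from 1 to 2 and lowers the idf. A small sketch of the effect (the 6-million maxDocs figure approximates the corpus size mentioned in this thread; the exact value does not change the size of the drop):

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """Lucene DefaultSimilarity idf; deleted docs still count toward doc_freq."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

max_docs = 6_000_000
idf_unique = idf(1, max_docs)      # term truly unique in the index
idf_with_ghost = idf(2, max_docs)  # a deleted duplicate still in a segment
drop = idf_unique - idf_with_ghost # = ln(3/2), independent of max_docs
```

The drop is ln(1.5) ≈ 0.405 regardless of index size, which is why a single un-expunged duplicate can visibly reorder docs whose scores were close together.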
On 9/1/2015 11:40 PM, Erick Erickson wrote: Derek: Why do you care? What evidence do you have that this matters _practically_? If you've looked at scoring with a small number of documents, you'll see significant differences due to deleted documents. In most cases, as you get a larger number of documents, the ranking of documents in an index with no deletions vs. indexes that have deletions is usually not noticeable. I'm suggesting that this is a red herring. Your specific situation may be different of course, but since scoring is really only about ranking docs relative to each other, unless the relative positions change enough to be noticeable it's not a problem. Note that I'm saying "relative rankings", NOT "absolute score". Document scores have no meaning outside comparisons to other docs _in the same query_. So unless you see documents changing their position in the list due to having deleted docs, it's not worth spending time on IMO. Best, Erick

On Tue, Sep 1, 2015 at 12:45 AM, Upayavira <u...@odoko.co.uk> wrote: I wonder if this resolves it [1]. It has been applied to trunk, but not to the 5.x release branch. If you needed it in 5.x, I wonder if there's a way that particular choice could be made configurable. Upayavira [1] https://issues.apache.org/jira/browse/LUCENE-6711

On Tue, Sep 1, 2015, at 02:43 AM, Derek Poh wrote: Hi Upayavira, In fact we are using optimize currently but were advised to use expunge deletes as it is less resource intensive. So expunge deletes will only remove deleted documents; it will not merge all index segments into one? If we don't use optimize, the deleted documents in the index will affect the scores (with docFreq=2) of the matched documents, which will affect the relevancy of the search result. Derek

On 9/1/2015 12:05 AM, Upayavira wrote: If you really must expunge deletes, use optimize. That will merge all index segments into one, and in the process will remove any deleted documents.
Why do you need to expunge deleted documents anyway? It is generally done in the background for you, so you shouldn't need to worry about it. Upayavira

On Mon, Aug 31, 2015, at 06:46 AM, davidphilip cherian wrote: Hi, The below curl command worked without error, you can try. curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" --data-binary '' However, after executing this, I could still see the same deleted counts on the dashboard. Deleted Docs: 6 I am not sure whether that means the command did not take effect, or it took effect but did not reflect on the dashboard view.

On Mon, Aug 31, 2015 at 8:51 AM, Derek Poh <d...@globalsources.com> wrote: Hi, I tried doing an expungeDeletes=true with the following but get the message 'missing content stream'. What am I missing? Do I need to provide additional parameters? curl 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true'; Thanks, Derek
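For reference, 'missing content stream' means the update request arrived with no body: expungeDeletes is an attribute of the commit command, which Solr expects in the request body, not as a bare URL parameter. A sketch of the two standard payload forms follows (the XML body in the quoted curl above was stripped by the archive; collection names and endpoints are illustrative, and this only builds and sanity-checks the payloads rather than contacting a server):

```python
import json
import xml.etree.ElementTree as ET

# XML form: POST to /solr/<collection>/update with Content-Type: text/xml
xml_payload = '<commit expungeDeletes="true"/>'

# JSON form: POST to /solr/<collection>/update with Content-Type: application/json
json_payload = json.dumps({"commit": {"expungeDeletes": True}})

# Sanity-check both payloads are well-formed before sending them.
root = ET.fromstring(xml_payload)
parsed = json.loads(json_payload)
```

With curl, the XML form would be sent as the --data-binary body of a POST to the collection's /update handler, which supplies the content stream the error is complaining about.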
Re: 'missing content stream' issuing expungeDeletes=true
Erick, Yes, we see documents changing their position in the list due to having deleted docs. In our search result, we apply a higher boost (bq) to a group of matched documents to have them display at the top tier of the result. At times 1 or 2 of these documents are not returned in the top tier; they are relegated down to the lower tier of the result. We discovered that these documents have a lower score due to docFreq=2. After we do an optimize, these 1-2 documents are back in the top tier result order and their docFreq is 1.

On 9/1/2015 11:40 PM, Erick Erickson wrote: Derek: Why do you care? What evidence do you have that this matters _practically_? If you've looked at scoring with a small number of documents, you'll see significant differences due to deleted documents. In most cases, as you get a larger number of documents, the ranking of documents in an index with no deletions vs. indexes that have deletions is usually not noticeable. I'm suggesting that this is a red herring. Your specific situation may be different of course, but since scoring is really only about ranking docs relative to each other, unless the relative positions change enough to be noticeable it's not a problem. Note that I'm saying "relative rankings", NOT "absolute score". Document scores have no meaning outside comparisons to other docs _in the same query_. So unless you see documents changing their position in the list due to having deleted docs, it's not worth spending time on IMO. Best, Erick

On Tue, Sep 1, 2015 at 12:45 AM, Upayavira <u...@odoko.co.uk> wrote: I wonder if this resolves it [1]. It has been applied to trunk, but not to the 5.x release branch. If you needed it in 5.x, I wonder if there's a way that particular choice could be made configurable. Upayavira [1] https://issues.apache.org/jira/browse/LUCENE-6711

On Tue, Sep 1, 2015, at 02:43 AM, Derek Poh wrote: Hi Upayavira, In fact we are using optimize currently but were advised to use expunge deletes as it is less resource intensive.
So expunge deletes will only remove deleted documents; it will not merge all index segments into one? If we don't use optimize, the deleted documents in the index will affect the scores (with docFreq=2) of the matched documents, which will affect the relevancy of the search result. Derek

On 9/1/2015 12:05 AM, Upayavira wrote: If you really must expunge deletes, use optimize. That will merge all index segments into one, and in the process will remove any deleted documents. Why do you need to expunge deleted documents anyway? It is generally done in the background for you, so you shouldn't need to worry about it. Upayavira

On Mon, Aug 31, 2015, at 06:46 AM, davidphilip cherian wrote: Hi, The below curl command worked without error, you can try. curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" --data-binary '' However, after executing this, I could still see the same deleted counts on the dashboard. Deleted Docs: 6 I am not sure whether that means the command did not take effect, or it took effect but did not reflect on the dashboard view.

On Mon, Aug 31, 2015 at 8:51 AM, Derek Poh <d...@globalsources.com> wrote: Hi, I tried doing an expungeDeletes=true with the following but get the message 'missing content stream'. What am I missing? Do I need to provide additional parameters? curl 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true'; Thanks, Derek
Re: 'missing content stream' issuing expungeDeletes=true
Hi Upayavira, In fact we are using optimize currently but were advised to use expunge deletes as it is less resource intensive. So expunge deletes will only remove deleted documents; it will not merge all index segments into one? If we don't use optimize, the deleted documents in the index will affect the scores (with docFreq=2) of the matched documents, which will affect the relevancy of the search result. Derek

On 9/1/2015 12:05 AM, Upayavira wrote: If you really must expunge deletes, use optimize. That will merge all index segments into one, and in the process will remove any deleted documents. Why do you need to expunge deleted documents anyway? It is generally done in the background for you, so you shouldn't need to worry about it. Upayavira

On Mon, Aug 31, 2015, at 06:46 AM, davidphilip cherian wrote: Hi, The below curl command worked without error, you can try. curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" --data-binary '' However, after executing this, I could still see the same deleted counts on the dashboard. Deleted Docs: 6 I am not sure whether that means the command did not take effect, or it took effect but did not reflect on the dashboard view.

On Mon, Aug 31, 2015 at 8:51 AM, Derek Poh <d...@globalsources.com> wrote: Hi, I tried doing an expungeDeletes=true with the following but get the message 'missing content stream'. What am I missing? Do I need to provide additional parameters? curl 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true'; Thanks, Derek
'missing content stream' issuing expungeDeletes=true
Hi, I tried doing an expungeDeletes=true with the following but get the message 'missing content stream'. What am I missing? Do I need to provide additional parameters? curl 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true'; Thanks, Derek
No query fields MATCH weight info for a doc in debug?
Hi, I came across a document that does not have the query-field MATCH weight information in the debug output, but the query fields do contain the search term. What can cause this?

Here is some info on the document. The search keyword is LED. The search fields and their values: P_CatConcatKeyword = E14 LED bulbs P_NewShortDescription = TIWIN TUV GS CE RoHS Certified LED Bulb, 3W 5W 7W 9W 11W 13W P_VeryShortDescription = LED Bulb spp_keyword_exact = LED,led bulb,led bulb light,solar garden lights

This is the partial debug info for the document:

2.9332387 = (MATCH) sum of:
  2.516484E-7 = (MATCH) max of:
    2.516484E-7 = (MATCH) weight(spp_keyword_exact:led^1.0E-5 in 1775174) [DefaultSimilarity], result of:
      2.516484E-7 = score(doc=1775174,freq=1.0), product of:
        3.8561673E-8 = queryWeight, product of:
          1.0E-5 = boost
          13.051737 = idf(docFreq=38, maxDocs=6684479)
          2.9545242E-4 = queryNorm
        6.5258684 = fieldWeight in 1775174, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          13.051737 = idf(docFreq=38, maxDocs=6684479)
          0.5 = fieldNorm(doc=1775174)
  2.8821251 = (MATCH) weight(P_ProductId:1119054943^38.0 in 1775174) [DefaultSimilarity], result of:
    2.8821251 = score(doc=1775174,freq=1.0), product of:
      0.17988378 = queryWeight, product of:
        38.0 = boost
        16.022152 = idf(docFreq=1, maxDocs=6684479)
        2.9545242E-4 = queryNorm
      16.022152 = fieldWeight in 1775174, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        16.022152 = idf(docFreq=1, maxDocs=6684479)
        1.0 = fieldNorm(doc=1775174)

Thanks, Derek
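The idf values in the debug output above follow Lucene's DefaultSimilarity formula, idf = 1 + ln(maxDocs / (docFreq + 1)). As a sanity check on reading such explain trees, the sketch below reproduces both idf figures from the docFreq and maxDocs values printed in the output:

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # DefaultSimilarity (TFIDFSimilarity) idf, as printed in Solr's debug explain
    return 1.0 + math.log(max_docs / (doc_freq + 1))

max_docs = 6_684_479               # maxDocs from the debug output above
idf_led = idf(38, max_docs)        # spp_keyword_exact:led  -> expect ~13.051737
idf_product_id = idf(1, max_docs)  # P_ProductId:1119054943 -> expect ~16.022152
```

Matching the printed numbers this way confirms which docFreq each clause actually used, which helps when investigating why a field's MATCH weight is missing or smaller than expected.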
Re: understanding collapsingQParser with facet vs group.facet
Hi Upayavira, Thank you for your explanation on the difference between traditional grouping and collapsingQParser. I understand more now.

On 6/19/2015 7:11 PM, Upayavira wrote: On Fri, Jun 19, 2015, at 06:20 AM, Derek Poh wrote: Hi, I read that collapsingQParser returns the facet counts the same as group.truncate=true and has this issue where the facet count and the after-filter facet count are not the same. Using group.facet does not have this issue, but its performance is very bad compared to collapsingQParser. I am trying to understand why collapsingQParser behaves this way and will need to explain it to management. Can someone explain how collapsingQParser calculates the facet counts compared to group.facet? I'm not familiar with group.facet. But to compare traditional grouping to the collapsingQParser - in traditional grouping, all matching documents remain in the result set, but they are grouped for output purposes. However, the collapsingQParser is actually a query filter. It will reduce the number of matching results. Any faceting that happens will happen on the filtered results. I wonder if you can use this syntax to achieve faceting alongside collapsing: q=whatever fq={!collapse tag=collapse}blah facet.field={!ex=collapse}my_facet_field This way, you get the benefits of the CollapsingQParserPlugin, with full faceting on the uncollapsed result set. I've no idea how this would perform, but I'd expect it to be better than the grouping option. Upayavira
Re: understanding collapsingQParser with facet vs group.facet
Hi Joel, By group heads, is it referring to the document that is used to represent each group in the main result section? E.g. using the below 3 documents and collapsing on field supplier_id: supplier_id:S1 product_id:P1 supplier_id:S2 product_id:P2 supplier_id:S2 product_id:P3 With collapse on supplier_id, the result in the main section is as follows: supplier_id:S1 product_id:P1 supplier_id:S2 product_id:P3 The group head of supplier_id:S1 is P1 and that of supplier_id:S2 will be P3? Facets (and even sort) are calculated on P1 and P3? -Derek

On 6/19/2015 7:05 PM, Joel Bernstein wrote: The CollapsingQParserPlugin currently doesn't calculate facets at all. It simply collapses the document set. The facets are then calculated only on the group heads. Grouping has special faceting code built into it that supports the group.facet functionality. Joel Bernstein http://joelsolr.blogspot.com/

On Fri, Jun 19, 2015 at 6:20 AM, Derek Poh d...@globalsources.com wrote: Hi, I read that collapsingQParser returns the facet counts the same as group.truncate=true and has this issue where the facet count and the after-filter facet count are not the same. Using group.facet does not have this issue, but its performance is very bad compared to collapsingQParser. I am trying to understand why collapsingQParser behaves this way and will need to explain it to management. Can someone explain how collapsingQParser calculates the facet counts compared to group.facet? Thank you, Derek
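Joel's point, that facets are computed only on the group heads, can be illustrated with the three documents from the example above. This is a plain-Python sketch, not Solr itself: the scores are made up so that P3 becomes S2's head, and the color field is a hypothetical facet field added for the demonstration:

```python
# Three documents from the example, with illustrative scores so that
# P3 is the head (top-scoring doc) of the S2 group.
docs = [
    {"product_id": "P1", "supplier_id": "S1", "score": 2.0, "color": "red"},
    {"product_id": "P2", "supplier_id": "S2", "score": 1.0, "color": "red"},
    {"product_id": "P3", "supplier_id": "S2", "score": 3.0, "color": "blue"},
]

def collapse_heads(docs, field):
    """Keep only the top-scoring document per value of `field`."""
    heads = {}
    for d in docs:
        key = d[field]
        if key not in heads or d["score"] > heads[key]["score"]:
            heads[key] = d
    return list(heads.values())

def facet_counts(docs, field):
    counts = {}
    for d in docs:
        counts[d[field]] = counts.get(d[field], 0) + 1
    return counts

heads = collapse_heads(docs, "supplier_id")
facets_on_heads = facet_counts(heads, "color")  # what faceting after collapse sees
facets_on_all = facet_counts(docs, "color")     # what uncollapsed faceting sees
```

Faceting after the collapse counts red:1, blue:1, while faceting on the full result set counts red:2, blue:1, which is exactly the discrepancy between CollapsingQParser facet counts and group.facet that this thread is about.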
understanding collapsingQParser with facet vs group.facet
Hi, I read that collapsingQParser returns the facet counts the same as group.truncate=true and has this issue where the facet count and the after-filter facet count are not the same. Using group.facet does not have this issue, but its performance is very bad compared to collapsingQParser. I am trying to understand why collapsingQParser behaves this way and will need to explain it to management. Can someone explain how collapsingQParser calculates the facet counts compared to group.facet? Thank you, Derek
sort on fields that are not mandatory in each document
Hi, I am trying to sort on multiple fields. These fields do not necessarily exist in every document. sort=sppddrank asc, ddrank asc From the sorted result, it seems that documents which do not have the sppddrank field are at the top. How can I make the documents that have the sppddrank field be on top, sorted by it, and those documents which do not have the field below? -Derek
Re: sort on fields that are not mandatory in each document
Hi Ahmet The sortMissingLast and sortMissingFirst attributes are defined at the field or fieldType level? <field name="P_TSRank" type="int" indexed="true" stored="true" multiValued="false"/> <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> On 5/27/2015 4:43 PM, Ahmet Arslan wrote: Hi, I think you are looking for the sortMissing* attributes: sortMissingLast and sortMissingFirst are optional attributes that are currently supported on types that are sorted internally as strings and on numeric types. Ahmet On Wednesday, May 27, 2015 11:36 AM, Derek Poh d...@globalsources.com wrote: Hi I am trying to sort on multiple fields. These fields do not necessarily exist in every document. sort=sppddrank asc, ddrank asc From the sorted result, it seems that documents which do not have the sppddrank field are at the top. How can I make the documents that have the sppddrank field be on top and sorted by it, and those documents which do not have the field below? -Derek
Re: sort on fields that are not mandatory in each document
Got it. Thank you Rajani. On 5/27/2015 5:34 PM, Rajani Maski wrote: Hi Derek, They are at the fieldType level. You might find some reference examples in schema.xml using them. https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties On Wed, May 27, 2015 at 2:30 PM, Derek Poh d...@globalsources.com wrote: Hi Ahmet The sortMissingLast and sortMissingFirst attributes are defined at the field or fieldType level? <field name="P_TSRank" type="int" indexed="true" stored="true" multiValued="false"/> <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> On 5/27/2015 4:43 PM, Ahmet Arslan wrote: Hi, I think you are looking for the sortMissing* attributes: sortMissingLast and sortMissingFirst are optional attributes that are currently supported on types that are sorted internally as strings and on numeric types. Ahmet On Wednesday, May 27, 2015 11:36 AM, Derek Poh d...@globalsources.com wrote: Hi I am trying to sort on multiple fields. These fields do not necessarily exist in every document. sort=sppddrank asc, ddrank asc From the sorted result, it seems that documents which do not have the sppddrank field are at the top. How can I make the documents that have the sppddrank field be on top and sorted by it, and those documents which do not have the field below? -Derek
Re: sort on fields that are not mandatory in each document
Oh ok. Thank you Alessandro. On 5/27/2015 6:07 PM, Alessandro Benedetti wrote: Actually it is both field level and fieldType level. You decide based on your use case (it can happen that for the same field type you want sortMissingFirst for one field and sortMissingLast for another). I want to add a bonus note related to the (empty) and null concepts. Be very careful you don't index empty values for your fields, or this will mess up the sorting. Solr manages missing values (that is, null values) but does not manage empty values. Those values look identical to null values to a human, but not to Solr, so you can get very weird situations for your users. So, to be sure everything works nicely with the sortMissing* attributes, be sure not to index empty values. Cheers 2015-05-27 10:34 GMT+01:00 Rajani Maski rajani.ma...@lucidworks.com: Hi Derek, They are at the fieldType level. You might find some reference examples in schema.xml using them. https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties On Wed, May 27, 2015 at 2:30 PM, Derek Poh d...@globalsources.com wrote: Hi Ahmet The sortMissingLast and sortMissingFirst attributes are defined at the field or fieldType level? <field name="P_TSRank" type="int" indexed="true" stored="true" multiValued="false"/> <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> On 5/27/2015 4:43 PM, Ahmet Arslan wrote: Hi, I think you are looking for the sortMissing* attributes: sortMissingLast and sortMissingFirst are optional attributes that are currently supported on types that are sorted internally as strings and on numeric types. Ahmet On Wednesday, May 27, 2015 11:36 AM, Derek Poh d...@globalsources.com wrote: Hi I am trying to sort on multiple fields. These fields do not necessarily exist in every document. sort=sppddrank asc, ddrank asc From the sorted result, it seems that documents which do not have the sppddrank field are at the top.
How can I make the documents that have the sppddrank field be on top and sorted by it, and those documents which do not have the field below? -Derek
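Putting the thread's advice together, a minimal schema.xml sketch of what Derek is after (field and type names taken from his messages; treat the exact attribute placement as an assumption to verify against your Solr version). With sort=sppddrank asc, sortMissingLast="true" pushes documents lacking the field below the sorted ones:

```xml
<!-- sortMissingLast can be set at the fieldType level... -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0"
           positionIncrementGap="0" sortMissingLast="true"/>

<!-- ...and, per Alessandro, overridden per field where a different
     behaviour is wanted for an individual field -->
<field name="sppddrank" type="int" indexed="true" stored="true"
       multiValued="false" sortMissingLast="true"/>
```

Per Alessandro's warning, this only behaves well if documents genuinely omit the field; indexing empty values defeats the missing-value handling.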
Re: search or filter by a list of document ids and return them in the same order.
Hi Erick Sorry I missed your reply. Ya, that is the alternative solution I am thinking of if it's not possible through Solr. -Derek On 4/24/2015 12:01 AM, Erick Erickson wrote: Not that I know of. But your application gets the original params back, so you can order the display based on the params that are echoed back. Best, Erick On Thu, Apr 23, 2015 at 2:17 AM, Derek Poh d...@globalsources.com wrote: Hi I am trying to search or filter a list of documents by their ids (product id field). The requirement is that the returned documents must be in the same order as they were searched or filtered by. E.g. if I search or filter on the below list of ids, the documents must be returned in the same order too 1083342171 1079463095 1078278592 1085253674 1076558399 Is this possible? Thanks, Derek
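A minimal sketch of Erick's suggestion: the application already knows the id list it sent, so it can re-sort the documents parsed from the Solr response client-side. The ids are from Derek's example; the field name product_id and the response shape are assumptions.

```python
# Reorder Solr results to match the order of the id list that was queried.
ids = ["1083342171", "1079463095", "1078278592", "1085253674", "1076558399"]

# e.g. docs parsed from the response to q=product_id:(1083342171 OR ...)
# (Solr returns them in score/index order, not in the order above)
docs = [
    {"product_id": "1076558399"},
    {"product_id": "1083342171"},
    {"product_id": "1078278592"},
]

order = {pid: i for i, pid in enumerate(ids)}          # id -> requested position
docs.sort(key=lambda d: order[d["product_id"]])        # restore requested order
print([d["product_id"] for d in docs])
```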
Re: search or filter by a list of document ids and return them in the same order.
Hi Any advice on this? Thanks, Derek On 4/23/2015 5:17 PM, Derek Poh wrote: Hi I am trying to search or filter a list of documents by their ids (product id field). The requirement is that the returned documents must be in the same order as they were searched or filtered by. E.g. if I search or filter on the below list of ids, the documents must be returned in the same order too 1083342171 1079463095 1078278592 1085253674 1076558399 Is this possible? Thanks, Derek
search or filter by a list of document ids and return them in the same order.
Hi I am trying to search or filter a list of documents by their ids (product id field). The requirement is that the returned documents must be in the same order as they were searched or filtered by. E.g. if I search or filter on the below list of ids, the documents must be returned in the same order too 1083342171 1079463095 1078278592 1085253674 1076558399 Is this possible? Thanks, Derek
spellcheck enabled but not getting any suggestions.
Hi I have enabled spellcheck but am not getting any suggestions with incorrectly spelled keywords. I added the spellcheck into the /select request handler. Which step did I miss? spellcheck list in the returned result:
<lst name="spellcheck">
  <lst name="suggestions"/>
</lst>
solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <!-- Spell checking defaults -->
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <!-- append spellchecking to our list of components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
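One thing the config above does not show is the spellcheck searchComponent itself: the "spellcheck" string in last-components must refer to a <searchComponent name="spellcheck"> defined elsewhere in solrconfig.xml. If that component is missing, or its dictionary name/field doesn't match, the suggestions list comes back empty. A sketch based on the stock example config (the field and analyzer type names here are assumptions to adapt):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <!-- "default" matches the implicit spellcheck.dictionary value -->
    <str name="name">default</str>
    <!-- the field the suggestions are drawn from; must be indexed -->
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>
```

DirectSolrSpellChecker works off the main index directly, so no separate build step is needed; index-based spellcheckers would additionally require building the dictionary.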
Re: Collapse and Expand behaviour on result with 1 document.
Hi Joel Is the number of documents available when using the collapse and expand parameters? I can't seem to find it in the returned xml. I know the numFound in the main result set (<result maxScore="6.470696" name="response" numFound="27" start="0">) refers to the number of collapsed groups. Do I need to issue another query without the collapse and expand parameters to get the total number of documents? Or is there any field or parameter indicating the number of documents that can be returned through the 'fl' parameter? I am trying to display such info on the front-end, e.g. "571 led results from 240 suppliers". On 4/1/2015 7:05 PM, Joel Bernstein wrote: Exactly correct. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh d...@globalsources.com wrote: Hi Joel Correct me if my understanding is wrong. Using supplier id as the field to collapse on. - If the collapse group heads in the main result set have only 1 document in each group, the expanded section will be empty since there are no documents to expand for each collapse group. - To render the page, I need to iterate the main result set. For each document I have to check if there is an expanded group with the same supplier id. - The facet counts are based on the number of collapsed groups in the main result set (<result maxScore="6.470696" name="response" numFound="27" start="0">) -Derek On 3/31/2015 7:43 PM, Joel Bernstein wrote: The way that collapse/expand is designed to be used is as follows: The main result set will contain the collapsed group heads. The expanded section will contain the expanded groups for the page of results. To render the page you iterate the main result set. For each document check to see if there is an expanded group. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com wrote: You should be able to use collapse/expand with one result. Does the document in the main result set have group members that aren't being expanded?
Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com wrote: If I want to group the results (by a certain field) even if there is only 1 document, I should use the group parameter instead? The requirement is to group the result of product documents by their supplier id. group=true&group.field=P_SupplierId&group.limit=5 Is it true that the performance of collapse is better than the group parameter on a large data set, say 10-20 million documents? -Derek On 3/31/2015 10:03 AM, Joel Bernstein wrote: The expanded section will only include groups that have expanded documents. So, if the document in the main result set has no documents to expand, then this is working as expected. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com wrote: Hi I have a query which returns 1 document. When I add the collapse and expand parameters to it, expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}, the expanded section is empty (<lst name="expanded"/>). Is this the behaviour of the collapse and expand parameters on a result which contains only 1 document? -Derek
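The rendering loop Joel describes can be sketched in a few lines of client code: iterate the collapsed main result and look up each head's expanded group by its collapse-key value. The supplier/product ids follow the thread's examples; the parsed-response shapes are assumptions.

```python
# Render collapse/expand results: main section = group heads,
# "expanded" section = remaining group members keyed by collapse field.
main_docs = [
    {"P_SupplierId": "S1", "product_id": "P1"},
    {"P_SupplierId": "S2", "product_id": "P3"},
]
expanded = {
    "S2": [{"P_SupplierId": "S2", "product_id": "P2"}],
    # note: no "S1" key -- single-document groups have nothing to expand
}

rendered = []
for head in main_docs:
    group = expanded.get(head["P_SupplierId"], [])  # empty for 1-doc groups
    rendered.append((head["product_id"], [d["product_id"] for d in group]))
print(rendered)  # [('P1', []), ('P3', ['P2'])]
```

This also shows why an empty <lst name="expanded"/> for a one-document result is expected behaviour rather than a bug.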
Re: sort on facet.index?
Yonik I see. Thank you for the update. On 4/3/2015 12:28 AM, Yonik Seeley wrote: On Thu, Apr 2, 2015 at 10:25 AM, Ryan Josal rjo...@gmail.com wrote: Sorting the result set or the facets? For the facets there is facet.sort=index (lexicographic) and facet.sort=count. So maybe you are asking if you can sort by index, but reversed? I don't think this is possible, and it's a good question. The new facet module that will be in Solr 5.1 supports sorting in both directions on both count and index order (as well as by statistics / bucket aggregations). http://yonik.com/json-facet-api/ -Yonik
sort on facet.index?
Is sorting on the facet index supported? I would like to sort the below facet index
<lst name="P_SupplierRanking">
  <int name="0">14</int>
  <int name="1">8</int>
  <int name="2">12</int>
  <int name="3">349</int>
  <int name="4">81</int>
  <int name="5">8</int>
  <int name="6">12</int>
</lst>
to
<lst name="P_SupplierRanking">
  <int name="6">12</int>
  <int name="5">8</int>
  <int name="4">81</int>
  <int name="3">349</int>
  ...
</lst>
-Derek
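Until a reversed facet.sort=index is available (per Yonik, the JSON Facet API in Solr 5.1+ supports both directions), one workaround is to reverse the order client-side. A minimal sketch using the facet values from the question; Solr's flat facet_fields format (alternating name, count) and the single-digit keys are assumptions of this example.

```python
# Solr's classic facet_fields response alternates term and count:
flat = ["0", 14, "1", 8, "2", 12, "3", 349, "4", 81, "5", 8, "6", 12]

pairs = list(zip(flat[0::2], flat[1::2]))       # [('0', 14), ('1', 8), ...]
# Lexicographic string sort matches index order here because every
# key is a single digit; numeric keys of mixed width would need int().
pairs.sort(key=lambda kv: kv[0], reverse=True)  # index order, descending
print(pairs[:3])
```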
Re: Collapse and Expand behaviour on result with 1 document.
Hi Joel Correct me if my understanding is wrong. Using supplier id as the field to collapse on. - If the collapse group heads in the main result set have only 1 document in each group, the expanded section will be empty since there are no documents to expand for each collapse group. - To render the page, I need to iterate the main result set. For each document I have to check if there is an expanded group with the same supplier id. - The facet counts are based on the number of collapsed groups in the main result set (<result maxScore="6.470696" name="response" numFound="27" start="0">) -Derek On 3/31/2015 7:43 PM, Joel Bernstein wrote: The way that collapse/expand is designed to be used is as follows: The main result set will contain the collapsed group heads. The expanded section will contain the expanded groups for the page of results. To render the page you iterate the main result set. For each document check to see if there is an expanded group. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com wrote: You should be able to use collapse/expand with one result. Does the document in the main result set have group members that aren't being expanded? Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com wrote: If I want to group the results (by a certain field) even if there is only 1 document, I should use the group parameter instead? The requirement is to group the result of product documents by their supplier id. group=true&group.field=P_SupplierId&group.limit=5 Is it true that the performance of collapse is better than the group parameter on a large data set, say 10-20 million documents? -Derek On 3/31/2015 10:03 AM, Joel Bernstein wrote: The expanded section will only include groups that have expanded documents. So, if the document in the main result set has no documents to expand, then this is working as expected.
Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com wrote: Hi I have a query which returns 1 document. When I add the collapse and expand parameters to it, expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}, the expanded section is empty (<lst name="expanded"/>). Is this the behaviour of the collapse and expand parameters on a result which contains only 1 document? -Derek