Dismax: Impossible to search for a _phrase_ in tokenized and untokenized fields at the same time
Hello,

It seems to me that there is no way to use the dismax handler to search both tokenized and untokenized fields when searching for a phrase. Consider the following example. I have two fields in the index: product_name and product_name_un. The schema looks like:

    <fieldType name="string_ignore_case" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_no_stopwords_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>

    <field name="product_name" type="text_no_stopwords_en" indexed="true" stored="true"/>
    <field name="product_name_un" type="string_ignore_case" indexed="true" stored="true"/>
    <copyField source="product_name" dest="product_name_un"/>

I'm using dismax to search both of them at the same time: defType=dismax&qf=product_name product_name_un^2.0 (this is done to bring to the top of the results the products whose name _equals_ the entered criteria).

1. When I search for a phrase (two or more keywords), e.g. blue car, the input string is tokenized, and even though I have product_name_un=blue car in the index, the product_name_un^2.0 part of the dismax config has no effect.

2. When I enter "blue car" (in quotes), the string is not tokenized and the product_name_un^2.0 part works, but nothing can be found in the product_name field.

I.e. there is no way to have a proper search against the two fields at the same time. The workaround I found is using the bq parameter to specify a boost query against the product_name_un field, but I don't think that should be the only solution.

Another note, related to that: when I set product_name_un as the default search field and query with ../select/?q=blue car&rows=10..., I get empty results despite the fact that I have the value blue car in the index in that field. I have to use quotes again to fix that... Shouldn't Solr determine the field type and apply the corresponding analyzers/tokenizers/etc.?
Re: Dismax: Impossible to search for a _phrase_ in tokenized and untokenized fields at the same time
On Sat, Oct 10, 2009 at 6:34 AM, Alex Baranov <alex.barano...@gmail.com> wrote:
> It seems to me that there is no way to use the dismax handler to search
> both tokenized and untokenized fields when searching for a phrase.
> [schema and field definitions snipped]
>
> I'm using dismax to search both of them at the same time:
> defType=dismax&qf=product_name product_name_un^2.0 (this is done to
> bring to the top of the results the products whose name _equals_ the
> entered criteria).
>
> 1. When I search for a phrase (two or more keywords), e.g. blue car,
> the input string is tokenized, and even though I have
> product_name_un=blue car in the index, the product_name_un^2.0 part of
> the dismax config has no effect.

Hmmm, right. This is due to the fact that the Lucene query parser (still actually used in dismax) breaks things up by whitespace *before* analysis (so the analyzer for the untokenized field never sees the two tokens together).

> 2. When I enter "blue car" (in quotes), the string is not tokenized and
> the product_name_un^2.0 part works, but nothing can be found in the
> product_name field.

Using explicit quotes will make a phrase query, so blue and car must appear right next to each other in product_name. If it's OK to require both blue and car in product_name, then you can just set a slop for explicit phrase queries with the qs parameter.

-Yonik
http://www.lucidimagination.com
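A request following Yonik's suggestion might look like this sketch (the qs value of 100 is an arbitrary illustration, not a recommendation from the thread):

    /select?defType=dismax&q="blue car"&qf=product_name product_name_un^2.0&qs=100

With qs set, the explicit phrase "blue car" can still match product_name documents where the two terms are near but not adjacent, while the untokenized product_name_un field continues to match the whole phrase exactly.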
Re: Dismax: Impossible to search for a _phrase_ in tokenized and untokenized fields at the same time
I guess this is a bug that should be added to JIRA (if it is not there already). Should I add it?

> Hmmm, right. This is due to the fact that the Lucene query parser
> (still actually used in dismax) breaks things up by whitespace *before*
> analysis (so the analyzer for the untokenized field never sees the two
> tokens together).

Is there a way to tell the Lucene parser not to break things up by whitespace? Should one use some whitespace code instead of an actual space? I think what we need here is a special kind of quotes which tells Solr not to use the Lucene query parser at all (this might be very useful for situations like this one, when the search is applied to the default field, i.e. when the field is not specified).

> If it's OK to require both blue and car in product_name, then you can
> just set a slop for explicit phrase queries with the qs parameter.

It's not good for me unfortunately, but thanks for the suggestion.

Alex Baranov.

On Sat, Oct 10, 2009 at 3:01 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> [full quote of the earlier exchange snipped]
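The bq workaround Alex described in his first message would look roughly like this (a sketch using the field names from the thread; the boost value is illustrative):

    /select?defType=dismax&q=blue car&qf=product_name&bq=product_name_un:"blue car"^2.0

Here the main query matches the tokenized field, while the boost query, built per request by the client, lifts documents whose untokenized name equals the full phrase.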
Re: DIH and EmbeddedSolr
    ModifiableSolrParams p = new ModifiableSolrParams();
    p.add("qt", "/dataimport");
    p.add("command", "full-import");
    server.query(p, METHOD.POST);

I do this, but it starts giving me this exception:

    SEVERE: Full Import failed
    java.util.concurrent.RejectedExecutionException
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1760)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:216)
        at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:366)
        at org.apache.solr.update.DirectUpdateHandler2$CommitTracker.scheduleCommitWithin(DirectUpdateHandler2.java:466)
        at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:322)
        at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.handler.dataimport.SolrWriter.doDeleteAll(SolrWriter.java:192)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:332)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)

2009/10/10 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
> You may need to extend a SolrRequest and set the appropriate path
> (/dataimport) and other params. Then you may invoke the request method.
>
> On Sat, Oct 10, 2009 at 11:07 AM, rohan rai <hiroha...@gmail.com> wrote:
>> The configuration is not an issue. But how do I invoke it? I have only
>> known a URL way to invoke it and thus import the data into the index,
>> like http://localhost:8983/solr/db/dataimport?command=full-import
>> But with embedded Solr I haven't been able to figure it out.
>>
>> Regards
>> Rohan
>>
>> 2009/10/10 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
>>> I guess it should be possible... what are the problems you encounter?
>>>
>>> On Sat, Oct 10, 2009 at 10:56 AM, rohan rai <hiroha...@gmail.com> wrote:
>>>> Have been unable to use DIH for embedded Solr. Is there a way?

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Solr 1.4 Release Party
I can't wait... -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: DIH and EmbeddedSolr
This is pretty unstable... does anyone have any clue? Sometimes it even creates the index, sometimes it does not. But every time I do get this exception.

Regards
Rohan

On Sat, Oct 10, 2009 at 6:07 PM, rohan rai <hiroha...@gmail.com> wrote:
> ModifiableSolrParams p = new ModifiableSolrParams();
> p.add("qt", "/dataimport");
> p.add("command", "full-import");
> server.query(p, METHOD.POST);
>
> I do this, but it starts giving me this exception:
>
> SEVERE: Full Import failed
> java.util.concurrent.RejectedExecutionException
> [stack trace and earlier replies snipped]
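For what it's worth, a minimal sketch of Noble Paul's suggestion (extend SolrRequest and point it at the /dataimport path) might look like the following. This assumes the SolrJ 1.4 SolrRequest API; DataImportRequest is a hypothetical name, not an existing SolrJ class:

    import java.io.IOException;
    import java.util.Collection;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.SolrResponse;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.response.SolrResponseBase;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.ContentStream;

    // Hypothetical request class targeting the DIH handler path directly.
    public class DataImportRequest extends SolrRequest {
        private final ModifiableSolrParams params = new ModifiableSolrParams();

        public DataImportRequest(String command) {
            super(METHOD.POST, "/dataimport");  // handler path instead of the default /select
            params.set("command", command);     // e.g. "full-import"
        }

        @Override
        public SolrParams getParams() {
            return params;
        }

        @Override
        public Collection<ContentStream> getContentStreams() throws IOException {
            return null;  // DIH commands carry no request body
        }

        @Override
        public SolrResponse process(SolrServer server) throws SolrServerException, IOException {
            SolrResponseBase res = new SolrResponseBase();
            res.setResponse(server.request(this));  // raw NamedList from the handler
            return res;
        }
    }

    // Usage: new DataImportRequest("full-import").process(embeddedServer);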
Re: Question regarding proximity search
Hi,

I would appreciate it if someone could throw some light on the following point regarding proximity search.

> I have a search box, and if a user comes and types in honda car WITHOUT
> any double quotes, I want to get all documents with matches, and they
> should also be ranked based on proximity, i.e. the nearer the two terms
> are, the higher the rank. From the admin it looks like in order to test
> proximity I always have to give the words in double quotes and a slop
> value:
> http://localhost:8983/solr/select/?q="honda+car"~12&version=2.2&start=0&rows=10&indent=on
> Hence, from the admin point of view, in order to do proximity I always
> have to give it in double quotes. My question is: in order to do a
> proximity search, do we always have to pass the query as a phrase, i.e.
> in double quotes?

Yes, if you are using LuceneQParserPlugin.

> The next question is that I thought using the dismax handler I could do
> a search on a field and specify the ps value in order to define
> proximity. This is the query I am giving, and I get back no results.
> Any advice on where I am going wrong?
> http://localhost:8983/solr/proxTest/?q=honda car

Can you try http://localhost:8983/solr/proxTest/?q=honda+car ? You don't need quotes in dismax. You can append debugQuery=true to the URL to see what's going on.

Hope this helps.
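A dismax handler along the lines of the /proxTest handler discussed here might look like this sketch (the handler name is from the thread; the field name and slop value are illustrative assumptions):

    <requestHandler name="/proxTest" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="qf">name</str>
        <!-- pf boosts documents where the query terms appear close together -->
        <str name="pf">name</str>
        <!-- ps is the slop applied to that implicit phrase boost -->
        <str name="ps">10</str>
      </lst>
    </requestHandler>

With pf/ps set, an unquoted query like q=honda+car matches all documents containing both terms, and those where the terms sit within the slop window score higher, which is the ranking behavior the question asks for.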
Customizing solr search: SpanQueries (revisited)
Hi all,

I am trying to use SpanQueries to save *all* hits for a custom query type (e.g. defType=fooSpanQuery), along with token positions. I have this working in straight Lucene, so my challenge is to implement it half-intelligently in Solr. At the moment, I can't figure out where and how to customize the 'inner' search process.

So far, I have my own SpanQParser and SpanQParserPlugin, which successfully return a hard-coded span query (but this is not critical for my current challenge, I believe). I have also managed to configure Solr to call my custom SpanQueryComponent, which I believe is the focus of my challenge. At this initial stage, I have simply extended QueryComponent and overridden QueryComponent.process() while I am trying to find my way through the code :-).

So, with all that setup, can someone point me in the right direction for custom processing of a query (or just the query results)? A few differences for my use case are:

-- I want to save every hit along with position information. I believe this means I want to use SpanQueries (like I have in Lucene), but perhaps there are other options.
-- I do not need to build much in the way of a response. This is an automated analysis, so no user will see the Solr results. I will save them to a database, but for simplicity just a log.info("Score: {}, Term: {}, TokenNumber: {}, ...") would be great at the moment.
-- I will always process every span, even those with a near-zero 'score'.

I think I want to focus on SpanQParser.process(), probably overriding the functionality in (SolrIndexSearcher)searcher.search(result, cmd), which seems to just call getDocListC(qr, cmd); // ?? is this my main focus point??

Does this seem like a reasonable approach? If so, how do I do it? I think I'm missing something obvious; perhaps there is an easy way to extend SolrIndexSearcher in solrconfig.xml to have my custom SpanQueryComponent call a custom IndexSearcher where I simply override getDocListC()?

And for extra karma credit: any thoughts on performance gains (or losses?) if I basically drop most of the advanced optimizations like TopDocsCollector and such? If I have thousands of queries and want to save *every* span for each query, is there likely to be significant overhead from the optimizations which are intended to let users 'page' through windows of hits?

Also, thanks to Grant for replying to my previous inquiry (http://osdir.com/ml/solr-dev.lucene.apache.org/2009-05/msg00010.html). This email is partly me trying to implement his suggestion, and partly just me trying to understand basic Solr customization. I tried sending out a previous draft of this message yesterday but haven't seen it on the lists, so my apologies if this becomes a duplicate post.

Thank you,
Sean
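For reference, the straight-Lucene span iteration the poster describes might look like this sketch (field and term names are illustrative; this uses the Lucene 2.x Spans API):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;
    import org.apache.lucene.search.spans.Spans;

    public class SpanDump {
        public static void dumpSpans(IndexReader reader) throws Exception {
            // "blue" within 5 positions of "car", in order
            SpanQuery q = new SpanNearQuery(new SpanQuery[] {
                    new SpanTermQuery(new Term("text", "blue")),
                    new SpanTermQuery(new Term("text", "car")) },
                    5, true);
            // Enumerate every matching span: doc id plus start/end token positions
            Spans spans = q.getSpans(reader);
            while (spans.next()) {
                System.out.println("doc=" + spans.doc()
                        + " start=" + spans.start() + " end=" + spans.end());
            }
        }
    }

This walks every span directly, with no collector or paging involved, which matches the "save every hit with positions" requirement.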
http replication transfer speed
Anyone know why you would see a transfer speed of just 10-20 MB/s over a gigabit network connection? Even with standard drives, I would expect to see at least around 40 MB/s. Has anyone seen over 10-20 MB/s using replication? Any ideas on what the bottleneck might be? I think even a standard drive can do writes at a bit over 40 MB/s, and certainly reads over that. Thoughts?

--
- Mark

http://www.lucidimagination.com
Optimize on slaves?
Hi,

Simple question! I have a nightly cron job to send the optimize command to Solr on our master instance. Is this also required on Solr replicated slaves to optimise their indexes?

Thanks,
Matt
Re: Optimize on slaves?
No. The slaves will copy the current index, optimized or not.

--wunder

On Oct 10, 2009, at 4:33 PM, Matthew Painter wrote:
> Simple question! I have a nightly cron job to send the optimize command
> to Solr on our master instance. Is this also required on Solr replicated
> slaves to optimise their indexes?
RE: Optimize on slaves?
My apologies; I've just found the answer (that optimisation should be on the master server only).

From: Matthew Painter
Sent: Sunday, 11 October 2009 12:34 p.m.
To: 'solr-user@lucene.apache.org'
Subject: Optimize on slaves?

> Simple question! I have a nightly cron job to send the optimize command
> to Solr on our master instance. Is this also required on Solr replicated
> slaves to optimise their indexes?
Re: http replication transfer speed
Perhaps a drive that can do 40+ MB/s but is also handling query load might have its writes knocked down to that?

- Mark

http://www.lucidimagination.com (mobile)

On Oct 10, 2009, at 6:41 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Anyone know why you would see a transfer speed of just 10-20 MB/s over a
> gigabit network connection? Even with standard drives, I would expect to
> see at least around 40 MB/s. Has anyone seen over 10-20 MB/s using
> replication? Any ideas on what the bottleneck might be?
> [rest of original message snipped]
Tips on speeding up indexing needed...
Folks:

I have a corpus of approx 6 M documents, each of approx 4K bytes. Currently, the way indexing is set up, I read documents from a database and issue Solr post requests in batches (batches are set up so that the maxPostSize of Tomcat, which is set to 2 MB, is adhered to). This means that in each batch we write approx 600 or so documents to Solr.

What I am seeing is that I am able to push about 2500 docs per minute, or approx 40 or so per second. I saw in Erik's talk on Friday that speeds of 250 docs/sec to 25000 docs/sec have been achieved. Needless to say, I am sure that performance numbers vary widely and are dependent on the domain, machine configurations, etc. I am running on Windows 2003 Server with 4 GB RAM and a dual-core Xeon.

Any tips on what I can do to speed this up?

Thanks,
Bill
Re: Tips on speeding up indexing needed...
Oh, and one more thing: for historical reasons our apps run on Microsoft technologies, so using SolrJ would be next to impossible at the present time.

Thanks in advance for your help!

-- Bill

--------------------------------------------------
From: William Pierce <evalsi...@hotmail.com>
Sent: Saturday, October 10, 2009 5:47 PM
To: solr-user@lucene.apache.org
Subject: Tips on speeding up indexing needed...

> I have a corpus of approx 6 M documents, each of approx 4K bytes.
> [rest of original message snipped]
Re: Tips on speeding up indexing needed...
A few things off the bat:

1) Do not commit until the end.
2) Use the DataImportHandler - it runs inside Solr and reads the database. This cuts out the HTTP transfer/XML translation overheads.
3) Examine your schema. Some of the text analyzers are quite slow.

Solr tips: http://wiki.apache.org/solr/SolrPerformanceFactors
Lucene tips: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

And, what you don't want to hear: for jobs like this, Solr/Lucene is disk-bound. The Windows NTFS file system is much slower than what is available for Linux or the Mac, and those quoted numbers are for such machines.

Good luck!

On Sat, Oct 10, 2009 at 5:57 PM, William Pierce <evalsi...@hotmail.com> wrote:
> Oh, and one more thing: for historical reasons our apps run on Microsoft
> technologies, so using SolrJ would be next to impossible at the present
> time.
> [rest of quoted thread snipped]

--
Lance Norskog
goks...@gmail.com
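A minimal DIH configuration for tip 2 might look like the following sketch (the JDBC driver, connection URL, and table/column names are illustrative assumptions, not details from the thread):

    <dataConfig>
      <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                  url="jdbc:sqlserver://localhost;databaseName=corpus"
                  user="solr" password="..."/>
      <document>
        <entity name="doc" query="SELECT id, title, body FROM documents">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
          <field column="body" name="body"/>
        </entity>
      </document>
    </dataConfig>

Registered against the DataImportHandler in solrconfig.xml, this lets Solr pull rows straight from the database instead of receiving them as HTTP POSTs, which also sidesteps the Tomcat maxPostSize batching described above.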
Re: Facets with an IDF concept
In Solr a facet is assigned one number: the number of documents in which it appears. The facets are sorted by that number. Would your use case be solved with a second number that is formulated from the relevance of the associated documents? For example:

    facet relevance = count * sum(scores of documents)

with coefficients for each input. To do this, for each document counted by the facet, you then have to find that document in the result list and pull the score. This would be much slower than the current count-the-documents algorithm. But if you have limited the document list via a filter, this could still be fast enough for interactive use.

If I wanted to make a tag cloud, this is how I would do it.

On Fri, Oct 9, 2009 at 3:58 PM, Asif Rahman <a...@newscred.com> wrote:
> Hi Wojtek: Sorry for the late, late reply. I haven't implemented this
> yet, but it is on the (long) list of my todos. Have you made any
> progress?
>
> Asif
>
> On Thu, Aug 13, 2009 at 5:42 PM, wojtekpia <wojte...@hotmail.com> wrote:
>> Hi Asif,
>> Did you end up implementing this as a custom sort order for facets? I'm
>> facing a similar problem, but not related to time. Given 2 terms:
>>
>> A: appears twice in half the search results
>> B: appears once in every search result
>>
>> I think term A is more interesting. Using facets sorted by frequency,
>> term B is more important (since it shows up first). To me, terms that
>> appear in all documents aren't really that interesting. I'm thinking of
>> using a combination of document count (in the result set, not globally)
>> and term frequency (in the result set, not globally) to come up with a
>> facet sort order.
>>
>> Wojtek

--
Lance Norskog
goks...@gmail.com
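A sketch of the proposed scoring, outside of any Solr API (the inputs - facet value to matching doc ids, and doc id to score - are assumed to be gathered elsewhere; all names here are illustrative):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class FacetRelevance {
        // relevance = count * sum(scores of the facet's documents in the result set)
        public static Map<String, Float> score(Map<String, List<Integer>> facetDocs,
                                               Map<Integer, Float> docScores) {
            Map<String, Float> relevance = new HashMap<String, Float>();
            for (Map.Entry<String, List<Integer>> e : facetDocs.entrySet()) {
                float sum = 0f;
                for (Integer doc : e.getValue()) {
                    Float s = docScores.get(doc);
                    if (s != null) sum += s;  // doc may lie outside the scored result window
                }
                relevance.put(e.getKey(), e.getValue().size() * sum);
            }
            return relevance;
        }
    }

The per-document score lookup is what makes this slower than plain counting, as the message notes.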
Re: Is negative boost possible?
If you don't want to do a pure negative query and just want to boost a few documents down based on a matching criterion, try using the linear function (one of the functions available as a boost function) with a negative m (slope). We could solve our problem this way; we wanted to negatively boost some documents based on certain keywords.

Marc Sturlese wrote:
> hossman wrote:
>> the only way to negative boost is to positively boost the inverse...
>> (*:* -field1:value_to_penalize)^10
>
> This will do the job as well, as bq supports pure negative queries (at
> least in trunk):
> bq=-field1:value_to_penalize^10
> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-76e53db8c5fd31133dc3566318d1aad2bb23e07e
>
> hossman wrote:
>> : Use decimal figure less than 1, e.g. 0.5, to express less importance.
>>
>> but that's still a positive boost ... it still increases the scores of
>> documents that match. The only way to negative boost is to positively
>> boost the inverse...
>>
>> (*:* -field1:value_to_penalize)^10
>>
>> : I am looking for a way to assign a negative boost to a term in a Solr
>> : query. Our use scenario is that we want to boost matching documents
>> : that were updated recently and penalize those that have not been
>> : updated for a long time. There are other terms in the query that
>> : would affect the scores as well. For example, we construct a query
>> : similar to this:
>> :
>> : *:* field1:value1^2 field2:value2^2
>> : lastUpdateTime:[NOW/DAY-90DAYS TO *]^5
>> : lastUpdateTime:[* TO NOW/DAY-365DAYS]^-3
>> :
>> : I notice it's not possible to simply use a negative boosting factor
>> : in the query. Is there any way to achieve such a result?
>> :
>> : Regards,
>> : Shi Quan He
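As a sketch of the linear-function suggestion (the field name daysSinceUpdate and the slope/intercept values are illustrative assumptions; linear(x,m,c) computes m*x+c):

    defType=dismax&q=foo&bf=linear(daysSinceUpdate,-1,365)

Documents with a larger daysSinceUpdate value receive a smaller additive boost, so stale documents sink in the ranking without using a negative query.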
Re: Problems with WordDelimiterFilterFactory
On Fri, Oct 9, 2009 at 3:33 AM, Patrick Jungermann <patrick.jungerm...@googlemail.com> wrote:
> Hi Bern,
>
> the problem is the character sequence --. A query is not allowed to have
> minus characters that follow directly upon one another. Remove one minus
> character and the query will be parsed without problems.

Or you could escape the hyphen character. If you are using SolrJ, use ClientUtils.escapeQueryChars on the query string.

--
Regards,
Shalin Shekhar Mangar.
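For example (a sketch; the input string is illustrative, and this assumes a SolrJ version that includes ClientUtils.escapeQueryChars):

    import org.apache.solr.client.solrj.util.ClientUtils;

    String escaped = ClientUtils.escapeQueryChars("foo -- bar");
    // the hyphens come back escaped as \- so the query parser treats them literally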
Re: Default query parameter for one core
On Fri, Oct 9, 2009 at 7:56 PM, Michael <solrco...@gmail.com> wrote:
> Hm... still no success. Can anyone point me to a doc that explains how
> to define and reference core properties? I've had no luck searching
> Google.
>
> Shalin, I gave an identical <property name="shardsParam" .../> tag to
> each of my cores, and referenced ${solr.core.shardsParam} (with no
> default specified via a colon) in solrconfig.xml. I get an error on
> startup:

I should have mentioned it earlier, but the property name in your case would be just ${shardsParam}. The solr.core prefix is only for automatically added properties such as name, instanceDir, dataDir, configName, and schemaName.

--
Regards,
Shalin Shekhar Mangar.
Re: Default query parameter for one core
On Fri, Oct 9, 2009 at 9:39 PM, Michael <solrco...@gmail.com> wrote:
> For posterity... After reading through
> http://wiki.apache.org/solr/SolrConfigXml and
> http://wiki.apache.org/solr/CoreAdmin and
> http://issues.apache.org/jira/browse/SOLR-646, I think there's no way
> for me to make only one core specify shards=foo, short of duplicating
> my solrconfig.xml for that core and adding one line:
>
> - I can't use a variable like ${shardsParam} in a single shared
> solrconfig.xml, because the line <str name="shards">${shardsParam}</str>
> has to be in there, and that forces a (possibly empty) shards parameter
> onto cores that *don't* need one, causing a NullPointerException.

Well, we can fix the NPE :) Please raise an issue.

> - I can't suck in just that one str line via a SOLR-646-style import,
> like
>
>   # solrconfig.xml
>   <requestHandler ...>
>     <lst name="defaults">
>       <import file="${shardspec_file}"/>
>     </lst>
>   </requestHandler>
>
>   # solr.xml
>   <core name="core0"><property name="shardspec_file" value="some_file"/>...
>   <core name="core1"><property name="shardspec_file" value="/dev/null"/>...
>
> because SOLR-646's import feature got cut.
>
> So I think my best bet is to make two mostly-identical solrconfig.xmls,
> and point core0 to the one specifying a shards= parameter:
>
>   <core name="core0" config="core0_solrconfig.xml"/>
>
> I don't like the duplication of config, but at least it accomplishes my
> goal!

There is another way too. Each plugin in Solr now supports a configuration attribute named enable which can be true or false. You can control the value (true/false) through a variable. So you can duplicate just the handler instead of the complete solrconfig.xml.

--
Regards,
Shalin Shekhar Mangar.
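A sketch of the enable-attribute approach Shalin describes (the handler name, variable name, and shard list are illustrative assumptions; each core would set distribEnabled in its <core> element, with false as the fallback default):

    <!-- solrconfig.xml: only cores that set distribEnabled=true get this handler -->
    <requestHandler name="/distrib" class="solr.SearchHandler" enable="${distribEnabled:false}">
      <lst name="defaults">
        <str name="shards">host1:8983/solr/core0,host2:8983/solr/core0</str>
      </lst>
    </requestHandler>

    <!-- solr.xml -->
    <core name="core0" instanceDir="core0">
      <property name="distribEnabled" value="true"/>
    </core>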
Re: Slave re-replication of index over and over
On Fri, Oct 9, 2009 at 9:49 PM, Moshe Cohen <mos...@gmail.com> wrote:
> Hi,
> I am using Solr 1.4 (July 23rd nightly build) with a master-slave setup.
> I have twice encountered an occurrence of the slave recreating the index
> over and over again. I couldn't find any pointers in the log. Any help
> would be appreciated.

I vaguely remember a bug which caused the slave to loop. Can you upgrade to the latest nightly and see if that solves the problem?

--
Regards,
Shalin Shekhar Mangar.