Re: How to Create a weighted function (dismax or otherwise)
I am trying to create a feature that allows search results to be ranked by the formula sum(weight1 * text relevance score, weight2 * price). weight1 and weight2 are numeric values that can be changed to influence the search results. I am sending the following query params to the Solr instance for searching: q=red&defType=dismax&qf=10^name+2^price

The correct syntax for qf and pf is fieldName^boostFactor, i.e., qf=name^10 price^2. However, your query is a word, so it won't match in the price field; I assume the price field is numeric. You can simulate sum(weight1 * text relevance score, weight2 * price) with the bf parameter and FunctionQueries:

q=red&defType=edismax&qf=name&bf=product(price,w1/w2)

http://wiki.apache.org/solr/FunctionQuery
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
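A quick sketch of composing that suggested query as a URL, in Python. The host, core, and the concrete weight values here are hypothetical; only the parameter names come from the thread.

```python
from urllib.parse import urlencode

# Hypothetical weights chosen by the caller; the thread folds the
# ratio w1/w2 into the bf function.
w1, w2 = 1.0, 2.0
params = {
    "q": "red",
    "defType": "edismax",
    "qf": "name",
    "bf": f"product(price,{w1 / w2})",  # weight ratio as a plain number
}
# Hypothetical local Solr endpoint:
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

urlencode takes care of escaping the parentheses and comma in the function query, which is easy to get wrong when pasting URLs by hand.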
Re: Solr - search queries not returning results
I believe I am missing something very elementary. The following query returns zero hits: http://localhost:8983/solr/core0/select/?q=testabc

With this URL, you are hitting the request handler defined with default="true" in your core0/conf/solrconfig.xml.

However, using solritas, it finds many results: http://localhost:8983/solr/core0/itas?q=testabc

With this one, you are hitting the one registered as <requestHandler name="/itas">.

Do you have any idea what the issue may be?

Probably they have different default parameters configured, for example (e)dismax versus the lucene query parser. The lucene query parser searches for testabc in your default field; dismax searches it in all of the fields defined in the qf parameter. You can see the full parameter list by appending &echoParams=all to your search URL.
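The lucene-parser-vs-dismax difference can be sketched in a few lines of Python. This is a toy illustration only; the field names and the whitespace "tokenization" are mine, not Solr's actual matching logic.

```python
# Hypothetical document: the term only occurs outside the default field.
doc = {"text": "body copy without the term", "title": "testabc lives here"}

def lucene_like(term, doc, default_field="text"):
    # lucene query parser: unqualified terms hit only the default field
    return term in doc[default_field].split()

def dismax_like(term, doc, qf=("text", "title")):
    # dismax: the term is tried against every field listed in qf
    return any(term in doc[field].split() for field in qf)

print(lucene_like("testabc", doc))   # no hit in the default field
print(dismax_like("testabc", doc))   # hit via the title field in qf
```

Same document, same term, different answers, which is exactly why the two request handlers disagree.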
Re: Using RAMDirectoryFactory in Master/Slave setup
...Using RAMDirectory really does not help performance...

I kind of agree, but in my experience with Lucene there are cases where RAMDirectory helps a lot, with all its drawbacks (huge heap and gc() tuning). We had very good experience with MMAP on average, but moving to RAMDirectory with a properly tuned gc() reduced 95% of the slow performers in the upper range of response times (e.g. the slowest 5% of queries). On average it made practically no difference. Maybe this is mitigated by better warm-up on Solr than our hand-tuned warm-up, maybe not; I do not really know.

With MMAP you need a really smart warm-up to beat IO quirks; for RAMDir you need to tune gc(). Choose your poison :) I argue that in some cases it is very hard to tame IO quirks (e.g. disk is a shared resource, and you never know what is really going on in a shared app setup!). Then, look at what happens on a major merge, and all these efforts with the native Linux directory to somehow get a grip on that... If you have spare RAM, you are probably safer with RAMDirectory. From the theoretical perspective, in the ideal case RAM ought to be faster than disk (and more expensive); if this is not the case, we did something wrong.

I have a feeling that the work Mike is doing with in-memory codecs (FST term dictionary, pulsing codec & co.) in Lucene 4, native directory features... will make RAMDirectory really obsolete for production setups.

Cheers, eks

On Wed, Jun 29, 2011 at 6:00 AM, Lance Norskog goks...@gmail.com wrote: Using RAMDirectory really does not help performance. Java garbage collection has to work around all of the memory taken by the segments. It works out that Solr works better (for most indexes) without using the RAMDirectory.

On Sun, Jun 26, 2011 at 2:07 PM, nipunb ni...@walmartlabs.com wrote: PS: Sorry if this is a repost, I was unable to see my message in the mailing list - this may have been due to my outgoing email being different from the one I used to subscribe to the list with.
Overview – Trying to evaluate whether keeping the index in memory using RAMDirectoryFactory can help query performance. I am trying to perform the indexing on the master using solr.StandardDirectoryFactory and make those indexes accessible to the slave using solr.RAMDirectoryFactory.

Details: We have set up Solr in a master/slave environment. The index is built on the master and then replicated to slaves, which are used to serve the queries. The replication is done using the built-in Java replication in Solr.

On the master, in the indexDefaults of solrconfig.xml we have:
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>

On the slave, I tried to use the following in the indexDefaults:
<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>

My slave shows no data for any queries. In solrconfig.xml it is mentioned that replication doesn't work when using RAMDirectoryFactory; however, this (https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use it to have the index on disk and then load it into memory. To test the sanity of my set-up, I changed solrconfig.xml on the slave to the following and replicated:
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>

I was able to see the results. Shouldn't RAMDirectoryFactory be used for reading the index from disk into memory? Any help/pointers in the right direction would be appreciated. Thanks!
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-RAMDirectoryFactory-in-Master-Slave-setup-tp3111792p3111792.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Regex replacement not working!
Hi, I have this bunch of lines in my schema.xml that should do a replacement, but it doesn't work!

<fieldType name="salary_max_text" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)" replacement="$2"/>
  </analyzer>
</fieldType>

I need it to extract only the numbers from some other string. The strings can be anything: only letters (so it should be replaced with an empty string), or letters + numbers. The numbers can be in one of these formats:

17000 -- ok
17,000 -- should be replaced with 17000
17.000 -- should be replaced with 17000
17k -- should be replaced with 17000

How can I accomplish this?
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3120748.html Sent from the Solr - User mailing list archive at Nabble.com.
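Before wiring a pattern like this into a charFilter, it can help to poke at it outside Solr. Here is a sketch in Python's re module (close enough to Java's regex engine for this pattern); the sample strings are from later in this thread.

```python
import re

# the two-capture-group pattern from the schema above
pattern = re.compile(r"([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)")

# two-number strings behave as hoped: group 1 is the first number ($1),
# group 2 the second ($2)
m = pattern.search("£22000 - £25000 per annum + benefits")
print(m.group(1), m.group(2))

# single-number strings are the trap: BOTH groups must match something,
# so the engine backtracks and carves one number into two odd pieces
m2 = pattern.search("17k")
print(m2.groups())
```

Testing a few edge cases like this outside Solr narrows down whether a problem is in the regex itself or in how Solr applies it.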
Re: Using RAMDirectoryFactory in Master/Slave setup
On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote: In MMAP, you need to have really smart warm up (MMAP) to beat IO quirks, for RAMDir you need to tune gc(), choose your poison :) Other alternatives are operating system RAM disks (avoids the GC problem) and using SSDs (nearly the same performance as RAM).
Re: Using RAMDirectoryFactory in Master/Slave setup
Sure, SSD or RAM disks fix these IO problems. Anyhow, I can really see no alternative to some in-memory index for slaves, especially for low-latency master-slave apps (a high commit rate is a problem). Having the possibility to run slaves in memory that are slurping updates from the master seems to me like a preferred method (you need no twiddling with the OS; just CPU and RAM is what you need for your slaves: run a slave and point it to the master).

I assume that update propagation times could be better by having some sexy ReadOnlySlaveRAMDirectorySlurpingUpdatesFromTheMaster that does reload() directly from the master (maybe even uncommitted, somehow NRT-ish). Point being, lower update latency than the current 1-5 minutes (wiki-recommended values) is not going to be possible with the current master-slave solution, due to the nature of it (commit to disk on master, copy delta to slave disk, reload...). This is a lot of ping pong... ES and Solandra are by nature better suited if you need update propagation in the seconds range.

It is just thinking aloud, and slightly off-topic... Solr/Lucene as it is today rocks anyhow.

On Wed, Jun 29, 2011 at 10:55 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote: In MMAP, you need to have really smart warm up (MMAP) to beat IO quirks, for RAMDir you need to tune gc(), choose your poison :) Other alternatives are operating system RAM disks (avoids the GC problem) and using SSDs (nearly the same performance as RAM).
filters effect on search results
Hi, when I query for elegant in Solr I get results for elegance too.

I used these filters for index analysis: WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, SynonymFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory, ReversedWildcardFilterFactory

and for query analysis: WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory

I want to know which filter is affecting my search results.
-
Thanks & Regards, Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/filters-effect-on-search-results-tp3120968p3120968.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
Hi, I have this bunch of lines in my schema.xml that should do a replacement but it doesn't work!

<fieldType name="salary_max_text" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)" replacement="$2"/>
  </analyzer>
</fieldType>

charFilter definitions should go above the tokenizer definition, i.e.:

<analyzer>
  <charFilter .../>
  <tokenizer .../>
  <filter .../>
</analyzer>
Re: filters effect on search results
Hi, when I query for elegant in Solr I get results for elegance too. I used these filters for index analysis: WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, SynonymFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory, ReversedWildcardFilterFactory; and for query analysis: WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory. I want to know which filter is affecting my search results.

It is EnglishPorterFilterFactory; you can verify it from the admin/analysis.jsp page.
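Why a stemming filter makes elegant match elegance can be shown with a toy suffix-stripper. This is NOT the real Porter algorithm, just an illustration of the principle that both words get reduced to the same indexed term.

```python
def toy_stem(word):
    """Crude illustration only: strip a few derivational suffixes,
    far simpler than the actual Porter stemmer."""
    for suffix in ("ance", "ant", "ence", "ent"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

# both collapse to the same term, so a query for one finds the other
print(toy_stem("elegant"), toy_stem("elegance"))
```

Since the index only stores the stemmed form, the original distinction between the two words is gone by the time a query arrives.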
Re: Regex replacement not working!
<fieldType name="salary_min_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

<fieldType name="salary_max_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

This is the final version of my schema part, but what I get is this:

<doc>
  <float name="score">1.0</float>
  <str name="salary">Negotiable</str>
  <str name="salary_max">Negotiable</str>
  <str name="salary_min">Negotiable</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£7 to £8 per hour</str>
  <str name="salary_max">£7 to £8 per hour</str>
  <str name="salary_min">£7 to £8 per hour</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£125 to £150 per day</str>
  <str name="salary_max">£125 to £150 per day</str>
  <str name="salary_min">£125 to £150 per day</str>
</doc>

which is not what I'm expecting... The regular expression works in http://www.fileformat.info/tool/regex.htm without any problem.
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Fuzzy Query Param
Which version of Solr (Lucene) are you using? Recent versions of Lucene accept ~N with N > 1 to be an edit distance, i.e. foobar~2 matches any term that's <= 2 edits away from foobar.

Mike McCandless http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 11:00 PM, entdeveloper cameron.develo...@gmail.com wrote: According to the docs on Lucene query syntax: Starting with Lucene 1.9 an additional (optional) parameter can specify the required similarity. The value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. I was messing around with this and started doing queries with values greater than 1, and it seemed to be doing something. However, I haven't been able to find any documentation on this. What happens when specifying a fuzzy query with a value > 1? tiger~2 animal~3
--
View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3120235.html Sent from the Solr - User mailing list archive at Nabble.com.
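The ~N edit-distance semantics can be sanity-checked against a plain Levenshtein implementation. This is only a sketch of the distance itself; Lucene's FuzzyQuery uses much faster automata internally, and its exact treatment of transpositions varies by version.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance:
    minimum number of single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

# foobar~2 would match any term within distance 2 of "foobar":
print(edit_distance("foobar", "fuobar"))  # one substitution
print(edit_distance("foobar", "fxxbar"))  # two substitutions
```

So under the newer syntax, tiger~2 matches terms reachable from "tiger" in at most two edits, rather than interpreting 2 as a similarity score.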
How to disable Phonetic search
I am using Solr 1.4. When I search for the keyword ansys I get a lot of posts, but when I search for ansys NOT ansi I get nothing. I guess it's because of phonetic search: ansys is converted into ansi (which is the NOT keyword) and nothing returns. How do I handle this kind of problem?
--
Thanks and Regards, Mohammad Shariq
Re: Regex replacement not working!
<fieldType name="salary_min_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

<fieldType name="salary_max_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

This is the final version of my schema part, but what I get is this:

<doc>
  <float name="score">1.0</float>
  <str name="salary">Negotiable</str>
  <str name="salary_max">Negotiable</str>
  <str name="salary_min">Negotiable</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£7 to £8 per hour</str>
  <str name="salary_max">£7 to £8 per hour</str>
  <str name="salary_min">£7 to £8 per hour</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£125 to £150 per day</str>
  <str name="salary_max">£125 to £150 per day</str>
  <str name="salary_min">£125 to £150 per day</str>
</doc>

which is not what I'm expecting...
The regular expression works in http://www.fileformat.info/tool/regex.htm without any problem.

I am not good with regular expressions, but the response always contains the untouched/un-analyzed (stored) version of the fields. You can visually test your fieldType/regex on the admin/analysis.jsp page; it shows the indexed terms step by step.
Re: How to disable Phonetic search
I am using Solr 1.4. When I search for the keyword ansys I get a lot of posts, but when I search for ansys NOT ansi I get nothing. I guess it's because of phonetic search: ansys is converted into ansi (which is the NOT keyword) and nothing returns. How do I handle this kind of problem?

Find and remove occurrences of solr.PhoneticFilterFactory from your schema.xml file.
Re: conditionally update document on unique id
On Wed, Jun 29, 2011 at 2:01 AM, eks dev eks...@yahoo.co.uk wrote: Quick question: is there a way with Solr to conditionally update a document on unique id? Meaning the default add behavior if the id is not already in the index, and not touching the index if it is already there. Deletes are not important (no sync issues). I am asking because I noticed that with deduplication turned on, index files get modified even if I update the same documents again (same signatures). I am facing a very high dupe rate (40-50%), and the setup is going to be master-slave with a high commit rate (the requirement is to reduce propagation latency for updates). Having unnecessary index modifications is going to waste effort shipping the same information again and again. If there is no standard way, what would be the fastest way to check whether a Term exists in the index from an UpdateRequestProcessor?

I'd suggest that you use the searcher's getDocSet with a TermQuery. Use SolrQueryRequest#getSearcher so you don't need to worry about ref counting, e.g.:

req.getSearcher().getDocSet(new TermQuery(new Term(signatureField, sigString))).size();

I intend to extend SignatureUpdateProcessor to prevent a document from propagating down the chain if this happens. Would that be a way to deal with it? I repeat, there are no deletes to cause headaches with synchronization.

Yes, that should be fine.
--
Regards, Shalin Shekhar Mangar.
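The conditional-add-by-signature idea can be sketched outside Solr. This is a toy model, not Solr's actual API: a dict lookup stands in for the TermQuery against the signature field, and md5-of-content stands in for the SignatureUpdateProcessor signature.

```python
import hashlib

index = {}  # signature -> document; stands in for the Lucene index

def signature(doc):
    # toy stand-in for a dedup signature over the document content
    return hashlib.md5(doc["text"].encode("utf-8")).hexdigest()

def conditional_add(doc):
    """Add the document only if its signature is not already indexed."""
    sig = signature(doc)
    if sig in index:
        return False          # duplicate: leave the index untouched
    index[sig] = doc
    return True

print(conditional_add({"text": "hello world"}))  # first add
print(conditional_add({"text": "hello world"}))  # dupe, skipped
```

The point of skipping before the add reaches the index is exactly what the thread is after: identical re-submissions never dirty the segments, so there is nothing new to replicate.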
Re: filters effect on search results
Indeed, I find the Porter stemmer to be too 'aggressive' for my taste. I prefer the EnglishMinimalStemFilterFactory, with the caveat that it depends on your data set.

Cheers, François

On Jun 29, 2011, at 6:21 AM, Ahmet Arslan wrote: Hi, when I query for elegant in Solr I get results for elegance too. I used these filters for index analysis: WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, SynonymFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory, ReversedWildcardFilterFactory; and for query analysis: WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory. I want to know which filter is affecting my search results. It is EnglishPorterFilterFactory; you can verify it from the admin/analysis.jsp page.
Encoding problem while indexing
I am working on indexing Arabic documents containing Arabic diacritics and dotless characters (old Arabic characters). I am using the Apache Tomcat server, and I am using my modified version of the AraMorph analyzer as the Arabic analyzer. I managed, in the development environment, to normalize the Arabic diacritics and dotless characters (same concept as in solr.ArabicNormalizationFilterFactory), and I can verify that the analyzer is working fine and that I get the correct stem for Arabic words. The input text file for testing has UTF-8 encoding.

When I build the AraMorph jar file and place it under the Solr lib, the diacritics and the dotless characters split the word. I made sure that server.xml contains URIEncoding="UTF-8". I also made sure that the text being sent to Solr using SolrJ is UTF-8 encoded, for example:

solr.addBean(new Doc(4, new String("حِباًَ".getBytes("UTF8"))));

but nothing is working. I tried the analyze link on the Solr admin for both indexing and querying, and both show that the Arabic word is split if a diacritic or dotless character is found. Do you have any idea what the problem might be?

Schema snippet:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index" class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
  <analyzer type="query" class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
</fieldType>

I also added the following parameter to the JVM: -Dfile.encoding=UTF-8

Thanks, engy
Re: Regex replacement not working!
Index Analyzer:

org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_31}
  position: 1
  term text: £22000 - £25000 per annum + benefits
  startOffset: 0
  endOffset: 36

org.apache.solr.analysis.PatternReplaceFilterFactory {replacement=$2, pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*, luceneMatchVersion=LUCENE_31}
  position: 1
  term text: 25000
  startOffset: 0
  endOffset: 36

This is my output for the field salary_max; it seems to be working from the admin jsp interface.
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121353.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
Index Analyzer:

org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_31}
  position: 1
  term text: £22000 - £25000 per annum + benefits
  startOffset: 0
  endOffset: 36

org.apache.solr.analysis.PatternReplaceFilterFactory {replacement=$2, pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*, luceneMatchVersion=LUCENE_31}
  position: 1
  term text: 25000
  startOffset: 0
  endOffset: 36

This is my output for the field salary_max; it seems to be working from the admin jsp interface.

That's good to know. If you explain your final goal in detail, users can give better pointers.
Re: Regex replacement not working!
I have the string "You may earn 25k dollars per week" stored in the field salary. I'm using 2 copyFields, salary_min and salary_max, with source salary, and these 2 data types: salary is text, salary_min is salary_min_text, salary_max is salary_max_text. So I was expecting this: Solr updates its index; Solr copies the value from salary to salary_min and applies the regex to the value; Solr copies the value from salary to salary_max and applies the regex to the value. But it's not working: it copies the value from one field to another, but the filter isn't applied, even though it's working as you could see.
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121386.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
I have the string "You may earn 25k dollars per week" stored in the field salary. I'm using 2 copyFields, salary_min and salary_max, with source salary, and these 2 data types: salary is text, salary_min is salary_min_text, salary_max is salary_max_text. So I was expecting this: Solr updates its index; Solr copies the value from salary to salary_min and applies the regex to the value; Solr copies the value from salary to salary_max and applies the regex to the value. But it's not working: it copies the value from one field to another, but the filter isn't applied, even though it's working as you could see.

Okay, that makes sense. copyField just copies the content; it has nothing to do with analyzers. Two solutions come to my mind:

1) If you are using the data import handler, I think (I am not good with regex) you can use the regex transformer to populate these two fields: http://wiki.apache.org/solr/DataImportHandler#RegexTransformer

2) If not, you can populate these two fields in a custom UpdateRequestProcessor. There is an example to modify and start from here: http://wiki.apache.org/solr/UpdateRequestProcessor
what is solr clustering component
I just went through the Solr wiki page for clustering, but I am not getting what the benefit of using clustering is. Can anyone tell me what clustering actually is and what its use in indexing and searching is? Does it affect search results? Please reply.
-
Thanks & Regards, Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/what-is-solr-clustering-component-tp3121484p3121484.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
OK, but I'm not applying the filtering on the copyFields. This is how my schema looks:

<field name="salary" type="text" indexed="true" stored="true" />
<field name="salary_min" type="salary_min_text" indexed="true" stored="true" />
<field name="salary_max" type="salary_max_text" indexed="true" stored="true" />

<copyField source="salary" dest="salary_min" />
<copyField source="salary" dest="salary_max" />

and the two data types defined before. That's why I thought I could first use copyField to copy the value, then index them with my two data types' filtering...
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: filters effect on search results
The admin/analysis.jsp page shows RemoveDuplicatesTokenFilterFactory, ReversedWildcardFilterFactory, EnglishPorterFilterFactory.
-
Thanks & Regards, Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/filters-effect-on-search-results-tp3120968p3121506.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to disable Phonetic search
I was using SnowballPorterFilterFactory for stemming, and that stemmer was stemming the words. I added the keyword ansys to the protwords.txt file. Now the stemming is not happening for ansys, and it's OK now.

On 29 June 2011 17:12, Ahmet Arslan iori...@yahoo.com wrote: I am using Solr 1.4. When I search for the keyword ansys I get a lot of posts, but when I search for ansys NOT ansi I get nothing. I guess it's because of phonetic search: ansys is converted into ansi (which is the NOT keyword) and nothing returns. How do I handle this kind of problem? Find and remove occurrences of solr.PhoneticFilterFactory from your schema.xml file.
--
Thanks and Regards, Mohammad Shariq
Re: Regex replacement not working!
Hi Samuele,

It's not clear to me whether your goal is to search on that field (for example, salary_min:[100 TO 200]) or to show the transformed field to the user (i.e. you want the result of the regex replacement to be included in the search results). If your goal is to show the results to the user, then (as Ahmet said in a previous mail) it won't work, because the content of the documents is stored verbatim; the analysis only affects the way documents are searched. If your goal is to search, could you please show us the query that you're using to test the use case?

Thanks!
Juan

On Wed, Jun 29, 2011 at 10:02 AM, samuele.mattiuzzo samum...@gmail.com wrote: OK, but I'm not applying the filtering on the copyFields. This is how my schema looks:

<field name="salary" type="text" indexed="true" stored="true" />
<field name="salary_min" type="salary_min_text" indexed="true" stored="true" />
<field name="salary_max" type="salary_max_text" indexed="true" stored="true" />

<copyField source="salary" dest="salary_min" />
<copyField source="salary" dest="salary_max" />

and the two data types defined before. That's why I thought I could first use copyField to copy the value, then index them with my two data types' filtering...
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
On 29.06.2011 12:30, samuele.mattiuzzo wrote:
<fieldType name="salary_min_text" class="solr.TextField"> <analyzer type="index"> ... this is the final version of my schema part, but what I get is this: <doc> <float name="score">1.0</float> <str name="salary">Negotiable</str> <str name="salary_max">Negotiable</str> <str name="salary_min">Negotiable</str> </doc> ...

The mistake is that you assume the filter is applied to the stored result. This is not true. Index filters only affect the index (as the name says), not the stored contents. Therefore, if you have copyFields that are stored, they'll always return the same value as the original field. Try inspecting your index data with Luke or the admin console; then you'll see whether your regex applies.

Greetings, Kuli
Re: Regex replacement not working!
My goal is/was storing the transformed value in the field, and I get that I have to create my own update handler. I was trying to query with salary_min:[100 TO 200] and it's actually working... Since I just need it for searching, I'll stay with this solution.

Is the [100 TO 200] range a performance killer? I remember reading something about that somewhere, but cannot find it again...
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121625.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr - search queries not returning results
Thanks to both of you, I understand now and am getting the expected results. Cheers!

On Wed, Jun 29, 2011 at 2:21 AM, Ahmet Arslan iori...@yahoo.com wrote: I believe I am missing something very elementary. The following query returns zero hits: http://localhost:8983/solr/core0/select/?q=testabc With this URL, you are hitting the request handler defined with default="true" in your core0/conf/solrconfig.xml. However, using solritas, it finds many results: http://localhost:8983/solr/core0/itas?q=testabc With this one, you are hitting the one registered as <requestHandler name="/itas">. Do you have any idea what the issue may be? Probably they have different default parameters configured, for example (e)dismax versus the lucene query parser. The lucene query parser searches for testabc in your default field; dismax searches it in all of the fields defined in the qf parameter. You can see the full parameter list by appending &echoParams=all to your search URL.
Re: Regex replacement not working!
My goal is/was storing the transformed value in the field, and I get that I have to create my own update handler. I was trying to query with salary_min:[100 TO 200] and it's actually working... Since I just need it for searching, I'll stay with this solution. Is the [100 TO 200] range a performance killer? I remember reading something about that somewhere, but cannot find it again...

Please be aware that your range query is working on strings, so it will return unwanted results: string sorting and integer sorting are different. If you are after range queries, you need to define the salary_min and salary_max fields as trie-based types (tint, tdouble, etc.) and populate them with the update processor or on the client side.
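The client-side option can be sketched in Python: extract numeric (min, max) values from the free-text salary before sending the document to Solr, so the trie-typed fields receive real integers. The helper name, field names, and format assumptions here are mine, not from the thread.

```python
import re

def parse_salary_range(text):
    """Pull (min, max) integers out of a free-text salary string.
    Handles 17000 / 17,000 / 17.000 / 17k; returns None when the
    string has no numbers at all (e.g. 'Negotiable')."""
    values = []
    for tok in re.findall(r"\d+(?:[.,]\d{3})?k?", text.lower()):
        if tok.endswith("k"):
            values.append(int(float(tok[:-1]) * 1000))
        else:
            # treat '.' and ',' as thousands separators
            values.append(int(tok.replace(",", "").replace(".", "")))
    if not values:
        return None
    return min(values), max(values)

# populate the numeric fields before indexing the document
doc = {"salary": "£125 to £150 per day"}
rng = parse_salary_range(doc["salary"])
if rng:
    doc["salary_min"], doc["salary_max"] = rng
print(doc)
```

Documents with no parseable salary simply omit the numeric fields, so a range query like salary_min:[100 TO 200] skips them instead of matching "Negotiable" lexicographically.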
Re: what is solr clustering component
I just went through the Solr wiki page for clustering, but I am not getting what the benefit of using clustering is. Can anyone tell me what clustering actually is and what its use in indexing and searching is? Does it affect search results? Please reply.

It is for search result clustering. Try the demo with the query word jaguar: http://search.carrot2.org/stable/search It generates clusters and labels (on the left).
Re: Regex replacement not working!
OK, last question on the UpdateProcessor: can you please give me the steps to implement my own? I mean, I can put my custom processor in Solr's code, and then what? I don't understand how I have to change solrconfig.xml and how I can bind that to the updater I just wrote, and also I don't understand how I have to change schema.xml. I'm sorry for this question, but I started working on Solr 5 days ago and for some things I really need a lot of documentation, and this isn't fully covered anywhere -- View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
OK, last question on the UpdateProcessor: can you please give me the steps to implement my own? I mean, I can put my custom processor in Solr's code, and then what? I don't understand how I have to change solrconfig.xml and how I can bind that to the updater I just wrote, and also I don't understand how I have to change schema.xml. I'm sorry for this question, but I started working on Solr 5 days ago and for some things I really need a lot of documentation, and this isn't fully covered anywhere. Implementing a conditional copyField example is a good place to start; you can use it as a template. You don't need to modify the Solr source code for this: write your class, compile it, and put the resulting jar into the solrHome/lib directory. How to register your new update processor in solrconfig.xml is explained here: http://wiki.apache.org/solr/SolrPlugins#UpdateRequestProcessorFactory
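Concretely, the registration the wiki describes amounts to a solrconfig.xml fragment along these lines; the chain name and the com.example factory class are hypothetical placeholders for your own processor, and depending on version the parameter selecting the chain is update.chain (3.x) or update.processor (1.4):

```xml
<!-- Custom chain: your processor runs first, then the stock
     Log/Run processors actually apply the update. -->
<updateRequestProcessorChain name="mychain">
  <processor class="com.example.ConditionalCopyProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Point the update handler at the chain by default -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>
```

No schema.xml change is required for the processor itself; the schema only needs to define whatever fields your processor writes to.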
Re: Regex replacement not working!
I have had the same problems with regex, and I went with the regular pattern replace filter rather than the charfilter. Only when I added it to the very end of the chain would it work... I am on Solr 3.2. I have also noticed that the HTML strip filter factory is not working either: when I dump the field that it's supposed to be working on, all the hyperlinks and everything that you would expect to be stripped are still present. Adam On Wed, Jun 29, 2011 at 10:04 AM, samuele.mattiuzzo samum...@gmail.com wrote: ok, last question on the UpdateProcessor: can you please give me the steps to implement my own? i mean, i can push my custom processor in solr's code, and then what? i don't understand how i have to change the solrconf.xml and how can i bind that to the updater i just wrote and also i don't understand how i do have to change the schema.xml i'm sorry for this question, but i started working on solr 5 days ago and for some things i really need a lot of documentation, and this isn't fully covered anywhere -- View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html Sent from the Solr - User mailing list archive at Nabble.com.
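For comparison, this is the shape of an analyzer chain with the pattern-replace token filter placed last, as described above (the type name and regex are placeholders). One caveat worth noting, as an observation rather than something confirmed in this thread: analysis only changes the indexed tokens, never the stored value, so dumping a stored field will always show the raw input with hyperlinks intact even when HTML stripping is working at index time.

```xml
<fieldType name="text_clean" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- char filters run on the raw input, before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- token-filter version of pattern replace, at the end of the chain -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-z0-9]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

The analysis page of the admin UI shows the token stream after each stage, which is a quicker way to verify the filter than dumping the stored field.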
Re: Regex replacement not working!
Too bad it is still in TODO; that's why I was asking for some tips on writing, compiling, registration, calling... -- View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121856.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
Too bad it is still in TODO; that's why I was asking for some tips on writing, compiling, registration, calling... Here is general information about how to customize Solr via plugins: http://wiki.apache.org/solr/SolrPlugins Here is the registration and a code example: http://wiki.apache.org/solr/UpdateRequestProcessor
Solr 3.2 filter cache warming taking longer than 1.4.1
I have noticed a significant difference in filter cache warming times on my shards between 3.2 and 1.4.1. What can I do to troubleshoot this? Please let me know what additional information you might need to look deeper; I know this isn't enough. It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 seconds to do an autowarm count of 4 on 3.2. The only explicit warming query is *:*, sorted descending by post_date, a tlong field containing a UNIX timestamp, precisionStep 16. The indexes are not entirely identical, but the new one did evolve from the old one. Perhaps one of the experts might spot something that makes for much slower filter cache warming, or some way to look deeper if this seems wrong? Is there a way to see the search URL bits that populated the cache? Index differences: The new index has four extra small fields, is no longer removing stopwords, and has omitTermFreqAndPositions enabled on a significant number of fields. Most of the fields are tokenized text, and now more than half of those don't have tf and tp enabled. Naturally, the largest text field, where most of the matches happen, still does have them enabled. To increase reindex speed, the new index has a termIndexInterval of 1024; the old one is at the default of 128. In terms of raw size, the new index is less than one percent larger than the old one. The old shards average out to 17.22GB, the new ones to 17.41GB. Here's an overview of the differences for each type of file (comparing the huge optimized segment only, not the handful of tiny ones since), on the index with the largest size gap, old value listed first:
fdt: 6317180127 / 6055634923 (4.1% decrease)
fdx: 76447972 / 75647412 (1% decrease)
fnm: 382 / 338 (44 bytes! woohoo!)
frq: 2828400926 / 2873249038 (1.5% increase)
nrm: 28367782 / 38223988 (35% increase)
prx: 2449154203 / 2684249069 (9.5% increase)
tii: 1686298 / 13329832 (790% increase)
tis: 923045932 / 999294109 (8% increase)
tvd: 18910972 / 19111840 (1% increase)
tvf: 5867309063 / 5640332282 (3.9% decrease)
tvx: 151294820 / 152895940 (1% increase)
The tii and nrm files are the only ones that saw a significant size increase, but the tii file is MUCH bigger. Thanks, Shawn
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference. -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 10:59 AM, Shawn Heisey s...@elyograg.org wrote: I have noticed a significant difference in filter cache warming times on my shards between 3.2 and 1.4.1. What can I do to troubleshoot this? Please let me know what additional information you might need to look deeper. I know this isn't enough. It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 seconds to do an autowarm count of 4 on 3.2. The only explicit warming query is *:*, sorted descending by post_date, a tlong field containing a UNIX timestamp, precisionStep 16. The indexes are not entirely identical, but the new one did evolve from the old one. Perhaps one of the experts might spot something that makes for much slower filter cache warming, or some way to look deeper if this seems wrong? Is there a way to see the search URL bits that populated the cache? Index differences: The new index has four extra small fields, is no longer removing stopwords, and has omitTermFreqAndPositions enabled on a significant number of fields. Most of the fields are tokenized text, and now more than half of those don't have tf and tp enabled. Naturally the largest text field where most of the matches happen still does have them enabled. To increase reindex speed, the new index has a termIndexInterval of 1024, the old one is at the default of 128. In terms of raw size, the new index is less than one percent larger than the old one. The old shards average out to 17.22GB, the new ones to 17.41GB. 
Here's an overview of the differences of each type of file (comparing the huge optimized segment only, not the handful of tiny ones since) on one the index with the largest size gap, old value listed first: fdt: 6317180127/6055634923 (4.1% decrease) fdx: 76447972/75647412 (1% decrease) fnm: 382, 338 (44 bytes! woohoo!) frq: 2828400926/2873249038 (1.5% increase) nrm: 28367782/38223988 (35% increase) prx: 2449154203/2684249069 (9.5% increase) tii: 1686298/13329832 (790% increase) tis: 923045932/999294109 (8% increase) tvd: 18910972/19111840 (1% increase) tvf: 5867309063/5640332282 (3.9% decrease) tvx: 151294820/152895940 (1% increase) The tii and nrm files are the only ones that saw a significant size increase, but the tii file is MUCH bigger. Thanks, Shawn
Solr just 'hangs' under load test - ideas?
Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64-bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16-processor box with 48GB physical memory. I've run a successful load test at a 100-user load (at that rate there are about 5-10 Solr searches/second), and Solr search responses were coming in under 100ms. When I tried to ramp up, as far as I can tell, Solr is just hanging. (We have some logging statements around the SolrJ calls - just before, we log how long our query construction takes, then we run the SolrJ query and log the search times. We're getting a number of the query construction logs, but no corresponding search time logs.) Symptoms: The Tomcat and JBoss processes show as well under 1% CPU, and they are still the top processes. CPU states show around 99% idle. RES usage for the two Java processes is around 3GB each. LWP under 120 for each. STATE just shows as sleep. JBoss is still 'alive', as I can get into a piece of software that talks to our JBoss app to get data. We set things up to use log4j logging for Solr - the log isn't showing any errors or exceptions. We're not indexing - just searching. Back in January, we did load testing on a prototype, and had no problems (though that was Solr 1.4 at the time). It ramped up beautifully - bottlenecks were our apps, not Solr. What I'm benchmarking now is a descendant of that prototyping - a bit more complex on searches and more fields in the schema, but same basic search logic as far as SolrJ usage. Any ideas? What else to look at?
Ringing any bells? I can send more details if anyone wants specifics... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.comhttp://www.sirsidynix.com/
[Announce] Solr 3.2 with RankingAlgorithm NRT capability, very high performance 1428 tps
Hi! I would like to announce that Solr 3.2 with RankingAlgorithm now has Near Real Time (NRT) capability. The NRT performance is very high: 1428 documents/sec [MBArtists 390k index]. The NRT functionality allows you to add documents without the IndexSearchers being closed or caches being cleared. A commit is not needed with the document update. Searches can run concurrently with document updates. No changes are needed except for enabling NRT through solrconfig.xml. A new visible attribute has been introduced that allows one to tune the visibility of a document added to the index. The default is 150ms. This can be set to 0, enabling documents to become visible for searches as soon as they are added. The visible attribute is added as below: <realtime visible="150">true</realtime> With the visible attribute at 200ms, the performance is about 1428 TPS (document adds) on a dual-core Intel system with a 2GB heap, with searches in parallel. I have a wiki page that describes NRT performance in detail; it can be accessed here: http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver3.2 You can download Solr 3.2 with RankingAlgorithm (NRT version) from here: http://solr-ra.tgels.com I would like to invite you to give this version a try, as the performance is very high, comparable to the default load. Regards, - Nagendra Nagarajayya http://solr-ra.tgels.com http://rankingalgorithm.tgels.com
Field Value Highlighting
Hi, I need help in figuring out the right configuration to perform highlighting in Solr. I can retrieve the matching documents plus the highlighted matches. I've used another tool called dtSearch, which would return the offset positions of the field value to highlight. I've tried a few different configurations, but it appears that Solr returns the actual matched documents plus a section called highlighting, with snippets (which can be configured to have a length of 'X'). I was wondering if there is a way to retrieve just the actual documents with highlighted values, or a way to retrieve the offset positions of the field values so that I can perform the highlighting myself. I am using the SolrNet client to integrate with Solr. I've also tweaked the configs and used the web admin interface to test highlighting, but have not yet been successful. Thank you in advance. Z
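One possibility worth trying (a suggestion on my part, not something confirmed in this thread): the standard highlighter can return the whole field value as a single highlighted fragment when the fragment size is set to zero, which avoids reconstructing offsets client-side:

```
http://localhost:8983/solr/select?q=engine&hl=true&hl.fl=text&hl.fragsize=0&hl.snippets=1
```

With hl.fragsize=0, the snippet in the highlighting section is the entire field value with the match markup embedded, so the highlighted text can be displayed directly in place of the stored field.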
Re: Solr just 'hangs' under load test - ideas?
Can you get a thread dump to see what is hanging? -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64 bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores, with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr Index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16 processor box with 48GB physical memory. I've run a successful load test at a 100 user load (at that rate there are about 5-10 solr searches / second), and solr search responses were coming in under 100ms. When I tried to ramp up, as far as I can tell, Solr is just hanging. (We have some logging statements around the SolrJ calls - just before, we log how long our query construction takes, then we run the SolrJ query and log the search times. We're getting a number of the query construction logs, but no corresponding search time logs). Symptoms: The Tomcat and JBoss processes show as well under 1% CPU, and they are still the top processes. CPU states show around 99% idle. RES usage for the two Java processes around 3GB each. LWP under 120 for each. STATE just shows as sleep. JBoss is still 'alive', as I can get into a piece of software that talks to our JBoss app to get data. We set things up to use log4j logging for Solr - the log isn't showing any errors or exceptions. We're not indexing - just searching. Back in January, we did load testing on a prototype, and had no problems (though that was Solr 1.4 at the time). It ramped up beautifully - bottle necks were our apps, not Solr. 
What I'm benchmarking now is a descendent of that prototyping - a bit more complex on searches and more fields in the schema, but same basic search logic as far as SolrJ usage. Any ideas? What else to look at? Ringing any bells? I can send more details if anyone wants specifics... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.comhttp://www.sirsidynix.com/
CopyField into another CopyField?
In solr, is it possible to 'chain' copyfields so that you can copy the value of one into another? Example: <field name="title" ... /> <field name="author" ... /> <field name="name" ... /> <field name="autocomplete" ... /> <field name="ac_spellcheck" ... /> <copyField source="title" dest="autocomplete" /> <copyField source="author" dest="autocomplete" /> <copyField source="name" dest="autocomplete" /> <copyField source="autocomplete" dest="ac_spellcheck" /> Point being, every time I add a new field to the autocomplete, I want it to automatically also be added to ac_spellcheck without having to do it twice. -- View this message in context: http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-tp3122408p3122408.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Fuzzy Query Param
I'm using Solr trunk. If it's Levenshtein/edit distance, that's great; that's what I want. It just didn't seem to be officially documented anywhere, so I wanted to find out for sure. Thanks for confirming. -- View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html Sent from the Solr - User mailing list archive at Nabble.com.
Sorting by value of field
Hi, Say I have a field named type in multiple documents, which can be either type:bike, type:boat, type:car, or type:van, and I want to order a search to give me documents in the following order: type:car, type:van, type:boat, type:bike. Is there a way I can do this just using the sort parameter? Thanks
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 9:17 AM, Yonik Seeley wrote: Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference. The query cache warms very quickly, it's the filter cache that's taking forever. I'm not intimately familiar with what is being put in our filter queries by our webapp, but I'd be a little surprised if there are stopwords there. A quick grep through solr logs (when I've turned it up to INFO) for the really common ones didn't reveal any. People do type them in fairly frequently, but they go into q= ... fq values are constructed internally, not from what a user types, and as far as I know, they involve fields that have never had stopwords removed. I will do some experimentation with your suggestions. Thanks, Shawn
RE: Sorting by value of field
You could try adding a new int field (like typeSort) that has the desired sort values. So when adding a document with type:car, also add typeSort:1; when adding type:van, also add typeSort:2; etc. Then you could do sort=typeSort asc to get them in your desired order. I think this is also possible with custom function queries, but I've never done that. -Michael
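A sketch of the indexing side of this workaround, in standard Solr XML update format (the car=1, van=2, boat=3, bike=4 mapping follows the desired order from the question):

```xml
<add>
  <doc>
    <field name="type">car</field>
    <field name="typeSort">1</field>
  </doc>
  <doc>
    <field name="type">van</field>
    <field name="typeSort">2</field>
  </doc>
  <doc>
    <field name="type">boat</field>
    <field name="typeSort">3</field>
  </doc>
  <doc>
    <field name="type">bike</field>
    <field name="typeSort">4</field>
  </doc>
</add>
```

Querying with sort=typeSort asc then returns the documents in car, van, boat, bike order.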
Methods for preserving text entities?
We have some text entities in fields to index (and search) like so: <field name="text">Solr is a really &myword; search engine!</field> I would like to preserve/protect &myword; and not resolve it in the indexing or search results. What sort of methods have people used? I realize the results are returned in XML format, so preserving these text entities may be hard. Are people replacing the character or doing something else? Thanks in advance!
Re: Default schema - 'keywords' not multivalued
On 06/28/2011 12:04 PM, Chris Hostetter wrote: : I'm streaming over the document content (presumably via tika) and its : gathering the document's metadata which includes the keywords metadata field. : Since I'm also passing that field from the DB to the REST call as a list (as : you suggested) there is a collision because the keywords field is single : valued. : : I can change this behavior using a copy field. What I wanted to know is if : there was a specific reason the default schema defined a field like keywords : single valued so I could make sure I wasn't missing something before I changed : things. That file is just an example; you're absolutely free to change it to meet your use case. I'm not very familiar with Tika, but based on the comment in the example config... <!-- Common metadata fields, named specifically to match up with SolrCell metadata when parsing rich documents such as Word, PDF. Some fields are multiValued only because Tika currently may return multiple values for them. --> ...I suspect it was intentional that that field is *not* multiValued (I guess Tika always returns a single delimited value?), but if you have multiple discrete values you want to send for your DB-backed data, there is no downside to changing that. : While I'm at it, I'd REALLY like to know how to use DIH to index the metadata : from the database while simultaneously streaming over the document content and : indexing it. I've never quite figured it out yet but I have to believe it is : a possibility. There's a TikaEntityProcessor that can be used to have Tika crunch the data that comes from an entity and extract out specific fields, and it can be used in combination with a JdbcDataSource and a BinFileDataSource so that a field in your db data specifies the name of a file on disk to use as the TikaEntity -- but I've personally never tried it. Here's a simple example someone posted last year that they got working...
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html -Hoss Thanks Hoss, I'll just change the schema then. The problem with TikaEntityProcessor is this installation is still running v1.4.1 so I'll need to upgrade. Any short and sweet instructions for upgrading to 3.2? I have a pretty straight forward Tomcat install, would just dropping in the new war suffice? - Tod
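The JdbcDataSource + BinFileDataSource + TikaEntityProcessor combination Hoss describes would look roughly like this in DIH's data-config.xml. The driver, connection URL, table, column names, and target field names below are all assumptions for illustration, not values from the thread:

```xml
<dataConfig>
  <!-- DB rows carry the metadata plus a path to the file on disk -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/docs" user="reader"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="meta" dataSource="db"
            query="SELECT id, title, filepath FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <!-- Tika parses the file named by the outer entity's row -->
      <entity name="file" processor="TikaEntityProcessor"
              dataSource="bin" url="${meta.filepath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The nested entity runs once per database row, so each indexed document ends up with the DB metadata and the Tika-extracted body together.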
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 11:27 AM, Shawn Heisey wrote: On 6/29/2011 9:17 AM, Yonik Seeley wrote: Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference. The query cache warms very quickly, it's the filter cache that's taking forever. I'm not intimately familiar with what is being put in our filter queries by our webapp, but I'd be a little surprised if there are stopwords there. A quick grep through solr logs (when I've turned it up to INFO) for the really common ones didn't reveal any. People do type them in fairly frequently, but they go into q= ... fq values are constructed internally, not from what a user types, and as far as I know, they involve fields that have never had stopwords removed. I should add that this happens only after the index has had at least a few hundred queries, when deletes are committed. The delete process runs every ten minutes, and checks for document presence before issuing the delete, which avoids unnecessary commits. Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries. I know it's not my single *:* sorted warming query (firstSearcher and newSearcher), because on solr startup with either version, warm time is 0.01 seconds. I have useColdSearcher set to false. Thanks, Shawn
Re: Methods for preserving text entities?
Ah, I think I suddenly answered my own question, but I'd appreciate further insight if you have it. I converted the & in &myword; to an &amp; so it looks like this: <field name="text">Solr is a really &amp;myword; search engine!</field> On Wed, Jun 29, 2011 at 12:40 PM, Walter Closenfleight walter.p.closenflei...@gmail.com wrote: We have some text entities in fields to index (and search) like so: <field name="text">Solr is a really &myword; search engine!</field> I would like to preserve/protect &myword; and not resolve it in the indexing or search results. What sort of methods have people used? I realize the results are returned in XML format, so preserving these text entities may be hard. Are people replacing the character or doing something else? Thanks in advance!
Re: How to Create a weighted function (dismax or otherwise)
Are there any best practices or preferred ways to accomplish what I am trying to do? Do the params for defType, qf and bf belong in a Solr request handler? Is it possible to have the weights as variables so they can be tweaked until we find the optimum balance in showing our results? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-Create-a-weighted-function-dismax-or-otherwise-tp3119977p3122630.html Sent from the Solr - User mailing list archive at Nabble.com.
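One common pattern (a sketch, not something confirmed in this thread) is to put defType, qf, and bf into a request handler's defaults in solrconfig.xml; defaults can still be overridden per request, which gives you a place to tweak the weights without changing client code. The handler name and the 0.2 weight below are illustrative:

```xml
<requestHandler name="/weighted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- text relevance weight lives in qf's boost -->
    <str name="qf">name^10</str>
    <!-- price contribution, scaled by an experimentally chosen factor -->
    <str name="bf">product(price,0.2)</str>
  </lst>
</requestHandler>
```

A request such as /weighted?q=red&bf=product(price,0.5) would then override just the price weight for that query while keeping the other defaults.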
RE: Solr just 'hangs' under load test - ideas?
OK - I figured it out. It's not solr at all (and I'm not really surprised). In the prototype benchmarks, we used a different instance of tomcat than we're using for production load tests. Our prototype tomcat instance had no maxThreads value set, so was using the default value of 200. The production tomcat environment has a maxThreads value of 15 - we were just running out of threads and getting connection refused exceptions thrown when we ramped up the Solr hits past a certain level. Thanks for considering, Yonik (and any others waiting to see any reply I made)... (As others have said - this listserv is great!) Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Wednesday, June 29, 2011 12:18 PM To: solr-user@lucene.apache.org Subject: Re: Solr just 'hangs' under load test - ideas? Can you get a thread dump to see what is hanging? -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64 bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores, with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr Index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16 processor box with 48GB physical memory. I've run a successful load test at a 100 user load (at that rate there are about 5-10 solr searches / second), and solr search responses were coming in under 100ms. 
When I tried to ramp up, as far as I can tell, Solr is just hanging. (We have some logging statements around the SolrJ calls - just before, we log how long our query construction takes, then we run the SolrJ query and log the search times. We're getting a number of the query construction logs, but no corresponding search time logs). Symptoms: The Tomcat and JBoss processes show as well under 1% CPU, and they are still the top processes. CPU states show around 99% idle. RES usage for the two Java processes around 3GB each. LWP under 120 for each. STATE just shows as sleep. JBoss is still 'alive', as I can get into a piece of software that talks to our JBoss app to get data. We set things up to use log4j logging for Solr - the log isn't showing any errors or exceptions. We're not indexing - just searching. Back in January, we did load testing on a prototype, and had no problems (though that was Solr 1.4 at the time). It ramped up beautifully - bottle necks were our apps, not Solr. What I'm benchmarking now is a descendent of that prototyping - a bit more complex on searches and more fields in the schema, but same basic search logic as far as SolrJ usage. Any ideas? What else to look at? Ringing any bells? I can send more details if anyone wants specifics... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.comhttp://www.sirsidynix.com/
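For anyone hitting the same wall: the setting in question is maxThreads on the HTTP Connector in Tomcat's conf/server.xml (the other attributes below are just Tomcat's stock example values):

```xml
<!-- maxThreads caps concurrent request-processing threads;
     Tomcat's default is 200, while this instance had it set to 15 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="200"
           redirectPort="8443"/>
```

When the pool is exhausted, new connections queue or are refused while the server otherwise sits idle, which matches the near-0% CPU symptoms described above.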
Re: Custom Query Processing
Anything is an option, but I think I found another way. I am going to add a new SearchComponent which reads some additional query parameters and builds the appropriate filter. On Tue, Jun 28, 2011 at 2:07 PM, Dmitry Kan dmitry@gmail.com wrote: You should modify the SolrCore for this, if I'm not mistaken. Would extending LuceneQParserPlugin (solr 1.4) be an option for you? On Tue, Jun 28, 2011 at 12:25 AM, Jamie Johnson jej2...@gmail.com wrote: I have a need to take an incoming solr query and apply some additional constraints to it on the Solr end. Our previous implementation used a QueryWrapperFilter along with some custom code to build a new Filter from the query provided. How can we plug this filter into Solr? -- Regards, Dmitry Kan
Looking for Custom Highlighting guidance
I have a schema with a text field and a text_phonetic field and would like to perform highlighting on them in such a way that the tokens that match are combined. What would be a reasonable way to accomplish this?
Re: Default schema - 'keywords' not multivalued
: The problem with TikaEntityProcessor is this installation is still running : v1.4.1 so I'll need to upgrade. : : Any short and sweet instructions for upgrading to 3.2? I have a pretty : straight forward Tomcat install, would just dropping in the new war suffice? It should be fairly straightforward; check the instructions in CHANGES.txt for any potential gotchas. I posted a write-up a while back on upgrading from 1.4 to 3.1 from a user perspective: http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/ -Hoss
Strip Punctuation From Field
From all I've read, using something like PatternReplaceFilterFactory allows you to replace or remove text in the index, but is there anything similar that allows manipulation of the stored text of the associated field? For example, if I pulled a status from Twitter like Hi, this is a #hashtag. I would like to remove the # from that string and use the result for both the index and the field value that is returned from a query, i.e., Hi, this is a hashtag.
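An analyzer-side PatternReplaceFilterFactory only rewrites the indexed tokens; the stored value Solr returns is the raw input, so the '#' would have to be stripped before the document reaches the index (client-side or in an update processor). A minimal client-side sketch of the transformation itself, in plain Java (the string and pattern are just this thread's example, not an official API):

```java
public class StripHash {
    public static void main(String[] args) {
        String status = "Hi, this is a #hashtag.";
        // Same pattern/replacement idea as PatternReplaceFilterFactory,
        // but applied before the document is sent to Solr, so the stored
        // value (and therefore the query response) is also cleaned.
        String cleaned = status.replaceAll("#", "");
        System.out.println(cleaned); // Hi, this is a hashtag.
    }
}
```

Doing the same transformation in a custom UpdateRequestProcessor keeps the cleanup server-side so every client benefits from it.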
Re: Looking for Custom Highlighting guidance
Does the phonetic analysis preserve the offsets of the original text field? If so, you should probably be able to hack up FastVectorHighlighter to do what you want. -Mike On 06/29/2011 02:22 PM, Jamie Johnson wrote: I have a schema with a text field and a text_phonetic field and would like to perform highlighting on them in such a way that the tokens that match are combined. What would be a reasonable way to accomplish this?
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote: Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries. Can you post the logs at the INFO level that covers the warming period? -Yonik http://www.lucidimagination.com
Writing SolrPlugin example
Is there a Solr plugin example similar to Nutch's (http://wiki.apache.org/nutch/WritingPluginExample)? All I found was the SolrPlugins wiki page (http://wiki.apache.org/solr/SolrPlugins), but it didn't have any example code. It would be helpful if there were a concrete example that explains how to write, compile, and build a custom plugin. Thanks, Ravi
Re: conditionally update document on unique id
Thanks Shalin! Would you not expect req.getSearcher().docFreq(t) to be slightly faster? Or maybe even req.getSearcher().getFirstMatch(t) != -1? Which one should be faster, and are there any known side effects? On Wed, Jun 29, 2011 at 1:45 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Jun 29, 2011 at 2:01 AM, eks dev eks...@yahoo.co.uk wrote: Quick question: is there a way with Solr to conditionally update a document on its unique id? Meaning: default add behavior if the id is not already in the index, and *not* touching the index if it is already there. Deletes are not important (no sync issues). I am asking because I noticed that with deduplication turned on, index files get modified even if I update the same documents again (same signatures). I am facing a very high dupe rate (40-50%), and the setup is going to be master-slave with a high commit rate (the requirement is to reduce propagation latency for updates). Having unnecessary index modifications is going to waste effort shipping the same information again and again. If there is no standard way, what would be the fastest way to check if a Term exists in the index from an UpdateRequestProcessor? I'd suggest that you use the searcher's getDocSet with a TermQuery. Use SolrQueryRequest#getSearcher so you don't need to worry about ref counting. e.g. req.getSearcher().getDocSet(new TermQuery(new Term(signatureField, sigString))).size(); I intend to extend SignatureUpdateProcessor to prevent a document from propagating down the chain if this happens. Would that be a way to deal with it? I repeat, there are no deletes to make headaches with synchronization. Yes, that should be fine. -- Regards, Shalin Shekhar Mangar.
Re: conditionally update document on unique id
On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote: req.getSearcher().getFirstMatch(t) != -1; Yep, this is currently the fastest option we have. -Yonik http://www.lucidimagination.com
Re: conditionally update document on unique id
Hi Yonik, as this recommendation comes from you, I am not going to test it; you are well known as a speed junkie ;) While we are there (in SignatureUpdateProcessor), why is this code not moved to the constructor, but remains in processAdd? ... Signature sig = (Signature) req.getCore().getResourceLoader().newInstance(signatureClass); sig.init(params); ... Should we be expecting on-the-fly signatureClass/params changes? I am still not all that familiar with Solr life cycles... might be a stupid question. Thanks, eks On Wed, Jun 29, 2011 at 10:36 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote: req.getSearcher().getFirstMatch(t) != -1; Yep, this is currently the fastest option we have. -Yonik http://www.lucidimagination.com
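For what it's worth, the skip-duplicates idea in this thread can be sketched in plain Java, with a Set standing in for the index's term dictionary (the class and method names here are made up for illustration; in Solr the existence check would be the req.getSearcher().getFirstMatch(t) != -1 call discussed above):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for a dedup-aware update processor: documents whose
// signature is already "indexed" are dropped before reaching the chain.
public class SignatureDeduper {
    private final Set<String> indexedSignatures = new HashSet<>();
    private final List<String> accepted = new ArrayList<>();

    // Returns true if the doc was passed down the chain, false if skipped.
    public boolean processAdd(String docId, String signature) {
        if (!indexedSignatures.add(signature)) {
            return false; // signature already present: do not touch the index
        }
        accepted.add(docId);
        return true;
    }

    public List<String> accepted() {
        return accepted;
    }

    public static void main(String[] args) {
        SignatureDeduper dedup = new SignatureDeduper();
        dedup.processAdd("doc1", "sigA");
        dedup.processAdd("doc2", "sigA"); // duplicate signature, skipped
        dedup.processAdd("doc3", "sigB");
        System.out.println(dedup.accepted()); // [doc1, doc3]
    }
}
```

With a 40-50% dupe rate, dropping the duplicate adds before they reach the index avoids rewriting (and replicating) segments that carry no new information.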
Re: After the query component has the results, can I do more filtering on them?
bump -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3123502.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting by value of field
Thanks, yes this is the workaround I am currently doing. Still wondering if the sort method can be used alone. On 29 June 2011 18:34, Michael Ryan mr...@moreover.com wrote: You could try adding a new int field (like typeSort) that has the desired sort values. So when adding a document with type:car, also add typeSort:1; when adding type:van, also add typeSort:2; etc. Then you could do sort=typeSort asc to get them in your desired order. I think this is also possible with custom function queries, but I've never done that. -Michael
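Michael's suggestion amounts to materializing the custom order as a sortable integer at index time. A minimal stand-alone sketch of the idea (the type names and ranks are just the example's; in Solr the mapping would happen when you build each document's typeSort field):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Assign each type its desired rank once, at "index time",
// then sorting on that rank reproduces the custom order.
public class TypeSort {
    static final Map<String, Integer> TYPE_SORT = new HashMap<>();
    static {
        TYPE_SORT.put("car", 1);
        TYPE_SORT.put("van", 2);
        TYPE_SORT.put("truck", 3);
    }

    // Equivalent of sort=typeSort asc: order by the mapped rank.
    public static String[] sortByType(String[] types) {
        String[] out = types.clone();
        Arrays.sort(out, Comparator.comparingInt(
                (String t) -> TYPE_SORT.getOrDefault(t, Integer.MAX_VALUE)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                sortByType(new String[]{"van", "truck", "car"})));
        // [car, van, truck]
    }
}
```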
Re: After the query component has the results, can I do more filtering on them?
So I made a custom search component which runs right after the query component, and this custom component updates the score of each document based on some things (and no, I definitely can't use existing components). I didn't see any easy way to just update the score, so what I currently do is something like this:

DocList docList = rb.getResults().docList;
float[] scores = new float[docList.size()];
int[] docs = new int[docList.size()];
int docCounter = 0;
float maxScore = 0;
DocIterator iter = docList.iterator();
while (iter.hasNext()) {
    int userId = iter.nextDoc();
    float score = userIdsToScore.get(userId);
    scores[docCounter] = score;
    docs[docCounter] = userId;
    docCounter++;
    if (maxScore < score) {
        maxScore = score;
    }
}
docList = new DocSlice(0, docCounter, docs, scores, 0, maxScore);

My userIdsToScore hashtable is how I'm determining the new score. There are a few other things I'm doing, but this is the gist. I'm also not sure how to go about sorting this... but basically my question is, is this how I should be updating the score of the documents? This way you are just updating the scores cosmetically, i.e. they are not sorted by score anymore. Plus, with this approach you can only process start + rows documents at maximum; obtaining the whole result set is not an option. If you have some mapping like userIdsToScore, maybe you can use ExternalFileField combined with FunctionQueries to influence the score.
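To illustrate the "cosmetic scores" point: overwriting the score array does not reorder the hits, so after re-scoring, the doc ids also have to be re-sorted by the new scores. A plain-Java sketch with hypothetical ids and scores (no Solr classes involved):

```java
import java.util.Arrays;
import java.util.Comparator;

public class Rescore {
    // Re-sorts doc ids by their new scores, descending, the way
    // the result list would normally be ordered. Returns the ids
    // in the new order.
    public static int[] sortByScoreDesc(int[] docs, float[] scores) {
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Negate so that higher scores sort first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        int[] sorted = new int[docs.length];
        for (int i = 0; i < order.length; i++) sorted[i] = docs[order[i]];
        return sorted;
    }

    public static void main(String[] args) {
        int[] docs = {10, 20, 30};
        float[] newScores = {0.2f, 0.9f, 0.5f}; // scores after the component ran
        System.out.println(Arrays.toString(sortByScoreDesc(docs, newScores)));
        // [20, 30, 10]
    }
}
```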
Building a facet search filter frontend in XSLT
Hi all, I am looking for some help in building a front-end facet filter using XSLT. The code I use is: http://pastebin.com/xVv9La9j On the image attached, the checkbox should be selected. (You clicked and submitted the facet form; the URL changed.) I can use xsl:if, but there's nothing in the XML that I can use to test before outputting the input checkbox. Has anyone done a similar thing? I haven't seen any examples building a facet search filter frontend in XSLT; the example.xsl that comes with Solr is pretty basic. Are there any other examples in XSLT implementing facet filters around? Thanks, Filype
Re: How to Create a weighted function (dismax or otherwise)
Are there any best practices or preferred ways to accomplish what I am trying? People usually prefer multiplicative boosting, but in your case you want additive boosting. Dismax's bf is additive. There is also the _val_ hook. http://wiki.apache.org/solr/SolrQuerySyntax Do the params for defType, qf and bf belong in a solr request handler? They can be defined in the defaults section of a request handler, as well as passed via query parameters: q=test&pf=...&qf=... Is it possible to have the weights as variables so they can be tweaked till we find the optimum balance in showing our results? Yes, you can try different settings on the fly using query parameters.
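Since the weights end up as plain query parameters, they can be tweaked per request without touching solrconfig.xml. A small sketch that assembles such a URL (the host, field names and the single price-weight parameter are just illustrative assumptions, following the qf/bf shape used earlier in this thread):

```java
public class WeightedQuery {
    // Builds a dismax query URL where the additive price boost
    // carries a tunable weight, passed in per request.
    public static String buildUrl(String q, double priceWeight) {
        return "http://localhost:8983/solr/select?q=" + q
                + "&defType=dismax"
                + "&qf=name^10"
                + "&bf=product(price," + priceWeight + ")";
    }

    public static void main(String[] args) {
        // Try different weights on the fly until the balance looks right.
        System.out.println(buildUrl("red", 0.5));
    }
}
```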
Re: Building a facet search filter frontend in XSLT
Hi Filype, in the response you should have a list of fq arguments, something like:

<arr name="fq">
  <str>field:facetValue</str>
  <str>field:FacetValue</str>
</arr>

Use this to set your inputs to be selected/checked. On 29 June 2011 23:54, Filype Pereira pereira.fil...@gmail.com wrote: Hi all, I am looking for some help in building a front-end facet filter using XSLT. The code I use is: http://pastebin.com/xVv9La9j On the image attached, the checkbox should be selected. (You clicked and submitted the facet form; the URL changed.) I can use xsl:if, but there's nothing in the XML that I can use to test before outputting the input checkbox. Has anyone done a similar thing? I haven't seen any examples building a facet search filter frontend in XSLT; the example.xsl that comes with Solr is pretty basic. Are there any other examples in XSLT implementing facet filters around? Thanks, Filype
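A hedged sketch of what that xsl:if test could look like in the stylesheet (the element names assume Solr's standard XML response format where fq values are echoed back; the template and parameter names are made up, so adapt them to your form):

```xml
<!-- Mark a facet checkbox as checked when its filter is already
     present among the fq values echoed back in the response. -->
<xsl:template name="facet-checkbox">
  <xsl:param name="field"/>
  <xsl:param name="value"/>
  <input type="checkbox" name="fq" value="{$field}:{$value}">
    <xsl:if test="//arr[@name='fq']/str[. = concat($field, ':', $value)]">
      <xsl:attribute name="checked">checked</xsl:attribute>
    </xsl:if>
  </input>
</xsl:template>
```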
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On Wed, Jun 29, 2011 at 3:28 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote: Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries. Can you post the logs at the INFO level that covers the warming period? OK, your filter queries have hundreds of terms in them (and that means hundreds of term lookups, which use the term index). Thus, your termIndexInterval change is the leading suspect for the slowdown. A termIndexInterval of 1024 means that a term lookup will seek to the closest 1024th term and then call next() until the desired term is found. Hence instead of calling next() an average of 64 times internally, it's now 512 times. Of course there is still a mystery about why your tii (which is the term index) would be so much bigger instead of smaller... -Yonik http://www.lucidimagination.com
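Yonik's arithmetic is easy to check: with an interval of N the lookup seeks to the nearest indexed term and then scans forward, so on average it calls next() N/2 times per lookup. A toy model of that cost (ignoring the seek itself):

```java
public class TermIndexCost {
    // Average next() calls for a single term lookup under a given
    // termIndexInterval: after seeking to the nearest indexed term,
    // the scan covers half the interval on average.
    public static int avgNextCalls(int termIndexInterval) {
        return termIndexInterval / 2;
    }

    public static void main(String[] args) {
        System.out.println(avgNextCalls(128));  // default interval -> 64
        System.out.println(avgNextCalls(1024)); // raised interval -> 512
    }
}
```

With filter queries containing hundreds of terms, that 8x-per-lookup difference multiplies across every term, which matches the observed warming slowdown.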
Multicore clustering setup problem
I had set up the clusteringComponent in solrconfig.xml for my first core. It has been working fine, and now I want to get my next core working. I set up the second core with the clustering component so that I could use it, use solritas properly, etc., but Solr did not like the solrconfig.xml changes for the second core. I'm getting this error when Solr is started or when I hit a Solr-related URL: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent' Should the clusteringComponent be set up in a shared configuration file somehow, or is there something else I am doing wrong? Thanks in advance!
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 7:50 PM, Yonik Seeley wrote: OK, your filter queries have hundreds of terms in them (and that means hundreds of term lookups, which use the term index). Thus, your termIndexInterval change is the leading suspect for the slowdown. A termIndexInterval of 1024 means that a term lookup will seek to the closest 1024th term and then call next() until the desired term is found. Hence instead of calling next() an average of 64 times internally, it's now 512 times. Of course there is still a mystery about why your tii (which is the term index) would be so much bigger instead of smaller... It turns out I got the two indexes backwards; the smaller one was the new index. I may have mixed up the indexes on some of the other files too, but they weren't much different, so I'm not going to try to figure out where any mistakes might be. Earlier in the afternoon I figured this out, removed termIndexInterval from my config, and rebuilt the index. I had originally put this in to speed up indexing. The evidence I had available at the time told me that this goal was accomplished, but the rebuild actually went faster without the statement. Warming times are now averaging under 10 seconds even with the warmup count back up to 8. This is still slower than I would like, but it is a major improvement. Even more important, I understand what happened. I was thinking perhaps I might actually decrease the termIndexInterval value below the default of 128. I know from reading the Hathi Trust blog that memory usage for the tii file is much more than the size of the file would indicate, but if I increase it from 13MB to 26MB, it probably would still be OK. Are any index intervals for the other Lucene files configurable in a similar manner? I know that messing too much with the defaults can make things much worse, so I would be very careful with any adjustments, and try to fully understand why any performance gain or loss occurred. Thanks, Shawn
Re: what is solr clustering component
Thanks iorixxx, I changed my configuration to include clustering in search results. In my XML-format search results I got a clusters tag; to show these clusters in the search results, do I need to parse this XML? And my second question: does clustering affect indexes? - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-solr-clustering-component-tp3121484p3124627.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what is solr clustering component
And my second question: does clustering affect indexes? No, it doesn't. Clustering is performed only on the search results produced by Solr; it doesn't change anything in the index. Cheers, Staszek
Re: Multicore clustering setup problem
Hi, Can you post the full stack trace? I'd need to know if it's really org.apache.solr.handler.clustering.ClusteringComponent that's missing or some other class ClusteringComponent depends on. Cheers, Staszek On Thu, Jun 30, 2011 at 04:19, Walter Closenfleight walter.p.closenflei...@gmail.com wrote: I had set up the clusteringComponent in solrconfig.xml for my first core. It has been working fine and now I want to get my next core working. I set up the second core with the clustering component so that I could use it, use solritas properly, etc. but Solr did not like the solrconfig.xml changes for the second core. I'm getting this error when Solr is started or when I hit a Solr related URL: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent' Should the clusteringComponent be set up in a shared configuration file somehow or is there something else I am doing wrong? Thanks in advance!