Why JBoss server is stopped due to Solr
I am trying to connect to Solr from Java code using URLConnection. I have deployed the Solr war file in a JBoss server (assume the server machine is in some other, remote location) and it works fine as long as no exception is raised. But if an exception such as a connection failure is raised on the server side, it stops the JBoss instance on the client machine where my Java code resides:

11:49:38,345 INFO [STDOUT] [2011-10-27 11:49:38.345] class = com.dstsystems.adc.efs.rs.util.SimplePost, method = fatal(), level = SEVERE, message = Connection error (is Solr running at http://xx.yy.zzz:8080/solr/update ?): java.net.ConnectException: Connection refused: connect
11:49:38,361 INFO [Server] Runtime shutdown hook called, forceHalt: true
11:49:38,376 INFO [Server] JBoss SHUTDOWN: Undeploying all packages
11:49:48,018 INFO [TransactionManagerService] Stopping recovery manager
11:49:48,128 INFO [Server] Shutdown complete
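The log suggests that the fatal() method in the embedded SimplePost class halts the JVM (very likely via System.exit), which is what fires JBoss's runtime shutdown hook. A minimal sketch, assuming the update URL from the log above, of posting to Solr with URLConnection while catching connection failures so the client JVM keeps running:

```java
import java.io.OutputStream;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrPoster {
    public static void post(String xml) {
        try {
            URL url = new URL("http://xx.yy.zzz:8080/solr/update");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            OutputStream out = conn.getOutputStream();
            out.write(xml.getBytes("UTF-8"));
            out.close();
            System.out.println("Solr responded: " + conn.getResponseCode());
        } catch (ConnectException e) {
            // Log and carry on instead of calling System.exit(), which would
            // fire the container's shutdown hook and take JBoss down with it.
            System.err.println("Solr unreachable: " + e.getMessage());
        } catch (Exception e) {
            System.err.println("Update failed: " + e.getMessage());
        }
    }
}
```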
Re: Query/Delete performance difference between straight HTTP and SolrJ
Am 26.10.2011 18:29, schrieb Shawn Heisey:

> For inserting, I do use a Collection of SolrInputDocuments. The delete process grabs values from idx_delete, does a query like the above (the part that's slow in Java), then if any documents are found, issues a deleteByQuery with the same string.

Why do you first query for these documents? Why don't you just delete them? Solr won't be harmed if no documents are affected by your delete query, and you'll get the number of affected documents in your response anyway.

When deleting, SolrJ does nearly nothing on its own; it just sends the POST request and analyzes the simple response. The behaviour in a get request is similar. We do thousands of update, delete and get requests per minute using SolrJ without problems, so your timing problems must come from somewhere else.

-Kuli
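A minimal SolrJ sketch of deleting without the preliminary query — the server URL and delete query are assumptions (and, per the correction that follows, don't expect a deleted-document count in the response):

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteWithoutQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Issue the delete directly; it is harmless if nothing matches.
        server.deleteByQuery("did:(1 2 3)");
        server.commit();
    }
}
```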
Re: Query/Delete performance difference between straight HTTP and SolrJ
Sorry, I was wrong. Am 27.10.2011 09:36, schrieb Michael Kuhlmann:

> and you'll get the number of affected documents in your response anyway.

That's not true; you don't get the affected document count. Anyway, it's still true that you don't need to check for documents first, at least not when you don't need this information somewhere else.

-Kuli
Re: DisMax search
> I am searching for 9065, so it's not about case sensitivity. My search is searching across all the field names and not limiting it to the one field specified in the qf param (using deftype dismax).

By saying case sensitivity, Erik was referring to the defType parameter name itself (not the value of the query). http://wiki.apache.org/solr/CommonQueryParameters#defType
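For example (a hypothetical request), q=9065&defType=dismax&qf=name would use the dismax parser, while the misspelled deftype=dismax is silently ignored, so the query falls through to the default parser, which ignores qf entirely.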
Re: Get results ordered by field content starting with specific word
--- On Wed, 10/26/11, darul daru...@gmail.com wrote:

> I have seen many threads talking about it but have not found any way to resolve it. There are 2 fields in my schema. Results are sorted by field2 desc, as in the following listing, when looking for word1 as the query pattern. I would like to get Doc3 at the end because word1 is not at the beginning of the field content. Have you any idea? I have seen SpanNearQuery, tried FuzzySearch with no success, etc. ... maybe making a special QueryParserPlugin, but I am lost ;)

Maybe you can make use of SpanFirstQuery. http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/spans/SpanFirstQuery.html

However, I would insert an artificial token (e.g. BEGIN_OF_DOC) before indexing my fields, and use a phrase query or something similar to boost documents that start with word1, e.g. "BEGIN_OF_DOC word1".
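A minimal Lucene 3.1 sketch of SpanFirstQuery — the field and term names are assumptions taken from the example above:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class FirstPositionQuery {
    // Matches documents where "word1" occurs within the first token of field2.
    public static Query build() {
        return new SpanFirstQuery(new SpanTermQuery(new Term("field2", "word1")), 1);
    }
}
```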
Re: Optimization /Commit memory
Thanks Simon and Jay, that was helpful. So what we are looking at during optimize is 2 or 3 times free disk space to recreate the index. Regards, Sujatha

On Wed, Oct 26, 2011 at 12:26 AM, Simon Willnauer simon.willna...@googlemail.com wrote:

RAM cost during optimize / merge is generally low. Optimize is basically a merge of all segments into one, though there are exceptions. Lucene streams existing segments from disk and serializes the new segment on the fly. When you optimize, or in general when you merge segments, you need disk space for the source segments and the target (merged) segment. If you use the CompoundFileSystem (CFS) you need additional space once the merge is done, when your files are packed into the CFS: basically the size of the target (merged) segment. Once the merge is done, Lucene can free the disk space, unless you have an IndexReader open that references those segments (Lucene keeps track of these files and frees disk space once possible).

That said, I think you should use optimize very, very rarely. If your document collection rarely changes, optimizing once in a while is useful and reasonable. If your collection is constantly changing, you should rely on the merge policy to balance the number of segments for you in the background. Lucene 3.4 has a nicely improved TieredMergePolicy that does a great job (previous versions are also good - just saying).

A commit is basically flushing the segment you have in memory (IndexWriter memory) to disk. The compression ratio can be up to 30% of the RAM cost, or even more, depending on your data. The actual commit doesn't need a notable amount of memory.

hope this helps
simon

On Mon, Oct 24, 2011 at 7:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote:

I have not spent a lot of time researching it, but one would expect the OS RAM requirement for optimization of an index to be minimal. My understanding is that during optimization an essentially new index is built. Once complete, it switches out the indexes and throws away the old one. (In Windows it may not throw away the old one until the next commit.) JRJ

-----Original Message-----
From: Sujatha Arun [mailto:suja.a...@gmail.com]
Sent: Friday, October 21, 2011 12:10 AM
To: solr-user@lucene.apache.org
Subject: Re: Optimization /Commit memory

Just one more thing: when we are talking about optimization, we are referring to HD free space for replicating the index (2 or 3 times the index size). What is the role of OS RAM here? Regards, Sujatha

On Fri, Oct 21, 2011 at 10:12 AM, Sujatha Arun suja.a...@gmail.com wrote:

Thanks, that helps. Regards, Sujatha

On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote:

Well, since the OS RAM includes the JVM RAM, that is part of your requirement, yes? Aside from the JVM and normal OS requirements, all you need OS RAM for is file caching. Thus, for updates, OS RAM is not a major factor. For searches, you want sufficient OS RAM to cache enough of the index to get the query performance you need, and to cache queries inside the JVM if you get a lot of repeat queries (see solrconfig.xml for the various caches; we have not played with them much). So the amount of RAM necessary for that is very much dependent upon the size of your index, and I cannot give you a simple number. You seem to believe that you have to have sufficient memory to hold the entire index in memory. Except where extremely high performance is required, I have not found that to be the case. This is just one of those "your mileage may vary" things. There is not a single answer or formula that fits every situation. JRJ

-----Original Message-----
From: Sujatha Arun [mailto:suja.a...@gmail.com]
Sent: Wednesday, October 19, 2011 11:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Optimization /Commit memory

Thanks Jay, I was trying to compute the OS RAM requirement (not the JVM RAM) for a 14 GB index [the cumulative index size of all instances]. And I put it thus: the operating system RAM requirement for a 14 GB index is the index size plus 3 times the maximum index size of an individual instance, for optimize. That is to say, I have several instances with a combined index size of 14 GB; the maximum individual index size is 2.5 GB, so my requirement for OS RAM is 14 GB + 3 * 2.5 GB ~= 22 GB. Correct? Regards, Sujatha

On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote:

Commit does not particularly spike disk or memory usage, unless you are adding a very large number of documents between commits. A commit can cause a need to merge indexes, which can increase disk space temporarily. An optimize is *likely* to merge indexes, which will usually increase disk space temporarily. How much disk space depends very much
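Since the thread's advice is to rely on the merge policy rather than explicit optimizes, here is a minimal Lucene 3.4 sketch of wiring up the TieredMergePolicy Simon mentions — the tuning values are illustrative assumptions, not recommendations:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class MergePolicySetup {
    public static IndexWriterConfig build() {
        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setSegmentsPerTier(10.0);   // allowed segments per tier before merging kicks in
        mp.setMaxMergeAtOnce(10);      // max segments merged in one operation
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_34,
                new StandardAnalyzer(Version.LUCENE_34));
        cfg.setMergePolicy(mp);
        return cfg;
    }
}
```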
Re: Get results ordered by field content starting with specific word
Well, at index time I cannot touch anything, because we do not have the data to index anymore. To use SpanFirstQuery, do I need to make a custom QueryParser?
Re: Get results ordered by field content starting with specific word
> Well, at index time I cannot touch anything, because we do not have the data to index anymore. To use SpanFirstQuery, do I need to make a custom QueryParser?

If re-indexing is not an option, then writing a custom parser is necessary to use SpanFirstQuery. You need to add it as an optional clause (with a high boost) to your whole boolean query. Also, you can try and vote for SOLR-839; with it, it may be possible to use SpanFirstQuery. https://issues.apache.org/jira/browse/SOLR-839
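A minimal sketch of what such a custom parser could produce — the user's query as a required clause plus a highly boosted, optional SpanFirstQuery; the field name, term, and boost value are assumptions:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class BoostLeadingWord {
    public static Query build() {
        BooleanQuery q = new BooleanQuery();
        // The ordinary relevancy clause: must match somewhere in the field.
        q.add(new TermQuery(new Term("field2", "word1")), BooleanClause.Occur.MUST);
        // Optional clause: large boost when word1 is the very first token.
        SpanFirstQuery first =
                new SpanFirstQuery(new SpanTermQuery(new Term("field2", "word1")), 1);
        first.setBoost(100f);
        q.add(first, BooleanClause.Occur.SHOULD);
        return q;
    }
}
```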
Search calendar availability
hello, I want to filter search results by calendar availability. For each document I know the days on which it is not available. How could I build my fields to filter the documents that are available in a range of dates? For example, document A is available from 1-9-2011 to 5-9-2011 and is also available from 17-9-2011 to 22-9-2011 (it's not available in the gap in between). If the filter query asks for availability from 2-9-2011 to 4-9-2011, docA would be a match. If the filter query asks for availability from 2-9-2011 to 20-9-2011, docA wouldn't be a match: even though the start and end dates are available, there's a gap of non-availability between them. Is this possible with Solr?
RE: Can dynamic fields defined by a prefix be used with LatLonType?
It appears that the solution to this is to ensure that the pattern for your component field is longer than the pattern for your dynamic parent field. This ensures that the component field takes precedence. For example, *__coordinate is longer than OBJECT_LL_*, so it will take precedence.

-----Original Message-----
From: Tom Cooke [mailto:tom.co...@gossinteractive.com]
Sent: 26 October 2011 20:06
To: solr-user@lucene.apache.org
Subject: Can dynamic fields defined by a prefix be used with LatLonType?

Hi, I'm adding support for lat/lon data to an existing schema which uses prefix-based dynamic fields, e.g. OBJECT_I_*. I would like to add OBJECT_LL_* as a dynamic field for LatLonType data, but it seems that LatLonType always needs to add suffixes for the dynamically created subfields. This leads to a generated field name that matches not only the subfield suffix (e.g. *_coordinate) but also OBJECT_LL_*, producing a clash. Is there any way around this other than always using a suffix-based approach to define any dynamic fields that contain LatLonType data? Thanks, Tom
Re: help needed on solr-uima integration
(11/10/27 9:12), Xue-Feng Yang wrote:

> Hi, from the Solr Info page I can see that my solr-uima core is there, but updateRequestProcessorChain is not. What is the reason?

Because UpdateRequestProcessor (and its Chain) is not a type of SolrInfoMBean. (The classes on that page implement SolrInfoMBean, which is why you can see them.)

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
Limit by score? sort by other field
When we display search results to our users we include a percentage score: the top result is 100%, and all others are normalised against the maxScore, calculated outside of Solr. We now want to limit the returned docs to those with a percentage score higher than, say, 50%. E.g. we want to search but only return docs scoring above 80%, while sorting by date, hence not being able to just sort by score.
Re: Search calendar availability
What you are looking for is imho not related to Solr in particular; the topic would be "Solr as a temporal database". In your case, if you have a timeline from 0 to 10 and you have two documents, one from 1 to 6 and one from 5 to 13, you can get all documents within 0 - 10 by querying document.end >= 0 and document.start <= 10. Whether the greater/less comparisons are inclusive depends on your definition of outside and inside the interval. But beware the exchanged fields: end is compared against the range start, and start against the range end.

Hth
Per

Am 27.10.2011 12:06, schrieb Anatoli Matuskova: [...]
Re: Search calendar availability
do your docs have daily availability? if so, you could index each doc for each day (rather than have some logic embedded in your data). so instead of doc1 (1/9/2011 - 5/9/2011) you have:

doc1 1/9/2011
doc1 2/9/2011
doc1 3/9/2011
doc1 4/9/2011
doc1 5/9/2011

this makes search much easier and more flexible. If needed, you can collapse on doc id if you need to present to the user at doc level, or even group by date. The problem you have is because you have logic and data in a field; get rid of the logic and just store the data.

Cheers
Lee C

On 27 October 2011 12:36, Per Newgro per.new...@gmx.ch wrote: [...]
Re: MoreLikeThis - Too many hits
Have you tried varying mintf and mindf? Setting them higher than 1 seems like it would reduce the number of docs returned.

Best
Erick

On Tue, Oct 25, 2011 at 2:57 AM, vraa allanv...@gmail.com wrote:

Hi, I'm using the MoreLikeThis functionality (http://wiki.apache.org/solr/MoreLikeThis), and it works almost perfectly for my situation. But I get too many hits, and maybe that's the whole idea of MoreLikeThis, but I'm going to ask anyway. My query looks like this:

/select/?q=id:11&mlt=true&mlt.match.include=true&mlt.fl=make,model,variant&mlt.mindf=1&mlt.mintf=1&fl=id,score,make,model,variant

The id is a Lamborghini. There are only 8 Lamborghinis in my database and still I get a lot more hits. Is it possible to make Solr return only those 8 results for this query? That would mean Solr must interpret the query so that there must be a hit on all of the mlt.fl fields. If not, then remove the last of the mlt.fl (variant) and try again. If no hits, then remove model, and so forth. Does it make sense?
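A hedged variation on Erick's suggestion (the values are illustrative): raising both thresholds, e.g. /select/?q=id:11&mlt=true&mlt.fl=make,model,variant&mlt.mindf=2&mlt.mintf=2&fl=id,score, means only terms that occur at least twice in the source document (mlt.mintf) and appear in at least two documents (mlt.mindf) contribute to the generated query, which should trim the marginal matches.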
Re: Query/Delete performance difference between straight HTTP and SolrJ
From everything you've said, it certainly sounds like a low-level I/O problem in the client, not a server slowdown of any sort. Maybe Perl is using the same connection over and over (keep-alive) and Java is not; I really don't know. One thing I've heard is that StreamingUpdateSolrServer (I think that's what it's called) can give better throughput for large request batches. If you're not using that, you may be having problems with closing and re-opening connections.

-Mike

On 10/26/2011 9:56 PM, Shawn Heisey wrote:

> On 10/26/2011 6:16 PM, Michael Sokolov wrote:
>
>> Have you checked to see when you are committing? Is the pattern the same in both instances? If you are committing after each delete request in Java, but not in Perl, that could slow things down.
>
> Due to the multithreading of delete requests, I now have the full delete down to 10-15 seconds instead of a minute or more. This is now an acceptable time, but I am completely mystified as to why the Perl code can do it without multithreading just as fast, and often faster. The Java code is long-running, and the Perl code is started by cron. If you look back to the first message on the thread, you'll see commit messages in the Perl log, but those commits are done with the wait options set to false. That's an extra step the Java code isn't doing - and it's STILL faster.
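A minimal SolrJ sketch of the StreamingUpdateSolrServer Mike mentions, which keeps HTTP connections open and drains a buffer of requests with background threads — the URL, queue size, and thread count are assumptions:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class StreamingSetup {
    public static SolrServer create() throws Exception {
        // Buffer up to 20 requests and send them with 4 worker threads,
        // reusing connections instead of reopening one per request.
        return new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);
    }
}
```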
Regarding Solr Query
I have a query regarding Solr search. I have a key phrase, "wireleess mobilty kit", that I need to search for, but I am not able to get results when I do the search. But when I manually add entries to the synonyms.txt file, like [wirelss, wireless access, etc.], I am able to find the products related to it. Please help me out: without putting anything into synonyms.txt, how can I make this search work? Hints: the "wireleess mobilty kit" product is already indexed in Solr when I am searching. Is synonyms.txt checked when the search happens? Please let me know any solution for this ASAP; it's an urgent requirement for me. Regards, Jayanta
Re: Queries suggestion (not the suggester :P)
I've seen something like this done with an index of queries. That is, you index actual user queries in some new core where each document is a query. Then you issue the terms of the new query against this index and get back similar documents (that are really queries). You'll want to take some care about what is actually indexed, but that's an exercise for the reader. You might wind up using edismax as your request handler in order to use some of its tuning parameters for how tight/loose you want your responses to be, or maybe just ORing the terms together and counting on the ranking will be OK.

Best
Erick

On Tue, Oct 25, 2011 at 6:21 AM, Simone Tripodi simonetrip...@apache.org wrote:

Hi all guys, I'm working on a search service that uses Solr as the search engine; the results are provided in Atom form, containing some OpenSearch tags. What I'm interested to understand is whether it is possible, via Solr, to have in the response some suggestions for other queries in order to enrich our OpenSearch info. I.e., a user submits `General Motors annual report` and Solr answers with the results plus information to form a `General Motors annual report 2005` subset or a `General Motors` superset, so the reply can be transformed to:

<opensearch:Query role="request" searchTerms="General Motors annual report"/>
<opensearch:Query role="subset" searchTerms="General Motors annual report 2005"/>
<opensearch:Query role="superset" searchTerms="General Motors"/>

So my question is: is this possible? And if yes... how? :) Many thanks in advance, every suggestion would be really appreciated! Have a nice day, all the best,

Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/
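A minimal SolrJ sketch of the queries-as-documents idea — the core name and field names are assumptions:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueryIndexer {
    public static void indexQuery(String userQuery) throws Exception {
        // A hypothetical core that holds one document per past user query.
        SolrServer queries =
                new CommonsHttpSolrServer("http://localhost:8983/solr/queries");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", userQuery);          // dedupes identical queries
        doc.addField("query_text", userQuery);  // analyzed field to search later
        queries.add(doc);
        queries.commit();
    }
}
```

Searching that core with the terms of an incoming query (ORed together, or via edismax) then returns similar past queries to offer as subset/superset suggestions.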
Re: Search for the single hash # character never returns results
Take a look at your admin/analysis page and put your tokens in for both index and query times. What I think you'll see is that the # is being stripped at query time due to the first PatternReplaceFilterFactory. You probably want to split your analyzer into an index-time and query-time pair and do the appropriate replacements to keep # at query time.

Best
Erick

On Tue, Oct 25, 2011 at 12:27 PM, Daniel Bradley daniel.brad...@adfero.co.uk wrote:

When running a search such as:

field_name:#
field_name:"#"
field_name:\#

where there is a record with the value of exactly "#", Solr returns 0 rows. The workaround we are having to use is a range query on the field, such as field_name:[# TO #], and this returns the correct documents.

Use case details: we have a field that indexes a text field and calculates a letter group. This keeps only the first significant character from a value (number or letter), and if it is a number it simply stores "#", as we want all numbered items grouped together. I'm also aware that we could fix this by using a specific number instead of the hash character; however, I thought I'd raise this to see if there is a wider issue. I've listed some specific details below. Thanks for your time, Daniel Bradley

Field definition:

<fieldType name="letterGrouping" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^([a-zA-Z0-9]).*" group="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([0-9])" replacement="#" replace="all"/>
  </analyzer>
</fieldType>

Server information:
Solr Specification Version: 3.2.0
Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15
Lucene Specification Version: 3.2.0
Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57
Re: DisMax and WordDelimiterFilterFactory
What happens if you change your WDDF definition in the query part of your analysis chain to NOT split on case change? Then your index should contain the right fragments (and combined words) and your queries would match. I admit I haven't thought this through entirely, but it would work for your example, I think. Unfortunately, I suspect it would break other cases... I suspect you're in a lesser-of-two-evils situation. But I can't imagine a 100% solution here. You're effectively asking to compensate for any fat-fingered thing a user does. Impossible, I think...

Best
Erick

On Tue, Oct 25, 2011 at 1:13 PM, Demian Katz demian.k...@villanova.edu wrote:

I've seen a couple of threads related to this subject (for example, http://www.mail-archive.com/solr-user@lucene.apache.org/msg33400.html), but I haven't found an answer that addresses the aspect of the problem that concerns me... I have a field type set up like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The important feature here is the use of WordDelimiterFilterFactory, which allows a search for WiFi to match an indexed term of "wi fi" (for example). The problem, of course, is that if a user accidentally introduces a case change in their query, the query analyzer chain breaks it into multiple words and no hits are found... so a search for exaMple will look for "exa mple" and fail. I've found two solutions that resolve this problem in the admin panel field analysis tool:

1.) Turn on catenateWords and catenateNumbers in the query analyzer - this reassembles the user's broken word and allows a match.
2.) Turn on preserveOriginal in the query analyzer - this passes through the user's original query, which then gets cleaned up by the ICUFoldingFilterFactory and allows a match.

The problem is that in my real-world application, which uses DisMax, neither of these solutions works. It appears that even though (if I understand correctly) the WordDelimiterFilterFactory is returning ALTERNATIVE tokens, the DisMax handler is combining them in a way that inappropriately requires all of them to match... for example, here's partial debugQuery output for the exaMple search using DisMax and solution #2 above:

parsedquery: +DisjunctionMaxQuery((genre:"(exampl exa) mple"^300.0 | title_new:"(exampl exa) mple"^100.0 | topic:"(exampl exa) mple"^500.0 | series:"(exampl exa) mple"^50.0 | title_full_unstemmed:"(example exa) mple"^600.0 | geographic:"(exampl exa) mple"^300.0 | contents:"(exampl exa) mple"^10.0 | fulltext_unstemmed:"(example exa) mple"^10.0 | allfields_unstemmed:"(example exa) mple"^10.0 | title_alt:"(exampl exa) mple"^200.0 | series2:"(exampl exa) mple"^30.0 | title_short:"(exampl exa) mple"^750.0 | author:"(example exa) mple"^300.0 | title:"(exampl exa) mple"^500.0 | topic_unstemmed:"(example exa) mple"^550.0 | allfields:"(exampl exa) mple" | author_fuller:"(example exa) mple"^150.0 | title_full:"(exampl exa) mple"^400.0 | fulltext:"(exampl exa) mple")) ()

Obviously, that is not what I want - ideally it would be something like 'exampl OR ex ample'. I also read about the autoGeneratePhraseQueries setting, but that seems to take things way too far in the opposite direction - if I set that to false, then I get matches for any individual token, i.e. example OR ex OR ample - not good at all! I have a sinking suspicion that there is not an easy solution to my problem, but this seems to be a fairly basic need; splitOnCaseChange is a useful feature to have, but it's more
Re: solr.PatternReplaceFilterFactory AND endoffset
What does your admin/analysis page show? And how about the results with debugQuery=on?

Best
Erick

On Wed, Oct 26, 2011 at 5:34 AM, roySolr royrutten1...@gmail.com wrote:

Hi, I have some problems with the PatternReplaceFilter. I can't use the WordDelimiter because I only want to replace some special chars chosen by myself. Some examples:

Tottemham-hotspur (london)
Arsenal (london)

I want this: replace "-" with a space, and "(" or ")" with nothing. In the analysis page I see this:

position: 1
term text: tottemham hotspur london
startOffset: 0
endOffset: 26

So the replace filter works. Now I want to search for "tottemham hotspur london". This gives no results:

position: 1
term text: tottemham hotspur london
startOffset: 0
endOffset: 24

It works when I search for "tottemham-hotspur (london)". I think the problem is the difference in offset (24 vs 26). I need some help...
Faceting on multiple fields, with multiple where clauses
hi, I have the following situation:

- A dropdownlist to search trips by country
- A dropdownlist to search trips by departure period (range/month)

I want to have facet results on these fields. When I select a value in one of the dropdownlists, I receive the correct numbers (facets). If Country = Belgium, then I receive the original number of trips per country and the number of trips per departure date for Belgium. But when I combine this search with a country and a departure period, I expect to receive:

- the number of trips per country in the selected departure period (for the first dropdownlist), AND
- the number of trips in the selected country (for the second dropdownlist)

But for some reason, I can't get the correct values when I combine these 2 filters. I receive the correct number of trips per period, but the countries aren't filtered by this period anymore. Can somebody explain what I'm doing wrong? This is the query for the combined search:

http://localhost:8080/solr/select/?facet=true&facet.date={!ex=SD}StartDate&f.StartDate.facet.date.start=2011-10-1T00:00:00Z&f.StartDate.facet.date.end=2012-09-30T00:00:00Z&facet.field={!ex=CC}CountryCode&f.StartDate.facet.date.gap=%2B1MONTH&rows=0&version=2.2&q={!tag=CC}CountryCode:ID&q={!tag=SD}StartDate:[2011-11-01T00:00:00Z TO 2011-11-30T00:00:00Z]
Re: Faceting on multiple fields, with multiple where clauses
You've got two q parameters. For filtering on facet values, you're better off using fq parameters instead (and if there is no other query, set q=*:*, or if using dismax set q.alt=*:* and leave q empty/unspecified). Only one q parameter is used, but any number of fq parameters may be specified.

Erik

On Oct 27, 2011, at 08:09, Rubinho wrote: [...]
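A hedged rewrite of the request along those lines, moving the two filters into tagged fq parameters that the facets exclude:

http://localhost:8080/solr/select/?q=*:*&fq={!tag=CC}CountryCode:ID&fq={!tag=SD}StartDate:[2011-11-01T00:00:00Z TO 2011-11-30T00:00:00Z]&facet=true&facet.field={!ex=CC}CountryCode&facet.date={!ex=SD}StartDate&f.StartDate.facet.date.start=2011-10-01T00:00:00Z&f.StartDate.facet.date.end=2012-09-30T00:00:00Z&f.StartDate.facet.date.gap=%2B1MONTH&rows=0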
Re: Search for the single hash # character never returns results
Fantastic, thanks - yes, I completely overlooked that case; separating the analyzers worked a treat. I had also posted on Stack Overflow, but the mailing list proved to be superior! Many thanks, Daniel

On 27 October 2011 13:09, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: Faceting on multiple fields, with multiple where clauses
Hi Erik, thank you very much. Your hint did solve the problem. Actually, I don't understand why (I read about the difference between q and fq, but it's still not clear to me why it didn't work with q). But it's solved, and that's the most important thing :) Thanks, Ruben
Re: Upgrading the Index from 1.4.1 to 3.4 using replication
I don't think it'll work. I've tried this approach myself, and the blocking issue was that Solr 1.4.1 uses a different javabin version than Solr 3.4 (I think it's 1 vs 2), so the master and the slave(s) can't communicate using the standard replication handler and thus can't exchange information and data about the index. My 2 cents.

Tommaso

2011/10/26 Jaeger, Jay - DOT jay.jae...@dot.wi.gov

I very much doubt that would work: different versions of Lucene are involved, and Solr replication does just a streamed file copy, nothing fancy. JRJ

-----Original Message-----
From: Nemani, Raj [mailto:raj.nem...@turner.com]
Sent: Wednesday, October 26, 2011 12:55 PM
To: solr-user@lucene.apache.org
Subject: Upgrading the Index from 1.4.1 to 3.4 using replication

All, we are planning to upgrade our Solr instance from 1.4.1 to 3.4. We understand that we need to re-index all the documents given the changes to the index structure. If we set up a replication pipe with 1.4.1 as the master and 3.4 as the slave (with an empty index), would the replication process convert the index from the 1.4.1 format to the 3.4 format? Thanks so much in advance for your time and help. Raj
RE: DisMax and WordDelimiterFilterFactory (limitations of MultiPhraseQuery)
If we change the query chain to not split on case change, then we lose half the benefit of that feature -- if a user types WiFi and the source record contains "wi fi", we fail to get a hit. As you say, that may be worth considering if it comes down to picking the lesser evil, but I still think there should be a complete solution to my problem -- I'm not trying to compensate for every fat-fingered user behavior... just one specific one!

Ultimately, I think my problem relates to this note from the documentation about using phrases in the SynonymFilterFactory: "Phrase searching (ie: sea biscit) will cause the QueryParser to pass the entire string to the analyzer, but if the SynonymFilter is configured to expand the synonyms, then when the QueryParser gets the resulting list of tokens back from the Analyzer, it will construct a MultiPhraseQuery that will not have the desired effect. This is because of the limited mechanism available for the Analyzer to indicate that two terms occupy the same position: there is no way to indicate that a phrase occupies the same position as a term. For our example the resulting MultiPhraseQuery would be (sea | sea | seabiscuit) (biscuit | biscit), which would not match the simple case of seabiscuit occurring in a document."

So I suppose I'm just running up against a fundamental limitation of Solr... but this seems like a fundamental limitation that might be worth overcoming -- I'm sure my use case is not the only one where this could matter. Has anyone given this any thought?

- Demian

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, October 27, 2011 8:21 AM
To: solr-user@lucene.apache.org
Subject: Re: DisMax and WordDelimiterFilterFactory

What happens if you change your WDDF definition in the query part of your analysis chain to NOT split on case change? Then your index should contain the right fragments (and combined words) and your queries would match. I admit I haven't thought this through entirely, but it would work for your example, I think. Unfortunately, I suspect it would break other cases... I suspect you're in a lesser-of-two-evils situation. But I can't imagine a 100% solution here. You're effectively asking to compensate for any fat-fingered thing a user does. Impossible, I think...

Best
Erick

On Tue, Oct 25, 2011 at 1:13 PM, Demian Katz demian.k...@villanova.edu wrote:

I've seen a couple of threads related to this subject (for example, http://www.mail-archive.com/solr-user@lucene.apache.org/msg33400.html), but I haven't found an answer that addresses the aspect of the problem that concerns me...
[... full field type definition and problem description quoted in the previous message ...]
Re: Regarding Solr Query
Can you explain more: what's the fieldType, and what's the actual content of the field in the document? Why are you trying to use synonyms?

Regards

On Thu, Oct 27, 2011 at 7:55 AM, Sahoo, Jayanta jayanta.sa...@hp.com wrote: [...]

--
Alireza Salimi
Java EE Developer
RE: Difficulties Installing Solr with Jetty 7.x
OK, so it sounds like the index.jsp welcome page setting is not the issue. That is not a big surprise. (WebSphere does not have that as a global default, but Jetty 6 certainly did, and it looks like Jetty 7 does as well.) BTW, that should be /solr/admin/index.jsp, as I indicated, not /solr/admin.index/jsp as appears in your message. I am guessing that was just a typo.

If that was a typo, then your issue has nothing to do with Solr proper, but instead probably means that the WAR was not properly installed into Jetty, or that the release of Jetty you snagged itself has issues. All you *should* need to do is copy the solr.war file from the Solr distribution into $JETTY_HOME/webapps and (perhaps) restart Jetty. But it sounds like you did that, and your logs indicate that you did that.

As an aside, the web.xml originates in the WAR, and once that is deployed in Jetty it lives in $JETTY_HOME/work/Jetty.solr.war.../webapp/WEB-INF - at least in Jetty 6. In your case, in your logs, I saw:

2011-10-25 16:44:51.564:INFO:oejw.WebInfConfiguration:Extract jar:file:/var/jetty/webapps/solr.war!/ to /tmp/jetty-0.0.0.0-8080-solr.war-_-any-/webapp

so the expanded WAR lives there, under the .../webapp directory shown above. That exclamation point puzzles me a little, but doesn't seem to be a real issue. You should see, in that .../webapp directory, a file index.jsp, a folder admin, and a file admin/index.jsp, among other things. If those are not present, then Jetty was unable to properly extract the WAR (that seems unlikely, but worth checking). If they are there, then Jetty ought to be able to find them.

> My complete WAG is that the fix will lie somewhere in the contexts/ directory.

Well, Jetty 6 has no such directory that I can see. So maybe that is something new to Jetty 7, and perhaps it doesn't automatically take a root context for an application from the WAR file. Could be. However, reading http://wiki.eclipse.org/Jetty/Howto/Deploy_Web_Applications, it looks like it ought to work the same as it ever did. I do note, however, that if you pre-created a directory named solr/, that could mess things up. But a simple touch solr.war followed by a restart of Jetty ought to cause Jetty to redeploy the application.

In any event, it looks to me like you might do better to post your question to the Jetty folks - if you can't get to /solr/index.jsp or /solr/admin/index.jsp and get a 404, that points to a Jetty-related issue.

JRJ

-----Original Message-----
From: Scott Vanderbilt [mailto:li...@datagenic.com]
Sent: Wednesday, October 26, 2011 5:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Difficulties Installing Solr with Jetty 7.x

Jay: Thanks for the response. $JETTY_HOME/etc/webdefault.xml is the unmodified file that came with Jetty, and it has a welcome-file-list referencing index.jsp, index.html, and index.htm. Attempting to load /solr/admin/index.jsp generates a 404. All other URLs generate a 404 also, except /, which returns the Jetty test app home page. Not sure if this is useful, but that page contains the following info: "This webapp is deployed in $JETTY_HOME/webapp/test and configured by $JETTY_HOME/contexts/test.xml".

You refer to Solr's web.xml. I have no such file, or any other config files which are Solr-specific, so far as I can tell. I followed the Solr wiki page instructions (http://wiki.apache.org/solr/SolrInstall), so apart from copying the solr.war into $JETTY_HOME/webapps/, the only other thing I copied over from the Solr example distribution was the directory apache-solr-3.4.0/example/solr/ as $JETTY_HOME/solr/. My complete WAG is that the fix will lie somewhere in the contexts/ directory. I really see no other place to do Solr-specific configuration apart from $JETTY_HOME/etc/, and my intuition is that those files shouldn't be messed with unless the intention is to affect global container-wide behavior. Which I don't. I'm only trying to get Solr running. I may want to run other apps, so I'd rather leave Jetty's config files as is.

On 10/26/2011 2:05 PM, Jaeger, Jay - DOT wrote:

ERRATA: that should be the *Solr* web.xml (not the Jetty web.xml). Sorry for the confusion.

-----Original Message-----
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
Sent: Wednesday, October 26, 2011 4:02 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Difficulties Installing Solr with Jetty 7.x

From your logs, it looks like the Solr library is being found just fine, and that the servlet is initing OK. Does your Jetty configuration specify index.jsp in a welcome list? We had that problem in WebSphere: we got 404s the same way, and the cure was to modify the web.xml to include:

<welcome-file-list>
  <welcome-file>index.jsp</welcome-file>
</welcome-file-list>

in our Solr web.xml, and we submitted a JIRA on the issue (I don't have the number
Re: Limit by score? sort by other field
Hi Robert,

take a look at
http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-threshold-td3219064.html#a3219117
and
http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html

So, will sort=date+desc&q={!frange l=0.85}query($qq)&qq=<the original relevancy query> help?

Best regards
Karsten

-------- Original-Nachricht --------
Datum: Thu, 27 Oct 2011 12:30:31 +0100
Von: Robert Brown r...@intelcompute.com
An: solr-user@lucene.apache.org
Betreff: Limit by score? sort by other field

[...]
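A minimal SolrJ sketch of Karsten's request — the 0.85 cutoff and the qq value are assumptions. Note that the frange cutoff applies to raw Lucene scores, not to percentages normalised against maxScore outside Solr, so the threshold must be chosen against raw scores:

```java
import org.apache.solr.client.solrj.SolrQuery;

public class ScoreCutoffQuery {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery();
        // Keep only documents whose score for $qq is at least 0.85...
        q.setQuery("{!frange l=0.85}query($qq)");
        // ...where $qq holds the original relevancy query...
        q.set("qq", "title:example");
        // ...and order the survivors by date instead of score.
        q.set("sort", "date desc");
        return q;
    }
}
```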
Re: Limit by score? sort by other field
Sounds like a custom sorting collector would work - one that throws away docs with less than some minimum score, so that it only collects/sorts documents above that minimum. AFAIK the score is calculated even if you sort by some other field.

On Oct 27, 2011, at 9:49 AM, karsten-s...@gmx.de wrote: [...]
Re: Limit by score? sort by other field
BTW, this would be a good standard feature for Solr, as I've run into this requirement more than once.

On Oct 27, 2011, at 9:49 AM, karsten-s...@gmx.de wrote: [...]
Re: Search calendar availability
I don't like the idea of indexing a doc for each value; the dataset can grow a lot. I have thought that something like this could work. At indexing time, if I know the dates of non-availability, I can gather the availability ranges (treating unknown as available). So I index 4 fields: aval_yes_start, aval_yes_end, aval_no_start, aval_no_end (all multiValued). If the user asks for availability from $start to $end, I filter like:

fq=aval_yes_start:[$start TO $end]&fq=aval_yes_end:[$start TO $end]&fq=-aval_no_start:[$start TO $end]&fq=-aval_no_end:[$start TO $end]

This way I make sure the start date is available, the end date too, and there are no unavailable gaps in between. As I store ranges and not concrete days, the number of multiValued entries shouldn't grow much, and using trie fields these range queries should be fast. Any better idea?
Re: Search calendar availability
On Thu, Oct 27, 2011 at 7:13 AM, Anatoli Matuskova anatoli.matusk...@gmail.com wrote:

> I don't like the idea of indexing a doc for each value; the dataset can grow a lot.

What does "a lot" mean? How high is the sky? A million people with 3-year schedules is a billion tiny documents. That doesn't sound like such an enormous number.

> I have thought that something like this could work: at indexing time, if I know the dates of no availability, I could gather the availability ones (and consider unknown as available). So I index 4 fields aval_yes_start, aval_yes_end, aval_no_start, aval_no_end (all multiValued). If the user asks for availability from $start to $end I filter like: fq=aval_yes_start:[$start TO $end]&fq=aval_yes_end:[$start TO $end]&fq=-aval_no_start:[$start TO $end]&fq=-aval_no_end:[$start TO $end]

This can be done. But given that you want long stretches of availability, what happens when a reservation is canceled? You have to coalesce intervals. That isn't impossible, but it is a pain. Would this count as premature optimization? Simply retrieving days in the range and counting gets the right answer a bit more simply, and additions, deletions and modifications all work. If you want to drive down to a resolution of seconds, the document-per-time-slot model doesn't work. But for days, it probably does.
Re: Search calendar availability
> What does "a lot" mean? How high is the sky?

If I have 3 million docs, I would end up with 3 million * available days.

> This can be done. But given that you want long stretches of availability, what happens when a reservation is canceled? You have to coalesce intervals. That isn't impossible, but it is a pain. Would this count as premature optimization?

I always build the index from scratch, indexing from an external data source and getting the availability from there (along with all the other data for a document).

> If you want to drive down to a resolution of seconds, the document-per-time-slot model doesn't work. But for days, it probably does.

Yes, the availability is defined per day, not per second. I'm trying to find a way to make this perform as well as possible. I've found this and it's interesting too: https://issues.apache.org/jira/browse/SOLR-1913. But the only way I see to use it is to generate dynamic fields per month and filter using them. The problem there is that for each month I want to filter on in a search request, I would have to load a FieldCache.getInts, and I would quickly run OOM.
How can I force the threshold for a fuzzy query?
Hi guys, I'm new to Solr (as you may guess from the subject). I'd like to force the threshold for fuzzy queries to, say, 0.7. I've read that fuzzy queries are expensive, but limiting the threshold to a number near 1 would help. So my question is: is this possible to configure in one of the XML configuration files? And if so, and I then run this query: myField:myQuery~0.2, would Solr use the configured threshold instead, effectively preventing anyone from forcing a lower value than what I've set in the XML file? Would this help for what I want to do? Thanks in advance!
Re: Query/Delete performance difference between straight HTTP and SolrJ
On 10/27/2011 1:36 AM, Michael Kuhlmann wrote:

> Why do you first query for these documents? Why don't you just delete them? [...]

When you do a delete blind, you have to follow it up with a commit. On my larger shards, containing data older than approximately one week, a commit is resource intensive and takes 10 to 30 seconds. As much as 75% of the time there are no updates to my larger shards (10.7 million records each); most of the activity happens on the small shard with the newest data (usually under 50 records), which I call the incremental. On almost every update run there are changes to the incremental, but doing a commit on that shard rarely takes more than a second or two.

The long commit times on the larger indexes are a result of cache warming, and almost all of that time is spent warming the filter cache. The answer to the next obvious question: autowarmCount=4 on that cache, with a maximum size of 64. We are working as fast as we can on reducing the complexity and size of our filter queries; it will require significant changes in our application.

Thanks,
Shawn
Re: Limit by score? sort by other field
I have a similar problem, except I need to filter out scores that are too high. Robert Stewart bstewart...@gmail.com wrote on Oct 27, 2011 7:04 AM: BTW, this would be a good standard feature for Solr, as I've run into this requirement more than once. On Oct 27, 2011, at 9:49 AM, karsten-s...@gmx.de wrote: Hi Robert, take a look at http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-threshold-td3219064.html#a3219117 and http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html So, will sort=date+desc&q={!frange l=0.85}query($qq)&qq=<the original relevancy query> help? Best regards Karsten Original message Date: Thu, 27 Oct 2011 12:30:31 +0100 From: Robert Brown r...@intelcompute.com To: solr-user@lucene.apache.org Subject: Limit by score? sort by other field When we display search results to our users we include a percentage score: the top result being 100%, and all others normalised based on the maxScore, calculated outside of Solr. We now want to limit the returned docs to those with a percentage score higher than, say, 50%. E.g. we want to search but only return docs scoring above 80%, yet sort by date, hence not being able to just sort by score.
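Concretely, Karsten's suggestion moves the relevancy query into a function-range filter so the sort stays free for the date field. A request might look like this (qq=text:ipod is an invented stand-in for the original relevancy query):

http://localhost:8983/solr/select?q={!frange l=0.85}query($qq)&qq=text:ipod&sort=date+desc&fl=*,score

Note that frange cuts off on raw Lucene scores, not on the percentages normalised outside Solr, so the l= threshold has to be chosen against raw score values.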
Re: DisMax search
Sorry my bad :(. Thanks for the help. It worked. I completely overlooked the defType.
bbox issue
I'm using the geohash field to store points for my data. When I do a bounding box like: localhost:8080/solr/select?q=point:[-45,-80%20TO%20-24,-39] I get a data point that falls outside the box: (-73.03358 -50.46815) The Spatial Search (http://wiki.apache.org/solr/SpatialSearch) page says: Exact distance calculations can be somewhat expensive and it can often make sense to use a quick approximation instead. The bbox filter is guaranteed to encompass all of the points of interest, but it may also include other points that are slightly outside of the required distance. I had sort of assumed that doing a ranged point search would just keep it to those points, but I'm getting items outside my requested range. Is there a way that I can only include items within the box via a configuration change? Worst case, I'll store a lat/long pair and do the ranged search myself, but then I'll have to reindex all my data and make some coding changes in order for it to work. Any input would be greatly appreciated! Thanks! -- Chris
Collection Distribution vs Replication in Solr
Hi guys, If we ignore the features that Replication provides (http://wiki.apache.org/solr/SolrReplication#Features), which approach is better? Are there any performance problems with Replication? Replication seems quite a bit easier (no special configuration, ssh setup, cron setup), while rsync is a robust protocol. Which one do you recommend? Thanks -- Alireza Salimi Java EE Developer
Re: How can I force the threshold for a fuzzy query?
I am not sure if there is such an option, but you might be able to override your query parser and reset that value if it is too fuzzy. Look for: protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) There you can change the actual value used for minimumSimilarity. simon
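A minimal sketch of that override, assuming a Lucene/Solr 3.x-style QueryParser subclass (the class name and the 0.7 floor are illustrative):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Clamps fuzzy queries so clients cannot request anything fuzzier
// than MIN_SIMILARITY, regardless of the ~value in the query string.
public class ClampedFuzzyQueryParser extends QueryParser {
    private static final float MIN_SIMILARITY = 0.7f;

    public ClampedFuzzyQueryParser(Version version, String field, Analyzer analyzer) {
        super(version, field, analyzer);
    }

    @Override
    protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
        // A query like myField:myQuery~0.2 arrives here with 0.2;
        // raise it to the configured floor before building the query.
        return super.newFuzzyQuery(term, Math.max(minimumSimilarity, MIN_SIMILARITY), prefixLength);
    }
}

To expose this in Solr you would still need to wrap it in a QParserPlugin and register it in solrconfig.xml; the class above only covers the clamping itself.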
Re: bbox issue
On Thu, Oct 27, 2011 at 2:34 PM, Christopher Gross cogr...@gmail.com wrote: I'm using the geohash field to store points for my data. When I do a bounding box like: localhost:8080/solr/select?q=point:[-45,-80%20TO%20-24,-39] I get a data point that falls outside the box: (-73.03358 -50.46815) Is there a reason you're using geohash and not LatLonType? The SpatialSearch page is really only applicable to LatLonType - other methods are currently not supported or well tested (and geohash is not mentioned on that page, except in reference to a "things in development" page). -Yonik http://www.lucidimagination.com
Re: bbox issue
True -- I found the geohash on a separate page. I was using it because it can allow for multiple points, and I was hoping to be ahead of the curve in allowing that feature for the data I'm managing. I can roll back and use the LatLonType -- but then I'm still concerned about the bounding box giving results outside the specified range. Or would I be better off just indexing lat and lon in separate fields, then doing a normal numeric range search against them? -- Chris
Re: How can I force the threshold for a fuzzy query?
Great! I didn't think there was a way to do it; I was about to remove this feature from my app for that reason. I'll give your advice a try. Thanks a lot!
Re: bbox issue
On Thu, Oct 27, 2011 at 3:22 PM, Christopher Gross cogr...@gmail.com wrote: I can roll back and use the LatLon type -- but then I'm still concerned about the bounding box giving results outside the specified range. The implementation of things like bbox is intimately tied to the field type (i.e., normally completely different code). A LatLonType bbox should work fine, but please let us know if it doesn't! -Yonik http://www.lucidimagination.com
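For anyone following along, a minimal LatLonType setup that supports the box-style range query from the original post might look like this (field names are illustrative; the subfields must use a numeric type such as tdouble):

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="point" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

localhost:8080/solr/select?q=point:[-45,-80 TO -24,-39]

With LatLonType the range query is evaluated as a true lat/lon rectangle, so points outside the box should not come back.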
Re: Get results ordered by field content starting with specific word
Meaning I need to implement my own QueryParser?
Re: questions about autocommit committing documents
While sending documents with the SolrJ HTTP API, at the end I am never sure the documents are indexed. I would like to store them somewhere and resend them in case the commit has failed. If commit occurs every 10 minutes, for example, and 100 documents are waiting to be committed when the server crashes or stops, those 100 documents won't be indexed later because my business logic won't send them again. So I would like to create a cron job which looks into a table (or somewhere) for documents which may not have been indexed because a problem occurred during the commit.
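One hedged sketch of that bookkeeping in SolrJ -- PendingQueue and its methods are hypothetical stand-ins for whatever table or store holds unconfirmed documents:

import java.io.IOException;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

// Send a batch, commit explicitly, and only mark the batch as indexed
// once the commit succeeds; failed batches stay queued for a retry cron.
public class SafeIndexer {
    interface PendingQueue { // hypothetical persistence layer
        void markIndexed(Collection<SolrInputDocument> docs);
        void markForRetry(Collection<SolrInputDocument> docs);
    }

    private final SolrServer server;
    private final PendingQueue pendingQueue;

    public SafeIndexer(SolrServer server, PendingQueue pendingQueue) {
        this.server = server;
        this.pendingQueue = pendingQueue;
    }

    public void sendBatch(Collection<SolrInputDocument> docs) {
        try {
            server.add(docs);
            server.commit(); // explicit commit instead of relying on autocommit
            pendingQueue.markIndexed(docs); // safe to forget these now
        } catch (SolrServerException e) {
            pendingQueue.markForRetry(docs); // cron job resends later
        } catch (IOException e) {
            pendingQueue.markForRetry(docs);
        }
    }
}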
changing omitNorms on an already built index
So Solr 1.4. I decided I wanted to change a field to have omitNorms=true that didn't previously. So I changed the schema to have omitNorms=true. And I reindexed all documents. But it seems to have had absolutely no effect. All relevancy rankings seem to be the same. Now, I could have a mistake somewhere else, maybe I didn't do what I thought. But I'm wondering if there are any known issues related to this; is there something special you have to do to change a field from omitNorms=false to omitNorms=true on an already built index? Other than re-indexing everything? Any known issues relevant here? Thanks for any help, Jonathan
Re: changing omitNorms on an already built index
As far as I know there's no issue with this. You have to reindex and that's it. In which kind of field are you changing the norms? (You will only see changes in text fields.) Using debugQuery=true you can see how norms affect the score (in case you have not omitted them).
Re: Collection Distribution vs Replication in Solr
Replication is easier to manage and a bit faster. See the performance numbers: http://wiki.apache.org/solr/SolrReplication
Re: Collection Distribution vs Replication in Solr
I can't see those benchmarks, can you? On Thu, Oct 27, 2011 at 5:20 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Replication is easier to manage and a bit faster. See the performance numbers: http://wiki.apache.org/solr/SolrReplication -- Alireza Salimi Java EE Developer
Passing system parameters to solr at runtime
I've been given the project of setting up a CentOS-based Solr replication slave for a project here at work. I think it's configured correctly, and replication seems to be happening correctly. I've got some CentOS experience, but I'm having to get up to speed on Solr in a short period of time. The guy who was working on this piece of the project is no longer available, and I'm not sure he knew what he was doing anyway. The main problem I'm having is that the project lead wants to make sure the slaves have master disabled for all cores, without changing the solrconfig.xml for every core. This link seems to apply to what I'm trying to do: http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node My first question is this: in the tomcat/solr implementation I'm using, I can't tell where/how to pass system parameters (-Denable.master=false) to Solr. I've found how to pass this sort of thing into Tomcat, but it doesn't seem like this is the same thing. Next question: the link references setting these properties in a solrcore.properties file. I've created the file and landed it next to the applicable solrconfig.xml, but it doesn't seem to apply its settings. Both the master and the slave node are on Solr v3.3. The master is running on Windows Server 2008 R2 and was set up well before my involvement. The slave is running CentOS 6.0. Thanks for reading. I'm happy to provide more info as needed.
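For what it's worth, two common ways to wire this up under Tomcat (paths assume a stock Tomcat layout; treat this as a sketch rather than a verified recipe for this exact install):

# $CATALINA_HOME/bin/setenv.sh -- catalina.sh sources this at startup
CATALINA_OPTS="$CATALINA_OPTS -Denable.master=false -Denable.slave=true"

# or, per core, in conf/solrcore.properties next to solrconfig.xml:
enable.master=false
enable.slave=true

Either way, the replication handler in solrconfig.xml has to actually reference the properties (e.g. <str name="enable">${enable.master:false}</str> inside the master section) or they will silently do nothing, which would also explain a solrcore.properties file appearing to have no effect.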
Re: data import in 4.0
Two things: 1) Look at http://wiki.apache.org/solr/DataImportHandler, the interactive "Development Mode" section. There's a page that helps you debug this kind of thing. But I suspect your SQL is not correct. You should be able to form a single SQL query that does what you want, something like (I haven't tested this and my SQL is rusty): SELECT ID, Status, Title, last_name FROM project, person WHERE project.pi_pid = person.pid 2) Please start a new thread when changing the subject. From hossman's apache page: When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See: http://people.apache.org/~hossman/#threadhijack Best Erick

On Wed, Oct 26, 2011 at 10:46 AM, Adeel Qureshi adeelmahm...@gmail.com wrote: Any comments, please... I am able to do the bulk import without the nested query, but with the nested query it just keeps working on it and never seems to end. I would appreciate any help. Thanks Adeel

On Sat, Oct 22, 2011 at 11:12 AM, Adeel Qureshi adeelmahm...@gmail.com wrote: Yup, that was it -- my data import files version was not the same as the solr war. Now I am having another problem, though. I tried doing a simple data import:

<document>
  <entity name="p" query="SELECT ID, Status, Title FROM project">
    <field column="ID" name="id"/>
    <field column="Status" name="status_s"/>
    <field column="Title" name="title_t"/>
  </entity>
</document>

simple in terms of just pulling three fields from a table and adding them to the index, and this worked fine. But when I add a nested (joined) table:

<document>
  <entity name="project" query="SELECT ID, Status, Title FROM project">
    <field column="ID" name="id"/>
    <field column="Status" name="status_s"/>
    <field column="Title" name="title_t"/>
    <entity name="related" query="select last_name FROM person per inner join project proj on proj.pi_pid = per.pid where proj.ID = ${project.ID}">
      <field column="last_name" name="pi_s"/>
    </entity>
  </entity>
</document>

this data import doesn't seem to end -- it just keeps going. I only have about 15000 records in the main table and about 22000 in the joined table, but the "Fetch count" in the dataimport handler status shows that it has fetched close to half a million records. I'm not sure what those records are. Is there a way to see exactly what queries are being run by the dataimport handler? Is there something wrong with my nested query? Thanks Adeel

On Fri, Oct 21, 2011 at 3:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote: So to me it heightens the probability of classloader conflicts. I haven't worked with Solr 4.0, so I don't know if the set of JAR files is the same as in Solr 3.4. Anyway, make sure that there is only ONE instance of apache-solr-dataimporthandler-***.jar in your whole tomcat+webapp. Maybe you have this jar file in the CATALINA_HOME\lib folder.

On Fri, Oct 21, 2011 at 3:06 PM, Adeel Qureshi adeelmahm...@gmail.com wrote: its deployed on a tomcat server ..

On Fri, Oct 21, 2011 at 12:49 PM, Alireza Salimi alireza.sal...@gmail.com wrote: Hi, How do you start Solr -- through start.jar, or do you deploy it to a web container? Sometimes problems like this are caused by different class loaders. I hope my answer helps you. Regards

On Fri, Oct 21, 2011 at 12:47 PM, Adeel Qureshi adeelmahm...@gmail.com wrote: Hi, I am trying to set up the data import handler with Solr 4.0 and having some unexpected problems. I have a multi-core setup and only one core needed the dataimport handler, so I added the request handler to it and added the lib imports in the config file:

<lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar"/>
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-extras-\d.*\.jar"/>

For some reason this doesn't work -- it still keeps giving me a ClassNotFound error message, so I moved the jar files to the shared lib folder, and then at least I was able to see the admin screen with the dataimport plugin loaded. But when I try to do the import, it throws this error message:

INFO: Starting Full Import
Oct 21, 2011 11:35:41 AM org.apache.solr.core.SolrCore execute
INFO: [DW] webapp=/solr path=/select params={command=status&qt=/dataimport} status=0 QTime=0
Oct 21, 2011 11:35:41 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
WARNING: Unable to read: dataimport.properties
Oct 21, 2011 11:35:41 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed java.lang.NoSuchMethodError:
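For the record, Erick's single-query suggestion as a DIH entity might look like this (untested; column and Solr field names are taken from the thread, and the join type is an assumption):

<document>
  <entity name="project"
          query="SELECT proj.ID, proj.Status, proj.Title, per.last_name
                 FROM project proj LEFT JOIN person per ON proj.pi_pid = per.pid">
    <field column="ID" name="id"/>
    <field column="Status" name="status_s"/>
    <field column="Title" name="title_t"/>
    <field column="last_name" name="pi_s"/>
  </entity>
</document>

One flat query avoids firing the nested entity's sub-query once per parent row, which is consistent with the runaway fetch count reported above.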
Re: solr break up word
Hmmm, I'm not sure what happens when you specify <analyzer> (without type="index") together with <analyzer type="query">; I have no clue which one is used. Look at the admin/analysis page to understand how things are broken up. Did you re-index after you added the ngram filter? You'll get better help if you include example queries with debugQuery=on appended; it'll give us a lot more to work with. Best Erick

On Wed, Oct 26, 2011 at 4:14 PM, Boris Quiroz boris.qui...@menco.it wrote: Hi, I've got Solr running on a CentOS server, working OK, but sometimes my application needs to match parts of a word. For example, if I search for the word 'dislike' it works fine, but if I search for 'disl' it returns zero results. Also, if I search for 'disl*' it returns some values (the same as searching for 'dislike'), but 'dislike*' returns zero too. So, I have two questions: 1. How exactly does the asterisk work as a wildcard? 2. What can I do to properly index parts of a word? I added these lines to my schema.xml:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

But I can't get it to work. Is what I did OK, or am I wrong? Thanks. -- Boris Quiroz boris.qui...@menco.it
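If it helps, the usual way to remove the ambiguity Erick mentions is to make the index-side analyzer explicit, e.g. (a sketch of the same fieldType with only the analyzer declarations labeled):

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With ngrams applied at index time only, a query for disl should match the indexed grams of dislike directly -- provided the documents were re-indexed after the filter was added.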
Re: changing omitNorms on an already built index
We are not actively removing norms. If you set omitNorms=true and index documents, they won't have norms for this field. Yet other segments still have norms until they get merged with a segment that has no norms for that field, i.e. omits norms. omitNorms is anti-viral, so once you set it to true it will eventually become true for the other segments. If you optimize your index you should see the norms go away. simon
Re: changing omitNorms on an already built index
On Thu, Oct 27, 2011 at 6:00 PM, Simon Willnauer simon.willna...@googlemail.com wrote: we are not actively removing norms. if you set omitNorms=true and index documents they won't have norms for this field. Yet, other segment still have norms until they get merged with a segment that has no norms for that field ie. omits norms. omitNorms is anti-viral so once you set it to true it will be true for other segment eventually. If you optimize you index you should see that norms go away. This is only true in trunk (4.x!) https://issues.apache.org/jira/browse/LUCENE-2846 -- lucidimagination.com
Re: Query/Delete performance difference between straight HTTP and SolrJ
On 10/27/2011 5:56 AM, Michael Sokolov wrote: From everything you've said, it certainly sounds like a low-level I/O problem in the client, not a server slowdown of any sort. Maybe Perl is using the same connection over and over (keep-alive) and Java is not. I really don't know. One thing I've heard is that StreamingUpdateSolrServer (I think that's what it's called) can give better throughput for large request batches. If you're not using that, you may be having problems with closing and re-opening connections? Although I can't claim to know for sure, I'm fairly sure that the simple LWP classes I'm using don't do keepalive unless you specifically configure the user agent to do so. I'll look into it some more. The StreamingUpdateSolrServer documentation says it is only recommended for the /update handler, not for queries. I'm not having a problem with the deletes themselves; they go pretty fast. It's all of the queries before each delete that are relatively slow, and those queries really add up. With multithreading it does all the shards at once, but it can still only query for a limited number of values at a time due to maxBooleanClauses. Now I'm checking and deleting 1000 values at a time, on all shards simultaneously. I use CommonsHttpSolrServer, and each of those objects is created only once, when the program first starts up. I figure there are three possibilities: 1) A glaring inefficiency in CommonsHttpSolrServer queries as compared to a straight HTTP POST request. 2) The compartmentalization provided by the virtual machine architecture creates an odd synergy that is not present when there are only two Solr instances on physical machines instead of eight of them (seven shards plus a search broker) on virtual machines. 3) The extra physical memory on the servers with virtualization grants more of a disk-cache-related performance improvement than the lack of virtualization on the others. Only the first of those possible problems can be determined or fixed without migrating the other servers to my new system. I'm having one other problem with the new build program; I haven't figured out exactly what that problem is, so I am very reluctant to switch everything over. So far it seems to be related to the MySQL JDBC connector or my attempt at threading, not Solr. I mentioned that the hardware is identical except for memory. That's not quite true -- the servers accessed by the Java program are better. One of them has a slightly faster CPU than its counterpart with virtualization, and they all have 1TB hard drives as opposed to the mixed 500GB and 750GB drives in the other servers. All of the servers are Dell 2950s with six-drive RAID10 arrays.
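For reference, the query-free delete pattern Kuli suggested earlier in the thread is short in SolrJ (a sketch; 'did' is a hypothetical unique-key field, and batches are assumed to stay under maxBooleanClauses as described above):

import java.io.IOException;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;

// Delete directly by query instead of checking for matches first;
// the delete is a no-op if nothing matches.
public class BatchDeleter {
    public void deleteBatch(SolrServer server, List<Long> ids)
            throws SolrServerException, IOException {
        StringBuilder q = new StringBuilder("did:(");
        for (long id : ids) {
            q.append(id).append(' ');
        }
        q.append(')');
        server.deleteByQuery(q.toString());
        // Commit once at the end of the whole run rather than per batch,
        // to amortize the cache warming that makes large-shard commits slow.
    }
}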
Re: joins and filter queries effecting scoring
Does anyone have any idea on this issue? On Tue, Oct 25, 2011 at 11:40 AM, Jason Toy jason...@gmail.com wrote: Hi Yonik, Without a join I would normally query user docs with: q=data_text:test&fq=is_active_boolean:true When joining users with posts, I get no results: q={!join from=self_id_i to=user_id_i}data_text:test&fq=is_active_boolean:true&fq=posts_text:hello I am able to use this query, but it gives me the results in an order that I don't want (nor do I understand its order): q={!join from=self_id_i to=user_id_i}data_text:test AND is_active_boolean:true&fq=posts_text:hello I want the order to be the same as I would get from my original q=data_text:test&fq=is_active_boolean:true, but with the ability to join with the posts docs. On Tue, Oct 25, 2011 at 11:30 AM, Yonik Seeley yo...@lucidimagination.com wrote: Can you give an example of the request (URL) you are sending to Solr? -Yonik http://www.lucidimagination.com On Mon, Oct 24, 2011 at 3:31 PM, Jason Toy jason...@gmail.com wrote: I have 2 types of docs, users and posts. I want to view all the docs that belong to certain users by joining posts and users together. I have to filter the users with a filter query of is_active_boolean:true so that the score is not affected, but since I do a join, I have to move the filter query to the query parameter so that the filter is applied. The problem is that since is_active_boolean is moved to the query, the score is affected, which returns an order that I don't want. If I leave is_active_boolean:true in the fq parameter, I get no results back. My question is: how can I apply a filter query to users so that the score is not affected? -- - sent from my mobile
Re: Search for the single hash # character never returns results
NP. By the way, kudos for posting enough information to diagnose the problem first time round! Erick

On Thu, Oct 27, 2011 at 8:46 AM, Daniel Bradley daniel.brad...@adfero.co.uk wrote: Fantastic, thanks -- yes, I completely overlooked that case; separating the analysers worked a treat. I had also posted on Stack Overflow, but the mailing list proved to be superior! Many thanks, Daniel

On 27 October 2011 13:09, Erick Erickson erickerick...@gmail.com wrote: Take a look at your admin/analysis page and put your tokens in for both index and query times. What I think you'll see is that the # is being stripped at query time due to the first PatternReplaceFilterFactory. You probably want to split your analyzers into an index-time and query-time pair and do the appropriate replacements to keep # at query time. Best Erick

On Tue, Oct 25, 2011 at 12:27 PM, Daniel Bradley daniel.brad...@adfero.co.uk wrote: When running a search such as: field_name:# field_name:"#" field_name:\# where there is a record with the value of exactly "#", Solr returns 0 rows. The workaround we are having to use is a range query on the field, such as: field_name:[# TO #] and this returns the correct documents. Use case details: We have a field that indexes a text field and calculates a letter group. This keeps only the first significant character from a value (number or letter), and if it is a number it simply stores "#", as we want all numbered items grouped together. I'm also aware that we could fix this by using a specific number instead of the hash character; however, I thought I'd raise this to see if there is a wider issue. I've listed some specific details below. Thanks for your time, Daniel Bradley

Field definition:

<fieldType name="letterGrouping" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^([a-zA-Z0-9]).*" group="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([0-9])" replacement="#" replace="all"/>
  </analyzer>
</fieldType>

Server information: Solr Specification Version: 3.2.0 Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15 Lucene Specification Version: 3.2.0 Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57
Re: Faceting on multiple fields, with multiple where clauses
Hmmm, this may be one of those things that's so ingrained it's not mentioned. Certainly the CommonQueryParameters page never explicitly says that there can only be one q parameter. But the problem is: how would multiple q params be combined? An implied AND? OR? NOT? The syntax would be a mess. The rule for fq is that they are intersections, that is, an implied AND. Also, the results of fq clauses can be cached. And fqs don't contribute to the scores of documents; they just contribute a yes/no. FWIW, Erick

On Thu, Oct 27, 2011 at 9:03 AM, Rubinho ru...@gekiere.com wrote: Hi Erik, Thank you very much. Your hint did solve the problem. Actually, I don't understand why (I read the difference between Q and QF, but it's still not clear to me why it didn't work with Q). But it's solved, and that's the most important thing :) Thanks, Ruben
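As a concrete illustration (field names invented), the usual shape is a single q plus any number of cacheable fq clauses:

http://localhost:8983/solr/select?q=name:ipod&fq=category:electronics&fq=price:[0 TO 100]&facet=true&facet.field=category

Each fq narrows the result set independently and is cached in the filterCache, while only q contributes to scoring.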
Re: Passing system parameters to solr at runtime
Would it be acceptable to change a central slave config? Because it's possible to have the replication process distribute solrconfig.xml files to the slaves that are different from the master's. That way, your master has its own solrconfig.xml, plus a solrconfig_slave.xml in the conf directory. At replication time, solrconfig_slave.xml is what's sent to the slave as solrconfig.xml; presumably this file has the whole master section removed. See http://wiki.apache.org/solr/SolrReplication, the "replicating solrconfig.xml" section. Which is another way of saying that I have no clue how to do what you asked, but this solution seems like it might do. Best Erick
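The master-side config for that trick looks roughly like this (adapted from the SolrReplication wiki; the alias syntax ships solrconfig_slave.xml to the slave under the name solrconfig.xml):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,solrconfig_slave.xml:solrconfig.xml</str>
  </lst>
</requestHandler>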
Re: help needed on solr-uima integration
Thanks Koji, I finally found a "method not found" error in Solr 3.4: the method resolveUpdateChainParam(SolrParams params, org.slf4j.Logger log) is not in the class org.apache.solr.util.SolrPluginUtils. It was very strange that there was no error message; I only found the problem after loading the source code into Eclipse. I then checked both Solr 4.0 and 3.5 -- both have this method and, again strange to me, it is deprecated. When I tried 4.0, the number of errors was shown in the new admin page but with no details. When I tried 3.5, I met the errors attached below. Xue-Feng

message null java.lang.NullPointerException
at org.apache.solr.uima.processor.SolrUIMAConfigurationReader.readAEOverridingParameters(SolrUIMAConfigurationReader.java:101)
at org.apache.solr.uima.processor.SolrUIMAConfigurationReader.readSolrUIMAConfiguration(SolrUIMAConfigurationReader.java:42)
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:44)
at org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:74)
at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:199)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1369)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:217)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:279)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:655)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:595)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:98)
at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:91)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:162)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:330)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:231)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:174)
at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:828)
at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:725)
at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1019)
at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:225)
at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
at java.lang.Thread.run(Thread.java:662)

From: Koji Sekiguchi k...@r.email.ne.jp To: solr-user@lucene.apache.org Sent: Thursday, October 27, 2011 7:25:09 AM Subject: Re: help needed on solr-uima integration (11/10/27 9:12), Xue-Feng Yang wrote: Hi, From the Solr Info page I can see my solr-uima core is there, but updateRequestProcessorChain is not. What is the reason? Because UpdateRequestProcessor (and Chain) is not a type of SolrInfoMBean. (As the classes in the page implement SolrInfoMBean, you can see them.) koji -- Check out Query Log Visualizer for Apache Solr http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/
Re: changing omitNorms on an already built index
Well, this could be explained if your fields are very short. Norms are encoded into (part of?) a byte, so your ranking may be unaffected. Try adding debugQuery=on and looking at the explanation. If you've really omitted norms, I think you should see clauses like: 1.0 = fieldNorm(field=features, doc=1) in the output, never something like 0.25 = fieldNorm(field=features, doc=1) i.e. in the absence of norm information, 1 is used. Also, in your index, see if your *.nrm files change in size. And I recommend that, when you're experimenting, you remove your entire solr home/data/index directory (the directory too, not just sub-files) before re-indexing. As Simon and Robert say, eventually the norm data will be purged, but by removing the directory first, you can look at things like the .nrm file with confidence that you're not seeing remnants that haven't been cleaned up quite yet. Best Erick On Thu, Oct 27, 2011 at 5:00 PM, Jonathan Rochkind rochk...@jhu.edu wrote: So Solr 1.4. I decided I wanted to change a field to have omitNorms=true that didn't previously. So I changed the schema to have omitNorms=true. And I reindexed all documents. But it seems to have had absolutely no effect. All relevancy rankings seem to be the same. Now, I could have a mistake somewhere else, maybe I didn't do what I thought. But I'm wondering if there are any known issues related to this, is there something special you have to do to change a field from omitNorms=false to omitNorms=true on an already built index? Other than re-indexing everything? Any known issues relevant here? Thanks for any help, Jonathan
Re: Analyzers from schema.xml with custom parser
You've really got to give a lot more information about what you're trying to do here, what you've tried and what you mean by associate. Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Wed, Oct 26, 2011 at 6:29 PM, Milan Dobrota mi...@milandobrota.com wrote: I created a custom plugin parser, and it seems like it is ignoring analyzers from schema.xml. Is there any way to associate the two?
Applying hl.requireFieldMatch to groups of fields
I am trying to highlight FieldA when a user searches on either FieldA or FieldB, but I do not want to highlight FieldA when a user searches on FieldC. To explain further: I have a field named content and a field named contentCS. The content field is a stored text field that uses LowerCaseFilterFactory (i.e., case-insensitive). The contentCS field is a copy of the content field, but is not stored and does not use LowerCaseFilterFactory (i.e., case-sensitive). My query looks like q=...&fl=content&hl.requireFieldMatch=true&hl.fl=content. I use requireFieldMatch because I do not want certain other things I put in the query to be highlighted in the content field. When I search on either the content or the contentCS field, I would like the content field to be highlighted. But when searching on any other fields, I do not want the terms for those fields to be highlighted in the content field. I was thinking I could hack this into DefaultSolrHighlighter, QueryTermScorer, and QueryTermExtractor. Perhaps the syntax could look like hl.content.useMatchesFromTheseFields=content,contentCS, and then I would pass an array of field names down into QueryTermExtractor. Anyone have any tips/comments on this? I've never looked at the highlighting code before, so I'm not sure what I'm getting myself into... -Michael