Cross-context-forward to solr-instance
Hi,

yesterday I tried Solr-1.3-RC2, and everything seems to work fine using the traditional single-core setup. But while troubleshooting the new multi-core feature, I realized for the first time that I have been using the class SolrServlet, which was already deprecated in 1.2.

This is a huge problem for us, as we run the Solr web-app in parallel to our main web-app in the same servlet container. With this approach we can internally forward update and select requests to the Solr instance currently in use:

    ServletContext ctx = getServletContext().getContext("/solr1");
    RequestDispatcher rd = ctx.getNamedDispatcher("SolrServer");
    rd.forward(request, response);

As you can see, this approach only works for the servlet named 'SolrServer', which references the deprecated class. An attempt to use a path-based dispatcher (ctx.getRequestDispatcher) was not successful, even though I configured the SolrRequestFilter in the Solr web-app's web.xml to also handle forwards (<dispatcher>FORWARD</dispatcher> in its filter-mapping), which the documentation discourages. Maybe this is because of the cross-context dispatch?

At the moment I have run out of ideas, apart from completely redesigning our whole setup. Any ideas are highly appreciated.

Thanks in advance,
Björn
Re: Replacing FAST functionality at sesam.no
...but Mick Semb Wever will be taking over this job for the next two weeks.

Back from holidays and taking over where Glenn-Erik left off. I'm very new to Solr, so please bear with me; I'll run through our setup from scratch.

Our test list has 9 entries: "abcd efgh ijkl", "abcd efgh", "efgh ijkl", "abcd", "efgh", "ijkl", "ijkl efgh", "efgh abcd", and "ijkl efgh abcd". I'm using a trunk build of Solr, with example/solr as the Solr home.

Editing schema.xml to put these entries in as type="string", and using defaultOperator="OR", gives the expected exact-matching behaviour as long as queries are quoted, e.g. /solr/select/?q="abcd efgh ijkl"

So then I change type="string" to type="shingleString", along with:

    <fieldType name="shingleString" class="solr.StrField" positionIncrementGap="100" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" outputUnigrams="true" outputUnigramIfNoNgram="true" maxShingleSize="99"/>
      </analyzer>
    </fieldType>

I never get any hits with quoted queries. Without quotes I only get the unigrams. I get the same outcomes using class="solr.TextField", and with class="solr.KeywordTokenizerFactory" in the index analyzer. In fact the ShingleFilter does nothing at all here: commenting the filter line out leads to exactly the same behaviour.

What am I missing to get shingles actually matching the indexed entries? It seems that if this were solved, it would work without having to use quoted queries.

I have been using the analysis.jsp tool. Everything looks good except that quotes are captured into the words and shingles, e.g. term position 1 2 3, term text abcdefghijkl abcdefgh efgh ijkl abcd efgh ijkl. This would explain why quoted queries are not working: the ShingleFilter produces tokens with the " character in them.
But here I would have at least expected a hit against "efgh".

~mck

-- 
"He who joyfully marches to music in rank and file has already earned my contempt. He has been given a large brain by mistake, since for him the spinal cord would suffice." Albert Einstein | semb.wever.org | sesat.no | sesam.no
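To see why quote characters end up inside the shingles, here is a minimal, Lucene-free Java model of a whitespace tokenizer followed by a shingle filter with outputUnigrams=true. It is only an approximation of what analysis.jsp shows. The class and method names are hypothetical. Tokens are joined with a single space, which is assumed to match ShingleFilter's default separator. Lucene's actual emission order may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of WhitespaceTokenizer + ShingleFilter(outputUnigrams=true); not Lucene code.
class ShingleSketch {
    // Split on runs of whitespace only, so punctuation such as '"' stays inside the token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Emit each unigram plus every shingle of 2..maxSize consecutive tokens, joined by one space.
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= tokens.size(); len++) {
                if (len > 1) sb.append(' ');
                sb.append(tokens.get(start + len - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Unquoted input: clean shingles.
        System.out.println(shingles(whitespaceTokens("abcd efgh ijkl"), 99));
        // Quoted input: the quote characters stick to the first and last tokens.
        System.out.println(shingles(whitespaceTokens("\"abcd efgh ijkl\""), 99));
    }
}
```

With the quoted text, the longest shingle carries the quote characters. It therefore can never equal the indexed entry abcd efgh ijkl. The middle unigram efgh stays clean.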
UpdateRequestProcessorFactory / Chain etc
Trying to build a simple UpdateRequestProcessor that keeps a field (the time of original indexing) when overwriting a document.

1) Can I make an updateRequestProcessor chain apply only to a certain handler, or does putting the following in my solrconfig.xml:

    <updateRequestProcessorChain>
      <processor class="myspecial.KeepIndexedDateFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
      <processor class="solr.LogUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

just handle all document updates?

2) Does an UpdateRequestProcessor support inform?
Re: UpdateRequestProcessorFactory / Chain etc
Answered my own qs, I think:

: Trying to build a simple UpdateRequestProcessor that keeps a field (the time
: of original indexing) when overwriting a document.
: 1) Can I make an updateRequestProcessor chain apply only to a certain handler,
: or does putting the following in my solrconfig.xml:
: <updateRequestProcessorChain>
:   <processor class="myspecial.KeepIndexedDateFactory"/>
:   <processor class="solr.RunUpdateProcessorFactory"/>
:   <processor class="solr.LogUpdateProcessorFactory"/>
: </updateRequestProcessorChain>
: just handle all document updates?

What you have to do is:

    <requestHandler name="/update2" class="solr.XmlUpdateRequestHandler">
      <lst name="invariants">
        <str name="update.processor">KeepIndexed</str>
      </lst>
    </requestHandler>

    <updateRequestProcessorChain name="KeepIndexed">
      <processor class="myspecial.KeepIndexedDateFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
      <processor class="solr.LogUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

And then calls to /update2 will go through the chain. Calls to /update will not.

: 2) Does an UpdateRequestProcessor support inform?

No, not that I can tell. And the factory won't get instantiated until the first time you use it.
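The idea behind the KeepIndexed chain can be modelled without Solr. The Java sketch below is a hypothetical stand-in, and none of its classes are Solr's API. Documents are plain maps and the "index" is a map keyed on id; both are assumptions for illustration. The field name indexed_date is also hypothetical. Each processor does its own work and then hands the document to the next one, roughly the way the processors listed in an updateRequestProcessorChain are invoked in order.

```java
import java.util.Map;

// Hypothetical model of an update processor chain (NOT Solr's API): each processor
// does its own work, then passes the document on to the next processor, if any.
abstract class Processor {
    protected final Processor next;

    Processor(Processor next) { this.next = next; }

    void processAdd(Map<String, Object> doc) {
        if (next != null) next.processAdd(doc);
    }
}

// Stand-in for a KeepIndexedDate processor: if a document with the same id is already
// in the "index", copy its original indexed_date (hypothetical field) onto the new version.
class KeepIndexedDate extends Processor {
    private final Map<String, Map<String, Object>> index;

    KeepIndexedDate(Map<String, Map<String, Object>> index, Processor next) {
        super(next);
        this.index = index;
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        Map<String, Object> old = index.get(doc.get("id"));
        if (old != null && old.containsKey("indexed_date")) {
            doc.put("indexed_date", old.get("indexed_date"));
        }
        super.processAdd(doc);
    }
}

// Stand-in for the processor RunUpdateProcessorFactory creates: actually stores (overwrites) the document.
class RunUpdate extends Processor {
    private final Map<String, Map<String, Object>> index;

    RunUpdate(Map<String, Map<String, Object>> index, Processor next) {
        super(next);
        this.index = index;
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        index.put((String) doc.get("id"), doc);
        super.processAdd(doc);
    }
}
```

The model has no counterpart for LogUpdateProcessorFactory. It also does not cover which handler uses the chain (the update.processor invariant on /update2), since that is configuration.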
Re: Faceting MoreLikeThisComponent results
: When using the MoreLikeThisHandler with facets turned on, the facets show
: counts of things that are more like my original document. When I use the
: MoreLikeThisComponent, the facets show counts of things that match my
: original document (I'm querying by document ID), so there is only one ...
: How can I facet the results of the MoreLikeThisComponent?

I don't think you can at this point. The good news is that MoreLikeThisHandler isn't getting removed anytime soon.

What we need to do is provide more options on the components to dictate their behavior when deciding what to process and how to return it. Your example could be solved either by adding an option to MLTComponent telling it to overwrite the main result set, or by adding an option to FacetComponent specifying the name of a DocSet in the response to use in its intersections. I think it would be good to do both. (HighlightComponent should probably also have an option just like the one I described for FacetComponent.)

Would you mind filing a feature request?

-Hoss
Re: scoring individual values in a multivalued field
: I have a multivalued field that I would want to score individually for each
: value. Is there an easy way to do that?

Lucene-Java has a (somewhat new) feature called Payloads which allows for things like this. It is built around the idea that when indexing, any Token can contain an arbitrary data payload, which is persisted along with the TermPosition info in the index. At query time, different types of queries can use/abuse that payload any way they want.

Currently payload support in Solr is somewhat limited. If you have a custom Analyzer or Tokenizer/TokenFilter that knows about Payloads, they will make it into the index, but you would need to write a custom Similarity and QParserPlugin to take advantage of them (there's already a BoostingTermQuery in Lucene that you can leverage).

Payloads is a really powerful feature, but the fact that it can be used in *so* many different ways is probably the biggest reason why Solr doesn't have any features yet to make payloads easier to use just via configuration.

At the moment, the simplest mechanisms I know of for achieving something like what you are describing are:

1) Repetitive values. Add a value twice to make it count (roughly) twice as much. (Eliminating lengthNorm and customizing your Similarity is necessary to make it worth exactly twice as much.)

2) Different fields. Partition the spectrum of importance for your values into N buckets, make a field for each bucket, put the value in the bucket that makes the most sense, and at query time query for each bucket with a different query-time boost.

: 2) the value of normField is persisted as a byte in the index and the
: precision loss hurts.

For a field like what you are describing, you'll probably want to omitNorms completely, just to make sure docs with lots of values aren't penalized.

-Hoss
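The second option (one field per importance bucket) can be sketched as a small Java helper. It builds a standard Lucene-syntax query over every bucket field, each with its own query-time boost. The field names tags_high, tags_mid and tags_low and the boost values are hypothetical. The value is assumed to be a single term that needs no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class BucketQuery {
    // Hypothetical bucket fields, most important first, with hypothetical query-time boosts.
    static final Map<String, Float> BUCKETS = new LinkedHashMap<>();
    static {
        BUCKETS.put("tags_high", 4.0f);
        BUCKETS.put("tags_mid", 2.0f);
        BUCKETS.put("tags_low", 1.0f);
    }

    // Build "field:value^boost OR ..." across all buckets, so a document matches whichever
    // bucket holds the value and picks up that bucket's boost.
    static String build(String value) {
        StringJoiner query = new StringJoiner(" OR ");
        for (Map.Entry<String, Float> bucket : BUCKETS.entrySet()) {
            query.add(bucket.getKey() + ":" + value + "^" + bucket.getValue());
        }
        return query.toString();
    }
}
```

For example, build("solr") yields tags_high:solr^4.0 OR tags_mid:solr^2.0 OR tags_low:solr^1.0. A document holding the value in tags_high should then tend to outrank one holding it in tags_low. Norms and other scoring factors still apply.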
Re: scoring individual values in a multivalued field
: I ran into the same problem some time ago, couldn't find any relation to the
: boost values on the multivalued field and the search results. Does anybody

As the OP mentioned, the index-time boost values for a field are per field *name*, not per value ... they all get folded together into the fieldNorm for that field name in that document.

-Hoss
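A small arithmetic sketch shows why per-value boosts are lost. It assumes Lucene 2.x behaviour with DefaultSimilarity, where lengthNorm = 1/sqrt(number of terms). Under that assumption, the boosts of all values sharing a field name are multiplied together and combined with the length norm into one number per document. The sketch ignores the document boost and the one-byte encoding of the stored norm. The class and method names are hypothetical.

```java
// Hypothetical illustration of how per-value boosts collapse into one fieldNorm (Lucene 2.x style).
class FieldNormSketch {
    // DefaultSimilarity-style length norm: 1 / sqrt(total number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Multiply the index-time boosts of every value with this field name, then apply the length norm.
    static float fieldNorm(float[] valueBoosts, int totalTerms) {
        float boost = 1.0f;
        for (float b : valueBoosts) {
            boost *= b;
        }
        return boost * lengthNorm(totalTerms);
    }
}
```

Consider a field whose values contain 4 terms in total. Boosting one value 6x and another 1x gives exactly the same norm (3.0) as boosting them 2x and 3x. The scorer therefore cannot tell which value carried which boost.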
Re: Synonyms and stemming revisited
: I see two solutions:
:
: Either put all possible endings in the synonym file - I do not really
: like this solution, as it would make the file very large, and it also
: is too easy to miss some specific ending. Or run the stemmer before
: the synonym filter, in which case the synonym definitions need to
: appear in their stemmed forms. Am I missing something, or does the

Based on my understanding of your description of your problem, I think I agree with you. If I've given different advice in the past, I'm sure I had a good reason for it -- possibly due to some aspect of those problems that is subtly different from yours ... can you post links to the specific messages you're referring to? It might help jog my memory.

: conversion of the synonym text file need to be done by hand at the
: moment? I suppose that it would not be too difficult to write some

A recently added feature is that when configuring SynonymFilterFactory you can give it the name of a TokenizerFactory to use when parsing the synonym file. This could be used to stem words *if* you write a TokenizerFactory that calls out to your stemmer.

(See SOLR-319 for the background on why you can only specify a Tokenizer and not a full fieldType to get the analysis chain from ... in a nutshell: 1. it would have been harder to implement; 2. the only use cases people could think of were tokenization-based.)

-Hoss
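If you run the stemmer before the synonym filter, the synonym file has to be converted to stemmed form. Here is a minimal offline Java sketch of such a conversion. It handles the two synonym-line forms, "a, b, c" and "a, b => c", and passes comments and blank lines through. The stem() method is a deliberately crude, hypothetical suffix stripper used only for illustration. A real conversion must call the same stemmer your analysis chain uses; otherwise the stemmed synonyms will not line up with the indexed tokens.

```java
import java.util.ArrayList;
import java.util.List;

class StemSynonyms {
    // Toy suffix stripper (hypothetical, illustration only); also lowercases the word.
    // Replace with the stemmer your analysis chain actually uses.
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Stem every word of a comma-separated group, keeping multi-word entries together.
    static String stemGroup(String group) {
        List<String> out = new ArrayList<>();
        for (String entry : group.split(",")) {
            List<String> words = new ArrayList<>();
            for (String word : entry.trim().split("\\s+")) {
                if (!word.isEmpty()) words.add(stem(word));
            }
            out.add(String.join(" ", words));
        }
        return String.join(", ", out);
    }

    // Convert one synonym-file line; comments and blank lines pass through unchanged.
    static String stemLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) return line;
        int arrow = trimmed.indexOf("=>");
        if (arrow < 0) return stemGroup(trimmed);
        return stemGroup(trimmed.substring(0, arrow)) + " => " + stemGroup(trimmed.substring(arrow + 2));
    }
}
```

With this toy stemmer, "cats, dogs => pets" becomes "cat, dog => pet", and "walked, strolled" becomes "walk, stroll".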
Re: UpdateRequestProcessorFactory / Chain etc
: And then calls to /update2 will go through the chain. Calls to /update will
: not.

Correct. Note also that there is a default attribute you can put on one updateRequestProcessorChain, and then XmlUpdateRequestHandler (etc...) will use that chain even if you don't tell it to use a particular one. If you only define one chain, it becomes the default automatically.

: 2) Does an UpdateRequestProcessor support inform?
: No, not that I can tell. And the factory won't get instantiated until the
: first time you use it.

Inform, no ... but the factories should be getting instantiated during SolrCore init. What makes you think it's not until first use? (That would be a bug if it's true, but a quick skim of SolrCore suggests it should be working correctly.)

-Hoss
Re: handling multiple multiple resources with single requestHandler
: Any ideas on how we could register a single request handler for handling
: multiple (wildcarded) contexts/resource URIs?
:
: (something like)
:
: <requestHandler name="/app/*" class="solr.StandardRequestHandler">
: <requestHandler name="/app/*/query" class="solr.StandardRequestHandler">

One of the reasons wildcards aren't supported is that they create ambiguity when dealing with dynamically created RequestHandlers.

Once upon a time we had the notion that a ":" (colon) could be used in the query path to denote that SolrDispatchFilter should stop there and treat everything up to the colon as the handler name, while everything after the colon should be put in the SolrQueryRequest for use by the RequestHandler, i.e....

    /app/query?q=solr
    /app/query:yakko/foo/yak?q=solr
    /app/query:dot/bar/hoss?q=solr

...would all get processed by the /app/query handler, which would have access to the "", "yakko/foo/yak", and "dot/bar/hoss" parts for each request.

That seems to have been removed from SolrDispatchFilter at some point. I'm not clear why, but there are clearly remnants of it, so maybe it was a mistake...

    // unused feature ?
    int idx = path.indexOf( ':' );
    if( idx > 0 ) {
      // save the portion after the ':' for a 'handler' path parameter
      path = path.substring( 0, idx );
    }

...I'm kind of tired right now, but if I'm reading that correctly it's flat out ignoring anything after the colon. (Which seems like the worst of both worlds ... you can't have a ":" in your request handler name, but you can't have access to what comes after it if you put it in the URL.)

I'm not sure what's going on there. Maybe someone else understands.
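For comparison, here is a minimal Java sketch of the behaviour the colon convention was meant to have. It splits the path at the first ':' using the same indexOf(':') test as the SolrDispatchFilter remnant. The left part becomes the handler name, and the right part is kept instead of discarded. HandlerPath is a hypothetical class, not part of Solr. The path is assumed to exclude the query string, as a servlet path does.

```java
// Hypothetical parser for the old "handler:extra" path convention; not Solr code.
class HandlerPath {
    final String handler; // portion before the first ':' (the request handler name)
    final String extra;   // portion after the first ':', or "" when there is none

    HandlerPath(String handler, String extra) {
        this.handler = handler;
        this.extra = extra;
    }

    static HandlerPath parse(String path) {
        int idx = path.indexOf(':');
        if (idx > 0) {
            // Keep both halves instead of throwing the extra part away.
            return new HandlerPath(path.substring(0, idx), path.substring(idx + 1));
        }
        return new HandlerPath(path, "");
    }
}
```

For instance, parse("/app/query:dot/bar/hoss") yields handler /app/query with extra dot/bar/hoss. A real implementation would still have to decide how to expose extra to the RequestHandler.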
: The only way I can do it right now is by modifying SolrDispatchFilter, and
: manually adding request context trimming there (reducing the requested context
: to /app/), and registering a handler for that context (which would later
: resolve other parts of it) - but if there is another way to do this -
: without changing the code, I would be more than happy to learn about it :)

If you're comfortable enough with ServletFilters to muck with SolrDispatchFilter, then wouldn't writing a new filter that you configure to sit in front of SolrDispatchFilter, and that takes pieces out of the URL and adds them as request params, be just as easy to write (and a lot easier to maintain)?

-Hoss
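The core of such a front filter is just URL rewriting. Here is a minimal, servlet-free Java sketch, assuming the /app/<context>/<handler> layout from the question. It strips the wildcard segment out of the path so SolrDispatchFilter sees /app/<handler>, and returns the segment separately so the filter can add it as a request parameter. PathRewrite is hypothetical, and so is the idea of passing the segment as a parameter; its name is up to you. The actual Filter, request wrapping and forwarding are omitted.

```java
// Hypothetical URL rewrite for a filter placed in front of SolrDispatchFilter; not Solr code.
class PathRewrite {
    final String path;    // path to hand on to SolrDispatchFilter
    final String context; // extracted wildcard segment, or null if the path was left alone

    PathRewrite(String path, String context) {
        this.path = path;
        this.context = context;
    }

    // Rewrites /app/<context>/<handler> to /app/<handler>; any other path passes through unchanged.
    static PathRewrite rewrite(String path) {
        String[] parts = path.split("/"); // "/app/foo/query" -> ["", "app", "foo", "query"]
        if (parts.length == 4 && parts[0].isEmpty() && parts[1].equals("app")) {
            return new PathRewrite("/app/" + parts[3], parts[2]);
        }
        return new PathRewrite(path, null);
    }
}
```

Keeping this logic in a separate filter leaves SolrDispatchFilter itself untouched, which is the maintenance advantage of this approach.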
Re: Questions on compound file format
: 1. Using the compound file format drops the number of file descriptors
: needed. Any other benefits?

Not that I know of.

: 2. Indexing may be slower. What about query performance?

If I remember correctly it's a little slower, but "a little" may be inconsequential.

: 3. Since Lucene 1.4, the compound file format became the default, however
: Solr default is not to use compound file format. Why this inconsistency?

SolrIndexConfig.java shows useCompoundFile = true as the default ... are you seeing something different getting used as the default somewhere?

-Hoss
Re: UpdateRequestProcessorFactory / Chain etc
On Sun, Sep 7, 2008 at 11:00 AM, Chris Hostetter wrote:

: inform, no ... but the factories should be getting instantiated during
: SolrCore init, what makes you think it's not until first use? (that would
: be a bug if it's true, but a quick skim of SolrCore suggests it should be
: working correctly)

I think Brian is referring to the method UpdateRequestProcessorFactory#getInstance(SolrQueryRequest, SolrQueryResponse, UpdateRequestProcessor), which, as an API, kind of limits you to creating it only on the first request. Noble pointed this out in SOLR-660 (but after it was committed):
https://issues.apache.org/jira/browse/SOLR-660?focusedCommentId=12617235#action_12617235

-- 
Regards,
Shalin Shekhar Mangar.
Re: Questions on compound file format
On Sun, Sep 7, 2008 at 11:21 AM, Chris Hostetter wrote:

: SolrIndexConfig.java shows useCompoundFile = true as the default ... are
: you seeing something different getting used as the default somewhere?

The example solrconfig.xml sets useCompoundFile to false in both the indexDefaults and the mainIndex sections. Should we change that?

-- 
Regards,
Shalin Shekhar Mangar.