Result list order in case of ties
Hi,

In the case where two or more documents are returned with the same score, is there a way to tell Solr to sort them alphabetically? I have already tried to use the tie-breaker, but I have only one field to search.

Thank you.

-- View this message in context: http://lucene.472066.n3.nabble.com/Result-list-order-in-case-of-ties-tp3162001p3162001.html Sent from the Solr - User mailing list archive at Nabble.com.
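For context, the tie-breaking being asked about is normally done with a secondary sort clause rather than the dismax tie parameter (tie blends scores from multiple query fields; it does not order documents that end up with equal scores). A sketch, assuming an untokenized, indexed copy of the field exists for sorting (title_s is a made-up field name):

```text
http://localhost:8983/solr/select?q=foo&sort=score+desc,title_s+asc
```

Solr can only sort on a field that yields a single term per document, which is why a string-typed copy is assumed here rather than the analyzed search field itself.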
Re: Feed index with analyzer output
Ok, the very short question is: is there a way to submit the analyzer response so that Solr already knows what to do with it? (That is, which values are to be treated as payloads, which as tokens, etc.)

Chris Hostetter-3 wrote:
> can you explain a bit more about what your goal is here? what info are you
> planning on extracting? what do you intend to change between the info you
> get back in the first request and the info you want to send in the second
> request?

I plan to add some payloads to some terms between request #1 and request #2.

Chris Hostetter-3 wrote:
> your analyzers and whatnot for request#1 would be exactly what you're used
> to, but for request#2 you'd need to specify an analyzer that would let you
> specify, in the field value, the details about the term and position, and
> offsets, and payloads and whatnot ... the DelimitedPayloadTokenFilterFactory
> / DelimitedPayloadTokenFilter can help with some of that, but not all --
> you'd either need your own custom analyzer or custom FieldType or something
> depending on the specific changes you want to make.
>
> Frankly though i really believe you are going about this backwards -- if
> you want to manipulate the TokenStream after analysis but before indexing,
> then why not implement this custom logic that you want in a TokenFilter
> and use it as the last TokenFilterFactory you have for your analyzer?

Yeah, I thought about that. I really wanted to know whether there was an already implemented way to do it, to avoid reinventing the wheel. It would be cool if I were able to send info to Solr formatted the way I imagined in my last mail, so that no call to any Tokenizer or TokenFilter would be necessary. It would have been like using an empty analyzer while still retaining the various token information.

Thank you!
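In schema.xml terms, the "last TokenFilterFactory" approach suggested above would look something like the following; com.example.MyPayloadFilterFactory is a made-up class name standing in for the custom filter:

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  <!-- hypothetical custom stage: registered last, so it sees the fully
       analyzed TokenStream just before it is indexed -->
  <filter class="com.example.MyPayloadFilterFactory"/>
</analyzer>
```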
Payload doesn't apply to WordDelimiterFilterFactory-generated tokens
Hi,

I have a problem with the WordDelimiterFilterFactory and the DelimitedPayloadTokenFilterFactory. It seems that the payloads are applied only to the original word that I index, and the WordDelimiterFilter doesn't apply the payloads to the tokens it generates.

For example, imagine I index the string JavaProject|1.7. At the end of my analyzer pipeline it will be transformed like this:

JavaProject|1.7 -> javaproject|1.7 java project

Instead, what I would like is a result like this:

JavaProject|1.7 -> javaproject|1.7 java|1.7 project|1.7

This way the payload would be applied to the document even in case of partial matches on the original word. (Here I have used the pipe notation, but imagine those payloads already stored internally in Solr.) How can I do this?

If it is needed, my analyzer looks like this:

<fieldType name="text_C" class="solr.TextField" positionIncrementGap="100" stored="false" indexed="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[a-z]{2,5}[0-9]{1,4}?([.]|[a-z])?(.*)" replacement="" replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateNumberParts="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="30"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  ...
</fieldType>

Thank you.
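The missing behavior described above -- spreading the original token's payload onto the sub-tokens derived from it -- could be implemented as a custom TokenFilter at the end of the chain. The following is only a stand-alone sketch of the spreading logic in plain Java (a toy Token class with made-up names, not the real Lucene TokenFilter / PayloadAttribute API):

```java
import java.util.ArrayList;
import java.util.List;

public class PayloadSpreader {
    // Toy stand-in for a Lucene token: text plus an optional payload.
    static class Token {
        final String text;
        Float payload; // null = no payload attached
        Token(String text, Float payload) { this.text = text; this.payload = payload; }
    }

    // Copy the payload of each token that has one onto the payload-less
    // tokens that follow it in the stream (e.g. WordDelimiterFilter parts).
    static List<Token> spread(List<Token> in) {
        List<Token> out = new ArrayList<>();
        Float current = null;
        for (Token t : in) {
            if (t.payload != null) {
                current = t.payload;   // an original token: remember its payload
            } else if (current != null) {
                t.payload = current;   // a derived sub-token: inherit it
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // javaproject|1.7 followed by the WDF-style parts "java" and "project"
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("javaproject", 1.7f));
        tokens.add(new Token("java", null));
        tokens.add(new Token("project", null));
        for (Token t : spread(tokens)) {
            System.out.println(t.text + "|" + t.payload);
        }
    }
}
```

Note the sketch is naive: it lets a payload leak onto any later payload-less token, so a real filter would reset the remembered payload at each new original word (e.g. whenever the position increment is non-zero).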
Feed index with analyzer output
Hi,

I'm trying to achieve a better separation between the analysis of a document (tokenizing, filtering, etc.) and the indexing (storing). I would like my application to call the analyzer (/analysis/document) via REST, which returns the various tokens in XML format, and then feed these data to the index directly without doing the analysis again. But I would also like to retain the original non-analyzed field for display purposes. This can probably be achieved with a copyField, right?

So my question is: is it possible to feed the Solr index with the output of the analyzer?

Thank you.
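On the copyField point: yes, that is the usual pattern. copyField copies the raw incoming value (before any analysis runs), so the destination keeps the original text. A sketch with invented field names:

```xml
<!-- analyzed field used for searching; not stored -->
<field name="text" type="text_general" indexed="true" stored="false"/>
<!-- untokenized stored copy, kept only for display -->
<field name="text_display" type="string" indexed="false" stored="true"/>

<copyField source="text" dest="text_display"/>
```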
Re: Feed index with analyzer output
Yes, from a utilitarian perspective you're absolutely right. Mine is actually a more academic exercise. To be clearer about the steps I would like to take:

1) Call Solr's analyzer, which returns me an XML response in the following format (just a snippet as an example):

<lst name="attributeNames">
 <lst name="index">
  <lst name="incomingArc|1.6 outgoingArc|1.6">
   <arr name="org.apache.lucene.analysis.WhitespaceTokenizer">
    <lst>
     <str name="text">incomingArc|1.6</str>
     <str name="type">word</str>
     <int name="start">0</int>
     <int name="end">15</int>
     <int name="position">1</int>
    </lst>
    <lst>
     <str name="text">outgoingArc|1.6</str>
     <str name="type">word</str>
     <int name="start">16</int>
     <int name="end">31</int>
     <int name="position">2</int>
    </lst>
   </arr>
   <arr name="org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter">
    <lst>
     <str name="text">incomingArc</str>
     <str name="type">word</str>
     <int name="start">0</int>
     <int name="end">15</int>
     <int name="position">1</int>
     <str name="payload">org.apache.lucene.index.Payload:org.apache.lucene.index.Payload@ffe807d2</str>
    </lst>
    <lst>
     etc.

2) Now I would like to be able to extract the info that I need from there and tell Solr directly which things to index, telling it also which are the tokens with their respective payloads, without performing more analysis. I know that Solr does all those things internally starting from the original text, but is there a way to skip that phase by telling it immediately, for a given field, which are the tokens with their payloads? They would be stored internally as before, only this time I would have performed the two steps (analysis and indexing) in two different phases, with my application orchestrating both of them.

I don't know if building the documents with SolrJ could help... maybe that's the way to go? Or is there a particular XML format to send to Solr? For example, something like:

<add>
 <doc>
  <field name="id">0001</field>
  <field name="text">
   <rawValue>this is text</rawValue>
   <token pos="1" payload="2.0">this</token>
   <token pos="2" payload="1.0">is</token>
   <token pos="3" payload="2.5">text</token>
  </field>
 </doc>
</add>

Does it make sense?
Or maybe I'm dreaming? :) Thank you for answering!
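For comparison, the token-level markup imagined above does not exist in stock Solr: the standard XML update message carries only plain field values, and analysis always happens server-side:

```xml
<add>
  <doc>
    <field name="id">0001</field>
    <field name="text">this is text</field>
  </doc>
</add>
```

So the proposed rawValue/token elements would require a custom update request handler or a custom FieldType on the Solr side, along the lines Hoss described earlier in the thread.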