Result list order in case of ties

2011-07-12 Thread Lox
Hi,

In the case where two or more documents are returned with the same score, is
there a way to tell Solr to sort them alphabetically?

I have already tried to use the tie-breaker, but I have just one field to
search.
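
In other words, what I am after is the equivalent of a secondary sort on an
untokenized copy of the field, something like this in SolrJ (just a sketch;
"title_sort" is a made-up name for a single-valued string copy of my search
field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TieBreakSortExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("my search terms");
        // sort by score first, then alphabetically on the string copy of the field
        query.set("sort", "score desc, title_sort asc");

        QueryResponse rsp = server.query(query);
        System.out.println(rsp.getResults());
    }
}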

Thank you.




Re: Feed index with analyzer output

2011-07-05 Thread Lox
Ok, 

the very short question is:
Is there a way to submit the analyzer response in such a form that Solr
already knows what to do with it? (That is, which parts are to be treated as
payloads, which are tokens, etc.)


Chris Hostetter-3 wrote:
 
 can you explain a bit more about what your goal is here?  what info are you 
 planning on extracting?  what do you intend to change between the info you 
 get back in the first request and the info you want to send in the second 
 request?
 

I plan to add some payloads to some terms between request#1 and request#2.


Chris Hostetter-3 wrote:
 
 your analyzers and whatnot for request#1 would be exactly what you're used 
 to, but for request#2 you'd need to specify an analyzer that would let you 
 specify, in the field value, the details about the term and position, and 
 offsets, and payloads and what not ... the 
 DelimitedPayloadTokenFilterFactory / DelimitedPayloadTokenFilter can help 
 with some of that, but not all -- you'd either need your own custom 
 analyzer or custom FieldType or something depending on the specific 
 changes you want to make.
 
 Frankly though I really believe you are going about this backwards -- if 
 you want to manipulate the TokenStream after analysis but before indexing, 
 then why not implement the custom logic that you want in a TokenFilter 
 and use it as the last TokenFilterFactory you have for your analyzer?
 
 

Yeah, I thought about that. I really wanted to know whether there was an
already-implemented way to do it, to avoid reinventing the wheel.

It would be cool if I were able to send info to Solr formatted the way I
imagined in my last mail, so that no call to any Tokenizer or TokenFilter
would be necessary. It would be like using an empty analyzer while still
retaining the various token information.
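
For the record, the custom TokenFilter you suggest would, I guess, look
roughly like the sketch below (Lucene 3.x API; the class name, the term set
and the weight are placeholders for whatever logic decides which terms get
which payloads, and a matching TokenFilterFactory would still be needed to
plug it into schema.xml):

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.index.Payload;

/**
 * Sketch: sets a fixed payload on every term contained in a given set.
 */
public final class AddPayloadFilter extends TokenFilter {

    private final Set<String> boostedTerms;
    private final Payload payload;
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

    public AddPayloadFilter(TokenStream input, Set<String> boostedTerms, float weight) {
        super(input);
        this.boostedTerms = boostedTerms;
        this.payload = new Payload(PayloadHelper.encodeFloat(weight));
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        if (boostedTerms.contains(termAtt.toString())) {
            payloadAtt.setPayload(payload); // attach the payload to this term
        }
        return true;
    }
}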

Thank you!



Payload doesn't apply to WordDelimiterFilterFactory-generated tokens

2011-07-04 Thread Lox
Hi, I have a problem with the WordDelimiterFilterFactory and the
DelimitedPayloadTokenFilterFactory.
It seems that the payloads are applied only to the original word that I
index, and the WordDelimiterFilter doesn't apply the payloads to the tokens
it generates.

For example, imagine I index the string JavaProject|1.7;
at the end of my analyzer pipeline it will be transformed like this:
JavaProject|1.7 -> javaproject|1.7 java project

Instead, what I would like is a result like this:
JavaProject|1.7 -> javaproject|1.7 java|1.7 project|1.7

This way the payload would be applied to the document even in case of
partial matches on the original word.
Here I have used the pipe notation, but imagine those payloads already
stored internally in Solr.

How can I do this?

If it is needed, my analyzer looks like this:
<fieldType name="text_C" class="solr.TextField" positionIncrementGap="100"
           stored="false" indexed="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            encoder="float"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^[a-z]{2,5}[0-9]{1,4}?([.]|[a-z])?(.*)"
            replacement="" replace="all" />
    <filter class="solr.WordDelimiterFilterFactory"
            preserveOriginal="1"
            generateNumberParts="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="30" />
    <filter class="solr.SnowballPorterFilterFactory"
            language="English"
            protected="protwords.txt"/>
  </analyzer>
.
.
.
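
If there is no built-in way, I guess I could write a small custom
TokenFilter, placed right after the WordDelimiterFilterFactory, that carries
the payload of the preserved original token over to the parts generated from
it. A rough sketch of what I have in mind (Lucene 3.x API; the class name is
made up, and it assumes the preserved original, which already carries the
payload, reaches the filter before its parts):

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.index.Payload;

/**
 * Sketch: carries the most recently seen payload forward onto tokens that
 * have none, so that the parts emitted by WordDelimiterFilter inherit the
 * payload of the original token they were split from.
 */
public final class CarryPayloadFilter extends TokenFilter {

    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private Payload lastPayload;

    public CarryPayloadFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        if (payloadAtt.getPayload() != null) {
            // token with its own payload: remember it for the parts that follow
            lastPayload = payloadAtt.getPayload();
        } else if (lastPayload != null) {
            // generated part with no payload: reuse the original token's payload
            payloadAtt.setPayload(lastPayload);
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        lastPayload = null;
    }
}

But before writing it, I wanted to check whether one of the existing filters
already covers this.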

Thank you.




Feed index with analyzer output

2011-07-02 Thread Lox
Hi,

I'm trying to achieve a better separation between the analysis of a
document (tokenizing, filtering, etc.) and the indexing (storing).
Now, I would like my application to call the analyzer (/analysis/document)
via REST, which returns the various tokens in XML format, and then feed
that data to the index directly, without doing the analysis again.
But I would also like to retain the original non-analyzed field for
display purposes.
This can probably be achieved with a copyField, right?

So my question is:
is it possible to feed the Solr index with the output of the analyzer?

Thank you.



Re: Feed index with analyzer output

2011-07-02 Thread Lox
Yes, from a utilitarian perspective you're absolutely right.
Mine is actually a more academic exercise.

Let me be clearer about the steps I would like to take:
1) Call Solr's analyzer, which returns me an XML response in the
following format (just a snippet as an example):

<lst name="attributeNames">
  <lst name="index">
    <lst name="incomingArc|1.6 outgoingArc|1.6">
      <arr name="org.apache.lucene.analysis.WhitespaceTokenizer">
        <lst>
          <str name="text">incomingArc|1.6</str>
          <str name="type">word</str>
          <int name="start">0</int>
          <int name="end">15</int>
          <int name="position">1</int>
        </lst>
        <lst>
          <str name="text">outgoingArc|1.6</str>
          <str name="type">word</str>
          <int name="start">16</int>
          <int name="end">31</int>
          <int name="position">2</int>
        </lst>
      </arr>
      <arr name="org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter">
        <lst>
          <str name="text">incomingArc</str>
          <str name="type">word</str>
          <int name="start">0</int>
          <int name="end">15</int>
          <int name="position">1</int>
          <str name="payload">org.apache.lucene.index.Payload:org.apache.lucene.index.Payload@ffe807d2</str>
        </lst>
        <lst>

etc.

2) Now I would like to be able to extract the info that I need from there
and tell Solr directly what to index, also telling it directly which are
the tokens with their respective payloads, without performing more
analysis.
I know that Solr does all those things internally, starting from the
original text, but is there a way to skip that phase by telling it
directly, for a given field, which are the tokens with their payloads?
They would then be stored internally just as before, only this time I
would have performed the two steps (analysis and indexing) in two
different phases, with my application orchestrating both of them.

I don't know if building the documents with SolrJ could help... maybe that's
the way to go?
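
As far as I can tell, though, with plain SolrJ I can only send raw field
values, which are then analyzed server-side anyway -- something like this
(a minimal sketch, with made-up values):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FeedExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "0001");
        // raw text: the field's analyzer still runs on the server,
        // which is exactly the step I'd like to skip
        doc.addField("text", "this is text");

        server.add(doc);
        server.commit();
    }
}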
Or is there a particular XML format to send to Solr? For example, something
like:

<add>
  <doc>
    <field name="id">0001</field>
    <field name="text">
      <rawValue>this is text</rawValue>
      <token pos="1" payload="2.0">this</token>
      <token pos="2" payload="1.0">is</token>
      <token pos="3" payload="2.5">text</token>
    </field>
  </doc>
</add>

Does it make sense? Or maybe I'm dreaming? :)

Thank you for answering!

