Re: Interesting search question! How to match documents based on the least number of fields that match all query terms?

2014-01-23 Thread Daniel Shane
Thanks Frank, Mikhail & Robert for your input!

I'm looking into your ideas, and running a few test queries to see how they work 
out. I have a feeling it is trickier than it sounds. For example, let's say I 
have 3 docs in my index:

Doc1:

m1: a b c d
m2: a b c
m3: a b
m4: a
mAll: a b c d / a b c / a b / a

Doc 2:

m1: a b c 
m2: b c d
m3: 
m4:
mAll: a b c / b c d

Doc 3:

m1: a 
m2: b
m3: c
m4: d
mAll: a / b / c / d

If the search terms are a b c d, then all 3 docs will match, since each of the 
search terms appears somewhere in the metadata fields. However, the sorting 
should give this order:

doc1 (1 field covers all terms)
doc2 (2 fields needed to cover all terms)
doc3 (4 fields needed to cover all terms)
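The desired ordering can be sketched in plain Python (illustration only, not a Solr scorer; it uses a greedy set cover, which is an approximation in general but happens to be exact on these small examples):

```python
# Rank documents by the fewest metadata fields whose union covers all
# query terms (greedy set cover; illustration only, not a Solr scorer).

def fields_needed(doc_fields, terms):
    terms = set(terms)
    remaining = set(terms)
    fields = [set(f.split()) & terms for f in doc_fields if f.strip()]
    count = 0
    while remaining:
        # Greedily pick the field covering the most still-unmatched terms.
        best = max(fields, key=lambda f: len(f & remaining), default=set())
        if not best & remaining:
            return None  # terms not fully covered -> document does not match
        remaining -= best
        count += 1
    return count

docs = {
    "doc1": ["a b c d", "a b c", "a b", "a"],
    "doc2": ["a b c", "b c d", "", ""],
    "doc3": ["a", "b", "c", "d"],
}
ranking = sorted(docs, key=lambda d: fields_needed(docs[d], "a b c d".split()))
print(ranking)  # ['doc1', 'doc2', 'doc3']
```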

I'll try out your ideas and let you know how it works out!

Daniel Shane



- Original Message -
From: "Franck Brisbart" 
To: solr-user@lucene.apache.org
Sent: Thursday, January 23, 2014 3:12:36 AM
Subject: RE: Interesting search question! How to match documents based on the 
least number of fields that match all query terms?

Hi Daniel,

you can also consider using negative boosts.
A true negative boost can't be done with Solr, but you can instead boost
the docs which don't match the metadata field.

This might do what you want :
-metadata1:(term1 AND ... AND termN)^2
-metadata2:(term1 AND ... AND termN)^2
.
-metadataN:(term1 AND ... AND termN)^2
allMetadatas:(term1 AND ... AND termN)^0.5
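As a sketch (plain string assembly, not a Solr API), the suggested query could be built programmatically for N metadata fields like this:

```python
# Assemble the negative-boost query string suggested above for N metadata
# fields. Field names and boost values come from the message; adjust to taste.

def build_negative_boost_query(terms, n_fields, boost=2, all_boost=0.5):
    conj = " AND ".join(terms)
    clauses = [f"-metadata{i}:({conj})^{boost}" for i in range(1, n_fields + 1)]
    clauses.append(f"allMetadatas:({conj})^{all_boost}")
    return " ".join(clauses)

print(build_negative_boost_query(["term1", "term2"], 2))
# -metadata1:(term1 AND term2)^2 -metadata2:(term1 AND term2)^2 allMetadatas:(term1 AND term2)^0.5
```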


Franck Brisbart



On Wednesday, January 22, 2014 at 19:38 +, Petersen, Robert wrote:
> Hi Daniel,
> 
> How about trying something like this (you'll have to play with the boosts to 
> tune it): search all the fields with all the terms using edismax and the 
> minimum-should-match parameter, but require all terms to match in the 
> allMetadata field.
> https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
> 
> Lucene query syntax below to give you the general idea, but this query would 
> require all terms to be in one of the metadata fields to get the boost.
> 
> metadata1:(term1 AND ... AND termN)^2
> metadata2:(term1 AND ... AND termN)^2
> .
> metadataN:(term1 AND ... AND termN)^2
> allMetadatas:(term1 AND ... AND termN)^0.5
> 
> That should do approximately what you want,
> Robi
> 
> -Original Message-
> From: Daniel Shane [mailto:sha...@lexum.com] 
> Sent: Tuesday, January 21, 2014 8:42 AM
> To: solr-user@lucene.apache.org
> Subject: Interesting search question! How to match documents based on the 
> least number of fields that match all query terms?
> 
> I have an interesting Solr/Lucene question and it's quite possible that some 
> new features in Solr might make this much easier than what I am about to try. 
> If anyone has a clever idea on how to do this search, please let me know!
> 
> Basically, let's say I have an index in which each document has a content 
> field and several metadata fields.
> 
> Document Fields:
> 
> content
> metadata1
> metadata2
> .
> metadataN
> allMetadatas (all the terms indexed in metadata1...N are concatenated in this 
> field) 
> 
> Assuming that I am searching for documents that contain a certain number of 
> terms (term1 to termN) in their metadata fields, I would like to build a 
> search query that returns documents satisfying these requirements:
> 
> a) All search terms must be present in a metadata field. This is quite easy, 
> we can simply search in the field allMetadatas and that will work fine.
> 
> b) Now for the hard part: we prefer documents in which the terms were found 
> in the *least number of different fields*. So if one document contains all 
> the search terms spread across 10 different fields, but another contains all 
> of them in only 8 fields, we would like the latter to sort first. 
> 
> My first idea was to index terms in allMetadatas using payloads. Each 
> indexed term would also carry the specific metadataN field from which it 
> originates. Then I can write a scorer that scores based on these payloads. 
> 
> However, if there is a way to do this without payloads I'm all ears!
> 



Interesting search question! How to match documents based on the least number of fields that match all query terms?

2014-01-21 Thread Daniel Shane
I have an interesting Solr/Lucene question and it's quite possible that some new 
features in Solr might make this much easier than what I am about to try. If 
anyone has a clever idea on how to do this search, please let me know!

Basically, let's say I have an index in which each document has a content 
field and several metadata fields.

Document Fields:

content
metadata1
metadata2
.
metadataN
allMetadatas (all the terms indexed in metadata1...N are concatenated in this 
field) 

Assuming that I am searching for documents that contain a certain number of 
terms (term1 to termN) in their metadata fields, I would like to build a search 
query that returns documents satisfying these requirements:

a) All search terms must be present in a metadata field. This is quite easy, we 
can simply search in the field allMetadatas and that will work fine.

b) Now for the hard part: we prefer documents in which the terms were found in 
the *least number of different fields*. So if one document contains all the 
search terms spread across 10 different fields, but another contains all of 
them in only 8 fields, we would like the latter to sort first. 

My first idea was to index terms in allMetadatas using payloads. Each indexed 
term would also carry the specific metadataN field from which it originates. 
Then I can write a scorer that scores based on these payloads. 
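The payload idea can be sketched in plain Python (illustration only; the real thing would be a custom Lucene scorer reading per-term payloads): each term occurrence in allMetadatas carries the name of the metadata field it came from, and at score time we count the distinct origin fields a full match has to span.

```python
# Simulate payload-based scoring: postings maps each term to the set of
# origin-field payloads it was indexed with for one document. The score is
# the number of distinct fields touched when covering all query terms
# (fewer is better), chosen greedily.

def payload_score(postings, terms):
    needed = set(terms)
    if not all(postings.get(t) for t in needed):
        return None  # some term missing -> document does not match
    used, remaining = set(), set(needed)
    while remaining:
        # Pick the origin field covering the most still-unmatched terms.
        field = max({f for t in remaining for f in postings[t]},
                    key=lambda f: sum(f in postings[t] for t in remaining))
        used.add(field)
        remaining -= {t for t in remaining if field in postings[t]}
    return len(used)

# Doc1 from the thread: m1 holds a b c d, m2 holds a b c, m3 a b, m4 a.
doc1 = {"a": {"m1", "m2", "m3", "m4"}, "b": {"m1", "m2", "m3"},
        "c": {"m1", "m2"}, "d": {"m1"}}
print(payload_score(doc1, ["a", "b", "c", "d"]))  # 1
```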

However, if there is a way to do this without payloads I'm all ears!

-- 
Daniel Shane
Lexum (www.lexum.com)
sha...@lexum.com


Re: Solr highlighter and custom queries?

2010-05-20 Thread Daniel Shane
Actually, it's not so much a Solr problem as a Lucene one; as it turns out, the 
WeightedSpanTermExtractor is in Lucene and not Solr.

Why they decided to only highlight queries that ship with Lucene I don't know, 
but what I did to solve this problem was simply to make my queries extend a 
Lucene query instead of just "Query". 

So I decided to extend BooleanQuery, which is the closest fit to what mine 
actually does.

This makes the highlighting "do something" even though it's not perfect.

Daniel Shane


Solr highlighter and custom queries?

2010-05-20 Thread Daniel Shane
Hi all!

I'm trying to do some simple highlighting, but I cannot seem to figure out how 
to make it work.

I'm using my own QueryParser which generates custom made queries. I would like 
Solr to be able to highlight them.

I've tried many options in the highlighter but cannot get any snippets to show.

However, if I change the QueryParser to the default Solr parser, it works.

Is there a place in the config or in the query parser where I can specify how 
Solr should highlight my custom queries?

I checked a bit in the source code, and in the WeightedSpanTermExtractor class, 
in the method extract(Query query, Map terms), there is a huge list of 
instanceofs that check which type of query we are attempting to match.

Is that the only place where the conversion between query <-> highlighting 
happens? If so, it looks pretty hard-coded and would not work with any queries 
other than the ones included in Lucene.

I guess there must be a good reason for this, but is there any other way of 
making the highlighter work without having to hard-code all the possible 
queries in a big chain of if / instanceofs?

If we could somehow reuse the code contained in each query to find possible 
matches, it would avoid having to recode the same logic elsewhere.
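To illustrate the design point with toy classes (hypothetical, not Lucene's API): if each query type exposed its own term-extraction method, a custom query would be highlight-ready without any change to a central instanceof chain.

```python
# Toy illustration of polymorphic term extraction versus a central
# instanceof chain: each query knows how to report its own terms, so the
# highlighter never needs to enumerate query types.

class TermQuery:
    def __init__(self, term):
        self.term = term
    def extract_terms(self):
        return {self.term}

class BooleanQuery:
    def __init__(self, clauses):
        self.clauses = clauses
    def extract_terms(self):
        # Union of the terms of all sub-clauses, whatever their types.
        return set().union(*(c.extract_terms() for c in self.clauses))

class MyCustomQuery(BooleanQuery):
    # A custom query "just works": no highlighter changes required.
    pass

q = MyCustomQuery([TermQuery("solr"), BooleanQuery([TermQuery("highlight")])])
print(sorted(q.extract_terms()))  # ['highlight', 'solr']
```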

But as I said, there must be a good reason for doing it the way it's already 
coded.

Any ideas on how to work this out with the existing code base would be greatly 
appreciated :)

Daniel Shane



Re: Merge several queries into one result?

2010-02-17 Thread Daniel Shane
Yup, that's also what I was thinking. 

However, I do think that many real-world setups cannot simply use one flat 
index. If you have a big index with big documents, you may want to have a 
separate, small index for things that update frequently, etc. You would need 
to cross-reference that index with the main one to produce the final result.

In Java it would be easy to just do 2 queries, one to get the main hits and 
the other to query the smaller index. In fact, that controller could just 
cache the entries of the second index. 
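A minimal sketch of that two-query, cached-lookup approach (field names and values hypothetical):

```python
# Client-side "join": one query returns the main hits, a second (cacheable)
# query loads the small lookup index, and the controller merges the two
# before handing the result to the template.

state_lookup = {"NY": "New York", "CA": "California"}  # from the second index

def enrich(hits, lookup):
    # Replace each hit's state code with its full name, falling back to
    # the raw code when the lookup has no entry.
    for hit in hits:
        hit["state_name"] = lookup.get(hit["state"], hit["state"])
    return hits

hits = [{"id": 1, "state": "NY"}, {"id": 2, "state": "CA"}]
print(enrich(hits, state_lookup))
```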

I don't know if it would be easy to include in Solr. It would certainly require 
much thought though, as some may want to cross-reference another core for each 
hit, while others would just want to retrieve a bunch of documents statically.

Daniel Shane

I'll see what can be done, but I don't think there's an easy way.
- Original Message -
From: "Erick Erickson" 
To: solr-user@lucene.apache.org
Sent: Tuesday, February 16, 2010 10:20:50 PM
Subject: Re: Merge several queries into one result?

It's generally a bad idea to try to think of
various SOLR/Lucene indexes in a database-like
way, Lucene isn't built to do RDBMS-like stuff. The
first suggestion is usually to consider flattening
your data. That would be something like
adding NY and "New York" in each document.

If that's not possible, the thread titled "Collating results from multiple
indexes" might be useful, although my very quick
read of that is that you have to do some custom work...

HTH
Erick


On Tue, Feb 16, 2010 at 4:54 PM, Daniel Shane wrote:

> Hi all!
>
> I'm trying to join 2 indexes together to produce a final result using only
> Solr + Velocity Response Writer.
>
> The problem is that each "hit" of the main index contains references to
> some common documents located in another index. For example, the hit could
> have a field that describes what state it's located in. This field would have
> a value of "NY" for New York etc...
>
> Now what if, in Velocity, I want to show this information in full detail?
> Instead of "NY", I would like to show "New York". This information has not
> been indexed in the main index, but rather in a second one.
>
> Is it possible to coalesce or join these results together so that I can
> pass a simple Velocity template to generate the final HTML?
>
> Or do I have to write a webapp in java to cache all these global variables
> (the state codes, the country codes etc...)?
>
> Daniel Shane
>


Re: Preventing mass index delete via DataImportHandler full-import

2010-02-17 Thread Daniel Shane
That's what I thought. I think I'll take the time to add something to the DIH to 
prevent such things. Maybe a parameter that will cause the import to bail out 
if the documents to index are fewer than X% of the total number of documents 
already in the index.

There would also be a parameter to override this manually.

I think it would be a good safety precaution.
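A sketch of the proposed safeguard (parameter names hypothetical, not an existing DIH option): abort a full-import when the incoming row count is suspiciously small relative to the documents already in the index, unless an override flag is set.

```python
# Guard against wiping a live index from an empty or near-empty database:
# refuse a full-import when the rows to index fall below a percentage of
# the current index size, with a manual override.

def should_abort_import(rows_to_index, docs_in_index, min_percent=50,
                        override=False):
    if override or docs_in_index == 0:
        return False  # nothing to protect, or operator forced the import
    return rows_to_index < docs_in_index * min_percent / 100

print(should_abort_import(0, 100_000))       # True: empty DB, keep the index
print(should_abort_import(90_000, 100_000))  # False: normal reimport
```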

Daniel Shane

- Original Message -
From: "Noble Paul നോബിള്‍ नोब्ळ्" 
To: solr-user@lucene.apache.org
Sent: Wednesday, February 17, 2010 12:36:52 AM
Subject: Re: Preventing mass index delete via DataImportHandler full-import

On Wed, Feb 17, 2010 at 8:03 AM, Chris Hostetter
 wrote:
>
> : I have a small worry though. When I call the full-import function, can
> : I configure Solr (via the XML files) to make sure there are rows to
> : index before wiping everything? What worries me is that if, for some
> : unknown reason, we have an empty database, the full-import will just
> : wipe the live index and the search will be broken.
>
> I believe if you set clear=false when doing the full-import, DIH won't
It is clean=false, not clear=false.

Or use command=import instead of command=full-import.
> delete the entire index before it starts.  it probably makes the
> full-import slower (most of the adds wind up being deletes followed by
> adds) but it should prevent you from having an empty index if something
> goes wrong with your DB.
>
> the big catch is you now have to be responsible for managing deletes
> (using the XmlUpdateRequestHandler) yourself ... this bug looks like its
> goal is to make this easier to deal with (but it's not really clear to
> me what "deletedPkQuery" is ... it doesn't seem to be documented).
>
> https://issues.apache.org/jira/browse/SOLR-1168
>
>
>
> -Hoss
>
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Merge several queries into one result?

2010-02-16 Thread Daniel Shane
Hi all!

I'm trying to join 2 indexes together to produce a final result using only Solr 
+ Velocity Response Writer.

The problem is that each "hit" of the main index contains references to some 
common documents located in another index. For example, the hit could have a 
field that describes what state it's located in. This field would have a value 
of "NY" for New York etc...

Now what if, in Velocity, I want to show this information in full detail? 
Instead of "NY", I would like to show "New York". This information has not 
been indexed in the main index, but rather in a second one.

Is it possible to coalesce or join these results together so that I can pass a 
simple Velocity template to generate the final HTML?

Or do I have to write a webapp in java to cache all these global variables (the 
state codes, the country codes etc...)?

Daniel Shane


Preventing mass index delete via DataImportHandler full-import

2010-02-16 Thread Daniel Shane
I've set up a simple DIH import handler with Solr that connects to my data 
via a database.

I have a small worry though. When I call the full-import function, can I 
configure Solr (via the XML files) to make sure there are rows to index before 
wiping everything? What worries me is that if, for some unknown reason, we have 
an empty database, the full-import will just wipe the live index and the 
search will be broken.

I don't think it's possible, but I'm new to Solr, so it's quite possible I've 
overlooked how this could be done.

Thanks in advance for any help!
Daniel Shane