Facet on same date field multiple times

2011-12-12 Thread dbashford
I've Googled around a bit and seen this referenced a few times, but cannot
seem to get it to work.

I have a query that looks like this:

facet=true
facet.date={!key=foo}date
f.foo.facet.date.start=2010-12-12T00:00:00Z
f.foo.facet.date.end=2011-12-12T00:00:00Z
f.foo.facet.date.gap=%2B1DAY

Eventually the goal is to do different ranges on the same field.  Month by
day.  Day by hour.  Year by week.  Something to that effect.  But I thought
I'd start simple to see if I could get the syntax right, and what I have
above doesn't seem to work.

I get:
message Missing required parameter: f.date.facet.date.start (or default:
facet.date.start)
description The request sent by the client was syntactically incorrect
(Missing required parameter: f.date.facet.date.start (or default:
facet.date.start)).

So it doesn't seem interested in me using the local key.  From reading here: 
http://lucene.472066.n3.nabble.com/Date-Faceting-on-Solr-3-1-td3302499.html#a3309517
it would seem I should be able to do it (see the note at the bottom).

I know one option is to copyField the date into a few other spots, and I can
use that as a last resort, but if this works and I'm just arsing something
up...
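For the archive, here's a sketch of the kind of request being attempted: two keyed date facets over the same date field, each with its own range parameters. Note that the error above suggests this Solr version resolves per-facet params by field name (f.date.facet.date.start), not by the {!key=...} alias, so whether the keyed f.foo.* form works is exactly what's in question here; treat these parameter names as hopeful, not confirmed.

```python
from urllib.parse import urlencode

# Two keyed facets over the same "date" field: a month by day, a year by week.
# The f.<key>.* override style is the part that may not be honored.
params = [
    ("facet", "true"),
    ("facet.date", "{!key=bymonth}date"),
    ("f.bymonth.facet.date.start", "2011-11-12T00:00:00Z"),
    ("f.bymonth.facet.date.end", "2011-12-12T00:00:00Z"),
    ("f.bymonth.facet.date.gap", "+1DAY"),
    ("facet.date", "{!key=byyear}date"),
    ("f.byyear.facet.date.start", "2010-12-12T00:00:00Z"),
    ("f.byyear.facet.date.end", "2011-12-12T00:00:00Z"),
    ("f.byyear.facet.date.gap", "+7DAY"),
]
query_string = urlencode(params)  # "+1DAY" is URL-escaped to %2B1DAY
print(query_string)
```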

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-on-same-date-field-multiple-times-tp3580449p3580449.html
Sent from the Solr - User mailing list archive at Nabble.com.


StatsComponent and multi-valued fields

2010-10-06 Thread dbashford

Running 1.4.1.

I'm able to execute stats queries against multi-valued fields, but when
given a facet, the StatsComponent only credits each document to whatever
facet value happens to be last in the field.

As an example, imagine you are running stats on fooCount, and you want to
facet on bar, which is multi-valued.  Two documents...

1)
fooCount = 100
bar = A, B, C

2) 
fooCount = 5
bar = C, B, A

stats.field=fooCount&stats=true&stats.facet=bar

I would expect to see stats for A, B, and C all with sums of 105.  But what
I'm seeing is stats for C and A with sums of 100 and 5 respectively.

Is this expected behavior?  Something I'm possibly doing wrong?  Is this
just not advisable?
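For later readers: what's described above does look like a limitation in the 1.4-era StatsComponent with multi-valued stats.facet fields, not user error. The expected behavior can be modeled in a few lines; every document should contribute its stats value under every facet value it holds (a sketch of the expectation, not of Solr internals):

```python
from collections import defaultdict

# The two example documents from the message above.
docs = [
    {"fooCount": 100, "bar": ["A", "B", "C"]},
    {"fooCount": 5,   "bar": ["C", "B", "A"]},
]

# What stats.facet over a multi-valued field should produce:
# each doc counted once under each of its facet values.
sums = defaultdict(int)
for doc in docs:
    for facet_value in doc["bar"]:
        sums[facet_value] += doc["fooCount"]

print(dict(sums))  # A, B, and C should each sum to 105
```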

Thanks!

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/StatsComponent-and-multi-valued-fields-tp1644918p1644918.html
Sent from the Solr - User mailing list archive at Nabble.com.


Relevancy and non-matching words

2010-07-06 Thread dbashford

Is there some sort of threshold I can tweak that sets how many letters in
non-matching words make a result more or less relevant?

Searching on title, q=fantasy football, and I get this:

{"title":"The Fantasy Football Guys",
 "score":2.8387074},
{"title":"Fantasy Football Bums",
 "score":2.8387074},
{"title":"Fantasy Football Xtreme",
 "score":2.7019854},
{"title":"Fantasy Football Fools",
 "score":2.7019634},
{"title":"Fantasy Football Brothers",
 "score":2.5917912}

(I have some other scoring things in there that account for the difference
between Xtreme and Fools.)

The behavior I'm noticing is that there is some threshold for the length of
non-matching words that, when tripped, kicks the score down a notch.  Going
from 4 to 5 letters seems to trip one; 6 to 7, another.

I would really like something like "Bums" to score the same as "Xtreme" and
"Brothers" and let my other criteria determine which document should come
out on top.  Is there something that can be tweaked to get this to happen?

Or is my assumption a bit off base?
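A note for the archive: the assumption is likely a bit off base. Word *length* doesn't normally factor into Lucene scoring at all; the stepped behavior is more plausibly length normalization. The default similarity stores a per-field norm of 1/sqrt(number of terms), squeezed into a single byte at index time, so nearby field lengths can collapse into one bucket or fall across a boundary, producing plateaus and sudden drops. The usual tweak, if uniform scores across short titles are wanted, is omitNorms="true" on the field. A rough model of the unquantized norm:

```python
import math

# Lucene's default length normalization: shorter fields score higher.
# The one-byte encoding at index time (not modeled here) quantizes these
# values, which is what makes the score move in steps rather than smoothly.
def length_norm(num_terms):
    return 1.0 / math.sqrt(num_terms)

norms = {n: length_norm(n) for n in range(3, 8)}
print(norms)  # 3-term titles norm higher than 4-term, and so on
```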


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Relevancy-and-non-matching-words-tp946799p946799.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Leading Wildcard query strangeness

2010-06-30 Thread dbashford

An update in case someone stumbles upon this...

At first I thought you meant the fields I intend to do leading wildcard
searches on needed to have ReversedWildcardFilterFactory on them.  But that
didn't make sense, because our prod app isn't using that at all.

But our prod app does have the "text_rev" type still in it from the example
schema we copied over and used as our template.  One of the things we've
done in dev is clean that out and try to get down to what we are using.  So I
tossed text_rev back into the schema.xml, didn't actually use that field
type for any fields, and now I can do leading wildcard searches again.

I'm going to guess that is what you meant: that the very presence of the
filter in the schema, whether it is used or not, allows you to do leading
wildcard searches.

Is that documented anywhere and I just missed it?  I'm sure it is.
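For anyone who stumbles here later: yes, that is the behavior in this era of Solr — the query parser enables leading wildcards when any ReversedWildcardFilterFactory appears anywhere in the schema, used by a field or not. The example schema's text_rev type looks roughly like this (attribute values recalled from the shipped example, so treat them as illustrative rather than canonical):

```xml
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Applying the filter to the fields you actually wildcard-search also makes those leading-wildcard queries fast, rather than merely permitted.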
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Leading-Wildcard-query-strangeness-tp931809p933600.html
Sent from the Solr - User mailing list archive at Nabble.com.


Leading Wildcard query strangeness

2010-06-29 Thread dbashford

We've got an app in production that executes leading wildcard queries just
fine.

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1298</int>
  <lst name="params">
    <str name="q">title:*news</str>
  </lst>
</lst>
<result name="response" numFound="5514" start="0">

The same app in dev/qa has undergone a major schema/solrconfig overhaul,
including introducing multiple cores, and leading wildcard queries no longer
work...

org.apache.lucene.queryParser.ParseException: Cannot parse 'title:*news':
'*' or '?' not allowed as first character in WildcardQuery

All fields are exhibiting this behavior, whether their fieldType changed or
not.  In the case of title, it did not change.

I intend to pore over the changes to see what might have screwed things up,
but is there some simple setting somewhere that turns this on and off?  I
know https://issues.apache.org/jira/browse/SOLR-218 deals with this, but I
have to think there's some other way to do it or else our prod app wouldn't
be successfully executing Wildcard queries.  Is there another way without
QueryParser.setAllowLeadingWildcard(true)?

(Running 1.4 in all environments, although dev and qa are on newer, more
recently downloaded builds of 1.4.)

Thoughts?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Leading-Wildcard-query-strangeness-tp931809p931809.html
Sent from the Solr - User mailing list archive at Nabble.com.


Document boosting troubles

2010-06-17 Thread dbashford

Brand new to this sort of thing so bear with me.

For sake of simplicity, I've got a two-field document: title and rank.
Title gets searched on; rank has values from 1 to 10, 1 being highest.
What I'd like to do is boost results of searches on title based on the
document's rank.

Because it's fairly cut and dried, I was hoping to do it during indexing.  I
have this in my DIH transformer...

var docBoostVal = 0;
switch (rank) {
    case '1':
        docBoostVal = 3.0;
        break;
    case '2':
        docBoostVal = 2.6;
        break;
    case '3':
        docBoostVal = 2.2;
        break;
    case '4':
        docBoostVal = 1.8;
        break;
    case '5':
        docBoostVal = 1.5;
        break;
    case '6':
        docBoostVal = 1.2;
        break;
    case '7':
        docBoostVal = 0.9;
        break;
    case '8':
        docBoostVal = 0.7;
        break;
    case '9':
        docBoostVal = 0.5;
        break;
}
row.put('$docBoost', docBoostVal);

It's my understanding that with this, I can simply do the same /select
queries I've been doing and expect documents to be boosted, but that doesn't
seem to be happening because I'm seeing things like this in the results...

{"title":"Some title 1",
 "rank":10,
 "score":0.11726039},
{"title":"Some title 2",
 "rank":7,
 "score":0.11726039},

Pretty much everything has the same score.  Whatever I'm doing isn't making
its way through.  (To cover my bases I did try the case statement with
integers rather than strings; same result.)





With that not working, I started looking at other options and started
playing with dismax.

I'm able to add this to a query string and get results I'm somewhat
expecting...

bq=rank:1^3.0 rank:2^2.6 rank:3^2.2 rank:4^1.8 rank:5^1.5 rank:6^1.2
rank:7^0.9 rank:8^0.7 rank:9^0.5

...but I guess I wasn't expecting it to ONLY rank based on those factors. 
That essentially gives me a sort by rank.  

I'm trying to be super inclusive with the search, so while I'm fiddling, my
mm=1.  As expected, a query like q=red door returns everything that contains
"red" or "door".  But I was hoping that items matching "red door" exactly
would sort closer to the top, and that if that exact match was a rank 7, its
score wouldn't be exactly the same as all the other rank 7s.  Ditto if I
searched for q=The Tales Of: anything possessing all 3 terms would sort
closer to the top, anything possessing two terms behind them, anything
possessing 1 term behind them, and within those groups rank would weigh
heavily.

I think I understand that the score is based entirely on the boosts I
provide...so how do I get something more like what I'm looking for?




Along those lines, I initially had put something like this in my defaults...

 <str name="bf">
   rank:1^10.0 rank:2^9.0 rank:3^8.0 rank:4^7.0 rank:5^6.0 rank:6^5.0
   rank:7^4.0 rank:8^3.0 rank:9^2.0
 </str>

...but that was not working; queries fail with a syntax exception.  Guessing
this won't work?
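For readers of the archive: the follow-up to this thread confirms the fix was to use bq rather than bf. bf expects function queries (e.g. recip(...)), while clauses like rank:1^3.0 are plain boost queries, which belong in bq — hence the syntax exception. Assuming a dismax handler, the defaults entry would look something like this (boost values copied from the query-string experiment above):

```xml
<str name="bq">
  rank:1^3.0 rank:2^2.6 rank:3^2.2 rank:4^1.8 rank:5^1.5
  rank:6^1.2 rank:7^0.9 rank:8^0.7 rank:9^0.5
</str>
```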



Thanks in advance for any help you can provide.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-boosting-troubles-tp902982p902982.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Document boosting troubles

2010-06-17 Thread dbashford

One problem down, two left!  =)  bf == bq did the trick, thanks.  Now at
least if I can't get the DIH solution working I don't have to tack that on
every query string.

Taking the quotes away from $docBoost results in a syntax error.  Needs to
be quoted.

Changed it up to this, and still no luck:

var rank = row.get('rank');
switch (rank) {
    case 1:
        row.put('$docBoost', 3.0);
        break;
    case 2:
        row.put('$docBoost', 2.6);
        break;
    case 3:
        row.put('$docBoost', 2.2);
        break;
    case 4:
        row.put('$docBoost', 1.8);
        break;
    case 5:
        row.put('$docBoost', 1.5);
        break;
    case 6:
        row.put('$docBoost', 1.2);
        break;
    case 7:
        row.put('$docBoost', 0.9);
        break;
    case 8:
        row.put('$docBoost', 0.7);
        break;
    case 9:
        row.put('$docBoost', 0.5);
        break;
    default:
        row.put('$docBoost', 0.1);
}



And still can't figure out what I need to do with my dismax querying to get
scores for quality of match.  Thoughts?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-boosting-troubles-tp902982p903638.html
Sent from the Solr - User mailing list archive at Nabble.com.


DataImportHandler + docBoost

2010-06-17 Thread dbashford

Pulled this out of another thread of mine as it's the only bit left that I
haven't been able to figure out.

Can someone show me briefly how one would include a docBoost inside a DIH?

I've got something like this...

var rank = row.get('rank');
switch (rank) {
    case '1':
        row.put('$docBoost', 3.0);
        break;
    case '2':
        row.put('$docBoost', 2.6);
        break;
    case '3':
        row.put('$docBoost', 2.2);
        break;
    case '4':
        row.put('$docBoost', 1.8);
}

...and no effect.  I've tried rank as an int just to cover my bases...

switch (rank) {
    case 1:
        row.put('$docBoost', 3.0);
        break;
    case 2:
        row.put('$docBoost', 2.6);
        break;
    case 3:
        row.put('$docBoost', 2.2);
        break;
    case 4:
        row.put('$docBoost', 1.8);
}

...still no effect.

And I've tried adding and removing this from my entity:

<field column="$docBoost" />

...again to no effect.  Not sure if it should be there or not, but gave both
a shot.

The results I see are a lot like this
(/select?fl=score,rank,title&q=title:red):

{"title":"red",
 "rank":10,
 "score":0.22583205},
{"title":"red",
 "rank":8,
 "score":0.22583205},

I would expect the rank 8 to have a higher score.  Not happening though.

What am I missing?




-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-docBoost-tp904116p904116.html
Sent from the Solr - User mailing list archive at Nabble.com.


Case Insensitive search while preserving case

2010-05-04 Thread dbashford

I've looked through the history and tried a lot of things but can't quite get
this to work.

Used this in my last attempt:

<fieldType name="lowercase" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


What I'm looking to do is allow users to execute case-insensitive searches,
which this does.  "BLaH" should return all the "Blah"s.  However, what this
also seems to do is render the values lowercased when I do faceted or stats
queries, or if I do a terms search.  They always come back as "blah".

Is there any way to only ever get the original value out of Solr no matter
how I ask for it? 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Case-Insensitive-search-while-preserving-case-tp777602p777602.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Case Insensitive search while preserving case

2010-05-04 Thread dbashford

All my fields are stored.

And if my field name being "state" means your suggestion is to append
fl=state, then no, that's not doing anything for me.  =(

The above config gets me part of the way to where I need to be.  It stores,
for instance, "Alaska" in such a way that querying for "alaska", "AlaSkA",
and "ALASKA" will all return "Alaska".  However, if I include the field as a
stats.facet, or I'm doing a faceted search (facet=true), or do a terms
search, what I get out is "alaska".

Any way around that without the dupe field?
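For the archive: probably not without the second field. Faceting, stats.facet, and the terms component all read *indexed* terms, and this field type indexes the lowercased form, so the lowercased form is what comes back. The standard workaround is exactly the dupe field — search the lowercased one, facet on an original-case copy (a sketch; the field names are made up):

```xml
<field name="state" type="lowercase" indexed="true" stored="true"/>
<field name="state_exact" type="string" indexed="true" stored="false"/>
<copyField source="state" dest="state_exact"/>
```

Queries then search state but use facet.field=state_exact (or stats.facet=state_exact) to get "Alaska" back instead of "alaska".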

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Case-Insensitive-search-while-preserving-case-tp777602p777674.html
Sent from the Solr - User mailing list archive at Nabble.com.


Sum? Filter in term search? Something else?

2010-01-08 Thread dbashford

I get the feeling what I need to accomplish isn't necessarily in the spirit
of what Solr is meant to do, but it's the problem I'm facing.  Of course,
I'm a Solr newbie, so this may not be as challenging as I think it is.

The domain is a little tricky, so I'll make one up.  Let's say I have the
following 3 pieces of information in my index: Book Id, Book Category
(mystery, fantasy, etc.), and # of Pages.  And let's say there are a
million, two million...a lot of books.

Question 1) Is there a way to get a total number of pages across all books
without pulling every book and iterating over the result?  (Let's pretend
doing so is a worthwhile endeavor.)

One thing I've found I can do to reduce iteration is to do a term search on
# of pages, which does a little gathering up for me.  If books have between
100 and 1000 pages, I've got 900 results to iterate over; a little
multiplying and adding and I have what I need.  That's acceptable, and in
this case it's definitely faster than going to the database and querying a
poorly normalized group of tables, but still, if there was some way to just
get a sum...

Question 2) The next question would be, assuming there's some way to get a
sum for all books, how would I get a sum for all books that are mysteries?
I don't think the term search works here, because I can't seem to find a way
to limit the docs the term search uses.

Thanks in advance for any help you can provide!
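For the archive: the StatsComponent (present in the Solr 1.4 discussed elsewhere in these threads) answers both questions without iterating. stats.field returns a sum (plus min, max, count, mean) directly, and a filter query restricts which documents are aggregated. A sketch using the made-up field names from the example above:

```python
from urllib.parse import urlencode

# Question 1: sum of pages over all books -> drop the fq.
# Question 2: sum of pages over mysteries only -> keep the fq.
params = {
    "q": "*:*",
    "fq": "category:mystery",  # limits which docs are aggregated
    "stats": "true",
    "stats.field": "pages",    # response carries sum, min, max, count, mean
    "rows": "0",               # stats only; no documents needed
}
query_string = urlencode(params)
print(query_string)
```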

-- 
View this message in context: 
http://old.nabble.com/Sum---Filter-in-term-search---Something-else--tp27076924p27076924.html
Sent from the Solr - User mailing list archive at Nabble.com.