exclude docs with null field
Hi there, say my search query is "new york" and I am searching field1 and field2 for it. How do I specify that I want to exclude docs where field3 doesn't exist? Thanks.
Multi word synonyms + highlighting
Hi, here's a field type using synonyms:

<fieldtype name="SFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="french-synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
</fieldtype>

Here are the contents of 'french-synonyms.txt' that I used for testing:

PC,parti communiste
PS,parti socialiste

When I query a field for the words "parti communiste", these things are highlighted: parti communiste, parti socialiste, parti, PC, PS, communiste. Having "parti socialiste" highlighted is a problem. I expected only "parti communiste", parti, communiste and PC to be highlighted. Is there a way to make things work as I expected? Here is the query I use:

wt=json
q=qAndMSFR%3A%28parti%20communiste%29
q.op=AND
start=0
rows=5
fl=id,studyId,questionFR,modalitiesFR,variableLabelFR,variableName,nesstarVariableId,lang,studyTitle,nesstarStudyId,CevipofConcept,studyQuestionCount,questionPosition,preQuestionText,
sort=score%20desc
facet=true
facet.field=CevipofConceptCode
facet.field=studyDateAndId
facet.sort=lex
spellcheck=true
spellcheck.collate=on
spellcheck.count=10
hl=on
hl.fl=questionSMFR,modalitiesSMFR,variableLabelSMFR
hl.fragsize=1
hl.snippets=100
hl.usePhraseHighlighter=true
hl.highlightMultiTerm=true
hl.simple.pre=%3Cb%3E
hl.simple.post=%3C%2Fb%3E
Re: exclude docs with null field
Say my search query is "new york" and I am searching field1 and field2 for it; how do I specify that I want to exclude docs where field3 doesn't exist? See http://search-lucene.com/m/1o5mEk8DjX1/
Re: exclude docs with null field
I could be wrong, but it seems this way has a performance hit? Or am I missing something? field1:new york+field2:new york+field3:[* TO *] 2010/6/4 bluestar sea...@butterflycluster.net Hi there, say my search query is "new york" and I am searching field1 and field2 for it. How do I specify that I want to exclude docs where field3 doesn't exist? Thanks.
Re: exclude docs with null field
I could be wrong, but it seems this way has a performance hit? Or am I missing something? Did you read Chris's message at http://search-lucene.com/m/1o5mEk8DjX1/ ? He proposes an alternative (more efficient) way, other than [* TO *].
Re: Logs for Java Replication in Solr
Hoss, thanks a lot! (We are using Tomcat, so the logging properties file is fine.) Do you know what the reason for the mentioned exception could be? It seems to me that if this exception occurs, even the replication for that index does not work. If I then remove the data directory + reload + poll a replication, all is fine. But sometimes it occurs again :-/ Regards, Peter. : : where can I find more information about a failure of a Java replication : in Solr 1.4? : (Dashboard does not seem to be the best place!?) All the log messages are written using the JDK Logging framework, so it really depends on your servlet container, and where it's configured to write the logs... http://wiki.apache.org/solr/SolrLogging -Hoss
Re: exclude docs with null field
Nice one! Thanks. I could be wrong, but it seems this way has a performance hit? Or am I missing something? Did you read Chris's message at http://search-lucene.com/m/1o5mEk8DjX1/ ? He proposes an alternative (more efficient) way, other than [* TO *].
Re: exclude docs with null field
Additionally, I should have mentioned that you can instead do: fq=field_3:[* TO *], which uses the filterCache. The method presented by Chris will probably outperform the above method, but only on the first request; from then on the filterCache takes over. From a performance standpoint it's probably not worth going the 'default value for null' approach, imho. It IS useful, however, if you want to be able to query on docs with a null value (instead of excluding them). 2010/6/4 bluestar sea...@butterflycluster.net Nice one! Thanks. I could be wrong, but it seems this way has a performance hit? Or am I missing something? Did you read Chris's message at http://search-lucene.com/m/1o5mEk8DjX1/ ? He proposes an alternative (more efficient) way, other than [* TO *].
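To make the suggestion concrete, a request along these lines restricts results to documents where field3 has a value (field names are taken from the question above; the exact main query is just an assumption):

q=field1:"new york" OR field2:"new york"
fq=field3:[* TO *]

To do the opposite and keep only documents where field3 is missing, negate the filter instead: fq=-field3:[* TO *]. Because it is an fq, the filter is cached separately from the main query and reused across requests.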
MultiValue Exclusion
How would you model this? We have a table of news items that people can view in their news stream and comment on. Users have the ability to mute items so they never see them in their feed or search results. From what I can see there are a couple of ways to accomplish this. 1 - Post-process the results and do not render any muted news items. The downside is that pagination becomes problematic. It's possible we may forgo pagination because of this, but for now assume that pagination is a requirement. 2 - Whenever we query for a given user we append a clause that excludes all muted items. I assume in Solr we'd need to do something like -item_id(1 AND 2 AND 3). Obviously this doesn't scale very well. 3 - Have a multi-valued property in the index that contains all ids of users who have muted the item. Being new to Solr I don't even know how (or if it's possible) to run a query that says the user id is not in this multi-valued property. Can this even be done (sample query please)? Again, I know this doesn't scale very well. Any other suggestions? Thanks in advance for the help. -- View this message in context: http://lucene.472066.n3.nabble.com/MultiValue-Exclusion-tp870173p870173.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: MultiValue Exclusion
I guess the following works.

A. Similar to your option 2, but using the filterCache: fq=-item_id:001 -item_id:002
B. Similar to your option 3, but using the filterCache: fq=-users_excluded_field:userid

The advantage is that the filter is cached independently from the rest of the query, so it can be reused efficiently. Advantage of A over B: the 'muted news items' can be queried dynamically, i.e. they aren't set in stone at index time. B will probably perform a little bit better the first time (when not cached), but I'm not sure. Hope that helps, Geert-Jan 2010/6/4 homerlex homerlex.nab...@gmail.com How would you model this? We have a table of news items that people can view in their news stream and comment on. Users have the ability to mute items so they never see them in their feed or search results. From what I can see there are a couple of ways to accomplish this. 1 - Post-process the results and do not render any muted news items. The downside is that pagination becomes problematic. It's possible we may forgo pagination because of this, but for now assume that pagination is a requirement. 2 - Whenever we query for a given user we append a clause that excludes all muted items. I assume in Solr we'd need to do something like -item_id(1 AND 2 AND 3). Obviously this doesn't scale very well. 3 - Have a multi-valued property in the index that contains all ids of users who have muted the item. Being new to Solr I don't even know how (or if it's possible) to run a query that says the user id is not in this multi-valued property. Can this even be done (sample query please)? Again, I know this doesn't scale very well. Any other suggestions? Thanks in advance for the help. -- View this message in context: http://lucene.472066.n3.nabble.com/MultiValue-Exclusion-tp870173p870173.html Sent from the Solr - User mailing list archive at Nabble.com.
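For option B, a sketch of what the schema field and the per-user request could look like (the field name, type and user id are illustrative assumptions, not something defined in this thread):

<field name="users_excluded_field" type="string" indexed="true" stored="false" multiValued="true"/>

q=the user's normal search terms
fq=-users_excluded_field:12345

A purely negative filter query like this matches every document whose multi-valued field does not contain the given user id, and after the first request the filterCache makes it cheap to apply per user.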
Faceted Search Slows Down as index gets larger
Hello, I have been dealing with real-time data. As the number of total indexed documents gets larger (now 5M), a faceted search on a text field limited by the creation time, which we use to find the most-used word in all these text fields, slows down. query string: created_time:[NOW-1HOUR TO NOW] facet.field=text facet.mincount=1 The document count matching the query is around 9000. It takes around 80 seconds on a decent computer with 4GB RAM and a quad-core CPU. I do not know the internal details of term indexing and term counts for faceting. Any suggestion for speeding up this query is appreciated. Thanks in advance. -- Furkan Kuru
Re: Faceted Search Slows Down as index gets larger
Faceting on a full-text field is hard. What version of Solr are you using? If it's 1.4 or later, try setting facet.method=enum And to use the filterCache less, try facet.enum.cache.minDf=100 -Yonik http://www.lucidimagination.com On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru furkank...@gmail.com wrote: Hello, I have been dealing with real-time data. As the number of total indexed documents gets larger (now 5 M) a faceted search on a text field limited by the creation time, which we use to find the most used word in all these text fields, gets slow down. query string: created_time:[NOW-1HOUR TO NOW] facet.field=text facet.mincount=1 the document count matching the query is around 9000. It takes around 80 seconds in a decent computer with 4GB ram, quad core cpu I do not know the internal details of term indexing and their counts for faceting. Any suggestion for speeding up this query is appreciated. Thanks in advance. -- Furkan Kuru
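Put together, the suggested request would look something like this (the query and field names come from the original post; the parameter values are simply the ones suggested above):

q=created_time:[NOW-1HOUR TO NOW]
facet=true
facet.field=text
facet.mincount=1
facet.method=enum
facet.enum.cache.minDf=100

facet.method=enum walks the terms of the field and intersects a filter per term, and facet.enum.cache.minDf=100 skips the filterCache for terms that appear in fewer than 100 documents, which keeps memory use down on a full-text field with many rare terms.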
Re: OverlappingFileLockException when using <str name="replicateAfter">startup</str>
Hi guys, I'm experiencing the same issue with a single war. I'm using a brand new Solr war built from yesterday's version of the trunk. I've got one master with 2 cores and one slave with a single core. I'm using one core from the master as the master of the second core (which is configured as a repeater), so that the slave's core can poll the repeater for index changes. (I was using Solr 1.4, but experienced some issues with replication. While rebuilding the index on the one master core, the new index was not replicated successfully to the other master core. Files were copied over but the final commit failed on the snappuller. But sometimes, while restarting the master, the replication would work fine between master cores, then no replication would be successful from the master to the slave core. I had the same issue as described here: https://issues.apache.org/jira/browse/SOLR-1769 , which seems to be fixed in the trunk. So I moved on to the trunk version of Solr in order to test the fix.) This seems to work better, as master core replication works fine. But I've got a weird behavior on the slave. The index replication is successful only the second time the slave tries to get it, and for each replication attempt the slave spits out the following exception (see below). There seems to be a concurrency issue but I don't quite understand where the concurrency is really happening. Can you please help with this issue?

org.apache.solr.common.SolrException: Index fetch failed :
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.nio.channels.OverlappingFileLockException
at sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.java:1170)
at sun.nio.ch.FileChannelImpl$SharedFileLockTable.add(FileChannelImpl.java:1072)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:878)
at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:260)
at org.apache.lucene.store.Lock.obtain(Lock.java:72)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1061)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:950)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:192)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:99)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:471)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
... 11 more
-- View this message in context: http://lucene.472066.n3.nabble.com/OverlappingFileLockException-when-using-str-name-replicateAfter-startup-str-tp488686p870589.html Sent from the Solr - User mailing list archive at Nabble.com.
String Sort Not Working
All, I am trying to sort on a text field and can't get it to work. I try sorting on sortTitle and get no errors; it just doesn't appear to sort. The pertinent parts of my schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  ... lots of filters that do work ...
</fieldType>
<fieldType name="sortString" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

<field name="title" type="text" indexed="true" stored="true" termVectors="true" />
<field name="sortTitle" type="sortString" indexed="true" stored="true" />

<copyfield source="title" dest="sortTitle" />

I set stored="true" on the sort field so I could see if anything was getting copied there, and it would appear that this is not the case. I don't see any top-10 summaries like I do for other fields, including another field populated by copyField. Is this just because of the filters I am using? I'm sure this horse has, or similar horses have, been beaten to death before, but I'm new to this mailing list, so sorry about that. Any help is greatly appreciated! Thanks, Patrick
Re: Faceted Search Slows Down as index gets larger
I am using 1.4 version. I have tried your suggestion, it takes around 25-30 seconds now. Thank you, On Fri, Jun 4, 2010 at 5:54 PM, Yonik Seeley yo...@lucidimagination.comwrote: Faceting on a full-text field is hard. What version of Solr are you using? If it's 1.4 or later, try setting facet.method=enum And to use the filterCache less, try facet.enum.cache.minDf=100 -Yonik http://www.lucidimagination.com On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru furkank...@gmail.com wrote: Hello, I have been dealing with real-time data. As the number of total indexed documents gets larger (now 5 M) a faceted search on a text field limited by the creation time, which we use to find the most used word in all these text fields, gets slow down. query string: created_time:[NOW-1HOUR TO NOW] facet.field=text facet.mincount=1 the document count matching the query is around 9000. It takes around 80 seconds in a decent computer with 4GB ram, quad core cpu I do not know the internal details of term indexing and their counts for faceting. Any suggestion for speeding up this query is appreciated. Thanks in advance. -- Furkan Kuru -- Furkan Kuru
RE: index growing with updates
Ok so I think that Solr (Lucene) will only remove deleted/updated documents from the disk after an optimize or after an 'expungeDeletes' request. Is there a way to trigger the expunsion (new word) across the entire index? I tried:

final UpdateRequest request = new UpdateRequest();
request.setParam("expungeDeletes", true);
request.add someofmydocs
server.sendrequest..

But that didn't seem to do the trick as I know I have about 7 Gigs of documents that should be removed from the disk and the index size hasn't really budged. Any ideas? Thanks, Kallin Nagelberg -Original Message- From: Nagelberg, Kallin Sent: Thursday, June 03, 2010 1:36 PM To: 'solr-user@lucene.apache.org' Subject: RE: index growing with updates Is there a way to trigger a purge, or under what conditions does it occur? -Kallin Nagelberg -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, June 03, 2010 12:40 PM To: solr-user@lucene.apache.org Subject: Re: index growing with updates Assuming your config is set up to replace unique keys, you're really doing a delete and an add (under the covers). It could very well be that the deleted version of the document is still in your index taking up space and will be until it is purged. HTH Erick On Thu, Jun 3, 2010 at 10:22 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: Hey, If I add a document to the index that already exists (same uniquekey) what is the expected behavior? I would imagine that if the document is the same then the index should not grow, but mine appears to be growing. Any ideas? Thanks, -Kallin Nagelberg
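For reference, expungeDeletes is a commit option rather than an option on the add request, so (if I'm not mistaken) it has to be sent with a commit, e.g. by posting this to the /update handler:

<commit expungeDeletes="true"/>

That said, as the reply further down explains, deletes are also reclaimed naturally as segments get merged, so an explicit expunge is rarely required.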
Re: Highlighting a field with a certain value
(10/05/25 0:31), n...@frameweld.com wrote: Hello, How am I able to highlight a field that contains a specific value? If I have a field called type, how am I able to highlight the rows whose values contain something like title? http://localhost:8983/solr/select?q=title&hl=on&hl.fl=type Koji -- http://www.rondhuit.com/en/
Re: String Sort Not Working
<copyfield source="title" dest="sortTitle" /> The lowercase 'f' is what's causing this. It should be copyField.
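That is, the corrected line in schema.xml would read:

<copyField source="title" dest="sortTitle" />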
RE: String Sort Not Working
That did it. Thank you =) P.S. Might it be helpful for Solr to complain about invalid XML during startup? Does it do this and I'm just not noticing? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Friday, June 04, 2010 12:18 PM To: solr-user@lucene.apache.org Subject: Re: String Sort Not Working <copyfield source="title" dest="sortTitle" /> The lowercase 'f' is what's causing this. It should be copyField.
Need help to install Solr on JBoss
I installed Solr on my local machine and it works fine with Jetty. I am trying to install on JBoss which is running on a Sun Solaris box and I have the following questions: 1. Do I need to copy the entire example folder from my local machine to Solr home on Sun Solaris box? 2. How can I have multiple cores on the Sun Solaris box? Any help is appreciated. Thanks, Murali
RE: String Sort Not Working
P.S. Might it be helpful for Solr to complain about invalid XML during startup? Does it do this and I'm just not noticing? Chris's explanation about a similar topic: http://search-lucene.com/m/11JWX1hxL4u/
RE: String Sort Not Working
Very informative - thank you! I think it might be useful to have this feature - maybe have an interface for plugins to register an XSD or otherwise declare their expected XML elements and attributes. I'm not sure if there's enough demand for this to justify the time it would take to make this change, though. Just a thought. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Friday, June 04, 2010 1:41 PM To: solr-user@lucene.apache.org Subject: RE: String Sort Not Working P.S. Might it be helpful for Solr to complain about invalid XML during startup? Does it do this and I'm just not noticing? Chris's explanation about a similar topic: http://search-lucene.com/m/11JWX1hxL4u/
conditional Document Boost
Hello out there, I am searching for a solution for conditional document boosting. While analyzing the fields of a document, I want to create a document boost based on some metrics. There are three approaches:

First: I preprocess the data. The main problem with this is that I need to take care of the preprocessing part and I can't do it out of the box (implementing an analyzer, computing the boosting value, and afterwards storing those values or sending them to Solr).

Second: Using an UpdateRequestProcessor (does it work with DIH?). However, this would also be custom work, plus taking care that the used params are up to date.

Third: Setting the document boost while the analysis process is running, with the help of a TokenFilter (is this possible?).

What would you do? I think what I want to do is quite the same as working with Mahout and Solr. I never worked with Mahout - but how can I use it to improve the user's search experience? Where can I use Mahout in Solr, if I want to influence documents' boosts? And where in general (i.e. for classification)? References, ideas and whatever could be useful are welcome :-). Thank you. Kind regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/conditional-Document-Boost-tp871108p871108.html Sent from the Solr - User mailing list archive at Nabble.com.
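For the second approach, a minimal sketch of what such an UpdateRequestProcessor could look like (the field name "popularity", the threshold and the boost value are assumptions; the factory class and the updateRequestProcessorChain registration in solrconfig.xml are omitted):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class ConditionalBoostProcessor extends UpdateRequestProcessor {

  public ConditionalBoostProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    // Compute a boost from some metric carried in the document (hypothetical field).
    Object metric = doc.getFieldValue("popularity");
    if (metric != null && Float.parseFloat(metric.toString()) > 100f) {
      // Boost the whole document at index time.
      doc.setDocumentBoost(2.0f);
    }
    // Pass the (possibly boosted) document down the processor chain.
    super.processAdd(cmd);
  }
}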
Re: TikaEntityProcessor not working?
You are my hero. I replaced the Tika 0.8 snapshots that were included with Solr with 0.6 and it works now. Thank you! Brad On Jun 3, 2010, at 6:22 AM, David George wrote: Which version of Tika do you have? There was a problem introduced somewhere between Tika 0.6 and Tika 0.7 whereby the TikaConfig method config.getParsers() was returning an empty parser list due to class loader scope issues with Solr running under an application server. There is a fix in the Tika 0.8 branch, and I note that a 0.8 snapshot of Tika is included in the Solr trunk. I've not tried to get this to work and am not sure what config is needed to make it work. I simply installed Tika 0.6, which can be downloaded from the Apache Tika website. -- View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p867572.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: index growing with updates
: Ok so I think that Solr (lucene) will only remove deleted/updated : documents from the disk after an optimize or after an 'expungeDeletes' : request. Is there a way to trigger the expunsion (new word) across the : entire index? I tried : deletes are removed when segments are merged -- an optimize merges all segments, so it forcibly removes all deleted docs, but regular merges as documents are added/updated will clean things up periodically -- so if you have a fixed set of documents that you keep updating over and over, your index size will not grow without bounds -- it will oscillate between a min (completely optimized) and a max (lots of segments with lots of deletions just about to be merged) -Hoss
Range query on long value
Hi, I have an issue with range queries on a long value in our dataset (the dataset is fairly large, but I believe the problem still exists for smaller datasets). When I query the index with a range, such as id:[1 TO 2000], I get values back that are well outside that range. It's as if the range query is ignoring the values and doing something like id:[* TO *]. We are running Solr 1.3. The value is set as the unique key for the index. Our schema is similar to this:

<field name="id" type="long" indexed="true" stored="true" required="true" />
<field name="field_1" type="slong" indexed="true" stored="false" required="true" />
<field name="field_2" type="long" indexed="true" stored="false" required="false" />
<field name="field_3" type="long" indexed="true" stored="false" required="false" />
. . .
<field name="field_n" type="long" indexed="true" stored="true" required="false" />

<uniqueKey>id</uniqueKey>

Has anyone else had this problem? If so, how did you correct it? Thanks in advance.
Need help with document format
Hi guys, I have a list of consultants, and the users (people who work for the company) are supposed to be able to search for consultants based on the time frame they worked for a company. For example, I should be able to search for all consultants who worked for Bear Stearns in the month of July. What is the best way of accomplishing this? I was thinking of formatting the document like this:

<company>
  <name>Bear Stearns</name>
  <startDate>2000-01-01</startDate>
  <endDate>present</endDate>
</company>
<company>
  <name>AIG</name>
  <startDate>1999-01-01</startDate>
  <endDate>2000-01-01</endDate>
</company>

Is this possible? Thanks, Moazzam
Re: Need help to install Solr on JBoss
Check the wiki 1. Do I need to copy the entire example folder from my local machine to Solr home on Sun Solaris box? http://wiki.apache.org/solr/SolrJBoss 2. How can I have multiple cores on the Sun Solaris box? http://wiki.apache.org/solr/CoreAdmin Regards Juan www.linebee.com Bondiga, Murali wrote: I installed Solr on my local machine and it works fine with Jetty. I am trying to install on JBoss which is running on a Sun Solaris box and I have the following questions: 1. Do I need to copy the entire example folder from my local machine to Solr home on Sun Solaris box? 2. How can I have multiple cores on the Sun Solaris box? Any help is appreciated. Thanks, Murali
Index-time vs. search-time boosting performance
Hi, What are the performance ramifications for using a function-based boost at search time (through bf in dismax parser) versus an index-time boost? Currently I'm using boost functions on a 15GB index of ~14mm documents. Our queries generally match many thousands of documents. I'm wondering if I would see a performance improvement by switching over to index-time boosting. Thanks, Asif -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com
Re: Range query on long value
I have an issue with range queries on a long value in our dataset (the dataset is fairly large, but I believe the problem still exists for smaller datasets). When I query the index with a range, such as id:[1 TO 2000], I get values back that are well outside that range. It's as if the range query is ignoring the values and doing something like id:[* TO *]. We are running Solr 1.3. The value is set as the unique key for the index. Our schema is similar to this:

<field name="id" type="long" indexed="true" stored="true" required="true" />
<field name="field_1" type="slong" indexed="true" stored="false" required="true" />
<field name="field_2" type="long" indexed="true" stored="false" required="false" />
<field name="field_3" type="long" indexed="true" stored="false" required="false" />

You need to use the sortable long type (type="slong") in Solr 1.3.0 for range queries to work correctly. The default schema.xml has an explanation of the sortable types (sint etc.).
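For reference, the relevant pieces from the Solr 1.3 example schema.xml look roughly like this (quoted from memory, so double-check against your own example schema):

<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>

<field name="id" type="slong" indexed="true" stored="true" required="true" />

After changing the field type you will need to re-index for range queries to start behaving correctly.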
Re: general debugging techniques?
: to format the data from my sources. I can read through the catalina : log, but this seems to just log requests; not much info is given about : errors or when the service hangs. Here are some examples: if you are only seeing one log line per request, then you are just looking at the request log ... there should be more logs with messages from all over the code base with various levels of severity -- and using standard java log level controls you can turn these up/down for various components. : Although I am keeping document size under 5MB, I regularly see : SEVERE: java.lang.OutOfMemoryError: Java heap space errors. How can : I find what component had this problem? that's one of java's most annoying problems -- even if you have the full stack trace of the OOM, that just tells you which code path was the straw that broke the camel's back -- it doesn't tell you where all your memory was being used. for that you really need to use a java profiler, or turn on heap dumps and use a heap dump analyzer after the OOM occurs. : After the above error, I often see this followup error on the next : document: SEVERE: org.apache.lucene.store.LockObtainFailedException: : Lock obtain timed out: NativeFSLock@/var/lib/solr/data/ : index/lucene-d6f7b3bf6fe64f362b4d45bfd4924f54-write.lock . This has : a backtrace, so I could dive directly into the code. Is this the best : way to track down the problem, or are there debugging settings that : could help show why the lock is being held elsewhere? probably not -- after an OOM (or any other low level error), most java apps are just screwed in general. : I attempted to turn on indexing logging with the line : : <infoStream file="INFOSTREAM.txt">true</infoStream> : : but I can't seem to find this file in either the tomcat or the index directory. it will probably be in whatever the Current Working Directory (CWD) is -- assuming the file permissions allow writing to it. the top of the Solr admin screen tells you what the CWD is, in case it's not clear from how your servlet container is run. -Hoss
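If it helps, on Sun/Oracle HotSpot JVMs the heap dump mentioned above can be produced automatically by adding flags along these lines to the servlet container's JVM options (the dump path is just an example):

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/solr/dumps

The resulting .hprof file can then be opened in a heap analyzer (e.g. jhat or Eclipse MAT) to see which objects were holding the memory when the OOM occurred.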
RE: general debugging techniques?
: That is still really small for 5MB documents. I think the default solr : document cache is 512 items, so you would need at least 3 GB of memory : if you didn't change that and the cache filled up. that assumes that the text tika extracts from each document is the same size as the original raw files *and* that he's configured that content field to be stored ... in practice if you only mark the summary fields (title, author, short summary, etc...) as stored="true", the document cache isn't going to be nearly that big (and even if you do store the entire content field, the plain text is usually *much* smaller than the binary source file) : -Xmx128M - my understanding is that this bumps heap size to 128M. FWIW: depending on how many docs you are indexing, and whether you want to support things like faceting that rely on building in-memory caches to be fast, 128MB is really, really, really small for a typical Solr instance. Even on a box that is only doing indexing (no queries) I would imagine Tika likes to have a lot of ram when doing extraction (most doc types are going to require the raw binary data to be entirely in the heap, plus all the extracted Strings, plus all of the connecting objects to build the DOM, etc.). And that's before you even start thinking about Solr, Lucene and the index itself. -Hoss
Re: Index-time vs. search-time boosting performance
Index time boosting is different than search time boosting, so asking about performance is irrelevant. Paraphrasing Hossman from years ago on the Lucene list (from memory). ...index time boosting is a way of saying this documents' title is more important than other documents' titles. Search time boosting is a way of saying I care about documents whose titles contain this term more than other documents whose titles may match other parts of this query HTH Erick On Fri, Jun 4, 2010 at 5:10 PM, Asif Rahman a...@newscred.com wrote: Hi, What are the performance ramifications for using a function-based boost at search time (through bf in dismax parser) versus an index-time boost? Currently I'm using boost functions on a 15GB index of ~14mm documents. Our queries generally match many thousands of documents. I'm wondering if I would see a performance improvement by switching over to index-time boosting. Thanks, Asif -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com
Re: Faceted Search Slows Down as index gets larger
Yonik, Just curious why does using enum improve the facet performance. Furkan was faceting on a text field with each word being a facet value. I'd imagine that'd mean there's a large number of facet values. According to the documentation (http://wiki.apache.org/solr/SimpleFacetParameters#facet.method) facet.method=fc is faster when a field has many unique terms. So how come enum, not fc, is faster in this case? Also why use filterCache less? Thanks Andy --- On Fri, 6/4/10, Furkan Kuru furkank...@gmail.com wrote: From: Furkan Kuru furkank...@gmail.com Subject: Re: Faceted Search Slows Down as index gets larger To: solr-user@lucene.apache.org, yo...@lucidimagination.com Date: Friday, June 4, 2010, 11:25 AM I am using 1.4 version. I have tried your suggestion, it takes around 25-30 seconds now. Thank you, On Fri, Jun 4, 2010 at 5:54 PM, Yonik Seeley yo...@lucidimagination.comwrote: Faceting on a full-text field is hard. What version of Solr are you using? If it's 1.4 or later, try setting facet.method=enum And to use the filterCache less, try facet.enum.cache.minDf=100 -Yonik http://www.lucidimagination.com On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru furkank...@gmail.com wrote: Hello, I have been dealing with real-time data. As the number of total indexed documents gets larger (now 5 M) a faceted search on a text field limited by the creation time, which we use to find the most used word in all these text fields, gets slow down. query string: created_time:[NOW-1HOUR TO NOW] facet.field=text facet.mincount=1 the document count matching the query is around 9000. It takes around 80 seconds in a decent computer with 4GB ram, quad core cpu I do not know the internal details of term indexing and their counts for faceting. Any suggestion for speeding up this query is appreciated. Thanks in advance. -- Furkan Kuru -- Furkan Kuru
Re: Index-time vs. search-time boosting performance
Perhaps I should have been more specific in my initial post. I'm doing date-based boosting on the documents in my index, so as to assign a higher score to more recent documents. Currently I'm using a boost function to achieve this. I'm wondering if there would be a performance improvement if instead of using the boost function at search time, I indexed the documents with a date-based boost. On Fri, Jun 4, 2010 at 7:30 PM, Erick Erickson erickerick...@gmail.comwrote: Index time boosting is different than search time boosting, so asking about performance is irrelevant. Paraphrasing Hossman from years ago on the Lucene list (from memory). ...index time boosting is a way of saying this documents' title is more important than other documents' titles. Search time boosting is a way of saying I care about documents whose titles contain this term more than other documents whose titles may match other parts of this query HTH Erick On Fri, Jun 4, 2010 at 5:10 PM, Asif Rahman a...@newscred.com wrote: Hi, What are the performance ramifications for using a function-based boost at search time (through bf in dismax parser) versus an index-time boost? Currently I'm using boost functions on a 15GB index of ~14mm documents. Our queries generally match many thousands of documents. I'm wondering if I would see a performance improvement by switching over to index-time boosting. Thanks, Asif -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com
Re: Help with Shingled queries
the queryparser first splits on whitespace. so each individual word of your query: short,red,evil,fox gets its own tokenstream, and therefore isn't shingled. On Fri, Jun 4, 2010 at 6:21 PM, Greg Bowyer gbow...@shopzilla.com wrote: Hi all, interesting and by the looks of things very solid project you have here with SOLR, however... I have an index that contains a large number of phrases that I need to search over; each of these phrases is fairly small, being on average about 4 words long. The search terms that I am given to search these phrases are very long and quite arbitrary; sometimes the search terms will be up to 25 words long. As such the performance of my index when built naively is sporadic: sometimes searches are very fast, on average they are somewhat slower. I have attempted to improve this situation by using shingling for the phrases and the related search queries. In my schema I have the following:

<fieldType name="bigramed_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true" outputUnigramIfNoNgram="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="false" outputUnigramIfNoNgram="true" />
  </analyzer>
</fieldType>

In the indexes, as seen with Luke, I do indeed have a large range of shingled terms. When I run the analyzer for either query or index terms I also see the breakdown with the shingled terms correctly displayed. However, when I attempt to use this in a query I do not see the terms applied in the debug output. For example, with the term "short red evil fox" I would expect to see the shingles 'short_red' 'red_evil' 'evil_fox', but instead I get the following:

debug:{
  rawquerystring:"short red evil fox",
  querystring:"short red evil fox",
  parsedquery:"+() ()",
  parsedquery_toString:"+() ()",
  explain:{},
  QParser:"DisMaxQParser",
  altquerystring:null,
  boostfuncs:null,
  filter_queries:["atomId:(8235 10914 10911 )"],
  parsed_filter_queries:["atomId:8235 atomId:10914 atomId:10911"],
  timing:{ ..

Does anyone know what I could be doing wrong here? Is it a bug in the debug output, a stupid mistake, misconception or piece of idiocy on my part, or something else? Many thanks -- Greg Bowyer -- Robert Muir rcm...@gmail.com
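If it fits the use case, one workaround (my assumption, not something suggested in this thread) is to hand the whole string to the analyzer as a single unit by quoting it, since a quoted phrase is analyzed as one token stream and the shingles then get produced, e.g.:

q=shingled_field:"short red evil fox"

(the field name is hypothetical; with the dismax setup above the equivalent would be quoting the user query). Whether the resulting phrase/shingle query behaves the way you want still needs to be verified against your index.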
Re: Index-time vs. search-time boosting performance
I've done a lot of recency boosting to documents, and I'm wondering why you would want to do that at index time. If you are continuously indexing new documents, what was recent when it was indexed becomes, over time less recent. Are you unsatisfied with your current performance with the boost function? Query-time recency boosting is a fairly common thing to do, and, if done correctly, shouldn't be a performance concern. -Jay http://lucidimagination.com On Fri, Jun 4, 2010 at 4:50 PM, Asif Rahman a...@newscred.com wrote: Perhaps I should have been more specific in my initial post. I'm doing date-based boosting on the documents in my index, so as to assign a higher score to more recent documents. Currently I'm using a boost function to achieve this. I'm wondering if there would be a performance improvement if instead of using the boost function at search time, I indexed the documents with a date-based boost. On Fri, Jun 4, 2010 at 7:30 PM, Erick Erickson erickerick...@gmail.com wrote: Index time boosting is different than search time boosting, so asking about performance is irrelevant. Paraphrasing Hossman from years ago on the Lucene list (from memory). ...index time boosting is a way of saying this documents' title is more important than other documents' titles. Search time boosting is a way of saying I care about documents whose titles contain this term more than other documents whose titles may match other parts of this query HTH Erick On Fri, Jun 4, 2010 at 5:10 PM, Asif Rahman a...@newscred.com wrote: Hi, What are the performance ramifications for using a function-based boost at search time (through bf in dismax parser) versus an index-time boost? Currently I'm using boost functions on a 15GB index of ~14mm documents. Our queries generally match many thousands of documents. I'm wondering if I would see a performance improvement by switching over to index-time boosting. Thanks, Asif -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com
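For what it's worth, a typical query-time recency boost with dismax looks something like this (the field name and constants are only illustrative; 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, so scores decay over about a year):

bf=recip(ms(NOW,published_date),3.16e-11,1,1)

If I remember correctly, the ms() function was added in Solr 1.4 and works on trie-based date fields, so the date field type may need adjusting before this can be used.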
Re: Faceted Search Slows Down as index gets larger
On Fri, Jun 4, 2010 at 7:33 PM, Andy angelf...@yahoo.com wrote: Yonik, Just curious why does using enum improve the facet performance. Furkan was faceting on a text field with each word being a facet value. I'd imagine that'd mean there's a large number of facet values. According to the documentation (http://wiki.apache.org/solr/SimpleFacetParameters#facet.method) facet.method=fc is faster when a field has many unique terms. So how come enum, not fc, is faster in this case? facet.method=fc is faster when there are many unique terms, and relatively few terms per document. A full-text field doesn't fit that bill. Also why use filterCache less? Takes up a lot of memory. -Yonik http://www.lucidimagination.com
Re: Does SolrJ support nested annotated beans?
+1 Good question, my use of Solr would benefit from nested annotated beans as well. Awaiting the reply, Thom On 2010-06-03, at 1:35 PM, Peter Hanning wrote: When modeling documents with a lot of fields (hundreds) the bean class used with SolrJ to interact with the Solr index tends to get really big and unwieldy. I was hoping that it would be possible to extract groups of properties into nested beans and move the @Field annotations along. Basically, I want to refactor something like the following:

// Imports have been omitted for this example.
public class TheBigOne {
    @Field("UniqueKey")
    private String uniqueKey;
    @Field("Name_en")
    private String name_en;
    @Field("Name_es")
    private String name_es;
    @Field("Name_fr")
    private String name_fr;
    @Field("Category")
    private String category;
    @Field("Color")
    private String color;
    // Additional properties, getters and setters have been omitted for this example.
}

into something like the following:

// Imports have been omitted for this example.
public class TheBigOne {
    @Field("UniqueKey")
    private String uniqueKey;
    private Names names = new Names();
    private Classification classification = new Classification();
    // Additional properties, getters and setters have been omitted for this example.
}

// Imports have been omitted for this example.
public class Names {
    @Field("Name_en")
    private String name_en;
    @Field("Name_es")
    private String name_es;
    @Field("Name_fr")
    private String name_fr;
    // Additional properties, getters and setters have been omitted for this example.
}

// Imports have been omitted for this example.
public class Classification {
    @Field("Category")
    private String category;
    @Field("Color")
    private String color;
    // Additional properties, getters and setters have been omitted for this example.
}

This did not work however as the DocumentObjectBinder does not seem to walk the nested object graph. Am I doing something wrong, or is this not supported? I see JIRA tickets 1129 and 1357 could alleviate this issue somewhat for the Name* fields once 1.5 comes out. Still, it would be great to be able to nest beans without using dynamic names in the field annotations like in the Classification example above. As a quick and naive test I tried to change the DocumentObjectBinder's collectInfo method to something like the following:

private List<DocField> collectInfo(Class clazz) {
    List<DocField> fields = new ArrayList<DocField>();
    Class superClazz = clazz;
    ArrayList<AccessibleObject> members = new ArrayList<AccessibleObject>();
    while (superClazz != null && superClazz != Object.class) {
        members.addAll(Arrays.asList(superClazz.getDeclaredFields()));
        members.addAll(Arrays.asList(superClazz.getDeclaredMethods()));
        superClazz = superClazz.getSuperclass();
    }
    for (AccessibleObject member : members) {
        if (member.isAnnotationPresent(Field.class)) {
            member.setAccessible(true);
            fields.add(new DocField(member));
        }
        // BEGIN changes
        else {
            // A quick test supporting only Field, not Method and others
            if (member instanceof java.lang.reflect.Field) {
                java.lang.reflect.Field field = (java.lang.reflect.Field) member;
                fields.addAll(collectInfo(field.getType()));
            }
        }
        // END changes
    }
    return fields;
}

This worked in that SolrJ started walking down into nested beans, checking for and handling @Field annotations in the nested beans. However, when trying to retrieve the values of the fields in the nested beans, SolrJ still tried to look for them in the main bean as far as I can tell.
ERROR 2010-06-02 09:28:35,326 (main) () (SolrIndexer.java:335 main) - Exception encountered:
java.lang.RuntimeException: Exception while getting value: private java.lang.String Names.Name_en
at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.get(DocumentObjectBinder.java:377)
at org.apache.solr.client.solrj.beans.DocumentObjectBinder.toSolrInputDocument(DocumentObjectBinder.java:71)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:56)
...
Caused by: java.lang.IllegalArgumentException: Can not set java.lang.String field Names.Name_en to TheBigOne
at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:146)
at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:150)
at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:37)
at sun.reflect.UnsafeObjectFieldAccessorImpl.get(UnsafeObjectFieldAccessorImpl.java:18)
at java.lang.reflect.Field.get(Field.java:358)
at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.get(DocumentObjectBinder.java:374)
... 7 more

My conclusion is
Re: Index-time vs. search-time boosting performance
It seems like it would be far more efficient to calculate the boost factor once and store it rather than calculating it for each request in real-time. Some of our queries match tens of thousands if not hundreds of thousands of documents in a 15GB index. However, I'm not well-versed in lucene internals so I may be misunderstanding what is going on here. On Fri, Jun 4, 2010 at 8:31 PM, Jay Hill jayallenh...@gmail.com wrote: I've done a lot of recency boosting to documents, and I'm wondering why you would want to do that at index time. If you are continuously indexing new documents, what was recent when it was indexed becomes, over time less recent. Are you unsatisfied with your current performance with the boost function? Query-time recency boosting is a fairly common thing to do, and, if done correctly, shouldn't be a performance concern. -Jay http://lucidimagination.com On Fri, Jun 4, 2010 at 4:50 PM, Asif Rahman a...@newscred.com wrote: Perhaps I should have been more specific in my initial post. I'm doing date-based boosting on the documents in my index, so as to assign a higher score to more recent documents. Currently I'm using a boost function to achieve this. I'm wondering if there would be a performance improvement if instead of using the boost function at search time, I indexed the documents with a date-based boost. On Fri, Jun 4, 2010 at 7:30 PM, Erick Erickson erickerick...@gmail.com wrote: Index time boosting is different than search time boosting, so asking about performance is irrelevant. Paraphrasing Hossman from years ago on the Lucene list (from memory). ...index time boosting is a way of saying this documents' title is more important than other documents' titles. Search time boosting is a way of saying I care about documents whose titles contain this term more than other documents whose titles may match other parts of this query HTH Erick On Fri, Jun 4, 2010 at 5:10 PM, Asif Rahman a...@newscred.com wrote: Hi, What are the performance ramifications for using a function-based boost at search time (through bf in dismax parser) versus an index-time boost? Currently I'm using boost functions on a 15GB index of ~14mm documents. Our queries generally match many thousands of documents. I'm wondering if I would see a performance improvement by switching over to index-time boosting. Thanks, Asif -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com
RE: Index-time vs. search-time boosting performance
The SolrRelevancyFAQ does suggest that both index-time and search-time boosting can be used to boost the score of newer documents, but it doesn't suggest in which contexts one might choose one over the other. It only provides an example of a search-time boost, though, so it doesn't answer the question of how to do an index-time boost, if that was a question. http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents Sorry, this doesn't answer your question, but it does contribute the fact that some author of the FAQ at some point considered an index-time boost not necessarily unreasonable. From: Asif Rahman [a...@newscred.com] Sent: Friday, June 04, 2010 11:31 PM To: solr-user@lucene.apache.org Subject: Re: Index-time vs. search-time boosting performance It seems like it would be far more efficient to calculate the boost factor once and store it rather than calculating it for each request in real-time. Some of our queries match tens of thousands if not hundreds of thousands of documents in a 15GB index. However, I'm not well-versed in lucene internals so I may be misunderstanding what is going on here. On Fri, Jun 4, 2010 at 8:31 PM, Jay Hill jayallenh...@gmail.com wrote: I've done a lot of recency boosting to documents, and I'm wondering why you would want to do that at index time. If you are continuously indexing new documents, what was recent when it was indexed becomes, over time less recent. Are you unsatisfied with your current performance with the boost function? Query-time recency boosting is a fairly common thing to do, and, if done correctly, shouldn't be a performance concern. -Jay http://lucidimagination.com On Fri, Jun 4, 2010 at 4:50 PM, Asif Rahman a...@newscred.com wrote: Perhaps I should have been more specific in my initial post. I'm doing date-based boosting on the documents in my index, so as to assign a higher score to more recent documents. Currently I'm using a boost function to achieve this. I'm wondering if there would be a performance improvement if instead of using the boost function at search time, I indexed the documents with a date-based boost. On Fri, Jun 4, 2010 at 7:30 PM, Erick Erickson erickerick...@gmail.com wrote: Index time boosting is different than search time boosting, so asking about performance is irrelevant. Paraphrasing Hossman from years ago on the Lucene list (from memory). ...index time boosting is a way of saying this documents' title is more important than other documents' titles. Search time boosting is a way of saying I care about documents whose titles contain this term more than other documents whose titles may match other parts of this query HTH Erick On Fri, Jun 4, 2010 at 5:10 PM, Asif Rahman a...@newscred.com wrote: Hi, What are the performance ramifications for using a function-based boost at search time (through bf in dismax parser) versus an index-time boost? Currently I'm using boost functions on a 15GB index of ~14mm documents. Our queries generally match many thousands of documents. I'm wondering if I would see a performance improvement by switching over to index-time boosting. Thanks, Asif -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com
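For completeness, an index-time boost is normally attached to the document (or to a field) in the update message itself, e.g. in the XML update format (the id value is just an example):

<add>
  <doc boost="2.0">
    <field name="id">doc-123</field>
    ...
  </doc>
</add>

With SolrJ the equivalent is SolrInputDocument.setDocumentBoost(). The downside mentioned earlier in the thread still applies: a date-based boost baked in at index time goes stale as the documents age, so documents would have to be periodically re-indexed to keep the recency effect.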