Re: Retrieve time of last optimize

2010-04-23 Thread Jon Baer
I don't think there is anything low-level in Lucene that will specifically 
output anything like lastOptimized() for you, since it can be set up a few ways.  

Your best bet is probably adding a postOptimize hook and dumping the timestamp 
to a log / file / monitor / etc., probably something like ...

<listener event="postOptimize" class="solr.RunExecutableListener">
  <str name="exe">lastOptimize.sh</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
 
Or writing to a file and reading it back into the admin if you need to display 
it there.
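The write-a-file-and-read-it-back idea pairs naturally with a small monitoring check. A minimal sketch, assuming the hook script simply touches a marker file after each optimize (the path and threshold here are illustrative, not anything Solr provides):

```python
import os
import time

def last_optimize_age_seconds(marker_path):
    """Seconds since the postOptimize hook last touched the marker file."""
    return time.time() - os.path.getmtime(marker_path)

def optimize_overdue(marker_path, max_age_hours=24):
    """True when the newest optimize is more than max_age_hours ago."""
    return last_optimize_age_seconds(marker_path) > max_age_hours * 3600
```

The hook script itself only needs to `touch` the marker file; the monitoring system then calls `optimize_overdue` on its own schedule.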

More @ http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section

- Jon

On Apr 22, 2010, at 11:16 AM, Shawn Heisey wrote:

 On 4/21/2010 1:24 PM, Shawn Heisey wrote:
 Is it possible to issue some kind of query to a Solr core that will return 
 the last time the index was optimized?  Every day, one of my shards should 
 get optimized, so I would like my monitoring system to tell me when the 
 newest optimize date is more than 24 hours ago.  I could not find a way to 
 get this.  The /admin/cores page has a lot of other useful information, but 
 not that particular piece.
 
 I have found some other useful information on the stats.jsp page, like the 
 number of segments, the size of the index on disk, and so on.  Still have not 
 been able to locate the last optimize date, which would simply be the 
 timestamp on the earliest disk segment.
 
 Thanks,
 Shawn
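The earliest-segment-timestamp check Shawn describes can also be approximated from outside Solr, if the monitoring host can read the index directory directly. A sketch (the directory layout is an assumption about your deployment):

```python
import glob
import os
import time

def last_optimize_age_seconds(index_dir):
    """Approximate the last optimize time as the mtime of the oldest file in
    the index directory: an optimize rewrites every segment, so the oldest
    surviving file cannot predate it. Returns None for an empty directory."""
    mtimes = [os.path.getmtime(p)
              for p in glob.glob(os.path.join(index_dir, "*"))]
    if not mtimes:
        return None
    return time.time() - min(mtimes)
```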
 



Re: Solr full-import not working as expected

2010-04-23 Thread MitchK

Saratv,

is there any unique ID field (defined in your schema.xml) that may be duplicated?

- Mitch
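One way to run that check on the database side. This sketch uses an in-memory SQLite stand-in; the real connection, table, and column names are whatever DIH is configured against:

```python
import sqlite3

def duplicate_ids(conn, table, id_col):
    """Return id values that occur more than once -- each duplicate means one
    fewer Solr document than database rows, since the uniqueKey overwrites."""
    sql = ("SELECT {c} FROM {t} GROUP BY {c} "
           "HAVING COUNT(*) > 1").format(c=id_col, t=table)
    return [row[0] for row in conn.execute(sql)]

# Illustrative stand-in data: two rows share an id, so 3 rows -> 2 documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [("a", "x"), ("a", "y"), ("b", "z")])
```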


saratv wrote:
 
 I am trying to use DIH (where the database has around 93k rows, from different
 tables), and when I ran a full import a few times, only 91k documents were
 indexed (not sure why, or which documents were skipped). Is there a way
 to find out what went wrong, as I am unable to see any errors in the log files?
 Also, is there a way to fix the problem and get all of those 93k docs?
 (I also checked the database and saw there are no duplicates.) Please 
 respond if anyone has seen similar behaviour. Appreciate your input.
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-full-import-not-working-as-expected-tp744937p745102.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to prevent this search lockup (apparently caused during big segment merges)?

2010-04-23 Thread Michael McCandless
I don't know much about how Solr does its locking, so I'm guessing below:

It looks like one thread is doing a commit, by closing the writer, and
is likely holding a lock that prevents other (add/delete) ops from
running? Probably this lock is held because the writer is in the
process of being closed, and on close, the writer waits for running
merges to complete, so it can take a very long time if a large merge
is running.

And then your while loop keeps using up another of the 200 threads,
blocking on the add/delete request.

I think Solr could, instead, call IndexWriter.finishMerges, without
holding the lock, and then perhaps IndexWriter.close(false), which
would be fast (ie, aborts any running merges, for the race condition
where another merge just started after finishMerges and before close).
 Alternatively, Solr could call IndexWriter.commit, not
IndexWriter.close, and not hold the lock that prevents add/deletes
(but maybe there are other reasons why the IW must be closed?).

Maybe Solr should also have a way to restrict the max # threads to be
used for pending add/delete ops, so that there are always threads free
in the app server's pool for searching?

Or... maybe you could drastically increase the timeout on your client
side HTTP connections?  Or, is there some way to check how many
threads are tied up in Solr and block your add/delete requests when
this gets too large...?
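The last two ideas can also be approximated purely client-side by capping the number of in-flight update requests, so a stalled Solr can never soak up the whole servlet thread pool with this client's updates. A sketch, not anything Solr itself offers (the `send` callable stands in for whatever HTTP client the indexer uses):

```python
import threading

class BoundedUpdater:
    """Refuse to start a new add/delete when too many are already
    outstanding, leaving container threads free for searches."""

    def __init__(self, send, max_in_flight=10):
        self._send = send  # callable that performs the actual HTTP request
        self._slots = threading.Semaphore(max_in_flight)

    def submit(self, doc):
        if not self._slots.acquire(blocking=False):
            return False  # over the cap: caller should back off and retry
        try:
            self._send(doc)
            return True
        finally:
            self._slots.release()
```

The indexer's retry loop then sleeps whenever `submit` returns False, instead of piling more blocked requests onto the server.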

Mike

On Thu, Apr 22, 2010 at 6:28 PM, Chris Harris rygu...@gmail.com wrote:
 I'm running Solr 1.4+ under Tomcat 6, with indexing and searching
 requests simultaneously hitting the same Solr machine. Sometimes Solr,
 Tomcat, and my (C#) indexing process conspire to render search
 inoperable. So far I've only noticed this while big segment merges
 (i.e. merges that take multiple minutes) are taking place.

 Let me explain the situation as best as I understand it.

 My indexer has a main loop that looks roughly like this:

  while true:
    try:
      submit a new add or delete request to Solr via HTTP
    catch timeoutException:
      sleep a few seconds

 When things are going wrong (i.e., when a large segment merge is
 happening), this loop is problematic:

 * When the indexer's request hits Solr, then the corresponding thread
 in Tomcat blocks. (It looks to me like the thread is destined to block
 until the entire merge is complete. I'll paste in what the Java stack
 traces look like at the end of the message if they can help diagnose
 things.)
 * Because the Solr thread stays blocked for so long, eventually the
 indexer hits a timeoutException. (That is, it gives up on Solr.)
 * Hitting the timeout exception doesn't cause the corresponding Tomcat
 thread to die or unblock. Therefore, each time through the loop,
 another Solr-handling thread inside Tomcat enters a blocked state.
 * Eventually so many threads (maxThreads, whose Tomcat default is 200)
 are blocked that Tomcat starts rejecting all new Solr HTTP requests --
 including those coming in from the web tier.
 * Users are unable to search. The problem might self-correct once the
 merge is complete, but that could be quite a while.

 What are my options for changing Solr settings or changing my indexing
 process to avoid this lockup scenario? Do you agree that the segment
 merge is helping cause the lockup? Do adds and deletes really need to
 block on segment merges?

 Partial thread dumps follow, showing example add and delete threads
 that are blocked. Also the active Lucene Merge Thread, and the thread
 that kicked off the merge.

 [doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock()
 to return]
 http-1800-200 daemon prio=6 tid=0x0a58cc00 nid=0x1028
 waiting on condition [0x0f9ae000..0x0f9afa90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  0x00016d801ae0 (a
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
 Source)
        at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown
 Source)
        at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown
 Source)
        at 
 java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown
 Source)
        at 
 org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320)
        at 
 org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
        at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at 
 

Questions on autocommit and optimize operations

2010-04-23 Thread dipti khullar
Hi Solr Gurus

We are thinking about optimizing our production master slave solr setup,
just wanted to poll the group on following questions:

1. Currently we are using autocommit feature with setting of 50 docs and 5
mins. Now the requirement is to reduce this time. So we are analyzing the
situation where we will use the time based feature of autocommit. The time
to autocommit will be *1 min*.
Can anyone think of any disadvantages this change can have on the index? Is it
possible that the autocommit process itself takes more than 1 min?
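For reference, a time-based trigger of this kind goes in the update handler section of solrconfig.xml; a 1-minute setting would look roughly like this (maxTime is in milliseconds; the maxDocs value is illustrative):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>50</maxDocs>     <!-- commit after 50 pending docs ... -->
    <maxTime>60000</maxTime>  <!-- ... or after 60 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>
```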

2. We want to trace the average time it takes to perform commit operation.
Right now on production we have Lucid Solr 1.4 on master/slaves but we are
still using old script based replication method. But we will be moving to
new JAVA based replication soon, hence want to focus more on autocommit and
the time it takes to commit the data. So, how do we trace back the logs of
autocommit? Does autocommit execute the commit script present under the bin folder?

3. What should be the optimum time for optimizing the data? After going
through some posts like -
http://www.mail-archive.com/solr-user@lucene.apache.org/msg10920.html, it
makes sense to optimize the data infrequently.
How do we configure this in 1.4? Currently we optimize using the optimize script
twice a day. Also, can there be a situation where an optimize conflicts
with a commit operation? If yes, how do we avoid such a situation?

Many Thanks & Regards
Dipti Khullar


Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Hello,
I configured a Solr server to be able to extract data from various documents, 
including PDFs. Unfortunately, the data extraction fails on several PDFs. I 
have read around here that this may be due to the old Tika library being used? I 
looked around and saw that the svn had a newer version, so I checked out the 
trunk and built it using ant dist and ant example. I then set up my schema in 
the newly built server, and inserted the library from the newly built Cell into 
the lib directory (in Solr's home). However, now all I get is a blank 
response... The indexing works, but it doesn't extract anything; only the 
literal values that I pass on are indexed.
Any help would be greatly appreciated!! :)
Thank you.
Marc Ghorayeb 
_
Hotmail arrive sur votre téléphone ! Compatible Iphone, Windows Phone, 
Blackberry, …
http://www.messengersurvotremobile.com/?d=Hotmail

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Otis Gospodnetic
Marc, got anything in your logs?

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/






RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

I'm launching it with the start.jar utility, and there doesn't seem to be 
anything weird inside the console when I upload a PDF. Is there a way to output 
the console to a log file? The only log file that gets updated is a log file 
in the logs directory, and it seems to only show the input/output of the web 
requests (GETs and POSTs...).
For example:
127.0.0.1 - - [23/Apr/2010:13:06:47 +0000] "GET /solr/core0/admin/luke?show=schema&wt=json HTTP/1.1" 200 21690
127.0.0.1 - - [23/Apr/2010:13:06:47 +0000] "GET /solr/core0/admin/luke?wt=json HTTP/1.1" 200 780
127.0.0.1 - - [23/Apr/2010:13:06:57 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 41
127.0.0.1 - - [23/Apr/2010:13:06:58 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdf&literal.title=mysql-proxy-en.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 44
127.0.0.1 - - [23/Apr/2010:13:06:59 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdf&literal.title=python-cheat-sheet-v1.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 44
127.0.0.1 - - [23/Apr/2010:13:07:00 +0000] "POST /solr/core0/update HTTP/1.1" 200 41
127.0.0.1 - - [23/Apr/2010:13:07:00 +0000] "POST /solr/core0/update HTTP/1.1" 200 41
127.0.0.1 - - [23/Apr/2010:13:07:05 +0000] "GET /solr/core0/admin/schema.jsp HTTP/1.1" 200 26395
127.0.0.1 - - [23/Apr/2010:13:07:05 +0000] "GET /solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1" 304 0
I don't think that's going to help much :)

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Seems like I'm not the only one with this no-extraction problem:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
Apparently he tried the same thing, building from the trunk and indexing a PDF, and no 
extraction occurred... Strange.
Marc G.



Comparing two queries

2010-04-23 Thread Villemos, Gert
We want to let a user register interest in information,
based on a query he has defined himself. For example, he types in a
query, presses a save button, provides his email, and the system will now
email him a daily digest.

 

As part of this, it would be nice to be able to tell the user that the
same / a similar query is already being monitored by another user, as
the users will likely have the same interests. I would therefore like to
evaluate whether two queries will return (almost) the same set of
results.

 

But how can I compare two queries to determine if they will return
(almost) the same set of results?

 

Thanks,

Gert.



Please help Logica to respect the environment by not printing this email  / 
Pour contribuer comme Logica au respect de l'environnement, merci de ne pas 
imprimer ce mail /  Bitte drucken Sie diese Nachricht nicht aus und helfen Sie 
so Logica dabei, die Umwelt zu schützen. /  Por favor ajude a Logica a 
respeitar o ambiente nao imprimindo este correio electronico.



This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.



Multiple query searches in one request

2010-04-23 Thread phoey

Hi there,

Is it possible to do a search more than once, where only the filter query
changes? The response would be three different search results.

We want a page which shows a clustered view of 5 of each of the three
types (images, news articles, editorial articles), ordered by their score.

One possibility is doing three separate Solr search requests, but it's not
really a neat solution. 

One answer could be writing a custom request handler; could that be a possible
way to solve this issue? Could you give me some pointers on how to implement
one?

thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-query-searches-in-one-request-tp745827p745827.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Comparing two queries

2010-04-23 Thread Otis Gospodnetic
Hello Gert,

I think you'd have to apply custom heuristics that involve looking at the top N 
hits for each query and measuring the % overlap.
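That overlap heuristic is cheap to compute once you have the two result lists; a sketch, where ids_a and ids_b are the top-N document IDs returned for each query:

```python
def result_overlap(ids_a, ids_b):
    """Jaccard overlap of two top-N result sets: 1.0 means identical,
    0.0 means disjoint. A threshold near 1.0 flags 'almost the same query'."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0  # two empty result sets count as identical
    return len(a & b) / len(a | b)
```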

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/





What hardware do I need ?

2010-04-23 Thread Xavier Schepler

Hi,

I'm working with Solr 1.4.
My schema has about 50 fields.
I'm using full-text search on short strings (~30-100 terms) and 
faceted search.

My index will have 100,000 documents.

The number of requests per second will be low. Let's say between 0 and 
1000 because of auto-complete.


Is a standard server (3 GHz proc, 4 GB RAM) running the client application 
(Apache + PHP5 + ZF + APC) and Tomcat + Solr enough?

Do I need more hardware?

Thanks in advance,

Xavier S.




Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Otis Gospodnetic
Marc,

These are your request logs.  You want to look at your Solr logs.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




Merging Solr Cores Urgent

2010-04-23 Thread abhatna...@vantage.com

Hi,

I have a question about merging Solr cores.

The wiki documentation says that the merged core must exist prior to calling
the merge command.

So I created the merged core and pointed it to some data dir.

However, even after merging the cores, it still points to the old data
dir.

Shouldn't the merge command create a new data/index, or at least contain the contents
of the merged indexes?
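For reference, as I understand the CoreAdmin API from the wiki, mergeindexes merges the source index directories into the target core's existing index (so a commit on the target core may be needed before the documents become visible). A sketch of building the request; double-check the parameter names against the wiki page before relying on them:

```python
from urllib.parse import urlencode

def merge_indexes_url(solr_base, target_core, index_dirs):
    """Build a CoreAdmin mergeindexes URL; the target core must already
    exist, and index_dirs are the source index directories to merge in."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    return solr_base.rstrip("/") + "/admin/cores?" + urlencode(params)
```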


Ankit
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-Solr-Cores-Urgent-tp745938p745938.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple query searches in one request

2010-04-23 Thread Otis Gospodnetic
Hi,

Yes, a custom SearchComponent will do this.  We've done stuff like this before 
and actually have this sort of functionality in some of Sematext's products - it 
works well if you don't mind writing and adding another SearchComponent to your 
chain.
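For comparison, the three-separate-requests option mentioned in the question is only a few lines on the client side; a sketch that just builds the URLs (it assumes a "type" field and the JSON response writer, both illustrative):

```python
from urllib.parse import urlencode

def clustered_search_urls(solr_base, q, types, rows=5):
    """One /select URL per content type, identical except for the filter
    query, so scores stay comparable across the three result lists."""
    urls = []
    for t in types:
        params = urlencode({"q": q, "fq": "type:" + t,
                            "rows": rows, "wt": "json"})
        urls.append(solr_base.rstrip("/") + "/select?" + params)
    return urls
```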

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: phoey pho...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, April 23, 2010 10:23:38 AM
 Subject: Multiple query searches in one request
 
 
Hi there,

Is it possible to do a search more than once, where only the filter query
changes? The response would be the three different search results.

We want a page which shows a clustered view of 5 of each of the three
types (images, news articles, editorial articles), ordered by their score.

One possibility is doing three separate Solr search requests, but it's not
really a neat solution.

One answer could be making a custom request handler; could that be possible
to solve this issue? Could you give me some pointers on how to implement
one?

thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-query-searches-in-one-request-tp745827p745827.html
Sent from the Solr - User mailing list archive at Nabble.com.
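Until a custom SearchComponent or request handler is in place, the three-request approach mentioned above can be sketched client-side: one query per type with only the filter query changing, then an ordered merge. This is a hedged sketch; fetch is a hypothetical stand-in for an HTTP call to Solr's select handler, and the type field name is an assumption:

```python
# Hedged sketch of the client-side alternative: run one query per type
# (only fq changes), keep the top N of each, then order the combined
# view by score. fetch() is a hypothetical stand-in for a Solr request.
def clustered_view(fetch, q, types, per_type=5):
    """Return up to per_type docs for each type, merged and sorted by score."""
    merged = []
    for t in types:
        # one request per type; only the filter query differs
        docs = fetch(q=q, fq="type:%s" % t, rows=per_type)
        merged.extend(docs)
    # order the clustered view by score, highest first
    return sorted(merged, key=lambda d: d["score"], reverse=True)
```

The cost is three round trips instead of one, which is exactly the inefficiency a custom SearchComponent would remove.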


Re: What hardware do I need ?

2010-04-23 Thread Otis Gospodnetic
Xavier,

0-1000 QPS is a pretty wide range.  Plus, it depends on how good your 
auto-complete is, which depends on types of queries it issues, among other 
things.
100K short docs is small, so that will all fit in RAM nicely, assuming those 
other processes leave enough RAM for the OS to cache the index.

 That said, you do need more than one box if you want your auto-complete to be more 
fault tolerant.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Xavier Schepler xavier.schep...@sciences-po.fr
 To: solr-user@lucene.apache.org
 Sent: Fri, April 23, 2010 11:01:24 AM
 Subject: What hardware do I need ?
 
 Hi,

I'm working with Solr 1.4.
My schema has about 50 fields.
I'm using full text search in short strings (~ 30-100 terms) and faceted search.
My index will have 100 000 documents.

The number of requests per second will be low. Let's say between 0 and 1000 because of auto-complete.

Is a standard server (3ghz proc, 4gb ram) with the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough ???
Do I need more hardware ?

Thanks in advance,

Xavier S.


RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Yes, the only log I can actually get is the one in the command console from
Windows, and there are no errors there...
Here are the last lines when I upload a PDF to the update/extract URL:
Apr 23, 2010 5:47:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
Apr 23, 2010 5:47:03 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={event=firstSearcher&q=static+firstSearcher+warming+query+from+solrconfig.xml} hits=0 status=0 QTime=0
Apr 23, 2010 5:47:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Apr 23, 2010 5:47:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
Apr 23, 2010 5:47:03 PM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
Apr 23, 2010 5:47:03 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Apr 23, 2010 5:47:03 PM org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener newSearcher
INFO: Loading spell index for spellchecker: default
2010-04-23 17:47:03.530:INFO::Opened E:\users\M1B\search\solr-new\example\logs\2010_04_23.request.log
2010-04-23 17:47:03.546:INFO::Started socketconnec...@0.0.0.0:8983
Apr 23, 2010 5:47:03 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@259a8416 main
Apr 23, 2010 5:47:11 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 297
Apr 23, 2010 5:47:11 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={extractOnly=true&literal.url=http://www.3ds.com/lucidworks-solr-refguide-1.4.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\lucidworks-solr-refguide-1.4.pdf&literal.type=document&literal.appKey=media&literal.title=lucidworks-solr-refguide-1.4.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17} status=0 QTime=297
Apr 23, 2010 5:47:12 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Apr 23, 2010 5:47:12 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={extractOnly=true&literal.url=http://www.3ds.com/mysql-proxy-en.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\mysql-proxy-en.pdf&literal.type=document&literal.appKey=media&literal.title=mysql-proxy-en.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17} status=0 QTime=0
Apr 23, 2010 5:47:13 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Apr 23, 2010 5:47:13 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={extractOnly=true&literal.url=http://www.3ds.com/python-cheat-sheet-v1.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\python-cheat-sheet-v1.pdf&literal.type=document&literal.appKey=media&literal.title=python-cheat-sheet-v1.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17} status=0 QTime=0
Apr 23, 2010 5:47:14 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening searc...@2efeecca main
Apr 23, 2010 5:47:14 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@2efeecca main from searc...@259a8416 main
	fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@2efeecca main
	fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@2efeecca main from searc...@259a8416 main
	filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for 

Re: What hardware do I need ?

2010-04-23 Thread Xavier Schepler

On 23/04/2010 17:08, Otis Gospodnetic wrote:

Xavier,

0-1000 QPS is a pretty wide range.  Plus, it depends on how good your 
auto-complete is, which depends on types of queries it issues, among other 
things.
100K short docs is small, so that will all fit in RAM nicely, assuming those 
other processes leave enough RAM for the OS to cache the index.

  That said, you do need more than 1 box if you want your auto-complete more 
fault tolerant.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
   

From: Xavier Schepler xavier.schep...@sciences-po.fr
To: solr-user@lucene.apache.org
Sent: Fri, April 23, 2010 11:01:24 AM
Subject: What hardware do I need ?

Hi,

I'm working with Solr 1.4.
My schema has about 50 fields.
I'm using full text search in short strings (~ 30-100 terms) and facetted search.
My index will have 100 000 documents.

The number of requests per second will be low. Let's say between 0 and 1000 because of auto-complete.

Is a standard server (3ghz proc, 4gb ram) with the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough ???
Do I need more hardware ?

Thanks in advance,

Xavier S.
   

Well, my auto-complete is built on the facet prefix search component.
I think that 100-700 requests per second is maybe a better approximation.



Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Paul Borgermans
On Fri, Apr 23, 2010 at 5:48 PM, Marc Ghorayeb dekay...@hotmail.com wrote:

 Yes, the only log i can actually get is the one in the command console from 
 windows and there are no errors there ...
 Here are the last lines when i upload a pdf to the update/extract url:

snip

I am pretty sure it is Tika itself that does not manage to convert
your PDF. I'm not using Solr Cell but Tika from a command line, and it
is only with very recent Tika builds that PDF extraction works in most
cases.

So I suggest building Tika from svn yourself, and if the command-line
extraction works, integrating it back with Solr. See

http://wiki.apache.org/solr/ExtractingRequestHandler

for instructions (the committer section).

hth
Paul


SolrJ + BasicAuth

2010-04-23 Thread Jon Baer
Uggg I just got bit hard by this on a Tomcat project ... 

https://issues.apache.org/jira/browse/SOLR-1238

Is there any way to get access to that RequestEntity w/o patching?  Also, are 
there security implications w/ using the repeatable payloads?

Thanks.

- Jon

Re: Comparing two queries

2010-04-23 Thread Erik Hatcher
Or, use facet.query to get the overlap.  Here's how: 
?q=query1&facet=on&facet.query=query2


You'll get the hit count from query #1 in the results, and the  
overlapping count to query #2 in the facet query response.


Erik - http://www.lucidimagination.com

On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote:


Hello Gert,

I think you'd have to apply custom heuristics that involves looking  
at top N hits for each query and looking at the % overlap.


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 

From: Villemos, Gert gert.ville...@logica.com
To: solr-user@lucene.apache.org
Sent: Fri, April 23, 2010 10:20:54 AM
Subject: Comparing two queries

We want to support that a user can register for interest in information,
based on a query he has defined himself. For example that he type in a
query, press a save button, provides his email and the system will now
email him with a daily digest.

As part of this, it would be nice to be able to tell the user that the
same / a similar query are already being monitored by another user, as
the users will likely have the same interests. I would therefore like to
evaluate whether two queries will return (almost) the same set of
results.

But how can I compare two queries to determine if they will return
(almost) the same set of results?





Thanks,

Gert.



Please help Logica to respect the environment by not printing this email / Pour
contribuer comme Logica au respect de l'environnement, merci de ne pas imprimer
ce mail / Bitte drucken Sie diese Nachricht nicht aus und helfen Sie so Logica
dabei, die Umwelt zu schützen. / Por favor ajude a Logica a respeitar o
ambiente nao imprimindo este correio electronico.

This e-mail and any attachment is for authorised use by the intended
recipient(s) only. It may contain proprietary material, confidential
information and/or be subject to legal privilege. It should not be copied,
disclosed to, retained or used by, any other party. If you are not an intended
recipient then please promptly delete this e-mail and any attachment and all
copies and inform the sender. Thank you.





Solr does not honor facet.mincount and field.facet.mincount

2010-04-23 Thread Umesh_

Hi All, 

I am trying to restrict facets in the Solr response by setting facet.mincount = 
1, which does not work, as the request and response below show: 

REQUEST:
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9

RESPONSE: 
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <arr name="facet.field">
        <str>Instrument</str>
        <str>Location</str>
      </arr>
      <str name="facet.minCount">9</str>
      <str name="version">2.2</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="188" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="Instrument"/>
      <lst name="Location">
        <int name="Camden, New Jersey [unconfirmed]">118</int>
        <int name="Philadelphia, Pennsylvania [unconfirmed]">7</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
  </lst>
</response>

As we can see from the response, the Instrument facet, which has zero distinct
values, is included in the response. Also, the facet Philadelphia,
Pennsylvania [unconfirmed], which has a count less than mincount (9), is
included in the response. 

I also tried Instrument.facet.mincount=1 but still I see Instrument facet in
the response. 

Please let me know if my understanding of mincount is different than what it
is intended to do, OR if I am doing something which is not correct. 

Regards, 
Umesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-does-not-honor-facet-mincount-and-field-facet-mincount-tp746499p746499.html
Sent from the Solr - User mailing list archive at Nabble.com.
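Two things worth checking in the request above: Solr parameter names are case-sensitive, and the echoed params show facet.minCount with a capital C, which Solr would treat as an unknown parameter; and per-field facet overrides use an f.<field>. prefix (f.Instrument.facet.mincount), not Instrument.facet.mincount. A small sketch that builds the corrected query string; facet_params is a hypothetical helper:

```python
# Hedged sketch: build a facet query string with the all-lowercase global
# param and the f.<field>. per-field override syntax Solr expects.
from urllib.parse import urlencode

def facet_params(fields, mincount=1, per_field=None):
    """Global facet params plus optional f.<field>.facet.mincount overrides."""
    params = [("facet", "true"), ("facet.mincount", str(mincount))]
    params += [("facet.field", f) for f in fields]
    # per-field overrides take precedence over the global facet.mincount
    for field, n in (per_field or {}).items():
        params.append(("f.%s.facet.mincount" % field, str(n)))
    return urlencode(params)

print(facet_params(["Instrument", "Location"], mincount=1,
                   per_field={"Instrument": 2}))
```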


Tomcat vs. WebSphere

2010-04-23 Thread Ken Lane (kenlane)
Does anyone know of any advantages/disadvantages to running SOLR on
WebSphere versus Tomcat?

 

Thanks,

Ken



Re: Tomcat vs. WebSphere

2010-04-23 Thread Otis Gospodnetic
I've never used WebSphere, but I always got the impression that people have 
more issues with it than with simpler solutions.
Personally, I would suggest Jetty.  I've used it dozens of times and never had 
issues with it.  It's small, simple, and fast.
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Ken Lane (kenlane) kenl...@cisco.com
 To: solr-user@lucene.apache.org
 Sent: Fri, April 23, 2010 3:53:07 PM
 Subject: Tomcat vs. WebSphere
 
 Does anyone know of any advantages/disadvantages to running SOLR on
WebSphere 
 versus Tomcat?



Thanks,

Ken


Re: What hardware do I need ?

2010-04-23 Thread Otis Gospodnetic
Xavier,

100-700 QPS is still high.  I'm guessing your 1 box won't handle that without 
sweating a lot (read: slow queries).
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Xavier Schepler xavier.schep...@sciences-po.fr
 To: solr-user@lucene.apache.org
 Sent: Fri, April 23, 2010 11:53:23 AM
 Subject: Re: What hardware do I need ?
 
 On 23/04/2010 17:08, Otis Gospodnetic wrote:
  Xavier,
 
  0-1000 QPS is a pretty wide range.  Plus, it depends on how good your 
  auto-complete is, which depends on types of queries it issues, among other 
  things.
  100K short docs is small, so that will all fit in RAM nicely, assuming 
  those other processes leave enough RAM for the OS to cache the index.
 
  That said, you do need more than 1 box if you want your auto-complete more 
  fault tolerant.
 
  Otis
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
  - Original Message 
  From: Xavier Schepler xavier.schep...@sciences-po.fr
  To: solr-user@lucene.apache.org
  Sent: Fri, April 23, 2010 11:01:24 AM
  Subject: What hardware do I need ?
 
  Hi,
 
  I'm working with Solr 1.4.
  My schema has about 50 fields.
  I'm using full text search in short strings (~ 30-100 terms) and facetted 
  search.
  My index will have 100 000 documents.
 
  The number of requests per second will be low. Let's say between 0 and 
  1000 because of auto-complete.
 
  Is a standard server (3ghz proc, 4gb ram) with the client application 
  (apache + php5 + ZF + apc) and Tomcat + Solr enough ???
  Do I need more hardware ?
 
  Thanks in advance,
 
  Xavier S.
 
 Well my auto-complete is built on the facet prefix search component.
 I think that 100-700 requests per second is maybe a better approximation.


Re: Best way to prevent this search lockup (apparently caused during big segment merges)?

2010-04-23 Thread Otis Gospodnetic
Chris,

It looks like Mike already offered several solutions though I don't know 
what Solr does without looking at the code.

But I'm curious:
* how big is your index? and do you know how large the segments being merged 
are?
* do you batch docs or do you make use of Streaming SolrServer?
 I'm curious, because I've never encountered this problem before...

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
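One client-side mitigation for the retry loop in Chris's message quoted below is to back off exponentially instead of retrying every few seconds, so a multi-minute merge does not stack up one blocked Tomcat thread per attempt. This is a hedged sketch, not a fix for the server-side blocking itself; submit is a hypothetical stand-in for the HTTP add/delete call and is assumed to raise TimeoutError on a socket timeout:

```python
# Hedged sketch: retry a Solr add/delete with exponential backoff so a
# long segment merge produces a handful of blocked Tomcat threads, not
# hundreds. submit() is a hypothetical stand-in for the HTTP call.
import time

def submit_with_backoff(submit, doc, base=2.0, cap=300.0, attempts=8):
    """Retry submit(doc) with exponential backoff; True on success."""
    delay = base
    for _ in range(attempts):
        try:
            submit(doc)
            return True
        except TimeoutError:
            time.sleep(delay)            # wait out (part of) the merge
            delay = min(delay * 2, cap)  # 2s, 4s, 8s, ... capped at 5 min
    return False
```

The backoff only bounds how fast the client burns through Tomcat's maxThreads; the abandoned requests still block server-side until the merge completes.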



- Original Message 
 From: Chris Harris rygu...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, April 22, 2010 6:28:29 PM
 Subject: Best way to prevent this search lockup (apparently caused during big 
  segment merges)?
 
 I'm running Solr 1.4+ under Tomcat 6, with indexing and searching
 requests simultaneously hitting the same Solr machine. Sometimes Solr,
 Tomcat, and my (C#) indexing process conspire to render search
 inoperable. So far I've only noticed this while big segment merges
 (i.e. merges that take multiple minutes) are taking place.
 
 Let me explain the situation as best as I understand it.
 
 My indexer has a main loop that looks roughly like this:
 
   while true:
     try:
       submit a new add or delete request to Solr via HTTP
     catch timeoutException:
       sleep a few seconds
 
 When things are going wrong (i.e., when a large segment merge is
 happening), this loop is problematic:
 
 * When the indexer's request hits Solr, then the corresponding thread
 in Tomcat blocks. (It looks to me like the thread is destined to block
 until the entire merge is complete. I'll paste in what the Java stack
 traces look like at the end of the message if they can help diagnose
 things.)
 * Because the Solr thread stays blocked for so long, eventually the
 indexer hits a timeoutException. (That is, it gives up on Solr.)
 * Hitting the timeout exception doesn't cause the corresponding Tomcat
 thread to die or unblock. Therefore, each time through the loop,
 another Solr-handling thread inside Tomcat enters a blocked state.
 * Eventually so many threads (maxThreads, whose Tomcat default is 200)
 are blocked that Tomcat starts rejecting all new Solr HTTP requests --
 including those coming in from the web tier.
 * Users are unable to search. The problem might self-correct once the
 merge is complete, but that could be quite a while.
 
 What are my options for changing Solr settings or changing my indexing
 process to avoid this lockup scenario? Do you agree that the segment
 merge is helping cause the lockup? Do adds and deletes really need to
 block on segment merges?
 
 Partial thread dumps follow, showing example add and delete threads
 that are blocked. Also the active Lucene Merge Thread, and the thread
 that kicked off the merge.
 
 [doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock()
 to return]
 http-1800-200 daemon prio=6 tid=0x0a58cc00 nid=0x1028
 waiting on condition [0x0f9ae000..0x0f9afa90]
    java.lang.Thread.State: WAITING (parking)
         at sun.misc.Unsafe.park(Native Method)
         - parking to wait for 0x00016d801ae0 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
         at java.util.concurrent.locks.LockSupport.park(Unknown Source)
         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown Source)
         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown Source)
         at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320)
         at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
         at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
         at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
         at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
         at 

Re: Collapse problem

2010-04-23 Thread Chris Hostetter
: basically, we are running query with field collapsing (Solr 1.4 with 
: patch 236). The responses tells us that there are about 2700 documents 
: matching our query. However, I can not get passed the 431th document. 
: From this point on, the response will not contain any document.

isn't that how collapse is supposed to work?  a total of 2700 match, but it 
collapses away many of them according to some criteria, so you only 
paginate through 431?


-Hoss



Re: Tomcat vs. WebSphere

2010-04-23 Thread Deo, Shantanu
We have run SOLR in WebLogic without problems. The only change we see is
some spurious extra logging info which we don't see in the case of Tomcat.
Anyone have an idea of how to control that?

Thanks
Shantanu


On 4/23/10 12:53 PM, Ken Lane (kenlane) kenl...@cisco.com wrote:

 Does anyone know of any advantages/disadvantages to running SOLR on
 WebSphere versus Tomcat?
 
 
 
 Thanks,
 
 Ken
 
 




RE: Comparing two queries

2010-04-23 Thread Villemos, Gert
I was thinking along these lines:
 
1. Retrieve the top result for one query.
2. Take the resulting document and evaluate the score that it would get in 
another query.
3. If the scores are similar, then the queries most likely overlap.
 
I guess that if I had two simple query strings "archive crash" and "archiving 
failure" then I could:
 
1. Use the query ?q=archive crash&rows=1 which will return me one result (if 
any).
2. Read the score of the returned document.
3. Read the unique identifier field value, let's say it has field name 'URI' and 
value '50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'.
4. Use the query ?q=archiving 
failure&qf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55&rows=1
5. Read the score of the returned document (the document will be the same as 
returned under 1; the score will be different, evaluated based on the second 
query).
6. Evaluate how similar the scores are.
 
My question about this approach is: is the score calculated in step 4 affected 
by the subquery, whose role is solely to select a specific result?
 
I'm using dismax, by the way. Should I use the standard handler instead? 
Would it make a difference?
 
Thanks,
Gert.
 
 



From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Fri 4/23/2010 8:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Comparing two queries



Or, use facet.query to get the overlap.  Here's how:
?q=query1&facet=on&facet.query=query2

You'll get the hit count from query #1 in the results, and the 
overlapping count to query #2 in the facet query response.

Erik - http://www.lucidimagination.com

On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote:

 Hello Gert,

 I think you'd have to apply custom heuristics that involves looking 
 at top N hits for each query and looking at the % overlap.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
 From: Villemos, Gert gert.ville...@logica.com
 To: solr-user@lucene.apache.org
 Sent: Fri, April 23, 2010 10:20:54 AM
 Subject: Comparing two queries

 We want to support that a user can register for interest in information,
 based on a query he has defined himself. For example that he type in a
 query, press a save button, provides his email and the system will now
 email him with a daily digest.

 As part of this, it would be nice to be able to tell the user that the
 same / a similar query are already being monitored by another user, as
 the users will likely have the same interests. I would therefore like to
 evaluate whether two queries will return (almost) the same set of
 results.

 But how can I compare two queries to determine if they will return
 (almost) the same set of results?



 Thanks,

 Gert.












Re: Comparing two queries

2010-04-23 Thread Otis Gospodnetic
Gert,

In your second query example you used qf=. Did you mean fq=? If 
so, the answer is no - filter queries don't affect the score.


I haven't tried your approach, but intuitively feel that looking at % overlap 
may work better. 
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Villemos, Gert gert.ville...@logica.com
 To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
 Sent: Fri, April 23, 2010 5:08:04 PM
 Subject: RE: Comparing two queries
 

RE: Comparing two queries

2010-04-23 Thread Villemos, Gert
Yes, your solution is much simpler, providing the result through a single
query. I didn't understand it the first time I read it.
 
I guess you would need to run it backwards as well to really evaluate the
relevance, i.e.

First: q=query1&facet=on&facet.query=query2
Then:  q=query2&facet=on&facet.query=query1

Query 1 may return 100,000 hits with 500 overlapping with query 2. This
would indicate no relevance.
Query 2 may return 1,000 documents with 500 overlapping with query 1. This
would indicate relevance.
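The two-way comparison above can be sketched as a small helper that turns the raw counts into overlap ratios (the counts below are the ones from the example; what threshold counts as "similar" is an assumption you would have to tune):

```python
def overlap_ratios(hits1, hits2, overlap):
    """Fraction of each result set covered by the documents matching both.

    hits1, hits2: total hit counts of query 1 and query 2
    overlap: the facet.query count (documents matching both queries)
    """
    r1 = overlap / hits1 if hits1 else 0.0
    r2 = overlap / hits2 if hits2 else 0.0
    return r1, r2

# The counts from the example: query 1 returns 100,000 hits, query 2
# returns 1,000, and 500 documents match both.
r1, r2 = overlap_ratios(100_000, 1_000, 500)
print(r1)  # 0.005 -> query 2 barely covers query 1: not relevant
print(r2)  # 0.5   -> half of query 2 lies inside query 1: relevant
```

A query pair would then count as "similar" only if both ratios are high.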
 
I will test it out the next days and let you know how it works for us.
 
Regards,
Gert.
 
 
 



From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Fri 4/23/2010 11:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Comparing two queries



Gert,

In your second query example you used qf=. Did you mean fq=? If
so, the answer is no - filter queries don't affect the score.


I haven't tried your approach, but intuitively feel that looking at % overlap 
may work better.
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Villemos, Gert gert.ville...@logica.com
 To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
 Sent: Fri, April 23, 2010 5:08:04 PM
 Subject: RE: Comparing two queries

 I was thinking along the lines of:

1. Retrieve the top result for one query.
2. Take the resulting document and evaluate the score that it would get in
another query.
3. If the scores are similar, then the queries most likely overlap.

I guess that if I had two simple query strings archive crash and
archiving failure then I could:

1. Use the query ?q=archive crash&rows=1, which will return me one result
(if any).
2. Read the score of the returned document.
3. Read the unique identifier field value; let's say it has field name
'URI' and value '50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'.
4. Use the query
?q=archiving failure&qf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55&rows=1
5. Read the score of the returned document (the document will be the same
as returned under 1; the score will be different, evaluated based on the
second query).
6. Evaluate how similar the scores are.

My question about this approach is: is the score calculated in 4 affected
by the subquery, whose role is solely to select a specific result?

I'm using dismax, by the way. Should I use the standard handler instead?
Would it make a difference?

Thanks,
Gert.
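The requests in steps 1 and 4 can be sketched as below (host, core and the URI field name are taken from the example; note this sketch puts the restriction into fq rather than qf, on the assumption that a filter query is wanted precisely because it selects a document without affecting the score):

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed host and core

def first_hit_request(query):
    # Step 1: fetch only the top document for the query, with its score.
    return SOLR_SELECT + "?" + urlencode(
        {"q": query, "rows": 1, "fl": "URI,score"})

def rescore_request(query, uri):
    # Step 4: score the second query against the document found in step 1.
    # The restriction goes into fq, which selects but does not score.
    return SOLR_SELECT + "?" + urlencode(
        {"q": query, "fq": 'URI:"%s"' % uri, "rows": 1, "fl": "URI,score"})

print(first_hit_request("archive crash"))
print(rescore_request("archiving failure",
                      "50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55"))
```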






From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Fri 4/23/2010 8:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Comparing two queries

Or, use facet.query to get the overlap. Here's how:

?q=query1&facet=on&facet.query=query2

You'll get the hit count from query #1 in the results, and the overlapping
count to query #2 in the facet query response.

Erik -

http://www.lucidimagination.com

On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote:

 Hello Gert,

 I think you'd have to apply custom heuristics that involve looking at the
 top N hits for each query and looking at the % overlap.

 Otis
 ----
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message ----
 From: Villemos, Gert gert.ville...@logica.com
 To: solr-user@lucene.apache.org
 Sent: Fri, April 23, 2010 10:20:54 AM
 Subject: Comparing two queries


Re: Solr full-import not working as expected

2010-04-23 Thread MitchK

Unfortunately you haven't answered my question, saratv.
The important question is, why did your DIH-configuration not import those
rows. 

Without providing any schema-information or configuration-details of your
DIH, no one will be able to help you.
Just for the future: if something doesn't work, provide detailed information
so you can get fast, good-quality help.

Regards
- Mitch 


saratv wrote:
 
 is there a way, from a Java program, that I can find missing rows in the
 database and import those rows into Solr as docs, or at least a way to
 find which rows are missing?
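One way to answer the quoted question is a plain set difference between the database's primary keys and the unique IDs actually in the index. A minimal sketch (how the two ID lists are fetched — a SELECT on one side, paging through q=*:* with fl=id on the other — is left out and assumed):

```python
def missing_ids(db_ids, solr_ids):
    """Return, sorted, the database keys that never made it into the index."""
    return sorted(set(db_ids) - set(solr_ids))

# db_ids would come from e.g. SELECT id FROM ..., solr_ids from paging
# through the index with q=*:* and fl=id; plain lists stand in here.
print(missing_ids([1, 2, 3, 4, 5], [1, 2, 4]))  # [3, 5]
```

The rows returned could then be re-sent to DIH (or posted directly) to fill the gap.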
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-full-import-not-working-as-expected-tp744937p746927.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr does not honor facet.mincount and field.facet.mincount

2010-04-23 Thread Koji Sekiguchi

Umesh_ wrote:
Hi All, 


I am trying to restrict facets in solr response, by setting facet.mincount =
1, which does not work as the request and response are shown below: 


REQUEST:
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9

RESPONSE:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <arr name="facet.field">
        <str>Instrument</str>
        <str>Location</str>
      </arr>
      <str name="facet.minCount">9</str>
      <str name="version">2.2</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="188" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="Instrument"/>
      <lst name="Location">
        <int name="Camden, New Jersey [unconfirmed]">118</int>
        <int name="Philadelphia, Pennsylvania [unconfirmed]">7</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
  </lst>
</response>


As we can see from the response, the Instrument facet, which has zero
distinct values, is included in the response. The facet value Philadelphia,
Pennsylvania [unconfirmed], whose count is less than the mincount (9), is
also included in the response.

  

The emptiness of the Instrument field in the response shows that Solr
couldn't facet the data (no value with 9 or more docs) on that field.
Regarding the Location field, the result is weird. Can you show us the data
and the field type of that field, so we can reproduce the problem?


I also tried Instrument.facet.mincount=1 but still I see Instrument facet in
the response. 

  
Per-field parameters need the f. prefix. It should be
f.Instrument.facet.mincount=1.
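Koji's correction can be illustrated by rebuilding the request with the per-field prefix (host and field names taken from the example above). Note also that Solr parameter names are case sensitive, so the lowercase spelling facet.mincount matters; the response above echoed facet.minCount with a capital C, which would not match the expected lowercase name:

```python
from urllib.parse import urlencode

# Build the query string with both a global mincount and a per-field
# override; Solr parameter names are lowercase: facet.mincount.
params = [
    ("q", "*:*"),
    ("rows", 0),
    ("facet", "true"),
    ("facet.field", "Instrument"),
    ("facet.field", "Location"),
    ("facet.mincount", 1),               # global default
    ("f.Instrument.facet.mincount", 9),  # per-field override: f.<field>.
]
print("http://localhost:8983/solr/select?" + urlencode(params))
```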


Koji

--
http://www.rondhuit.com/en/



Boost function on *:*

2010-04-23 Thread Blargy

Is it possible to use boost function across the whole index/empty search
term? 

I'm guessing the next question that would be asked is "Why would you want to
do that?". Well, we have a bunch of custom business metrics included in each
document (a product). I would like to show only the best products (based on
our metrics and some boost functions) in the absence of a search term.

Is this possible?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-function-on-tp747131p747131.html
Sent from the Solr - User mailing list archive at Nabble.com.


mix cased search terms

2010-04-23 Thread Tuan Nguyen
Hello list, first time posting here. I am trying to find an answer to a
strange search behaviour we're seeing in our VuFind application. In order to
eliminate any VuFind-related variables, I have used the vanilla Solr example
schema to reproduce our problematic search.


I posted this xml to the example schema, slightly modified version of  
the monitor.xml for testing:


<add><doc>
  <field name="id">1</field>
  <field name="name">In pursuit of the PhD</field>
  <field name="manu">Dell, Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">monitor</field>
  <field name="features">In pursuit of the PhD</field>
  <field name="includes">In pursuit of the PhD</field>
  <field name="weight">401.6</field>
  <field name="price">2199</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
</doc></add>


Then run a query in the admin interface with debug on and got no match:

features:PhD

The debug info shows:

<str name="rawquerystring">features:PhD</str>
<str name="querystring">features:PhD</str>
<str name="parsedquery">PhraseQuery(features:"ph d")</str>
<str name="parsedquery_toString">features:"ph d"</str>

But, in the analysis tool, it shows a match for the split "ph d" given the
query term PhD.


If I set the splitOnCaseChange=0 option in the WordDelimiterFilter,  
then a match is found as expected.
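The case-change split behind that phrase query can be illustrated with a rough stand-in for what WordDelimiterFilter does with splitOnCaseChange=1 (illustration only -- this regex is not the actual Lucene implementation):

```python
import re

def split_on_case_change(token):
    # Rough emulation of WordDelimiterFilter with splitOnCaseChange=1:
    # runs of letters are cut where lowercase switches to uppercase.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|[0-9]+", token)

parts = [p.lower() for p in split_on_case_change("PhD")]
print(parts)                        # ['ph', 'd']
print('"' + " ".join(parts) + '"')  # the phrase query built: "ph d"
```

With splitOnCaseChange=0 the token stays whole on both the index and query side, which is consistent with the match succeeding in that configuration.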


I appreciate any insight on this problem. Thanks in advance.

Tuan