Re: Boosting by facets with standard query

2009-04-17 Thread ashokc

What you indicated here is for a different purpose, is it not? I already do
something similar with my 'q'. For example, a sample query logged in
'catalina.out' looks like

webapp=/search path=/select
params={rows=15&start=0&q=(+(content:umts)+OR+(title:umts)^2+OR+(urltext:umts)^2)}

when the search term is umts. I am looking for this term umts in the
fields: (a) content, (b) title (boosted by a factor of 2) and (c) urltext
(boosted by a factor of 2). So the presence of the term umts in title or
url is weighted more than its presence in the regular content. So far so
good.

Now, I have other fields as well, like document type, file type etc., that
serve as facets to telescope down. Among the above set of search results, I
want to boost a specific document type 'white_papers' and a specific file type
'pdf'. By boosting I mean that these white_papers and pdf documents should
float to the top of the heap in the search results, if such documents are
present in the search results at all.

So would I simply add the following to the above q?

q=(+(content:umts)+OR+(title:umts)^2+OR+(urltext:umts)^2)+AND+(doctype:white_papers)^2+AND+(filetype:pdf)^2

But wouldn't the above give 0 results if there are no white_papers and pdfs
(because of the AND)? If I use OR, then the meaning of the query is lost
altogether.

What we need is for the white_papers and pdfs to be boosted, but if and only
if such documents are valid results for the search term in question. How
would I write my above 'q' to accomplish that?
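
One idea I have not tried yet: drop the ANDs and leave the doctype/filetype
clauses optional, so they can only add to the score of documents that already
match the required text group, e.g.

q = +(content:umts OR (title:umts)^2 OR (urltext:umts)^2) (doctype:white_papers)^2 (filetype:pdf)^2

(the leading '+' makes the text clauses mandatory, while the two trailing
clauses are optional and act purely as boosts). Would that be the right way?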

Thanks

- ashok



Shalin Shekhar Mangar wrote:
 
 On Fri, Apr 17, 2009 at 1:03 AM, ashokc ash...@qualcomm.com wrote:
 

 I have a query that yields results binned in several facets. How can I
 boost the results that fall in certain facets over the rest of them that do
 not belong to those facets? I use the standard query format. Thank you
 
 
 I'm not sure what you mean by boosting by facet. Do you mean that you want
 to boost documents which match a term query?
 
 If yes, you can use your_field_name:value^2.0 in the q parameter.
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 




RE: OutofMemory on Highlightling

2009-04-17 Thread Gargate, Siddharth
I tried hl.maxAnalyzedChars=500 but still hit the same issue. I get OOM with
a row size of just 20.
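
For reference, the request I am testing is shaped roughly like this (the
teaser field name is just an example):

http://localhost:8983/solr/select?q=test&rows=20&hl=true&hl.fl=teaser&hl.maxAnalyzedChars=500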


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Thursday, April 16, 2009 9:56 PM
To: solr-user@lucene.apache.org
Subject: Re: OutofMemory on Highlightling


Hi,

Have you tried:
http://wiki.apache.org/solr/HighlightingParameters#head-2ca22f63cb8d1b2ba3ff0cfc05e85b94898c59cf

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Gargate, Siddharth sgarg...@ptc.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 16, 2009 6:33:46 AM
 Subject: OutofMemory on Highlightling
 
 Hi,
 
 I am analyzing the memory usage for my Solr setup. I am
 testing with 500 text documents of 2 MB each.
 
 I have defined a field for displaying the teasers, storing 1 MB of
 text in it. I am testing with just 128 MB maxHeap (I know I should be
 increasing it, but I am testing the worst-case scenario).
 
 If I search for all 500 documents with a row size of 500 and highlighting
 disabled, it works fine. But if I enable highlighting I get an
 OutOfMemoryError.
 
 It looks like the stored fields for all the matched results are read into
 memory. How do I avoid this memory consumption?
 
 
 
 Thanks,
 
 Siddharth 



Re: Boosting by facets with standard query

2009-04-17 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 11:32 AM, ashokc ash...@qualcomm.com wrote:


 What we need is for the white_papers and pdfs to be boosted, but if and only
 if such documents are valid results for the search term in question. How
 would I write my above 'q' to accomplish that?


Thanks for explaining in detail.

Basically, all you want to do is sort the results in the following order:
1. White papers
2. PDFs
3. Others

or maybe #1 and #2 are equivalent and can be intermingled.

The easiest way to do this is to index a new field whose values (when sorted)
give you the desired order. Then you can simply sort on that field first and
on score second.
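
For example (untested sketch; the field name and type are made up): add

  <field name="doc_priority" type="sint" indexed="true" stored="false"/>

to the schema, index 1 for white papers, 2 for PDFs and 3 for everything
else, and then query with

  sort=doc_priority asc, score desc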

-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImport, remove doc when marked as deleted

2009-04-17 Thread Ruben Chadien

I have now :-)
Thanks, missed that in the Wiki.
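For the archives, the entity definition ends up looking roughly like this
(table and column names changed):

<entity name="item" pk="ID"
        query="SELECT * FROM item WHERE deleted = 0"
        deltaQuery="SELECT ID FROM item WHERE last_modified > '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT ID FROM item WHERE deleted = 1 AND last_modified > '${dataimporter.last_index_time}'">
  ...
</entity>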
Ruben

On Apr 16, 2009, at 7:10 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:



did you try the deletedPkQuery?

On Thu, Apr 16, 2009 at 7:49 PM, Ruben Chadien ruben.chad...@aspiro.com wrote:

Hi

I am new to Solr, but have been using Lucene for a while. I am trying to
rewrite some old Lucene indexing code using the JDBC DataImport in Solr. My
problem:

I have entities that can be marked in the db as deleted. These I don't want
to index, and that's no problem when doing a full-import. When doing a
delta-import my deltaQuery will catch entities that have been marked as
deleted since the last index, but how do I get it to delete those from the
index?
I tried making the deltaImportQuery so that it doesn't return the entity if
it's deleted, but that didn't help...

Any ideas?

Thanks
Ruben








--
--Noble Paul




Re: Faceted Search

2009-04-17 Thread Alejandro Gonzalez
if you are querying using a http request you can add these two parameters:

facet=true
facet.field=field_for_faceting

and optionally this one to set the max number of facets:

facet.limit=facet_limit

I don't know if it's what you need...
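
A full request would then look something like this (URL and field name are
just examples):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.limit=10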


On Fri, Apr 17, 2009 at 6:17 AM, Sajith Weerakoon saji...@zone24x7.com wrote:

 Hi all,

 Can someone of you tell me how to implement a faceted search?



 Thanks,

 Regards,

 Sajith Vimukthi Weerakoon.






Re: Authentication Error

2009-04-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is fixed in the trunk

On Thu, Apr 16, 2009 at 10:47 PM, Allahbaksh Asadullah
allahbaks...@gmail.com wrote:
 Thanks Noble. Regards,
 Allahbaksh

 2009/4/16 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah
 allahbaks...@gmail.com wrote:
  Hi, I have followed the procedure given on this blog to set up Solr.
 
  Below is my code. I am trying to index the data, but I am not able to
  connect to the server and I get an authentication error.
 
  HttpClient client = new HttpClient();
  client.getState().setCredentials(
      new AuthScope("localhost", 80, AuthScope.ANY_SCHEME),
      new UsernamePasswordCredentials("admin", "admin"));
 
  Can you please let me know what may be the problem?
 
  The other problem which I am facing is with load balancing:
 
  SolrServer lbHttpSolrServer = new LBHttpSolrServer(
      "http://localhost:8080/solr", "http://localhost:8983/solr");
 
  Now the problem is: if the first server is down, I get an error. If I
  swap the servers in the constructor, giving the port 8983 server first
  and 8080 second, it works fine.
 
  Problem is: if only the last server specified is active and the rest
  are down, then Solr throws an exception and the search is not performed.
 
 
 I shall write a testcase and let you know
  Regards,
  Allahbaksh
 



 --
 --Noble Paul




 --
 Allahbaksh Mohammedali Asadullah,
 Software Engineering & Technology Labs,
 Infosys Technologies Limited, Electronic City,
 Hosur Road, Bangalore 560 100, India.
 (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
 Fax: 91-80-28520362 | Mobile: 91-9845505322.




-- 
--Noble Paul


Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Hello,

I am searching for a way to use the Lucene MultiFieldQueryParser in my 
SOLR installation.

Is there a chance to change the solrQueryParser?

In my old Lucene setup I used to combine many different types of 
QueryParser in my query...

Or is there a chance to get MultiFieldQueryParser functions in SOLR?

Greets -Ralf-


Re: Sorting performance + replication of index between cores

2009-04-17 Thread sunnyfr

Hi Christophe, 

Did you find a way to fix your problem? Because even with replication we'll
have this problem: lots of updates mean clearing the cache and managing that.
I have the same issue; I'm just wondering if I should turn off the servers
during an update???
How did you fix that? 

Thanks,
sunny


christophe-2 wrote:
 
 Hi,
 
 After fully reloading my index, using another field than a Date does not 
 help that much.
 Using a warmup query avoids having the first request slow, but:
  - Frequent commits mean that the Searcher is reloaded frequently 
 and, as the warmup takes time, the clients must wait.
  - Having warmup slows down the indexing process (I guess this is 
 because after a commit, the Searchers are recreated)
 
 So I'm considering, as suggested, having two instances: one for 
 indexing and one for searching.
 I was wondering if there are simple ways to replicate the index in a 
 single Solr server running two cores? Any such config already tested? 
 I guess that the standard replication based on rsync can be simplified a 
 lot in this case as the two indexes are on the same server.
 
 Thanks
 Christophe
 
 Beniamin Janicki wrote:
 :so you can send your updates anytime you want, and as long as you only 
 :commit every 5 minutes (or commit on a master as often as you want, but 
 :only run snappuller/snapinstaller on your slaves every 5 minutes) your 
 :results will be at most 5minutes + warming time stale.

 This is what I do as well (commits are done once per 5 minutes). I've got
 a master-slave configuration. The master has all caches turned off
 (commented out in solrconfig.xml) and only 2 maxWarmingSearchers set up.
 The index size is 5GB, Xmx=1GB, and committing takes around 10 secs (on the
 default configuration with warming it took from 30 mins up to 2 hours). 

 Slave caches are configured to have autowarmCount=0 and
 maxWarmingSearchers=1, and I have new data 1 second after the snapshot is
 done. I haven't noticed any huge delays while serving search requests.
 Try to use those values - maybe they'll help in your case too.

 Ben Janicki


 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
 Sent: 22 October 2008 04:56
 To: solr-user@lucene.apache.org
 Subject: Re: Sorting performance


 : The problem is that I will have hundreds of users doing queries, and a
 : continuous flow of documents coming in.
 : So a delay in warming up a cache could be acceptable if I do it a few
 : times per day. But not on a too regular basis (right now, the first query
 : that loads the cache takes 150s).
 : 
 : However: I'm not sure why it looks not to be a good idea to update the
 : caches

 you can refresh the caches automatically after updating; the newSearcher 
 event is fired whenever a searcher is opened (but before it's used by 
 clients) so you can configure warming queries for it -- it doesn't have to 
 be done manually (or by the first user to use that reader)
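
 for example, in solrconfig.xml (the query values here are just placeholders):

   <listener event="newSearcher" class="solr.QuerySenderListener">
     <arr name="queries">
       <lst> <str name="q">some popular query</str> <str name="rows">10</str> </lst>
     </arr>
   </listener>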

 so you can send your updates anytime you want, and as long as you only 
 commit every 5 minutes (or commit on a master as often as you want, but 
 only run snappuller/snapinstaller on your slaves every 5 minutes) your 
 results will be at most 5minutes + warming time stale.


 -Hoss

   
 
 

-- 
View this message in context: 
http://www.nabble.com/Sorting-performance-tp20037712p23094174.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Marc Sturlese

I think there's no search handler that uses MultiFieldQueryParser in Solr. But
check the DismaxRequestHandler; it will probably do the job. You can specify all
the fields you want to search in, and it will build the query using boolean
queries. It also includes many more features:
http://wiki.apache.org/solr/DisMaxRequestHandler
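
A minimal setup in solrconfig.xml might look like this (untested sketch; the
field names and boosts are just examples):

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">content^1.0 title^2.0 urltext^2.0</str>
  </lst>
</requestHandler>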



Kraus, Ralf | pixelhouse GmbH wrote:
 
 Hello,
 
 I am searching for a way to use the Lucene MultiFieldQueryParser in my 
 SOLR installation.
 Is there a chance to change the solrQueryParser?
 
 In my old Lucene setup I used to combine many different types of 
 QueryParser in my query...
 
 Or is there a chance to get MultiFieldQueryParser functions in SOLR?
 
 Greets -Ralf-
 
 




Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Marc Sturlese schrieb:

I think there's no search handler that uses MultiFieldQueryParser in Solr. But
check the DismaxRequestHandler; it will probably do the job. You can specify all
the fields you want to search in, and it will build the query using boolean
queries. It also includes many more features:
http://wiki.apache.org/solr/DisMaxRequestHandler
  

Is there a chance to combine RequestHandlers?
I need to use some additional normal boolean and integer queries!

Greets -Ralf-


Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Marc Sturlese schrieb:

I think there's no search handler that uses MultiFieldQueryParser in Solr. But
check the DismaxRequestHandler; it will probably do the job. You can specify all
the fields you want to search in, and it will build the query using boolean
queries. It also includes many more features:
http://wiki.apache.org/solr/DisMaxRequestHandler

THX A LOT !

You really made my day !

Greets -Ralf-


Re: Authentication Error

2009-04-17 Thread Allahbaksh Asadullah
Hi Noble,
Thank you very much. I will download the latest Solr nightly build.
Please note this other problem, which I think is a bug.


I am trying out the load balancing feature in Solr 1.4 using LBHttpSolrServer.

Below is the setup:
I have three Solr servers: A, B and C.

Now the problem is, if I take the first two Solr servers down (note I have
specified A, B, C in that order), i.e. A and B, then it throws an exception.
It does not check with server C, though server C is still active.

In short, if only the last server specified in the constructor is active,
then I get an exception and the query does not get fired.

Is it a bug, or what may be the exact problem?

Regards,
Allahbaksh



2009/4/17 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 It is fixed in the trunk

 On Thu, Apr 16, 2009 at 10:47 PM, Allahbaksh Asadullah
 allahbaks...@gmail.com wrote:
  Thanks Noble. Regards,
  Allahbaksh
 
  2009/4/16 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com
 
  On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah
  allahbaks...@gmail.com wrote:
   Hi, I have followed the procedure given on this blog to set up Solr.
  
    Below is my code. I am trying to index the data, but I am not able to
    connect to the server and I get an authentication error.
  
    HttpClient client = new HttpClient();
    client.getState().setCredentials(
        new AuthScope("localhost", 80, AuthScope.ANY_SCHEME),
        new UsernamePasswordCredentials("admin", "admin"));
  
    Can you please let me know what may be the problem?
  
    The other problem which I am facing is with load balancing:
  
    SolrServer lbHttpSolrServer = new LBHttpSolrServer(
        "http://localhost:8080/solr", "http://localhost:8983/solr");
  
    Now the problem is: if the first server is down, I get an error. If I
    swap the servers in the constructor, giving the port 8983 server first
    and 8080 second, it works fine.
  
    Problem is: if only the last server specified is active and the rest
    are down, then Solr throws an exception and the search is not performed.
  
  I shall write a testcase and let you know
    Regards,
    Allahbaksh
  
 
 
 
  --
  --Noble Paul
 
 
 
 
  --
  Allahbaksh Mohammedali Asadullah,
   Software Engineering & Technology Labs,
   Infosys Technologies Limited, Electronic City,
  Hosur Road, Bangalore 560 100, India.
  (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
  Fax: 91-80-28520362 | Mobile: 91-9845505322.
 



 --
 --Noble Paul




-- 
Allahbaksh Mohammedali Asadullah,
Software Engineering & Technology Labs,
Infosys Technologies Limited, Electronic City,
Hosur Road, Bangalore 560 100, India.
(Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
Fax: 91-80-28520362 | Mobile: 91-9845505322.


Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Marc Sturlese

Well, dismax has a q.alt parameter where you can specify a query in Lucene
syntax. The q parameter must be empty to use q.alt:
http://.../select?q=&q.alt=phone_number:1234567
This would search in the field phone_number independently of what fields you
have configured in the dismax.

Another way would be to configure various request handlers (one with dismax
and one standard for the fields you want, for example). You can tell Solr
which one to use in the URL request.

Don't know if this is what you need...


Kraus, Ralf | pixelhouse GmbH wrote:
 
 Marc Sturlese schrieb:
 I think there's no search handler that uses MultiFieldQueryParser in Solr.
 But check the DismaxRequestHandler; it will probably do the job. You can
 specify all the fields you want to search in, and it will build the query
 using boolean queries. It also includes many more features:
 http://wiki.apache.org/solr/DisMaxRequestHandler
 Is there a chance to combine RequestHandlers?
 I need to use some additional normal boolean and integer queries!
 
 Greets -Ralf-
 
 




Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Marc Sturlese schrieb:

Well, dismax has a q.alt parameter where you can specify a query in Lucene
syntax. The q parameter must be empty to use q.alt:
http://.../select?q=&q.alt=phone_number:1234567
This would search in the field phone_number independently of what fields you
have configured in the dismax.
  
Now I use the fq parameter in combination with q.alt ... Runs fine 
so far :-)

The fq parameter sets my additional query parameters :-)
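
So the request ends up looking roughly like this (handler name, field and
value are just examples):

http://.../select?qt=dismax&q=&q.alt=*:*&fq=phone_number:1234567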

Greets -Ralf-




Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Marc Sturlese schrieb:

The only problem I found with q.alt is that it doesn't allow highlighting (or
at least it didn't show up for me). If you find out how to do it, let me
know.

I use highlighting only with the normal query!
My q.alt is *:*

But it's really sad that dismax doesn't support wildcards :-(

Greets -Ralf-


Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Marc Sturlese

The only problem I found with q.alt is that it doesn't allow highlighting (or
at least it didn't show up for me). If you find out how to do it, let me
know.
Thanks!

Kraus, Ralf | pixelhouse GmbH wrote:
 
 Marc Sturlese schrieb:
 Well, dismax has a q.alt parameter where you can specify a query in
 Lucene syntax. The q parameter must be empty to use q.alt:
 http://.../select?q=&q.alt=phone_number:1234567
 This would search in the field phone_number independently of what fields you
 have configured in the dismax.
   
 Now I use the fq parameter in combination with q.alt ... Runs fine 
 so far :-)
 The fq parameter sets my additional query parameters :-)
 
 Greets -Ralf-
 
 
 
 




EventListeners of DIM

2009-04-17 Thread Marc Sturlese

Hey there,
I have seen the new feature of EventListeners of DIH in trunk:

<dataConfig>
  <document onImportStart="com.FooStart" onImportEnd="com.FooEnd">
    ...
  </document>
</dataConfig>

Are these events called at the beginning and end of the whole indexing
process, or at the beginning and end of indexing just one document?
My idea is to update a field of a row of a MySQL table every time a doc is
indexed. Is this possible, or should I save all doc ids and do the update
of the rows of the table using onImportEnd?

Thanks in advance!





Re: EventListeners of DIM

2009-04-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
these are for the beginning and end of the whole indexing process
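
the listener classes implement the DIH EventListener interface; a minimal
sketch (assuming the trunk API, class name taken from your config):

package com;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class FooStart implements EventListener {
  public void onEvent(Context ctx) {
    // invoked once, when the whole import starts
    // (the class named in onImportEnd gets the end-of-import event)
  }
}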

On Fri, Apr 17, 2009 at 7:38 PM, Marc Sturlese marc.sturl...@gmail.com wrote:

 Hey there,
 I have seen the new feature of EventListeners of DIH in trunk:

 <dataConfig>
   <document onImportStart="com.FooStart" onImportEnd="com.FooEnd">
     ...
   </document>
 </dataConfig>

 Are these events called at the beginning and end of the whole indexing
 process, or at the beginning and end of indexing just one document?
 My idea is to update a field of a row of a MySQL table every time a doc is
 indexed. Is this possible, or should I save all doc ids and do the update
 of the rows of the table using onImportEnd?

 Thanks in advance!







-- 
--Noble Paul


Re: Customizing solr with my lucene

2009-04-17 Thread mirage1987

Hey Erik,
I also checked the index using Luke, and it shows that the terms are indexed
as they should have been. So that implies that something is wrong with the
querying only, and the results are not getting retrieved. (As I said earlier,
even the parsed query is the way it should be according to the changes I have
made to Lucene.)
Any ideas on why this could be happening?

One more thing... I tried to query the Solr index using Luke, but still no
results... maybe the index is not stored correctly... could it be changes
in the Lucene API? Should I revert to an older version of Solr?






Re: Garbage Collectors

2009-04-17 Thread Bill Au
I would also include the -XX:+HeapDumpOnOutOfMemoryError option to get
a heap dump when the JVM runs out of heap space.
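
For example (the dump path is just an example):

java -server -Xmx4096m -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/tmp/solr-oom.hprof -jar start.jar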



On Thu, Apr 16, 2009 at 9:43 PM, Bryan Talbot btal...@aeriagames.com wrote:

 If you're using java 5 or 6 jmap is a useful tool in tracking down memory
 leaks.

 http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html

 jmap -histo:live pid

 will print a histogram of all live objects in the heap.  Start at the top
 and work your way down until you find something suspicious -- the trick is
 in knowing what is suspicious of course.


 -Bryan





 On Apr 16, 2009, at 3:40 PM, David Baker wrote:

  Otis Gospodnetic wrote:

 Personally, I'd start from scratch:
 -Xmx -Xms...

 -server is not even needed any more.

 If you are not using Java 1.6, I suggest you do.

 Next, I'd try to investigate why objects are not being cleaned up - this
 should not be happening in the first place.  Is Solr the only webapp
 running?


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 

  From: David Baker dav...@mate1inc.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 16, 2009 3:33:18 PM
 Subject: Garbage Collectors

 I have an issue with garbage collection on our solr servers.  We have an
 issue where the old generation never gets cleaned up on one of our
 servers.  This server has a little over 2 million records which are updated
 every hour or so.  I have tried the parallel GC and the concurrent GC.  The
 parallel seems more stable for us, but both end up running out of memory.  I
 have increased the memory allocated to the servers, but this just seems to
 delay the problem.  My question is, what are the suggested options for using
 the parallel GC.  Currently we are using something of this nature:

 -server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy
 -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m
 -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr

 I am new to solr and GC tuning, so any advice is appreciated.

 Thanks for the reply, yes, Solr is the only app running under this
 tomcat server. I will remove -server and other options except the heap
 allocation options and see how it performs. Any suggestions on how to go
 about finding out why objects are not being cleaned up if these changes
 don't work?





Re: CollapseFilter with the latest Solr in trunk

2009-04-17 Thread Jeff Newburn
We are currently trying to do the same thing.  With the patch unaltered we
can use fq as long as collapsing is turned on.  If we just send a normal
document level query with an fq parameter it blows up.

Additionally, it does not appear that the collapse.facet option works at
all.

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


 From: climbingrose climbingr...@gmail.com
 Reply-To: solr-user@lucene.apache.org
 Date: Fri, 17 Apr 2009 16:53:00 +1000
 To: solr-user solr-user@lucene.apache.org
 Subject: CollapseFilter with the latest Solr in trunk
 
 Hi all,
 
 Has anyone tried to use CollapseFilter with the latest version of Solr in
 trunk? It looks like Solr 1.4 doesn't allow calling setFilterList()
 and setFilter() on one instance of the QueryCommand. I modified the code in
 QueryCommand to allow this:
 
 public QueryCommand setFilterList(Query f) {
   // if (filter != null) {
   //   throw new IllegalArgumentException("Either filter or filterList" +
   //       " may be set in the QueryCommand, but not both.");
   // }
   filterList = null;
   if (f != null) {
     filterList = new ArrayList<Query>(2);
     filterList.add(f);
   }
   return this;
 }
 
 However, I still have a problem which prevents query filters from working
 when used in conjunction with CollapseFilter. In other words, query filters
 don't seem to have any effect on the result set when CollapseFilter is
 used.
 
 The other problem is related to OpenBitSet:
 
 java.lang.ArrayIndexOutOfBoundsException: 2183
 at org.apache.lucene.util.OpenBitSet.fastSet(OpenBitSet.java:242)
 at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:202)
 at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:161)
 at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:141)
 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:217)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
 at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
 at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
 at java.lang.Thread.run(Thread.java:619)
 
 I think CollapseFilter is rather an important function in Solr that gets
 used quite frequently. Does anyone have a solution for this?
 
 -- 
 Regards,
 
 Cuong Hoang



WordDelimiterFilterFactory removes words when options set to 0

2009-04-17 Thread Burton-West, Tom
In trying to understand the various options for WordDelimiterFilterFactory, I
tried setting all the options to 0.
This seems to prevent a number of words from being output at all. In particular,
"can't" and "99dxl" don't get output, nor do any words containing hyphens. Is
this the correct behavior?


Here is the Solr analyzer output:

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position   1      2        3      4          5         6        7      8      9
term text       ca-55  99_3_a9  55-67  powerShot  ca999x15  foo-bar  can't  joe's  99dxl

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=0,
generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0,
catenateNumbers=0}
term position     1          5
term text         powerShot  joe
term type         word       word
source start,end  20,29      53,56

Here is the schema:

<fieldtype name="mbooksOcrXPatLike" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

Tom

Re: python response handler treats unschema'd fields differently

2009-04-17 Thread Yonik Seeley
Seems like we could handle this 2 ways... leave out the field if it's
not defined in the schema, or include it and write it out as a string.
 I think either would probably be more useful than throwing an error
(which isn't really a request error but rather a schema/indexing
error).

Thoughts?

-Yonik
http://www.lucidimagination.com


On Fri, Apr 17, 2009 at 4:36 PM, Brian Whitman br...@echonest.com wrote:
 I have a solr index where we removed a field from the schema but it still
 had some documents with that field in it.
 Queries using the standard response handler had no problem but the
 wt=python handler would break on any query (with fl=* or asking for that
 field directly) with:

 SolrHTTPException: HTTP code=400, reason=undefined_field_oldfield

 I fixed it by putting that field back in the schema.

 One related weirdness is that fl=oldfield would cause the exception but not
 fl=othernonschemafield -- that is, it would only break on field names that
 were not in the schema but were in the documents.

 I know this is undefined-behavior territory, but it was still weird that the
 standard response writer does not do this -- if you give a nonexistent field
 name to fl on wt=standard, either one that is in documents or is not, it
 happily performs the query, just skipping the ones that are not in the
 schema.



Re: Hierarchal Faceting Field Type

2009-04-17 Thread Chris Hostetter

: level one#
: level one#level two#
: level one#level two#level three#
: 
: Trying to find the right combination of field type and query to get the
: desired results. Saw some previous posts about hierarchical facets which
: helped in generating the right query, but we're having an issue: the
: built-in text field ignores our delimiter, and the string field prevents
: us from doing a starts-with search. Does anyone have any insight into the
: field declaration?

Use TextField with a PatternTokenizer.
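
Something along these lines should work (untested sketch; the type name is
made up, and '#' is assumed to be your delimiter):

<fieldType name="hier_facet" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="#"/>
  </analyzer>
</fieldType>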

BTW: if this isn't a thread you've already seen, it's handy to know about...

http://www.nabble.com/Hierarchical-Faceting-to20090898.html#a20176326


-Hoss



Re: SNMP monitoring

2009-04-17 Thread Chris Hostetter

:  How would I set up SNMP monitoring of my Solr server? I've done some
: searching of the wiki and Google and have come up with a blank. Any
: pointers?

it depends on what you want to monitor.  if you just want to monitor the 
JVM Solr is running in, this should be fairly easy...

if you want to be able to get Solr-specific stats/data, your best bet is 
probably to look into ways to access JMX MBeans via SNMP (there seem to be 
some tools out there to do things like this)

http://blogs.sun.com/jmxetc/entry/jmx_vs_snmp
http://www.google.co.uk/search?hl=en&q=jmx+snmp
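
to expose Solr's MBeans in the first place, enable <jmx/> in solrconfig.xml
and start the JVM with remote JMX turned on; the port and security settings
here are just an example:

java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9999 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar start.jar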



-Hoss



Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-17 Thread Bradford Stephens
OK, we've got 3 people... that's enough for a party? :)

Surely there must be dozens more of you guys out there... c'mon,
accelerate your knowledge! Join us in Seattle!



On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
 Greetings,

 Would anybody be willing to join a PNW Hadoop and/or Lucene User Group
 with me in the Seattle area? I can donate some facilities, etc. -- I
 also always have topics to speak about :)

 Cheers,
 Bradford



Re: Garbage Collectors

2009-04-17 Thread Otis Gospodnetic

The only thing that comes to mind is running Solr under a profiler (e.g. 
YourKit) and figuring out which objects are not getting cleaned up and who's 
holding references to them.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: David Baker dav...@mate1inc.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 16, 2009 6:40:31 PM
 Subject: Re: Garbage Collectors
 
 Otis Gospodnetic wrote:
  Personally, I'd start from scratch:
  -Xmx -Xms...
  
  -server is not even needed any more.
  
  If you are not using Java 1.6, I suggest you do.
  
  Next, I'd try to investigate why objects are not being cleaned up - this 
 should not be happening in the first place.  Is Solr the only webapp running?
  
  
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
  - Original Message 
   
  From: David Baker 
  To: solr-user@lucene.apache.org
  Sent: Thursday, April 16, 2009 3:33:18 PM
  Subject: Garbage Collectors
  
  I have an issue with garbage collection on our solr servers.  We have an 
 issue where the old generation never gets cleaned up on one of our servers.  
 This server has a little over 2 million records which are updated every hour 
 or so.  I have tried the parallel GC and the concurrent GC.  The parallel 
 seems more stable for us, but both end up running out of memory.  I have 
 increased the memory allocated to the servers, but this just seems to delay 
 the problem.  My question is, what are the suggested options for using the 
 parallel GC.  Currently we are using something of this nature:
  
  -server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC 
  -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 
  -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr
  
  I am new to solr and GC tuning, so any advice is appreciated.
 
 Thanks for the reply, yes, Solr is the only app running under this tomcat 
 server. I will remove -server, and other options except the heap allocation 
 options and see how it performs. Any suggestions on how to go about finding 
 out why objects are not being cleaned up if these changes don't work?



Re: dual of method - CommonsHttpSolrServer(url) to close and destroy underlying httpclient connection

2009-04-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
httpClient.getHttpConnectionManager().closeIdleConnections(0); // 0 ms: close every currently idle connection
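
a slightly fuller teardown sketch, assuming SolrJ's default
MultiThreadedHttpConnectionManager (commons-httpclient 3.x):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpConnectionManager;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

HttpClient httpClient = server.getHttpClient();
HttpConnectionManager mgr = httpClient.getHttpConnectionManager();
mgr.closeIdleConnections(0); // close everything currently idle
if (mgr instanceof MultiThreadedHttpConnectionManager) {
    // releases all pooled connections and the manager's resources
    ((MultiThreadedHttpConnectionManager) mgr).shutdown();
}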

--Noble

On Sat, Apr 18, 2009 at 1:31 AM, Rakesh Sinha rakesh.use...@gmail.com wrote:
 When we instantiate a CommonsHttpSolrServer, we use the following method:

 CommonsHttpSolrServer server = new CommonsHttpSolrServer(this.endPoint);

 How do we do a 'kill all' of all the underlying httpclient connections?

 server.getHttpClient() returns an HttpClient reference, but I am trying
 to figure out the right method to close all currently active
 httpclient connections.




-- 
--Noble Paul