Re: JVM OOM when using field collapse component

2009-10-02 Thread Martijn v Groningen
No, I have not encountered an OOM exception yet with the current field collapse patch. How large is your configured JVM heap space (-Xmx)? Field collapsing requires more memory than regular searches. Does Solr run out of memory during the first search(es) or does it run out of memory after a while when

Re: populating synonyms.txt

2009-10-02 Thread Michael Engesgaard
I understand that synonyms are domain-specific, although I could still see a benefit of having standardized synonyms.txt files (a thesaurus) for general use. Just like the ones you can download or that are already embedded in word processors like Open Office Writer or MS Word. I can understand that

Re: field collapsing sums

2009-10-02 Thread Martijn v Groningen
Well that is odd. How have you configured field collapsing with the dismax request handler? The collapse counts should be X - 1 (if collapse.threshold=1). Martijn 2009/10/1 Joe Calderon calderon@gmail.com: thx for the reply, i just want the number of dupes in the query result, but it seems i
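The "X - 1" arithmetic can be illustrated with a small sketch. It assumes that X is the number of documents sharing one value of the collapse field and that collapse.threshold is the number of documents kept per group; the class and method names are hypothetical and this is not code from the field collapse patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CollapseCount {
    // For each distinct field value, count how many documents would be
    // collapsed away if `threshold` documents per group are kept (assumption).
    public static Map<String, Integer> collapseCounts(List<String> fieldValues, int threshold) {
        Map<String, Integer> groupSizes = new LinkedHashMap<>();
        for (String value : fieldValues) {
            groupSizes.merge(value, 1, Integer::sum);
        }
        Map<String, Integer> collapsed = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : groupSizes.entrySet()) {
            collapsed.put(e.getKey(), Math.max(0, e.getValue() - threshold));
        }
        return collapsed;
    }

    public static void main(String[] args) {
        // Three documents share value "a", one has "b"; threshold 1 keeps one per group.
        System.out.println(collapseCounts(Arrays.asList("a", "a", "a", "b"), 1)); // {a=2, b=0}
    }
}
```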

Re: best way to get the size of an index

2009-10-02 Thread Grant Ingersoll
On Oct 1, 2009, at 12:18 PM, Phillip Farber wrote: Resuming this discussion in a new thread to focus only on this question: What is the best way to get the size of an index so it does not get too big to be optimized (or to allow a very large segment merge) given space limits? I

yellow pages navigation kind menu. howto take every 100th row from resultset

2009-10-02 Thread Julian Davchev
Hi, Long story short: how can I take every 100th row from solr resultset. What would syntax for this be. Long story: Currently I have lots of say documents(articles) indexed. They all have field title with corresponding value. atitle btitle . *title How do I build menu so I can search

debugQuery different score for same query. dismax

2009-10-02 Thread Julian Davchev
Hi, I run debug on a query to examine the score as I was surprised of results. Here is the diff of same explain section of two different rows that I found troubling. It looks for pari in ancestorName field but first row looks in 241135 records and the second row it's just 187821 records.

debugQuery rows get different score for same field same value

2009-10-02 Thread Julian Davchev
Hi, I run debug on a query to examine the score as I was surprised of results. Here is the diff of same explain section of two different rows that I found troubling. It looks for pari in ancestorName field but first row looks in 241135 records and the second row it's just 187821 records.

Re: Keepwords Schema

2009-10-02 Thread Shalin Shekhar Mangar
On Thu, Oct 1, 2009 at 7:37 PM, matrix_psj matrix_...@hotmail.com wrote: An example: My schema is about web files. Part of the syntax is a text field of authors that have worked on each file, e.g. <file> <filename>login.php</filename> <lastModDate>2009-01-01</lastModDate> <authors>alex,

Re: Query filters/analyzers

2009-10-02 Thread Shalin Shekhar Mangar
On Thu, Oct 1, 2009 at 7:59 PM, Claudio Martella claudio.marte...@tis.bz.it wrote: About the copyField issue in general: as it copies the content to the other field, what is the sense to define analyzers for the destination field? The source is already analyzed so i guess that the RESULT of

Re: best way to get the size of an index

2009-10-02 Thread Mark Miller
Phillip Farber wrote: Resuming this discussion in a new thread to focus only on this question: What is the best way to get the size of an index so it does not get too big to be optimized (or to allow a very large segment merge) given space limits? I already have the largest 15,000rpm SCSI

Re: Only one usage of each socket address error

2009-10-02 Thread Mauricio Scheffer
Did you try this? http://blogs.msdn.com/dgorti/archive/2005/09/18/470766.aspx Also, please post the full exception stack trace. 2009/10/2 Steinar Asbjørnsen steinar...@gmail.com Tried running solr on jetty now, and I still get the same

Re: Question on modifying solr behavior on indexing xml files..

2009-10-02 Thread Shalin Shekhar Mangar
On Thu, Oct 1, 2009 at 3:10 PM, Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 peter.th...@navy.mil wrote: 1. In my playing around with sending in an XML document within an XML CDATA tag, with termVectors=true I noticed the following behavior: <person>peter</person> collapses to the term

conditional sorting

2009-10-02 Thread Bojan Šmid
Hi all, I need to perform sorting of my query hits by different criteria depending on the number of hits. For instance, if there are 10 hits, sort by date_entered, otherwise, sort by popularity. Does anyone know if there is a way to do that with a single query, or I'll have to send another

Re: Query filters/analyzers

2009-10-02 Thread Fergus McMenemie
On Thu, Oct 1, 2009 at 7:59 PM, Claudio Martella claudio.marte...@tis.bz.it wrote: About the copyField issue in general: as it copies the content to the other field, what is the sense to define analyzers for the destination field? The source is already analyzed so i guess that the RESULT of

Re: trie fields and sortMissingLast

2009-10-02 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 2:54 PM, Lance Norskog goks...@gmail.com wrote: Trie fields also do not support faceting. Only those that index multiple tokens per value to speed up range queries. They also take more ram in some operations. Should be less memory on average. -Yonik

Re: Query filters/analyzers

2009-10-02 Thread Shalin Shekhar Mangar
On Fri, Oct 2, 2009 at 6:44 PM, Fergus McMenemie fer...@twig.me.uk wrote: The copy is done before analysis. The original text is sent to the copyField which can choose to do analysis differently from the source field. I have been wondering about this as well. The WIKI is not explicit about

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Jeff Newburn
Ah yes we do have some warming queries which would look like a search. Did that side change enough to push up the memory limits where we would run out like this? Also, would FastLRU cache make a difference? -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Yonik Seeley
On Fri, Oct 2, 2009 at 9:54 AM, Jeff Newburn jnewb...@zappos.com wrote: Ah yes we do have some warming queries which would look like a search.  Did that side change enough to push up the memory limits where we would run out like this? What does the warming request(s) look like, and what are

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Mark Miller
Jeff Newburn wrote: that side change enough to push up the memory limits where we would run out like this? Yes - now give us the FieldCache section from the stats section please :) Its not likely gonna do you any good, but it could be good information for us. -- - Mark

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Yonik Seeley
On Fri, Oct 2, 2009 at 10:02 AM, Mark Miller markrmil...@gmail.com wrote: Jeff Newburn wrote: that side change enough to push up the memory limits where we would run out like this? Yes - now give us the FieldCache section from the stats section please :) And the fieldValueCache section too

RE: Solr and Garbage Collection

2009-10-02 Thread siping liu
Hi, I read pretty much all posts on this thread (before and after this one). Looks like the main suggestion from you and others is to keep max heap size (-Xmx) as small as possible (as long as you don't see OOM exception). This brings more questions than answers (for me at least. I'm new to

Re: Solr and Garbage Collection

2009-10-02 Thread Mark Miller
siping liu wrote: Hi, I read pretty much all posts on this thread (before and after this one). Looks like the main suggestion from you and others is to keep max heap size (-Xmx) as small as possible (as long as you don't see OOM exception). This brings more questions than answers (for me

Re: conditional sorting

2009-10-02 Thread Uri Boness
If the threshold is only 10, why can't you always sort by popularity and if the result set is 10 then resort on the client side based on date_entered? Uri Bojan Šmid wrote: Hi all, I need to perform sorting of my query hits by different criterion depending on the number of hits. For
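The client-side re-sort suggested here could look roughly like the sketch below. The Hit class, the resort method and the "at most threshold hits" rule are assumptions made for illustration; only the field names date_entered and popularity come from the thread, and no SolrJ types are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ConditionalSort {
    // Hypothetical hit type holding the two sort fields named in the thread.
    public static final class Hit {
        public final String id;
        public final String dateEntered; // ISO yyyy-MM-dd, so string order equals date order
        public final int popularity;

        public Hit(String id, String dateEntered, int popularity) {
            this.id = id;
            this.dateEntered = dateEntered;
            this.popularity = popularity;
        }
    }

    // The server is assumed to have sorted by popularity already. If the total
    // hit count is at most `threshold`, re-sort a copy by date_entered, newest first.
    public static List<Hit> resort(List<Hit> hitsByPopularity, long numFound, long threshold) {
        if (numFound > threshold) {
            return hitsByPopularity;
        }
        List<Hit> copy = new ArrayList<>(hitsByPopularity);
        copy.sort(Comparator.comparing((Hit h) -> h.dateEntered).reversed());
        return copy;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("a", "2009-09-01", 90),
            new Hit("b", "2009-10-01", 50),
            new Hit("c", "2009-08-15", 10));
        for (Hit h : resort(hits, 3, 10)) System.out.print(h.id + " ");    // b a c
        System.out.println();
        for (Hit h : resort(hits, 1251, 10)) System.out.print(h.id + " "); // a b c
        System.out.println();
    }
}
```

This only works when the whole small result set fits in one page of results, which is the case the reply describes.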

Re: How to access the information from SolrJ

2009-10-02 Thread Paul Tomblin
Nope, that just gets you the number of results returned, not how many there could be. Like I said, if you look at the XML returned, you'll see something like <result name='response' numFound='1251' start='0'> but only 10 doc returned. getNumFound returns 10 in that case, not 1251. 2009/10/2

Re: conditional sorting

2009-10-02 Thread Bojan Šmid
I tried to simplify the problem, but the point is that I could have really complex requirements. For instance, if in the first 5 results none are older than one year, use sort by X, otherwise sort by Y. So, the question is, is there a way to make Solr recognize complex situations and apply

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Jeff Newburn
The warmers return 11 fields: 3 Strings 2 booleans 2 doubles 2 longs 1 sint (solr.SortableIntField) Let me know if you need the fields actually be searched on. name:  fieldCache   class:  org.apache.solr.search.SolrFieldCacheMBean   version:  1.0   description:  Provides introspection of the

Re: Problem with Wildcard...

2009-10-02 Thread Christian Zambrano
Another thing to remember about wildcard and fuzzy searches is that none of the token filters will be applied. If you are using the LowerCaseFilterFactory at index time, then RI-MC50034-1 gets converted to ri-mc50034-1 which is never going to match RI-MC5000* Also, I would probably use the
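The mismatch described here can be reproduced without Solr. The sketch below is a simplified stand-in, not Lucene code: indexTerm imitates a lowercase filter at index time, and prefixMatches compares an unanalyzed trailing-star pattern against the indexed term. The pattern RI-MC500* is a made-up shorter prefix chosen for the illustration.

```java
import java.util.Locale;

public class WildcardCase {
    // Stand-in for what a lowercase filter does to a term at index time.
    public static String indexTerm(String raw) {
        return raw.toLowerCase(Locale.ROOT);
    }

    // Trailing-* prefix match with no analysis applied to the pattern,
    // mirroring the point that token filters are skipped for wildcard queries.
    public static boolean prefixMatches(String pattern, String indexedTerm) {
        String prefix = pattern.endsWith("*")
            ? pattern.substring(0, pattern.length() - 1)
            : pattern;
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        String indexed = indexTerm("RI-MC50034-1");               // ri-mc50034-1
        System.out.println(prefixMatches("RI-MC500*", indexed));  // false
        System.out.println(prefixMatches("ri-mc500*", indexed));  // true
    }
}
```

One possible workaround under these assumptions is for the client to lowercase the wildcard pattern itself before sending the query.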

Re: JVM OOM when using field collapse component

2009-10-02 Thread Joe Calderon
heap space is 4gb set to grow up to 8gb, usage is normally ~1-2gb, seems to happen within a few searches. if its just me ill try to isolate it, it could be some other part of my implementation thx much On Fri, Oct 2, 2009 at 1:18 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: No I

TermVector term frequencies for tag cloud

2009-10-02 Thread aodhol
Hello, I'm trying to create a tag cloud from a term vector, but the array returned (using JSON wt) is quite complex and takes an inordinate amount of time to process. Is there a better way to retrieve terms and their document TF? The TermVectorComponent allows for retrieval of tf and df though

snapshot creation and distribution

2009-10-02 Thread Robert . Kay
Hello, A couple questions with regard to snapshots and distribution: 1. If two snapshots are created in between a snappull, are the changes from the first snapshot missed by the slave, as it only pulls the most recent snapshot? 2. When triggering snapshooter from the postCommit hook, does a

Google Side-By-Side UI

2009-10-02 Thread Lance Norskog
http://googleenterprise.blogspot.com/2009/08/compare-enterprise-search-relevance.html This is really cool, and a version for Solr would help in doing relevance experiments. We don't need the select A or B feature, just seeing search result sets side-by-side would be great. -- Lance Norskog

Re: best way to get the size of an index

2009-10-02 Thread Mark Miller
Mark Miller wrote: Phillip Farber wrote: Resuming this discussion in a new thread to focus only on this question: What is the best way to get the size of an index so it does not get too big to be optimized (or to allow a very large segment merge) given space limits? I already have the

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Jeff Newburn
I reran the test to try to ensure that other cores on the instance didn't have searches against them. This time I get NPE errors just trying to get into the stats after the system hits its limit. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Jeff

Re: snapshot creation and distribution

2009-10-02 Thread Bill Au
A snapshot is a copy of the index at a particular moment in time. So changes in earlier snapshots are in the latest one as well. Nothing is missed by pulling the latest snapshot. When triggering snapshooter with the postCommit hook, a commit always results in a snapshot being created. Bill On

Re: Google Side-By-Side UI

2009-10-02 Thread Yao Ge
Yes. I think it would be a very helpful tool for tuning search relevancy - you can do a controlled experiment with your target audiences to understand their responses to the parameter changes. We plan to use this feature to benchmark Lucene/SOLR against our in-house commercial search engine - it will

Re: TermVector term frequencies for tag cloud

2009-10-02 Thread Bill Au
Have you considered using facet counts for your tag cloud? Bill On Fri, Oct 2, 2009 at 11:34 AM, aod...@gmail.com wrote: Hello, I'm trying to create a tag cloud from a term vector, but the array returned (using JSON wt) is quite complex and takes an inordinate amount of time to process. Is

Question about PatternReplace filter and automatic Synonym generation

2009-10-02 Thread Prasanna Ranganathan
Does the PatternReplaceFilter have an option where you can keep the original token in addition to the modified token? From what I looked at it does not seem to but I want to confirm the same. Alternatively, is there a filter available which takes in a pattern and produces additional forms of

Question regarding synonym

2009-10-02 Thread darniz
Hi i have a question regarding synonymfilter i have a one way mapping defined austin martin, astonmartin => aston martin what baffling me is that if i give at query time the word austin martin it first goes through white space and generate two words in analysis page austin and martin then

Re: How to access the information from SolrJ

2009-10-02 Thread Shalin Shekhar Mangar
On Fri, Oct 2, 2009 at 8:11 PM, Paul Tomblin ptomb...@xcski.com wrote: Nope, that just gets you the number of results returned, not how many there could be. Like I said, if you look at the XML returned, you'll see something like <result name='response' numFound='1251' start='0'> but only 10

Re: How to access the information from SolrJ

2009-10-02 Thread Paul Tomblin
On Fri, Oct 2, 2009 at 3:13 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Oct 2, 2009 at 8:11 PM, Paul Tomblin ptomb...@xcski.com wrote: Nope, that just gets you the number of results returned, not how many there could be.  Like I said, if you look at the XML returned, you'll

Re: How to access the information from SolrJ

2009-10-02 Thread Adam Allgaier
We have the same issue as Paul. We currently parse the XML manually to pull out the numFound from the response. Cheers! Adam - Original Message From: Paul Tomblin ptomb...@xcski.com To: solr-user@lucene.apache.org Sent: Friday, October 2, 2009 2:39:01 PM Subject: Re: How to access

search by some functionality

2009-10-02 Thread Elaine Li
Hi, My doc has three fields, say field1, field2, field3. My search would be q=field1:string1 field2:string2. I also need to do some computation and comparison of the string1 and string2 with the contents in field3 and then determine if it is a hit. What can I do to implement this? Thanks.

Invoke expungeDeletes using SolrJ's SolrServer.commit()

2009-10-02 Thread Jibo John
Hello, I know I can invoke expungeDeletes using updatehandler ( curl update -F stream.body=' <commit expungeDeletes="true"/>' ), however, I was wondering if it is possible to invoke it using SolrJ. It looks like, currently, there are no SolrServer.commit(..) methods that I can use for this

Advantages of different Servlet Containers

2009-10-02 Thread Simon Wistow
I know that the Solr FAQ says Users should decide for themselves which Servlet Container they consider the easiest/best for their use cases based on their needs/experience. For high traffic scenarios, investing time for tuning the servlet container can often make a big difference. but is

Specifying all except field in field list?

2009-10-02 Thread Paul Rosen
Hi, Is there a way to request all fields in an object EXCEPT a particular one? In other words, the following pseudo code is what I'd like to express: req = Solr::Request::Standard.new(:start => page*size, :rows => size, :query => my_query, :field_list => [ ALL EXCEPT 'text' ]) Is there a way to

Re: Advantages of different Servlet Containers

2009-10-02 Thread Lajos
Just go for Tomcat. For all its problems, and I should know having used it since it was originally JavaWebServer, it is perfectly capable of handling high-end production environments provided you tune it correctly. We use it with our customized Solr 1.3 version without any problems. Lajos

RE: Advantages of different Servlet Containers

2009-10-02 Thread Walter Underwood
Netflix uses Tomcat throughout and they tail the log to figure out whether it has started, except they look for a message from Solr to see whether Solr is ready to go to work. wunder -Original Message- From: Lajos [mailto:la...@protulae.com] Sent: Friday, October 02, 2009 1:35 PM To:

RE: Question regarding synonym

2009-10-02 Thread Ensdorf Ken
Hi i have a question regarding synonymfilter i have a one way mapping defined austin martin, astonmartin => aston martin ... Can anybody please explain if my observation is correct. This is a very critical aspect for my work. That is correct - the synonym filter can recognize multi-token
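Why a two-word rule only fires when both words reach the filter together can be shown with a toy matcher. This is a simplified sketch, not the Solr SynonymFilter: it hard-codes the single rule from the thread, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiTokenSynonym {
    // One-way rule taken from the thread: austin martin => aston martin
    private static final List<String> FROM = Arrays.asList("austin", "martin");
    private static final List<String> TO = Arrays.asList("aston", "martin");

    // Scan the token stream and replace every occurrence of FROM with TO.
    public static List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int end = i + FROM.size();
            if (end <= tokens.size() && tokens.subList(i, end).equals(FROM)) {
                out.addAll(TO);
                i = end;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both words arrive in one stream, so the rule fires.
        System.out.println(apply(Arrays.asList("austin", "martin"))); // [aston, martin]
        // Each word analyzed on its own never presents the pair to the rule.
        System.out.println(apply(Arrays.asList("austin")));           // [austin]
        System.out.println(apply(Arrays.asList("martin")));           // [martin]
    }
}
```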

Re: How to access the information from SolrJ

2009-10-02 Thread Shalin Shekhar Mangar
On Sat, Oct 3, 2009 at 1:09 AM, Paul Tomblin ptomb...@xcski.com wrote: Nope. Check again. getNumFound will definitely give you 1251. SolrDocumentList#size() will give you 10. I don't have to check again. I put this log into my query code: QueryResponse resp =

Re: Invoke expungeDeletes using SolrJ's SolrServer.commit()

2009-10-02 Thread Shalin Shekhar Mangar
On Sat, Oct 3, 2009 at 1:35 AM, Jibo John jiboj...@mac.com wrote: Hello, I know I can invoke expungeDeletes using updatehandler ( curl update -F stream.body=' <commit expungeDeletes="true"/>' ), however, I was wondering if it is possible to invoke it using SolrJ. It looks like, currently,

Re: How to access the information from SolrJ

2009-10-02 Thread Paul Tomblin
LucidityWorks.com is my client.  The similarity to lucid is purely coincidental - the client didn't even know I was going to choose Solr.  I am using Solr trunk, last updated and compiled a few weeks ago. -- Sent from my Palm Prē Shalin Shekhar Mangar wrote: On Sat, Oct 3, 2009 at 1:09 AM,

Re: Advantages of different Servlet Containers

2009-10-02 Thread Shalin Shekhar Mangar
AOL uses Tomcat for all Solr deployments. Our load balancers use a ping query to put a box back into rotation. On Sat, Oct 3, 2009 at 2:15 AM, Walter Underwood wun...@wunderwood.org wrote: Netflix uses Tomcat throughout and they tail the log to figure out whether it has started, except they

Re: best way to get the size of an index

2009-10-02 Thread Phillip Farber
Thanks, Mark. I really appreciate your confirmation. Phil Mark Miller wrote: Phillip Farber wrote: Resuming this discussion in a new thread to focus only on this question: What is the best way to get the size of an index so it does not get too big to be optimized (or to allow a very large

Re: Invoke expungeDeletes using SolrJ's SolrServer.commit()

2009-10-02 Thread Yonik Seeley
You can always add arbitrary parameters to an update request: UpdateRequest ureq = new UpdateRequest(); ureq.add(doc); ureq.setParam("expungeDeletes", "true"); NamedList<Object> rsp = server.request(ureq); -Yonik http://www.lucidimagination.com On Fri, Oct 2, 2009 at 4:05 PM, Jibo

Re: Invoke expungeDeletes using SolrJ's SolrServer.commit()

2009-10-02 Thread Jibo John
Created jira issue https://issues.apache.org/jira/browse/SOLR-1487 Thanks, -Jibo On Oct 2, 2009, at 2:17 PM, Shalin Shekhar Mangar wrote: On Sat, Oct 3, 2009 at 1:35 AM, Jibo John jiboj...@mac.com wrote: Hello, I know I can invoke expungeDeletes using updatehandler ( curl update -F

Re: conditional sorting

2009-10-02 Thread Lance Norskog
Doing a second search immediately after the first one is consistently under 100 ms for me, usually under 25, on cheap hardware. Even while sorting the results, you should have no problems. If necessary, you could run Solr with the embedded client and do one search right after the other, avoid the

Re: How to access the information from SolrJ

2009-10-02 Thread Paul Tomblin
On Fri, Oct 2, 2009 at 5:04 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Can you try this with the Solrj client in the official 1.3 release or even trunk? I did a svn update to 821188 and that seems to have fixed the problem. (The jar files changed from -1.3.0 to -1.4-dev) I guess

RE: Question regarding synonym

2009-10-02 Thread darniz
This is not working when i search documents i have a document which contains text aston martin when i search carDescription:"austin martin" i get a match but when i dont give double quotes like carDescription:austin martin there is no match in the analyser if i give austin martin with out

Re: Question regarding synonym

2009-10-02 Thread Christian Zambrano
When you use a field qualifier (fieldName:valueToLookFor) it only applies to the word right after the colon. If you look at the debug information you will notice that for the second word it is using the default field. <str name="parsedquery_toString">carDescription:austin *text*:martin</str> the
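The behaviour in that debug output can be mimicked with a deliberately naive clause splitter. This is a sketch for illustration only, not the Lucene query parser: the regular expression and the clauses method are hypothetical, and "text" is assumed to be the default field because it appears in the parsed query above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldQualifier {
    // An optional "field:" prefix followed by either a quoted phrase or a bare word.
    private static final Pattern CLAUSE =
        Pattern.compile("(?:(\\w+):)?(\"[^\"]*\"|\\S+)");

    // Split a query into clauses; a qualifier binds only to the clause it prefixes,
    // every other clause falls back to the default field.
    public static List<String> clauses(String query, String defaultField) {
        List<String> out = new ArrayList<>();
        Matcher m = CLAUSE.matcher(query);
        while (m.find()) {
            String field = (m.group(1) != null) ? m.group(1) : defaultField;
            out.add(field + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Unquoted: the second word is searched in the default field.
        System.out.println(clauses("carDescription:austin martin", "text"));
        // Quoted: the whole phrase stays with the qualified field.
        System.out.println(clauses("carDescription:\"austin martin\"", "text"));
    }
}
```

Under these assumptions the first call yields two clauses (carDescription:austin and text:martin), while the quoted form yields a single carDescription clause, which matches the difference reported in the thread.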

Re: Question regarding synonym

2009-10-02 Thread darniz
Thanks As i said it even works by giving double quotes too. like carDescription:"austin martin" So is that the conclusion that in order to map two word synonym i have to always enclose in double quotes, so that it does not split the words Christian Zambrano wrote: When you use a

Re: Specifying all except field in field list?

2009-10-02 Thread Lance Norskog
No, there is only list of fields, star, and score. You can choose to index it and not store it, and then have your application fetch it from the original data store. This is a common system design pattern to avoid storing giant text blobs in the index.

Re: Specifying all except field in field list?

2009-10-02 Thread Paul Rosen
Thanks, Lance, for the quick reply. Well, unfortunately, we need the highlighting feature on that field, so I think we have to store it. It's not a big deal, it just seemed like something that would be useful and probably be easy to implement, so I figured I just missed it. Alternately, is

Re: Advantages of different Servlet Containers

2009-10-02 Thread Joshua Tuberville
Simon, Have you tried the bin/jetty.sh script that comes with Jetty distributions? It contains the standard start|stop|restart functions. Joshua On Oct 2, 2009, at 1:11 PM, Simon Wistow wrote: I know that the Solr FAQ says Users should decide for themselves which Servlet Container they

Re: Specifying all except field in field list?

2009-10-02 Thread Lance Norskog
Maybe the TermsComponent? You can't ask for facets with a wildcard in the field name. This would do the trick. It's an issue in JIRA, if you want to vote for it. http://issues.apache.org/jira/browse/SOLR-247 http://issues.apache.org/jira/browse/SOLR-1387 On Fri, Oct 2, 2009 at 6:36 PM, Paul