commit, concurrency, full text search

2007-09-17 Thread Dilip.TS
Hi, 1) How does commit work with multiple requests? 2) Does Solr handle concurrency during updates? 3) Does Solr support anything like this: if I enclose the keywords within quotes, then we search for exactly those keywords together, something like Google does. For example, if I enclose
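On question 3: Lucene's standard query syntax, which Solr's default query parser uses, treats words enclosed in double quotes as a phrase query. The terms must then appear next to each other and in that order. Below is a minimal Python sketch of building such a request. The base URL assumes the stock example Jetty install on port 8983, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the stock example server; adjust host, port, and path for your install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def phrase_query_url(phrase: str) -> str:
    """Return a select URL whose q parameter is a quoted (phrase) query."""
    # Escape backslashes and embedded double quotes so the phrase stays intact.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return SOLR_SELECT + "?" + urlencode({"q": f'"{escaped}"'})


url = phrase_query_url("apache solr")
print(url)  # http://localhost:8983/solr/select?q=%22apache+solr%22
```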

largish test data set?

2007-09-17 Thread David Welton
Hi, I'm in the process of evaluating Solr and Sphinx, and have come to realize that actually having a large data set to run them against would be handy. However, I'm pretty new to both systems, so I thought that perhaps asking around might produce something useful. What *I* mean by largish is

solr locked itself out

2007-09-17 Thread vanderkerkoff
Hello everyone. I've been reading some posts on this forum and I thought it best to start my own post, as our situation is different from everyone else's, isn't it always :-) We've got a Django-powered website that has Solr as its search engine. We're using the example Solr application and

Re: solr locked itself out

2007-09-17 Thread Ryan McKinley
vanderkerkoff wrote: I found another post that suggested editing the unlockOnStartup value in solrconfig.xml. Is that a wise idea? If you only have a single Solr instance at a time, it should be totally fine.

Re: Can we build complex filter queries in SOLR

2007-09-17 Thread Alessandro Ferrucci
Yeah, that is possible; I just tried it on one of my Solr instances. Let's say you have an index of player names: (first-name:Tim AND last-name:Anderson) OR (first-name:Anwar AND last-name:Johnson) OR (conference:Mountain West) will give you the results that logically match this query. HTH.
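One caution about the last clause: in Lucene's query syntax a field prefix binds only to the term right after it. So `conference:Mountain West` searches `Mountain` in `conference` but searches `West` in the default field. The small Python sketch below builds the same boolean query and quotes multi-word values. The helper names are hypothetical, and the field names are taken from the example above.

```python
def field_clause(field: str, value: str) -> str:
    """Return field:value, quoting the value when it contains whitespace."""
    if any(ch.isspace() for ch in value):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def and_group(pairs: list[tuple[str, str]]) -> str:
    """Join (field, value) pairs with AND inside parentheses."""
    return "(" + " AND ".join(field_clause(f, v) for f, v in pairs) + ")"


def or_groups(groups: list[list[tuple[str, str]]]) -> str:
    """OR together several AND groups."""
    return " OR ".join(and_group(g) for g in groups)


query = or_groups([
    [("first-name", "Tim"), ("last-name", "Anderson")],
    [("first-name", "Anwar"), ("last-name", "Johnson")],
    [("conference", "Mountain West")],
])
print(query)
```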

Re: largish test data set?

2007-09-17 Thread Grant Ingersoll
You might be interested in the Lucene Java contrib/Benchmark task, which provides an indexing implementation of a download of Wikipedia (available at http://people.apache.org/~gsingers/wikipedia/) It is pretty trivial to convert the indexing code to send add commands to Solr. HTH, Grant
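Solr's XML update handler accepts `<add><doc><field name="...">...</field></doc></add>` messages POSTed to `/update`. A `<commit/>` then makes the added documents searchable. The minimal Python sketch below serializes articles into that format. The field names `id`, `title`, and `text` are hypothetical and must match your schema.xml. The POST helper is not called here and assumes the example server URL.

```python
import urllib.request
import xml.etree.ElementTree as ET


def add_command(docs):
    """Serialize dicts (field name -> value) into a Solr <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > when serializing
    return ET.tostring(add, encoding="unicode")


def post_to_solr(body, url="http://localhost:8983/solr/update"):
    """POST an XML update message to an assumed example URL; returns the raw response."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


xml_body = add_command([
    {"id": "wiki-1", "title": "Apache Lucene", "text": "A search library & more"},
])
print(xml_body)
# Against a running server: post_to_solr(xml_body); post_to_solr("<commit/>")
```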

Re: largish test data set?

2007-09-17 Thread Daniel Alheiros
Hi Yonik. Do you have any performance statistics about those changes? Is it possible to upgrade to this new Lucene version using the Solr 1.2 stable version? Regards, Daniel On 17/9/07 17:37, Yonik Seeley [EMAIL PROTECTED] wrote: If you want to see what performance will be like on the next

Re: 'suggest' query sorting

2007-09-17 Thread Matthew Runo
Hello! Were you able to find out anything? I'd be interested to know what you found out. Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 On Sep 15,

Re: largish test data set?

2007-09-17 Thread Yonik Seeley
If you want to see what performance will be like on the next release, you could try upgrading Solr's internal version of lucene to trunk (current dev version)... there have been some fantastic improvements in indexing speed. For query speed/throughput, Solr 1.2 or trunk should do fine. -Yonik

Re: largish test data set?

2007-09-17 Thread Karl Wettin
On 17 Sep 2007, at 12:06, David Welton wrote: I'm in the process of evaluating Solr and Sphinx, and have come to realize that actually having a large data set to run them against would be handy. However, I'm pretty new to both systems, so I thought that perhaps asking around might produce something

Re: Re[2]: multiple indices

2007-09-17 Thread Matt Kangas
Jack, the JNDI-enabling jarfiles now ship as part of the main .zip distribution. There is no need for a separate JettyPlus download as of Jetty 6. I used Jetty 6.1.3 (http://dist.codehaus.org/jetty/jetty-6.1.x/jetty-6.1.3.zip) at the time, and I am using only these jarfiles from the main

Re: Indexing Speed

2007-09-17 Thread Mike Klaas
On 16-Sep-07, at 8:01 PM, erolagnab wrote: Hi, just FYI. I've seen some posts mention that Solr can index 100-150 docs/s, and some comparisons between embedded Solr and HTTP. I've tried indexing 1.7+ million docs, each doc has 30 fields among which 10 fields are

RE: Triggering snapshooter through web admin interface

2007-09-17 Thread Wu, Daniel
There is currently no way to trigger taking a snapshot through Solr's admin interface. Taking a snapshot is a very lightweight operation. It uses hard links, so each snapshot doesn't take up much additional disk space. If you [Wu, Daniel] Snapshot performance is not my concern. Rather,
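The hard-link idea can be illustrated in a few lines of Python. This is not Solr's snapshooter, which is a shell script. It is a hypothetical helper with made-up directory names. It shows why a snapshot costs little disk space: each link is a new directory entry for an existing file, so no index data is copied. The sketch is Unix-oriented. It assumes the snapshot lives on the same filesystem as the index, because hard links cannot cross filesystems.

```python
import os
import tempfile


def hardlink_snapshot(index_dir: str, snapshot_dir: str) -> list[str]:
    """Create snapshot_dir holding a hard link to every regular file in index_dir."""
    os.makedirs(snapshot_dir)  # raises if the snapshot directory already exists
    linked = []
    for name in sorted(os.listdir(index_dir)):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            # New directory entry pointing at the same inode: no file data is copied.
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


# Usage against a throwaway directory standing in for an index.
root = tempfile.mkdtemp()
index = os.path.join(root, "index")
os.makedirs(index)
with open(os.path.join(index, "segments_1"), "w") as f:
    f.write("fake segment data")
snap = os.path.join(root, "snapshot.20070917")
linked = hardlink_snapshot(index, snap)
print(linked)  # ['segments_1']
```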

Faceting Vs using lucene filters ?

2007-09-17 Thread cricdigs
Hi, I have a collection of blogs. Each Solr document has one blog with 3 fields - blogger(id), title and blog text. The search is performed over all 3 fields. When doing the search I need to show 2 things: 1. Bloggers block with all the matching bloggers (so if a title, blog or blogger contains

RE: Triggering snapshooter through web admin interface

2007-09-17 Thread Chris Hostetter
: I was also suggesting a new feature to allow sending messages to Solr : through the HTTP interface and a mechanism for handling the message on the : Solr server; in this case, a message to trigger the snapshooter script. It : seems to me a very useful feature to help simplify operational issues. it's

Re: Faceting Vs using lucene filters ?

2007-09-17 Thread Chris Hostetter
: 1. Bloggers block with all the matching bloggers (so if a title, blog or : blogger contains the search term, I show the blogger's id) : The first block is my problem since it shows multiple instances of the same : blogger if that blogger has multiple matching blogs. I can use faceting to :
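Faceting on the blogger field returns one count per distinct blogger, no matter how many of that blogger's posts match. The sketch below builds the request parameters in Python. `blogger` is the field named in the question. The row count is illustrative, and the sketch assumes `blogger` is indexed as an untokenized string field, since facet counts are taken over indexed terms.

```python
from urllib.parse import parse_qsl, urlencode


def blogger_facet_query(user_query: str, rows: int = 10) -> str:
    """Build select parameters that return matching blogs plus distinct bloggers."""
    params = [
        ("q", user_query),
        ("rows", str(rows)),
        ("facet", "true"),
        ("facet.field", "blogger"),  # one bucket per distinct indexed blogger value
        ("facet.mincount", "1"),     # omit bloggers with no matching documents
    ]
    return urlencode(params)


qs = blogger_facet_query("solr faceting")
print(qs)
```

The distinct bloggers then come back in the response's `facet_counts` section under `facet_fields`, separately from the matching documents.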

Re: Combining Proximity Range search

2007-09-17 Thread Chris Hostetter
: My document will have a multivalued compound field like : : revision_01012007 : review_02012007 : : i am thinking of a query like comp:type:review date:[02012007 TO : 02282007]~0 your best bet is to change that so revision and review are the names of a field, and do a range search on them
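Following that suggestion, the query becomes a plain range search on a field named after the type, for example `review:[20070201 TO 20070228]`. One thing worth checking, assuming the dates are indexed as plain strings: range queries compare terms lexicographically. `MMDDYYYY` values therefore do not sort chronologically across years, whereas `YYYYMMDD` values do. The small Python sketch below illustrates this with hypothetical helper names.

```python
from datetime import date


def sortable(d: date) -> str:
    """Format a date as YYYYMMDD so string order matches chronological order."""
    return d.strftime("%Y%m%d")


def range_clause(field: str, start: date, end: date) -> str:
    """Build an inclusive Lucene range clause over YYYYMMDD strings."""
    return f"{field}:[{sortable(start)} TO {sortable(end)}]"


clause = range_clause("review", date(2007, 2, 1), date(2007, 2, 28))
print(clause)  # review:[20070201 TO 20070228]

# With MMDDYYYY strings, 15 Feb 2006 sorts inside a February 2007 range.
print("02012007" <= "02152006" <= "02282007")  # True
print("20070201" <= "20060215" <= "20070228")  # False
```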

Re: 'suggest' query sorting

2007-09-17 Thread Chris Hostetter
: How can I boost words where the whole value (not just the token) is closer to : the front of the value? That is, I want 'ca' to return: : 1. Canon PowerShot : 2. Canon EX PIXMA : 3. iPod Cable : 4. Video Card : (actually 1 and 2 could be swapped) i would argue that you don't want #3 and #4 at

Re: EdgeNGramTokenFilter, term position?

2007-09-17 Thread Chris Hostetter
: Should the EdgeNGramFilter use the same term position for the ngrams within a : single token? i can see the argument going both ways ... imagine a hypothetical CharSplitterTokenFilter that replaces each token in the stream with one token per character in the original token (ie: hello

Re: EdgeNGramTokenFilter, term position?

2007-09-17 Thread Yonik Seeley
On 9/16/07, Ryan McKinley [EMAIL PROTECTED] wrote: Should the EdgeNGramFilter use the same term position for the ngrams within a single token? It feels like that is the right approach. I don't see value in having them sequential, and I can think of uses for having them overlap. -Yonik
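The pure-Python illustration below makes the two options concrete. It is not Lucene's EdgeNGramTokenFilter; the helper and its parameters are hypothetical. It emits front edge n-grams with either shared or sequential positions. With shared positions, grams from neighbouring words stay one position apart, so a phrase query such as `"ip ca"` can still match `ipod cable`. With sequential positions, the gap between them grows with the number of grams.

```python
def edge_ngrams(tokens, min_gram=1, max_gram=3, same_position=True):
    """Return (gram, position) pairs of front edge n-grams for each token."""
    out = []
    pos = 0
    for token in tokens:
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            out.append((token[:n], pos))
            if not same_position:
                pos += 1  # every gram gets its own position
        if same_position:
            pos += 1  # all grams of a token share that token's position
    return out


shared = edge_ngrams(["ipod", "cable"])
sequential = edge_ngrams(["ipod", "cable"], same_position=False)
print(shared)      # [('i', 0), ('ip', 0), ('ipo', 0), ('c', 1), ('ca', 1), ('cab', 1)]
print(sequential)  # [('i', 0), ('ip', 1), ('ipo', 2), ('c', 3), ('ca', 4), ('cab', 5)]
```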

Re: Control index/store at document level

2007-09-17 Thread Chris Hostetter
: nope, the field options are created on startup -- you can't change them : dynamically (i don't know all the details, but I think it is a file format : issue, not just a configuration issue) In the underlying Lucene library most of these options can be controlled per document, but Solr

Re: Solr - rudimentary problems

2007-09-17 Thread Chris Hostetter
: The corresponding entry for this field in schema.xml is : : <field name="id" type="text" indexed="true" stored="true" multiValued="false" required="true"/> i'm guessing "text" is from the example schema.xml ... this is not a good type to use for a uniqueId field ... that alone might

Re: solr locked itself out

2007-09-17 Thread Adrian Sutton
ulimit is unlimited and cat /proc/sys/fs/file-max 11769 I just went through the same kind of mistake - ulimit doesn't report what you think it does, what you should check is ulimit -n (the -n isn't just the option to set the value). If you're using bash as your shell that will almost
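For context: in bash, `ulimit` with no option reports the maximum file size (the same as `ulimit -f`), which is why it showed "unlimited". The per-process open-file limit that Lucene's open index files count against is `ulimit -n`. `/proc/sys/fs/file-max` is a separate, system-wide ceiling. As a quick cross-check, the same per-process limit can be read from Python's standard `resource` module (Unix only):

```python
import resource


def describe(limit: int) -> str:
    """Render a resource limit, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


# RLIMIT_NOFILE is the open-file-descriptor limit; its soft value is what `ulimit -n` shows.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```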

UserTagDesign

2007-09-17 Thread Karl Wettin
I've been looking at http://wiki.apache.org/solr/UserTagDesign on and off for a while and think all the use cases could be explained with simple UML class diagram semantics: [Taggable](tag:Tag)-- {0..*} |--- {0..*} --(tag:Tag)[Tagger] |

Re: 'suggest' query sorting

2007-09-17 Thread Ryan McKinley
if you really want #3 and #4 to show up, then have two fields: one using whitespace tokenizer, one using keyword tokenizer; both using EdgeNGramFilter ... boost the query to the first field higher than the second field (or just rely on the coordFactor and the fact that ca will match on both
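The toy Python model below shows why the two-field approach ranks whole-value prefix matches first. It mimics a keyword-tokenized field and a whitespace-tokenized field, both edge n-grammed, and adds a boost for each field that matches. This is a simplified stand-in for Lucene scoring (no tf/idf, no coord), and the helper names are hypothetical. In Solr the equivalent query might look like `suggest_kw:ca^2 suggest_ws:ca`, where both field names are assumptions.

```python
def front_grams(text: str, max_gram: int = 10) -> set[str]:
    """Front edge n-grams of a single token."""
    return {text[:n] for n in range(1, min(max_gram, len(text)) + 1)}


def keyword_grams(value: str) -> set[str]:
    """Keyword tokenizer: the whole lowercased value is one token."""
    return front_grams(value.lower())


def whitespace_grams(value: str) -> set[str]:
    """Whitespace tokenizer: one token per lowercased word."""
    grams: set[str] = set()
    for word in value.lower().split():
        grams |= front_grams(word)
    return grams


def toy_score(prefix: str, value: str, kw_boost: float = 2.0, ws_boost: float = 1.0) -> float:
    """Sum the boosts of the fields whose grams contain the prefix."""
    p = prefix.lower()
    score = 0.0
    if p in keyword_grams(value):
        score += kw_boost
    if p in whitespace_grams(value):
        score += ws_boost
    return score


products = ["Video Card", "iPod Cable", "Canon EX PIXMA", "Canon PowerShot"]
ranked = sorted(products, key=lambda v: toy_score("ca", v), reverse=True)
print(ranked)
```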

RE: Triggering snapshooter through web admin interface

2007-09-17 Thread Wu, Daniel
-----Original Message----- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Monday, September 17, 2007 1:28 PM To: solr-user@lucene.apache.org Subject: RE: Triggering snapshooter through web admin interface : I was also suggesting a new feature to allow sending messages to Solr :

Re: Solr - rudimentary problems

2007-09-17 Thread Venkatraman S
That's perfect! Yes, that was the problem. Thanks a lot. I am compiling a complete list of FAQs and will update it in the wiki soon. -vEnKAt On 9/18/07, Chris Hostetter [EMAIL PROTECTED] wrote: : The corresponding entry for this field in schema.xml is : : field name=id