no segments* file found

2007-11-12 Thread SDIS M. Beauchamp
I'm using Solr to index our file servers (480K files). If I don't optimize, I get a "too many open files" error at about 450K files and a 3 GB index. If I optimize, I get this stack trace during the commit of all the following updates: result status=1 java.io.FileNotFoundException: no segments*

Re: no segments* file found

2007-11-12 Thread Venkatraman S
Are you using embedded Solr? I stumbled on a similar error: http://www.mail-archive.com/solr-user@lucene.apache.org/msg06085.html -V On Nov 12, 2007 2:16 PM, SDIS M. Beauchamp [EMAIL PROTECTED] wrote: I'm using Solr to index our file servers (480K files). If I don't optimize, I've

Re: Trim filer active for solr.StrField ?

2007-11-12 Thread Jörg Kiegeland
What is your specific SolrQuery? Calling query.setQuery( stuff with spaces); does not call trim(), but some other calls do. My query looks like this, e.g.: (myField:_T8sY05EAEdyU7fJs63mvdA OR myField:_T8sY0ZEAEdyU7fJs63mvdA OR myField:_T8sY0pEAEdyU7fJs63mvdA) AND NOT

RE: no segments* file found

2007-11-12 Thread SDIS M. Beauchamp
No, I'm using a custom indexer, written in C#, which submits content using POST requests. I let Lucene manage the index on its own. Florent BEAUCHAMP -Original Message- From: Venkatraman S [mailto:[EMAIL PROTECTED] Sent: Monday, November 12, 2007 10:19 To:

RE: Multiple indexes

2007-11-12 Thread Pierre-Yves LANDRON
Hello, Until now, I've used two instances of Solr, one for each of my collections. It works fine, but I wonder if there is an advantage to using multiple indexes in one instance over several instances with one index each? Note that the two indexes have different schema.xml files. Thanks. PL Date:

solr range query

2007-11-12 Thread Heba Farouk
Hello, I would like to use Solr to return ranges of results on an integer field. If I write offset:[0 TO 10] in the URL, it returns only documents with offset values 0, 1, 10, but I want it to return the range 0, 1, 2, 3, 4, 10. How can I do that with Solr? Thanks in advance Best

Re: I18N with SOLR?

2007-11-12 Thread Ed Summers
I'd say yes. Solr supports Unicode and ships with language-specific analyzers, and allows you to provide your own custom analyzers if you need them. This allows you to create different fieldType definitions for the languages you want to support. For example, here is a field type for French

Re: Multiple indexes

2007-11-12 Thread Ryan McKinley
The advantages of a multi-core setup are configuration flexibility and dynamically changing available options (without a full restart). For high-performance production Solr servers, I don't think there is much reason for it. You may want to split the two indexes onto two machines. You may

Re: solr range query

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 8:02 AM, Heba Farouk [EMAIL PROTECTED] wrote: I would like to use Solr to return ranges of results on an integer field. If I write offset:[0 TO 10] in the URL, it returns only documents with offset values 0, 1, 10, but I want it to return the range 0, 1, 2, 3, 4, 10. How can
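The reply above is cut off, so the following is only a sketch of one explanation that fits the symptom: if the field's terms are compared as strings rather than as numbers, a range of 0 to 10 matches exactly 0, 1 and 10. The helper names below are hypothetical and nothing here calls Solr; the snippet just contrasts the two orderings in plain Python. In Solr releases of that period, the sortable integer type (`sint` in the example schema) was the usual choice when numeric range behaviour was needed.

```python
def lexicographic_range(values, low, high):
    """Keep values whose *string* form sorts between low and high."""
    lo, hi = str(low), str(high)
    return [v for v in values if lo <= str(v) <= hi]


def numeric_range(values, low, high):
    """Keep values that lie numerically between low and high."""
    return [v for v in values if low <= v <= high]


offsets = list(range(12))  # 0 .. 11

# String comparison reproduces the behaviour described in the question.
print(lexicographic_range(offsets, 0, 10))  # [0, 1, 10]

# Numeric comparison gives the range the poster expected.
print(numeric_range(offsets, 0, 10))
```

If the string ordering turns out to be the cause, changing the field type and reindexing is the likely fix; zero-padding the values at index time is another commonly used workaround.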

Query and heap Size

2007-11-12 Thread Jae Joo
In my system, the heap size (old generation) keeps growing under heavy traffic. I have adjusted the size of the young generation, but it does not work well. Does anyone have any recommendations regarding this issue? - Solr configuration and/or web.xml ...etc... Thanks, Jae

Re: Multiple indexes

2007-11-12 Thread Jae Joo
Here is my situation. I have 6 million articles indexed and am adding about 10k articles every day. If I maintain only one index, whenever the daily feed is running, it consumes the heap area and causes FGC. I am thinking of a way to have multiple indexes - one is for ongoing querying service and

Re: Multiple indexes

2007-11-12 Thread Ryan McKinley
Just use the standard collection distribution stuff. That is what it is made for! http://wiki.apache.org/solr/CollectionDistribution Alternatively, open up two indexes using the same config/dir -- do your indexing on one and the searching on the other. When indexing is done (or finishes a

Re: Best way to create multiple indexes

2007-11-12 Thread Ryan McKinley
For starters, do you need to be able to search across groups or sub-groups (in one query)? If so, then you have to stick everything in one index. You can add a field to each document saying what 'group' or 'sub-group' it is in and then limit it at query time: q=kittens +group:A The
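As a rough illustration of the query-time restriction described above, here is a minimal sketch that builds the request URL for `q=kittens +group:A`. The base URL and the function name are assumptions made for this example, not something given in the thread. The point it shows is that the `+` must be percent-encoded, because a raw `+` in a query string is decoded as a space.

```python
from urllib.parse import parse_qs, urlencode


def build_group_query(base_url, text, group):
    """Build a select URL that limits a text query to a single group."""
    # The group clause is required (+), so only documents in that group match.
    params = {"q": f"{text} +group:{group}"}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical local Solr endpoint, used for illustration only.
url = build_group_query("http://localhost:8983/solr/select", "kittens", "A")
print(url)

# Decoding the query string gives back the original query text.
print(parse_qs(url.split("?", 1)[1])["q"][0])
```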

Re: Best way to create multiple indexes

2007-11-12 Thread Dwarak R
Hi guys, how do we add Word / PDF / text / etc. documents to Solr? How is the content of the files stored or indexed? Are the documents stored as XML in the filesystem? Regards Dwarak R - Original Message - From: Ryan McKinley [EMAIL PROTECTED] To:

solr workflow ?

2007-11-12 Thread Dwarak R
Hi guys, how do we add Word / PDF / text / etc. documents to Solr? How is the content of the files stored or indexed? Are these documents stored as XML in the SOLR filesystem? Regards Dwarak R

RE: Best way to create multiple indexes

2007-11-12 Thread Rishabh Joshi
Ryan, We currently have 8-9 million documents to index and this number will grow in the future. Also, we will never have a query that searches across groups, but we will certainly have queries that search across sub-groups. Keeping this in mind, we were wondering if we could have

Re: no segments* file found

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 3:46 AM, SDIS M. Beauchamp [EMAIL PROTECTED] wrote: If I don't optimize, I get a "too many open files" error at about 450K files and a 3 GB index. You may need to increase the number of file descriptors in your system. If you're using Linux, see this:
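The link in the reply above is cut off. As a small aid for checking the current limit before raising it, here is a sketch using Python's standard `resource` module (Unix only). The function name is made up for this example, and the snippet only reads the limits of the process it runs in; the limit that matters is the one applied to the servlet container that hosts Solr.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```

On Linux the same numbers are reported by `ulimit -n` (soft) and `ulimit -Hn` (hard) in the shell that launches the server.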

Does SOLR supports multiple instances within the same webapplication?

2007-11-12 Thread Dilip.TS
Hello, Does SOLR support multiple instances within the same web application? If so, how is this achieved? Thanks in advance. Regards, Dilip TS

leading wildcards

2007-11-12 Thread Traut
Hi, I found the thread about enabling leading wildcards in Solr as an additional option in the config file. I've got a nightly Solr build and I can't find any options connected with leading wildcards in the config files. How can I enable leading wildcard queries in Solr? Thank you

Re: solr workflow ?

2007-11-12 Thread Traut
rtfm :) http://lucene.apache.org/solr/tutorial.html On Nov 12, 2007 4:33 PM, Dwarak R [EMAIL PROTECTED] wrote: Hi guys, how do we add Word / PDF / text / etc. documents to Solr? How is the content of the files stored or indexed? Are these documents stored as XML in the SOLR

Re: Does SOLR supports multiple instances within the same webapplication?

2007-11-12 Thread Ryan McKinley
Dilip.TS wrote: Hello, Does SOLR support multiple instances within the same web application? If so, how is this achieved? If you want multiple indices, you can run multiple web-apps. If you need multiple indices in the same web-app, check SOLR-350 -- it is still in development, and make

Re: solr workflow ?

2007-11-12 Thread Venkatraman S
Highly unfortunate! On Nov 12, 2007 9:07 PM, Traut [EMAIL PROTECTED] wrote: rtfm :) http://lucene.apache.org/solr/tutorial.html On Nov 12, 2007 4:33 PM, Dwarak R [EMAIL PROTECTED] wrote: Hi guys, how do we add Word / PDF / text / etc. documents to Solr? How is the content of

Re: leading wildcards

2007-11-12 Thread Traut
Seems like there is no way to enable leading wildcard queries except by editing the code and repackaging the files. :( On 11/12/07, Bill Au [EMAIL PROTECTED] wrote: The related bug is still open: http://issues.apache.org/jira/browse/SOLR-218 Bill On Nov 12, 2007 10:25 AM, Traut [EMAIL PROTECTED] wrote:

Re: leading wildcards

2007-11-12 Thread Michael Kimsal
Vote for that issue and perhaps it'll gain some more traction. A former colleague of mine was the one who contributed the patch in SOLR-218 and it would be nice to have that configuration option 'standard' (if off by default) in the next SOLR release. On Nov 12, 2007 11:18 AM, Traut [EMAIL

Re: Multiple indexes

2007-11-12 Thread Jae Joo
I have built the master Solr instance and indexed some files. When I run snapshooter, it complains with an error: snapshooter -d data/index (in the solr/bin directory). Did I miss something? ++ date '+%Y/%m/%d %H:%M:%S' + echo 2007/11/12 12:38:40 taking snapshot

RE: Solr + autocomplete

2007-11-12 Thread Park, Michael
Thanks Ryan, This looks like the way to go. However, when I set up my schema I get, Error loading class 'solr.EdgeNGramFilterFactory'. For some reason the class is not found. I tried the stable 1.2 build and even tried the nightly build. I'm using filter class=solr.EdgeNGramFilterFactory

RE: Solr + autocomplete

2007-11-12 Thread Chris Hostetter
: Error loading class 'solr.EdgeNGramFilterFactory'. For some reason EdgeNGramFilterFactory didn't exist when Solr 1.2 was released, but the EdgeNGramTokenizerFactory did. (the javadocs that come with each release list all of the various factories in that release) -Hoss

DINSTINCT ON functionality in Solr?

2007-11-12 Thread Jörg Kiegeland
Is there a way to define a query such that the search result contains only one representative of every set of documents that are equal on a given field (it is not important which representative document), i.e. to have the DISTINCT ON concept from relational databases in Solr? If this

Phrase-based (vs. Word-Based) Proximity Search

2007-11-12 Thread Chris Harris
I gather that the standard Solr query parser uses the same syntax for proximity searches as Lucene, and that Lucene syntax is described at http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches This syntax lets me look for terms that are within x words of each other.

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Erick Erickson
DISCLAIMER: This is from a Lucene-centric viewpoint. That said, this may be useful. From your line number, page number, etc. perspective, it is possible to index special guaranteed-to-not-match tokens, then use the termdocs/termenum data, along with SpanQueries, to figure this out at search time.

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread David Neubert
Erik, Probably because of my newness to SOLR/Lucene, I see now what you/Yonik meant by case field, but I am not clear about your wording per-book setting attached at index time - would you mind elaborating on that, so I am clear? Dave - Original Message From: Erik Hatcher [EMAIL

RE: Solr + autocomplete

2007-11-12 Thread Park, Michael
Will I need to use Solr 1.3 with the EdgeNGramFilterFactory in order to get the autosuggest feature? -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Monday, November 12, 2007 1:05 PM To: solr-user@lucene.apache.org Subject: RE: Solr + autocomplete : Error

Re: Phrase-based (vs. Word-Based) Proximity Search

2007-11-12 Thread Ken Krugler
Hi Chris, I gather that the standard Solr query parser uses the same syntax for proximity searches as Lucene, and that Lucene syntax is described at http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches This syntax lets me look for terms that are within x words of

Re: Associating pronouns instances to proper nouns?

2007-11-12 Thread David Neubert
Attempting to answer my own question, which I should probably just try, assuming I can doctor the indexed text -- I suppose I could do something like change all instances of I, he, etc. that refer to one person to IJBA, HEJBA, HIMJBA (making sure they would never equal a normal word) -- then use

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread David Neubert
Erik - thanks, I am considering this approach, versus explicit redundant indexing -- and am also considering Lucene -- problem is, I am one week into both technologies (though have years in the search space) -- wish I could go to Hong Kong -- any discounts available anywhere :) Dave -

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 2:20 PM, David Neubert [EMAIL PROTECTED] wrote: Erik - thanks, I am considering this approach, versus explicit redundant indexing -- and am also considering Lucene - There's not a well defined solution in either IMO. - problem is, I am one week into both technologies (though

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Chris Hostetter
: - problem is, I am one week into both technologies (though have years in the search space) -- wish I could : go to Hong Kong -- any discounts available anywhere :) : : Unfortunately the OS Summit has been canceled. Or rescheduled to 2008 ... depending on whether you are a half-empty /

Re: Associating pronouns instances to proper nouns?

2007-11-12 Thread David Neubert
All, I have found (from using the Admin/Analysis page) that if I were to append unique initials (that didn't match any other word or acronym) to each pronoun (e.g. I-WCN, she-WCN, my-WCN etc.) the default parsing and tokenization for the text field in SOLR might actually do the trick -- it

Re: Faceting over limited result set

2007-11-12 Thread Pieter Berkel
On 13/11/2007, Chris Hostetter [EMAIL PROTECTED] wrote: can you elaborate on your use case ... the only time i've ever seen people ask about something like this it was because true facet counts were too expensive to compute, so they were doing sampling of the first N results. In Solr,

Re: DINSTINCT ON functionality in Solr?

2007-11-12 Thread Pieter Berkel
Currently this functionality is not available in Solr out-of-the-box; however, there is a patch implementing Field Collapsing (http://issues.apache.org/jira/browse/SOLR-236) which might be similar to what you are trying to achieve. Piete On 13/11/2007, Jörg Kiegeland [EMAIL PROTECTED] wrote: Is
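Until something like the field-collapsing patch is available, one fallback is to collapse duplicates on the client after the results come back. The sketch below is only that: a client-side post-processing step with a made-up function name and sample data, not the SOLR-236 patch and not a Solr API. It keeps the first document seen for each distinct value of the chosen field.

```python
def distinct_on(docs, field):
    """Keep the first document for each distinct value of `field`."""
    seen = set()
    representatives = []
    for doc in docs:
        key = doc.get(field)  # documents missing the field share the key None
        if key not in seen:
            seen.add(key)
            representatives.append(doc)
    return representatives


# Hypothetical result documents, for illustration only.
results = [
    {"id": 1, "isbn": "A"},
    {"id": 2, "isbn": "B"},
    {"id": 3, "isbn": "A"},
]
print(distinct_on(results, "isbn"))
```

Because the collapsing happens after the search, hit counts and paging still refer to the uncollapsed result set, which is the main limitation compared with doing it inside the index.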

Re: Faceting over limited result set

2007-11-12 Thread Chris Hostetter
: It's not really a performance-related issue, the primary goal is to use the : facet information to determine the most relevant product category related to : the particular search being performed. ah ... ok, i understand now. the order does matter, you want the top N documents sorted by some

Re: Does SOLR supports multiple instances within the same webapplication?

2007-11-12 Thread James liu
If I understand correctly, you just do it like this (I use PHP): $data1 = getDataFromInstance1($url); $data2 = getDataFromInstance2($url); You just have multiple Solr instances and get the data from each instance. On Nov 12, 2007 11:15 PM, Dilip.TS [EMAIL PROTECTED] wrote: Hello, Does SOLR supports