[ANNOUNCE] Apache Solr 4.7.0 released.

2014-02-26 Thread Simon Willnauer
February 2014, Apache Solr™ 4.7 available The Lucene PMC is pleased to announce the release of Apache Solr 4.7. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

[ANNOUNCE] Apache Solr 4.6 released.

2013-11-24 Thread Simon Willnauer
24 November 2013, Apache Solr™ 4.6 available The Lucene PMC is pleased to announce the release of Apache Solr 4.6. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

[ANNOUNCE] Apache Solr 4.3 released

2013-05-06 Thread Simon Willnauer
May 2013, Apache Solr™ 4.3 available The Lucene PMC is pleased to announce the release of Apache Solr 4.3. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search,

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Simon Willnauer
On Thu, Aug 2, 2012 at 7:53 AM, roz dev rozde...@gmail.com wrote: Thanks Robert for these inputs. Since we do not really Snowball analyzer for this field, we would not use it for now. If this still does not address our issue, we would tweak thread pool as per eks dev suggestion - I am bit

Re: Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

2012-07-10 Thread Simon Willnauer
it really seems that you are hitting an OOM during auto warming. could this be the cause of your failure? Can you raise the JVM memory and see if you still hit the spike and go OOM? this is very unlikely to be an IndexWriter problem. I'd rather look at your warmup queries, i.e. fieldcache, FieldValueCache

Re: Multiple document types

2012-01-25 Thread Simon Willnauer
determine which index was to be loaded by the dataimport command. seems like you should look at solr's multicore feature: http://wiki.apache.org/solr/CoreAdmin simon F -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Wednesday, January 25, 2012 2:08

Call for Submission Berlin Buzzwords 2012 - http://berlinbuzzwords.de

2012-01-11 Thread Simon Willnauer
Chairs:  *  Isabel Drost (Nokia Apache Mahout)  *  Jan Lehnardt (CouchBase Apache CouchDB)  *  Simon Willnauer (SearchWorkings Apache Lucene)  *  Grant Ingersoll (Lucid Imagination Apache Lucene)  *  Owen O’Malley (Yahoo Inc. Apache Hadoop)  *  Jim Webber (Neo Technology Neo4j)  *  Sean Treadway

Heads Up - Index File Format Change on Trunk

2012-01-05 Thread Simon Willnauer
Folks, I just committed LUCENE-3628 [1] which cuts over Norms to DocValues. This is an index file format change and if you are using trunk you need to reindex before updating. happy indexing :) simon [1] https://issues.apache.org/jira/browse/LUCENE-3628

Re: Solr Scoring question

2012-01-05 Thread Simon Willnauer
hey, On Thu, Jan 5, 2012 at 9:31 PM, Christopher Gross cogr...@gmail.com wrote: I'm getting different results running these queries: http://localhost:8080/solr/select?q=*:*&fq=source:wiki&fq=tag:car&sort=score+desc,dateSubmitted+asc&fl=title,score,dateSubmitted&rows=100

Re: spellcheck-index is rebuilt on commit

2012-01-03 Thread Simon Willnauer
On Tue, Jan 3, 2012 at 9:12 AM, OliverS oliver.schi...@unibas.ch wrote: Hi all Thanks a lot, and it seems to be a bug, but not of 4.0 only. You are right, I was doing a commit on an optimized index without adding any new docs (in fact, I did this for replication on the master). I will open a

Re: spellcheck-index is rebuilt on commit

2012-01-02 Thread Simon Willnauer
hey, is it possible that during those commits nothing has changed in the index? I mean, are you committing regardless of whether there are changes? if so this could happen, since the spellchecker gets a new event that you did a commit but the index didn't really change. The spellchecker really only checks if

Re: Matching all documents in the index

2011-12-13 Thread Simon Willnauer
try *:* instead of *.* simon On Tue, Dec 13, 2011 at 5:03 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I have come across this query in the admin interface: *.* Is this meant to match all documents in my index? Currently when I run query with q= *.*, numFound is 130310 but the actual

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Simon Willnauer
I wonder if you have an explicitly configured merge policy? In Solr 1.4, i.e. Lucene 2.9, LogMergePolicy was the default but in 3.5 TieredMergePolicy is used by default. This could explain the differences segment wise, since from what I understand you are indexing the same data on 1.4 and 3.5? simon

Re: Seek past EOF

2011-11-30 Thread Simon Willnauer
can you give us some details about what filesystem you are using? simon On Wed, Nov 30, 2011 at 3:07 PM, Ruben Chadien ruben.chad...@aspiro.com wrote: Happened again…. I got 3 directories in my index dir 4096 Nov  4 09:31 index.2004083156 4096 Nov 21 10:04 index.2021090440 4096

[ANNOUNCE] Apache Solr 3.5 released

2011-11-26 Thread Simon Willnauer
27 November 2011, Apache Solr™ 3.5.0 available The Lucene PMC is pleased to announce the release of Apache Solr 3.5.0. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

JVM Bugs affecting Lucene Solr

2011-11-15 Thread Simon Willnauer
hey folks, we lately looked into https://issues.apache.org/jira/browse/LUCENE-3235 again, an issue where a class using ConcurrentHashMap hangs / deadlocks on specific JVMs in combination with specific CPUs. It turns out it's a JVM bug in Sun / Oracle Java 1.5 as well as Java 1.6. It's apparently

Re: changing omitNorms on an already built index

2011-10-28 Thread Simon Willnauer
On Fri, Oct 28, 2011 at 12:20 AM, Robert Muir rcm...@gmail.com wrote: On Thu, Oct 27, 2011 at 6:00 PM, Simon Willnauer simon.willna...@googlemail.com wrote: we are not actively removing norms. if you set omitNorms=true and index documents they won't have norms for this field. Yet, other

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Simon Willnauer
Hey Roman, On Fri, Oct 28, 2011 at 8:38 PM, Roman Alekseenkov ralekseen...@gmail.com wrote: Hi everyone, I'm looking for some help with Solr indexing issues on a large scale. We are indexing few terabytes/month on a sizeable Solr cluster (8 masters / serving writes, 16 slaves / serving

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Simon Willnauer
On Fri, Oct 28, 2011 at 9:17 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey Roman, On Fri, Oct 28, 2011 at 8:38 PM, Roman Alekseenkov ralekseen...@gmail.com wrote: Hi everyone, I'm looking for some help with Solr indexing issues on a large scale. We are indexing few

Re: How can I force the threshold for a fuzzy query?

2011-10-27 Thread Simon Willnauer
I am not sure if there is such an option, but you might be able to override your query parser and reset that value if it is too fuzzy. Look for protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength); there you can change the actual value used for minimumSimilarity
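
The idea in the reply above (intercept the similarity value and reset it when it is too fuzzy) can be sketched outside of Lucene. This is only an illustration of the clamping step, not Lucene code: the function name and the 0.7 floor are assumptions made for the example, and in a real setup the same check would presumably live inside an overridden newFuzzyQuery.

```python
def clamp_minimum_similarity(requested: float, floor: float = 0.7) -> float:
    """Return the similarity to actually use for a fuzzy query.

    `requested` is the value parsed from the user's query; `floor` is a
    hypothetical site-wide lower bound. Higher values mean stricter
    matching, so anything below the floor is considered too fuzzy.
    """
    if not 0.0 <= requested <= 1.0:
        raise ValueError("similarity must be between 0.0 and 1.0")
    # Keep strict requests untouched; raise overly fuzzy ones to the floor.
    return max(requested, floor)
```

A request of 0.3 would be raised to the floor, while a request of 0.9 would pass through unchanged.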

Re: changing omitNorms on an already built index

2011-10-27 Thread Simon Willnauer
we are not actively removing norms. if you set omitNorms=true and index documents they won't have norms for this field. Yet, other segments still have norms until they get merged with a segment that has no norms for that field, i.e. omits norms. omitNorms is anti-viral so once you set it to true it
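
A toy model of the merge behaviour described in this message, assuming each segment simply records a per-field "omits norms" flag. The function and data layout are made up for illustration and are not Lucene's merge code; the only rule modelled is that a merged segment omits norms for a field as soon as one of its source segments does.

```python
def merge_segments(segments):
    """Merge per-field omitNorms flags of several toy segments.

    Each segment is a dict mapping field name -> True if the segment
    omits norms for that field. The merged segment omits norms for a
    field if any input segment omits them.
    """
    merged = {}
    for segment in segments:
        for field, omits in segment.items():
            merged[field] = merged.get(field, False) or omits
    return merged


old_segment = {"title": False}  # indexed before omitNorms=true was set
new_segment = {"title": True}   # indexed after omitNorms=true was set
merged = merge_segments([old_segment, new_segment])
```

Until the merge happens, old_segment still carries norms for "title"; only the merged result has lost them.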

Re: accessing the query string from inside TokenFilter

2011-10-25 Thread Simon Willnauer
On Tue, Oct 25, 2011 at 3:51 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Dear list, while writing some TokenFilter for my analyzer chain I need access to the query string from inside of my TokenFilter for some comparison, but the Filters are working with a TokenStream and get

Re: some basic information on Solr

2011-10-25 Thread Simon Willnauer
hey, 2011/10/24 Dan Wu wudan1...@gmail.com:  Hi all, I am doing a student project on search engine research. Right now I have some basic questions about Solr. 1. How many types of data file Solr can support (estimate)? i.e. No. of file types solr can look at for indexing and searching.

Re: Optimization /Commit memory

2011-10-25 Thread Simon Willnauer
RAM cost during optimize / merge is generally low. Optimize is basically a merge of all segments into one, however there are exceptions. Lucene streams existing segments from disk and serializes the new segment on the fly. When you optimize or in general when you merge segments you need disk

Re: How to make UnInvertedField faster?

2011-10-22 Thread Simon Willnauer
limitation here. simon Hopefully we can fix that at some point :) Mike McCandless http://blog.mikemccandless.com On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer simon.willna...@googlemail.com wrote: In trunk we have a feature called IndexDocValues which basically creates the uninverted

Re: Painfully slow indexing

2011-10-21 Thread Simon Willnauer
On Wed, Oct 19, 2011 at 3:58 PM, Pranav Prakash pra...@gmail.com wrote: Hi guys, I have set up a Solr instance and upon attempting to index document, the whole process is painfully slow. I will try to put as much info as I can in this mail. Pl. feel free to ask me anything else that might be

Re: How to make UnInvertedField faster?

2011-10-21 Thread Simon Willnauer
In trunk we have a feature called IndexDocValues which basically creates the uninverted structure at index time. You can then simply suck that into memory or even access it on disk directly (RandomAccess). Even if I can't help you right now this is certainly going to help you here. There is no

Checkout SearchWorkings.org - it just went live!

2011-09-09 Thread Simon Willnauer
Hey folks, Some of you might have heard, myself and a small group of other passionate search technology professionals have been working hard in the last few months to launch a community site known as SearchWorkings.org [1]. This initiative has been set up for other search professionals to have a

heads up: re-index 3.x branch Lucene/Solr indices

2011-08-22 Thread Simon Willnauer
I just reverted a previous commit related to CompoundFile in the 3.x stable branch. If you are using the unreleased 3.x branch you need to reindex. See here for details: https://issues.apache.org/jira/browse/LUCENE-3218 If you are using a released version of Lucene/Solr then you can ignore this

Re: heads up: re-index 3.x branch Lucene/Solr indices

2011-08-22 Thread Simon Willnauer
Shawn, as long as you are only using a release version of lucene/solr you don't need to be worried at all. This is an index format change that has never been released. you should reindex only if you use an svn checkout. simon On Mon, Aug 22, 2011 at 8:56 PM, Shawn Heisey s...@elyograg.org wrote:

Re: Requiring multiple matches of a term

2011-08-22 Thread Simon Willnauer
On Mon, Aug 22, 2011 at 8:10 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : One simple way of doing this is maybe to write a wrapper for TermQuery : that only returns docs with a Term Frequency   X as far as I : understand the question those terms don't have to be within a certain :

Re: Requiring multiple matches of a term

2011-08-21 Thread Simon Willnauer
On Fri, Aug 19, 2011 at 6:26 PM, Michael Ryan mr...@moreover.com wrote: Is there a way to specify in a query that a term must match at least X times in a document, where X is some value greater than 1? One simple way of doing this is maybe to write a wrapper for TermQuery that only returns
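
The filtering idea in this reply, stripped of any Lucene API, amounts to keeping only documents whose term frequency reaches a threshold. The sketch below assumes whitespace tokenisation and lowercase matching; the function name and the sample documents are invented for the example and say nothing about how a real TermQuery wrapper would be written.

```python
from collections import Counter


def docs_with_min_tf(docs, term, min_tf):
    """Return ids of documents in which `term` occurs at least `min_tf` times.

    `docs` maps a document id to its raw text. Tokenisation here is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    hits = []
    for doc_id, text in docs.items():
        term_frequency = Counter(text.lower().split())[term.lower()]
        if term_frequency >= min_tf:
            hits.append(doc_id)
    return hits


sample_docs = {
    1: "solr solr lucene",
    2: "solr lucene",
    3: "Solr is solr is SOLR",
}
matching = docs_with_min_tf(sample_docs, "solr", 2)
```

Here documents 1 and 3 contain the term at least twice, while document 2 is dropped.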

Re: OOM due to JRE Issue (LUCENE-1566)

2011-08-16 Thread Simon Willnauer
hey, On Tue, Aug 16, 2011 at 9:34 AM, Pranav Prakash pra...@gmail.com wrote: Hi, This might probably have been discussed long time back, but I got this error recently in one of my production slaves. SEVERE: java.lang.OutOfMemoryError: OutOfMemoryError likely caused by the Sun VM Bug

Re: Can I delete the stored value?

2011-07-11 Thread Simon Willnauer
On Mon, Jul 11, 2011 at 8:28 AM, Andrzej Bialecki a...@getopt.org wrote: On 7/10/11 2:33 PM, Simon Willnauer wrote: Currently there is no easy way to do this. I would need to think how you can force the index to drop those so the answer here is no you can't! simon On Sat, Jul 9, 2011

Re: DelimitedPayloadTokenFilter and Highlighter

2011-07-10 Thread Simon Willnauer
Hey hannes, the simplest solution here is maybe using a second field that is for highlighting only. This field would then store your content without the payloads. The other way would be stripping off the payloads during rendering which is not a nice option I guess. Since I am not a highlighter

Re: Can I delete the stored value?

2011-07-10 Thread Simon Willnauer
Currently there is no easy way to do this. I would need to think how you can force the index to drop those so the answer here is no you can't! simon On Sat, Jul 9, 2011 at 11:11 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: I've stored the contents of some pages I no longer need. How can

Heads Up - Index File Format Change on Trunk

2011-06-10 Thread Simon Willnauer
Hey folks, I just committed LUCENE-3108 (Landing DocValues on Trunk) which adds a byte to FieldInfo. If you are running on trunk you must / should re-index any trunk indexes once you update to the latest trunk. It's likely that if you open up old trunk (4.0) indexes, you will get an exception related

Travel Assistance applications now open for ApacheCon NA 2011

2011-06-06 Thread Simon Willnauer
The Apache Software Foundation (ASF)'s Travel Assistance Committee (TAC) is now accepting applications for ApacheCon North America 2011, 7-11 November in Vancouver BC, Canada. The TAC is seeking individuals from the Apache community at-large --users, developers, educators, students, Committers,

Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Simon Willnauer
On Mon, May 16, 2011 at 3:51 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, May 16, 2011 at 5:30 AM, Michael McCandless luc...@mikemccandless.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic

Berlin Buzzwords - conference schedule released

2011-04-12 Thread Simon Willnauer
gives a presentation on how to integrate Solr with J2EE applications. The second day features presentations by Jonathan Gray on Facebook's use of HBase in their Messaging architecture, Dawid Weiss, Simon Willnauer and Uwe Schindler are showing the latest Apache Lucene developments, Mark Miller

[GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]

2011-03-11 Thread Simon Willnauer
Hey folks, Google Summer of Code 2011 is very close and the Project Applications Period has started recently. Now it's time to get some excited students on board for this year's GSoC. I encourage students to submit an application to the Google Summer of Code web-application. Lucene Solr are

Re: Lucene 2.9.x vs 3.x

2011-01-16 Thread Simon Willnauer
On Sat, Jan 15, 2011 at 2:19 PM, Salman Akram salman.ak...@northbaysolutions.net wrote: Hi, SOLR 1.4.1 uses Lucene 2.9.3 by default (I think so). I have few questions Are there any major performance (or other) improvements in Lucene 3.0.3/Lucene 2.9.4? you can see all major changes here:

Re: Lucene Scorer Extension?

2011-01-09 Thread Simon Willnauer
you should look into this http://wiki.apache.org/solr/FunctionQuery simon On Fri, Jan 7, 2011 at 3:59 PM, dante stroe dante.st...@gmail.com wrote: Hello,     What I am trying to do is build a personalized search engine. The aim is to have the resulting documents' scores depend on users'

Re: The search response time is too loong

2010-09-27 Thread Simon Willnauer
2010/9/27 newsam new...@zju.edu.cn: I have setup a SOLR searcher instance with Tomcat 5.5.21. However, the response time is too long. Here is my scenario: 1. The index file is 8.2G. The doc num is 6110745. 2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem. I used Key:* to

Re: trie

2010-09-21 Thread Simon Willnauer
2010/9/21 Péter Király kirun...@gmail.com: You can read about it in Lucene in Action second edition. have a look at http://www.lucidimagination.com/developer/whitepaper/Whats-New-in-Apache-Lucene-3-0 page 4 to 8 should give you a good intro to the topic simon Péter 2010/9/21 Papp Richard

Re: Can I tell Solr to merge segments more slowly on an I/O starved system?

2010-09-19 Thread Simon Willnauer
On Sun, Sep 19, 2010 at 6:04 AM, Ron Mayer r...@0ape.com wrote: My system which has documents being added pretty much continually seems pretty well behaved except, it seems, when large segments get merged.     During that time the system starts really dragging, and queries that took only a

Re: No more trunk support for 2.9 indexes

2010-09-18 Thread Simon Willnauer
On Sat, Sep 18, 2010 at 4:13 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Since Lucene 3.0.2 is 'out there', does this mean the format is nailed down, : and some sort of porting is possible? : Does anyone know of a tool that can read the entire contents of a Solr index : and

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-13 Thread Simon Willnauer
On Mon, Sep 13, 2010 at 8:02 AM, Dennis Gearon gear...@sbcglobal.net wrote: BTW, what is a segment? On the Lucene level an index is composed of one or more index segments. Each segment is an index by itself and consists of several files like doc stores, proximity data, term dictionaries etc.

Re: stopwords in AND clauses

2010-09-13 Thread Simon Willnauer
On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria f...@hashref.com wrote: Let's suppose we have a regular search field body_t, and an internal boolean flag flag_t not exposed to the user. I'd like    body_t:foo AND flag_t:true this is solr right? why don't you use filterquery for you unexposed
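
A minimal sketch of the filter-query suggestion: the user-visible clause goes into q and the unexposed flag goes into fq, which are standard Solr request parameters. The base URL and the helper name are assumptions for illustration; the field names body_t and flag_t come from the question.

```python
from urllib.parse import urlencode


def build_select_url(base, user_query, filters):
    """Build a Solr select URL with the main query in `q` and one `fq` per filter."""
    params = [("q", user_query)] + [("fq", f) for f in filters]
    return base + "/select?" + urlencode(params)


# The base URL is a placeholder for a local Solr instance.
url = build_select_url("http://localhost:8983/solr", "body_t:foo", ["flag_t:true"])
```

Because the flag is sent as a filter rather than as part of q, it restricts the result set without being mixed into the user's query string.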

Re: mm=0?

2010-09-13 Thread Simon Willnauer
On Mon, Sep 13, 2010 at 8:07 PM, Lance Norskog goks...@gmail.com wrote: Java Swing no longer gives ads for swinger's clubs. damned no i have to explicitly enter it?! - argh! :) simon On Mon, Sep 13, 2010 at 9:37 AM, Dennis Gearon gear...@sbcglobal.net wrote: I just tried several searches

Re: Field names

2010-09-13 Thread Simon Willnauer
On Tue, Sep 14, 2010 at 1:39 AM, Peter A. Kirk p...@alpha-solutions.dk wrote: Fantastic - that is exactly what I was looking for! But here is one thing I don't understand: If I call the url: http://localhost:8983/solr/admin/luke?numTerms=10&fl=name Some of the result looks like: lst

Re: Solr memory use, jmap and TermInfos/tii

2010-09-12 Thread Simon Willnauer
On Sun, Sep 12, 2010 at 1:51 AM, Michael McCandless luc...@mikemccandless.com wrote: On Sat, Sep 11, 2010 at 11:07 AM, Burton-West, Tom tburt...@umich.edu wrote:  Is there an example of how to set up the divisor parameter in solrconfig.xml somewhere? Alas I don't know how to configure terms

Re: Solr memory use, jmap and TermInfos/tii

2010-09-12 Thread Simon Willnauer
On Sun, Sep 12, 2010 at 12:42 PM, Robert Muir rcm...@gmail.com wrote: On Sat, Sep 11, 2010 at 7:51 PM, Michael McCandless luc...@mikemccandless.com wrote: On Sat, Sep 11, 2010 at 11:07 AM, Burton-West, Tom tburt...@umich.edu wrote:  Is there an example of how to set up the divisor

Re: How to give path in SCRIPT tag?

2010-09-07 Thread Simon Willnauer
ankita, your question seems to be somewhat unrelated to solr / lucene and should be asked somewhere else but not on this list. Please try to keep the focus of your questions to Solr related topics or use java-user@ for lucene related topics. Thanks, Simon On Tue, Sep 7, 2010 at 3:46 PM,

Re: minMergeDocs supported ?

2010-08-24 Thread Simon Willnauer
Hey, I guess this option has been removed in Lucene 2.0 - you could look at maxBufferedDocs and ramBufferSizeMB to control how many documents / heap space is used to buffer documents before they are flushed and merged into a new segment. Don't know what you are trying to do but those are the

Re: search multiple default fields

2010-07-06 Thread Simon Willnauer
Have a look at http://wiki.apache.org/solr/DisMaxRequestHandler and http://wiki.apache.org/solr/DisMaxRequestHandler#qf_.28Query_Fields.29 that might help with what you are looking for... simon On Tue, Jul 6, 2010 at 3:48 AM, bluestar sea...@butterflycluster.net wrote: hi there, is it

Re: Not split a field on whitespaces?

2010-07-05 Thread Simon Willnauer
Use solr.StrField or solr.KeywordTokenizerFactory instead. simon On Mon, Jul 5, 2010 at 2:47 PM, Sebastian Funk qbasti.f...@googlemail.com wrote: Hey there, I might be just too blind to see this, but isn't it possible to have a solr.TextField not getting filtered in any way. That means the

Re: Weird memory error.

2007-11-21 Thread Simon Willnauer
Actually when I look at the error message, this has nothing to do with memory. The error message: java.lang.OutOfMemoryError: unable to create new native thread means that the OS cannot create any new native threads for this JVM. So the limit you are running into is not the JVM Memory. I guess

Re: Weird memory error.

2007-11-20 Thread Simon Willnauer
I'm using the Eclipse TPTP platform and I'm very happy with it. You will also find good howto or tutorial pages on the web. - simon On Nov 20, 2007 5:29 PM, Brian Carmalt [EMAIL PROTECTED] wrote: Can you recommend one? I am not familiar with how to profile under Java. Yonik Seeley schrieb:

Re: Extending Solr's Admin functionality

2006-09-27 Thread Simon Willnauer
@Otis: I suggest we go a bit more in detail about the features solr should expose via JMX and talk about the contribution. I'd love to extend solr with more JMX support. On 9/27/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 9/26/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: On the other hand,

Re: Extending Solr's Admin functionality

2006-09-27 Thread Simon Willnauer
, and allowing users to plug it in or not. If I'm understanding that correctly then I'm quite +1 on JMX! And I suppose some of these adapters already have built in web service interfaces. Erik On Sep 27, 2006, at 6:20 AM, Simon Willnauer wrote: @Otis: I suggest we go a bit more in detail about

Re: Extending Solr's Admin functionality

2006-09-24 Thread Simon Willnauer
I followed the discussion for the last 3 days and I am still wondering why nobody has come up with an integration of solr monitoring and administration functionality using Java's fantastic management extension, JMX. I joined a team 2 years ago building a distributed webspider / searcher (similar to nutch).

Re: update partial document

2006-09-18 Thread Simon Willnauer
I'm not into the code of Solr at all but I know that Solr is based on the lucene core which has no kind of update mechanism. To update a document using lucene you have to delete and reinsert the document. That might be the reason for the solr behaviour as well. You should consider that lucene is
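
The delete-and-reinsert behaviour described here can be modelled with a toy in-memory index. The class below is purely illustrative and assumes nothing about Lucene's or Solr's real classes; it only shows why sending a partial document replaces the whole stored document instead of patching it.

```python
class ToyIndex:
    """A tiny stand-in for an index that has no in-place update."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id, fields):
        # Store a copy so later changes to `fields` do not leak in.
        self._docs[doc_id] = dict(fields)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def update(self, doc_id, fields):
        # "Update" is just a delete followed by a fresh add.
        self.delete(doc_id)
        self.add(doc_id, fields)

    def get(self, doc_id):
        return self._docs.get(doc_id)


index = ToyIndex()
index.add("1", {"title": "old title", "body": "some text"})
index.update("1", {"title": "new title"})
stored = index.get("1")
```

After the update, the body field is gone because it was not supplied again; the caller has to resend the complete document.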

Re: does solr know classpath

2006-09-16 Thread Simon Willnauer
/solrwebapp/WEB-INF/lib to point out one solution best regards simon On 9/16/06, James liu [EMAIL PROTECTED] wrote: i set classpath where i put lucene-analyzers-2.0.0.jar...i can use it. but solr not find it.. where i should put it in?

Solr in production env.

2006-09-11 Thread Simon Willnauer
Hello, I almost convinced my boss to use Solr in production for a new project and hopefully for lots of following projects but I'm a bit confused that there is no release available for download. Is Solr still in a beta state? Are there solr servers in production? Is it recommendable to use it in