Re: [Solr 4.0] what is stored in .tim index file format?

2012-04-17 Thread Robert Muir
This is the term dictionary for 4.0's default codec (currently uses the BlockTree implementation). .tim is the on-disk portion of the terms (similar in function to .tis in previous releases); .tip is the in-memory "terms index" (similar in function to .tii in previous releases). On Tue, Apr 17, 2012 at

Re: codecs for sorted indexes

2012-04-12 Thread Robert Muir
On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas wrote: > Hello Michael, > > Yes, we are pre-sorting the documents before adding them to the index. We > have a score associated to every document (not an IR score but a > document-related score that reflects its "importance"). Therefore, the

Re: Suggester not working for digit starting terms

2012-04-12 Thread Robert Muir
On Thu, Apr 12, 2012 at 3:52 PM, jmlucjav wrote: > Well now I am really lost... > > 1. yes I want to suggest whole sentences too, I want the tokenizer to be > taken into account, and apparently it is working for me in 3.5.0?? I get > suggestions that are like "foo bar abc".  Maybe what you mention

Re: [ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Muir
> -----Original Message----- > From: Robert Muir [mailto:rm...@apache.org] > Sent: Thursday, April 12, 2012 1:33 PM > To: d...@lucene.apache.org; solr-user@lucene.apache.org; Lucene mailing list; > announce > Subject: [ANNOUNCE] Apache Solr 3.6 released > > 12 April 2012, Apac

[ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Muir
12 April 2012, Apache Solr™ 3.6.0 available The Lucene PMC is pleased to announce the release of Apache Solr 3.6.0. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, facet

Re: Suggester not working for digit starting terms

2012-04-11 Thread Robert Muir
On Wed, Apr 11, 2012 at 4:37 PM, jmlucjav wrote: > Just to be sure, reproduced this with example config from 3.5. > Regardless of your tokenizer, be aware that with this version of solr its going to split up terms based on 'identifier rules' (including splitting on whitespace). This is because su

Re: Preserving punctuation tokens with ICUTokenizerFactory

2012-04-10 Thread Robert Muir
you can actually plug in customized grammars and stuff like that, but the simplest approach is to configure mappingcharfilter before your tokenizer, with mappings like: "c++" => "cplusplus" On Tue, Apr 10, 2012 at 11:50 AM, Demian Katz wrote: > It has been brought to my attention that ICUTokenize
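The idea in this reply (rewrite the raw text before the tokenizer sees it) can be sketched in a few lines. This is a minimal Python illustration, not Solr's MappingCharFilter: the names `CHAR_MAPPINGS`, `apply_mappings`, `tokenize`, and `analyze` are hypothetical, and the regex tokenizer is only a stand-in for a tokenizer that drops punctuation.

```python
import re

# Hypothetical mapping table, mirroring the "c++" => "cplusplus" example above.
CHAR_MAPPINGS = {"c++": "cplusplus"}


def apply_mappings(text, mappings=CHAR_MAPPINGS):
    """Rewrite the raw text before tokenization, longest source strings first."""
    for source in sorted(mappings, key=len, reverse=True):
        text = text.replace(source, mappings[source])
    return text


def tokenize(text):
    """Stand-in tokenizer: keeps runs of letters and digits, drops punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze(text):
    """Apply the character mappings first, then tokenize the rewritten text."""
    return tokenize(apply_mappings(text))


print(tokenize("i like c++"))  # the plus signs are lost
print(analyze("i like c++"))   # the mapped term survives tokenization
```

Without the mapping step the tokenizer discards the punctuation and "c++" collapses to "c"; with it, the term reaches the tokenizer in a form that survives.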

Re: Highlighting a font without bold or italic modes

2012-03-13 Thread Robert Muir
Google and Baidu highlight Chinese queries by making text red. On Mon, Mar 12, 2012 at 11:50 PM, Lance Norskog wrote: > How do you highlight terms in languages without boldface or italic > modes? Maybe raise the text size a couple of sizes just for that word? > > > -- > Lance Norskog > goks...@gm

Re: Solr 4.0 and production environments

2012-03-07 Thread Robert Muir
> get it fixed*. Not only they will fix it, they will thank you for > bringing it up! > > I can, as an old user, 100 % vouch what Robert said below. > > Simply, just go for it, test your application a bit and make your users happy. > > > > > On Wed, Mar 7, 2012 at 5

Re: Solr 4.0 and production environments

2012-03-07 Thread Robert Muir
On Wed, Mar 7, 2012 at 11:47 AM, Dirceu Vieira wrote: > Hi All, > > Has anybody started using Solr 4.0 in production environments? Is it stable > enough? > I'm planning to create a proof of concept using solr 4.0, we have some > projects that will gain a lot with features such as near real time se

Re: Using multiple DirectSolrSpellcheckers for a query

2012-03-07 Thread Robert Muir
On Wed, Jan 25, 2012 at 12:55 PM, Nalini Kartha wrote: > > Is there any reason why Solr doesn't support using multiple spellcheckers > for a query? Is it because of performance overhead? > Thats not the case really, see https://issues.apache.org/jira/browse/SOLR-2926 I think the issue is that th

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread Robert Muir
On Fri, Mar 2, 2012 at 9:41 AM, Ahmet Arslan wrote: > >> Robert, I just tried with >> 3.6-SNAPSHOT 1296203 from svn - the problem is >> still there. >> >> I am just about to leave for a vacation. I'll try to open a >> JIRA issue this >> evening. > > Andrew, thanks for providing files. I also re-pr

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-02 Thread Robert Muir
On Fri, Mar 2, 2012 at 7:37 AM, andrew wrote: > I was able to create a test case. > > We are querying ranges of documents. When I tried to isolate the document > that causes trouble, I found it happens with exactly every second request > only for a single document query (it fails constantly when r

Re: Spelling Corrector Algorithm

2012-03-01 Thread Robert Muir
On Thu, Mar 1, 2012 at 6:43 AM, Husain, Yavar wrote: > Hi > > For spell checking component I set extendedResults to get the frequencies and > then select the word with the best frequency. I understand the spell check > algorithm based on Edit Distance. For an example: > > Query to Solr: Marien >

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Robert Muir
eb 23, 2012, at 11:39 AM, Robert Muir [via Lucene] wrote: > >> Please attach your docs if you dont mind. >> >> I worked up tests for this (in general for ANY phrase query, >> increasing the slop should never remove results, only potentially >> enlarge them). >> >

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Robert Muir
Dushay wrote: > Robert, > > I will create a jira issue with the documentation.  FYI, I tried ps values of > 3, 2, 1 and 0 and none of them worked with dismax;   For lucene QueryParser, > only the value of 0 got results. > > - Naomi > > > On Feb 23, 2012, at 11:12 AM,

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Robert Muir
n revolv through the antholog"~3 > > NO result > > > >> lucene QueryParser: >> >> URL:  q=all_search:"The Beatles as musicians : Revolver through the >> Anthology" >> final query:  all_search:"the beatl as musician revolv through

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Robert Muir
On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay wrote: > Jonathan has brought it to my attention that BOTH of my failing searches > happen to have 8 terms, and one of the terms is repeated: > >  "The Beatles as musicians : Revolver through the Anthology" >  "Color-blindness [print/digital]; its dan

Re: custom scoring

2012-02-16 Thread Robert Muir
On Thu, Feb 16, 2012 at 8:34 AM, Carlos Gonzalez-Cadenas wrote: > Hello all: > > We'd like to score the matching documents using a combination of SOLR's IR > score with another application-specific score that we store within the > documents themselves (i.e. a float field containing the app-specifi

Re: Search for hashtags and mentions

2012-02-15 Thread Robert Muir
On Wed, Feb 15, 2012 at 2:04 PM, Rohit wrote: > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" > preserveOriginal="1" handleAsChar="@#"/> There is no such parameter as 'handleAsChar'. If you want to do this, you need to u

Re: Solr / Tika Integration

2012-02-10 Thread Robert Muir
On Fri, Feb 10, 2012 at 6:18 AM, Dirk Högemann wrote: > > Our suggest component and parts of our search is getting hard to use by > this. Any other ideas? > Looks like https://issues.apache.org/jira/browse/PDFBOX-371 The title of the issue is a bit confusing (I don't think it should go to hyphen

Re: custom TokenFilter

2012-02-09 Thread Robert Muir
On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson wrote: > Again thanks.  I'll take a stab at that are you aware of any > resources/examples of how to do this?  I figured I'd start with > WhiteSpaceTokenizer but wasn't sure if there was a simpler place to > start. > Well, easiest is if you can build

Re: custom TokenFilter

2012-02-09 Thread Robert Muir
On Thu, Feb 9, 2012 at 8:28 PM, Jamie Johnson wrote: > Thanks Robert, I'll take a look there.  Does it sound like I'm on the > right track with what I'm implementing, in other words is a > TokenFilter appropriate or is there something else that would be a > better fit for what I've descr

Re: custom TokenFilter

2012-02-09 Thread Robert Muir
If you are writing a custom tokenstream, I recommend using some of the resources in Lucene's test-framework.jar to test it. These find lots of bugs! (including thread-safety bugs) For a filter: I recommend to use the assertions in BaseTokenStreamTestCase: assertTokenStreamContents, assertAnalyzesT

Re: Trying to understand SOLR memory requirements

2012-01-19 Thread Robert Muir
countryid, > c.plainname as countryname, p.timezone as timezone, r.id as regionid, > r.plainname as regionname from places p, regions r, countries c, cities c2 > where c2.id = p.cityid AND p.settingid = 1 AND p.regionid > 1 AND > p.countryid=c.id AND r.id=p.regionid" >            transformer="TemplateTransformer"> >             >             >    

Re: Trying to understand SOLR memory requirements

2012-01-19 Thread Robert Muir
>> at >> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) >>  at >> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) >> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) >

Re: Trying to understand SOLR memory requirements

2012-01-17 Thread Robert Muir
how long it will take to > get a fix? Would I be better switching to trunk? Is trunk stable enough for > someone who's very much a SOLR novice? > > Thanks, > Dave > > On Mon, Jan 16, 2012 at 10:08 PM, Robert Muir wrote: > >> looks like https://issues.apache.org/j

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread Robert Muir
looks like https://issues.apache.org/jira/browse/SOLR-2888. Previously, FST would need to hold all the terms in RAM during construction, but with the patch it uses offline sorts/temporary files. I'll reopen the issue to backport this to the 3.x branch. On Mon, Jan 16, 2012 at 8:31 PM, Dave wrot

Re: GermanAnalyzer

2012-01-14 Thread Robert Muir
On Sat, Jan 14, 2012 at 5:09 PM, Lance Norskog wrote: > Has the GermanAnalyzer behavior changed at all? This is another kind > of mismatch, and it can cause very subtle problems.  If text is > indexed and queried using different Analyzers, queries will not do > what you think they should. It acts

Re: GermanAnalyzer

2012-01-14 Thread Robert Muir
On Sat, Jan 14, 2012 at 12:58 PM, wrote: > Hi, > > I'm switching from Lucene 2.3 to Solr 3.5. I want to reuse the existing > indexes (huge...). If you want to use a Lucene 2.3 index, then you should set this in your solrconfig.xml: <luceneMatchVersion>LUCENE_23</luceneMatchVersion> > > In Lucene I use an untweaked org.apache.lucene.a

Re: feature of FST version of SynonymFilter affects Highlighter

2011-12-26 Thread Robert Muir
On Mon, Dec 26, 2011 at 10:54 AM, Koji Sekiguchi wrote: > I don't have JUnit test case. What I tried was: > > I have indexing time synonym definition: > > nhl, national hockey league > > and I indexed "I like national hockey league". > > Then I searched nhl with hl=on, I got an unwanted highlight

Re: feature of FST version of SynonymFilter affects Highlighter

2011-12-26 Thread Robert Muir
The old one didn't really handle this correctly either. Koji, what is the highlighting problem? Can we have a test case? 2011/12/26 Koji Sekiguchi : > I found that SynonymFilter javadoc says: > > "Matches single or multi word synonyms in a token stream. > This token stream cannot properly handle

Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Robert Muir
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote: > It seems like there is some weird stuff going on when folding the > string, it can be seen in the analysis view, too: > > http://i.imgur.com/6B2Uh.png > I created a bug here, https://issues.apache.org/jira/browse/LUCENE-3642 Thanks for the screensho

Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams

2011-12-12 Thread Robert Muir
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote: > The end offset remains 11 even after folding and transforming "æ" to > "ae", which seems wrong to me. End offsets refer to the *original text* so this is correct. What is wrong, is EdgeNGramsFilter. See how it turns that 11 to a 12? > > I also stum
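The point that offsets refer to the original text, not to the folded term, can be illustrated with a small Python sketch. It is not the ICU folding filter: the `FOLDS` table, the whitespace tokenizer, and the `analyze` function are assumptions made up for this illustration.

```python
import re

# Illustrative folding table only; ICU's real folding data is far larger.
FOLDS = {"æ": "ae"}


def analyze(text):
    """Return (term, start, end) tuples; offsets always index the ORIGINAL text."""
    tokens = []
    for match in re.finditer(r"\S+", text):
        term = match.group().lower()
        for source, target in FOLDS.items():
            term = term.replace(source, target)
        # The term may grow or shrink, but the offsets are left untouched.
        tokens.append((term, match.start(), match.end()))
    return tokens


text = "Encyclopædia Britannica"
for term, start, end in analyze(text):
    # A highlighter slices the original text, so it needs original offsets.
    print(term, start, end, repr(text[start:end]))
```

The folded term "encyclopaedia" is 13 characters long, yet its end offset stays at 12, because a highlighter uses the offsets to slice the stored original text. A filter that adjusts the end offset to match the folded length would point past the end of the original token.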

Re: codec="Pulsing" per field broken?

2011-12-11 Thread Robert Muir
On Sun, Dec 11, 2011 at 11:34 AM, eks dev wrote: > on the latest trunk, my schema.xml with field type declaration > containing //codec="Pulsing"// does not work any more (throws > exception from FieldType). It used to work with approx. a month old > trunk version. > > I didn't dig deeper, can be th

Re: Solr Lucene Index Version

2011-12-08 Thread Robert Muir
On Thu, Dec 8, 2011 at 12:55 PM, Jamie Johnson wrote: > Thanks Andrzej.  I'll continue to follow the portable format JIRA > along with 3622, are there any others that you're aware of that are > blockers that would be useful to watch? > There is a lot to be done, particularly norms and deleted doc

Re: Solr Lucene Index Version

2011-12-08 Thread Robert Muir
On Thu, Dec 8, 2011 at 10:46 AM, Mark Miller wrote: > > On Dec 8, 2011, at 8:50 AM, Jamie Johnson wrote: > >> Isn't the codec stuff merged with trunk now? > > Robert merged this recently AFAIK. > true but that issue only moved the majority of the rest of the index (stored fields, term vectors, fi

Re: RegexQuery performance

2011-12-08 Thread Robert Muir
On Thu, Dec 8, 2011 at 11:01 AM, Jay Luker wrote: > Hi, > > I am trying to provide a means to search our corpus of nearly 2 > million fulltext astronomy and physics articles using regular > expressions. A small percentage of our users need to be able to > locate, for example, certain types of iden

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread Robert Muir
On Tue, Nov 29, 2011 at 9:21 AM, elisabeth benoit wrote: > ok, thanks. > > I think it would be a nice improvement to consider inversion as distance = > 1, since it's such a common mistake. The distance = 2 makes it difficult to > correct transpositions on small words (for instance, the DirectSpellChe

Re: Solr 4.0 Levenshtein distance algorithm for DirectSpellChecker

2011-11-29 Thread Robert Muir
On Tue, Nov 29, 2011 at 8:07 AM, elisabeth benoit wrote: > Hello, > > I'd like to know if the Levenshtein distance algorithm used by Solr 4.0 > DirectSpellChecker (working quite well I must say) is considering an > inversion as distance = 1 or distance = 2? > > For instance, if I write Monteruil a
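The difference the question is about can be worked through by hand. The sketch below is a textbook dynamic-programming edit distance in Python with an optional adjacent-transposition step (the "optimal string alignment" variant); it is not the automaton-based code DirectSpellChecker uses, and the function name `edit_distance` and its `transpositions` flag are assumptions for illustration.

```python
def edit_distance(a, b, transpositions=False):
    """Classic Levenshtein distance; optionally count a swap of two
    adjacent characters as a single edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("Monteruil", "Montreuil"))                       # 2
print(edit_distance("Monteruil", "Montreuil", transpositions=True))  # 1
```

With plain Levenshtein the swapped "er"/"re" costs two edits; with transpositions it costs one. The follow-up message in this thread ("The distance = 2 makes it difficult...") suggests the spellchecker counted it as 2 at the time.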

Re: DirectSolrSpellChecker on request specified field.

2011-11-28 Thread Robert Muir
On Mon, Nov 28, 2011 at 4:36 PM, Phil Hoy wrote: > Added issue: https://issues.apache.org/jira/browse/SOLR-2926 > Please let me know if more information needs adding to JIRA. > > Phil > Thanks, I'll followup on the issue -- lucidimagination.com

Re: DirectSolrSpellChecker on request specified field.

2011-11-28 Thread Robert Muir
technically it could? I'm just not sure if the current spellchecking apis allow for it? But maybe someone has a good idea on how to easily expose this. I think its a good idea. Care to open a JIRA issue? On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy wrote: > Hi, > > Can the DirectSolrSpellChecker b

Re: trouble with CollationKeyFilter

2011-11-27 Thread Robert Muir
On Sat, Nov 26, 2011 at 8:43 PM, Michael Sokolov wrote: > That's great news!  We can't really track trunk, but it looks like this is > targeted for 3.6, right? As a short-term alternative, I was considering > using ICUFoldingFilter; this won't preserve some of the finer distinctions, > but will at

Re: trouble with CollationKeyFilter

2011-11-25 Thread Robert Muir
On Wed, Nov 23, 2011 at 11:22 PM, Michael Sokolov wrote: > Thanks for confirming that, and laying out the options, Robert. > FYI: Erick committed the multiterm stuff, so I opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2919 -- lucidimagination.com

Re: trouble with CollationKeyFilter

2011-11-23 Thread Robert Muir
hi, locale sensitive range queries don't work with these filters, only sort, although erick erickson has a patch that will enable this (the lowercasing wildcards patch, then you could add this filter to your multiterm chain). separately locale range queries and sort both work easily on trunk (wit

Re: [Solr-3.4] Norms file size is large in case of many unique indexed fields in index

2011-11-10 Thread Robert Muir
what is the point of a unique indexed field? If for all of your fields, there is only one possible document, you don't need length normalization, scoring, or a search engine at all... just use a HashMap? On Thu, Nov 10, 2011 at 7:42 AM, Ivan Hrytsyuk wrote: > Hello everyone, > > We have large in

Re: SolrCloud with large synonym files

2011-11-02 Thread Robert Muir
On Wed, Nov 2, 2011 at 8:53 AM, Phil Hoy wrote: > It is solr 4.0 and uses the new FSTSynonymFilterFactory i believe but defers > to ZkSolrResourceLoader to load the synonym file when in cloud mode. > Phil > FYI: The synonyms implementation supports multiple formats (currently "solr" and "wordnet

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Robert Muir
On Fri, Oct 28, 2011 at 8:10 PM, Jason Rutherglen wrote: >> Otherwise we have "flexible indexing" where "flexible" means "slower >> if you do anything but the default". > > The other encodings should exist as modules since they are pluggable. > 4.0 can ship with the existing codec.  4.1 with addit

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Robert Muir
On Fri, Oct 28, 2011 at 5:03 PM, Jason Rutherglen wrote: > +1 I suggested it should be backported a while back.  Or that Lucene > 4.x should be released.  I'm not sure what is holding up Lucene 4.x at > this point, bulk postings is only needed useful for PFOR. This is not true, most modern index

Re: changing omitNorms on an already built index

2011-10-27 Thread Robert Muir
On Thu, Oct 27, 2011 at 6:00 PM, Simon Willnauer wrote: > we are not actively removing norms. if you set omitNorms=true and > index documents they won't have norms for this field. Yet, other > segment still have norms until they get merged with a segment that has > no norms for that field ie. omit

Re: stemEnglishPossessive and contractions

2011-10-19 Thread Robert Muir
The word delimiter filter also does other things, it treats ' as punctuation by default. So it normally splits on ', except if its 's (in this case it removes the 's completely if you use this stemEnglishPossessive). There are a couple approaches you can use: 1. you can keep worddelimiterfilter wi

Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread Robert Muir
On Wed, Oct 5, 2011 at 3:03 PM, David Ryan wrote: > Do you mean both BM25 and BM25F? > > No, BM25F and other "fielded" or structured models are somewhat different. In these model, if you have two fields (body/title) you are saying that "dogs" in body is actually the same term as "dogs" in title.

Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread Robert Muir
On Wed, Oct 5, 2011 at 2:23 PM, David Ryan wrote: > Hi, > > According to the JIRA issue 2959, > https://issues.apache.org/jira/browse/LUCENE-2959 > > BM25 will be included in the next release of LUCENE. > > 1). Will BM25F be included in the next release as well as part > of LUCENE-2959? should be

Re: Indexing PDF

2011-10-04 Thread Robert Muir
Your persian pdf problem is different, and already taken care of in pdfbox trunk https://issues.apache.org/jira/browse/PDFBOX-1127 On Tue, Oct 4, 2011 at 2:04 PM, ahmad ajiloo wrote: > I have this problem too, in indexing some of persian pdf files. > > 2011/10/4 Héctor Trujillo > >> Hi all, I'm

Re: payloads - Inconsistency between the document score and the explain score

2011-09-27 Thread Robert Muir
https://issues.apache.org/jira/browse/LUCENE-3421 Note: if you are using this 'includeSpanScore=false' (which I think you are, as thats where the bug applies), be aware this means the score is *only* the result of your payload, boosts, tf, length normalization, idf, none of this is incorporated in

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-21 Thread Robert Muir
On Tue, Sep 20, 2011 at 12:32 PM, Michael McCandless wrote: > > Or: is it possible you reopened the reader several times against the > index (ie, after committing from Solr)?  If so, I think 2.9.x never > unmaps the mapped areas, and so this would "accumulate" against the > system limit. In order

Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-20 Thread Robert Muir
On Mon, Sep 19, 2011 at 9:57 AM, Burton-West, Tom wrote: > Thanks Robert, > > Removing "set" from " setMaxMergedSegmentMB" and using "maxMergedSegmentMB" > fixed the problem. > ( Sorry about the multiple posts.  Our mail server was being flaky and the > client lied to me about whether the messag

Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-16 Thread Robert Muir
On Fri, Sep 16, 2011 at 6:53 PM, Burton-West, Tom wrote: > Hello, > > The TieredMergePolicy has become the default with Solr 3.3, but the > configuration in the example uses the mergeFactor setting which applies to the > LogByteSizeMergePolicy. > > How is the mergeFactor interpreted by the Tiered

Re: Faceted Search Patent Lawsuit - Please Read

2011-08-17 Thread Robert Muir
On Wed, Aug 17, 2011 at 3:12 AM, Tomas Zerolo wrote: > On Tue, Aug 16, 2011 at 03:58:29PM -0400, Grant Ingersoll wrote: >> I know you mean well and are probably wondering what to do next [...] > > Still, a short heads-up like Johnson's would seem OK? > > After all, this is of concern to us all. >

Re: Exception DirectSolrSpellChecker when using spellcheck.q

2011-08-15 Thread Robert Muir
what subversion revision are you using? I think you just need to svn up, as from the line number I can tell its before I fixed this bug in trunk :) On Fri, Aug 12, 2011 at 11:36 AM, O. Klein wrote: > Spellchecker works fine, but when using spellcheck.q it gives following > exception (queryAnalyze

Re: Can't mix Synonyms with Shingles?

2011-08-10 Thread Robert Muir
On Wed, Aug 10, 2011 at 7:10 PM, Jeff Wartes wrote: > > After some further playing around, I think I understand what's going on. > Because the SynonymFilterFactory pays attention to term position when it > inserts a multi-word synonym, I had assumed it scanned for matches in a way > that respec

Re: Loading huge synonym list in Solr

2011-08-04 Thread Robert Muir
https://issues.apache.org/jira/browse/LUCENE-3233 On Thu, Aug 4, 2011 at 7:24 PM, Arun Atreya wrote: > Hello, > > I would like to know the best way to load a huge synonym list into Solr. > > I would like to do concept indexing (a.k.a category indexing) with Solr. For > example, I want to be able

Re: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory'

2011-08-02 Thread Robert Muir
did you add the analysis-extras jar itself? thats what has this factory. On Tue, Aug 2, 2011 at 5:03 AM, Satish Talim wrote: > I am using Solr 3.3 on a Windows box. > > I want to use the solr.ICUTokenizerFactory in my schema.xml and added the > fieldType name="text_icu" as per the URL - > http://

Re: Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Robert Muir
On Wed, Jul 27, 2011 at 4:12 PM, Fuad Efendi wrote: > Thanks Robert!!! > > "Submitted On 26-JUL-2011" - yesterday. > > This option was popular in Hbase... Then you should tell them also, not to use it, if they want their loops to work. -- lucidimagination.com

Re: Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Robert Muir
Don't use this option, these optimizations are buggy: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134 On Wed, Jul 27, 2011 at 3:56 PM, Fuad Efendi wrote: > Anyone tried this? I can not start Solr-Tomcat with following options on > Ubuntu: > > JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m

Re: fst must be non null

2011-07-11 Thread Robert Muir
ually have any synonyms, so it could indicate a configuration mistake. On Tue, Jul 12, 2011 at 12:02 AM, Stuart King wrote: > Sorry Robert, > > What does that mean? Should I be providing synonyms in my queries? > > Cheers > > Stu > > On Tue, Jul 12, 2011 at 1:49 PM,

Re: fst must be non null

2011-07-11 Thread Robert Muir
I just committed a fix for this, to warn that you are using an empty set of synonyms instead of error. On Mon, Jul 11, 2011 at 10:50 PM, Stuart King wrote: > I have been building and running against trunk. In my build I have a number > of tests, testing solr functionality within my app. > > As of

Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Robert Muir
re-open does work, but you cannot ignore its return value! see the javadocs for an example. On Tue, Jul 5, 2011 at 3:10 PM, Gabriele Kahlout wrote: > Re-open doesn't work, but open does. > > @Test >    public void testUpdate() throws IOException, > ParserConfigurationException, SAXException, Pars

[ANNOUNCE] Apache Solr 3.3

2011-06-30 Thread Robert Muir
July 2011, Apache Solr™ 3.3 available The Lucene PMC is pleased to announce the release of Apache Solr 3.3. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted searc

Re: Solr 3.1 indexing error Invalid UTF-8 character 0xffff

2011-06-27 Thread Robert Muir
On Mon, Jun 27, 2011 at 8:47 AM, Bernd Fehling wrote: > > correct!!! > but what i said, is totally different than what you said. you are still wrong.

Re: Solr 3.1 indexing error Invalid UTF-8 character 0xffff

2011-06-27 Thread Robert Muir
On Mon, Jun 27, 2011 at 8:30 AM, Bernd Fehling wrote: > Unicode U+FFFF is UTF-8 byte sequence "ef bf bf", that is right. > > But I was saying that UTF-8 0xffff (which is byte sequence "ff ff") is > illegal > and that's what the java.io.CharConversionException is complaining about. > "Invalid UTF-

Re: Solr 3.1 indexing error Invalid UTF-8 character 0xffff

2011-06-27 Thread Robert Muir
On Mon, Jun 27, 2011 at 7:11 AM, Bernd Fehling wrote: > > So there is no UTF-8 0xffff. It is illegal. > you are wrong: it is legally encoded as a three byte sequence: ef bf bf
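Both halves of this exchange (the code point named in the thread subject has a valid three-byte UTF-8 encoding, while the raw bytes "ff ff" are not valid UTF-8) can be checked with a few lines of standard-library Python. The helper name `is_valid_utf8` is an assumption introduced for this sketch.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The code point U+FFFF encodes to a three-byte sequence.
encoded = "\uffff".encode("utf-8")
print(encoded.hex(" "))                # ef bf bf

# The encoded form is valid UTF-8; the raw bytes ff ff are not.
print(is_valid_utf8(b"\xef\xbf\xbf"))  # True
print(is_valid_utf8(b"\xff\xff"))      # False
```

The two posters are talking about different things: one about the code point (which is encodable), the other about a literal byte pair that can never appear in well-formed UTF-8.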

Re: Problem with SolrTestCaseJ4

2011-06-23 Thread Robert Muir
On Thu, Jun 23, 2011 at 4:10 AM, Tarjei Huse wrote: > On 06/20/2011 01:51 PM, Robert Muir wrote: >> you must use junit 4.7.x, not junit 4.8.x > Is there a way around this? > No, the only option we have is to decide to require 4.8 > Depending on a specific Junit version

Re: size of synonyms.txt

2011-06-22 Thread Robert Muir
On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling wrote: > While trying some synonyms.txt files I noticed a huge increase of heap > usage. > > synonyms_1.txt --> 6645 lines (2826104 bytes in size) > results in 66364 entries in SynonymMap with 730MB heap usage. > Startup time about 2 minutes. > > syn

Re: Optimize taking two steps and extra disk space

2011-06-21 Thread Robert Muir
the problem is that before https://issues.apache.org/jira/browse/SOLR-2567, Solr invoked the TieredMergePolicy "setters" *before* it tried to apply these 'global' mergeFactor etc params. So, even if you set them explicitly inside the <mergePolicy>, they would then get clobbered by these 'global' params / defau

Re: PorterStemFilter kills JVM

2011-06-20 Thread Robert Muir
if you can create a issue, with a reproducible test, we can try to come up with a workaround... no promises but I'd be willing to give it a shot. On Mon, Jun 20, 2011 at 10:11 AM, Bernd Fehling wrote: > > Now this is a good one, PorterStemFilter kills JVM (reproducible). > > Should I post this on

Re: Problem with SolrTestCaseJ4

2011-06-20 Thread Robert Muir
you must use junit 4.7.x, not junit 4.8.x On Mon, Jun 20, 2011 at 6:21 AM, Jakob Vad Nielsen wrote: > Hi, > > I'm trying to create some integrations tests within my project using JUnit > and the SolrTestCaseJ4 (from Solr-test-framework 3.2.0) helper class. The > problem is that I'm getting an Ass

Re: score of Infinity on dismax query

2011-06-19 Thread Robert Muir
This is a bug, thanks for including all the information necessary to reproduce! https://issues.apache.org/jira/browse/LUCENE-3215 On Sun, Jun 19, 2011 at 10:24 PM, Chris Book wrote: > Hello, I have a solr search server running and in at least one very rare > case, I'm seeing a strange scoring re

Re: It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Robert Muir
On Thu, Jun 16, 2011 at 3:23 PM, Gabriele Kahlout wrote: >> I'm trying to assess the impact of coord (search-time) on Qtime. In one > implementation coord returns 1, while in another it's actually computed. On query time? coord should be really cheap (unless your impl does something like calcula

Re: International filters/tokenizers doing too much

2011-06-14 Thread Robert Muir
On Tue, Jun 14, 2011 at 7:07 PM, Shawn Heisey wrote: > Because the text in my index comes in many different languages with no > ability to know the language ahead of time, I have a need to use > ICUTokenizer and/or the CJK filters, but I have a problem with them as they > are implemented currently

[ANNOUNCE] Apache Solr 3.2

2011-06-04 Thread Robert Muir
June 2011, Apache Solr 3.2™ available The Lucene PMC is pleased to announce the release of Apache Solr 3.2. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted searc

Re: Disable IDF scoring on certain fields

2011-05-17 Thread Robert Muir
On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma wrote: > If you still want IDF for other fields then i > think you have a problem because Solr doesn't yet support per-field > similarity. > it does in trunk: https://issues.apache.org/jira/browse/SOLR-2338

Re: K-Stemmer for Solr 3.1

2011-05-16 Thread Robert Muir
On Mon, May 16, 2011 at 5:33 PM, Smiley, David W. wrote: > Lucid's KStemmer is LGPL and the Solr committers have shown that they don't > want LGPL libraries shipping with Solr. If you are intent on releasing your > changes, I suggest attaching both the modified source and the compiled jar > ont

Re: Results with and without whitespace(soccer club and soccerclub)

2011-05-13 Thread Robert Muir
On Fri, May 13, 2011 at 7:07 AM, Paul Libbrecht wrote: > I sure wish such a compound-analysis would be done with a lucene-powered > dictionary! > That would rock. > me too, but its a chicken-and-egg problem (you would have to basically index everything without decomposition to get the dictionar

Re: Has NRT been abandoned?

2011-05-01 Thread Robert Muir
On Sun, May 1, 2011 at 11:28 AM, Andy wrote: > Hi, > > I read on this mailing list previously that NRT was implemented in 4.0, it > just  wasn't ready for production yet. Then I looked at the wiki > (http://wiki.apache.org/solr/NearRealtimeSearch). It listed 2 jira issues > related to NRT: SOLR

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Robert Muir
e, "PET" might be a > synonym of "positron emission tomography", but "pet" wouldn't be. > > -Mike > > On 04/26/2011 09:51 AM, Robert Muir wrote: >> >> On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic >>  wrote: >> >>

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Robert Muir
On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic wrote: > But somehow this feels bad (well, so does sticking word variations in what's > supposed to be a synonyms file), partly because it means that the person > adding > new synonyms would need to know what they stem to (or always check it aga

Re: Problem with autogeneratePhraseQueries

2011-04-26 Thread Robert Muir
What do you have in solrconfig.xml for luceneMatchVersion? If you don't set this, then it's going to default to "Lucene 2.9" emulation so that old Solr 1.4 configs work the same way. I tried your example and it worked fine here, and I'm guessing this is probably what's happening. the default in the

Re: Good protwords.txt ?

2011-04-25 Thread Robert Muir
On Mon, Apr 25, 2011 at 2:05 PM, Otis Gospodnetic wrote: > Hi, > > Are there any good / comprehensive examples of protwords.txt for English? > Or good stemdict.txt examples that work with StemmerOverrideFilterFactory? > > Would be good to have a good example to include in Solr distribution... > I

Re: Solr - Multi Term highlighting issue

2011-04-23 Thread Robert Muir
On Sat, Apr 23, 2011 at 11:36 PM, Ramanathapuram, Rajesh wrote: > What is really weird is if I search for srchterm1 and srchterm2 > separately, the results come up fine. If I search for multiple terms, > this issue seems to happen when the terms are separated by html tags and > special characters

Re: Localized alphabetical order

2011-04-22 Thread Robert Muir
On Fri, Apr 22, 2011 at 3:09 PM, Bently Preece wrote: > What if there is no standard localization already?  The case I'm > specifically interested in is Ojibwe. > this is standard? to sort a field with a specific locale, you have to tell it the locale you want. if you use the ICU implementation y

Re: Localized alphabetical order

2011-04-22 Thread Robert Muir
On Fri, Apr 22, 2011 at 2:37 PM, Bently Preece wrote: > Thank you.  This looks like the right direction. > > I see the docs say ICUCollationKeyFilterFactory is deprecated in favor of > ICUCollationField.  So ... I'd implement a subclass of ICUCollationField, > and use that as the fieldtype in sche

Re: Localized alphabetical order

2011-04-22 Thread Robert Muir
Please see http://wiki.apache.org/solr/UnicodeCollation In general the idea is similar to how this is handled in databases: you can index collation keys into a sort field at analysis time, then you just do a standard Solr sort. However, I am not sure if your JRE provides a "haw" Locale for the Ha
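The collation-key idea in the message above can be sketched in a few lines. This is only a toy illustration in Python, not Solr or ICU code: the alphabet in ORDER, the helpers sort_key, index_docs and sorted_titles, the field name title_sort, and the sample titles are all assumptions made up for the example. A real setup would get its keys from the collation support described on the wiki page.

```python
# Toy illustration of "index a sort key, then do a plain sort".
# ORDER is a hypothetical alphabet for the example, not real ICU collation data.
ORDER = ["a", "e", "i", "o", "u", "h", "k", "l", "m", "n", "p", "w"]
RANK = {letter: i for i, letter in enumerate(ORDER)}


def sort_key(text):
    """Build a byte string whose plain byte order follows the custom alphabet."""
    # Letters missing from ORDER get rank len(ORDER), so they sort after known ones.
    return bytes(RANK.get(ch, len(ORDER)) for ch in text.lower())


def index_docs(titles):
    """'Index time': store each title together with its precomputed key."""
    return [{"title": t, "title_sort": sort_key(t)} for t in titles]


def sorted_titles(docs):
    """'Query time': an ordinary sort on the stored key field."""
    return [d["title"] for d in sorted(docs, key=lambda d: d["title_sort"])]


docs = index_docs(["kala", "aloha", "hale", "ono", "mana"])
print(sorted_titles(docs))  # ['aloha', 'ono', 'hale', 'kala', 'mana']
```

The point of the design is that the locale-specific work happens once, when the key is built; the sort itself only compares bytes.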

Re: Bug in solr.KeywordMarkerFilterFactory?

2011-04-20 Thread Robert Muir
No, this is only a bug in analysis.jsp. you can see this by comparing analysis.jsp's "dontstems bees" to using the query debug interface: "dontstems bees" "dontstems bees" PhraseQuery(text:"dontstems bee") text:"dontstems bee" On Wed, Apr 20, 2011 at 2:43 PM, Yonik Seeley wrote: > On We

Re: HTMLStripCharFilterFactory, highlighting and InvalidTokenOffsetsException

2011-04-20 Thread Robert Muir
Hi, there is a proposed patch uploaded to the issue. Maybe you can help by reviewing/testing it? 2011/4/20 Robert Gründler : > Hi all, > > i'm getting the following exception when using highlighting for a field > containing HTMLStripCharFilterFactory: > > org.apache.lucene.search.highlight.Invalid

Re: Solr 3.1 ICU filters (error loading class)

2011-04-18 Thread Robert Muir
On Mon, Apr 18, 2011 at 1:31 PM, Demian Katz wrote: > Hello, > > I'm interested in trying out the new ICU features in Solr 3.1.  However, when > I attempt to set up a field type using solr.ICUTokenizerFactory and/or > solr.ICUFoldingFilterFactory, Solr refuses to start up, issuing "Error > load

Re: AbstractSolrTestCase and Solr 3.1.0

2011-04-12 Thread Robert Muir
On Tue, Apr 12, 2011 at 6:44 AM, Tommaso Teofili wrote: > Hi all, > I am porting a series of Solr plugins previously developed for version 1.4.1 > to 3.1.0, I've written some integration tests extending the > AbstractSolrTestCase [1] utility class but now it seems that wasn't included > in the sol

Re: Encoding issue on synonyms.txt

2011-04-07 Thread Robert Muir
On Thu, Apr 7, 2011 at 2:13 PM, Siddharth Powar wrote: > Hey guys, > > I am in the process of moving to solr3.1 from solr1.4. I am having this > issue where solr3.1 now complains about the synonyms.txt file. I get the > following error: > *org.apache.solr.common.SolrException: Error loading resour

Re: Eclipse: Invalid character constant

2011-04-05 Thread Robert Muir
In Eclipse you need to set your project's character encoding to UTF-8. If you are checking out the source code from svn, you can run 'ant eclipse' from the top level, and then hit refresh on your project. It will set your encoding and your classpath up. On Tue, Apr 5, 2011 at 6:10 PM, Eric Groble

Re: FW: no results searching for stadium seating chairs

2011-03-30 Thread Robert Muir
There are some new features in 3.1 to make it easier to tune this stuff, especially: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/src/java/org/apache/solr/analysis/StemmerOverrideFilterFactory.java This takes a tab-separated list of words->stems, and sets a flag to any down
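A rough sketch of the override idea, in Python rather than Solr's Java: a tab-separated words->stems table is consulted first, and any token it rewrites is flagged so that a later stemmer leaves it alone. The function names, the sample dictionary, and the crude suffix-stripping stand-in for a stemmer are all assumptions for illustration, not the filter's actual code.

```python
def load_overrides(text):
    """Parse tab-separated 'word<TAB>stem' lines into a dict."""
    overrides = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, stem = line.split("\t")
        overrides[word] = stem
    return overrides


def apply_overrides(tokens, overrides):
    """Return (term, is_keyword) pairs; overridden terms are flagged."""
    return [(overrides[t], True) if t in overrides else (t, False) for t in tokens]


def toy_stem(pairs):
    """Stand-in stemmer: strips a trailing 's' unless the token is flagged."""
    return [
        term if keyword or not term.endswith("s") else term[:-1]
        for term, keyword in pairs
    ]


# Hypothetical dictionary; "bus" maps to itself so the toy stemmer cannot mangle it.
overrides = load_overrides("seating\tseat\nchairs\tchair\nbus\tbus\n")
tokens = ["stadium", "seating", "chairs", "bus", "tickets"]
print(toy_stem(apply_overrides(tokens, overrides)))
# ['stadium', 'seat', 'chair', 'bus', 'ticket']
```

Mapping a word to itself, as with "bus" here, is how a table like this can also protect terms from stemming.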
