[ANNOUNCE] Apache Solr 4.9.0 released

2014-06-25 Thread Robert Muir
25 June 2014, Apache Solr™ 4.9.0 available The Lucene PMC is pleased to announce the release of Apache Solr 4.9.0 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

[ANNOUNCE] Apache Solr 4.8.1 released

2014-05-20 Thread Robert Muir
May 2014, Apache Solr™ 4.8.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.8.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

[ANNOUNCE] Apache Solr 4.7.2 released.

2014-04-15 Thread Robert Muir
April 2014, Apache Solr™ 4.7.2 available The Lucene PMC is pleased to announce the release of Apache Solr 4.7.2 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: Unable to get offsets using AtomicReader.termPositionsEnum(Term)

2014-03-10 Thread Robert Muir
Hello, I think you are confused between two different index structures, probably because of the name of the options in solr. 1. indexing term vectors: this means given a document, you can go lookup a miniature inverted index just for that document. That means each document has term vectors which

Re: ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-05 Thread Robert Muir
I debugged the PDF a little. FWIW, the following code (using iText) takes it to 9MB: public static void main(String args[]) throws Exception { Document document = new Document(); PdfSmartCopy copy = new PdfSmartCopy(document, new FileOutputStream("/home/rmuir/Downloads/test.pdf"));

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
you need the solr analysis-extras jar in your classpath, too. On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer fischer...@aon.at wrote: Hello, I'm migrating to solr 4.6.1 and have problems with the ICUCollationField (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100). I get consistently the

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
mentioned are loaded. Do you know which jar is supposed to contain the ICUCollationField? Best regards Thomas On 19.02.2014 at 13:54, Robert Muir wrote: you need the solr analysis-extras jar in your classpath, too. On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer fischer...@aon.at

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
-analysis-extras-4.6.1.jar in dist. Best Thomas On 19.02.2014 at 14:27, Robert Muir wrote: you need the solr analysis-extras jar itself, too. On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer fischer...@aon.at wrote: Hello Robert, I already added contrib/analysis-extras/lib

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
On Wed, Feb 19, 2014 at 10:33 AM, Thomas Fischer fischer...@aon.at wrote: Hmm, for standardization of text fields, collation might be a little awkward. I arrived there after using custom rules for a while (see RuleBasedCollator on http://wiki.apache.org/solr/UnicodeCollation) and then

[ANNOUNCE] Apache Solr 4.6.1 released.

2014-01-28 Thread Robert Muir
January 2014, Apache Solr™ 4.6.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.6.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: Tracking down the input that hits an analysis chain bug

2014-01-03 Thread Robert Muir
This exception comes from OffsetAttributeImpl (e.g. you dont need to index anything to reproduce it). Maybe you have a missing clearAttributes() call (your tokenizer 'returns true' without calling that first)? This could explain it, if something like a StopFilter is also present in the chain:

Re: Bad fieldNorm when using morphologic synonyms

2013-12-09 Thread Robert Muir
no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have
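The effect discussed in this thread can be illustrated with a little arithmetic. The following is a minimal sketch, assuming the classic length norm of 1/sqrt(numTerms) and assuming that discountOverlaps simply leaves tokens with a position increment of 0 out of the count. The class and method names are hypothetical and are not Lucene API; real norms are additionally quantized when stored, which this sketch ignores.

```java
// Sketch only: mirrors the classic 1/sqrt(numTerms) length norm, not Lucene's actual classes.
final class LengthNormSketch {

    // positionIncrements holds one entry per emitted token;
    // 0 means the token is stacked on the previous one (e.g. a stem of the original word).
    static double lengthNorm(int[] positionIncrements, boolean discountOverlaps) {
        int overlaps = 0;
        for (int inc : positionIncrements) {
            if (inc == 0) {
                overlaps++;
            }
        }
        int numTerms = discountOverlaps
                ? positionIncrements.length - overlaps
                : positionIncrements.length;
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // Four words, each followed by its stem.
        int[] stacked = {1, 0, 1, 0, 1, 0, 1, 0};    // stems emitted with increment 0
        int[] sequential = {1, 1, 1, 1, 1, 1, 1, 1}; // stems emitted with increment 1
        System.out.println("stems stacked:    " + lengthNorm(stacked, true));
        System.out.println("stems sequential: " + lengthNorm(sequential, true));
    }
}
```

With the stems stacked at increment 0, the field counts as four terms. With every token at increment 1, it counts as eight and the norm drops, which is consistent with the advice to fix the analyzer's position increments rather than the similarity.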

Re: Bad fieldNorm when using morphologic synonyms

2013-12-08 Thread Robert Muir
its accurate, you are wrong. please, look at setDiscountOverlaps in your similarity. This is really easy to understand. On Sun, Dec 8, 2013 at 7:23 AM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Robert, your last reply is not accurate. It's true that the field norms and termVectors are

Re: Bad fieldNorm when using morphologic synonyms

2013-12-06 Thread Robert Muir
Your analyzer needs to set positionIncrement correctly: sounds like its broken. On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, we implemented a morphologic analyzer, which stems words on index time. For some reasons, we index both the original word and the stem

Re: Bad fieldNorm when using morphologic synonyms

2013-12-06 Thread Robert Muir
) positions look all right (for me). 2) fieldNorm is determined by the size of the termVector, isn't it? the termVector size isn't affected by the positions. On Fri, Dec 6, 2013 at 10:46 AM, Robert Muir rcm...@gmail.com wrote: Your analyzer needs to set positionIncrement correctly: sounds like its

Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Robert Muir
which example? there are so many. On Wed, Nov 13, 2013 at 1:00 PM, Mark Miller markrmil...@gmail.com wrote: RE: the example folder It’s something I’ve been pushing towards moving away from for a long time - see https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to

Re: Background merge errors with Solr 4.4.0 on Optimize call

2013-10-29 Thread Robert Muir
I think its a bug, but thats just my opinion. i sent a patch to dev@ for thoughts. On Tue, Oct 29, 2013 at 6:09 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, so you're saying that merging indexes where a field has been removed isn't handled. So you have some documents that do have a

Re: Problems installing Solr4 in Jetty9

2013-08-17 Thread Robert Muir
On Sat, Aug 17, 2013 at 3:59 AM, Chris Collins ch...@geekychris.com wrote: I am using 4.4 in an embedded mode and found that it has a dependency on hadoop 2.0.5. alpha that in turn depends on jetty 6.1.26 which I think pre-dates electricity :-} I think this is only a test dependency ?

Re: PostingsHighlighter returning fields which don't match

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 3:53 AM, ses stew...@ssims.co.uk wrote: We are trying out the new PostingsHighlighter with Solr 4.2.1 and finding that the highlighting section of the response includes self-closing tags for all the fields in hl.fl (by default for edismax it is all fields in qf) where

Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 5:29 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : why? Those are my sort fields and they are occupying a lot of space (doubled : in this case but I see that sometimes I have three or four old segment : references) : : Is there something I can do to remove

Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 5:58 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : FieldCaches are managed using a WeakHashMap - so once the IndexReaders : associated with those FieldCaches are no longer used, they will be garbage : collected when and if the JVM's garbage collector get

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
did you do a (real) commit before trying to use this? I am not sure how this splitting works, but at least the merge option requires that. i can't see this happening unless you are somehow splitting a 0 document index (or, if the splitter is creating 0 document splits) so this is likely just a

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
Well, i meant before, but i just took a look and this is implemented differently than the merge one. In any case, i think its the same bug, because I think the only way this can happen is if somehow this splitter is trying to create a 0-document split (or maybe a split containing all deletions).

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: The splitting code calls commit before it starts the splitting. It creates a LiveDocsReader using a bitset created by the split. This reader is merged to an index using addIndexes. Shouldn't the addIndexes

Re: Is there a way to store binary data (byte[]) in DocValues?

2013-08-12 Thread Robert Muir
On Mon, Aug 12, 2013 at 8:38 AM, Mathias Lux m...@itec.uni-klu.ac.at wrote: Hi! I'm basically searching for a method to put byte[] data into Lucene DocValues of type BINARY (see [1]). Currently only primitives and Strings are supported according to [1]. I know that this can be done with a

Re: Is there a way to store binary data (byte[]) in DocValues?

2013-08-12 Thread Robert Muir
On Mon, Aug 12, 2013 at 12:25 PM, Mathias Lux m...@itec.uni-klu.ac.at wrote: Another thing for not using the SORTED_SET and SORTED implementations is, that Solr currently works with Strings on that and I want to have a small memory footprint for millions of images ... which does not go

Re: Purging unused segments.

2013-08-09 Thread Robert Muir
On Fri, Aug 9, 2013 at 7:48 PM, Erick Erickson erickerick...@gmail.com wrote: So is there a good way, without optimizing, to purge any segments not referenced in the segments file? Actually I doubt that optimizing would even do it if I _could_, any phantom segments aren't visible from the

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Robert Muir
On Mon, Aug 5, 2013 at 11:42 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I agree with you, 0xfffe is a special character, that is why I was asking : how it's handled in solr. : In my document, 0xfffe does not appear at the beginning, it's in the : content. Unless i'm

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Robert Muir
On Mon, Aug 5, 2013 at 3:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : 0xfffe is not a special character -- it is explicitly *not* a character in : Unicode at all, it is set aside as not a character. specifically so : that the character 0xfeff can be used as a BOM, and if the
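The distinction drawn in the quoted text, that U+FFFE is a noncharacter while U+FEFF is the byte order mark, can be sketched as a small check. This is only an illustration under the Unicode definition of noncharacters (U+FDD0..U+FDEF, plus every code point ending in FFFE or FFFF). The class and method names are hypothetical, and the snippet does not say whether stripping such code points before indexing is what the thread settled on.

```java
// Sketch only: helper names are hypothetical, not part of Solr or Lucene.
final class NoncharacterSketch {

    // True for the 66 Unicode noncharacters: U+FDD0..U+FDEF and U+xxFFFE / U+xxFFFF in every plane.
    static boolean isNoncharacter(int codePoint) {
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
                || (codePoint & 0xFFFE) == 0xFFFE;
    }

    // Returns a copy of the input with all noncharacters removed.
    static String stripNoncharacters(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(cp -> !isNoncharacter(cp))
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "abc\uFFFEdef";
        System.out.println(stripNoncharacters(dirty).length()); // one code unit shorter than the input
        System.out.println(isNoncharacter(0xFEFF));             // the BOM itself is a real character
    }
}
```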

Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Robert Muir
If you use wikipediatokenizer it will tag different wiki elements with different types (you can see it in the admin UI). so then followup with typetokenfilter to only filter the types you care about, and i think it will do what you want. On Tue, Jul 23, 2013 at 7:53 AM, Furkan KAMACI
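The suggestion above (tokenize with the Wikipedia tokenizer, then filter by token type) could be configured roughly as follows. This is a sketch only: the field type name `text_wiki` and the file name `wiki_types.txt` are hypothetical, and the type names to list in that file should be read from the admin UI analysis screen, as the message suggests, rather than taken from here.

```xml
<fieldType name="text_wiki" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WikipediaTokenizerFactory"/>
    <!-- wiki_types.txt (hypothetical name): one token type per line -->
    <filter class="solr.TypeTokenFilterFactory" types="wiki_types.txt" useWhitelist="true"/>
  </analyzer>
</fieldType>
```

With `useWhitelist="true"` only tokens whose type is listed in the file are kept. Without it, the listed types are the ones removed.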

Re: Using per-segment FieldCache or DocValues in custom component?

2013-07-02 Thread Robert Muir
Where do you get the docid from? Usually its best to just look at the whole algorithm, e.g. docids come from per-segment readers by default anyway so ideally you want to access any per-document things from that same segmentreader. As far as supporting docvalues, FieldCache API passes thru to

Re: Are there any plans to change example directory layout?

2013-06-11 Thread Robert Muir
If you have a good idea... Just do it. Open an issue On Jun 11, 2013 9:34 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: I think it is quite hard for beginners that basic solr example directory is competing for attention with other - nested - examples. I see quite a lot of questions on

Re: Requesting to add into a Contributor Group

2013-05-05 Thread Robert Muir
done. let us know if you have any problems. On Sat, May 4, 2013 at 10:12 AM, Krunal jariwalakru...@gmail.com wrote: Dear Sir, Kindly add me to the contributor group to help me contribute to the Solr wiki. My Email id: jariwalakru...@gmail.com Login Name: Krunal Specific changes I would

Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Robert Muir
On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen j...@mcb.dk wrote: Schema with DocValues attempt at solving problem: http://pastebin.com/Ne23NnW4 Config: http://pastebin.com/x1qykyXW This schema isn't using docvalues, due to a typo in your config. it should not be DocValues=true but

Re: Fuzzy Suggester and exactMatchFirst

2013-03-18 Thread Robert Muir
On Sun, Mar 17, 2013 at 8:19 PM, Eoghan Ó Carragáin eoghan.ocarrag...@gmail.com wrote: I can see why the Fuzzy Suggester sees college as a match for colla but expected the exactMatchFirst parameter to ensure that suggestions beginning with colla to be weighted higher than fuzzier matches. I

Re: Out of Memory doing a query Solr 4.2

2013-03-15 Thread Robert Muir
On Fri, Mar 15, 2013 at 6:46 AM, raulgrande83 raulgrand...@hotmail.com wrote: Thank you for your help. I'm afraid it won't be so easy to change the JVM version, because it is required at the moment. It seems that Solr 4.2 supports Java 1.6 at least. Is that correct? Could you find any clue of

Re: Out of Memory doing a query Solr 4.2

2013-03-14 Thread Robert Muir
On Thu, Mar 14, 2013 at 12:07 PM, raulgrande83 raulgrand...@hotmail.com wrote: JVM: IBM J9 VM(1.6.0.2.4) I don't recommend using this JVM.

Re: Using suggester for smarter phrase autocomplete

2013-03-13 Thread Robert Muir
On Wed, Mar 13, 2013 at 11:07 AM, Eric Wilson wilson.eri...@gmail.com wrote: I'm trying to use the suggester for auto-completion with Solr 4. I have followed the example configuration for phrase suggestions at the bottom of this wiki page:

Re: It seems a issue of deal with chinese synonym for solr

2013-03-12 Thread Robert Muir
I agree. Actually that top-level logic is fine. its the loop that follows thats wrong: it needs to look at position increment and do the right thing. Want to open a JIRA issue? On Mon, Mar 11, 2013 at 9:15 PM, 李威 li...@antvision.cn wrote: in org.apache.solr.parser.SolrQueryParserBase, there is

[ANNOUNCE] Apache Solr 4.2 released

2013-03-11 Thread Robert Muir
March 2013, Apache Solr™ 4.2 available The Lucene PMC is pleased to announce the release of Apache Solr 4.2 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search,

Re: MockAnalyzer in Lucene: attach stemmer or any custom filter?

2013-02-15 Thread Robert Muir
fieldName,Reader reader) in LUCENE_34. Instead, there is a method required to override: tokenStream(String fieldName, Reader reader). Is there a way of incorporating the custom filter into the TokenStream? Dmitry On Thu, Feb 14, 2013 at 5:37 PM, Robert Muir rcm...@gmail.com wrote: MockAnalyzer

Re: MockAnalyzer in Lucene: attach stemmer or any custom filter?

2013-02-14 Thread Robert Muir
MockAnalyzer is really just MockTokenizer+MockTokenFilter+ Instead you just define your own analyzer chain using MockTokenizer. This is the way all lucene's own analysis tests work: e.g.

Re: Exception when trying to save to a field with storeOffsetsWithPositions=true

2013-01-22 Thread Robert Muir
On Tue, Jan 22, 2013 at 12:23 PM, Meng Muk meng@uniqueinteractive.com wrote: If I set the field type to text_en however it works, I'm guessing something in the way the text is being analyzed is causing this exception to appear? Is there a limitation in how storeOffsetsWithPositions should

[ANNOUNCE] Apache Solr 3.6.2 released

2012-12-25 Thread Robert Muir
25 December 2012, Apache Solr™ 3.6.2 available The Lucene PMC and Santa Claus are pleased to announce the release of Apache Solr 3.6.2. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search,

Re: Japanese exact match results do not show on top of results

2012-12-20 Thread Robert Muir
I think you are hitting solr-3589. There is a vote underway for a 3.6.2 that contains this fix On Dec 20, 2012 6:29 PM, kirpakaro khem...@yahoo.com wrote: Hi folks, I am having couple of problems with Japanese data, 1. it is not properly indexing all the data 2. displaying the exact

Re: ICUTokenizer labels number as Han character?

2012-12-19 Thread Robert Muir
Your attachment didnt come through: I think the list strips them. Maybe just open a JIRA and attach your screenshots? or put them elsewhere and just include a link? As far as the ultimate behavior, I think its correct. Keep in mind tokens don't really get a script value: runs of untokenized text

Re: order question on solr multi value field

2012-12-18 Thread Robert Muir
I agree with James. Actually lucene tests will fail if a codec violates this. Actually it goes much deeper than this. From the lucene apis, when you call IndexReader.document() with your storedfieldVisitor, it must visit the fields in the original order added. so even if you do: add(title,

Re: Regexp and speed

2012-11-30 Thread Robert Muir
On Fri, Nov 30, 2012 at 12:13 PM, Roman Chyla roman.ch...@gmail.com wrote: The code here: https://github.com/romanchyla/montysolr/blob/solr-trunk/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java The benchmark should probably not be called 'benchmark', do you think it

Re: Skewed IDF in multi lingual index

2012-11-26 Thread Robert Muir
Hi again Markus. Sorry for the slow reply here. I'm confused: are you saying the score goes negative? Are you sure there is no 3.x segments? Can you check that docCount is not -1? Do you happen to have a test, can you share your modified similarity, or give more details? I just want to make sure

Re: Error loading class solr.CJKBigramFilterFactory

2012-11-14 Thread Robert Muir
On Wed, Nov 14, 2012 at 8:12 AM, Frederico Azeiteiro frederico.azeite...@cision.com wrote: To do some further testing I installed SOLR 3.5.0 using default Jetty server. When tried to start SOLR using the same schema I get: SEVERE: org.apache.solr.common.SolrException: Error loading class

Re: Error loading class solr.CJKBigramFilterFactory

2012-11-14 Thread Robert Muir
:07 WET 2012 Server Start Time:Wed Nov 14 11:40:36 WET 2012 ?? Thanks, Frederico -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, 14 November 2012 16:28 To: solr-user@lucene.apache.org Subject: Re: Error loading class

Re: Does ICUFoldingFilterFactory make CJKWidthFilterFactory unnecessary?

2012-11-14 Thread Robert Muir
Yes, its a subset On Nov 14, 2012 1:18 PM, Shawn Heisey s...@elyograg.org wrote: I am using ICUFoldingFilterFactory in my Solr schema. Now I am looking at adding CJKBigramFilterFactory, and I've noticed that it often goes with CJKWidthFilterFactory. Here are the relevant Javadocs for my

Re: URL parameters to use FieldAnalysisRequestHandler

2012-11-13 Thread Robert Muir
I think the UI uses this behind the scenes, as in no more analysis.jsp like before? So maybe try using something like burpsuite and just using the analysis UI in your browser to see what requests its sending. On Tue, Nov 13, 2012 at 11:00 AM, Tom Burton-West tburt...@umich.edu wrote: Hello, I

Re: customize solr search/scoring for performance

2012-11-12 Thread Robert Muir
Whenever I look at solr users' stacktraces for disjunctions, I always notice they get BooleanScorer2. Is there some reason for this or is it not intentional (e.g. maybe an in-order collector is always being used when it's possible at least in simple cases to allow for out-of-order hits?) When I

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Robert Muir
On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge daniel.brue...@googlemail.com wrote: Hi, i am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords file which is in the correct encoding. What makes you think that? Note: Because I can read it is not the correct answer.

Re: Skewed IDF in multi lingual index

2012-11-08 Thread Robert Muir
Hi Markus: how are the languages distributed across documents? Imagine I have a text_en field and a text_fr field. Lets say I have 100 documents, 95 are english and only 5 are french. So the text_en field is populated 95% of the time, and the text_fr 5% of the time. But the default IDF
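The 95/5 split in this message can be worked through numerically. The snippet is cut off at "But the default IDF", so the following is only a sketch of the arithmetic the setup suggests. It assumes the classic formula idf = 1 + ln(numDocs / (docFreq + 1)) and compares using the whole index size as numDocs against using only the number of documents that actually have the field. The class name, method name, and the document frequency of 3 are hypothetical.

```java
// Sketch only: classic IDF arithmetic, not a Lucene Similarity implementation.
final class IdfSketch {

    // Classic inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long indexSize = 100; // all documents in the index
        long frenchDocs = 5;  // documents that have a text_fr value
        long docFreq = 3;     // hypothetical: a French term found in 3 of the 5 French documents

        // Measured against the whole index, the term looks rare.
        System.out.println("idf vs. whole index:      " + idf(docFreq, indexSize));
        // Measured against the French documents only, it is a common term.
        System.out.println("idf vs. French docs only: " + idf(docFreq, frenchDocs));
    }
}
```

Under these assumptions a term that occurs in most of the French documents still gets a high IDF when measured against all 100 documents, because the 95 English documents count as documents that lack the term.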

Re: Where can I find an example of a 4.0 contraction file?

2012-11-01 Thread Robert Muir
You have a character encoding issue: this is telling you the file is not correctly encoded as UTF-8. On Thu, Nov 1, 2012 at 6:11 PM, dm_tim dm_...@yahoo.com wrote: I should have mentioned I tried that. I get the following exception: SEVERE: Unable to create core: core0

Re: Unable to build trunk

2012-10-31 Thread Robert Muir
you will have to use 'find' on your .ivy2 ! On Wed, Oct 31, 2012 at 6:32 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Where is that lock file located? I triggered it again (in another contrib) and will trigger it again in the future and don't want to remove my ivy cache each time

Re: Unable to build trunk

2012-10-30 Thread Robert Muir
Its not wonky. you just have to ensure you have nothing else (like some IDE, or build somewhere else) using ivy, then its safe to remove the .lck file there. I turned on this locking so that it hangs instead of causing cache corruption, but ivy only has simplelockfactory so if you ^C at the wrong

Re: Improving performance for use-case where large (200) number of phrase queries are used?

2012-10-24 Thread Robert Muir
On Wed, Oct 24, 2012 at 11:09 AM, Aaron Daubman daub...@gmail.com wrote: Greetings, We have a solr instance in use that gets some perhaps atypical queries and suffers from poor (2 second) QTimes. Documents (~2,350,000) in this instance are mainly comprised of various descriptive fields,

Re: ICUTokenizer ArrayIndexOutOfBounds

2012-10-17 Thread Robert Muir
calling reset() is mandatory part of the consumer lifecycle before calling incrementToken(), see: https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html A lot of people don't consume these correctly, thats why these tokenizers now try to throw exceptions if you do

[ANNOUNCE] Apache Solr 4.0 released.

2012-10-12 Thread Robert Muir
October 12 2012, Apache Solr™ 4.0 available. The Lucene PMC is pleased to announce the release of Apache Solr 4.0. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread Robert Muir
On Wed, Oct 10, 2012 at 9:02 AM, O. Klein kl...@octoweb.nl wrote: I don't want to tweak the threshold. For majority of cases it works fine. It's for cases where term has low frequency but is spelled correctly. If you lower the threshold you would also get incorrect spelled terms as

Re: Indexing in Solr: invalid UTF-8

2012-09-25 Thread Robert Muir
On Tue, Sep 25, 2012 at 2:02 PM, Patrick Oliver Glauner patrick.oliver.glau...@cern.ch wrote: Hi Thanks. But I see that 0xd835 is missing in this list (see my exceptions). What's the best way to get rid of all of them in Python? I am new to unicode in Python but I am sure that this use case

Re: SOLR memory usage jump in JVM

2012-09-20 Thread Robert Muir
On Thu, Sep 20, 2012 at 3:09 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: By the way while looking for upgrading to JDK7, the release notes say under section known issues about the PorterStemmer bug: ...The recommended workaround is to specify -XX:-UseLoopPredicate on the

Re: Solr - Lucene Debuging help

2012-09-10 Thread Robert Muir
On Mon, Sep 10, 2012 at 4:43 PM, BadalChhatbar badal...@yahoo.com wrote: Steve, Those document tips didn't help. errors i m getting are like (_TestUtil cannot be resolved). Did you do these two steps: 1. ant eclipse 2. refresh your project -- lucidworks.com

Re: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Robert Muir
On Fri, Sep 7, 2012 at 2:19 PM, Tom Burton-West tburt...@umich.edu wrote: Thanks Robert, I'll have to spend some time understanding the default codec for Solr 4.0. Did I miss something in the changes file? http://lucene.apache.org/core/4_0_0-BETA/ see the file formats section, especially

[ANNOUNCE] Apache Solr 4.0-beta released.

2012-08-14 Thread Robert Muir
14 August 2012, Apache Solr™ 4.0-beta available The Lucene PMC is pleased to announce the release of Apache Solr 4.0-beta. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

Re: how to retrieve total token count per collection/index

2012-08-09 Thread Robert Muir
On Thu, Aug 9, 2012 at 10:20 AM, tech.vronk t...@vronk.net wrote: Hello, I wonder how to figure out the total token count in a collection (per index), i.e. the size of a corpus/collection measured in tokens. You want to use this statistic, which tells you number of tokens for an indexed

Re: how to retrieve total token count per collection/index

2012-08-09 Thread Robert Muir
On Thu, Aug 9, 2012 at 4:24 PM, tech.vronk t...@vronk.net wrote: Is there any 3.6 equivalent for this, before I install and run 4.0? I can't seem to find a corresponding class (org.apache.lucene.index.Terms) in 3.6. unfortunately 3.6 does not carry this statistic, there is really no clear

Re: Highlighting error InvalidTokenOffsetsException: Token oedipus exceeds length of provided text sized 11

2012-08-03 Thread Robert Muir
On Fri, Aug 3, 2012 at 12:38 AM, Justin Engelman jus...@smalldemons.com wrote: I have an autocomplete index that I return highlighting information for but am getting an error with certain search strings and fields on Solr 3.5. try the 3.6 release: * LUCENE-3642, SOLR-2891, LUCENE-3717: Fixed

Re: Using Solr-319 with Solr 3.6.0

2012-08-03 Thread Robert Muir
On Fri, Aug 3, 2012 at 12:57 PM, Himanshu Jindal himanshujin...@gmail.com wrote: <filter class="solr.SynonymFilterFactory" synonyms="synonyms_ja.txt" ignoreCase="true" expand="true" tokenFactory="solr.JapaneseTokenizerFactory" randomAttribute="randomValue"/> I think you have a typo here, it should be

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Robert Muir
On Thu, Aug 2, 2012 at 3:13 AM, Laurent Vaills laurent.vai...@gmail.com wrote: Hi everyone, Is there any chance to get his backported for a 3.6.2 ? Hello, I personally have no problem with it: but its really technically not a bugfix, just an optimization. It also doesnt solve the actual

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread Robert Muir
On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote: Hi All I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that when we are indexing lots of data with 16 concurrent threads, Heap grows continuously. It remains high and ultimately most of the stuff ends up

Re: ICUCollation throws exception

2012-07-21 Thread Robert Muir
/swissbib/solr.versions/configs/current.home/viaf Jul 16, 2012 5:27:48 PM org.apache.solr.core.SolrResourceLoader init **end of Exception*** 2012/7/21 Robert Muir rcm...@gmail.com Can you include the entire exception? This is really necessary! On Tue, Jul 17, 2012 at 2:58 AM, Oliver Schihin

Re: ICUCollation throws exception

2012-07-20 Thread Robert Muir
Can you include the entire exception? This is really necessary! On Tue, Jul 17, 2012 at 2:58 AM, Oliver Schihin oliver.schi...@unibas.ch wrote: Hello According to release notes from 4.0.0-ALPHA, SOLR-2396, I replaced ICUCollationKeyFilterFactory with ICUCollationField in our schema. But this

Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Robert Muir
On Thu, Jul 19, 2012 at 12:10 AM, Aaron Daubman daub...@gmail.com wrote: Greetings, I've been digging in to this for two days now and have come up short - hopefully there is some simple answer I am just not seeing: I have a solr 1.4.1 instance and a solr 3.6.0 instance, both configured as

Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Robert Muir
On Thu, Jul 19, 2012 at 11:11 AM, Aaron Daubman daub...@gmail.com wrote: Apologies if I didn't clearly state my goal/concern: I am not looking for the exact same scoring - I am looking to explain scoring differences. Deprecated components will eventually go away, time moves on, etc... etc...

Re: Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

2012-07-10 Thread Robert Muir
On Tue, Jul 10, 2012 at 3:11 AM, Vadim Kisselmann v.kisselm...@gmail.com wrote: Hi folks, my Test-Server with Solr 4.0 from trunk(version 1292064 from late february) throws this exception... Can you run Lucene's checkIndex tool on your index? If that is clean, can you try a newer version?

Re: problem adding new fields in DIH

2012-07-09 Thread Robert Muir
Hello, This is because Solr's Codec implementation defers to the schema, to determine how the field should be indexed. When a core is reloaded, the IndexWriter is not closed but the existing writer is kept around: so you are basically trying to index to the old version of schema before the

Re: problem adding new fields in DIH

2012-07-09 Thread Robert Muir
Thanks again for reporting this Brent. I opened a JIRA issue: https://issues.apache.org/jira/browse/SOLR-3610 On Mon, Jul 9, 2012 at 3:36 PM, Brent Mills bmi...@uship.com wrote: We're having an issue when we add or change a field in the db-data-config.xml and schema.xml files in solr.

[ANNOUNCE] Apache Solr 4.0-alpha released.

2012-07-03 Thread Robert Muir
3 July 2012, Apache Solr™ 4.0-alpha available The Lucene PMC is pleased to announce the release of Apache Solr 4.0-alpha. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

Re: Exception when optimizing index

2012-06-13 Thread Robert Muir
On Thu, Jun 7, 2012 at 5:50 AM, Rok Rejc rokrej...@gmail.com wrote: - java.runtime.name OpenJDK Runtime Environment - java.runtime.version 1.6.0_22-b22 ... As far as I see from the JIRA issue I have the patch attached (as mentioned I have a trunk version from May 12). Any ideas? its not

Re: Solr1.4 and threads ....

2012-06-13 Thread Robert Muir
On Wed, Jun 13, 2012 at 4:38 PM, Benson Margulies bimargul...@gmail.com wrote: Does this suggest anything to anyone? Other than that we've misanalyzed the logic in the tokenizer and there's a way to make it burp on one thread? it might suggest the different tokenstream instances refer to some

Re: per-fieldtype similarity not working

2012-06-08 Thread Robert Muir
On Fri, Jun 8, 2012 at 5:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Thanks Robert, The difference in scores is clear now so it shouldn't matter as queryNorm doesn't affect ranking but coord does. Can you explain why coord is left out now and why it is considered to skew results

Re: per-fieldtype similarity not working

2012-06-01 Thread Robert Muir
On Fri, Jun 1, 2012 at 5:13 AM, Markus Jelsma markus.jel...@openindex.io wrote: Thanks but i am clearly missing something? We declare the similarity in the fieldType just as in the example and looking at the example again i don't see how it's being done differently. What am i missing and

Re: per-fieldtype similarity not working

2012-06-01 Thread Robert Muir
On Fri, Jun 1, 2012 at 11:39 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi! Ah, it makes sense now! This globally configured similarity returns a fieldType-defined similarity if available and, if not, the standard Lucene similarity. This would, i assume, mean that the two

Re: per-fieldtype similarity not working

2012-05-31 Thread Robert Muir
On Thu, May 31, 2012 at 11:23 AM, Markus Jelsma markus.jel...@openindex.io wrote: We simply declare the following in our fieldType: <similarity class="FQCN"/> That's not enough, see the example: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/schema-sim.xml --

Re: boost not showing up in Solr 3.6 debugQueries?

2012-05-17 Thread Robert Muir
On Thu, May 17, 2012 at 4:51 PM, Tom Burton-West tburt...@umich.edu wrote: But in Solr 3.6 I am not seeing the boost factor called out. On the other hand it looks like it may now be incorporated in the queryNorm (Please see example below). Is there a bug in Solr 3.6 debugQueries? Is

Re: Language analyzers

2012-05-16 Thread Robert Muir
On Wed, May 16, 2012 at 10:17 AM, anarchos78 rigasathanasio...@hotmail.com wrote: Hello, Is it possible to use two language analyzers for one fieldtype? Let's say Greek and English (for indexing and querying). For Greek and English, it's easy, they use totally different characters so none of

Re: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Robert Muir
On Wed, May 16, 2012 at 8:28 AM, Tanguy Moal tanguy.m...@gmail.com wrote: Any idea someone ? I think this is important since this could produce weird results on collections with numbers mixed in text. I agree, i think we should just add ' Character.isLetter(ch)' to the undoublet check?
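A minimal sketch of the idea in the reply above: collapse repeated characters only when the character is a letter, so that numbers mixed into text are left alone. This is an illustration, not the Lucene FrenchLightStemmer source; the class name UndoubleSketch and the method undouble are hypothetical, and the "longer than 4 characters" threshold is taken from the thread subject.

```java
public class UndoubleSketch {

    // Collapses runs of a repeated character into one, but only for letters.
    // Tokens of 4 characters or fewer are returned unchanged.
    static String undouble(String token) {
        if (token.length() <= 4) {
            return token;
        }
        StringBuilder out = new StringBuilder();
        char prev = token.charAt(0);
        out.append(prev);
        for (int i = 1; i < token.length(); i++) {
            char ch = token.charAt(i);
            // The Character.isLetter guard is the proposed addition:
            // repeated digits such as "000" are kept as they are.
            if (ch == prev && Character.isLetter(ch)) {
                continue;
            }
            out.append(ch);
            prev = ch;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(undouble("passsion")); // repeated letters collapse
        System.out.println(undouble("1000000"));  // repeated digits are kept
    }
}
```

Without the letter check, a token like 1000000 would be reduced to 10, which is the kind of weird result on numeric text that the original question describes.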

Re: apostrophe / ayn / alif

2012-05-15 Thread Robert Muir
On Tue, May 15, 2012 at 2:47 PM, Naomi Dushay ndus...@stanford.edu wrote: We are using the ICUFoldingFilterFactory with great success to fold diacritics so searches with and without the diacritics get the same results. We recently discovered we have some Korean records that use an alif

Re: Implementing multiterm chain for ICUCollationKeyFilterFactory

2012-05-03 Thread Robert Muir
On Thu, May 3, 2012 at 9:35 AM, OliverS oliver.schi...@unibas.ch wrote: Hello I read and tried a lot, but somehow I don't fully understand and it doesn't work. I'm working on solr 4.0 (latest trunk) and use ICUCollationKeyFilterFactory for my main field type. Now, wildcard queries don't

Re: Error with distributed search and Suggester component (Solr 3.4)

2012-05-02 Thread Robert Muir
On Wed, May 2, 2012 at 12:16 PM, Ken Krugler kkrugler_li...@transpac.com wrote: What confuses me is that Suggester says it's based on SpellChecker, which supposedly does work with shards. It is based on spellchecker apis, but spellchecker's ranking is based on simple comparators like string

Re: Error with distributed search and Suggester component (Solr 3.4)

2012-05-01 Thread Robert Muir
On Tue, May 1, 2012 at 6:48 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi list, Does anybody know if the Suggester component is designed to work with shards? I'm not really sure it is? They would probably have to override the default merge implementation specified by SpellChecker.

Re: Language Identification

2012-04-23 Thread Robert Muir
On Mon, Apr 23, 2012 at 1:27 PM, Bai Shen baishen.li...@gmail.com wrote: I was under the impression that solr does Tika and the language identifier that Shuyo did. The page at http://wiki.apache.org/solr/LanguageDetection lists them both. processor

Re: Special characters in synonyms.txt on Solr 3.5

2012-04-20 Thread Robert Muir
On Fri, Apr 20, 2012 at 12:10 PM, carl.nordenf...@bwinparty.com carl.nordenf...@bwinparty.com wrote: Directly injecting the letter ö into synonyms like so: island, ön island, ön renders the following exception on startup (both lines renders the same error): java.lang.RuntimeException:

Re: maxMergeDocs in Solr 3.6

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 11:54 AM, Burton-West, Tom tburt...@umich.edu wrote: Hello all, I'm getting ready to upgrade from Solr 3.4 to Solr 3.6 and I noticed that maxMergeDocs is no longer in the example solrconfig.xml. Has maxMergeDocs been deprecated? Or does the tieredMergePolicy ignore it?

Re: [Solr 4.0] what is stored in .tim index file format?

2012-04-17 Thread Robert Muir
This is the term dictionary for 4.0's default codec (currently uses BlockTree implementation) .tim is the on-disk portion of the terms (similar in function to .tis in previous releases) .tip is the in-memory terms index (similar in function to .tii in previous releases) On Tue, Apr 17, 2012 at

[ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Muir
12 April 2012, Apache Solr™ 3.6.0 available The Lucene PMC is pleased to announce the release of Apache Solr 3.6.0. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,
