[ANNOUNCE] Apache Solr 4.9.0 released

2014-06-25 Thread Robert Muir
25 June 2014, Apache Solr™ 4.9.0 available The Lucene PMC is pleased to announce the release of Apache Solr 4.9.0 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted se

[ANNOUNCE] Apache Solr 4.8.1 released

2014-05-20 Thread Robert Muir
May 2014, Apache Solr™ 4.8.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.8.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search

[ANNOUNCE] Apache Solr 4.7.2 released.

2014-04-15 Thread Robert Muir
April 2014, Apache Solr™ 4.7.2 available The Lucene PMC is pleased to announce the release of Apache Solr 4.7.2 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted sear

Re: Unable to get offsets using AtomicReader.termPositionsEnum(Term)

2014-03-10 Thread Robert Muir
Hello, I think you are confused between two different index structures, probably because of the name of the options in solr. 1. indexing term vectors: this means given a document, you can go lookup a miniature "inverted index" just for that document. That means each document has "term vectors" whi

Re: ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-05 Thread Robert Muir
I debugged the PDF a little. FWIW, the following code (using iText) takes it to 9MB: public static void main(String args[]) throws Exception { Document document = new Document(); PdfSmartCopy copy = new PdfSmartCopy(document, new FileOutputStream("/home/rmuir/Downloads/test.pdf")); /

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
On Wed, Feb 19, 2014 at 10:33 AM, Thomas Fischer wrote: > > > Hmm, for standardization of text fields, collation might be a little > > awkward. > > I arrived there after using custom rules for a while (see > "RuleBasedCollator" on http://wiki.apache.org/solr/UnicodeCollation) and > then being tol

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
HOME/lib in order to use it." > > is misleading insofar as this README.txt doesn't mention the > solr-analysis-extras-4.6.1.jar in dist. > > Best > Thomas > > > On 19.02.2014 at 14:27, Robert Muir wrote: > > > you need the solr analysis-extras

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
classes mentioned are > loaded. > > Do you know which jar is supposed to contain the ICUCollationField? > > Best regards > Thomas > > > > On 19.02.2014 at 13:54, Robert Muir wrote: > > > you need the solr analysis-extras jar in your classpath, too. > >

Re: Problems with ICUCollationField

2014-02-19 Thread Robert Muir
you need the solr analysis-extras jar in your classpath, too. On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer wrote: > Hello, > > I'm migrating to solr 4.6.1 and have problems with the ICUCollationField > (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100). > > I get consistently the error message

[ANNOUNCE] Apache Solr 4.6.1 released.

2014-01-28 Thread Robert Muir
January 2014, Apache Solr™ 4.6.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.6.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted searc

Re: Tracking down the input that hits an analysis chain bug

2014-01-03 Thread Robert Muir
This exception comes from OffsetAttributeImpl (e.g. you dont need to index anything to reproduce it). Maybe you have a missing clearAttributes() call (your tokenizer 'returns true' without calling that first)? This could explain it, if something like a StopFilter is also present in the chain: basi

Re: Bad fieldNorm when using morphologic synonyms

2013-12-09 Thread Robert Muir
no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand wrote: > In order to set discountOverlaps to true you must have added the > to the schema.x
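
The interaction between position increments and discountOverlaps in this thread can be sketched in plain Java. This is an illustration only, not Lucene's code: it assumes the default similarity computes the length norm as 1/sqrt(numTerms) and, with discountOverlaps on, leaves tokens whose position increment is 0 out of numTerms. The class and method names are hypothetical.

```java
final class LengthNormSketch {
    // posIncs holds one position increment per emitted token.
    // Assumption: a token with increment 0 counts as an "overlap".
    static double lengthNorm(int[] posIncs, boolean discountOverlaps) {
        int length = posIncs.length;
        int overlaps = 0;
        for (int inc : posIncs) {
            if (inc == 0) {
                overlaps++;
            }
        }
        int numTerms = discountOverlaps ? length - overlaps : length;
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // Four words, each followed by its stem stacked at the same position.
        int[] stacked = {1, 0, 1, 0, 1, 0, 1, 0};
        // Same eight tokens, but the analyzer advances the position every time.
        int[] notStacked = {1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(lengthNorm(stacked, true));
        System.out.println(lengthNorm(notStacked, true));
    }
}
```

Under these assumptions the field with correctly stacked stems is scored as a four-term field, while the one whose analyzer never emits a zero increment is scored as an eight-term field and gets a smaller norm.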

Re: Bad fieldNorm when using morphologic synonyms

2013-12-08 Thread Robert Muir
its accurate, you are wrong. please, look at setDiscountOverlaps in your similarity. This is really easy to understand. On Sun, Dec 8, 2013 at 7:23 AM, Manuel Le Normand wrote: > Robert, you last reply is not accurate. > It's true that the field norms and termVectors are independent. But this >

Re: Bad fieldNorm when using morphologic synonyms

2013-12-06 Thread Robert Muir
ll right (for me). > 2) fieldNorm is determined by the size of the termVector, isn't it? the > termVector size isn't affected by the positions. > > > On Fri, Dec 6, 2013 at 10:46 AM, Robert Muir wrote: > >> Your analyzer needs to set positionIncrement correctly: so

Re: Bad fieldNorm when using morphologic synonyms

2013-12-06 Thread Robert Muir
Your analyzer needs to set positionIncrement correctly: sounds like its broken. On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh wrote: > Hi, > we implemented a morphologic analyzer, which stems words on index time. > For some reasons, we index both the original word and the stem (on the same > positi

Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Robert Muir
which example? there are so many. On Wed, Nov 13, 2013 at 1:00 PM, Mark Miller wrote: > RE: the example folder > > It’s something I’ve been pushing towards moving away from for a long time - > see https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to > 'server' and pull exampl

Re: Background merge errors with Solr 4.4.0 on Optimize call

2013-10-29 Thread Robert Muir
I think its a bug, but thats just my opinion. i sent a patch to dev@ for thoughts. On Tue, Oct 29, 2013 at 6:09 PM, Erick Erickson wrote: > Hmmm, so you're saying that merging indexes where a field > has been removed isn't handled. So you have some documents > that do have a "what" field, but you

Re: Problems installing Solr4 in Jetty9

2013-08-17 Thread Robert Muir
On Sat, Aug 17, 2013 at 3:59 AM, Chris Collins wrote: > I am using 4.4 in an embedded mode and found that it has a dependency on > hadoop 2.0.5. alpha that in turn depends on jetty 6.1.26 which I think > pre-dates electricity :-} > I think this is only a "test dependency" ?

Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 5:58 PM, Chris Hostetter wrote: > > : > FieldCaches are managed using a WeakHashMap - so once the IndexReader's > : > associated with those FieldCaches are no logner used, they will be garbage > : > collected when and if the JVMs garbage collector get arround to it. > : > >

Re: Who's cleaning the Fieldcache?

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 5:29 PM, Chris Hostetter wrote: > > : why? Those are my sort fields and they are occupying a lot of space (doubled > : in this case but I see that sometimes I have three or four "old" segment > : references) > : > : Is there something I can do to remove those old references

Re: PostingsHighlighter returning fields which don't match

2013-08-14 Thread Robert Muir
On Wed, Aug 14, 2013 at 3:53 AM, ses wrote: > We are trying out the new PostingsHighlighter with Solr 4.2.1 and finding > that the highlighting section of the response includes self-closing tags > for > all the fields in hl.fl (by default for edismax it is all fields in qf) > where there are no h

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar wrote: > The splitting code calls commit before it starts the splitting. It creates > a LiveDocsReader using a bitset created by the split. This reader is merged > to an index using addIndexes. > > Shouldn't the addIndexes code then ignore al

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
Well, i meant before, but i just took a look and this is implemented differently than the "merge" one. In any case, i think its the same bug, because I think the only way this can happen is if somehow this splitter is trying to create a 0-document "split" (or maybe a split containing all deletions

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
did you do a (real) commit before trying to use this? I am not sure how this splitting works, but at least the merge option requires that. i can't see this happening unless you are somehow splitting a 0 document index (or, if the splitter is creating 0 document splits) so this is likely just a sym

Re: Is there a way to store binary data (byte[]) in DocValues?

2013-08-12 Thread Robert Muir
On Mon, Aug 12, 2013 at 12:25 PM, Mathias Lux wrote: > > Another thing for not using the the SORTED_SET and SORTED > implementations is, that Solr currently works with Strings on that and > I want to have a small memory footprint for millions of images ... > which does not go well with immutables.

Re: Is there a way to store binary data (byte[]) in DocValues?

2013-08-12 Thread Robert Muir
On Mon, Aug 12, 2013 at 8:38 AM, Mathias Lux wrote: > Hi! > > I'm basically searching for a method to put byte[] data into Lucene > DocValues of type BINARY (see [1]). Currently only primitives and > Strings are supported according to [1]. > > I know that this can be done with a custom update hand

Re: Purging unused segments.

2013-08-09 Thread Robert Muir
On Fri, Aug 9, 2013 at 7:48 PM, Erick Erickson wrote: > > So is there a good way, without optimizing, to purge any segments not > referenced in the segments file? Actually I doubt that optimizing would > even do it if I _could_, any phantom segments aren't visible from the > segments file anyway..

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Robert Muir
On Mon, Aug 5, 2013 at 3:03 PM, Chris Hostetter wrote: > > : > 0xfffe is not a special character -- it is explicitly *not* a character in > : > Unicode at all, it is set asside as "not a character." specifically so > : > that the character 0xfeff can be used as a BOM, and if the BOM is read > : >
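
To make the distinction between the BOM (U+FEFF) and the noncharacter U+FFFE concrete, here is a small Java sketch. It assumes the Unicode definition of noncharacters (U+FDD0 to U+FDEF, plus the last two code points of every plane) and removes them before the text is sent anywhere. The class and method names are hypothetical, and whether to strip or reject such input is a policy choice the excerpt above does not settle.

```java
final class NoncharacterSketch {
    // True for the 66 Unicode noncharacters: U+FDD0..U+FDEF and every
    // code point whose low 16 bits are FFFE or FFFF.
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // Returns s with all noncharacters removed; everything else,
    // including a BOM (U+FEFF), is left untouched.
    static String stripNoncharacters(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                .filter(cp -> !isNoncharacter(cp))
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "a" + (char) 0xFFFE + "b";
        System.out.println(stripNoncharacters(dirty).length());
    }
}
```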

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Robert Muir
On Mon, Aug 5, 2013 at 11:42 AM, Chris Hostetter wrote: > > : I agree with you, 0xfffe is a special character, that is why I was asking > : how it's handled in solr. > : In my document, 0xfffe does not appear at the beginning, it's in the > : content. > > Unless i'm missunderstanding something (an

Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Robert Muir
If you use wikipediatokenizer it will tag different wiki elements with different types (you can see it in the admin UI). so then followup with typetokenfilter to only filter the types you care about, and i think it will do what you want. On Tue, Jul 23, 2013 at 7:53 AM, Furkan KAMACI wrote: > Hi

Re: Using per-segment FieldCache or DocValues in custom component?

2013-07-02 Thread Robert Muir
Where do you get the docid from? Usually its best to just look at the whole algorithm, e.g. docids come from per-segment readers by default anyway so ideally you want to access any per-document things from that same segmentreader. As far as supporting docvalues, FieldCache API "passes thru" to doc

Re: Are there any plans to change example directory layout?

2013-06-11 Thread Robert Muir
If you have a good idea... Just do it. Open an issue On Jun 11, 2013 9:34 PM, "Alexandre Rafalovitch" wrote: > I think it is quite hard for beginners that basic solr example > directory is competing for attention with other - nested - examples. I > see quite a lot of questions on which directory

Re: Requesting to add into a Contributor Group

2013-05-04 Thread Robert Muir
done. let us know if you have any problems. On Sat, May 4, 2013 at 10:12 AM, Krunal wrote: > Dear Sir, > > Kindly add me to the contributor group to help me contribute to the Solr > wiki. > > My Email id: jariwalakru...@gmail.com > Login Name: Krunal > > Specific changes I would like to make to

Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Robert Muir
On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen wrote: > Schema with DocValues attempt at solving problem: > http://pastebin.com/Ne23NnW4 > Config: http://pastebin.com/x1qykyXW > This schema isn't using docvalues, due to a typo in your config. it should not be DocValues="true" but docValues="true"

Re: Fuzzy Suggester and exactMatchFirst

2013-03-18 Thread Robert Muir
On Sun, Mar 17, 2013 at 8:19 PM, Eoghan Ó Carragáin wrote: > > I can see why the Fuzzy Suggester sees "college" as a match for "colla" but > expected the exactMatchFirst parameter to ensure that suggestions beginning > with "colla" to be weighted higher than "fuzzier" matches. I > have spellcheck.

Re: Analyzing Suggester and Fuzzy Suggester - configuration and comparison

2013-03-15 Thread Robert Muir
On Fri, Mar 15, 2013 at 3:04 PM, Eoghan Ó Carragáin wrote: > Hi, > I'm interested in using the new Analyzing Suggester described by Mike > McCandless [1], but I'm not sure how it should be configured. > > I've setup my SpellCheckComponent with > org.apache.solr.spelling.suggest.Suggester >

Re: Out of Memory doing a query Solr 4.2

2013-03-15 Thread Robert Muir
On Fri, Mar 15, 2013 at 6:46 AM, raulgrande83 wrote: > Thank you for your help. I'm afraid it won't be so easy to change de jvm > version, because it is required at the moment. > > It seems that Solr 4.2 supports Java 1.6 at least. Is that correct? > > Could you find any clue of what is happening

Re: Out of Memory doing a query Solr 4.2

2013-03-14 Thread Robert Muir
On Thu, Mar 14, 2013 at 12:07 PM, raulgrande83 wrote: > JVM: IBM J9 VM(1.6.0.2.4) I don't recommend using this JVM.

Re: Using suggester for smarter phrase autocomplete

2013-03-13 Thread Robert Muir
On Wed, Mar 13, 2013 at 11:07 AM, Eric Wilson wrote: > I'm trying to use the suggester for auto-completion with Solr 4. I have > followed the example configuration for phrase suggestions at the bottom of > this wiki page: > http://wiki.apache.org/solr/Suggester

Re: It seems a issue of deal with chinese synonym for solr

2013-03-12 Thread Robert Muir
I agree. Actually that top-level logic is fine. its the loop that follows thats wrong: it needs to look at position increment and do the right thing. Want to open a JIRA issue? On Mon, Mar 11, 2013 at 9:15 PM, 李威 wrote: > in org.apache.solr.parser.SolrQueryParserBase, there is a function: > "pr

[ANNOUNCE] Apache Solr 4.2 released

2013-03-11 Thread Robert Muir
March 2013, Apache Solr™ 4.2 available The Lucene PMC is pleased to announce the release of Apache Solr 4.2 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, d

Re: MockAnalyzer in Lucene: attach stemmer or any custom filter?

2013-02-15 Thread Robert Muir
g > fieldName,Reader reader) in LUCENE_34. Instead, there is a method required > to override: tokenStream(String fieldName, Reader reader). Is there a way > of incorporating the custom filter into the TokenStream? > > > Dmitry > > On Thu, Feb 14, 2013 at 5:37 PM, Robert M

Re: MockAnalyzer in Lucene: attach stemmer or any custom filter?

2013-02-14 Thread Robert Muir
MockAnalyzer is really just MockTokenizer+MockTokenFilter+ Instead you just define your own analyzer chain using MockTokenizer. This is the way all lucene's own analysis tests work: e.g. http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/test/org/apache/lucene/analysis

Re: Exception when trying to save to a field with storeOffsetsWithPositions="true"

2013-01-22 Thread Robert Muir
On Tue, Jan 22, 2013 at 12:23 PM, Meng Muk wrote: > If I set the field type to "text_en" however it works, I'm guessing > something in the way the text is being analyzed is causing this exception > to appear? Is there a limitation in how storeOffsetsWithPositions should be > used? > IndexWriter

[ANNOUNCE] Apache Solr 3.6.2 released

2012-12-25 Thread Robert Muir
25 December 2012, Apache Solr™ 3.6.2 available The Lucene PMC and Santa Claus are pleased to announce the release of Apache Solr 3.6.2. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hi

Re: Japanese exact match results do not show on top of results

2012-12-20 Thread Robert Muir
I think you are hitting solr-3589. There is a vote underway for a 3.6.2 that contains this fix On Dec 20, 2012 6:29 PM, "kirpakaro" wrote: > Hi folks, > > I am having couple of problems with Japanese data, 1. it is not > properly > indexing all the data 2. displaying the exact match result on

Re: ICUTokenizer labels number as Han character?

2012-12-19 Thread Robert Muir
Your attachment didnt come through: I think the list strips them. Maybe just open a JIRA and attach your screenshots? or put them elsewhere and just include a link? As far as the ultimate behavior, I think its correct. Keep in mind tokens don't really get a script value: runs of untokenized text d

Re: "order" question on solr multi value field

2012-12-18 Thread Robert Muir
I agree with James. Actually lucene tests will fail if a codec violates this. Actually it goes much deeper than this. From the lucene apis, when you call IndexReader.document() with your storedfieldVisitor, it must visit the fields in the original order added. so even if you do: add("title", "

Re: Regexp and speed

2012-11-30 Thread Robert Muir
On Fri, Nov 30, 2012 at 12:13 PM, Roman Chyla wrote: > > The code here: > > https://github.com/romanchyla/montysolr/blob/solr-trunk/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java > > The benchmark should probably not be called 'benchmark', do you think it > may be too simpli

Re: Skewed IDF in multi lingual index

2012-11-26 Thread Robert Muir
Hi again Markus. Sorry for the slow reply here. I'm confused: are you saying the score goes negative? Are you sure there is no 3.x segments? Can you check that docCount is not -1? Do you happen to have a test, can you share your modified similarity, or give more details? I just want to make sure

Re: Does ICUFoldingFilterFactory make CJKWidthFilterFactory unnecessary?

2012-11-14 Thread Robert Muir
Yes, its a subset On Nov 14, 2012 1:18 PM, "Shawn Heisey" wrote: > I am using ICUFoldingFilterFactory in my Solr schema. Now I am looking at > adding CJKBigramFilterFactory, and I've noticed that it often goes with > CJKWidthFilterFactory. Here are the relevant Javadocs for my question: > > htt

Re: Error loading class solr.CJKBigramFilterFactory

2012-11-14 Thread Robert Muir
ov 14 17:30:07 WET 2012 > Server Start Time:Wed Nov 14 11:40:36 WET 2012 > > ?? > > Thanks, > Frederico > > > -Original message- > From: Robert Muir [mailto:rcm...@gmail.com] > Sent: Wednesday, 14 November 2012 16:28 > To: solr-user@lucene.apach

Re: Error loading class solr.CJKBigramFilterFactory

2012-11-14 Thread Robert Muir
On Wed, Nov 14, 2012 at 8:12 AM, Frederico Azeiteiro wrote: > To make some further testing I installed SOLR 3.5.0 using default Jetty > server. > > When tried to start SOLR using the same schema I get: > > > > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.CJKBigramFilte

Re: URL parameters to use FieldAnalysisRequestHandler

2012-11-13 Thread Robert Muir
I think the UI uses this behind the scenes, as in no more "analysis.jsp" like before? So maybe try using something like burpsuite and just using the analysis UI in your browser to see what requests its sending. On Tue, Nov 13, 2012 at 11:00 AM, Tom Burton-West wrote: > Hello, > > I would like t

Re: customize solr search/scoring for performance

2012-11-12 Thread Robert Muir
Whenever I look at solr users' stacktraces for disjunctions, I always notice they get BooleanScorer2. Is there some reason for this or is it not intentional (e.g. maybe a in-order collector is always being used when its possible at least in simple cases to allow for out-of-order hits?) When I exa

Re: Skewed IDF in multi lingual index

2012-11-08 Thread Robert Muir
Hi Markus: how are the languages distributed across documents? Imagine I have a text_en field and a text_fr field. Lets say I have 100 documents, 95 are english and only 5 are french. So the text_en field is populated 95% of the time, and the text_fr 5% of the time. But the default IDF computatio
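
The 95/5 example above is cut off, so the following is only an illustration of why that split can matter, not a reconstruction of the rest of the message. It assumes the classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)) and compares using the whole index size (100) as numDocs with using the number of documents that actually have the field. The class name and the document frequencies are made up for the example.

```java
final class IdfSketch {
    // Classic TF-IDF style inverse document frequency (assumed formula).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // A term found in 40% of the documents of its own language:
        // 38 of 95 English documents, 2 of 5 French documents.
        System.out.println("en, numDocs=100: " + idf(38, 100));
        System.out.println("fr, numDocs=100: " + idf(2, 100));
        System.out.println("en, numDocs=95:  " + idf(38, 95));
        System.out.println("fr, numDocs=5:   " + idf(2, 5));
    }
}
```

With the whole index as the denominator, the French term looks far rarer than the equally common English term. With a per-field document count, the two values end up much closer together.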

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Robert Muir
On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge wrote: > Hi, > > i am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords > file > which is in the correct encoding. What makes you think that? Note: "Because I can read it" is not the correct answer. Ensure any of your stopwords

Re: Where can I find an example of a 4.0 contraction file?

2012-11-01 Thread Robert Muir
You have a character encoding issue: this is telling you the file is not correctly encoded as UTF-8. On Thu, Nov 1, 2012 at 6:11 PM, dm_tim wrote: > I should have mentioned I tried that. I get the following exception: > SEVERE: Unable to create core: core0 > java.lang.RuntimeException: java.nio.c
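
One way to check a file before handing it to Solr is to decode its bytes strictly, so that malformed input raises an error instead of being silently replaced. The sketch below uses only the JDK; the class and method names are hypothetical, and it says nothing about how Solr itself reads the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8CheckSketch {
    // Returns true only if the bytes decode as UTF-8 without any
    // malformed or unmappable sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xBC};   // u-umlaut encoded as UTF-8
        byte[] latin1 = {(byte) 0xFC, (byte) 0x67}; // u-umlaut in ISO-8859-1, then 'g'
        System.out.println(isValidUtf8(utf8));
        System.out.println(isValidUtf8(latin1));
    }
}
```

In practice the byte array would come from something like Files.readAllBytes on the stopwords or contractions file.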

Re: Unable to build trunk

2012-10-31 Thread Robert Muir
you will have to use 'find' on your .ivy2 ! On Wed, Oct 31, 2012 at 6:32 AM, Markus Jelsma wrote: > Hi, > > Where is that lock file located? I triggered it again (in another contrib) > and wil trigger it again in the future and don't want to remove my ivy cache > each time :) > > Thanks > > > -

Re: Unable to build trunk

2012-10-30 Thread Robert Muir
Its not "wonky". you just have to ensure you have nothing else (like some IDE, or build somewhere else) using ivy, then its safe to remove the .lck file there. I turned on this locking so that it hangs instead of causing cache corruption, but ivy only has "simplelockfactory" so if you ^C at the wr

Re: Improving performance for use-case where large (200) number of phrase queries are used?

2012-10-24 Thread Robert Muir
On Wed, Oct 24, 2012 at 11:09 AM, Aaron Daubman wrote: > Greetings, > > We have a solr instance in use that gets some perhaps atypical queries > and suffers from poor (>2 second) QTimes. > > Documents (~2,350,000) in this instance are mainly comprised of > various "descriptive fields", such as mul

Re: ICUTokenizer ArrayIndexOutOfBounds

2012-10-17 Thread Robert Muir
calling reset() is mandatory part of the consumer lifecycle before calling incrementToken(), see: https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html A lot of people don't consume these correctly, thats why these tokenizers now try to throw exceptions if you do i

[ANNOUNCE] Apache Solr 4.0 released.

2012-10-12 Thread Robert Muir
October 12 2012, Apache Solr™ 4.0 available. The Lucene PMC is pleased to announce the release of Apache Solr 4.0. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted se

Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread Robert Muir
On Wed, Oct 10, 2012 at 9:02 AM, O. Klein wrote: > I don't want to tweak the threshold. For majority of cases it works fine. > > It's for cases where term has low frequency but is spelled correctly. > > If you lower the threshold you would also get incorrect spelled terms as > suggestions. > Yeah

Re: Indexing in Solr: invalid UTF-8

2012-09-25 Thread Robert Muir
On Tue, Sep 25, 2012 at 2:02 PM, Patrick Oliver Glauner wrote: > Hi > Thanks. But I see that 0xd835 is missing in this list (see my exceptions). > > What's the best way to get rid of all of them in Python? I am new to unicode > in Python but I am sure that this use case is quite frequent. > I do
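
The value 0xd835 mentioned above is a UTF-16 high surrogate, which is only meaningful when followed by a low surrogate. The question was about Python and the reply is cut off, so this is not the advice given in the thread. It is a hedged Java sketch (Java being the language of the other code in this archive) of one possible cleanup: replacing unpaired surrogates with U+FFFD before indexing. The class and method names are hypothetical.

```java
final class SurrogateSketch {
    // Keeps well-formed surrogate pairs and replaces any unpaired
    // surrogate with the replacement character U+FFFD.
    static String replaceLoneSurrogates(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Valid pair: copy both halves and skip the second one.
                sb.append(c).append(s.charAt(++i));
            } else if (Character.isSurrogate(c)) {
                sb.append((char) 0xFFFD);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String broken = "a" + (char) 0xD835 + "b";
        System.out.println(replaceLoneSurrogates(broken).length());
    }
}
```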

Re: SOLR memory usage jump in JVM

2012-09-20 Thread Robert Muir
On Thu, Sep 20, 2012 at 3:09 AM, Bernd Fehling wrote: > By the way while looking for upgrading to JDK7, the release notes say under > section > "known issues" about the "PorterStemmer" bug: > "...The recommended workaround is to specify -XX:-UseLoopPredicate on the > command line." > Is this st

Re: Solr - Lucene Debuging help

2012-09-10 Thread Robert Muir
On Mon, Sep 10, 2012 at 4:43 PM, BadalChhatbar wrote: > Steve, > > Those document tips didn't help. > > errors i m getting are like (_TestUtil cannot be resolved). > > Did you do these two steps: 1. ant eclipse 2. refresh your project -- lucidworks.com

Re: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Robert Muir
On Fri, Sep 7, 2012 at 2:19 PM, Tom Burton-West wrote: > Thanks Robert, > > I'll have to spend some time understanding the default codec for Solr 4.0. > Did I miss something in the changes file? http://lucene.apache.org/core/4_0_0-BETA/ see the file formats section, especially http://lucene.apac

Re: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Robert Muir
Hi Tom: I already enhanced the javadocs about this for Lucene, putting warnings everywhere in bold: NOTE: This parameter does not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for term indexes that are implemented as a fixed gap between

[ANNOUNCE] Apache Solr 4.0-beta released.

2012-08-14 Thread Robert Muir
14 August 2012, Apache Solr™ 4.0-beta available The Lucene PMC is pleased to announce the release of Apache Solr 4.0-beta. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, fa

Re: how to retrieve total token count per collection/index

2012-08-09 Thread Robert Muir
On Thu, Aug 9, 2012 at 4:24 PM, tech.vronk wrote: > Is there any 3.6 equivalent for this, before I install and run 4.0? > I can't seem to find a corresponding class (org.apache.lucene.index.Terms) > in 3.6. > unfortunately 3.6 does not carry this statistic, there is really no clear delineation o

Re: how to retrieve total token count per collection/index

2012-08-09 Thread Robert Muir
On Thu, Aug 9, 2012 at 10:20 AM, tech.vronk wrote: > Hello, > > I wonder how to figure out the total token count in a collection (per > index), i.e. the size of a corpus/collection measured in tokens. > You want to use this statistic, which tells you number of tokens for an indexed field: http://

Re: Using Solr-319 with Solr 3.6.0

2012-08-03 Thread Robert Muir
On Fri, Aug 3, 2012 at 12:57 PM, Himanshu Jindal wrote: > ignoreCase="true" expand="true" > tokenFactory="solr.JapaneseTokenizerFactory" randomAttribute="randomValue"/> I think you have a typo here, it should be tokenizerFactory, not tokenFactory -- lucidimagination.com

Re: Highlighting error InvalidTokenOffsetsException: Token oedipus exceeds length of provided text sized 11

2012-08-03 Thread Robert Muir
On Fri, Aug 3, 2012 at 12:38 AM, Justin Engelman wrote: > I have an autocomplete index that I return highlighting information for but > am getting an error with certain search strings and fields on Solr 3.5. try the 3.6 release: * LUCENE-3642, SOLR-2891, LUCENE-3717: Fixed bugs in CharTokenizer,

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Robert Muir
On Thu, Aug 2, 2012 at 3:13 AM, Laurent Vaills wrote: > Hi everyone, > > Is there any chance to get his backported for a 3.6.2 ? > Hello, I personally have no problem with it: but its really technically not a bugfix, just an optimization. It also doesnt solve the actual problem if you have a tom

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread Robert Muir
On Tue, Jul 31, 2012 at 2:34 PM, roz dev wrote: > Hi All > > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that > when we are indexing lots of data with 16 concurrent threads, Heap grows > continuously. It remains high and ultimately most of the stuff ends up > being moved

Re: ICUCollation throws exception

2012-07-21 Thread Robert Muir
a:3018) > at > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:409) > at > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:430) > at > org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:86) >

Re: ICUCollation throws exception

2012-07-20 Thread Robert Muir
Can you include the entire exception? This is really necessary! On Tue, Jul 17, 2012 at 2:58 AM, Oliver Schihin wrote: > Hello > > According to release notes from 4.0.0-ALPHA, SOLR-2396, I replaced > ICUCollationKeyFilterFactory with ICUCollationField in our schema. But this > throws an exception

Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Robert Muir
On Thu, Jul 19, 2012 at 11:11 AM, Aaron Daubman wrote: > Apologies if I didn't clearly state my goal/concern: I am not looking for > the exact same scoring - I am looking to explain scoring differences. > Deprecated components will eventually go away, time moves on, etc... > etc... I would like

Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Robert Muir
On Thu, Jul 19, 2012 at 12:10 AM, Aaron Daubman wrote: > Greetings, > > I've been digging in to this for two days now and have come up short - > hopefully there is some simple answer I am just not seeing: > > I have a solr 1.4.1 instance and a solr 3.6.0 instance, both configured as > identically

Re: Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

2012-07-10 Thread Robert Muir
On Tue, Jul 10, 2012 at 3:11 AM, Vadim Kisselmann wrote: > Hi folks, > my Test-Server with Solr 4.0 from trunk(version 1292064 from late > february) throws this exception... Can you run Lucene's checkIndex tool on your index? If that is clean, can you try a newer version? This could be a number

Re: problem adding new fields in DIH

2012-07-09 Thread Robert Muir
Thanks again for reporting this Brent. I opened a JIRA issue: https://issues.apache.org/jira/browse/SOLR-3610 On Mon, Jul 9, 2012 at 3:36 PM, Brent Mills wrote: > We're having an issue when we add or change a field in the db-data-config.xml > and schema.xml files in solr. Basically whenever I a

Re: problem adding new fields in DIH

2012-07-09 Thread Robert Muir
Hello, This is because Solr's Codec implementation defers to the schema, to determine how the field should be indexed. When a core is reloaded, the IndexWriter is not closed but the existing writer is kept around: so you are basically trying to index to the old version of schema before the reload.

[ANNOUNCE] Apache Solr 4.0-alpha released.

2012-07-03 Thread Robert Muir
3 July 2012, Apache Solr™ 4.0-alpha available The Lucene PMC is pleased to announce the release of Apache Solr 4.0-alpha. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, fac

Re: Solr1.4 and threads ....

2012-06-13 Thread Robert Muir
On Wed, Jun 13, 2012 at 4:38 PM, Benson Margulies wrote: > > Does this suggest anything to anyone? Other than that we've > misanalyzed the logic in the tokenizer and there's a way to make it > burp on one thread? it might suggest the different tokenstream instances refer to some shared object tha

Re: Exception when optimizing index

2012-06-13 Thread Robert Muir
On Thu, Jun 7, 2012 at 5:50 AM, Rok Rejc wrote: > - java.runtime.name: OpenJDK Runtime Environment > - java.runtime.version: 1.6.0_22-b22 ... > > As far as I see from the JIRA issue I have the patch attached (as mentioned > I have a trunk version from May 12). Any ideas? > its not guaranteed that

Re: per-fieldtype similarity not working

2012-06-08 Thread Robert Muir
On Fri, Jun 8, 2012 at 5:04 AM, Markus Jelsma wrote: > Thanks Robert, > > The difference in scores is clear now so it shouldn't matter as queryNorm > doesn't affect ranking but coord does. Can you explain why coord is left out > now and why it is considered to skew results and why queryNorm skew
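The observation quoted here, that queryNorm does not affect ranking but coord does, can be checked with a toy calculation. The sketch below is not Lucene's scoring code: the raw scores and match counts are invented, queryNorm is modeled as one constant multiplied into every document's score for a query, and coord is modeled as the fraction of query terms a document matched.

```java
import java.util.Arrays;
import java.util.Comparator;

final class NormVsCoordSketch {
    /** Returns document indices ordered by descending score. */
    static Integer[] rank(double[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return order;
    }

    /** Multiplies every score by the same per-query constant. */
    static double[] applyQueryNorm(double[] raw, double queryNorm) {
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++) out[i] = raw[i] * queryNorm;
        return out;
    }

    /** Multiplies each score by matchedTerms / totalTerms, a per-document factor. */
    static double[] applyCoord(double[] raw, int[] matchedTerms, int totalTerms) {
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++) out[i] = raw[i] * matchedTerms[i] / totalTerms;
        return out;
    }

    public static void main(String[] args) {
        double[] raw = {2.0, 1.5};   // invented raw scores for two documents
        int[] matched = {1, 2};      // doc 0 matched one of two terms, doc 1 matched both
        System.out.println(Arrays.toString(rank(raw)));
        System.out.println(Arrays.toString(rank(applyQueryNorm(raw, 0.1))));
        System.out.println(Arrays.toString(rank(applyCoord(raw, matched, 2))));
    }
}
```

A constant positive factor scales every score alike and so preserves the order, while a per-document factor can reorder results, which is why the two are discussed separately in the thread.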

Re: per-fieldtype similarity not working

2012-06-01 Thread Robert Muir
On Fri, Jun 1, 2012 at 11:39 AM, Markus Jelsma wrote: > Hi! > > > Ah, it makes sense now! This globally configured similarity now returns a > fieldType defined similarity if available and if not the standard Lucene > similarity. This would, i assume, mean that the two defined similarities > bel

Re: per-fieldtype similarity not working

2012-06-01 Thread Robert Muir
On Fri, Jun 1, 2012 at 5:13 AM, Markus Jelsma wrote: > Thanks but i am clearly missing something? We declare the similarity in the > fieldType just as in the example and looking at the example again i don't see > how it's being done differently. What am i missing and where do i miss it? :) > Hi

Re: per-fieldtype similarity not working

2012-05-31 Thread Robert Muir
On Thu, May 31, 2012 at 11:23 AM, Markus Jelsma wrote: > We simply declare the following in our fieldType: > > That's not enough, see the example: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/schema-sim.xml -- lucidimagination.com

Re: boost not showing up in Solr 3.6 debugQueries?

2012-05-17 Thread Robert Muir
On Thu, May 17, 2012 at 4:51 PM, Tom Burton-West wrote: > But in Solr 3.6 I am not seeing the boost factor called out. > >  On the other hand it looks like it may now be incorporated in the > queryNorm (Please see example below). > > Is there a bug in Solr 3.6 debugQueries?  Is there some new be

Re: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Robert Muir
On Wed, May 16, 2012 at 8:28 AM, Tanguy Moal wrote: > Any idea someone ? > > I think this is important since this could produce weird results on > collections with numbers mixed in text. I agree, i think we should just add '&& Character.isLetter(ch)' to the undoublet check? Thanks for bringing t
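The suggested fix can be sketched in isolation. This is not the Lucene stemmer source: `undouble` is a hypothetical helper that collapses repeated characters in tokens longer than 4 characters, with the proposed `Character.isLetter(ch)` guard added so that runs of digits are left alone.

```java
final class UndoubleSketch {
    /**
     * Collapses runs of the same character in tokens longer than 4 characters,
     * but only when the repeated character is a letter.
     */
    static String undouble(String token) {
        if (token.length() <= 4) {
            return token; // short tokens are left untouched
        }
        StringBuilder out = new StringBuilder();
        char prev = token.charAt(0);
        out.append(prev);
        for (int i = 1; i < token.length(); i++) {
            char ch = token.charAt(i);
            if (ch == prev && Character.isLetter(ch)) {
                continue; // drop a repeated letter
            }
            out.append(ch);
            prev = ch;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(undouble("appelle")); // repeated letters are collapsed
        System.out.println(undouble("20000"));   // repeated digits are kept
    }
}
```

Without the `isLetter` guard, a token such as "20000" would collapse to "20", which is the kind of surprising result on numeric text that the thread raises.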

Re: Language analyzers

2012-05-16 Thread Robert Muir
On Wed, May 16, 2012 at 10:17 AM, anarchos78 wrote: > Hello, > > Is it possible to use two language analyzers for one fieldtype? Let's say > Greek and English (for indexing and querying) > For greek and english, it's easy, they use totally different characters so none of their tokenfilters will con
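The reply is cut off in this preview, so the following only illustrates its visible premise, that Greek and English text use different characters. It is a minimal sketch using the JDK's `Character.UnicodeScript`, not a Solr analyzer configuration; `lettersAllIn` is a hypothetical helper name.

```java
final class ScriptSketch {
    /** True if every letter in the token belongs to the given Unicode script. */
    static boolean lettersAllIn(String token, Character.UnicodeScript script) {
        return token.codePoints()
                .filter(Character::isLetter)
                .allMatch(cp -> Character.UnicodeScript.of(cp) == script);
    }

    public static void main(String[] args) {
        String english = "search";
        // A Greek word, written with escapes to avoid source-encoding issues.
        String greek = "\u03b1\u03bd\u03b1\u03b6\u03ae\u03c4\u03b7\u03c3\u03b7";
        System.out.println(lettersAllIn(english, Character.UnicodeScript.LATIN));
        System.out.println(lettersAllIn(greek, Character.UnicodeScript.GREEK));
        System.out.println(lettersAllIn(greek, Character.UnicodeScript.LATIN));
    }
}
```

Because the two scripts do not overlap, a filter that only rewrites Latin letters has nothing to match in a Greek token, and the reverse holds as well.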

Re: apostrophe / ayn / alif

2012-05-15 Thread Robert Muir
On Tue, May 15, 2012 at 2:47 PM, Naomi Dushay wrote: > We are using the ICUFoldingFilterFactory with great success to fold > diacritics so searches with and without the diacritics get the same results. > > We recently discovered we have some Korean records that use an alif diacritic > instead of

Re: Implementing multiterm chain for ICUCollationKeyFilterFactory

2012-05-03 Thread Robert Muir
On Thu, May 3, 2012 at 9:35 AM, OliverS wrote: > Hello > > I read and tried a lot, but somehow I don't fully understand and it doesn't > work. I'm working on solr 4.0 (latest trunk) and use > ICUCollationKeyFilterFactory for my main field type. Now, wildcard queries > don't work, even though ICUCo

Re: Error with distributed search and Suggester component (Solr 3.4)

2012-05-02 Thread Robert Muir
On Wed, May 2, 2012 at 12:16 PM, Ken Krugler wrote: > What confuses me is that Suggester says it's based on SpellChecker, which > supposedly does work with shards. > It is based on spellchecker apis, but spellchecker's ranking is based on simple comparators like string similarity, whereas sugge

Re: Error with distributed search and Suggester component (Solr 3.4)

2012-05-01 Thread Robert Muir
On Tue, May 1, 2012 at 6:48 PM, Ken Krugler wrote: > Hi list, > > Does anybody know if the Suggester component is designed to work with shards? > I'm not really sure it is? They would probably have to override the default merge implementation specified by SpellChecker. But, all of the current su

Re: Language Identification

2012-04-23 Thread Robert Muir
On Mon, Apr 23, 2012 at 1:27 PM, Bai Shen wrote: > I was under the impression that solr does Tika and the language identifier > that Shuyo did.  The page at > http://wiki.apache.org/solr/LanguageDetection lists them both. > > class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProc

Re: Special characters in synonyms.txt on Solr 3.5

2012-04-20 Thread Robert Muir
On Fri, Apr 20, 2012 at 12:10 PM, carl.nordenf...@bwinparty.com wrote: > Directly injecting the letter "ö" into synonyms like so: > island, ön > island, "ön" > > renders the following exception on startup (both lines render the same > error): > > java.lang.RuntimeException: java.nio.charset.Malf
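The exception name and the reply are both cut off in this preview, so the cause is not stated here. One common way to get a charset decoding failure on a line containing "ö" is a file saved in an encoding other than the one the reader expects. The sketch below assumes that scenario (it is an assumption, not a diagnosis of this report) and uses only JDK classes: a strict UTF-8 decoder accepts the line when it is encoded as UTF-8 and rejects the same line encoded as ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

final class StrictUtf8Sketch {
    /** Returns true if the bytes decode cleanly as UTF-8, false if the strict decoder rejects them. */
    static boolean decodesAsUtf8(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input by default instead of replacing it.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String line = "island, \u00f6n"; // the synonym line from the report
        System.out.println(decodesAsUtf8(line.getBytes(StandardCharsets.UTF_8)));
        System.out.println(decodesAsUtf8(line.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If this is the situation, re-saving the file as UTF-8 is the usual remedy; the full reply in the thread should be consulted for the actual advice given.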

Re: maxMergeDocs in Solr 3.6

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 11:54 AM, Burton-West, Tom wrote: > Hello all, > > I'm getting ready to upgrade from Solr 3.4 to Solr 3.6 and I noticed that > maxMergeDocs is no longer in the example solrconfig.xml. > Has maxMergeDocs been deprecated? or does the tieredMergePolicy ignore it? it's not appl
