This is the term dictionary for 4.0's default codec (which currently uses
the BlockTree implementation).
.tim is the on-disk portion of the terms (similar in function to .tis
in previous releases).
.tip is the in-memory "terms index" (similar in function to .tii in
previous releases).
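To picture the split between the two files, here is a toy two-level lookup in Java: a small in-memory index holding the first term of each block decides which single block to scan. It is a sketch of the general idea only. The class and method names are hypothetical, the "disk" is a plain list, and the real BlockTree .tim/.tip formats are far more compact than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy two-level term dictionary. The "blocks" list stands in for the on-disk
// terms file (.tim); the "index" list stands in for the small in-memory terms
// index (.tip). This is NOT the BlockTree file format, just the idea.
class ToyTermDictionary {
    private final List<String[]> blocks = new ArrayList<>();
    private final List<String> index = new ArrayList<>(); // first term of each block

    // sortedTerms must already be in sorted order.
    ToyTermDictionary(String[] sortedTerms, int blockSize) {
        for (int i = 0; i < sortedTerms.length; i += blockSize) {
            String[] block = Arrays.copyOfRange(sortedTerms, i, Math.min(i + blockSize, sortedTerms.length));
            blocks.add(block);
            index.add(block[0]);
        }
    }

    int blockCount() {
        return blocks.size();
    }

    // Binary-search the small in-memory index to choose one block, then scan
    // only that block (in a real codec, only this part would be read from disk).
    boolean seekExact(String term) {
        int pos = Collections.binarySearch(index, term);
        int blockId = pos >= 0 ? pos : -pos - 2; // last block whose first term <= term
        if (blockId < 0) {
            return false; // sorts before the first term in the dictionary
        }
        for (String t : blocks.get(blockId)) {
            if (t.equals(term)) {
                return true;
            }
        }
        return false;
    }
}
```

The point of the design is that the in-memory part grows with the number of blocks, not the number of terms.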
On Tue, Apr 17, 2012 at
On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas
wrote:
> Hello Michael,
>
> Yes, we are pre-sorting the documents before adding them to the index. We
> have a score associated to every document (not an IR score but a
> document-related score that reflects its "importance"). Therefore, the
On Thu, Apr 12, 2012 at 3:52 PM, jmlucjav wrote:
> Well now I am really lost...
>
> 1. yes I want to suggest whole sentences too, I want the tokenizer to be
> taken into account, and apparently it is working for me in 3.5.0?? I get
> suggestions that are like "foo bar abc". Maybe what you mention
> -----Original Message-----
> From: Robert Muir [mailto:rm...@apache.org]
> Sent: Thursday, April 12, 2012 1:33 PM
> To: d...@lucene.apache.org; solr-user@lucene.apache.org; Lucene mailing list;
> announce
> Subject: [ANNOUNCE] Apache Solr 3.6 released
>
> 12 April 2012, Apac
12 April 2012, Apache Solr™ 3.6.0 available
The Lucene PMC is pleased to announce the release of Apache Solr 3.6.0.
Solr is the popular, blazing fast open source enterprise search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, facet
On Wed, Apr 11, 2012 at 4:37 PM, jmlucjav wrote:
> Just to be sure, reproduced this with example config from 3.5.
>
Regardless of your tokenizer, be aware that with this version of Solr
it's going to split up terms based on 'identifier rules' (including
splitting on whitespace).
This is because su
You can actually plug in customized grammars and stuff like that, but
the simplest approach is to configure MappingCharFilter before your
tokenizer, with mappings like: "c++" => "cplusplus"
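The effect of mapping before tokenizing can be sketched without Lucene. The class below is a hypothetical stand-in for a mapping char filter (it is not MappingCharFilter's API): it rewrites the raw text using longest-match replacement, and a crude tokenizer then splits on anything that is not a letter or digit. Unlike the real char filter, it does not correct offsets back to the original text.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a mapping char filter: rewrite the raw text
// (longest match first) BEFORE the tokenizer runs, so "c++" survives as a word.
// Unlike a real char filter, this does not track offset corrections.
class ToyMappingCharFilter {
    private final Map<String, String> mappings;
    private final int maxKeyLength;

    ToyMappingCharFilter(Map<String, String> mappings) {
        this.mappings = mappings;
        int max = 0;
        for (String key : mappings.keySet()) {
            max = Math.max(max, key.length());
        }
        this.maxKeyLength = max;
    }

    String apply(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String replacement = null;
            int matchedLength = 0;
            // Try the longest candidate key at this position first.
            for (int len = Math.min(maxKeyLength, input.length() - i); len > 0; len--) {
                replacement = mappings.get(input.substring(i, i + len));
                if (replacement != null) {
                    matchedLength = len;
                    break;
                }
            }
            if (replacement != null) {
                out.append(replacement);
                i += matchedLength;
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Crude tokenizer: lowercase, then split on anything that is not a letter or digit.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }
}
```

Without the mapping, "I like c++" tokenizes to i, like, c; with it, the last token becomes cplusplus.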
On Tue, Apr 10, 2012 at 11:50 AM, Demian Katz wrote:
> It has been brought to my attention that ICUTokenize
Google and Baidu highlight Chinese queries by making the text red.
On Mon, Mar 12, 2012 at 11:50 PM, Lance Norskog wrote:
> How do you highlight terms in languages without boldface or italic
> modes? Maybe raise the text size a couple of sizes just for that word?
>
>
> --
> Lance Norskog
> goks...@gm
> get it fixed*. Not only they will fix it, they will thank you for
> bringing it up!
>
> I can, as an old user, 100 % vouch what Robert said below.
>
> Simply, just go for it, test you application a bit and make your users happy.
>
>
>
>
> On Wed, Mar 7, 2012 at 5
On Wed, Mar 7, 2012 at 11:47 AM, Dirceu Vieira wrote:
> Hi All,
>
> Has anybody started using Solr 4.0 in production environments? Is it stable
> enough?
> I'm planning to create a proof of concept using solr 4.0, we have some
> projects that will gain a lot with features such as near real time se
On Wed, Jan 25, 2012 at 12:55 PM, Nalini Kartha wrote:
>
> Is there any reason why Solr doesn't support using multiple spellcheckers
> for a query? Is it because of performance overhead?
>
That's not really the case; see https://issues.apache.org/jira/browse/SOLR-2926
I think the issue is that th
On Fri, Mar 2, 2012 at 9:41 AM, Ahmet Arslan wrote:
>
>> Robert, I just tried with
>> 3.6-SNAPSHOT 1296203 from svn - the problem is
>> still there.
>>
>> I am just about to leave for a vacation. I'll try to open a
>> JIRA issue this
>> evening.
>
> Andrew, thanks for providing files. I also re-pr
On Fri, Mar 2, 2012 at 7:37 AM, andrew wrote:
> I was able to create a test case.
>
> We are querying ranges of documents. When I tried to isolate the document
> that causes trouble, I found it happens with exactly every second request
> only for a single document query (it fails constantly when r
On Thu, Mar 1, 2012 at 6:43 AM, Husain, Yavar wrote:
> Hi
>
> For spell checking component I set extendedResults to get the frequencies and
> then select the word with the best frequency. I understand the spell check
> algorithm based on Edit Distance. For an example:
>
> Query to Solr: Marien
>
On Feb 23, 2012, at 11:39 AM, Robert Muir [via Lucene] wrote:
>
>> Please attach your docs if you dont mind.
>>
>> I worked up tests for this (in general for ANY phrase query,
>> increasing the slop should never remove results, only potentially
>> enlarge them).
>>
>
Dushay wrote:
> Robert,
>
> I will create a jira issue with the documentation. FYI, I tried ps values of
> 3, 2, 1 and 0 and none of them worked with dismax; For lucene QueryParser,
> only the value of 0 got results.
>
> - Naomi
>
>
> On Feb 23, 2012, at 11:12 AM,
n revolv through the antholog"~3
>
> NO result
>
>
>
>> lucene QueryParser:
>>
>> URL: q=all_search:"The Beatles as musicians : Revolver through the
>> Anthology"
>> final query: all_search:"the beatl as musician revolv through
On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay wrote:
> Jonathan has brought it to my attention that BOTH of my failing searches
> happen to have 8 terms, and one of the terms is repeated:
>
> "The Beatles as musicians : Revolver through the Anthology"
> "Color-blindness [print/digital]; its dan
On Thu, Feb 16, 2012 at 8:34 AM, Carlos Gonzalez-Cadenas
wrote:
> Hello all:
>
> We'd like to score the matching documents using a combination of SOLR's IR
> score with another application-specific score that we store within the
> documents themselves (i.e. a float field containing the app-specifi
On Wed, Feb 15, 2012 at 2:04 PM, Rohit wrote:
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> preserveOriginal="1" handleAsChar="@#"/>
There is no such parameter as 'handleAsChar'. If you want to do this,
you need to u
On Fri, Feb 10, 2012 at 6:18 AM, Dirk Högemann
wrote:
>
> Our suggest component and parts of our search is getting hard to use by
> this. Any other ideas?
>
Looks like https://issues.apache.org/jira/browse/PDFBOX-371
The title of the issue is a bit confusing (I don't think it should go
to hyphen
On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson wrote:
> Again thanks. I'll take a stab at that are you aware of any
> resources/examples of how to do this? I figured I'd start with
> WhiteSpaceTokenizer but wasn't sure if there was a simpler place to
> start.
>
Well, the easiest is if you can build
On Thu, Feb 9, 2012 at 8:28 PM, Jamie Johnson wrote:
> Thanks Robert, I'll take a look there. Does it sound like I'm on the
> right the right track with what I'm implementing, in other words is a
> TokenFilter appropriate or is there something else that would be a
> better fit for what I've descr
If you are writing a custom TokenStream, I recommend using some of the
resources in Lucene's test-framework.jar to test it.
These find lots of bugs (including thread-safety bugs)!
For a filter, I recommend using the assertions in
BaseTokenStreamTestCase: assertTokenStreamContents, assertAnalyzesT
countryid,
> c.plainname as countryname, p.timezone as timezone, r.id as regionid,
> r.plainname as regionname from places p, regions r, countries c, cities c2
> where c2.id = p.cityid AND p.settingid = 1 AND p.regionid > 1 AND
> p.countryid=c.id AND r.id=p.regionid"
> transformer="TemplateTransformer">
>
>
>
>> at
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>> at
>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>
how long it will take to
> get a fix? Would I be better switching to trunk? Is trunk stable enough for
> someone who's very much a SOLR novice?
>
> Thanks,
> Dave
>
> On Mon, Jan 16, 2012 at 10:08 PM, Robert Muir wrote:
>
>> looks like https://issues.apache.org/j
Looks like https://issues.apache.org/jira/browse/SOLR-2888.
Previously, the FST would need to hold all the terms in RAM during
construction, but with the patch it uses offline sorts/temporary
files.
I'll reopen the issue to backport this to the 3.x branch.
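The offline-sort idea can be sketched as an external merge sort. This is a minimal illustration under stated assumptions, not the code from SOLR-2888: at most `maxInRam` terms are held in memory, each sorted chunk is spilled to a temporary file, and the files are then k-way merged so terms reach a consumer (for example an FST builder) in sorted order. The sketch assumes terms contain no newlines and compares Java Strings rather than raw term bytes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Minimal external ("offline") merge sort: keep at most maxInRam terms in
// memory, spill each sorted chunk to a temp file, then k-way merge the files
// and stream terms to the sink in sorted order.
class ToyOfflineSorter {
    static void sort(Iterable<String> input, int maxInRam, Consumer<String> sink) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String term : input) {
            buffer.add(term);
            if (buffer.size() >= maxInRam) {
                runs.add(spill(buffer));
            }
        }
        if (!buffer.isEmpty()) {
            runs.add(spill(buffer));
        }
        merge(runs, sink);
    }

    // Sort the buffered chunk, write it to a temp file, and empty the buffer.
    private static Path spill(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path run = Files.createTempFile("terms-run", ".txt");
        Files.write(run, buffer, StandardCharsets.UTF_8);
        buffer.clear();
        return run;
    }

    private static final class Head {
        final String term;
        final BufferedReader reader;

        Head(String term, BufferedReader reader) {
            this.term = term;
            this.reader = reader;
        }
    }

    // Repeatedly emit the smallest current line across all runs.
    private static void merge(List<Path> runs, Consumer<String> sink) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.term.compareTo(b.term));
        List<BufferedReader> readers = new ArrayList<>();
        try {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heap.add(new Head(first, reader));
                }
            }
            while (!heap.isEmpty()) {
                Head head = heap.poll();
                sink.accept(head.term);
                String next = head.reader.readLine();
                if (next != null) {
                    heap.add(new Head(next, head.reader));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        }
    }
}
```

With `maxInRam = 2`, sorting five terms produces three temporary runs, which the merge streams back out in order; peak memory is bounded by the chunk size plus one line per run.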
On Mon, Jan 16, 2012 at 8:31 PM, Dave wrot
On Sat, Jan 14, 2012 at 5:09 PM, Lance Norskog wrote:
> Has the GermanAnalyzer behavior changed at all? This is another kind
> of mismatch, and it can cause very subtle problems. If text is
> indexed and queried using different Analyzers, queries will not do
> what you think they should.
It acts
On Sat, Jan 14, 2012 at 12:58 PM, wrote:
> Hi,
>
> I'm switching from Lucene 2.3 to Solr 3.5. I want to reuse the existing
> indexes (huge...).
If you want to use a Lucene 2.3 index, then you should set this in
your solrconfig.xml:
LUCENE_23
>
> In Lucene I use an untweaked org.apache.lucene.a
On Mon, Dec 26, 2011 at 10:54 AM, Koji Sekiguchi wrote:
> I don't have JUnit test case. What I tried was:
>
> I have indexing time synonym definition:
>
> nhl, national hockey league
>
> and I indexed "I like national hockey league".
>
> Then I searched nhl with hl=on, I got an unwanted highlight
The old one didn't really handle this correctly either.
Koji, what is the highlighting problem? Can we have a test case?
2011/12/26 Koji Sekiguchi :
> I found that SynonymFilter javadoc says:
>
> "Matches single or multi word synonyms in a token stream.
> This token stream cannot properly handle
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote:
> It seems like there is some weird stuff going on when folding the
> string, it can be seen in the analysis view, too:
>
> http://i.imgur.com/6B2Uh.png
>
I created a bug here, https://issues.apache.org/jira/browse/LUCENE-3642
Thanks for the screensho
On Mon, Dec 12, 2011 at 5:18 AM, Max wrote:
> The end offset remains 11 even after folding and transforming "æ" to
> "ae", which seems wrong to me.
End offsets refer to the *original text*, so this is correct.
What is wrong is EdgeNGramTokenFilter. See how it turns that 11 into a 12?
>
> I also stum
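The offset point can be made concrete with a toy token model. In this sketch (hypothetical `ToyToken`/`OffsetDemo` classes, not Lucene's attributes), folding the ae ligature to "ae" changes the term text but leaves the offsets alone, since they index into the original text. For the grams, the sketch assumes the simplest safe policy, reusing the parent token's offsets, because the folded text can be longer than the original span; that policy is an assumption here, not necessarily what the fix for LUCENE-3642 does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy token: term text plus [start, end) offsets into the ORIGINAL input text.
final class ToyToken {
    final String term;
    final int start;
    final int end;

    ToyToken(String term, int start, int end) {
        this.term = term;
        this.start = start;
        this.end = end;
    }
}

class OffsetDemo {
    // Folding rewrites the term text (ae ligature -> "ae"), but the offsets keep
    // pointing at the original characters, so they do not change.
    static ToyToken fold(ToyToken t) {
        return new ToyToken(t.term.replace("\u00e6", "ae"), t.start, t.end);
    }

    // Front edge n-grams. Because the folded text may be longer than the
    // original span, this sketch gives every gram the parent token's offsets
    // instead of computing start + n (which could overshoot the original end).
    static List<ToyToken> edgeNGrams(ToyToken t, int minGram, int maxGram) {
        List<ToyToken> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, t.term.length()); n++) {
            grams.add(new ToyToken(t.term.substring(0, n), t.start, t.end));
        }
        return grams;
    }
}
```

A 4-character original that folds to 5 characters still yields grams whose end offset never exceeds 4.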
On Sun, Dec 11, 2011 at 11:34 AM, eks dev wrote:
> on the latest trunk, my schema.xml with field type declaration
> containing //codec="Pulsing"// does not work any more (throws
> exception from FieldType). It used to work wit approx. a month old
> trunk version.
>
> I didn't dig deeper, can be th
On Thu, Dec 8, 2011 at 12:55 PM, Jamie Johnson wrote:
> Thanks Andrzej. I'll continue to follow the portable format JIRA
> along with 3622, are there any others that you're aware of that are
> blockers that would be useful to watch?
>
There is a lot to be done, particularly norms and deleted doc
On Thu, Dec 8, 2011 at 10:46 AM, Mark Miller wrote:
>
> On Dec 8, 2011, at 8:50 AM, Jamie Johnson wrote:
>
>> Isn't the codec stuff merged with trunk now?
>
> Robert merged this recently AFAIK.
>
True, but that issue only moved the majority of the rest of the index
(stored fields, term vectors, fi
On Thu, Dec 8, 2011 at 11:01 AM, Jay Luker wrote:
> Hi,
>
> I am trying to provide a means to search our corpus of nearly 2
> million fulltext astronomy and physics articles using regular
> expressions. A small percentage of our users need to be able to
> locate, for example, certain types of iden
On Tue, Nov 29, 2011 at 9:21 AM, elisabeth benoit
wrote:
> ok, thanks.
>
> I think it would be a nice improvment to consider inversion as distance =
> 1, since it's a so common mistake. The distance = 2 makes it difficult to
> correct transpositions on small words (for instance, the DirectSpellChe
On Tue, Nov 29, 2011 at 8:07 AM, elisabeth benoit
wrote:
> Hello,
>
> I'd like to know if the Levensthein distance algorithm used by Solr 4.0
> DirectSpellChecker (working quite well I must say) is considering an
> inversion as distance = 1 or distance = 2?
>
> For instance, if I write Monteruil a
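The difference being asked about can be checked directly. The sketch below computes plain Levenshtein distance and the "optimal string alignment" variant, which additionally counts swapping two adjacent characters as a single edit. It says nothing about how DirectSpellChecker is configured; it only shows why the answer matters, using an illustrative pair such as "Monteruil" versus "Montreuil".

```java
// Plain Levenshtein (insert/delete/substitute) versus the "optimal string
// alignment" variant that also counts one adjacent transposition as one edit.
class EditDistances {
    static int levenshtein(String a, String b) {
        return distance(a, b, false);
    }

    static int withTranspositions(String a, String b) {
        return distance(a, b, true);
    }

    private static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                // Adjacent swap ("er" <-> "re") costs 1 instead of 2 substitutions.
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }
}
```

For that pair, plain Levenshtein gives 2 and the transposition-aware variant gives 1, which is exactly the gap that matters when the maximum allowed distance is small.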
On Mon, Nov 28, 2011 at 4:36 PM, Phil Hoy wrote:
> Added issue: https://issues.apache.org/jira/browse/SOLR-2926
> Please let me know if more information needs adding to JIRA.
>
> Phil
>
Thanks, I'll follow up on the issue.
--
lucidimagination.com
technically it could? I'm just not sure whether the current spellchecking
APIs allow for it. But maybe someone has a good idea on how to easily
expose this.
I think it's a good idea.
Care to open a JIRA issue?
On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy wrote:
> Hi,
>
> Can the DirectSolrSpellChecker b
On Sat, Nov 26, 2011 at 8:43 PM, Michael Sokolov wrote:
> That's great news! We can't really track trunk, but it looks like this is
> targeted for 3.6, right? As a short-term alternative, I was considering
> using ICUFoldingFilter; this won't preserve some of the finer distinctions,
> but will at
On Wed, Nov 23, 2011 at 11:22 PM, Michael Sokolov wrote:
> Thanks for confirming that, and laying out the options, Robert.
>
FYI: Erick committed the multiterm stuff, so I opened an issue for
this: https://issues.apache.org/jira/browse/SOLR-2919
Hi,
locale-sensitive range queries don't work with these filters, only sort,
although Erick Erickson has a patch that will enable this (the lowercasing
wildcards patch; then you could add this filter to your multiterm chain).
Separately, locale range queries and sort both work easily on trunk (wit
What is the point of a unique indexed field?
If, for all of your fields, there is only one possible document, you
don't need length normalization, scoring, or a search engine at all...
just use a HashMap?
On Thu, Nov 10, 2011 at 7:42 AM, Ivan Hrytsyuk
wrote:
> Hello everyone,
>
> We have large in
On Wed, Nov 2, 2011 at 8:53 AM, Phil Hoy wrote:
> It is solr 4.0 and uses the new FSTSynonymFilterFactory i believe but defers
> to ZkSolrResourceLoader to load the synonym file when in cloud mode.
> Phil
>
FYI: The synonyms implementation supports multiple formats (currently
"solr" and "wordnet
On Fri, Oct 28, 2011 at 8:10 PM, Jason Rutherglen
wrote:
>> Otherwise we have "flexible indexing" where "flexible" means "slower
>> if you do anything but the default".
>
> The other encodings should exist as modules since they are pluggable.
> 4.0 can ship with the existing codec. 4.1 with addit
On Fri, Oct 28, 2011 at 5:03 PM, Jason Rutherglen
wrote:
> +1 I suggested it should be backported a while back. Or that Lucene
> 4.x should be released. I'm not sure what is holding up Lucene 4.x at
> this point, bulk postings is only needed useful for PFOR.
This is not true; most modern index
On Thu, Oct 27, 2011 at 6:00 PM, Simon Willnauer
wrote:
> we are not actively removing norms. if you set omitNorms=true and
> index documents they won't have norms for this field. Yet, other
> segment still have norms until they get merged with a segment that has
> no norms for that field ie. omit
The word delimiter filter also does other things: it treats ' as
punctuation by default. So it normally splits on ', except when it's 's
(in that case, it removes the 's completely if you use
stemEnglishPossessive).
There are a couple of approaches you can use:
1. you can keep WordDelimiterFilter wi
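A much-simplified model of the apostrophe behavior described above. `ToyApostropheSplitter` is hypothetical and covers only this one rule: ' acts as a delimiter, and a trailing possessive 's is dropped entirely when a stemEnglishPossessive-like flag is on. The real WordDelimiterFilter applies many more rules (case changes, numerics, catenation).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, much-simplified model of one word-delimiter rule: ' is a
// delimiter, and with stemEnglishPossessive a trailing 's is removed entirely.
class ToyApostropheSplitter {
    static List<String> split(String token, boolean stemEnglishPossessive) {
        String text = token;
        if (stemEnglishPossessive && (text.endsWith("'s") || text.endsWith("'S"))) {
            text = text.substring(0, text.length() - 2); // drop the possessive
        }
        List<String> parts = new ArrayList<>();
        for (String part : text.split("'")) {
            if (!part.isEmpty()) {
                parts.add(part); // skip empty pieces from leading or doubled apostrophes
            }
        }
        return parts;
    }
}
```

So "O'Neil's" becomes O, Neil with the flag on, and O, Neil, s with it off.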
On Wed, Oct 5, 2011 at 3:03 PM, David Ryan wrote:
> Do you mean both BM25 and BM25F?
>
>
No, BM25F and other "fielded" or structured models are somewhat different.
In these models, if you have two fields (body/title), you are saying
that "dogs" in body is actually the same term as "dogs" in title.
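The distinction can be shown numerically. This sketch uses a common simplified BM25F formulation (per-field length normalization, field weights, one shared saturation); the exact formula, the class and parameter names, and the dropped (k1 + 1) factor are assumptions of the sketch, not LUCENE-2959's implementation. With the same term appearing once in body and once in title, BM25F saturates the combined frequency once, so it scores lower than summing two independently saturated field scores.

```java
import java.util.List;

// Simplified BM25F-style scoring for ONE query term (illustrative formulation).
class Bm25fSketch {
    // Statistics of the query term in one field of one document.
    static final class FieldStats {
        final double tf;        // term frequency in this field
        final double length;    // field length
        final double avgLength; // average length of this field
        final double weight;    // field boost, e.g. title > body
        final double b;         // length-normalization strength for this field

        FieldStats(double tf, double length, double avgLength, double weight, double b) {
            this.tf = tf;
            this.length = length;
            this.avgLength = avgLength;
            this.weight = weight;
            this.b = b;
        }

        double normalizedTf() {
            return tf / (1 - b + b * length / avgLength);
        }
    }

    // BM25 saturation; the constant (k1 + 1) factor is omitted since it does not change rankings.
    static double saturate(double tf, double k1, double idf) {
        return idf * tf / (k1 + tf);
    }

    // BM25F-style: the term in body and in title counts as ONE term, so the
    // weighted, normalized frequencies are summed before saturating once.
    static double bm25f(List<FieldStats> fields, double k1, double idf) {
        double pseudoTf = 0;
        for (FieldStats f : fields) {
            pseudoTf += f.weight * f.normalizedTf();
        }
        return saturate(pseudoTf, k1, idf);
    }

    // Contrast: score each field as its own term and add the per-field scores.
    static double summedBm25(List<FieldStats> fields, double k1, double idf) {
        double score = 0;
        for (FieldStats f : fields) {
            score += f.weight * saturate(f.normalizedTf(), k1, idf);
        }
        return score;
    }
}
```

With tf = 1 in two average-length fields, k1 = 1.2 and idf = 1, BM25F gives 2 / 3.2 = 0.625, while the per-field sum gives 2 / 2.2 (about 0.909).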
On Wed, Oct 5, 2011 at 2:23 PM, David Ryan wrote:
> Hi,
>
> According to the IRA issue 2959,
> https://issues.apache.org/jira/browse/LUCENE-2959
>
> BM25 will be included in the next release of LUCENE.
>
> 1). Will BM25F be included in the next release as well as part
> of LUCENE-2959?
should be
Your Persian PDF problem is different, and is already taken care of in PDFBox trunk:
https://issues.apache.org/jira/browse/PDFBOX-1127
On Tue, Oct 4, 2011 at 2:04 PM, ahmad ajiloo wrote:
> I have this problem too, in indexing some of persian pdf files.
>
> 2011/10/4 Héctor Trujillo
>
>> Hi all, I'm
https://issues.apache.org/jira/browse/LUCENE-3421
Note: if you are using 'includeSpanScore=false' (which I think
you are, as that's where the bug applies), be aware that this means the
score is *only* the result of your payload: boosts, tf, length
normalization, idf, none of this is incorporated in
On Tue, Sep 20, 2011 at 12:32 PM, Michael McCandless
wrote:
>
> Or: is it possible you reopened the reader several times against the
> index (ie, after committing from Solr)? If so, I think 2.9.x never
> unmaps the mapped areas, and so this would "accumulate" against the
> system limit.
In order
On Mon, Sep 19, 2011 at 9:57 AM, Burton-West, Tom wrote:
> Thanks Robert,
>
> Removing "set" from " setMaxMergedSegmentMB" and using "maxMergedSegmentMB"
> fixed the problem.
> ( Sorry about the multiple posts. Our mail server was being flaky and the
> client lied to me about whether the messag
On Fri, Sep 16, 2011 at 6:53 PM, Burton-West, Tom wrote:
> Hello,
>
> The TieredMergePolicy has become the default with Solr 3.3, but the
> configuration in the example uses the mergeFactor setting which applys to the
> LogByteSizeMergePolicy.
>
> How is the mergeFactor interpreted by the Tiered
On Wed, Aug 17, 2011 at 3:12 AM, Tomas Zerolo
wrote:
> On Tue, Aug 16, 2011 at 03:58:29PM -0400, Grant Ingersoll wrote:
>> I know you mean well and are probably wondering what to do next [...]
>
> Still, a short heads-up like Johnson's would seem OK?
>
> After all, this is of concern to us all.
>
What Subversion revision are you using? I think you just need to svn
up, as from the line number I can tell it's from before I fixed this bug in
trunk :)
On Fri, Aug 12, 2011 at 11:36 AM, O. Klein wrote:
> Spellchecker works fine, but when using spellcheck.q it gives following
> exception (queryAnalyze
On Wed, Aug 10, 2011 at 7:10 PM, Jeff Wartes wrote:
>
> After some further playing around, I think I understand what's going on.
> Because the SynonymFilterFactory pays attention to term position when it
> inserts a multi-word synonym, I had assumed it scanned for matches in a way
> that respec
https://issues.apache.org/jira/browse/LUCENE-3233
On Thu, Aug 4, 2011 at 7:24 PM, Arun Atreya wrote:
> Hello,
>
> I would like to know the best way to load a huge synonym list into Solr.
>
> I would like to do concept indexing (a.k.a category indexing) with Solr. For
> example, I want to be able
Did you add the analysis-extras jar itself? That's what contains this factory.
On Tue, Aug 2, 2011 at 5:03 AM, Satish Talim wrote:
> I am using Solr 3.3 on a Windows box.
>
> I want to use the solr.ICUTokenizerFactory in my schema.xml and added the
> fieldType name="text_icu" as per the URL -
> http://
On Wed, Jul 27, 2011 at 4:12 PM, Fuad Efendi wrote:
> Thanks Robert!!!
>
> "Submitted On 26-JUL-2011" - yesterday.
>
> This option was popular in Hbase...
Then you should also tell them not to use it, if they want their loops to work.
Don't use this option; these optimizations are buggy:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134
On Wed, Jul 27, 2011 at 3:56 PM, Fuad Efendi wrote:
> Anyone tried this? I can not start Solr-Tomcat with following options on
> Ubuntu:
>
> JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m
ually have
any synonyms, so it could indicate a configuration mistake.
On Tue, Jul 12, 2011 at 12:02 AM, Stuart King wrote:
> Sorry Robert,
>
> What does that mean? Should I be providing synonyms in my queries?
>
> Cheers
>
> Stu
>
> On Tue, Jul 12, 2011 at 1:49 PM,
I just committed a fix for this, to warn that you are using an empty
set of synonyms instead of throwing an error.
On Mon, Jul 11, 2011 at 10:50 PM, Stuart King wrote:
> I have been building and running against trunk. In my build I have a number
> of tests, testing solr functionality within my app.
>
> As of
Re-open does work, but you cannot ignore its return value! See the
javadocs for an example.
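The reopen pitfall can be illustrated with a hypothetical stand-in class (not Lucene's IndexReader): `reopen()` returns a new instance when the index has changed, and the old instance keeps seeing its original snapshot. Calling it and discarding the result therefore changes nothing for the caller, which matches the "Re-open doesn't work, but open does" symptom in the quoted message.

```java
// Hypothetical stand-in for a point-in-time reader (NOT Lucene's IndexReader).
class ToySnapshotReader {
    private static int latestGeneration = 0; // pretend index state shared by all readers

    final int generation; // the snapshot this reader sees, fixed at open time
    boolean closed = false;

    private ToySnapshotReader(int generation) {
        this.generation = generation;
    }

    static ToySnapshotReader open() {
        return new ToySnapshotReader(latestGeneration);
    }

    // Simulates a commit that changes the index.
    static void commitChange() {
        latestGeneration++;
    }

    // Returns THIS reader if nothing changed, otherwise a new reader;
    // it never modifies the reader it is called on.
    ToySnapshotReader reopen() {
        return generation == latestGeneration ? this : new ToySnapshotReader(latestGeneration);
    }

    void close() {
        closed = true;
    }
}

class ReopenIdiom {
    // Keep the returned reader; close the old one only if a new one was opened.
    static ToySnapshotReader refresh(ToySnapshotReader current) {
        ToySnapshotReader newer = current.reopen();
        if (newer != current) {
            current.close();
        }
        return newer;
    }
}
```

The `refresh` helper shows the idiom: always continue with the returned reader, and close the old one only when a different instance came back.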
On Tue, Jul 5, 2011 at 3:10 PM, Gabriele Kahlout
wrote:
> Re-open doens't work, but open does.
>
> @Test
> public void testUpdate() throws IOException,
> ParserConfigurationException, SAXException, Pars
July 2011, Apache Solr™ 3.3 available
The Lucene PMC is pleased to announce the release of Apache Solr 3.3.
Solr is the popular, blazing fast open source enterprise search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted searc
On Mon, Jun 27, 2011 at 8:47 AM, Bernd Fehling
wrote:
>
> correct!!!
>
But what I said is totally different from what you said.
You are still wrong.
On Mon, Jun 27, 2011 at 8:30 AM, Bernd Fehling
wrote:
> Unicode U+ ist UTF-8 byte sequence "ef bf bf" that is right.
>
> But I was saying that UTF-8 0x (which is byte sequence "ff ff") is
> illegal
> and that's what the java.io.CharConversionException is complaining about.
> "Invalid UTF-
On Mon, Jun 27, 2011 at 7:11 AM, Bernd Fehling
wrote:
>
> So there is no UTF-8 0x. It is illegal.
>
You are wrong: it is legally encoded as a three-byte sequence: ef bf bf
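Both claims in this exchange can be checked with the JDK's own UTF-8 codec. The three bytes ef bf bf are the UTF-8 encoding of the code point U+FFFF, while the byte 0xff never appears anywhere in well-formed UTF-8, so a strict decoder rejects "ff ff". The helper names below are just for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Hex dump of a string's UTF-8 encoding.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Strict decode: report malformed input instead of substituting a replacement char.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Encoding U+FFFF yields "ef bf bf", that sequence decodes cleanly, and the two bytes ff ff are reported as malformed.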
On Thu, Jun 23, 2011 at 4:10 AM, Tarjei Huse wrote:
> On 06/20/2011 01:51 PM, Robert Muir wrote:
>> you must use junit 4.7.x, not junit 4.8.x
> Is there a way around this?
>
No, the only option we have is to decide to require 4.8.
> Depending on a specific Junit version
On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling
wrote:
> While trying some synonyms.txt files I noticed a huge increase of heap
> usage.
>
> synonyms_1.txt --> 6645 lines (2826104 bytes in size)
> results in 66364 entries in SynonymMap with 730MB heap usage.
> Startup time about 2 minutes.
>
> syn
The problem is that before
https://issues.apache.org/jira/browse/SOLR-2567, Solr invoked the
TieredMergePolicy "setters" *before* it tried to apply these 'global'
mergeFactor etc. params.
So, even if you set them explicitly inside the , they
would then get clobbered by these 'global' params / defau
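The ordering problem can be reduced to a few lines. This hypothetical sketch (not Solr's configuration code) models settings as a map where the last writer wins: when explicit per-policy values are applied first and the 'global' values afterwards, the globals silently override them. The setting name `segmentsPerTier` is used only as an example key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the ordering bug: the LAST writer wins, so applying
// explicit policy settings before the "global" defaults lets the defaults clobber them.
class SettingOrderDemo {
    static Map<String, Integer> configure(Map<String, Integer> explicit,
                                          Map<String, Integer> globals,
                                          boolean explicitFirst) {
        Map<String, Integer> effective = new LinkedHashMap<>();
        if (explicitFirst) {
            effective.putAll(explicit);
            effective.putAll(globals);  // old order: globals overwrite explicit values
        } else {
            effective.putAll(globals);
            effective.putAll(explicit); // corrected order: explicit values win
        }
        return effective;
    }
}
```

With an explicit value of 5 and a global value of 10 for the same key, the old order ends up with 10 and the corrected order keeps 5.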
If you can create an issue with a reproducible test, we can try to
come up with a workaround... no promises, but I'd be willing to give it
a shot.
On Mon, Jun 20, 2011 at 10:11 AM, Bernd Fehling
wrote:
>
> Now this is a good one, PorterStemFilter kills JVM (reproducible).
>
> Should I post this on
You must use JUnit 4.7.x, not JUnit 4.8.x.
On Mon, Jun 20, 2011 at 6:21 AM, Jakob Vad Nielsen
wrote:
> Hi,
>
> I'm trying to create some integrations tests within my project using JUnit
> and the SolrTestCaseJ4 (from Solr-test-framework 3.2.0) helper class. The
> problem is that I'm getting an Ass
This is a bug, thanks for including all the information necessary to reproduce!
https://issues.apache.org/jira/browse/LUCENE-3215
On Sun, Jun 19, 2011 at 10:24 PM, Chris Book wrote:
> Hello, I have a solr search server running and in at least one very rare
> case, I'm seeing a strange scoring re
On Thu, Jun 16, 2011 at 3:23 PM, Gabriele Kahlout
wrote:
>> I'm trying to assess the impact of coord (search-time) on Qtime. In one
> implementation coord returns 1, while in another it's actually computed.
At query time?
coord should be really cheap (unless your impl does something like
calcula
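As a rough illustration of why coord is cheap: with the classic formula overlap / maxOverlap (as in DefaultSimilarity), the factor depends only on how many clauses matched, so a scorer can compute the few possible values once per query and then do an array lookup per document. The table-building helper is a hypothetical sketch, not Lucene's scorer code.

```java
// Classic coord factor: fraction of the query's clauses that matched.
class CoordSketch {
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Only maxOverlap + 1 values are possible for a given query, so they can be
    // computed once up front; scoring a document is then a single array lookup.
    static float[] coordTable(int maxOverlap) {
        float[] table = new float[maxOverlap + 1];
        for (int overlap = 0; overlap <= maxOverlap; overlap++) {
            table[overlap] = coord(overlap, maxOverlap);
        }
        return table;
    }
}
```

For a four-clause query, a document matching two clauses gets `table[2]`, which is 0.5.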
On Tue, Jun 14, 2011 at 7:07 PM, Shawn Heisey wrote:
> Because the text in my index comes in many different languages with no
> ability to know the language ahead of time, I have a need to use
> ICUTokenizer and/or the CJK filters, but I have a problem with them as they
> are implemented currently
June 2011, Apache Solr™ 3.2 available
The Lucene PMC is pleased to announce the release of Apache Solr 3.2.
Solr is the popular, blazing fast open source enterprise search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted searc
On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma
wrote:
> If you still want IDF for other fields then i
> think you have a problem because Solr doesn't yet support per-field
> similarity.
>
It does in trunk: https://issues.apache.org/jira/browse/SOLR-2338
On Mon, May 16, 2011 at 5:33 PM, Smiley, David W. wrote:
> Lucid's KStemmer is LGPL and the Solr committers have shown that they don't
> want LGPL libraries shipping with Solr. If you are intent on releasing your
> changes, I suggest attaching both the modified source and the compiled jar
> ont
On Fri, May 13, 2011 at 7:07 AM, Paul Libbrecht wrote:
> I sure wish such a compound-analysis would be done with a lucene-powered
> dictionary!
> That would rock.
>
Me too, but it's a chicken-and-egg problem (you would basically have to
index everything without decomposition to get the dictionar
On Sun, May 1, 2011 at 11:28 AM, Andy wrote:
> Hi,
>
> I read on this mailing list previously that NRT was implemented in 4.0, it
> just wasn't ready for production yet. Then I looked at the wiki
> (http://wiki.apache.org/solr/NearRealtimeSearch). It listed 2 jira issues
> related to NRT: SOLR
e, "PET" might be a
> synonym of "positron emission tomography", but "pet" wouldn't be.
>
> -Mike
>
> On 04/26/2011 09:51 AM, Robert Muir wrote:
>>
>> On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic
>> wrote:
>>
>>
On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic
wrote:
> But somehow this feels bad (well, so does sticking word variations in what's
> supposed to be a synonyms file), partly because it means that the person
> adding
> new synonyms would need to know what they stem to (or always check it aga
What do you have in solrconfig.xml for luceneMatchVersion?
If you don't set this, then it's going to default to "Lucene 2.9"
emulation so that old Solr 1.4 configs work the same way. I tried your
example and it worked fine here, and I'm guessing this is probably
what's happening.
The default in the
On Mon, Apr 25, 2011 at 2:05 PM, Otis Gospodnetic
wrote:
> Hi,
>
> Are there any good / comprehensive examples of protwords.txt for English?
> Or good stemdict.txt examples that work with StemmerOverrideFilterFactory?
>
> Would be good to have a good example to include in Solr distribution...
>
I
On Sat, Apr 23, 2011 at 11:36 PM, Ramanathapuram, Rajesh
wrote:
> What is really weird is if I search for srchterm1 and srchterm2
> separately, the results come up fine. If I search for multiple terms,
> this issue seems to happen when the terms are separated by html tags and
> special characters
On Fri, Apr 22, 2011 at 3:09 PM, Bently Preece wrote:
> What if there is no standard localization already? The case I'm
> specifically interested in is Ojibwe.
>
This is standard? To sort a field with a specific locale, you have to
tell it the locale you want. If you use the ICU implementation y
On Fri, Apr 22, 2011 at 2:37 PM, Bently Preece wrote:
> Thank you. This looks like the right direction.
>
> I see the docs say ICUCollationKeyFilterFactory is deprecated in favor of
> ICUCollationField. So ... I'd implement a subclass of ICUCollationField,
> and use that as the fieldtype in sche
Please see http://wiki.apache.org/solr/UnicodeCollation
In general, the idea is similar to how this is handled in databases:
you can index collation keys into a sort field at analysis time, then
you just do a standard Solr sort.
However, I am not sure if your JRE provides a "haw" Locale for the
Ha
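The collation-key approach can be sketched with the JDK alone. This uses `java.text.Collator` rather than ICU, and `CollationKeySort` is a hypothetical helper: each value is turned into a locale-specific collation key, whose byte form is what would be indexed into a sort field, and a plain unsigned byte comparison of those keys then reproduces the locale's order. Per the `CollationKey.toByteArray()` documentation, comparing the byte arrays gives the same result as comparing the keys.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CollationKeySort {
    // What would be stored in the sort field at index time.
    static byte[] sortKey(Collator collator, String value) {
        return collator.getCollationKey(value).toByteArray();
    }

    // Unsigned lexicographic byte comparison, i.e. a locale-agnostic "standard" sort.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    static List<String> sortByKeys(List<String> values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Compute each key once, as it would be computed once at index time.
        Map<String, byte[]> keys = new HashMap<>();
        for (String v : values) {
            keys.put(v, sortKey(collator, v));
        }
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((x, y) -> compareBytes(keys.get(x), keys.get(y)));
        return sorted;
    }
}
```

For example, with an English collator "éclair" sorts before "zebra", whereas plain String order puts "zebra" first because 'é' (U+00E9) is above 'z'. If the JRE has no collation rules for the locale you need, the keys fall back to whatever `Collator.getInstance` returns, which is the concern raised above.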
No, this is only a bug in analysis.jsp.
You can see this by comparing analysis.jsp's "dontstems bees" to using
the query debug interface:
"dontstems bees"
"dontstems bees"
PhraseQuery(text:"dontstems bee")
text:"dontstems bee"
On Wed, Apr 20, 2011 at 2:43 PM, Yonik Seeley
wrote:
> On We
Hi, there is a proposed patch uploaded to the issue. Maybe you can
help by reviewing/testing it?
2011/4/20 Robert Gründler :
> Hi all,
>
> i'm getting the following exception when using highlighting for a field
> containing HTMLStripCharFilterFactory:
>
> org.apache.lucene.search.highlight.Invalid
On Mon, Apr 18, 2011 at 1:31 PM, Demian Katz wrote:
> Hello,
>
> I'm interested in trying out the new ICU features in Solr 3.1. However, when
> I attempt to set up a field type using solr.ICUTokenizerFactory and/or
> solr.ICUFoldingFilterFactory, Solr refuses to start up, issuing "Error
> load
On Tue, Apr 12, 2011 at 6:44 AM, Tommaso Teofili
wrote:
> Hi all,
> I am porting a previously series of Solr plugins developed for 1.4.1 version
> to 3.1.0, I've written some integration tests extending the
> AbstractSolrTestCase [1] utility class but now it seems that wasn't included
> in the sol
On Thu, Apr 7, 2011 at 2:13 PM, Siddharth Powar
wrote:
> Hey guys,
>
> I am in the process of moving to solr3.1 from solr1.4. I am having this
> issue where solr3.1 now complains about the synonyms.txt file. I get the
> following error:
> *org.apache.solr.common.SolrException: Error loading resour
In Eclipse you need to set your project's character encoding to UTF-8.
If you are checking out the source code from svn, you can run 'ant eclipse'
from the top level, and then hit refresh on your project. It will set up your
encoding and your classpath.
On Tue, Apr 5, 2011 at 6:10 PM, Eric Groble
There are some new features in 3.1 to make it easier to tune this
stuff, especially:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/src/java/org/apache/solr/analysis/StemmerOverrideFilterFactory.java
This takes a tab-separated list of words->stems, and sets a flag to any
down
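A simplified model of the override idea: read a tab-separated word-to-stem list, return the listed stem for dictionary words and skip the stemmer for them (the "flag" mentioned above), and stem everything else. `ToyStemmerOverride` and its naive suffix-stripping stemmer are hypothetical stand-ins, not StemmerOverrideFilter or a real stemmer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a stemmer-override step in front of a stemmer.
class ToyStemmerOverride {
    // Parse "word<TAB>stem" lines; blank or malformed lines are ignored.
    static Map<String, String> parse(String dictionary) {
        Map<String, String> map = new HashMap<>();
        for (String line : dictionary.split("\n")) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && !parts[0].isEmpty()) {
                map.put(parts[0], parts[1]);
            }
        }
        return map;
    }

    // Naive suffix stripper standing in for a real stemmer.
    static String toyStem(String word) {
        return word.endsWith("s") && word.length() > 3 ? word.substring(0, word.length() - 1) : word;
    }

    static String analyze(String word, Map<String, String> overrides) {
        String override = overrides.get(word);
        if (override != null) {
            return override; // treated like a keyword: the stemmer never sees it
        }
        return toyStem(word);
    }
}
```

With the entries "news<TAB>news" and "mice<TAB>mouse", "news" stays "news" instead of being stripped to "new", "mice" maps to "mouse", and an unlisted word such as "cats" still goes through the stemmer.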