Re: MappingCharFilterFactory equivalent for use after tokenizer?

2010-06-18 Thread Robert Muir
idea.) > > I don't think we should do this. How many tokens would that make? (Such malformed input exists in the wild, e.g. someone spills beer on their keyboard and the key gets sticky.) -- Robert Muir rcm...@gmail.com

Re: Plural only stemmer

2010-06-17 Thread Robert Muir
make it work with the KStem jars? > > Thanks! > -- Robert Muir rcm...@gmail.com
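
Since this thread asks for a plural-only stemmer, here is one well-known option for comparison: Harman's "S-stemmer", which only strips a few English plural endings. The sketch below implements its three rules on plain lowercase Strings. The class name is hypothetical, and this is not necessarily what was recommended in the thread.

```java
import java.util.Locale;

/** Sketch of Harman's S-stemmer: strips English plural endings only. */
public class PluralOnlyStemmer {

    /** Expects a lowercased token; only the first applicable rule fires. */
    public static String stem(String w) {
        int n = w.length();
        if (n < 3 || w.charAt(n - 1) != 's') {
            return w;                                   // nothing plural-looking
        }
        char prev = w.charAt(n - 2);
        if (prev == 'u' || prev == 's') {
            return w;                                   // "-us", "-ss": leave alone
        }
        if (prev == 'e') {
            char c3 = w.charAt(n - 3);
            if (c3 == 'i' && n > 3) {
                char c4 = w.charAt(n - 4);
                if (c4 != 'a' && c4 != 'e') {
                    return w.substring(0, n - 3) + "y"; // "-ies" -> "-y"
                }
                return w;                               // "-aies", "-eies": unchanged
            }
            if (c3 == 'i' || c3 == 'a' || c3 == 'e' || c3 == 'o') {
                return w;                               // "-aes", "-ees", "-oes": unchanged
            }
        }
        return w.substring(0, n - 1);                   // otherwise drop the final "s"
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Queries", "boxes", "shoes", "cats", "glass", "status"}) {
            System.out.println(w + " -> " + stem(w.toLowerCase(Locale.ROOT)));
        }
    }
}
```

Note that "boxes" becomes "boxe": the S-stemmer deliberately does very little, trading recall for very few bad conflations.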

Re: Index-time vs. search-time boosting performance

2010-06-05 Thread Robert Muir
rage efficiency and written to the directory (when writing the document) in a single byte (!)" If you do this as an index-time boost, your boosts will lose lots of precision for this reason. -- Robert Muir rcm...@gmail.com
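
To see how much precision that single byte costs: Lucene releases of this era folded index-time boosts into the field's norm and stored it as a tiny float with 3 mantissa bits and 5 exponent bits (the "315" encoding in Lucene's SmallFloat). The standalone sketch below re-implements that scheme so it runs without Lucene. Treat it as an illustration of the idea, not a copy of any particular release.

```java
/** Standalone sketch of a one-byte float: 3 mantissa bits, exponent offset 15. */
public class OneByteNorm {
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;
    // Shifted float bit pattern that corresponds to byte value 0.
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    public static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep sign, exponent and the top 3 mantissa bits; the rest is truncated.
        int small = bits >> (24 - MANTISSA_BITS);
        if (small <= FZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // zero/negative, or underflow
        } else if (small >= FZERO + 0x100) {
            return (byte) -1;                             // overflow: largest value
        }
        return (byte) (small - FZERO);
    }

    public static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float boost : new float[] {1.0f, 1.1f, 1.2f, 1.25f, 1.3f}) {
            System.out.println(boost + " -> " + decode(encode(boost)));
        }
    }
}
```

Boosts of 1.1 and 1.2 both come back as 1.0, and 1.3 comes back as 1.25, which is why fine-grained index-time boosts tend to disappear.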

Re: Help with Shingled queries

2010-06-04 Thread Robert Muir
) ()", > "parsedquery_toString":"+() ()", > "explain":{}, > "QParser":"DisMaxQParser", > "altquerystring":null, > "boostfuncs":null, > "filter_queries":["atomId:(8235 10914 10911 )"], > "parsed_filter_queries":["atomId:8235 atomId:10914 atomId:10911"], > "timing":{ .. > > Does anyone know what I could be doing wrong here, is it a bug in the debug > output, a stupid mistake misconception or piece of idiocy on my part or > something else. > > > Many thanks > > -- Greg Bowyer > > > -- Robert Muir rcm...@gmail.com

Re: Wildcard queries

2010-05-21 Thread Robert Muir
is a German character. Another tokenstream that does is the Unicode case-folding algorithm [requires code dependent on ICU at the moment]. LowerCaseFilter is *not* Unicode-compliant as far as casing goes. toLowerCase is intended for display, not for case-insensitive matching. -- Robert Muir rcm...@gmail.com
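
A small JDK-only illustration of why lowercasing is not case folding: German ß has no single-character uppercase, so "STRASSE" and "straße" stay different after toLowerCase. Full Unicode case folding (as in ICU4J) maps ß to "ss". The helper below only approximates that with toUpperCase followed by toLowerCase; it is an assumption for illustration and is not a compliant implementation.

```java
import java.util.Locale;

public class CaseFoldDemo {
    /** Crude approximation of full case folding; NOT the Unicode algorithm. */
    public static String approxFold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String a = "stra\u00DFe";   // "straße"
        String b = "STRASSE";
        // Plain lowercasing keeps them distinct: "straße" vs "strasse".
        System.out.println(a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT)));
        // Upper-then-lower maps ß -> SS -> ss, so they now compare equal.
        System.out.println(approxFold(a).equals(approxFold(b)));
    }
}
```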

Re: Wildcard queries

2010-05-21 Thread Robert Muir
er refine my initial question > to: What's the idea behind the fact that no *lowercasing* is performed on > wildcarded search terms if the field in question contains a LowercaseFilter > in its associated field type definition? > > -Sascha > > Robert Muir wrote: >> >>

Re: Wildcard queries

2010-05-21 Thread Robert Muir
"odd" behaviour if needed. To ensure backward compatibility the "odd" >> behaviour should be the default anymore. >> >> Am I missing any drawbacks? >> >> Best, >> Sascha >> > > -- Robert Muir rcm...@gmail.com

Re: Wildcard queries

2010-05-21 Thread Robert Muir
see a config parameter (per field type) that allows to disable this "odd" > behaviour if needed. To ensure backward compatibility the "odd" behaviour > should be the default anymore. > > Am I missing any drawbacks? > > Best, > Sascha > > -- Robert Muir rcm...@gmail.com

Re: Stemming Filters in wiki

2010-05-19 Thread Robert Muir
Analyzers, Tokenizers, and Token Filters wiki page.  Is > there a reason for this? > > Thanks, > > asif > > > -- > Asif Rahman > Lead Engineer - NewsCred > a...@newscred.com > http://platform.newscred.com > -- Robert Muir rcm...@gmail.com

Re: Which Solr to use?

2010-05-18 Thread Robert Muir
e/dev/branches/branch_3x/lucene/CHANGES.txt https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/contrib/CHANGES.txt -- Robert Muir rcm...@gmail.com

Re: mix cased search terms

2010-04-24 Thread Robert Muir
querytime. >> >> Kind regards >> - Mitch >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/mix-cased-search-terms-tp747279p747502.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > -- Robert Muir rcm...@gmail.com

Re: LucidWorks Solr

2010-04-22 Thread Robert Muir
stemmer to ignore. Both of these filters set a special attribute for this token in the tokenstream that all stemmers respect, and they won't do any stemming on this token -- Robert Muir rcm...@gmail.com
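
The mechanism described here (a marker filter flags a token, and every stemmer checks the flag before stemming) can be modeled in a few lines. This is a self-contained sketch with hypothetical names, not Lucene's TokenStream/attribute API: the `keyword` field stands in for the special attribute, and the trailing-"s" stemmer is a toy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class KeywordAwareStemming {
    /** Hypothetical token: text plus a "do not stem" flag. */
    static final class Token {
        final String text;
        boolean keyword;
        Token(String text) { this.text = text; }
    }

    /** Marker step: flag every token listed in the protected-words set. */
    static void markKeywords(List<Token> tokens, Set<String> protectedWords) {
        for (Token t : tokens) {
            if (protectedWords.contains(t.text)) {
                t.keyword = true;
            }
        }
    }

    /** Toy stemmer that checks the flag first; otherwise strips a trailing "s". */
    static String stem(Token t) {
        if (t.keyword || t.text.length() < 4 || !t.text.endsWith("s")) {
            return t.text;
        }
        return t.text.substring(0, t.text.length() - 1);
    }

    /** Runs the marker step, then the stemmer, over a list of words. */
    public static List<String> analyze(List<String> words, Set<String> protectedWords) {
        List<Token> tokens = new ArrayList<>();
        for (String w : words) {
            tokens.add(new Token(w));
        }
        markKeywords(tokens, protectedWords);
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(stem(t));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze(List.of("news", "reports", "texas"), Set.of("news", "texas")));
        // [news, report, texas]
    }
}
```

Because the stemmer only reads the flag, any number of stemmers can honor the same protected list without each needing its own configuration.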

Re: LucidWorks Solr

2010-04-21 Thread Robert Muir
http://www.clef-campaign.org/2003/WN_web/19.pdf (for languages with compound word forms, the lexical approach helps, obviously, but for stuff like English, Italian, nope) -- Robert Muir rcm...@gmail.com

Re: LucidWorks Solr

2010-04-21 Thread Robert Muir
of > running. Its also just a simple example. > > It's not orthogonal, e.g. "running water" -- Robert Muir rcm...@gmail.com

Re: LucidWorks Solr

2010-04-21 Thread Robert Muir
> > I think running/ran has more problems, the word is so ambiguous that whether or not your search engine stems it right isn't going to matter anyway (running for office has nothing to do with running shoes, etc.) -- Robert Muir rcm...@gmail.com

Re: LucidWorks Solr

2010-04-21 Thread Robert Muir
nce there isn't much context. So what snowball does (simply stemming build, building, buildings all to "build") might seem silly at first, but you can see how it avoids this entire mess. -- Robert Muir rcm...@gmail.com

Re: LucidWorks Solr

2010-04-21 Thread Robert Muir
lemmatization rather than > stemming. > > Lemmatization usually requires part-of-speech, too. I was gonna use my build, building, buildings example but I see Wikipedia already has a nicely explained example (meeting) here: http://en.wikipedia.org/wiki/Lemmatisation -- Robert Muir rcm...@gmail.com
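
To make the part-of-speech point concrete with the "meeting" case: a lemmatizer has to know whether "meeting" is a noun (lemma "meeting") or a verb form (lemma "meet"). The tiny lookup table below is a hypothetical stand-in for a real lexicon plus tagger.

```java
import java.util.HashMap;
import java.util.Map;

public class PosLemmaDemo {
    public enum Pos { NOUN, VERB }

    // Hypothetical mini-lexicon keyed by surface form + part of speech.
    private static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("meeting|NOUN", "meeting");   // "we had a meeting"
        LEXICON.put("meeting|VERB", "meet");      // "we are meeting tomorrow"
        LEXICON.put("meetings|NOUN", "meeting");
        LEXICON.put("met|VERB", "meet");
    }

    /** Returns the lemma, or the word itself if the lexicon has no entry. */
    public static String lemma(String word, Pos pos) {
        return LEXICON.getOrDefault(word + "|" + pos, word);
    }

    public static void main(String[] args) {
        System.out.println(lemma("meeting", Pos.NOUN)); // meeting
        System.out.println(lemma("meeting", Pos.VERB)); // meet
    }
}
```

Without the tag, the same surface form has two correct answers, which is exactly the information a suffix-stripping stemmer never asks for.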

Re: LucidWorks Solr

2010-04-21 Thread Robert Muir
"googling" or "tweets" that have recently slipped into English vocabulary, but an algorithmic stemmer will likely deal with these just fine. -- Robert Muir rcm...@gmail.com

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-12 Thread Robert Muir
blem... but > it still seems to me that something isn't working the way it is supposed to > in this particular case. > > - Demian > > > -Original Message- > > From: Robert Muir [mailto:rcm...@gmail.com] > > Sent: Friday, April 09, 2010 12:05 PM > >

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-09 Thread Robert Muir
> // hyphenated word as phrase, hyphen removed > Here is VuFind's text field type definition: > positionIncrementGap="100"> > words="stopwords.txt" enablePositionIncrements="true"/> > generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > language="English" protected="protwords.txt"/> > version="icu4j" composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/> > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > words="stopwords.txt" enablePositionIncrements="true"/> > generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > language="English" protected="protwords.txt"/> > version="icu4j" composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/> > I did notice that in the "text" field type in VuFind's schema has "catenateWords" and "catenateNumbers" turned on in both the index and query analyzer chains. It is my understanding that these options should be disabled for the query chain and only enabled for the index chain. However, this may be a red herring -- I have already tried changing this setting, but it didn't change the success/failure pattern described above. I have also played with the preserveOriginal setting without apparent effect.
> From playing with the Field Analysis tool, I notice that there is a gap in the term position sequence after analysis... but I'm not sure if this is significant. > Has anybody else run into this sort of problem? Any ideas on a fix? > thanks, > Demian -- Robert Muir rcm...@gmail.com

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Robert Muir
e behavior and maybe poke around > to see if it's an easy fix. > > Thanks > Erick > > On Thu, Apr 8, 2010 at 8:16 AM, Robert Muir wrote: > > > Erick, this sounds like https://issues.apache.org/jira/browse/SOLR-1852 > > > > On Wed, Apr 7, 2010

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Robert Muir
ements="true"/> > generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > language="English" protected="protwords.txt"/> > version="icu4j" composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/> > I did notice that in the "text" field type in VuFind's schema has "catenateWords" and "catenateNumbers" turned on in both the index and query analyzer chains. It is my understanding that these options should be disabled for the query chain and only enabled for the index chain. However, this may be a red herring -- I have already tried changing this setting, but it didn't change the success/failure pattern described above. I have also played with the preserveOriginal setting without apparent effect. > From playing with the Field Analysis tool, I notice that there is a gap in the term position sequence after analysis... but I'm not sure if this is significant. > Has anybody else run into this sort of problem? Any ideas on a fix? > thanks, > Demian -- Robert Muir rcm...@gmail.com

Re: Is this a bug of the RessourceLoader?

2010-04-05 Thread Robert Muir
ng dealt with by > Solr right now, should we consider that a bug? No. > Is there something we > can/should be doing in SolrResourceLoader to make Solr handle this > situation better? > > Yes, we can ignore them for the first line of the file to be more user-friendly. I'll open an issue. -- Robert Muir rcm...@gmail.com
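
A minimal sketch of the "ignore them on the first line" idea, assuming the stray characters are a UTF-8 byte-order mark that decodes to U+FEFF (the snippet itself does not show the character). The helper name is hypothetical; this is not SolrResourceLoader's actual code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BomTolerantLines {
    private static final char BOM = '\uFEFF';

    /** Reads all lines, dropping a leading U+FEFF from the first line only. */
    public static List<String> readLines(Reader in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            boolean first = true;
            while ((line = r.readLine()) != null) {
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1);   // BOM would otherwise stick to the first word
                }
                first = false;
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines(new StringReader("\uFEFFa\nan\nthe"))); // [a, an, the]
    }
}
```

For a real stopword or synonym file, the Reader would be an InputStreamReader decoding the file as UTF-8; only the first line is touched so that a U+FEFF appearing later is left alone.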

Re: Is this a bug of the RessourceLoader?

2010-04-01 Thread Robert Muir
look at the file with "more" and the first character appears to be , then you can confirm that's the problem. -- Robert Muir rcm...@gmail.com

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-30 Thread Robert Muir
don't know if this would even be a good fit, just suggesting a way for an individual term to get back an enumeration of similar terms very quickly, that could be some portion of the overall larger algorithm. -- Robert Muir rcm...@gmail.com

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-30 Thread Robert Muir
with the support that's there (the LevenshteinAutomata class), to quickly get back candidates. You can intersect/concatenate/union these DFAs with prefix or suffix DFAs if you want too. I don't really understand what the algorithm should do, but I'm happy to try to help. -- Robert Muir rcm...@gmail.com
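
LevenshteinAutomata builds a DFA so candidates can be found by intersecting it with the term dictionary. As a much slower but easy-to-read stand-in, the sketch below scans a term list, keeps terms within edit distance k of the input, and requires a shared prefix (roughly what intersecting with a prefix DFA adds). It is a brute-force illustration, not the automaton technique itself.

```java
import java.util.ArrayList;
import java.util.List;

public class FuzzyCandidates {
    /** Classic dynamic-programming Levenshtein distance using two rows. */
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Terms within maxEdits of query that also start with the given prefix. */
    public static List<String> candidates(List<String> terms, String query, int maxEdits, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.startsWith(prefix) && distance(query, t) <= maxEdits) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("lucene", "lucent", "lucerne", "solr", "lucid");
        System.out.println(candidates(terms, "lucene", 1, "lu")); // [lucene, lucent, lucerne]
    }
}
```

The automaton approach gives the same answer set without computing a distance for every term, which is what makes it fast enough to use inside a larger autosuggest algorithm.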

Re: Trouble compiling SOLR

2010-03-26 Thread Robert Muir
Lucene's trunk. It's needed for a variety of purposes such as JUnit4 support. -- Robert Muir rcm...@gmail.com

Re: Stopwords

2010-03-17 Thread Robert Muir
ich can address both concerns (performance and relevance) to some degree. -- Robert Muir rcm...@gmail.com

Re: Cleaning up dirty OCR

2010-03-11 Thread Robert Muir
ss (and then partition by language > and attempt to find the suspicious words in each partition) > and if you are really OCR'ing Urdu text and trying to search it automatically, then this is your last priority. -- Robert Muir rcm...@gmail.com

Re: Cleaning up dirty OCR

2010-03-11 Thread Robert Muir
sort of character frequency data for your non-English text, are you OCR'ing that data too? Are you using Unicode normalization or anything to prevent explosion of terms that are really the same? -- Robert Muir rcm...@gmail.com
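
As an illustration of "explosion of terms that are really the same": "é" can arrive precomposed (U+00E9) or as "e" plus a combining acute accent (U+0301), and without normalization these index as two different terms. The JDK's java.text.Normalizer is enough to show the effect. Which form (NFC vs. NFKC) suits a given OCR collection is a judgment call not settled here; NFKC additionally unfolds compatibility characters such as the "ﬁ" ligature.

```java
import java.text.Normalizer;

public class NormalizeTerms {
    public static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String composed = "caf\u00E9";      // é as one code point
        String decomposed = "cafe\u0301";   // e + combining acute accent
        System.out.println(composed.equals(decomposed));             // false
        System.out.println(nfc(composed).equals(nfc(decomposed)));   // true
        System.out.println(nfkc("\uFB01le"));                         // "file" (ligature unfolded)
    }
}
```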

Re: Cleaning up dirty OCR

2010-03-11 Thread Robert Muir
; reading and then I'll post a question to either the Solr or Lucene list.  Can > you suggest which list I should post an index pruning question to? > I would recommend posting it to the JIRA issue: http://issues.apache.org/jira/browse/LUCENE-1812 This way someone who knows more (Andrzej) could see it, too. -- Robert Muir rcm...@gmail.com

Re: Cleaning up dirty OCR

2010-03-09 Thread Robert Muir
> Can anyone suggest any practical solutions to removing some fraction of the > tokens containing OCR errors from our input stream? one approach would be to try http://issues.apache.org/jira/browse/LUCENE-1812 and filter terms that only appear once in the document. -- Robert Mu
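
The heuristic suggested here (drop terms that occur only once in a document, on the theory that OCR garbage rarely repeats) is easy to prototype outside the index before trying LUCENE-1812. The sketch below works on an already-tokenized document; the class and method names are hypothetical. It also drops legitimate rare words, which is the trade-off this heuristic accepts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DropSingletons {
    /** Keeps only tokens whose frequency within this document is at least 2. */
    public static List<String> filter(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (tf.get(t) >= 2) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("the", "ship", "sailed", "tbe", "ship", "the", "shlp");
        System.out.println(filter(doc)); // [the, ship, ship, the]
    }
}
```

Here the OCR errors "tbe" and "shlp" disappear, but so does the genuine word "sailed".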

Re: PDF extraction leads to reversed words

2010-03-09 Thread Robert Muir
u can apply to trunk (with all necessary resources) here: https://issues.apache.org/jira/browse/SOLR-1657 The included testcase fails without adding icu4j to the lib directory (as the Arabic text is reversed), and passes with it. -- Robert Muir rcm...@gmail.com

Re: PDF extraction leads to reversed words

2010-03-09 Thread Robert Muir
. Solr trunk uses pdfbox-0.8.0-incubating.jar, which does support Arabic, if you also put ICU in the classpath. -- Robert Muir rcm...@gmail.com
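
Since the fix depends on ICU actually being visible at runtime, a quick classpath check can save some guesswork. The sketch below just tries to load a class by name. com.ibm.icu.text.Bidi is an ICU4J class, but which class PDFBox itself probes for is an assumption not confirmed in this thread.

```java
public class ClasspathCheck {
    /** True if the named class can be loaded by this class's loader. */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("ICU4J on classpath: " + isPresent("com.ibm.icu.text.Bidi"));
    }
}
```

Running this with the same classpath as the extraction code (rather than from an IDE) is what makes the answer meaningful.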

Re: PDF extraction leads to reversed words

2010-03-09 Thread Robert Muir
is fixed now. However > Tika doesn't yet use this version of PDFBox. > So for PDF text extraction, I doesn't use Tika but pdftotext. > > Dominique > > > On 09/03/10 06:00, Robert Muir wrote: >> >> it is an optional dependency of PDFBox. If ICU is availabl

Re: PDF extraction leads to reversed words

2010-03-09 Thread Robert Muir
Sorry for the link to the wrong JIRA issue, I was looking at another issue. It's here: https://issues.apache.org/jira/browse/SOLR-1813 Again, you will need to apply it to trunk I think, as that's the only place I have tested it. -- Robert Muir rcm...@gmail.com

Re: PDF extraction leads to reversed words

2010-03-08 Thread Robert Muir
n the Solr trunk? > > On Mon, Mar 8, 2010 at 5:15 PM, Robert Muir wrote: >> I think the problem is that Solr does not include the ICU4J jar, so it >> won't work with Arabic PDF files. >> >> Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your

Re: PDF extraction leads to reversed words

2010-03-08 Thread Robert Muir
ting MsWord document. > I think the problem come from Tika ! > > Any clue ? > > -- > elsadek > Software Engineer- J2EE / WEB / ESB MULE > -- Robert Muir rcm...@gmail.com

Re: Cyrillic problem

2010-03-01 Thread Robert Muir
rainian characters? > > -- > > View this message in context: > > http://old.nabble.com/Cyrillic-problem-tp27744106p27749323.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > -- Robert Muir rcm...@gmail.com

Re: If you could have one feature in Solr...

2010-02-25 Thread Robert Muir
0 at 10:48 AM, Gora Mohanty wrote: > On Thu, 25 Feb 2010 07:54:06 -0500 > Robert Muir wrote: > > > Gora, I wonder perhaps if there is a documentation issue. > > > > e.g. Thai, Arabic, Chinese were mentioned here previously, these > > are all supported, too

Re: If you could have one feature in Solr...

2010-02-25 Thread Robert Muir
Gora, I wonder perhaps if there is a documentation issue. e.g. Thai, Arabic, Chinese were mentioned here previously, these are all supported, too. Let me know if you have any ideas! On Thu, Feb 25, 2010 at 7:45 AM, Gora Mohanty wrote: > On Thu, 25 Feb 2010 07:37:33 -0500 > Robert Muir

Re: If you could have one feature in Solr...

2010-02-25 Thread Robert Muir
east in the open-source world. > -- Robert Muir rcm...@gmail.com

Re: If you could have one feature in Solr...

2010-02-24 Thread Robert Muir
On Wed, Feb 24, 2010 at 9:22 AM, Markus Jelsma wrote: > > - stemmers for many more different languages > > I don't want to hijack this thread, but I would like to know which languages you are interested in! -- Robert Muir rcm...@gmail.com

Re: Odd wildcard behavior

2010-02-22 Thread Robert Muir
words="stopwords.txt" >enablePositionIncrements="true" > /> > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > -- > View this message in context: > http://old.nabble.com/Odd-wildcard-behavior-tp27695404p27695404.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Robert Muir rcm...@gmail.com

Re: Why ASCIIFoldingFilter is not a CharFilter

2010-02-22 Thread Robert Muir
ed but trying to understand > why. This makes sense. Thanks Erik and Robert. > > On Mon, Feb 22, 2010 at 6:16 AM, Robert Muir wrote: > > > right, most stemmers expect the diacritics to be in their input to work > > correctly, too. > > > > On Sun, Feb 21, 2010 a
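
A toy illustration of the ordering argument: if accents are stripped before a language-specific stemmer runs, a rule that keys on an accented suffix can no longer fire. The "stemmer" below has a single hypothetical rule (strip Spanish "-ción"), and the folding uses NFD plus removal of combining marks, a rough approximation of ASCIIFoldingFilter, which covers many more characters.

```java
import java.text.Normalizer;

public class FoldAfterStem {
    /** Hypothetical one-rule stemmer that expects accented input. */
    public static String toyStem(String w) {
        String suffix = "ci\u00F3n";                   // "ción"
        return w.endsWith(suffix) ? w.substring(0, w.length() - suffix.length()) : w;
    }

    /** Rough accent removal: decompose, then drop combining marks. */
    public static String fold(String w) {
        return Normalizer.normalize(w, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String word = "informaci\u00F3n";              // "información"
        System.out.println(fold(toyStem(word)));      // stem, then fold: informa
        System.out.println(toyStem(fold(word)));      // fold first, rule misses: informacion
    }
}
```

As a CharFilter, folding would always run before the tokenizer and therefore before any stemmer, which is the ordering this example shows going wrong.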

Re: Why ASCIIFoldingFilter is not a CharFilter

2010-02-21 Thread Robert Muir
r it >> is >> not a CharFilter. Is there a reason why? >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> > > -- Robert Muir rcm...@gmail.com

Re: parsing strings into phrase queries

2010-02-18 Thread Robert Muir
I gave it a rough shot, Lance; if there's a better way to explain it, please edit. On Wed, Feb 17, 2010 at 10:23 PM, Lance Norskog wrote: > That would be great. After reading this and the PositionFilter class I > still don't know how to use it. > > On Wed, Feb 17, 2010 at

Re: parsing strings into phrase queries

2010-02-17 Thread Robert Muir
nFilter when i > wrote that message. > > > > -Hoss > > -- Robert Muir rcm...@gmail.com

Re: problem with edgengramtokenfilter and highlighter

2010-02-14 Thread Robert Muir
thanks Joe, good catch! On Sun, Feb 14, 2010 at 2:43 PM, Joe Calderon wrote: > lucene-2266 filed and patch posted. > > On 02/13/2010 09:14 PM, Robert Muir wrote: > >> Joe, can you open a Lucene JIRA issue for this? >> >> I just glanced at the code and it looks like

Re: problem with edgengramtokenfilter and highlighter

2010-02-13 Thread Robert Muir
ported > start/end offsets. > -- Robert Muir rcm...@gmail.com

Re: parsing strings into phrase queries

2010-02-13 Thread Robert Muir
take a look at PositionFilter On Feb 13, 2010 1:14 AM, "Kevin Osborn" wrote: Right now if I have the query model:(Nokia BH-212V), the parser turns this into +(model:nokia model:"bh 212 v"). The problem is that I might have a model called Nokia BH-212, so this is completely missed. In my case, I
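
Roughly why PositionFilter helps: the query parser of this era turns several tokens produced from one query term into a phrase when they occupy different positions, but into an OR of terms when they all sit at one position. PositionFilter (with its default increment of 0) puts every token after the first at the same position as the first. The sketch models that decision with plain data; the names are hypothetical and it ignores the MultiPhraseQuery case, so it is not Lucene's QueryParser code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class PositionFilterModel {
    /** Hypothetical token: term text plus position increment. */
    public static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /** Tokens as a word-delimiting chain might emit them: one per position. */
    public static List<Tok> tokens(String... terms) {
        List<Tok> out = new ArrayList<>();
        for (String t : terms) {
            out.add(new Tok(t, 1));
        }
        return out;
    }

    /** Mimics PositionFilter's default: every token after the first gets increment 0. */
    public static List<Tok> positionFilter(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            Tok t = in.get(i);
            out.add(new Tok(t.term, i == 0 ? t.posInc : 0));
        }
        return out;
    }

    /** Simplified parser decision: one position -> OR of terms; several -> phrase. */
    public static String toQuery(String field, List<Tok> toks) {
        int positions = 0;
        for (Tok t : toks) {
            if (t.posInc > 0) {
                positions++;
            }
        }
        if (toks.size() == 1 || positions <= 1) {
            return toks.stream().map(t -> field + ":" + t.term).collect(Collectors.joining(" "));
        }
        String phrase = toks.stream().map(t -> t.term).collect(Collectors.joining(" "));
        return field + ":\"" + phrase + "\"";
    }

    public static void main(String[] args) {
        List<Tok> toks = tokens("bh", "212", "v");
        System.out.println(toQuery("model", toks));                 // model:"bh 212 v"
        System.out.println(toQuery("model", positionFilter(toks))); // model:bh model:212 model:v
    }
}
```

With the OR form, a document containing "Nokia BH-212" can still match on "bh" and "212" even though the "v" part is absent.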

Re: Thanks Robert!

2010-02-04 Thread Robert Muir
these tokenstreams. On Thu, Feb 4, 2010 at 5:37 PM, Jason Rutherglen wrote: > Robert, thanks for redoing all the Solr analyzers to the new API! It > helps to have many examples to work from, best practices so to speak. > -- Robert Muir rcm...@gmail.com

Re: Hindi language support in solr

2010-01-21 Thread Robert Muir
ry new in solr. > I download latest release 1.4 and install. For Indexing and Searching I am > using SolrJ api. > My Question is "How to enable solr to search hindi language text ?". > Please Help me.. > > thanks > with regards > Ranveer K Kumar > -- Robert Muir rcm...@gmail.com

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

2010-01-18 Thread Robert Muir
field:ami field:mil >> field:ily >> >> [1] >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory >> >> >> >> > > > -- > 梅旺生 > -- Robert Muir rcm...@gmail.com
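
The "field:ami field:mil field:ily" output is what character trigrams look like once PositionFilterFactory stacks them at one position and the parser ORs them. The sketch below generates the grams for a hypothetical input word ("family" matches the tail shown; the actual word is cut off in the snippet) and builds the resulting OR query string.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramQueryDemo {
    /** All contiguous character n-grams of length n, left to right. */
    public static List<String> ngrams(String s, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    /** OR query over the grams, as produced when all grams share one position. */
    public static String orQuery(String field, List<String> grams) {
        StringBuilder sb = new StringBuilder();
        for (String g : grams) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(g);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(orQuery("field", ngrams("family", 3)));
        // field:fam field:ami field:mil field:ily
    }
}
```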

Re: Stripping Punctuation in a fieldType

2010-01-15 Thread Robert Muir
the >> > same way. >> > >> > Would the StandardTokenizerFactory accomplish this? >> > Does it have any language specific functionality? >> > Does it do anything with stemming? >> > >> > Thanks for everyone's input! >> > >> > -Dave >> > >> > >> > >> > -Original Message- >> > From: Ahmet Arslan [mailto:iori...@yahoo.com] >> > Sent: Friday, January 15, 2010 12:42 PM >> > To: solr-user@lucene.apache.org >> > Subject: Re: Stripping Punctuation in a fieldType >> > >> > > I'm trying to find the best way to set up a fieldType that >> > > strips punctuation. >> > >> > Use solr.StandardTokenizerFactory that strips punctuations. >> > >> > Or if you do not care about alphanumeric or numeric queries use >> > solr.LowerCaseTokenizerFactory that uses LetterTokenizer. >> > >> > I think the right way to do this is using a >> > > CharacterFilter >> > > of some type, but I can't seem to find any examples of how >> > > to set this >> > > up in a schema.xml file. >> > >> > If you want to use solr.MappingCharFilterFactory you need to write all >> > punctiation characters to a text file manually. e.g. "," => "" >> > >> > >> > >> > >> > -- Robert Muir rcm...@gmail.com
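
Writing every punctuation character into the MappingCharFilterFactory mapping file by hand is tedious, so a few lines of Java can generate it. This sketch assumes the quoted syntax shown above ("," => "") and that a backslash or double quote inside the quotes must itself be backslash-escaped. It only covers ASCII punctuation (Java's \p{Punct} class), so non-ASCII punctuation would need to be added separately.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationMappingGenerator {
    /** One mapping line per ASCII punctuation character, mapping it to "". */
    public static List<String> mappingLines() {
        List<String> lines = new ArrayList<>();
        for (char c = 0x21; c <= 0x7E; c++) {
            String s = String.valueOf(c);
            if (!s.matches("\\p{Punct}")) {
                continue;                                   // skip letters and digits
            }
            // Assumed escaping rule: backslash and double quote get a leading backslash.
            String escaped = (c == '\\' || c == '"') ? "\\" + c : s;
            lines.add("\"" + escaped + "\" => \"\"");
        }
        return lines;
    }

    public static void main(String[] args) {
        mappingLines().forEach(System.out::println);
    }
}
```

Redirecting the output to a file and pointing the factory's mapping attribute at it would complete the setup.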

Re: Stripping Punctuation in a fieldType

2010-01-15 Thread Robert Muir
ve >> >> >> >> -Original Message- >> From: Ahmet Arslan [mailto:iori...@yahoo.com] >> Sent: Friday, January 15, 2010 12:42 PM >> To: solr-user@lucene.apache.org >> Subject: Re: Stripping Punctuation in a fieldType >> >> > I'm trying to find the best way to set up a fieldType that >> > strips punctuation. >> >> Use solr.StandardTokenizerFactory that strips punctuations. >> >> Or if you do not care about alphanumeric or numeric queries use >> solr.LowerCaseTokenizerFactory that uses LetterTokenizer. >> >> I think the right way to do this is using a >> > CharacterFilter >> > of some type, but I can't seem to find any examples of how >> > to set this >> > up in a schema.xml file. >> >> If you want to use solr.MappingCharFilterFactory you need to write all >> punctiation characters to a text file manually. e.g. "," => "" >> >> >> >> > -- Robert Muir rcm...@gmail.com

Re: Multi language support

2010-01-13 Thread Robert Muir
two languages. And a very good movie. > > wunder > > On Jan 12, 2010, at 6:55 PM, Robert Muir wrote: > >> sorry, i forgot to include this 2009 paper comparing what stopwords do >> across 3 languages: >> >> http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456

Re: Multi language support

2010-01-12 Thread Robert Muir
. I'm >>> > interested in using as many features of solr as possible. Synonyms, >>> > Stopwords and stems all sounds quite interesting and useful but how do >>> > I set up this in a good way for a multilingual site? >>> > >>> > The site don't have a huge text mass so performance issues don't >>> > really bother me but still I'd like to hear your suggestions before I >>> > try to implement an solution. >>> > >>> > Best regards >>> > >>> > Daniel >>> >> > > > > -- > Lance Norskog > goks...@gmail.com > -- Robert Muir rcm...@gmail.com

Re: Multi language support

2010-01-12 Thread Robert Muir
tion of the site. I'm >>> > interested in using as many features of solr as possible. Synonyms, >>> > Stopwords and stems all sounds quite interesting and useful but how do >>> > I set up this in a good way for a multilingual site? >>> > >>> > The site don't have a huge text mass so performance issues don't >>> > really bother me but still I'd like to hear your suggestions before I >>> > try to implement an solution. >>> > >>> > Best regards >>> > >>> > Daniel >>> >> > > > > -- > Lance Norskog > goks...@gmail.com > -- Robert Muir rcm...@gmail.com

Re: weird sorting behavior

2009-12-25 Thread Robert Muir
characters > except a-z. This is the reason behind those wierd results. You could try > removing that filter and see if thats what you need. > > -- > Regards, > Shalin Shekhar Mangar. -- Robert Muir rcm...@gmail.com
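
To see how a filter that keeps only a-z distorts sort keys: accented letters, digits and everything outside that range vanish before sorting. The exact filter configuration from the thread isn't shown, so the regex below ([^a-z] applied after lowercasing) is an assumption standing in for it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class AzOnlySortKeys {
    /** Lowercase, then delete every character outside a-z (assumed filter). */
    public static String sortKey(String title) {
        return title.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    public static List<String> sorted(List<String> titles) {
        List<String> out = new ArrayList<>(titles);
        out.sort(Comparator.comparing(AzOnlySortKeys::sortKey));
        return out;
    }

    public static void main(String[] args) {
        List<String> titles = List.of("\u00C9mile", "Zebra", "2001 Odyssey", "Apple");
        for (String t : sorted(titles)) {
            System.out.println(sortKey(t) + "  <-  " + t);
        }
    }
}
```

"Émile" sorts under "mile" and "2001 Odyssey" under "odyssey", which is the kind of surprising order the thread describes.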

Re: SOLR 1.4: how to configure the improved chinese analyzer?

2009-12-09 Thread Robert Muir
lp you may provide on this. > > Thanks, > -- > View this message in context: > http://old.nabble.com/SOLR-1.4%3A-how-to-configure-the-improved-chinese-analyzer--tp26706709p26706709.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Robert Muir rcm...@gmail.com

Re: how to do partial word searches?

2009-11-25 Thread Robert Muir
d case, so I created another field that is a lowered version of the > title called "textTitle", it is of type text. > > Is it possible with solr to achieve what I am trying to do, if so how? If > not, anything closer than what I have? > > thanks > Joel > > -- Robert Muir rcm...@gmail.com

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-13 Thread Robert Muir
Thanks for the link - there doesn't seem a be a fix version specified, > so I guess this will not officially ship with lucene 2.9? > > -Peter > > On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir wrote: > > Peter, here is a project that does this: > > http://issues.ap
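
For readers landing here looking for what edge n-grams are: a front edge n-gram filter emits the prefixes of each token between a minimum and maximum length (the factory's minGramSize/maxGramSize settings), so prefix-style matching works at query time without wildcards. The sketch reproduces that output for a single token; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGrams {
    /** Front edge n-grams: prefixes of length minGram..maxGram, capped at the token length. */
    public static List<String> front(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int max = Math.min(maxGram, token.length());
        for (int len = minGram; len <= max; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(front("solr", 1, 3)); // [s, so, sol]
    }
}
```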

Re: non english languages

2009-11-13 Thread Robert Muir
> > Best regards, > > Chuck > -- Robert Muir rcm...@gmail.com

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-11 Thread Robert Muir
ing to be added, or is there any other > >> >> documentation in addition to the blog post? In particular, there was > >> >> a thread last year about using an N-gram tokenizer to enable > >> >> reasonable (if not ideal) searching of CJK text, so I'd be curious to > >> >> know how people are configuring their schema (with this tokenizer?) > >> >> for that use case. > >> >> > >> >> Thanks, > >> >> > >> >> Peter > >> >> > >> >> -- > >> >> Peter M. Wolanin, Ph.D. > >> >> Momentum Specialist, Acquia. Inc. > >> >> peter.wola...@acquia.com > >> > > >> > > >> > >> > >> > >> -- > >> Peter M. Wolanin, Ph.D. > >> Momentum Specialist, Acquia. Inc. > >> peter.wola...@acquia.com > > > > > > > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Robert Muir rcm...@gmail.com

Re: Wordnet dictionary integration with Solr - help

2009-10-20 Thread Robert Muir
ll use it in my existing (default) sysname dictinary. > > > > Appreciate your help on answering this. > > > > Thanks. > > > > > > -- > View this message in context: > http://www.nabble.com/Wordnet-dictionary-integration-with-Solr---help-tp25963682p25983067.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Robert Muir rcm...@gmail.com

Re: A little help with indexing joined words

2009-10-05 Thread Robert Muir
okens are counted just like non-overlap tokens. > Well, I don't see a reason as to why someone would need a length based > normalization on such matches. I always have done omitNorms while using > fields with this filter. -- Robert Muir rcm...@gmail.com

Re: Difficulty with Multi-Word Synonyms

2009-09-17 Thread Robert Muir
this level? SynonymMap subMap = map.submap.get(tok.termBuffer(), 0, tok.termLength()); -- Robert Muir rcm...@gmail.com

Re: Spanish Stemmer

2009-08-18 Thread Robert Muir
Hi, it looks like you might just have a simple typo: if you change it to language="Spanish" it should work. -- Robert Muir rcm...@gmail.com

Re: Language Detection for Analysis?

2009-08-06 Thread Robert Muir
we're actually collecting -- we just get blocks of text. Has >> anyone here worked on language detection so we can figure out what >> analyzers to use? Are there commercial solutions? >> >> Much appreciated! >> >> -- >> http://www.roadtofailure.com -- The Fringes of Scalability, Social >> Media, and Computer Science >> > -- Robert Muir rcm...@gmail.com
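
Full language identification needs a statistical model, but when incoming blocks of text use different writing systems, a cheap first cut is to count letters by Unicode script and route to an analyzer per script. The sketch uses Character.UnicodeScript (Java 7+). It cannot tell English from French, so treat it as a pre-filter rather than language detection.

```java
import java.util.EnumMap;
import java.util.Map;

public class DominantScript {
    /** Most frequent script among the letters in text, or COMMON if there are none. */
    public static Character.UnicodeScript dominant(String text) {
        Map<Character.UnicodeScript, Integer> counts = new EnumMap<>(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        Character.UnicodeScript best = Character.UnicodeScript.COMMON;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(dominant("Hello world"));                              // LATIN
        System.out.println(dominant("\u041F\u0440\u0438\u0432\u0435\u0442"));     // CYRILLIC
    }
}
```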

Re: Language Detection for Analysis?

2009-08-06 Thread Robert Muir
e here worked on language detection so we can figure out what > analyzers to use? Are there commercial solutions? > > Much appreciated! > > -- > http://www.roadtofailure.com -- The Fringes of Scalability, Social > Media, and Computer Science > -- Robert Muir rcm...@gmail.com

Re: Indexing TIKA extracted text. Are there some issues?

2009-07-29 Thread Robert Muir
ghts. >>> >>> - ashok >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/Indexing-TIKA-extracted-text.-Are-there-some-issues--tp24708854p24708854.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >> >> -- >> Grant Ingersoll >> http://www.lucidimagination.com/ >> >> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) >> using Solr/Lucene: >> http://www.lucidimagination.com/search >> >> >> > > -- > View this message in context: > http://www.nabble.com/Indexing-TIKA-extracted-text.-Are-there-some-issues--tp24708854p24728917.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Robert Muir rcm...@gmail.com

Re: µTorrent indexed as µTorrent

2009-07-28 Thread Robert Muir
t's going > on??? > > Bill > -- Robert Muir rcm...@gmail.com
