Same reason, I do not know which delimiter regex to use :-(
On Wed, Dec 16, 2009 at 3:02 PM, Ghazal Gharooni wrote:
Hello,
Why don't you use a StringTokenizer for splitting the result?
On Tue, Dec 15, 2009 at 9:45 PM, Weiwei Wang wrote:
I want to split this parsed result string: name:"zhong guo" name:friend
server:172.16.65.79
into
name:"zhong guo"
name:friend
server:172.16.65.79
how can I write a regex pattern to do that?
I'm not familiar with regexes and tried a few patterns that didn't work.
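For what it's worth, a minimal sketch (assuming each pair is a field name, a colon, and then either a double-quoted phrase or an unbroken run of non-space characters):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPairs {
    public static void main(String[] args) {
        String input = "name:\"zhong guo\" name:friend server:172.16.65.79";
        // quoted alternative comes first so quotes are not swallowed by \S+
        Pattern p = Pattern.compile("\\w+:(?:\"[^\"]*\"|\\S+)");
        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println(m.group());
        }
        // prints:
        // name:"zhong guo"
        // name:friend
        // server:172.16.65.79
    }
}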
--
Weiwei Wang
Alex Wang
王巍巍
Thanks Robert, I learned a lot from you :-)
On Wed, Dec 16, 2009 at 11:53 AM, Robert Muir wrote:
Hi, just one more thought for you.
I think even more important than anything I said before, you should ensure
you implement reusableTokenStream in your analyzer.
This becomes a necessity if you are using expensive objects like this.
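A minimal sketch of the usual caching pattern (the KeywordTokenizer/ICUTransformFilter chain is taken from earlier in this thread; ICUTransformFilter and its (TokenStream, Transliterator) constructor come from the LUCENE-1488 patch, so treat that part as an assumption):

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import com.ibm.icu.text.Transliterator;
// ICUTransformFilter comes from the LUCENE-1488 patch

public class PinyinAnalyzer extends Analyzer {

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new ICUTransformFilter(new KeywordTokenizer(reader),
                Transliterator.getInstance("Han-Latin"));
    }

    // holds one fully built chain per thread
    private static final class SavedStreams {
        Tokenizer source;
        TokenStream result;
    }

    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        SavedStreams streams = (SavedStreams) getPreviousTokenStream();
        if (streams == null) {
            // first call on this thread: build the expensive chain once
            streams = new SavedStreams();
            streams.source = new KeywordTokenizer(reader);
            streams.result = new ICUTransformFilter(streams.source,
                    Transliterator.getInstance("Han-Latin"));
            setPreviousTokenStream(streams);
        } else {
            // later calls just point the cached tokenizer at the new text
            streams.source.reset(reader);
        }
        return streams.result;
    }
}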
2009/12/15 Weiwei Wang
Hello, this was mainly to show you a quick-and-dirty way to solve the
problem.
If you have a lot of text, here are some ways to optimize:
1. The 'cleanup' step I showed you is an extremely inefficient way to remove
the spaces and diacritics.
For your case, perhaps you can use more efficient ways to av
Finally, I made it run; however, it works so slowly.
Query classification is an interesting question, and many papers have
discussed it. For more information, you could refer to these papers: "A
taxonomy of web search", "Understanding user goals in web search", and "Our
winning solution to query classification in KDDCUP 2005".
In your question, I think
I think you can do this with search-suggestion-like algorithms.
First, you should categorize the search log, e.g. Thai Restaurant, Chinese
Restaurant, or KFC should be assigned categories including Restaurant.
When the user is typing, figure out from the search log which keyword is nearest
to the in
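A minimal sketch of just the prefix-lookup part (the log entries and category names are made-up illustrations):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class SuggestFromLog {
    public static void main(String[] args) {
        // categorized search log
        TreeMap<String, Set<String>> log = new TreeMap<String, Set<String>>();
        log.put("thai restaurant", new HashSet<String>(Arrays.asList("Restaurant")));
        log.put("chinese restaurant", new HashSet<String>(Arrays.asList("Restaurant")));
        log.put("kfc", new HashSet<String>(Arrays.asList("Restaurant", "Fast Food")));

        String typed = "th"; // what the user has typed so far
        // every logged query starting with the typed prefix
        SortedMap<String, Set<String>> hits =
                log.subMap(typed, typed + Character.MAX_VALUE);
        for (Map.Entry<String, Set<String>> e : hits.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}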
Can anybody help me, or maybe point me to relevant resources I could learn
from?
Thanks.
Check for a typo in your field name first.
- Original Message -
From: Michel Nadeau
Sent: Wednesday, December 16, 2009, 4:48
To: java-user@lucene.apache.org
Subject: Re: Tokenized fields in Lucene 3.0.0
Thanks for bringing closure, this was scaring me ...
On Tue, Dec 15, 2009 at 4:31 PM, Michel Nadeau wrote:
Forget it - I found the problem. There was an escaping problem on the
search-client side.
Sorry about that.
- Mike
aka...@gmail.com
On Tue, Dec 15, 2009 at 3:48 PM, Michel Nadeau wrote:
I search like this -
IndexReader reader = IndexReader.open(idx, true);
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content",
cluStdAn); // StandardAnalyzer
q = parser.parse(QUERY);
TopDocs td = searcher.search(q, cluCF, 10); // cluCF presumably a Filter; the hit count (10) is an assumed completion
Any more info to share?
In 2.9, Tokenized literally == Analyzed.
/** @deprecated this has been renamed to {@link #ANALYZED} */
public static final Index TOKENIZED = ANALYZED;
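So when porting, the mapping is one-to-one; for example (the field name, value, and Store choice here are placeholders):

doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
// pre-3.0 equivalent: Field.Index.TOKENIZED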
Michel Nadeau wrote:
Hi,
I just realized that since I upgraded from Lucene 2.x to 3.0.0 (and removed
all deprecated things), searches like that don't work anymore:
test AND blue
test NOT blue
(test AND blue) OR red
etc.
Before 3.0.0, I was inserting my fields like this:
doc.add(new Field("content", sValues[j], Fiel
I haven't used Lucene or read the Lucene book in quite a while, since I
handed in my university thesis quite a few years ago. However, I'm
currently building an ecommerce site from an ASP skeleton; the current
search and recommendation algorithms are built on limited SQL searches, but
I'd like to
Got it, thanks, Robert.
On Tue, Dec 15, 2009 at 10:19 PM, Robert Muir wrote:
If you're using an IDE, there should be an "apply patch" somewhere. In
Eclipse, you right-click on the project>>team>>apply patch.
In IntelliJ, it's something like Version Control>>(subversion???)>>apply
patch
Or do as Robert suggests from the command line...
HTH
Erick
On Tue, Dec 15, 2009
Paul Taylor wrote:
CharStream. Found it at
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/PatternReplaceFilter.java?revision=804726&view=markup
BTW, why not add this to the Lucene codebase rather than the Solr codebase?
Unfortunately it doesn't address my problem be
if you have lucene 2.9 or 3.0 source code, just run patch -p0 <
/path/to/LUCENE-XXYY.patch from the lucene source code root directory... it
should create the necessary directory and files.
then run 'ant'; in this case it should create a lucene-icu jar file in the
build directory.
the patch doesn
Yes, I found the patch file LUCENE-1488.patch, and there's no icu directory
in my downloaded contrib directory.
I'm a rookie at using patch; I'm currently in the contrib dir. Could
anybody tell me how to execute the patch command to generate the relevant
dir and source files?
On Tue, Dec 15, 2009
look at the latest patch file attached to the issue; it should work with
lucene 2.9 or greater (I think)
2009/12/15 Weiwei Wang
Where can I find the source code?
On Tue, Dec 15, 2009 at 9:40 PM, Robert Muir wrote:
there is an icu transform tokenfilter in the patch here:
http://issues.apache.org/jira/browse/LUCENE-1488
Transliterator pinyin = Transliterator.getInstance("Han-Latin");
Tokenizer tokenizer = new KeywordTokenizer(new StringReader("中国"));
ICUTransformFilter filter = new ICUTransformFilter(tokenizer, pinyin); // completion assumed from the patch's (TokenStream, Transliterator) ctor
Hi, guys,
I'm implementing a search engine based on Lucene for Chinese, so I want
to support pinyin search as Google China does.
e.g.
“中国” means "China" in English;
this word's pinyin input is "zhongguo".
The feature I want to implement is that when a user types "zhongguo", the results will
include
WordDelimiterFilter is implemented against an old version of the API, where
nextToken is called.
On Tue, Dec 15, 2009 at 7:17 PM, Koji Sekiguchi wrote:
KeywordAnalyzer cannot handle a whole sentence.
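You can see why with a quick check (3.0-era attribute API; the field name and sample text are arbitrary):

import java.io.StringReader;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class KeywordAnalyzerDemo {
    public static void main(String[] args) throws Exception {
        TokenStream ts = new KeywordAnalyzer().tokenStream("f",
                new StringReader("C++ is my favorite language"));
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        while (ts.incrementToken()) {
            // the entire input comes back as one token
            System.out.println("[" + term.term() + "]");
        }
    }
}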
On Tue, Dec 15, 2009 at 7:33 PM, Ganesh wrote:
Thanks Simon, I used a similar approach to the SpellChecker's to do search
suggestion.
I'll try that for search correction.
On Tue, Dec 15, 2009 at 7:30 PM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
Uwe Schindler wrote:
And if you do it yourself, don't forget to call clearAttributes() whenever
you produce new tokens (else you may have bugs in the token increments). In
the old token API it's Token.clear()... Just a warning!
This comment has worried me; is this OK, or am I meant to call
clear
How about KeywordAnalyzer? It will treat C++ and C# as a single term.
Regards
Ganesh
- Original Message -
From: "Chris Lu"
To:
Sent: Saturday, December 12, 2009 5:27 AM
Subject: Re: Lucene Analyzer that can handle C++ vs C#
> What we did in DBSight is to provide a reserved list of wor
Weiwei,
Lucene Contrib offers a SpellChecker package that might help you with
your application. The spellchecker takes a dictionary of terms (built
from your search index or from some other text resource) and builds a
suggestion index from those terms. Internally, terms are indexed as
ngrams. You ca
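A minimal sketch of wiring it up (the paths and the "content" field name are assumptions; 2.9/3.0-era contrib API):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.FSDirectory;

public class SuggestionIndexDemo {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(
                FSDirectory.open(new File("/path/to/main/index")), true);
        // the ngram suggestion index lives in its own directory
        SpellChecker spell = new SpellChecker(
                FSDirectory.open(new File("/path/to/spell/index")));
        // build the dictionary from the terms of one indexed field
        spell.indexDictionary(new LuceneDictionary(reader, "content"));
        for (String s : spell.suggestSimilar("zhongguo", 5)) {
            System.out.println(s);
        }
        reader.close();
    }
}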
Have seen it! It is easy to implement. Thanks!
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
On Mon, 14 Dec 2009, Uwe Schindler wrote:
Can you open an issue? This is a problem of SnowballAnalyzer missing the
Set ctor.
Sure, I have done so - http://issues.apache.org/jira/browse/LUCENE-2165
Nick
And if you do it yourself, don't forget to call clearAttributes() whenever
you produce new tokens (else you may have bugs in the token increments). In
the old token API it's Token.clear()... Just a warning!
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@
Weiwei Wang wrote:
Hi, all
I currently need a TokenFilter to break token season07 into two tokens
season 07
I'd recommend you refer to WordDelimiterFilter in Solr.
Koji
--
http://www.rondhuit.com/en/
I recall reading that Google does it based on statistical analysis of the
words users type. For example, if I search for "googl" and my next search is
for "google", that gets stored. The next time someone types "googl", "google"
is suggested.
Sorry, I don't have a source to link you to on that though.
Any ideas, please?
Hi, all
I currently need a TokenFilter to break the token "season07" into two tokens:
"season" and "07".
I tried PatternReplaceCharFilter to replace "season07" with "season 07";
however, the offsets are not correct for highlighting. For this reason, I want
to implement a TokenFilter, but I do not know how to
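For what it's worth, here is a minimal sketch of such a filter against the 2.9/3.0 attribute API (a hand-rolled splitter, not Solr's WordDelimiterFilter; the class name is made up). It rewrites the current token to the letter prefix, buffers the digit suffix with its own offsets, and emits it on the next call, calling clearAttributes() first, as Uwe warned above:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class LetterDigitSplitFilter extends TokenFilter {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final PositionIncrementAttribute posIncAtt =
            addAttribute(PositionIncrementAttribute.class);

    private String pendingDigits; // digit suffix waiting to be emitted
    private int pendingStart, pendingEnd;

    public LetterDigitSplitFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pendingDigits != null) {
            clearAttributes(); // clear stale state before the synthetic token
            termAtt.setTermBuffer(pendingDigits);
            offsetAtt.setOffset(pendingStart, pendingEnd);
            posIncAtt.setPositionIncrement(1);
            pendingDigits = null;
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        String term = termAtt.term();
        int i = 0;
        while (i < term.length() && !Character.isDigit(term.charAt(i))) {
            i++;
        }
        if (i > 0 && i < term.length()) { // letters followed by digits
            int start = offsetAtt.startOffset();
            pendingDigits = term.substring(i);
            pendingStart = start + i;
            pendingEnd = offsetAtt.endOffset();
            termAtt.setTermBuffer(term.substring(0, i)); // keep "season"
            offsetAtt.setOffset(start, start + i);
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        pendingDigits = null;
    }
}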