Hi Erick,
Thanks for your suggestion; moving the declaration of the StringBuffer
variable sb inside the for loop works well. I want to ask another
question: can we modify the StopAnalyzer to insert stop words of
another language instead of English, like the Urdu given below?
public stati
>
> can we modify the StopAnalyzer to insert stop words of
> another language instead of English, like the Urdu given below?
> public static final String[] URDU_STOP_WORDS = { "پر", "کا", "کی", "کو" };
>
"new StandardAnalyzer(URDU_STOP_WORDS)" should work.
Regards,
Doron
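To make Doron's one-liner concrete: passing a custom array to the analyzer just means the analyzer's StopFilter drops any token found in a set built from that array. A minimal stdlib-only sketch of that membership check (the analyzer construction itself is exactly the `new StandardAnalyzer(URDU_STOP_WORDS)` call above; the sample token here is a hypothetical non-stop word):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UrduStopWords {
    // Same array shape as in the question; StopFilter builds a set like
    // this internally and discards any token contained in it.
    public static final String[] URDU_STOP_WORDS = { "پر", "کا", "کی", "کو" };

    public static void main(String[] args) {
        Set<String> stopSet = new HashSet<>(Arrays.asList(URDU_STOP_WORDS));
        System.out.println(stopSet.contains("کا"));   // true: token would be filtered out
        System.out.println(stopSet.contains("لاہور")); // false: token is kept
    }
}
```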
I would start at the Lucene Java home page (http://lucene.apache.org/java
) and dig in from there. There are a number of good docs on Scoring
and the IR model used (Boolean plus Vector.) From there, I would dig
into the javadocs and whip up some example code that indexes a set of
tokens an
Great, I think. Except now I am really interested about the exception
and what settings you had for heap size, Lucene version, etc.
On Dec 23, 2007, at 11:03 PM, Zhou Qi wrote:
Hi, Grant
After I adjusted the mergeFactor of the IndexWriter from 1000 to 100, it
worked.
Thank you.
22 Dec 20
Any advice on this? Thanks.
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Pagination ...
> Date: Sat, 22 Dec 2007 10:19:30 -0500
>
>
> Hi,
>
> What is the most efficient way to do pagination in Lucene? I have always done
> the following because this "flavor" of the se
Re-running the search for pagination carries out unnecessary index
searches when you page to previous or next results. Generally, most
information needs (e.g. 80%) can be satisfied by the first 100 documents
(20%). In Lucene, the number of returned documents is set to 100 for the
sake of speed.
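The usual pattern the advice above implies is: fetch the top (page + 1) * size hits once per query and display only the current page's slice, instead of issuing a fresh search per navigation step. A hedged, stdlib-only sketch of the slicing arithmetic (class and method names are illustrative, not from Lucene):

```java
import java.util.Arrays;
import java.util.List;

public class Pager {
    // Return the slice of already-fetched hits for a zero-based page of
    // the given size; out-of-range pages yield an empty slice.
    static <T> List<T> page(List<T> hits, int page, int size) {
        int from = Math.min(page * size, hits.size());
        int to = Math.min(from + size, hits.size());
        return hits.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(hits, 1, 3)); // prints [4, 5, 6]
    }
}
```

With Lucene you would ask the searcher for enough top hits to cover the requested page, then apply exactly this kind of slice.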
I am not
Hi Grant,
The exception is thrown from a Java native method: "Failed to merge indexes,
java.lang.OutOfMemoryError: Java heap space". (I have set -Xmx1024m for the
JVM.)
I guess it is similar to the problem that appeared in a previous thread (
http://www.nabble.com/Index-merge-and-java-heap-space-tt50
You might want to take a look at Solr (http://lucene.apache.org/solr/). You
could either use Solr directly, or see how they implement paging.
--Mike
On Dec 26, 2007 12:12 PM, Zhou Qi <[EMAIL PROTECTED]> wrote:
> Using the search function for pagination will carry out unnecessary index
> searc
I'm working on a project where we will be searching across several languages
with a single query. There will be different categories which will include
different groups of languages to search (i.e. category "a": English, French,
Spanish; category "b": Spanish, Portuguese, Italian, etc.). Originally I
Hi, Doron Cohen
Thanks for your reply, but I am facing a small problem here. As I
am using Notepad for coding, in which format should the file be saved?
public static final String[] URDU_STOP_WORDS = { "کے" ,"کی" ,"سے" ,"کا"
,"کو" ,"ہے" };
Analyzer analyzer = new StandardAnalyzer(
"javac" has an option "-encoding", which tells the compiler the encoding
the input source file is using, this will probably solve the problem.
or you can try the unicode escape: \u, then you can save it in ANSI,
had for human to read though.
or use an IDE, eclipse is a good choice, you can se
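As a sketch of the \u suggestion: each Urdu letter is replaced by its Unicode escape, so the .java file contains only ASCII and compiles without any -encoding flag. The escape values below are my reading of the code points for these two words ("کا" and "پر"), so treat them as an assumption to verify:

```java
public class EscapedStopWords {
    // "کا" (\u06A9 \u0627) and "پر" (\u067E \u0631) written as escapes,
    // so the source file can be saved as plain ANSI in Notepad.
    public static final String[] URDU_STOP_WORDS = {
        "\u06A9\u0627", // کا
        "\u067E\u0631"  // پر
    };

    public static void main(String[] args) {
        for (String w : URDU_STOP_WORDS) {
            System.out.println(w);
        }
    }
}
```

The obvious trade-off, as noted above, is that the escaped form is hard for humans to read and maintain.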
李晓峰 wrote:
"javac" has an option "-encoding", which tells the compiler the
encoding the input source file is using, this will probably solve the
problem.
or you can try the Unicode escape, \u, so you can save the file as
ANSI; hard for humans to read, though.
or use an IDE, eclipse is a good choic
It's Notepad.
It adds a byte-order mark (BOM, in this case 65279, or 0xFEFF) at the front of
your file, which javac does not recognize, for reasons not quite clear to me.
Here is the bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
It won't be fixed, so try to eliminate the BOM before compiling.
Or you can save it as "Unicode" and use javac -encoding Unicode;
this way you can still use Notepad.
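A minimal sketch of stripping that BOM in code, using only the JDK. 65279 is exactly U+FEFF, the character javac is rejecting as "illegal character: \65279":

```java
public class BomStripper {
    // Notepad-saved UTF-8 files start with U+FEFF (decimal 65279);
    // drop it before handing the text to anything that chokes on it.
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        String withBom = "\uFEFFpublic class X {}";
        System.out.println(stripBom(withBom)); // prints "public class X {}"
    }
}
```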
Liaqat Ali wrote:
李晓峰 wrote:
"javac" has an option "-encoding", which tells the compiler the
encoding the input source file is using, this will probably solve the
problem.
or you can try the unicode e
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
> Using javac -encoding UTF-8 still raises the following error.
>
> urduIndexer.java : illegal character: \65279
> ?
> ^
> 1 error
>
> What am I doing wrong?
>
If you have the stop words in a file, say one word per line,
they can be
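A sketch of the file-based approach Doron starts to describe here, assuming one UTF-8 word per line (the reading loop also strips any BOM and blank lines, so the .java source itself stays pure ASCII; the path handling is illustrative):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StopWordFile {
    // Read one stop word per line from a UTF-8 file, skipping blank
    // lines and removing any leading byte-order mark.
    static String[] readStopWords(String path) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.replace("\uFEFF", "").trim();
                if (!line.isEmpty()) words.add(line);
            }
        }
        return words.toArray(new String[0]);
    }
}
```

The resulting array can then be passed to the analyzer exactly as in the earlier "new StandardAnalyzer(URDU_STOP_WORDS)" reply, with no non-ASCII characters in the compiled source.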
Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Using javac -encoding UTF-8 still raises the following error.
urduIndexer.java : illegal character: \65279
?
^
1 error
What am I doing wrong?
If you have the stop-words in a file, say one word in a l
Are you altering (stemming) the token before it gets to the StopFilter?
On Dec 26, 2007, at 5:08 PM, Liaqat Ali wrote:
Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Using javac -encoding UTF-8 still raises the following error.
urduIndexer.java : illegal
Grant Ingersoll wrote:
Are you altering (stemming) the token before it gets to the StopFilter?
On Dec 26, 2007, at 5:08 PM, Liaqat Ali wrote:
Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Using javac -encoding UTF-8 still raises the following error.
ur
On Dec 26, 2007, at 5:24 PM, Liaqat Ali wrote:
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
No, at this level I am not using any stemming technique. I am just
trying to elim
Grant Ingersoll wrote:
On Dec 26, 2007, at 5:24 PM, Liaqat Ali wrote:
No, at this level I am not using any stemming technique. I