Thanks, Toru and Chris,
I tried both the CJKTokenizer and the CJKAnalyzer. Both return some unexpected
highlight results when I tested with German. The field value I searched is
"Ein Mann beißt den Hund" and the search criterion is "beißt".
When using CJKAnalyzer, beißt is treated as 2 single terms (bei
Hi Yonik,
Sorry to jump on an old post.
There is a change interface in JIRA, as long as all of the fields
originally sent are stored.
Do you remember the JIRA issue, or a clue to find it? It sounds useful
in some cases, for example, when you are working on analysers. That
could be real
Hi Hoss.
I've done a few tests using reflection to instantiate a simple object, and
the results vary a lot depending on the JVM. Because the JVM optimizes code
as it executes, the results also depend on usage, but I think we have
something to consider:
I've done 1,000 samples (5 clean X loop
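For anyone wanting to repeat a measurement like this, here is a rough Java sketch comparing plain construction with reflective construction through java.lang.reflect.Constructor. The class Simple and the warm-up count are assumptions made for illustration, not details of the original test, and absolute numbers will differ between JVMs and runs.

```java
import java.lang.reflect.Constructor;

// Rough timing sketch: direct construction vs. reflective construction.
// Numbers from a System.nanoTime() loop are noisy and depend heavily on the
// JVM and on JIT warm-up; treat them as indicative only.
public class ReflectionTiming {

    // Hypothetical stand-in for "a simple object".
    public static class Simple {
        public Simple() {}
    }

    // Written on every iteration so the JIT cannot drop the allocations.
    static volatile Object sink;

    // Creates n objects with 'new'; returns elapsed nanoseconds.
    public static long timeDirect(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink = new Simple();
        }
        return System.nanoTime() - start;
    }

    // Creates n objects through a Constructor looked up once; returns elapsed ns.
    public static long timeReflective(int n) throws ReflectiveOperationException {
        Constructor<Simple> ctor = Simple.class.getConstructor();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink = ctor.newInstance();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        final int samples = 1_000;
        final int warmupRounds = 3; // arbitrary choice, not from the original test
        for (int r = 0; r < warmupRounds; r++) {
            timeDirect(samples);
            timeReflective(samples);
        }
        System.out.println("direct:     " + timeDirect(samples) + " ns");
        System.out.println("reflective: " + timeReflective(samples) + " ns");
    }
}
```

Looking up the Constructor once, outside the loop, keeps the comparison focused on newInstance() itself rather than on the class and constructor lookup.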
Sorry, I've confused things a bit... Thread safety has to be
considered only for the Tokenizers, not for the factories. So are the
Tokenizers thread safe?
Regards,
Daniel
On 22/6/07 11:36, Daniel Alheiros [EMAIL PROTECTED] wrote:
Hi Hoss.
I've done a few tests using reflection to
Tokenizers are not thread safe (I made a mistake yesterday saying they are; I
don't know what I was thinking).
This is why:

  public abstract class Tokenizer extends TokenStream {
    /** The text source for this Tokenizer. */
    protected Reader input;
    ...
  }

oops :(
: Sorry, I've confused things a bit... Thread safety has to be
: considered only for the Tokenizers, not for the factories. So are the
: Tokenizers thread safe?
nope ... they are constructed using Readers and maintain state about the
text they are processing ... the only API is a next()
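A minimal Java sketch of the point above: anything that reads from a Reader keeps per-stream state, so one instance must not be shared between threads or documents. WhitespaceTokenizerSketch is a hypothetical class written for illustration; it is not Lucene's Tokenizer and only mimics the next()-style API mentioned above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical class, not Lucene's Tokenizer): a tokenizer that reads
// from a Reader carries mutable per-stream state, i.e. the Reader's position
// plus a scratch buffer. Two threads calling next() on one instance would
// interleave reads and corrupt each other's tokens, so each input gets its
// own instance.
public class TokenizerState {

    static final class WhitespaceTokenizerSketch {
        private final Reader input;                            // consumed as tokens are read
        private final StringBuilder buf = new StringBuilder(); // reused between calls

        WhitespaceTokenizerSketch(Reader input) {
            this.input = input;
        }

        // Returns the next whitespace-delimited token, or null at end of input.
        String next() throws IOException {
            buf.setLength(0);
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (buf.length() > 0) {
                        return buf.toString();
                    }
                } else {
                    buf.append((char) c);
                }
            }
            return buf.length() > 0 ? buf.toString() : null;
        }
    }

    // Safe pattern: a fresh, unshared tokenizer for each piece of text.
    public static List<String> tokenize(String text) {
        WhitespaceTokenizerSketch ts = new WhitespaceTokenizerSketch(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = ts.next(); t != null; t = ts.next()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Ein Mann beißt den Hund"));
    }
}
```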
On 21-Jun-07, at 10:22 PM, Chris Hostetter wrote:
like I said though: I'm in favor of factories like this ... I just don't
think we should do anything to hide their use and make referring to
Tokenizer or TokenFilter class names directly use reflection magically.
What would be the best way
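One way to read the factory suggestion, sketched in Java under stated assumptions: reflection is used once to load a configured factory class, and the factory (stateless, so shareable) hands out a fresh tokenizer per Reader. TokenizerFactorySketch, WhitespaceFactory and loadFactory are hypothetical names, and java.util.Scanner merely stands in for a real Tokenizer; this is not Solr's actual factory API.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FactorySketch {

    // Hypothetical factory contract: stateless, so one instance can be shared
    // across threads; each create() call returns a fresh, unshared tokenizer.
    public interface TokenizerFactorySketch {
        Scanner create(Reader input);
    }

    // Hypothetical factory producing whitespace-delimited token streams.
    public static final class WhitespaceFactory implements TokenizerFactorySketch {
        public WhitespaceFactory() {}

        @Override
        public Scanner create(Reader input) {
            return new Scanner(input); // Scanner splits on whitespace by default
        }
    }

    // Reflection happens once, when a configured class name is loaded,
    // not on every document.
    public static TokenizerFactorySketch loadFactory(String className)
            throws ReflectiveOperationException {
        Class<? extends TokenizerFactorySketch> cls =
                Class.forName(className).asSubclass(TokenizerFactorySketch.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    // Uses a shared factory, but a new tokenizer for each text.
    public static List<String> tokenize(TokenizerFactorySketch factory, String text) {
        List<String> out = new ArrayList<>();
        try (Scanner s = factory.create(new StringReader(text))) {
            while (s.hasNext()) {
                out.add(s.next());
            }
        }
        return out;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        TokenizerFactorySketch f = loadFactory(WhitespaceFactory.class.getName());
        System.out.println(tokenize(f, "Ein Mann beißt den Hund"));
    }
}
```

The class name stays visible in the configuration, so nothing is instantiated "magically"; the reflective step is explicit and confined to loadFactory.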
Hi Daniel,
As you know, Chinese and Japanese do not use
spaces or any other delimiters to break words.
To overcome this problem, CJKTokenizer uses a method
called bi-gram, where each run of ideographic (i.e., Chinese)
characters is made into tokens of two neighboring
characters. So a run of five
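The bi-gram idea can be sketched in a few lines of Java. This illustrates the technique only and is not CJKTokenizer's source: it assumes the input is a single run of ideographs in the Basic Multilingual Plane and ignores how the real tokenizer treats Latin text, punctuation or mixed scripts. Emitting a lone character as its own token is likewise an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of bi-gram tokenization for one run of ideographs: every pair of
// neighboring characters becomes a token, so tokens overlap by one character.
// Assumes BMP characters only (one char per ideograph).
public class CjkBigrams {

    public static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.length() == 1) {
            tokens.add(run); // assumption: a lone character is kept as one token
            return tokens;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // ABCDE -> AB, BC, CD, DE
        System.out.println(bigrams("ABCDE"));
        System.out.println(bigrams("日本語学校"));
    }
}
```

Because neighboring tokens overlap, a query bi-gram can match at any position inside a run, which is what makes word boundaries unnecessary.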
Hi David
1) you will have to re-add the documents; Solr does not support an
update operation (only add/del)
2) same as above: Solr does not support an update operation, so you will
need to re-add the document with the updated numberField. If it's any
help, I have a popularity field in my index (3
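Since the document has to be re-sent in full, a re-add is just an ordinary add message containing every field again; assuming the schema defines a uniqueKey, the new version replaces the old one with the same key. A rough Java sketch that only builds such a Solr XML add message; the field names (id, title, popularity) are hypothetical, and posting the result to the Solr update handler is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrReAdd {

    // Escapes the five XML special characters ('&' first, so it is not re-escaped).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Builds an <add> message carrying every field of one document.
    public static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc-42");                       // hypothetical unique key
        doc.put("title", "Ein Mann beißt den Hund");   // unchanged field, sent again
        doc.put("popularity", "4");                    // the updated value
        System.out.println(addMessage(doc));
    }
}
```

This is also why all fields must be available on the client (or stored in the index): anything left out of the re-added document is gone.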
Have you tried using the PHP functions utf8_decode/utf8_encode?
As far as I understand, only UTF-8 is supported (but I could be wrong on that!)
-Nick
On 6/23/07, escher2k [EMAIL PROTECTED] wrote:
Is it possible to use Windows 1252 encoding instead of UTF-8 for Solr? The
application runs on
: Is it possible to use Windows 1252 encoding instead of UTF-8 for Solr? The
not at the moment...
https://issues.apache.org/jira/browse/SOLR-96
-Hoss
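If the data arrives as Windows-1252, it can be transcoded to UTF-8 on the client before it is sent to Solr. A minimal Java sketch, assuming the JVM provides the windows-1252 charset (common JDKs do, although it is not one of the charsets every Java platform must support). Note that PHP's utf8_encode converts from ISO-8859-1, which differs from Windows-1252 in the 0x80 to 0x9F range; for example, 0x80 is the euro sign in Windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252ToUtf8 {

    // Decodes bytes as Windows-1252 and re-encodes the text as UTF-8.
    public static byte[] toUtf8(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x80 is the euro sign and 0xDF is "ß" in Windows-1252.
        byte[] in = { (byte) 0x80, (byte) 0xDF };
        byte[] out = toUtf8(in);
        System.out.println(out.length + " UTF-8 bytes: "
                + new String(out, StandardCharsets.UTF_8));
    }
}
```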
: There is a change interface in JIRA, as long as all of the fields
: originally sent are stored.
:
: Do you remember the JIRA issue, or a clue to find it? It sounds useful
: in some cases, for example, when you are working on analysers. That
: could be real life for me in future.