Re: Umlauts as Char

2011-02-07 Thread Stefan Bodewig
On 2011-02-08, Prescott Nasser wrote: in the void substitute function you'll see them: else if ( buffer.charAt( c ) == 'ü' ) { buffer.setCharAt( c, 'u' ); } This does not constitute a character in .NET (that I can figure out) and thus it doesn't compile. The .java

RE: Umlauts as Char

2011-02-07 Thread Prescott Nasser
Stefan somewhat nailed it on the head. My concerns were the Java characters - I can't even search Google or Bing for them. So I can take the source code's word that 'ü' is the u with dots over it (because it says replace umlauts in the source notes). But, I guess, is that really true? Is that
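A minimal Java sketch of the point in this thread, assuming the compile failure comes from the file's encoding not surviving the port (the snippets don't confirm the cause). Writing the character as the Unicode escape '\u00FC' keeps the source pure ASCII, and Character.getName reports which character it actually is. C# char literals accept the same \uXXXX escape syntax. The class UmlautSubstitution is hypothetical and only a simplified stand-in for the stemmer's substitute step, not the Lucene source.

```java
// Simplified stand-in for the stemmer's umlaut substitution (hypothetical
// class, not the Lucene source). The char constants use Unicode escapes so
// the file stays pure ASCII and compiles under any source encoding.
class UmlautSubstitution {
    static final char A_UMLAUT = '\u00E4'; // U+00E4, a with diaeresis
    static final char O_UMLAUT = '\u00F6'; // U+00F6, o with diaeresis
    static final char U_UMLAUT = '\u00FC'; // U+00FC, u with diaeresis

    // Replaces each umlauted vowel with its plain vowel, working on a copy
    // of the word. Assumes the input has already been lower-cased.
    static String substitute(String word) {
        StringBuilder buffer = new StringBuilder(word);
        for (int c = 0; c < buffer.length(); c++) {
            char ch = buffer.charAt(c);
            if (ch == A_UMLAUT) {
                buffer.setCharAt(c, 'a');
            } else if (ch == O_UMLAUT) {
                buffer.setCharAt(c, 'o');
            } else if (ch == U_UMLAUT) {
                buffer.setCharAt(c, 'u');
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Official Unicode name of the character the escape denotes.
        System.out.println(Character.getName(U_UMLAUT));
        System.out.println(substitute("M\u00FCller"));
    }
}
```

Character.getName(U_UMLAUT) returns "LATIN SMALL LETTER U WITH DIAERESIS", which answers the "is that really true?" question without needing to search for the glyph.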

[jira] Updated: (LUCENE-2910) Highlighter does not correctly highlight the phrase around 50th term

2011-02-07 Thread Shinya Kasatani (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinya Kasatani updated LUCENE-2910: Attachment: HighlighterFix.patch A test case that describes the problem, along with a

[jira] Created: (LUCENE-2910) Highlighter does not correctly highlight the phrase around 50th term

2011-02-07 Thread Shinya Kasatani (JIRA)
Highlighter does not correctly highlight the phrase around 50th term Key: LUCENE-2910 URL: https://issues.apache.org/jira/browse/LUCENE-2910 Project: Lucene - Java Issue

[jira] Updated: (LUCENE-2910) Highlighter does not correctly highlight the phrase around 50th term

2011-02-07 Thread Shinya Kasatani (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinya Kasatani updated LUCENE-2910: Description: When you use the Highlighter combined with N-Gram tokenizers such as

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-02-07 Thread Nick Pellow (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991315#comment-12991315 ] Nick Pellow commented on LUCENE-2666: - Hi Michael, This issue was entirely a

[jira] Commented: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-07 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991316#comment-12991316 ] Robert Muir commented on LUCENE-2909: - Is the bug really in NGramTokenFilter? This

[jira] Commented: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991319#comment-12991319 ] Uwe Schindler commented on LUCENE-2909: --- The problem has nothing to do with

[jira] Commented: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-07 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991320#comment-12991320 ] Robert Muir commented on LUCENE-2909: - You are right, some stemmers increase the

[jira] Updated: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-07 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2909: Attachment: LUCENE-2909_assert.patch here's a check we can add to BaseTokenStreamTestCase for
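A simplified Java model of the failure mode discussed in LUCENE-2909, assuming n-gram offsets are derived as the incoming token's start offset plus the gram's position in the (possibly stemmed) term text. The class and method names are hypothetical; this is neither NGramTokenFilter nor the attached assert patch, only an illustration of why a term that is longer than its original span yields end offsets past the end of the text, and of the kind of bounds check a test base class could apply to every token.

```java
// Hypothetical model, not Lucene code: offsets for each gram are computed
// from the token's start offset plus the gram's position in the term text.
class NGramOffsets {
    // Returns {start, end} offset pairs for every gram of the given size.
    static int[][] gramOffsets(int tokenStart, String termText, int gramSize) {
        int count = Math.max(0, termText.length() - gramSize + 1);
        int[][] offsets = new int[count][];
        for (int pos = 0; pos < count; pos++) {
            offsets[pos] = new int[] {tokenStart + pos, tokenStart + pos + gramSize};
        }
        return offsets;
    }

    // Sanity check: every offset pair must be ordered and lie inside the
    // original text, whose length is originalLength.
    static boolean offsetsWithin(int[][] offsets, int originalLength) {
        for (int[] o : offsets) {
            if (o[0] < 0 || o[1] < o[0] || o[1] > originalLength) {
                return false;
            }
        }
        return true;
    }
}
```

With original text "ab" (length 2), gramOffsets(0, "ab", 2) yields the single pair {0, 2} and passes the check. If an earlier filter had rewritten the term to a four-character form such as "abcd", the last bigram would end at offset 4 and the check fails, even though the n-gram logic itself did nothing unusual.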

Re: Threading of JIRA e-mails in gmail?

2011-02-07 Thread Dawid Weiss
Just a follow-up to this one: no reply from infra yet, but I simply tried my config. on people.apache.org and it works like a charm, so for Apache committers and gmail users this is probably a life-saver. My config is described in a comment here: https://issues.apache.org/jira/browse/INFRA-3403

Re: Threading of JIRA e-mails in gmail?

2011-02-07 Thread Doron Cohen
Thanks Dawid It is not working for me yet, looking for the reason for that... Doron On Mon, Feb 7, 2011 at 12:48 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Just a follow-up to this one: no reply from infra yet, but I simply tried my config. on people.apache.org and it works like a

Re: Threading of JIRA e-mails in gmail?

2011-02-07 Thread Dawid Weiss
Looks like my action prompted a response from infra and it's not encouraging -- they're supposedly switching off procmail support on that server soon. Track INFRA-3403 to see what will come out of this, I don't want to spam this list. Eh. Dawid On Mon, Feb 7, 2011 at 1:14 PM, Doron Cohen

Re: Distributed Indexing

2011-02-07 Thread Upayavira
I'm saying that deterministic policies are a requirement that *some* people will want. Others might want a random spread. Thus, I'd have deterministic based on ID and random as the two initial implementations. Upayavira NB. In case folks haven't worked it out already, I have been tasked to mentor
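To make the two proposed policies concrete, a Java sketch. ShardPolicy and both implementations are hypothetical names, not existing Solr classes, and hashing the ID with String.hashCode is just one assumed choice of deterministic function.

```java
import java.util.Random;

// Hypothetical interface, not a Solr API: picks the shard for a document.
interface ShardPolicy {
    int shardFor(String docId, int numShards);
}

// Deterministic: the same ID always maps to the same shard.
class HashShardPolicy implements ShardPolicy {
    public int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docId.hashCode(), numShards);
    }
}

// Random spread: the shard is chosen independently of the ID.
class RandomShardPolicy implements ShardPolicy {
    private final Random random;

    RandomShardPolicy(long seed) {
        this.random = new Random(seed);
    }

    public int shardFor(String docId, int numShards) {
        return random.nextInt(numShards);
    }
}
```

The deterministic policy lets a later update or delete for the same ID be routed to the same shard without a lookup; the random policy gives that up, so locating an existing document would require querying every shard or keeping a separate ID-to-shard record.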

Re: Distributed Indexing

2011-02-07 Thread Upayavira
Surely you want to be implementing an UpdateRequestProcessor, rather than a RequestHandler. The ContentStreamHandlerBase, in the handleRequestBody method gets an UpdateRequestProcessor and uses it to process the request. What we need is that handleRequestBody method to, as you have suggested,

Maintain stopwords.txt and other files

2011-02-07 Thread Timo Schmidt
Hello everyone, I am currently developing a search solution based on Apache Solr. I want to offer users the possibility to maintain synonyms and stopwords in a user-friendly tool, but so far I could not find any possibility to write the stopwords.txt or
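One possible approach, sketched in Java under assumptions: have the tool regenerate the file itself in the plain-text format used by Solr's stopword files (one entry per line, lines starting with # treated as comments), then reload the core (for example with the CoreAdmin RELOAD action) so the analyzers pick up the change. StopwordsFile is a hypothetical helper; writing the string to disk and triggering the reload are left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for a maintenance tool, not a Solr API.
class StopwordsFile {
    // Renders one word per line, preceded by a '#' comment header line.
    static String render(Set<String> words, String comment) {
        StringBuilder sb = new StringBuilder();
        sb.append("# ").append(comment).append('\n');
        for (String w : words) {
            String t = w.trim();
            if (!t.isEmpty()) {
                sb.append(t).append('\n');
            }
        }
        return sb.toString();
    }

    // Reads the same format back: skips blank lines and '#' comment lines.
    static Set<String> parse(String content) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : content.split("\n")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue;
            }
            words.add(t);
        }
        return words;
    }
}
```

Round-tripping through parse lets the tool load the current file, apply the user's edits to the set, and render it back out.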

Tokenization and Fuzziness: How to Allow Multiple Strategies?

2011-02-07 Thread Tavi Nathanson
Hey everyone, Tokenization seems inherently fuzzy and imprecise, yet Lucene does not appear to provide an easy mechanism to account for this fuzziness. Let's take an example, where the document I'm indexing is v1.1.0 mr. jones da...@gmail.com I may want to tokenize this as follows: [v1.1.0,
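A Java sketch of applying two tokenization strategies to the same text, using the part of the example before the e-mail address. Both methods are hypothetical stand-ins for analyzers; in Solr the usual way to keep more than one tokenization is to index the same source text into several fields (for example via copyField), each with its own analyzer, and search across them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for two analyzers applied to the same input.
class MultiTokenize {
    // Strategy 1: split on whitespace only, so "v1.1.0" stays one token.
    static List<String> whitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Strategy 2: keep only runs of letters/digits, so "v1.1.0" -> v1, 1, 0.
    static List<String> alnumRuns(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

For "v1.1.0 mr. jones", whitespace(...) gives [v1.1.0, mr., jones] while alnumRuns(...) gives [v1, 1, 0, mr, jones]; indexing both lets either form of a query match.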

Scoring: Precedent for a Better, Less Fragile Approach?

2011-02-07 Thread Tavi Nathanson
Hey everyone, I have a question about Lucene/Solr scoring in general. It really feels like a wobbly house of cards that falls down whenever I make the slightest tweak. There are many factors at play in Lucene scoring: they're all fighting with each other, and very often one will completely
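For reference, a Java sketch of the per-factor formulas documented for the classic Lucene 3.x DefaultSimilarity (tf, idf, lengthNorm), showing how two of them pull against each other. It deliberately omits queryNorm, coord, boosts, and the lossy one-byte encoding of norms, so it is not a full reproduction of Lucene's score; ClassicScoringFactors is a hypothetical name.

```java
// Simplified restatement of classic DefaultSimilarity factors.
class ClassicScoringFactors {
    // Term frequency factor: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Field length normalization: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // One term's contribution to one document: tf * idf^2 * norm,
    // ignoring boosts, coord and queryNorm.
    static double termWeight(int freq, int docFreq, int numDocs, int numTerms) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(numTerms);
    }
}
```

For example, one occurrence in a one-term field outweighs four occurrences in a sixteen-term field: tf only doubles (1 to 2) while lengthNorm drops by a factor of 4 (1 to 0.25).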

Re: [REINDEX] Note: re-indexing required !

2011-02-07 Thread Earwin Burrfoot
Lucene maintains compatibility with earlier stable release index versions, and to some extent transparently upgrades them. But there is no guaranteed compatibility between different in-development indexes. E.g. 3.2 reads 3.1 indexes and upgrades them, but 3.2-dev-snapshot-10 (while happily

RE: Tokenization and Fuzziness: How to Allow Multiple Strategies?

2011-02-07 Thread Steven A Rowe
Hi Tavi, solr-...@lucene.apache.org has been deprecated since the Lucene and Solr source trees merged last year. Please use dev@lucene.apache.org instead. However, your question is about *usage* of Lucene/Solr, rather than *development*, so you should be using solr-u...@lucene.apache.org or

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-02-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991545#comment-12991545 ] Michael McCandless commented on LUCENE-2666: Ahh, thanks for bringing closure

[jira] Commented: (LUCENE-2908) clean up serialization in the codebase

2011-02-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991559#comment-12991559 ] Michael McCandless commented on LUCENE-2908: +1 clean up serialization in

[jira] Resolved: (SOLR-2350) improve post.jar to handle non UTF-8 files

2011-02-07 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2350. Resolution: Fixed Committed revision 1068149. - trunk Committed revision 1068152. - 3x improve post.jar

Re: Keyword - search statistics

2011-02-07 Thread Erick Erickson
Solr doesn't keep metadata, so if you're asking for some kind of search logging, your app has to provide that... Best Erick On Sun, Feb 6, 2011 at 10:46 PM, Selvaraj Varadharajan selvara...@gmail.com wrote: Hi, is there any way I can get the number of times a keyword was searched in Solr?
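Following Erick's point that the logging has to happen in the application, a minimal Java sketch of an in-process query counter. QueryStats is a hypothetical class; it assumes queries are normalized by trimming and lower-casing before counting, and it keeps counts only in memory, so a real setup would also persist or log them.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical application-side counter; Solr itself is not involved.
class QueryStats {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Assumed normalization: trim and lower-case so "Solr" and " solr " match.
    private static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    // Call once per query, before forwarding it to Solr. Thread-safe.
    void record(String query) {
        counts.computeIfAbsent(normalize(query), k -> new LongAdder()).increment();
    }

    // Number of times the (normalized) query has been recorded.
    long count(String query) {
        LongAdder adder = counts.get(normalize(query));
        return adder == null ? 0L : adder.sum();
    }
}
```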

Re: Keyword - search statistics

2011-02-07 Thread Bill Bell
You can also use Google Analytics or something similar to get stats. Bill Bell Sent from mobile On Feb 7, 2011, at 4:31 PM, Erick Erickson erickerick...@gmail.com wrote: Solr doesn't keep meta data, so if you're asking for some kind of search logging your app has to provide that...

Re: Keyword - search statistics

2011-02-07 Thread Selvaraj Varadharajan
Thanks Erick. What about having another core that interprets the request calls and pools them in that core? Do you see any performance hit from your point of view? -Selvaraj On Mon, Feb 7, 2011 at 3:31 PM, Erick Erickson erickerick...@gmail.com wrote: Solr doesn't keep meta data, so if you're

Re: Keyword - search statistics

2011-02-07 Thread Erick Erickson
You have to explain your problem in *much* more detail for anyone to make a really relevant comment, all we can do so far is guess what you're *really* after Best Erick On Mon, Feb 7, 2011 at 8:25 PM, Selvaraj Varadharajan selvara...@gmail.com wrote: Thanks Eric. What about having another

Umlauts as Char

2011-02-07 Thread Prescott Nasser
Hey all, So, while digging into the code a bit (and prompted by digy's Arabic conversion yesterday), I started looking at the various other languages we were missing from Java. I started porting the GermanAnalyzer, but ran into an issue with the umlauts...

Re: Keyword - search statistics

2011-02-07 Thread Vijay Raj
You can have a custom SearchComponent and configure a listener for it. Check out example/solr/config/solrconfig.xml regarding configuring custom query components before and after the default list of components; that can help provide some of this 'aspect' behavior. arr

[jira] Commented: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

2011-02-07 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991771#comment-12991771 ] Hoss Man commented on SOLR-2348: My hope had been that this would be really

Re: CustomScoreQueryWithSubqueries

2011-02-07 Thread Fernando Wasylyszyn
Robert: I'm trying to follow the steps that are mentioned in: http://wiki.apache.org/lucene-java/HowToContribute in order to make a patch with my contribution. But, in the source code that I get from: http://svn.apache.org/repos/asf/lucene/dev/trunk/ the class

Should ASCIIFoldingFilter be deprecated?

2011-02-07 Thread David Smiley (@MITRE.org)
ISOLatin1AccentFilter is deprecated, presumably because you can (and should) use MappingCharFilter configured with mapping-ISOLatin1Accent.txt. By that same reasoning, shouldn't ASCIIFoldingFilter be deprecated in favor of using mapping-FoldToASCII.txt ? ~ David Smiley - Author:

RE: Should ASCIIFoldingFilter be deprecated?

2011-02-07 Thread Steven A Rowe
AFAIK, ISOLatin1AccentFilter was deprecated because ASCIIFoldingFilter provides a superset of its mappings. I haven't done any benchmarking, but I'm pretty sure that ASCIIFoldingFilter can achieve a significantly higher throughput rate than MappingCharFilter, and given that, it probably makes

Re: Should ASCIIFoldingFilter be deprecated?

2011-02-07 Thread Chris Hostetter
: : ISOLatin1AccentFilter is deprecated, presumably because you can (and should) : use MappingCharFilter configured with mapping-ISOLatin1Accent.txt. By that : same reasoning, shouldn't ASCIIFoldingFilter be deprecated in favor of using : mapping-FoldToASCII.txt ? CharFilters and TokenFilters

[HUDSON] Lucene-Solr-tests-only-trunk - Build # 4621 - Failure

2011-02-07 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/4621/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testOptimizeTempSpaceUsage Error Message: optimize used too much temporary space: starting usage was 60814 bytes; max temp usage was 244924 but
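The two figures quoted in the failure message put the peak temporary usage at just over four times the starting usage; the test's actual threshold is cut off in the snippet:

```latex
\frac{244924\ \text{bytes}}{60814\ \text{bytes}} \approx 4.03
```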

Re: CustomScoreQueryWithSubqueries

2011-02-07 Thread Doron Cohen
Hi Fernando, The wiki indeed relates mainly to trunk development. For creating a 2.9 patch, check out code from /repos/asf/lucene/java/branches/lucene_2_9 Regards, Doron As the wiki page says Most development is done on the trunk You can either use that, or, in order On Tue, Feb 8, 2011 at

[jira] Created: (SOLR-2351) Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed.

2011-02-07 Thread Amit Nithian (JIRA)
Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed. - Key: SOLR-2351

[jira] Updated: (SOLR-2351) Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed.

2011-02-07 Thread Amit Nithian (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Nithian updated SOLR-2351: --- Attachment: mlt.patch Allow the MoreLikeThis component to accept filters and use the already parsed

[jira] Commented: (SOLR-2155) Geospatial search using geohash prefixes

2011-02-07 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991818#comment-12991818 ] Lance Norskog commented on SOLR-2155: - The lat/long version has to be rotated away from
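SOLR-2155 is about geohash prefixes, so for context here is a Java sketch of standard geohash encoding: alternately bisect the longitude and latitude ranges, emit one bit per step, and pack each 5 bits into the geohash base-32 alphabet. It is the generic algorithm, not code from the SOLR-2155 patch, and it says nothing about the rotation the truncated comment mentions. Points that are close together usually share a long prefix, which is what makes prefix queries usable for proximity search.

```java
// Generic geohash encoder (not the SOLR-2155 code).
class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encodes lat/lon into a geohash with the given number of characters.
    static String encode(double lat, double lon, int precision) {
        double latMin = -90.0, latMax = 90.0;
        double lonMin = -180.0, lonMax = 180.0;
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else { value = value << 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value = value << 1; latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // 5 bits -> one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

Geohash.encode(57.64911, 10.40744, 11) gives "u4pruydqqvj", the commonly cited example value; truncating the precision simply yields a prefix of the longer hash, which covers a larger cell containing the point.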