[jira] Commented: (SOLR-2155) Geospatial search using geohash prefixes

2011-02-07 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991818#comment-12991818 ] Lance Norskog commented on SOLR-2155: The lat/long version has to be rotated away from

[jira] Updated: (SOLR-2351) Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed.

2011-02-07 Thread Amit Nithian (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Nithian updated SOLR-2351: Attachment: mlt.patch > Allow the MoreLikeThis component to accept filters and use the already parsed

[jira] Created: (SOLR-2351) Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed.

2011-02-07 Thread Amit Nithian (JIRA)
Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed. Key: SOLR-2351

Re: CustomScoreQueryWithSubqueries

2011-02-07 Thread Doron Cohen
Hi Fernando, The wiki indeed relates mainly to trunk development. To create a 2.9 patch, check out the code from /repos/asf/lucene/java/branches/lucene_2_9 Regards, Doron As the wiki page says > Most development is done on the "trunk" You can either use that, or, in order On Tue, Feb 8, 2011 at

[HUDSON] Lucene-Solr-tests-only-trunk - Build # 4621 - Failure

2011-02-07 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/4621/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testOptimizeTempSpaceUsage Error Message: optimize used too much temporary space: starting usage was 60814 bytes; max temp usage was 244924 but shou

Re: Should ASCIIFoldingFilter be deprecated?

2011-02-07 Thread Chris Hostetter
: : ISOLatin1AccentFilter is deprecated, presumably because you can (and should) : use MappingCharFilter configured with mapping-ISOLatin1Accent.txt. By that : same reasoning, shouldn't ASCIIFoldingFilter be deprecated in favor of using : mapping-FoldToASCII.txt ? CharFilters and TokenFilters ha

RE: Should ASCIIFoldingFilter be deprecated?

2011-02-07 Thread Steven A Rowe
AFAIK, ISOLatin1AccentFilter was deprecated because ASCIIFoldingFilter provides a superset of its mappings. I haven't done any benchmarking, but I'm pretty sure that ASCIIFoldingFilter can achieve a significantly higher throughput rate than MappingCharFilter, and given that, it probably makes se

Should ASCIIFoldingFilter be deprecated?

2011-02-07 Thread David Smiley (@MITRE.org)
ISOLatin1AccentFilter is deprecated, presumably because you can (and should) use MappingCharFilter configured with mapping-ISOLatin1Accent.txt. By that same reasoning, shouldn't ASCIIFoldingFilter be deprecated in favor of using mapping-FoldToASCII.txt ? ~ David Smiley - Author: https://ww

Re: CustomScoreQueryWithSubqueries

2011-02-07 Thread Fernando Wasylyszyn
Robert: I'm trying to follow the steps that are mentioned in: http://wiki.apache.org/lucene-java/HowToContribute in order to make a patch with my contribution. But, in the source code that I get from: http://svn.apache.org/repos/asf/lucene/dev/trunk/ the class org.apache.lucene.search.Searcher

[jira] Commented: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

2011-02-07 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991771#comment-12991771 ] Hoss Man commented on SOLR-2348: My hope had been that this would be really straightforward

Re: Keyword - search statistics

2011-02-07 Thread Vijay Raj
You can have a custom SearchComponent and configure a listener for it. Check out example/solr/config/solrconfig.xml for how to configure custom query components before and after the default list of components; that can help provide some of this 'aspect' behavior. myFirstCo

Umlauts as Char

2011-02-07 Thread Prescott Nasser
Hey all, So, while digging into the code a bit (and pushed by digy's Arabic conversion yesterday), I started looking at the various other languages we were missing from Java. I started porting the GermanAnalyzer, but ran into an issue with the umlauts... http://svn.apache.org/viewvc/lucene/

Re: Keyword - search statistics

2011-02-07 Thread Erick Erickson
You have to explain your problem in *much* more detail for anyone to make a really relevant comment, all we can do so far is guess what you're *really* after Best Erick On Mon, Feb 7, 2011 at 8:25 PM, Selvaraj Varadharajan wrote: > Thanks Eric. > What about having another core and interpret

Re: Keyword - search statistics

2011-02-07 Thread Selvaraj Varadharajan
Thanks Eric. What about having another core, intercepting the request calls, and pooling them in that core? Do you see any performance hit from your point of view? -Selvaraj On Mon, Feb 7, 2011 at 3:31 PM, Erick Erickson wrote: > Solr doesn't keep "meta data", so if you're asking for some kind of s

Re: Keyword - search statistics

2011-02-07 Thread Bill Bell
You can also use Google Analytics or something like that to get stats. Bill Bell Sent from mobile On Feb 7, 2011, at 4:31 PM, Erick Erickson wrote: > Solr doesn't keep "meta data", so if you're asking for some kind of search > logging your app has to provide that... > > Best > Erick > >

Re: Keyword - search statistics

2011-02-07 Thread Erick Erickson
Solr doesn't keep "meta data", so if you're asking for some kind of search logging your app has to provide that... Best Erick On Sun, Feb 6, 2011 at 10:46 PM, Selvaraj Varadharajan wrote: > > Hi > >Is there any way i can get 'no of times' a key word searched in SOLR ? > > > *Here is my sol

[jira] Resolved: (SOLR-2350) improve post.jar to handle non UTF-8 files

2011-02-07 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2350. Resolution: Fixed Committed revision 1068149. - trunk Committed revision 1068152. - 3x > improve post.jar

[jira] Commented: (LUCENE-2908) clean up serialization in the codebase

2011-02-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991559#comment-12991559 ] Michael McCandless commented on LUCENE-2908: +1 > clean up serialization in

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-02-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991545#comment-12991545 ] Michael McCandless commented on LUCENE-2666: Ahh, thanks for bringing closure

RE: Tokenization and Fuzziness: How to Allow Multiple Strategies?

2011-02-07 Thread Steven A Rowe
Hi Tavi, solr-...@lucene.apache.org has been deprecated since the Lucene and Solr source trees merged last year. Please use dev@lucene.apache.org instead. However, your question is about *usage* of Lucene/Solr, rather than *development*, so you should be using solr-u...@lucene.apache.org or l

Re: [REINDEX] Note: re-indexing required !

2011-02-07 Thread Earwin Burrfoot
Lucene maintains compatibility with earlier stable release index versions, and to some extent transparently upgrades them. But there is no guaranteed compatibility between different in-development indexes. E.g. 3.2 reads 3.1 indexes and upgrades them, but 3.2-dev-snapshot-10 (while happily handlin

Scoring: Precedent for a Better, Less Fragile Approach?

2011-02-07 Thread Tavi Nathanson
Hey everyone, I have a question about Lucene/Solr scoring in general. It really feels like a wobbly house of cards that falls down whenever I make the slightest tweak. There are many factors at play in Lucene scoring: they're all fighting with each other, and very often one will completely domina

Tokenization and Fuzziness: How to Allow Multiple Strategies?

2011-02-07 Thread Tavi Nathanson
Hey everyone, Tokenization seems inherently fuzzy and imprecise, yet Lucene does not appear to provide an easy mechanism to account for this fuzziness. Let's take an example, where the document I'm indexing is "v1.1.0 mr. jones da...@gmail.com" I may want to tokenize this as follows: ["v1.1.0",

Maintain stopwords.txt and other files

2011-02-07 Thread Timo Schmidt
Hello all, I am currently developing a search solution based on Apache Solr. My problem is that I want to offer the user the possibility to maintain synonyms and stopwords in a user-friendly tool. But so far I could not find any possibility to write the stopwords.txt or syn

Re: Distributed Indexing

2011-02-07 Thread Upayavira
Surely you want to be implementing an UpdateRequestProcessor, rather than a RequestHandler. The ContentStreamHandlerBase, in the handleRequestBody method gets an UpdateRequestProcessor and uses it to process the request. What we need is that handleRequestBody method to, as you have suggested, chec

Re: Distributed Indexing

2011-02-07 Thread Upayavira
I'm saying that deterministic policies are a requirement that *some* people will want. Others might want a random spread. Thus, I'd have deterministic based on ID and random as the two initial implementations. Upayavira NB. In case folks haven't worked it out already, I have been tasked to mentor

Re: Threading of JIRA e-mails in gmail?

2011-02-07 Thread Dawid Weiss
Looks like my action prompted a response from infra and it's not encouraging -- they're supposedly switching off procmail support on that server soon. Track INFRA-3403 to see what will come out of this, I don't want to spam this list. Eh. Dawid On Mon, Feb 7, 2011 at 1:14 PM, Doron Cohen wrote:

Re: Threading of JIRA e-mails in gmail?

2011-02-07 Thread Doron Cohen
Thanks Dawid. It is not working for me yet; I'm looking for the reason for that... Doron On Mon, Feb 7, 2011 at 12:48 PM, Dawid Weiss wrote: > Just a follow-up to this one: no reply from infra yet, but I simply > tried my config. on people.apache.org and it works like a charm, so > for Apache committe

Re: Threading of JIRA e-mails in gmail?

2011-02-07 Thread Dawid Weiss
Just a follow-up to this one: no reply from infra yet, but I simply tried my config. on people.apache.org and it works like a charm, so for Apache committers and gmail users this is probably a life-saver. My config is described in a comment here: https://issues.apache.org/jira/browse/INFRA-3403 D

[jira] Updated: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-07 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2909: Attachment: LUCENE-2909_assert.patch here's a check we can add to BaseTokenStreamTestCase for this

[jira] Commented: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-07 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991320#comment-12991320 ] Robert Muir commented on LUCENE-2909: You are right, some stemmers increase the size

[jira] Commented: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991319#comment-12991319 ] Uwe Schindler commented on LUCENE-2909: The problem has nothing to do with CharFil

[jira] Commented: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-07 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991316#comment-12991316 ] Robert Muir commented on LUCENE-2909: Is the bug really in NGramTokenFilter? This

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-02-07 Thread Nick Pellow (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991315#comment-12991315 ] Nick Pellow commented on LUCENE-2666: Hi Michael, This issue was entirely a proble

[jira] Updated: (LUCENE-2910) Highlighter does not correctly highlight the phrase around 50th term

2011-02-07 Thread Shinya Kasatani (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinya Kasatani updated LUCENE-2910: Description: When you use the Highlighter combined with N-Gram tokenizers such as CJKToke

[jira] Created: (LUCENE-2910) Highlighter does not correctly highlight the phrase around 50th term

2011-02-07 Thread Shinya Kasatani (JIRA)
Highlighter does not correctly highlight the phrase around 50th term Key: LUCENE-2910 URL: https://issues.apache.org/jira/browse/LUCENE-2910 Project: Lucene - Java Issue Ty

[jira] Updated: (LUCENE-2910) Highlighter does not correctly highlight the phrase around 50th term

2011-02-07 Thread Shinya Kasatani (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinya Kasatani updated LUCENE-2910: Attachment: HighlighterFix.patch A test case that describes the problem, along with a fix.