RE: Shingle filter that reads the script attribute from ICUTokenizer and LUCENE-2906

2011-12-17 Thread Burton-West, Tom
Thanks Robert, Another idea apart from your solution would be to add a tailoring for tibetan that sets some special attribute indicating 'word-final syllable'. Then this information is not 'lost' and downstream can do the right thing. ...So essentially before doing anything like that, it would

Shingle filter that reads the script attribute from ICUTokenizer and LUCENE-2906

2011-12-16 Thread Burton-West, Tom
The ICUTokenizer now adds a script attribute for tokens (as do StandardTokenizer and a couple of others; see LUCENE-2911), for example Tibetan or Han. If the Shingle filter had some provision to only make token n-grams when the script attribute matched some specified script, it would solve both
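A rough, self-contained sketch of the idea, using plain Java's `Character.UnicodeScript` rather than the real Lucene `ScriptAttribute`/`ShingleFilter` APIs (all names below are illustrative, not from Lucene): form two-token shingles only when adjacent tokens share a script, so cross-script n-grams are never produced.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not the Lucene API: shingle adjacent tokens only
// when they belong to the same Unicode script, approximating what a
// script-aware shingle filter could do with ICUTokenizer's attribute.
public class ScriptShingles {
    static Character.UnicodeScript scriptOf(String token) {
        // Use the script of the first code point as the token's script.
        return Character.UnicodeScript.of(token.codePointAt(0));
    }

    public static List<String> shingles(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String a = tokens.get(i), b = tokens.get(i + 1);
            if (scriptOf(a) == scriptOf(b)) {  // only shingle within one script run
                out.add(a + " " + b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The Han pair shingles; the Han/Latin boundary does not.
        System.out.println(shingles(List.of("\u4e2d", "\u6587", "hello", "world")));
    }
}
```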

RE: Shingle filter that reads the script attribute from ICUTokenizer and LUCENE-2906

2011-12-16 Thread Burton-West, Tom
that reads the script attribute from ICUTokenizer and LUCENE-2906 On Fri, Dec 16, 2011 at 5:44 PM, Burton-West, Tom tburt...@umich.edu wrote: The ICUTokenizer now adds a script attribute for tokens (as do StandardTokenizer and a couple of others; see LUCENE-2911). For example “Tibetan” or “Han

re: LUCENE-167 and Solr default handling of Boolean operators is broken

2011-12-01 Thread Burton-West, Tom
The default query parser in Solr does not handle precedence of Boolean operators in the way most people expect. A AND B OR C gets interpreted as A AND (B OR C). There are numerous other examples in the JIRA ticket for LUCENE-167, this article on the wiki
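The precedence difference is easy to demonstrate with plain boolean grouping, no Solr involved. For A=false, B=true, C=true the two readings disagree:

```java
// Plain-Java illustration of the two groupings of "A AND B OR C".
public class PrecedenceDemo {
    // What most users expect: AND binds tighter than OR.
    static boolean andBindsTighter(boolean a, boolean b, boolean c) {
        return (a && b) || c;
    }

    // What the default Solr/Lucene query parser effectively produces here.
    static boolean orBindsTighter(boolean a, boolean b, boolean c) {
        return a && (b || c);
    }

    public static void main(String[] args) {
        // A=false, B=true, C=true: (A AND B) OR C is true,
        // but A AND (B OR C) is false.
        System.out.println(andBindsTighter(false, true, true));  // true
        System.out.println(orBindsTighter(false, true, true));   // false
    }
}
```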

RE: LUCENE-167 and Solr default handling of Boolean operators is broken

2011-12-01 Thread Burton-West, Tom
or so (if no one beats me to it). Since it's a major release, we may be able to just fix it in trunk w/o having to keep the old behavior. -Yonik http://www.lucidimagination.com On Thu, Dec 1, 2011 at 12:51 PM, Burton-West, Tom tburt...@umich.edu wrote: The default query parser in Solr does

Solr should provide an option to show only most relevant facet values

2011-09-27 Thread Burton-West, Tom
Hello all, This post is getting no replies after several days on the Solr user list, so I thought I would rewrite it as a question about a possible feature for Solr. In our use case we have a large number of documents and several facets, such as Author and Subject, that have a very large number

RE: [jira] [Resolved] (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format

2011-06-06 Thread Burton-West, Tom
Hi David, Just curious about your use of the HathiTrust list. I usually explain to people that it's customized to our index and they are probably better off making their own list based on the lists of stop words appropriate for the languages in their index (sources listed in the blog post

RE: MergePolicy Thresholds

2011-05-20 Thread Burton-West, Tom
scale search indexes shares storage with the repository that holds the 480+ terabytes of page images and metadata for the 8 million+ books). Hopefully I will be able to run the tests when I get back. Tom From: Burton-West, Tom [mailto:tburt...@umich.edu] Sent: Monday, May 09, 2011 4:10 PM To: dev

RE: MergePolicy Thresholds

2011-05-03 Thread Burton-West, Tom
Thanks Shai and Mike! I'll keep an eye on LUCENE-1076. Tom -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, May 03, 2011 11:15 AM To: dev@lucene.apache.org Subject: Re: MergePolicy Thresholds Thanks Shai! I'm way behind on my 3.x backports

RE: MergePolicy Thresholds

2011-05-02 Thread Burton-West, Tom
Hi Shai and Mike, Testing the TieredMP on our large indexes has been on my todo list since I read Mikes blog post http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html. If you port it to the 3.x branch Shai, I'll be more than happy to test it with our very large

RE: Link to nightly build test reports on main Lucene site needs updating

2011-05-02 Thread Burton-West, Tom
Thanks for fixing++ Tom -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Sunday, May 01, 2011 6:05 AM To: dev@lucene.apache.org; simon.willna...@gmail.com; java-u...@lucene.apache.org Subject: RE: Link to nightly build test reports on main Lucene site needs

RE: Using contrib Lucene Benchmark with Solr

2011-03-31 Thread Burton-West, Tom
...@gmail.com] Sent: Wednesday, March 30, 2011 7:56 PM To: dev@lucene.apache.org Subject: Re: Using contrib Lucene Benchmark with Solr On Wed, Mar 30, 2011 at 4:49 PM, Burton-West, Tom tburt...@umich.edu wrote: I would like to be able to use the Lucene Benchmark code with Solr to run some indexing tests

Using contrib Lucene Benchmark with Solr

2011-03-30 Thread Burton-West, Tom
I would like to be able to use the Lucene Benchmark code with Solr to run some indexing tests. It would be nice if Lucene Benchmark could read the Solr configuration rather than having to translate my filter chain and other parameters into Lucene. Would it be appropriate to open a JIRA issue

RE: Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-17 Thread Burton-West, Tom
this functionality. On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote: Lucene has this method to set the maximum size of a segment when merging: LogByteSizeMergePolicy.setMaxMergeMB (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html

Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-06 Thread Burton-West, Tom
Lucene has this method to set the maximum size of a segment when merging: LogByteSizeMergePolicy.setMaxMergeMB (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29 ) I would like to be able to set this in my
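For context, the kind of solrconfig.xml fragment being asked about would look roughly like this. This is a sketch: whether the Solr version in question actually passes an inner `maxMergeMB` argument through to the policy's setter is exactly what the thread is asking, and the element names assume a Solr release that supports merge-policy arguments.

```xml
<!-- Sketch for the indexConfig section of solrconfig.xml; assumes a Solr
     version that forwards inner arguments to the merge policy's setters. -->
<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy">
  <double name="maxMergeMB">2048.0</double>
</mergePolicy>
```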

Solr 1.4.1 Analysis console gives error regarding CharTermAttributeImpl that is not in the target

2010-11-11 Thread Burton-West, Tom
Hello all, I am using Solr 1.4.1 and a custom filter that worked with a previous version of Solr that used Lucene 2.9. When I try to use the analysis console I get this error message: java.lang.IllegalArgumentException: This AttributeSource contains AttributeImpl of type

RE: Solr 1.4.1 Analysis console gives error regarding CharTermAttributeImpl that is not in the target

2010-11-11 Thread Burton-West, Tom
Something here is using lucene 3.x or trunk code, since CharTermAttribute[Impl] only exists in unreleased versions! Doh! I forgot to switch my binaries back to Solr 1.4.1 from 3.x. Thanks for the catch Robert. The subject line should read: Solr/Lucene 3.x Analysis console gives error

RE: Solr 1.4.1 Analysis console gives error regarding CharTermAttributeImpl that is not in the target

2010-11-11 Thread Burton-West, Tom
Tom -Original Message- From: Burton-West, Tom [mailto:tburt...@umich.edu] Sent: Thursday, November 11, 2010 1:26 PM To: dev@lucene.apache.org Subject: RE: Solr 1.4.1 Analysis console gives error regarding CharTermAttributeImpl that is not in the target Something here is using lucene

RE: Antw.: Solr 1.4.1 Analysis console gives error regarding CharTermAttributeImpl that is not in the target

2010-11-11 Thread Burton-West, Tom
Schindler [mailto:u...@thetaphi.de] Sent: Thursday, November 11, 2010 1:49 PM To: Burton-West, Tom; dev@lucene.apache.org Subject: Antw.: Solr 1.4.1 Analysis console gives error regarding CharTermAttributeImpl that is not in the target I still think this is a bug in analysis.jsp. Copyto does not work

RE: Antw.: Solr 1.4.1 Analysis console gives error regarding CharTermAttributeImpl that is not in the target

2010-11-11 Thread Burton-West, Tom
Sorry about the confusion (my confusion mostly:). I was actually using revision 1030032 of Lucene/Solr (see below) with a custom token filter that does not use CharTermAttribute. I'll recompile the custom filter against this revision and verify that the analysis.jsp produces the same results

RE: Antw.: Solr 1.4.1 Analysis console gives error regarding CharTermAttributeImpl that is not in the target

2010-11-11 Thread Burton-West, Tom
regarding CharTermAttributeImpl that is not in the target On Thu, Nov 11, 2010 at 2:05 PM, Burton-West, Tom tburt...@umich.edu wrote: Sorry about the confusion (my confusion mostly:). I was actually using revision 1030032 of Lucene/Solr (see below) with a custom token filter that does

RE: Flex indexing : Hybrid index maintenance for faster indexing

2010-10-05 Thread Burton-West, Tom
-West, Tom tburt...@umich.edu wrote: Hi all, Would it be possible to implement something like this in Flex? Büttcher, S., Clarke, C. L. A. (2008). Hybrid index maintenance for contiguous inverted lists. Information Retrieval, 11(3), 175-207. doi:10.1007/s10791-007-9042-8 The approach

Flex indexing : Hybrid index maintenance for faster indexing

2010-10-04 Thread Burton-West, Tom
Hi all, Would it be possible to implement something like this in Flex? Büttcher, S., Clarke, C. L. A. (2008). Hybrid index maintenance for contiguous inverted lists. Information Retrieval, 11(3), 175-207. doi:10.1007/s10791-007-9042-8 The approach takes advantage of having a different

Merge policy to merge during off-peak hours

2010-07-12 Thread Burton-West, Tom
Hello all, Lucene in Action 2nd Edition mentions a time-dependent merge policy that defers large merges until off-peak hours. (Section 2.13.6 p 71). Has anyone implemented such a policy? Is it worth opening a JIRA issue for this? Tom Burton-West www.hathitrust.org/blogs
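Whatever the merge-policy plumbing would look like, the core of such a policy is a clock test: defer large merges unless the current time falls inside an off-peak window. A self-contained sketch (class and method names are illustrative, not from Lucene), including the window-wraps-midnight case:

```java
import java.time.LocalTime;

// Illustrative time-window check a time-dependent merge policy would need.
public class OffPeakWindow {
    final LocalTime start, end;  // e.g. 22:00 to 06:00, crossing midnight

    OffPeakWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    boolean isOffPeak(LocalTime now) {
        if (start.isBefore(end)) {
            // Simple same-day window: [start, end)
            return !now.isBefore(start) && now.isBefore(end);
        }
        // Window wraps past midnight: after start OR before end.
        return !now.isBefore(start) || now.isBefore(end);
    }

    public static void main(String[] args) {
        OffPeakWindow w = new OffPeakWindow(LocalTime.of(22, 0), LocalTime.of(6, 0));
        System.out.println(w.isOffPeak(LocalTime.of(23, 30)));  // inside window
        System.out.println(w.isOffPeak(LocalTime.of(12, 0)));   // peak hours
    }
}
```

A real policy would consult this test only for merges above some size threshold, letting small merges proceed at any hour.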

RE: Benchmarking Solr indexing using Lucene Benchmark?

2010-06-15 Thread Burton-West, Tom
in that the Lucene benchmark config would not be usable, or rather, it would need to simply point to a Solr solrconfig.xml file. Other than that, the resulting statistical reporting should be useful. Jason On Mon, Jun 14, 2010 at 8:57 AM, Burton-West, Tom tburt...@umich.edu wrote: Hi all

Benchmarking Solr indexing using Lucene Benchmark?

2010-06-14 Thread Burton-West, Tom
Hi all, Posted this to the Solr users list and after a week with no responses, thought I would try the dev list. We are about to test out various factors to try to speed up our indexing process. One set of experiments will try various maxRamBufferSizeMB settings. Since the factors we will

questions about DocsEnum.read()in flex api

2010-04-30 Thread Burton-West, Tom
I'm a bit confused about the DocsEnum.read() in the flex API. I have three questions: 1) DocsEnum.read() currently delegates to nextDoc() in the base class and there is a note that subclasses may do this more efficiently. Is there currently a more efficient implementation in a
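The delegation pattern the question is about can be shown with a simplified, self-contained sketch (these are not the real Lucene flex classes): a base enum whose bulk `read()` just loops over `nextDoc()`, and a subclass that overrides `read()` to copy a whole run in one pass.

```java
// Simplified sketch, not the actual Lucene flex API: the base class's bulk
// read() delegates to nextDoc() one doc at a time; a subclass can override
// it with a more efficient bulk copy.
public class BulkReadSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static class DocsEnumBase {
        protected int[] docs = {};
        protected int pos = -1;

        int nextDoc() {
            return ++pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        // Default bulk read: fill the buffer by repeated nextDoc() calls.
        int read(int[] buffer) {
            int n = 0;
            while (n < buffer.length) {
                int d = nextDoc();
                if (d == NO_MORE_DOCS) break;
                buffer[n++] = d;
            }
            return n;
        }
    }

    static class ArrayDocsEnum extends DocsEnumBase {
        ArrayDocsEnum(int[] docs) { this.docs = docs; }

        @Override
        int read(int[] buffer) {  // "more efficient" single bulk copy
            int n = Math.min(buffer.length, docs.length - (pos + 1));
            System.arraycopy(docs, pos + 1, buffer, 0, n);
            pos += n;
            return n;
        }
    }

    public static void main(String[] args) {
        int[] buf = new int[4];
        int n = new ArrayDocsEnum(new int[]{2, 5, 9}).read(buf);
        System.out.println(n);  // number of docs filled into buf
    }
}
```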

RE: questions about DocsEnum.read()in flex api

2010-04-30 Thread Burton-West, Tom
Thanks Mike! A follow-up question: DocsEnum.read() currently delegates to nextDoc() in the base class and there is a note that subclasses may do this more efficiently.  Is there currently a more efficient implementation in a subclass?  Yes, the standard codec does so

RE: Fix to contrib/misc/HighFreqTerms.java

2010-04-16 Thread Burton-West, Tom
think. Thanks for raising this Tom, Mike On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom tburt...@umich.edu wrote: When I try to run HighFreqTerms.java in Lucene Revision: 933722 I get the exception appended below. I believe the line of code involved is a result of the flex

Bug in contrib/misc/HighFreqTerms.java?

2010-04-14 Thread Burton-West, Tom
When I try to run HighFreqTerms.java in Lucene Revision: 933722 I get the exception appended below. I believe the line of code involved is a result of the flex indexing merge. Should I post this as a comment to LUCENE-2370 (Reintegrate flex branch into trunk)? Or is there simply

Solr BufferedTokenStream and new Lucene 2.9 TokenStream API

2009-07-24 Thread Burton-West, Tom
Hello all, Would it be appropriate to open a JIRA issue to put converting the Solr BufferedTokenStream class to the new Lucene 2.9 token API on the todo list? Alternatively, is there a more general issue already open regarding Solr filters and the new API? (I couldn't find one.) Or is

How to contribute question (patch against release or latest trunk?)

2009-06-19 Thread Burton-West, Tom
Hello, I read the How to Contribute page on the wiki and want to make a patch. Do I make the patch against the latest Solr trunk or against the last release? Tom

Tests fail for solrj.embedded on windows (Release 78676 and 775664)

2009-06-19 Thread Burton-West, Tom
Hello all, About every other time I check out a current version of trunk and run the tests, the tests for solrj.embedded.* fail. I'm running under Windows XP with java version 1.6.0_13, Java(TM) SE Runtime Environment (build 1.6.0_13-b03). With the latest release 786676, I get these two

How to Contribute question

2009-04-21 Thread Burton-West, Tom
Hello, I read the How to Contribute document on the wiki. (http://wiki.apache.org/solr/HowToContribute#head-385f123f540367646df16825ca043d0098b31365) I have written a custom analyzer https://issues.apache.org/jira/browse/SOLR-908 and would like to create a patch as documented in the wiki. My