Thanks Robert,
Another idea apart from your solution would be to add a tailoring for
Tibetan that sets some special attribute indicating 'word-final
syllable'. Then this information is not 'lost' and downstream can do
the right thing.
...So essentially before doing anything like that, it would
The ICUTokenizer now adds a script attribute for tokens (as do Standard
Tokenizer and a couple of others; see LUCENE-2911), for example Tibetan or Han.
If the Shingle filter had some provision to only make token n-grams when the
script attribute matched some specified script, it would solve both
that reads the script attribute from ICUTokenizer
and LUCENE-2906
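The idea of restricting shingles to runs of same-script tokens could be sketched as a post-tokenization pass. This is a toy Python illustration of the concept only, not a real Lucene TokenFilter, and the token/script pairs are hypothetical:

```python
def shingles_by_script(tokens, scripts, n=2, sep=" "):
    """Produce n-grams only over maximal runs of tokens sharing one script.

    tokens  -- list of token strings
    scripts -- parallel list of script labels (e.g. "Tibetan", "Latin")
    """
    out = []
    run = []  # current same-script run of tokens

    def flush():
        # Emit n-grams for the finished run, then reset it.
        for i in range(len(run) - n + 1):
            out.append(sep.join(run[i:i + n]))
        run.clear()

    for tok, sc in zip(tokens, scripts):
        if run and sc != prev:
            flush()  # script changed: no shingle spans the boundary
        run.append(tok)
        prev = sc
    flush()
    return out
```

A run shorter than n simply produces no shingles, so no n-gram ever crosses a script boundary.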
On Fri, Dec 16, 2011 at 5:44 PM, Burton-West, Tom tburt...@umich.edu wrote:
The ICUTokenizer now adds a script attribute for tokens (as do Standard
Tokenizer and a couple of others; see LUCENE-2911), for example “Tibetan” or
“Han”.
The default query parser in Solr does not handle precedence of Boolean
operators in the way most people expect.
A AND B OR C gets interpreted as A AND (B OR C). There are numerous other
examples in the JIRA ticket LUCENE-167, this article on the wiki
or so (if no one
beats me to it). Since it's a major release, we may be able to just
fix it in trunk w/o having to keep the old behavior.
-Yonik
http://www.lucidimagination.com
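The surprising grouping can be reproduced with a toy parser: grouping the remainder right-associatively with no operator precedence yields A AND (B OR C), whereas with the conventional precedence (AND binds tighter than OR) the same input parses as (A AND B) OR C. A minimal Python sketch of the two readings, not Solr's actual parser:

```python
def parse_naive(tokens):
    """No precedence, right-associative grouping of the remainder:
    A AND B OR C -> ("AND", "A", ("OR", "B", "C")) -- the surprising reading."""
    if len(tokens) == 1:
        return tokens[0]
    return (tokens[1], tokens[0], parse_naive(tokens[2:]))

def parse_precedence(tokens):
    """AND binds tighter than OR (split on the lowest-precedence op first):
    A AND B OR C -> ("OR", ("AND", "A", "B"), "C") -- what most users expect."""
    if "OR" in tokens:
        i = tokens.index("OR")
        return ("OR", parse_precedence(tokens[:i]), parse_precedence(tokens[i + 1:]))
    if "AND" in tokens:
        i = tokens.index("AND")
        return ("AND", parse_precedence(tokens[:i]), parse_precedence(tokens[i + 1:]))
    return tokens[0]
```

Running both on `["A", "AND", "B", "OR", "C"]` makes the difference in grouping immediately visible.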
On Thu, Dec 1, 2011 at 12:51 PM, Burton-West, Tom tburt...@umich.edu wrote:
The default query parser in Solr does
Hello all,
This post is getting no replies after several days on the Solr user list, so I
thought I would rewrite it as a question about a possible feature for Solr.
In our use case we have a large number of documents and several facets, such as
Author and Subject, that have a very large number
Hi David,
Just curious about your use of the HathiTrust list. I usually explain to
people that it's customized to our index and they are probably better off
making their own list based on the lists of stop words appropriate for the
languages in their index (sources listed in the blog post
scale search indexes shares storage with the
repository that holds the 480+ terabytes of page images and metadata for the 8
million+ books). Hopefully I will be able to run the tests when I get back.
Tom
From: Burton-West, Tom [mailto:tburt...@umich.edu]
Sent: Monday, May 09, 2011 4:10 PM
To: dev
Thanks Shai and Mike!
I'll keep an eye on LUCENE-1076.
Tom
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Tuesday, May 03, 2011 11:15 AM
To: dev@lucene.apache.org
Subject: Re: MergePolicy Thresholds
Thanks Shai!
I'm way behind on my 3.x backports
Hi Shai and Mike,
Testing the TieredMP on our large indexes has been on my todo list since I read
Mike's blog post
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html.
If you port it to the 3.x branch Shai, I'll be more than happy to test it with
our very large
Thanks for fixing++
Tom
-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Sunday, May 01, 2011 6:05 AM
To: dev@lucene.apache.org; simon.willna...@gmail.com;
java-u...@lucene.apache.org
Subject: RE: Link to nightly build test reports on main Lucene site needs
...@gmail.com]
Sent: Wednesday, March 30, 2011 7:56 PM
To: dev@lucene.apache.org
Subject: Re: Using contrib Lucene Benchmark with Solr
On Wed, Mar 30, 2011 at 4:49 PM, Burton-West, Tom tburt...@umich.edu wrote:
I would like to be able to use the Lucene Benchmark code with Solr to run
some indexing tests
I would like to be able to use the Lucene Benchmark code with Solr to run some
indexing tests. It would be nice if Lucene Benchmark could read Solr
configuration rather than having to translate my filter chain and other
parameters into Lucene. Would it be appropriate to open a JIRA issue for
this functionality?
On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote:
Lucene has this method to set the maximum size of a segment when merging:
LogByteSizeMergePolicy.setMaxMergeMB
(http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html)
Lucene has this method to set the maximum size of a segment when merging:
LogByteSizeMergePolicy.setMaxMergeMB
(http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29
)
I would like to be able to set this in my
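One way exposing this might look in solrconfig.xml is sketched below. This is a sketch only: whether the mergePolicy element accepts a per-parameter setting like this depends on the Solr version, and the value shown is arbitrary:

```xml
<indexDefaults>
  <!-- Sketch: pass maxMergeMB through to LogByteSizeMergePolicy.
       Element names and support vary by Solr version; 5000 is illustrative. -->
  <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy">
    <double name="maxMergeMB">5000</double>
  </mergePolicy>
</indexDefaults>
```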
Hello all,
I am using Solr 1.4.1 and a custom filter that worked with a previous version
of Solr that used Lucene 2.9. When I try to use the analysis console I get
this error message:
java.lang.IllegalArgumentException: This AttributeSource contains
AttributeImpl of type
Something here is using Lucene 3.x or trunk code, since
CharTermAttribute[Impl] only exists in unreleased versions!
Doh! I forgot to switch my binaries back to Solr 1.4.1 from 3.x. Thanks for
the catch Robert. The subject line should read: Solr/Lucene 3.x Analysis
console gives error
Tom
-Original Message-
From: Burton-West, Tom [mailto:tburt...@umich.edu]
Sent: Thursday, November 11, 2010 1:26 PM
To: dev@lucene.apache.org
Subject: RE: Solr 1.4.1 Analysis console gives error regarding
CharTermAttributeImpl that is not in the target
Something here is using lucene
Schindler [mailto:u...@thetaphi.de]
Sent: Thursday, November 11, 2010 1:49 PM
To: Burton-West, Tom; dev@lucene.apache.org
Subject: Antw.: Solr 1.4.1 Analysis console gives error regarding
CharTermAttributeImpl that is not in the target
I still think this is a bug in analysis.jsp. copyTo does not work
Sorry about the confusion (my confusion mostly:). I was actually using
revision 1030032 of Lucene/Solr (see below)
with a custom token filter that does not use CharTermAttribute. I'll recompile
the custom filter against this revision and verify that the analysis.jsp
produces the same results
regarding
CharTermAttributeImpl that is not in the target
On Thu, Nov 11, 2010 at 2:05 PM, Burton-West, Tom tburt...@umich.edu
wrote:
Sorry about the confusion (my confusion mostly:). I was actually
using revision 1030032 of Lucene/Solr (see below) with a custom
token filter
that does
-West, Tom tburt...@umich.edu wrote:
Hi all,
Would it be possible to implement something like this in Flex?
Büttcher, S., Clarke, C. L. A. (2008). Hybrid index maintenance for
contiguous inverted lists. Information Retrieval, 11(3), 175-207.
doi:10.1007/s10791-007-9042-8
The approach
Hi all,
Would it be possible to implement something like this in Flex?
Büttcher, S., Clarke, C. L. A. (2008). Hybrid index maintenance for
contiguous inverted lists. Information Retrieval, 11(3), 175-207.
doi:10.1007/s10791-007-9042-8
The approach takes advantage of having a different
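As a rough sketch of the hybrid idea (my reading of the paper's scheme, simplified heavily): posting lists below a size threshold are rebuilt by remerging on each flush, while long lists are extended by contiguous in-place append, avoiding a full rewrite. A toy Python illustration; the threshold and dict-based structures are hypothetical stand-ins for on-disk postings:

```python
def flush(buffer, short_index, long_index, threshold=4):
    """Move in-memory postings (term -> sorted doc ids) to 'disk' structures.

    short_index: lists rebuilt by remerge on every flush (cheap while small)
    long_index:  lists extended by contiguous append (no full rewrite)
    """
    for term, new_docs in buffer.items():
        if term in long_index:
            long_index[term].extend(new_docs)          # in-place append
        else:
            merged = short_index.get(term, []) + new_docs  # remerge
            if len(merged) >= threshold:
                short_index.pop(term, None)
                long_index[term] = merged              # promote to long index
            else:
                short_index[term] = merged
    buffer.clear()
```

The point of the split is that remerging is proportional to total list size (fine for rare terms) while appending is proportional only to the new postings (essential for frequent terms).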
Hello all,
Lucene in Action 2nd Edition mentions a time-dependent merge policy that defers
large merges until off-peak hours (Section 2.13.6, p. 71).
Has anyone implemented such a policy? Is it worth opening a JIRA issue for
this?
Tom Burton-West
www.hathitrust.org/blogs
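The kind of policy described above could be sketched as a gate that vetoes large merges outside an off-peak window. This is a toy Python illustration of the scheduling logic only, not the Lucene MergePolicy API; the size threshold and window times are hypothetical:

```python
from datetime import time

def allow_merge(merge_size_mb, now,
                large_mb=1000,
                off_peak_start=time(22, 0), off_peak_end=time(6, 0)):
    """Permit small merges any time; defer large ones to the off-peak window.

    now -- a datetime.time; the window here wraps past midnight,
    so "off-peak" means after 22:00 OR before 06:00.
    """
    if merge_size_mb < large_mb:
        return True
    return now >= off_peak_start or now <= off_peak_end
```

A real implementation would presumably wrap an existing merge policy and re-offer deferred merges once the window opens, rather than dropping them.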
in that the Lucene benchmark config would not be
usable, or rather, it would need to simply point to a Solr
solrconfig.xml file. Other than that, the resulting statistical
reporting should be useful.
Jason
On Mon, Jun 14, 2010 at 8:57 AM, Burton-West, Tom tburt...@umich.edu wrote:
Hi all
Hi all,
Posted this to the Solr users list and after a week with no responses, thought
I would try the dev list.
We are about to test out various factors to try to speed up our indexing
process. One set of experiments will try various maxRamBufferSizeMB settings.
Since the factors we will
I'm a bit confused about the DocsEnum.read() in the flex API. I have three
questions:
1) DocsEnum.read() currently delegates to nextDoc() in the base class and
there is a note that subclasses may do this more efficiently. Is there
currently a more efficient implementation in a
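The default-delegation pattern in question can be sketched abstractly: a base class implements bulk read() by looping over nextDoc(), and a subclass can override read() to copy a whole block at once. Toy Python, not the flex API itself; class and method names are illustrative:

```python
class DocsEnum:
    NO_MORE_DOCS = -1

    def next_doc(self):
        raise NotImplementedError

    def read(self, docs, max_count):
        """Default bulk read: just call next_doc() repeatedly.
        Appends up to max_count doc ids to `docs` and returns the count."""
        count = 0
        while count < max_count:
            doc = self.next_doc()
            if doc == self.NO_MORE_DOCS:
                break
            docs.append(doc)
            count += 1
        return count


class ListDocsEnum(DocsEnum):
    """Toy subclass over an in-memory postings list; a smarter codec
    could override read() to slice a whole block instead of looping."""

    def __init__(self, postings):
        self.postings = postings
        self.pos = -1

    def next_doc(self):
        self.pos += 1
        if self.pos >= len(self.postings):
            return self.NO_MORE_DOCS
        return self.postings[self.pos]
```

The win from overriding read() comes from amortizing per-document call overhead across a block, which is exactly why a codec that stores postings in blocks can do better than the nextDoc() loop.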
Thanks Mike!
A follow-up question:
DocsEnum.read() currently delegates to nextDoc() in the base class and there
is a note that subclasses may do this more efficiently. Is there currently
a more efficient implementation in a subclass?
Yes, the standard codec does so, I think.
Thanks for raising this Tom,
Mike
On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom tburt...@umich.edu wrote:
When I try to run HighFreqTerms.java in Lucene Revision: 933722 I get the
exception appended below. I believe the line of code involved is a
result of the flex
When I try to run HighFreqTerms.java in Lucene Revision: 933722 I get the
exception appended below. I believe the line of code involved is a result of
the flex indexing merge. Should I post this as a comment to LUCENE-2370
(Reintegrate flex branch into trunk)?
Or is there simply
Hello all,
Would it be appropriate to open a JIRA issue to put converting the Solr
BufferedTokenStream class to the new Lucene 2.9 token API on the todo list?
Alternatively, is there a more general issue already open regarding Solr
filters and the new API? (I couldn't find one.) Or is
Hello,
I read the How to Contribute page on the wiki and want to make a patch. Do I
make the patch against the latest Solr trunk or against the last release?
Tom
Hello all,
About every other time I check-out a current version of trunk and run the
tests, the tests for solrj.embedded.* fail. I'm running under windows XP with
java version 1.6.0_13
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
With the latest revision 786676, I get these two
Hello,
I read the How to Contribute document on the wiki.
(http://wiki.apache.org/solr/HowToContribute#head-385f123f540367646df16825ca043d0098b31365)
I have written a custom analyzer https://issues.apache.org/jira/browse/SOLR-908
and would like to create a patch as documented in the wiki.
My