Hi Jack,
I do not get the exception before changing the data files, and I do not get the
exception after changing the data files and building the lucene-icu...jar with ant.
But changing the data files and running ant does not change the output.
So I decided to manually create the .nrm file by using the steps outlined in the
Hi,
Here are two more relevant links:
https://github.com/flaxsearch/luwak
http://www.lucenerevolution.org/2013/Turning-Search-Upside-Down-Using-Lucene-for-Very-Fast-Stored-Queries
Ahmet
On Saturday, February 15, 2014 3:01 AM, Ahmet Arslan wrote:
Hi Siraj,
MemoryIndex is used for such use case.
Do you get the exception if you run ant before changing the data files?
"Header authentication failed, please check if you have a valid ICU data
file"
Check with the ICU project as to the proper format for THEIR files. I mean,
this doesn't sound like a Lucene issue.
Maybe it could be as sim
Hi Siraj,
MemoryIndex is used for such use cases. Here are a couple of pointers (and a
small sketch after this message):
http://www.slideshare.net/jdhok/diy-percolator
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html
On Friday, February 14, 2014 8:21 PM, Siraj Haider wrote:
Hi There,
Is there a way to do reverse matching by indexing the queries in an index and
passing a document to see how many queries matched that?
Hello,
I am trying to use the lucene-icu library in solr-4.6.1. I need to change a char
mapping in lucene-icu. I have made changes to
lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt
and built the jar file using ant, but it did not help.
I took a look at lucene/analysis/icu/build.xml and see these l
Hello,
I have recently been given a requirement to improve document highlights within
our system. Unfortunately, the current functionality gives more of a best guess
at which terms to highlight, rather than the actual terms that performed the
match. A couple of examples of issues that
As docIDs are ints too, it's most likely he'll hit the limit of 2B documents per
index with that approach though :) I do agree that indexing huge documents
doesn't seem to have a lot of value; even when you know a doc is a hit for a
certain query, how are you going to display the results to users?
You should consider making each _line_ of the log file a (Lucene)
document (assuming it is a log-per-line log file)
-Glen
On Fri, Feb 14, 2014 at 4:12 PM, John Cecere wrote:
> I'm not sure in today's world I would call 2GB 'immense' or 'enormous'. At
> any rate, I don't have control over the size of the documents that go into
> my database.
I'm not sure in today's world I would call 2GB 'immense' or 'enormous'. At any rate, I don't have control over the size of the
documents that go into my database. Sometimes my customer's log files end up really big. I'm willing to have huge indexes for these
things.
Wouldn't just changing from
Welcome Diego,
I think you’re right about MidLetter - adding a char to it should disable
splitting on that char, as long as there is a letter on one side or the other.
(If you’d like that behavior to be extended to numeric digits, you should use
MidNumLet instead.)
I tested this by adding “/”
On Fri, Feb 14, 2014 at 12:14 AM, Ravikumar Govindarajan
wrote:
> Early query termination quits by throwing an Exception, right? Is it ok to
> individually search using SegmentReader and then break-off, instead of
> using a MultiReader, especially when the order is known before search
> begins?
Hmm, why are you indexing such immense documents?
In 3.x Lucene never sanity checked the offsets, so we would silently
index negative (int overflow'd) offsets into e.g. term vectors.
But in 4.x, we now detect this and throw the exception you're seeing,
because it can lead to index corruption when
Hi Lukai,
That was a great help. Thank you.
I’m continuing reading about payloads:
http://searchhub.org/2009/08/05/getting-started-with-payloads/
Didn’t know that concept at all.
Regards,
Rune
On 13/02/2014 at 23:12, lukai wrote:
> Hi, Rune:
> Per your requirement, you can generate a separ
If I understand correctly, you'd like to shortcut the execution when you reach
the desired number of hits. Unfortunately, I don't think there's a graceful way
to do that right now in Collector. To stop further collecting, you need to
throw an IOException (or a subtype of it) and catch the exception later
Hi guys, this is my first time posting on the Lucene list, so hello everyone.
I really like the way that the StandardTokenizer works, however I'd like for it
to not split tokens on / (forward slash). I've been looking at
http://unicode.org/reports/tr29/#Default_Word_Boundaries to try to underst
I'm having a problem with Lucene 4.5.1. Whenever I attempt to index a file >
2GB in size, it dies with the following exception:
java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset,
startOffset=-2147483648,endOffset=-2147483647
Essentially
Hi There,
Is there a way to do reverse matching by indexing the queries in an index and
passing a document to see how many queries matched that? I know that I can have
the queries in memory and have the document parsed in a memory index and then
loop through trying to match each query. The issue
I am not interested in the scores at all. My requirement is simple: I only
need the first 100 hits, or the numHits I specify (irrespective of their
scores). The collector should stop after collecting the numHits specified.
Is there a way to tell the collector to stop after collecting the
numHits?
On Fri, Feb 14, 2014 at 8:21 AM, Yann-Erwan Perio wrote:
> I have written a test which demonstrates that the mistake is indeed on
> my side. It's probably due to inconsistent rules for
> indexing/searching content having special characters (namely the
> "plus" sign).
OK, thanks for bringing closure.
On Fri, Feb 14, 2014 at 1:11 PM, Yann-Erwan Perio wrote:
> On Fri, Feb 14, 2014 at 12:33 PM, Michael McCandless
> wrote:
Hi again,
>> That should not be the case: it should match all terms with that
>> prefix regardless of the term's length. Try to boil it down to a
>> small test case?
>
> I g
On Fri, Feb 14, 2014 at 12:33 PM, Michael McCandless
wrote:
> This is similar to PathHierarchyTokenizer, I think.
Ah, yes, very much. I'll check it out and see if I can make something
of it. I am not sure to what extent it'll be reusable though, as my
tokenizer also sets payloads (the next comin
On Fri, Feb 14, 2014 at 6:17 AM, Yann-Erwan Perio wrote:
> Hello,
>
> I am designing a system with documents having one field containing
> values such as "Ae1 Br2 Cy8 ...", i.e. a sequence of items made of
> letters and numbers (max=7 per item), all separated by a space,
> possibly 200 items per field
This is how Collector works: it is called for every document matching
the query, and then its job is to choose which of those hits to keep.
This is because in general the hits to keep can come at any time, not
just the first N hits you see; e.g. the best scoring hit may be the
very last one.
But
Hello,
I am designing a system with documents having one field containing
values such as "Ae1 Br2 Cy8 ...", i.e. a sequence of items made of
letters and numbers (max=7 per item), all separated by a space,
possibly 200 items per field, with no limit upon the number of
documents (although I would no
This means Lucene was attempting to open _0.fnm but somehow got the
contents of _0.cfs instead; it seems likely that it's a bug in the
Cassandra Directory implementation? Somehow it's opening the wrong
file name?
Mike McCandless
http://blog.mikemccandless.com
On Fri, Feb 14, 2014 at 3:13 AM, Jason
Hello,
This is my first question to lucene mailing list, sorry if the question
sounds funny.
I have been experimenting with storing Lucene index files on Cassandra;
unfortunately, the exceptions got overwhelming. Below is the stacktrace.
org.apache.lucene.index.CorruptIndexException: codec mismatch: a