Don't even try.. just go with Elasticsearch or Solr. My $0.02
--
Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer Consultant
Author of RavenDB in Action http://manning.com/synhershko/
On Tue, Dec 16, 2014 at 3:36 PM, Shailesh Birari sbirar
Borek, feel free to merge this into master if all works as expected and
tests are all green
On Fri, Jul 12, 2013 at 4:21 PM, Kostka Bořivoj kos...@tovek.cz wrote:
Hi,
New branch memleak_fixes created. All memory leaks produced by tests are
fixed.
It also contains a bug fix in
Feel free to merge it into master
On Wed, Dec 12, 2012 at 4:27 PM, Kostka Bořivoj kos...@tovek.cz wrote:
BitSet::nextSetBit is implemented in a very inefficient way for sparse bit sets.
It searches for the next set bit by per-bit iteration and bit shifting.
See OPTIMIZED_BITSET branch for better
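As a rough illustration of the idea (this is not the code from the OPTIMIZED_BITSET branch, and SparseBits is a hypothetical stand-in rather than CLucene's BitSet), a word-wise scan can skip 64 empty bits at a time and only fall back to shifting inside the first non-zero word:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bit set; not CLucene's BitSet class.
struct SparseBits {
    std::vector<std::uint64_t> words;

    explicit SparseBits(std::size_t nbits) : words((nbits + 63) / 64, 0) {}

    std::size_t size() const { return words.size() * 64; }

    void set(std::size_t i) {
        assert(i < size());
        words[i / 64] |= (std::uint64_t(1) << (i % 64));
    }

    // Returns the index of the first set bit at or after `from`, or -1 if none.
    long nextSetBit(std::size_t from) const {
        if (from >= size()) return -1;
        std::size_t w = from / 64;
        // Ignore the bits below `from` in the first word.
        std::uint64_t cur = words[w] & (~std::uint64_t(0) << (from % 64));
        while (cur == 0) {
            if (++w == words.size()) return -1;  // skipped a whole empty word
            cur = words[w];
        }
        // Only now shift bit by bit, inside a word known to be non-zero.
        std::size_t bit = 0;
        while ((cur & 1) == 0) { cur >>= 1; ++bit; }
        return static_cast<long>(w * 64 + bit);
    }
};
```

For a sparse set most words are zero, so the outer loop does one comparison per 64 bits instead of 64 shifts.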
inline
On Thu, Nov 22, 2012 at 11:15 AM, Vitaly Artemov vitalyarte...@gmail.com wrote:
Hello all,
I am starting to evaluate the CLucene engine for use in our product.
I have 2 questions.
1. Is it planned to add support (or does it already exist) for creating the index in
a database instead of in memory
It looks like you are trying to use Lucene as a database, a document
database to be specific, and it actually isn't really supported out of the
box
Take a look at MongoDB, CouchDB or RavenDB.
On Thu, May 3, 2012 at 10:56 AM, Mike Aubury m...@aubit.com wrote:
I'm writing some code at the minute
*To:* clucene-developers@lists.sourceforge.net
*Subject:* Re: [CLucene-dev] Indexing a document
Thank you.
2011/11/7 Itamar Syn-Hershko ita...@code972.com
Yeah, shouldn't be too hard to pull off
On Mon, Nov 7, 2011 at 4:17 PM, Emerson Espínola
emersonespin...@gmail.com
That's going to take a while, unfortunately. LPP is already available on
github, but we want to have some improvements made to its core before
merging it to CLucene
On Mon, Nov 7, 2011 at 4:01 PM, Emerson Espínola
emersonespin...@gmail.com wrote:
Thank you Viet.
When will this new version of
/in/emersonespinola
http://spaces.live.com/profile.aspx?mem=emersonespin...@hotmail.com
http://emersonespinola.blogspot.com http://twitter.com/emersonespinola
http://www.myebook.com/emersonespinola/
2011/11/7 Itamar Syn-Hershko ita...@code972.com
That's going
There is no such thing built into CLucene or Java Lucene. You are going to
have to keep a list of document IDs that once matched a query, and to
perform searches in the background every now and then with that document ID
in it (use your IDs, not Lucene's internal docids).
On Sat, Aug 6, 2011 at
Can you create a failing test?
On a side note, next week we will be working on the new CLucene code-base,
so hopefully we will have a newer and better version supported soon.
On Fri, Jul 29, 2011 at 12:51 PM, Andrew Victor avictor...@gmail.com wrote:
hi,
I'm consistently having this crash
Hi all,
Following (quite) a recent discussion in the mailing list, we are now ready
to begin and hopefully complete the transition to the new code base. To do
that, we are hosting a virtual hackathon starting August 1st. It will be
hosted on IRC (Freenode): #clucene-hackathon .
For reference,
Hi Alan,
As I mentioned in my previous email, our intention is to profile the
library and reduce its size and signature as much as possible. We can do
wonders just by removing ref-counting when it's not really necessary, and
hopefully we'll find more bottlenecks. We are interested in having an
Hi all,
CLucene has grown quite nicely over the last few years, yet we have been
unable to keep up with the fast pace of Java Lucene. Our goal has
always been to have Lucene on steroids, and to make it more maintainable
and up to speed with Java Lucene.
As Ben mentioned here last week, now there's
Itamar Syn-Hershko ita...@code972.com
Hi,
If malloc / realloc returns NULL the indexing process has to be
aborted anyway, and the only way I can think of doing this is
throwing an exception. Did you have another idea in mind?
Also, I'm not sure why
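A minimal sketch of the approach described above, assuming a hypothetical helper named checkedRealloc (CLucene's own allocation code may look different): the wrapper turns a NULL result into an exception so the indexing code can unwind and abort cleanly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper, named here for illustration only.
inline void* checkedRealloc(void* ptr, std::size_t bytes) {
    assert(bytes > 0);  // a zero-size realloc has implementation-defined results
    void* grown = std::realloc(ptr, bytes);
    if (grown == nullptr) {
        // On failure the original block `ptr` is still valid and still owned
        // by the caller, which can release it while handling the exception.
        throw std::bad_alloc();
    }
    return grown;
}
```

Callers would wrap the indexing step in a try/catch for std::bad_alloc and abort the operation there.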
more it is safe to delete it. Please feel
free to do this, if you can
Unfortunately I was busy for the last few months, so I didn't port other tests. Hope
I can spend some time next month
Borek
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Thursday
Just did.
On 28/1/2011 10:26 AM, Šplíchal Jiří wrote:
Hello,
I think we should remove the memore_leaks branch,
and start searching for memleaks in the current version.
Jiri
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Thursday, January 27
Hi,
ParallelMultiSearcher wasn't ported yet. You are welcome to port it
yourself - have a look at search/ParallelMultiSearcher.java and
search/MultiSearcher.java.
Itamar.
On 8/11/2010 12:23 PM, Rajendra Prasad Murakonda wrote:
I can't seem to find ParallelMultiSearcher. I couldn't
I mean - I cherry-picked that commit, and also merged
TermPositionsQueue_fix into master and deleted it.
Itamar.
On 27/1/2011 8:15 PM, Itamar Syn-Hershko wrote:
I pulled your change and merged to master. Also deleted the fix
branch. Thanks.
Itamar.
On 21/1/2011 2:58 PM, Šplíchal Jiří
This looks great!
Itamar.
On 27/12/2010 5:50 PM, Šplíchal Jiří wrote:
Hello,
I have extended the current highlighter implementation based on the
implementation in Java Lucene 2.4.1 in order to support
correct highlighting of Phrase, MultiPhrase and Span queries. This
highlighter is now
this message in case you missed this fix.
Borek
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Thursday, October 07, 2010 11:38 PM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] Current branches state
I just had a quick look
= all tests pass
in debug and also in release
on win7 64bit. But there are still some memory leaks left.
Jiri
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Saturday, November 13, 2010 5:20 PM
To: clucene-developers@lists.sourceforge.net
Subject: Re
Hi Bill,
What kind of sort object are you passing? If it's home-brewed,
perhaps it is buggy?
Itamar.
On 10/12/2010 10:53 PM, Miller, Bill (QuickWire) wrote:
Hi all, I've been implementing MultiSearcher and have a problem that
may be more of a 'Lucene Conceptual' thing than a bug.
This is now merged to master. Thanks for reporting!
Itamar.
On 5/12/2010 7:43 PM, Matt Ronge wrote:
Shoot I wish I had noticed this earlier:
http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=commit;h=de5695332badddc264c3e187350463d9d6ee4a8a
Looks like someone else had
Merged to master. Thanks.
Itamar.
On 18/11/2010 10:37 AM, Šplíchal Jiří wrote:
Hello,
there is a bug in the constructor of the wildcard query setting the
termContainsWildcard member
variable. The existing test checked if at least one of the chars *?
was NOT contained in the string
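To illustrate the intended logic (a sketch only; containsWildcard is a hypothetical free function, not the actual WildcardQuery constructor code), the flag should be true when at least one of the two characters IS present:

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical helper: true if `text` contains '*' or '?'.
inline bool containsWildcard(const wchar_t* text) {
    assert(text != nullptr);
    // wcspbrk returns a pointer to the first character of `text` that
    // appears in the second argument, or a null pointer if none does.
    return std::wcspbrk(text, L"*?") != nullptr;
}
```

A check of the form "'*' is missing OR '?' is missing", which is how the report above reads, would come out true for a plain term such as "test" and false only when both characters appear.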
On 20/9/2010 9:44 AM, Ben van Klinken wrote:
I've been testing out the opensuse build service. Very cool in that it
allows you to compile your code on different architectures and
different distributions. I've picked up a few build problems already
and fixed them. Check out the results at
On 15/9/2010 11:06 PM, Kostka Bořivoj wrote:
I don't think so. Test is commented as test for LUCENE-1270 issue, which is
not related to any exceptions thrown from
directory. No idea, why MockRAMDirectory is used in test, perhaps because it
contains some additional checks.
In Java version
the last call by inserting prints and the call
writer = _CLNEW IndexWriter4Test(dir2, false, an, true);
never returns.
Do those tests work for you? Could you check my branch?
Jiri
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Monday, September 13
Ben, I think Veit got it right :)
On 2/9/2010 12:45 PM, Ben van Klinken wrote:
Itamar would know more about this, but I thought the query parser IS
used in the new version. Itamar?
On Wednesday, September 1, 2010, Veit Jahns nuncupa...@googlemail.com wrote:
Hi Mark,
in wildcard
Hi all,
The atomicthreads branch, which brings many fixes related to
multithreading in CLucene, is starting to get old. I don't know how well
it has been tested, but from what I could see it doesn't seem to present
new issues.
So I intend to merge it to master soon - that's the only move that
Borek, the intensive_testing branch is consistently crashing in one of
the IndexSearcher tests, with an AV / buffer overrun. I'm testing with
VC8. Is it something you have seen already?
Since it seems to be related to threading, I merged master and then
atomicthreads with your branch to see if
-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] RAMDirectory testing
OK, I'll try to fix this.
Borek
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Friday, July 30, 2010 1:45 PM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene
Well, the cmake build system does let us deploy on many platforms very
easily, and it is also very easy to setup as a VS project. That link I
posted has instructions in it, but the following commands in a batch
file should do the trick for you:
rem Define Boost envt vars
set
On 9/8/2010 4:01 PM, John O'Brien wrote:
Hi,
Apologies if this has already been covered in previous posts but
I've not been able to find the answer in the archive so far.
We have an application which indexes mail messages. We get the
information for each message over IMAP, create the
Can I use the highlighter even if I don't store text in the CLucene index?
I'm not familiar with the specifics of the implementation in our
contrib, but it should also work without storing the text by looking at
the term vectors and using the token offsets generated by the analyzer
while
Eric,
For new development you had better work with our git master HEAD. See
http://clucene.sourceforge.net/download.shtml
You'll find the Highlighter under /src/contribs-lib/CLucene/highlighter.
Itamar.
I just started looking at the source code, but it sure looks like you can. If
you look at
Hello,
I added a test to the TestBitSet.cpp file that tests this issue.
How can I send it to you?
Send it on the list, or to me privately, preferably as a patch / diff.
I will now try to write test for the constant score query, because it is not
working correctly.
It does not return
Jiri,
Thanks for the detailed reports.
We are following Java Lucene's implementation quite strictly. If you
compare \search\ConstantScoreQuery.java from 2.3.2 to CLucene's code
you'll see we did exactly what the original code did, except for
resolving Java/C++ differences in scoping
It makes sense, and I updated the code accordingly.
Can you write a small test proving this issue (and that it is resolved now)?
Thanks.
Itamar.
Hello,
I am testing my queries with the following test case:
given
CL_NS(search)::Query * pQuery
then
result of pQuery must
Hello,
it seems that we will need the following features for our project:
- SpanQueries
- extractTerms method
- TermsFilter
- RegexQuery
- MoreLikeThisQuery
so if there is no implementation then I will start to implement them.
I will probably start with the extractTerms method and the
Hi,
To the best of my knowledge no one is actively working on any of those
items at the moment, although some have shown interest in SpanQueries. If
someone does - please let us all know.
So if you implement any of them and can contribute back this will really
help.
Re #3 - you are right, I
The beauty of tests is they speak for themselves...
Is it possible to have a test showing the corruption issue you mentioned
if the BitSet patch isn't applied?
Itamar.
On 3/8/2010 4:03 PM, Veit Jahns wrote:
Hi,
I observed that the index becomes corrupted (Read past EOF) after
several
Ok, thanks. I updated master with your BitSet fix.
On 3/8/2010 4:51 PM, Veit Jahns wrote:
2010/8/3 Itamar Syn-Hershko ita...@code972.com:
What I'm looking for is a test showing the index corruption scenario you
described - if it can be reproduced in a test, and then to see the
BitSet
It should compile with MinGW. What compile errors do you get?
Itamar.
On 1/8/2010 10:17 PM, Ahmed wrote:
hi,
Can CLucene 2.3.2 be compiled under Windows using MinGW?
I tried that and I get a lot of errors.
Ahmed
--
IndexSearcher is deleted
Borek
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Monday, July 26, 2010 8:28 PM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] Exception during thread finish
If you're sure this is not a race condition
On 26/7/2010 10:32 AM, Ben van Klinken wrote:
Itamar, why do you suggest one searcher per thread? The searcher is
multithreaded.
Being multithreaded means having locks on certain operations. If you
have too many users depending on one searcher, performance will drop.
This is why I suggested
Use WhitespaceAnalyzer instead of what you're using now for indexing.
Analyzer is what is being used internally to tokenize the stream and
filter tokens from it. Depending on your needs, you'll need to choose
the right analyzer for you, or write your own.
Itamar.
On 26/7/2010 11:51 AM, Rui
If you're sure this is not a race condition between your threads, try
the atomicthreads branch. We fixed several threading errors there.
Actually, if this resolves this issue, I might just go ahead and merge
it to master and wait no more...
Itamar.
On 26/7/2010 3:51 PM, Kostka Bořivoj wrote:
Hi,
These are years old. The only one who is likely to know the answer to
that is Ben - assuming there was a reason to marking it unstable.
I strongly suggest using the latest version from the development
branch. See clucene.sourceforge.net/download.shtml
. It seems to work
fine for me.
And I am curious about whether this modification is just correct.. Am
I missing something?
2010/7/11 Itamar Syn-Hershko ita...@code972.com
On 9/7/2010 11:02 PM, Veit Jahns wrote:
That's an internal limit of Java Lucene (see e.g
On 9/7/2010 11:02 PM, Veit Jahns wrote:
That's an internal limit of Java Lucene (see e.g. [1]) as well as
CLucene. That's all I know about this, but Michael and Itamar
discussed a similar issue regarding FSDirectory some time ago
[2]. Maybe this helps you with this issue.
From
Hi,
Index format and functionality are the same for both options, this is
mainly a switch to allow adding CLucene also for projects using SBCS.
For every new development - with or without CLucene, especially such
that handles texts extensively, I'd recommend using a MBCS (namely Unicode).
I tend to agree with Veit. Although JL doesn't have a finalize method, I
think it does make sense to automatically call close in the destructor
of both Index writer and reader.
I'm not sure asserting in a destructor, or throwing an exception, will
do any good.
But is calling close() from
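One way to picture the suggestion (a sketch under assumptions: ClosingWriter is a hypothetical stand-in, not CLucene's IndexWriter, and the counter exists only to make the behaviour observable) is an idempotent close() that the destructor calls while swallowing errors, since a destructor should not let exceptions escape:

```cpp
#include <cassert>

// Hypothetical stand-in for a writer/reader that owns a resource.
class ClosingWriter {
public:
    // `closeCount` is incremented each time the resource is really released.
    explicit ClosingWriter(int* closeCount)
        : closeCount_(closeCount), closed_(false) {
        assert(closeCount_ != nullptr);
    }

    ClosingWriter(const ClosingWriter&) = delete;
    ClosingWriter& operator=(const ClosingWriter&) = delete;

    ~ClosingWriter() {
        // Never let an exception escape a destructor; swallow it instead.
        try { close(); } catch (...) {}
    }

    // Safe to call more than once; only the first call does any work.
    void close() {
        if (closed_) return;
        closed_ = true;
        ++*closeCount_;  // stands in for flushing and releasing resources
    }

private:
    int* closeCount_;
    bool closed_;
};
```

With this shape an explicit close() still lets the caller see errors, while the destructor acts only as a safety net for callers that forgot.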
On 29/6/2010 12:04 PM, Kostka Bořivoj wrote:
My cycle starts at this->postingsFreeCountDW, not at 0
Sorry, I misread you. I thought you just replaced the line within the
loop. So yes, it seems to be the same, except with my solution you don't
have to search for more copy/delete occurrences in
I tested it a bit, and overall it seems to work just fine. I checked in
my changes into master and I'm signing off this issue.
If you could test this further (using your app is just fine, but of
course IndexWriter and DocumentsWriter tests are even better), check
cl_demo for leaks and try to
On 29/6/2010 1:41 PM, Kostka Bořivoj wrote:
I agree it works fine (and your way of nulling is definitely better than
mine).
I already indexed about 1GB of data, but I'm not sure about mem leaks,
as my application memory increases constantly during indexing (and it didn't
with previous
Hi all,
Lucene In Action (2nd Edition), authored by Michael McCandless, Erik
Hatcher and Otis Gospodnetić, is hands down the best guide to Lucene,
the high-performance search engine library. It can help anyone start
using Lucene or CLucene, and understand what is going on under the hood.
It
Alrighty, seems like I have nailed it. See below + attached patch.
On 29/6/2010 12:39 AM, Kostka Bořivoj wrote:
I'm quite sure the problem is in postingsFreeListDW management:
The postings after postingsFreeCountDW are used somewhere (but are still here
in a list). If you
remove block of free
Borek, a quick update:
Apparently I was wrong. The 2 issues mentioned in JIRA 1072 were already
fixed in 2.3.2, and the core patches attached to it weren't showing up
in the release since other check-ins updated them to work differently.
So, what you were experiencing is either a CLucene
...
Borek
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Thursday, June 24, 2010 12:11 AM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] vector subscript out of range
exception during indexing
In IndexWriter.h (line 1163
the only way I see is to change method to be public.
I'm not very
happy doing so, but I cannot see any other way...
Borek
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Thursday, June 24, 2010 12:11 AM
To: clucene-developers
] vector subscript out of range
exception during indexing
I'm not sure which JLucene version I should use (and where to get it)
Borek
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@code972.com]
Sent: Wednesday, June 23, 2010 12:11 AM
To: clucene-developers
]
Sent: Wednesday, June 23, 2010 5:00 PM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] vector subscript out of
range exception during indexing
I'll try to port whole TestDocumentsWriter, it is not so big
-Original Message-
From: Itamar Syn-Hershko
by postingsFreeListDW.
Until now I was not able to find the reason.
Borek
-Original Message-
From: Itamar Syn-Hershko [mailto:ita...@divrei-tora.com]
Sent: Monday, June 21, 2010 2:08 PM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] vector
Looks like an encoding issue. Is the file being read correctly (check with
your debugger)?
Also, please post such questions to the CLucene user group.
Itamar.
-Original Message-
From: Itziar Cortes [mailto:itz...@eleka.net]
Sent: Sunday, June 20, 2010 12:21 PM
To:
You are right, thanks. This is how JL does this too. I fixed this and
committed to git as e75f0c...22e4 [1].
Do you have a way of reproducing this, so we can add a test case to our test
suite?
By the way, we no longer maintain 0.9.21 or the SVN repository, so you'll
need to either pull this
). The index
merging and optimizing process takes an unusually (in my opinion) long
time, as the index files combined take up maybe a megabyte of
disk space.
weird.
2010/6/13 Itamar Syn-Hershko ita...@divrei-tora.com:
Just to confirm: for both branches, cl_test works fine but
cl_demo crashes
Hi Norman,
From what I could tell, UserData only became available in Lucene 2.9. Since
CLucene follows JL's quite strictly, this isn't available yet as our latest
code branch conforms to 2.3.2.
To have UserData on the index level, you can add a dummy doc (see [1] for
some tradeoffs). If you want
Jiri,
Indeed, for some weird reason this wasn't implemented in CLucene. Porting
this from the original Java code [1] seems quite straightforward. I won't
be able to do so myself in the next few weeks - would you perhaps want to
provide us with a patch porting this function with tests for it (in
Hi all,
Just to let you know the atomicthreads branch in our git repositories [1] is
fixing many threading and cross-platform issues we found in the master
branch's code. We have been testing this for a while now on several
platforms, but would want some additional feedback from the community
CString is MFC's string object, and is TCHAR.
Rui, the function we are actually interested in is m_GetFileContents. The
error most likely lies there, in the way you are loading your text documents
(which we already established are ANSI). Please also let us know how you
compile your app with
Rui,
This file is ANSI encoded. Are the other files, the ones you do succeed in finding,
Unicode / UTF8 encoded perhaps? If that's the case your routine for loading
the files is buggy. You should either have them all encoded using the same
encoding, or have more intelligent code to convert incompatible
Alfredo,
First, let me welcome you to our community. JFYI, our license is dual - LGPL
/ Apache 2.0.
The latest release (from 2008) - 0.9.21b - is the one being widely used.
There is ongoing work on our Git repository on a 2_3_2 branch, which
conforms to Java Lucene 2.3.2. See
Paul,
Attached is the patch you need. It is confirmed to work on SL. I haven't
committed it to master yet since I'm awaiting test results from Linux
systems.
Itamar.
-Original Message-
From: Paul J. Lucas [mailto:p...@lucasmail.org]
Sent: Wednesday, January 13, 2010 10:54 PM
To:
Please have a look at cl_demo, and the various tests in cl_test, for some
sample code.
Itamar.
_
From: Marcelo Torres [mailto:primac...@gmail.com]
Sent: Tuesday, January 12, 2010 10:16 PM
To: clucene-developers@lists.sourceforge.net
Subject: [CLucene-dev] Help with error with xcode and clucene
I will have a look soon. Anyway, JFYI, CLucene's implementation of
StandardAnalyzer (mainly StandardTokenizer) differs from the current Java
Lucene's one. Porting the current Java implementation shouldn't be too hard
a task since it's jflex generated code -- perhaps if someone could
contribute
Henning,
Both of your points are valid, and being worked on. Once we complete the
port, and have a solid set of rules on the various cases where this question
arises, we will write that down and make it available to all in the
project docs.
Implementing boost::shared_ptr is work on
)
These are tiny leaks but perhaps one of them is growing into a much larger
leak with bigger input?
I will run valgrind on the test program I sent you before and send you the
output.
Michael Levin wrote:
It looks like the leaks have been fixed! Thank you so much!! :)
Itamar Syn-Hershko wrote
Celto,
Just committed a fix for that, see commit
c89f8a39fa1faa34374d8a6e92ae9c2467deeda7. Please test this and let me know.
I used search() and Luke to verify this fix.
Itamar.
-Original Message-
From: cel tix44 [mailto:celti...@gmail.com]
Sent: Thursday, November 05, 2009 4:21 PM
Hi,
Running the following test, I hit no AV. If you can make a minified version
of your use-case and send it over, that would help.
I will look at the other issue when I have time later. If you or Michael
could send a similar test function demonstrating that issue using the
minimalistic
Of course it does, this is Java not C++.
In C++, what you'd do is derive your own class from HitCollector and
implement the virtual function collect in it, something like:
class MyCollector : public HitCollector {
void collect(const int32_t doc, const float_t score){
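A self-contained sketch of that pattern, with assumptions stated up front: the HitCollector declared below is only a stand-in so the example compiles by itself (in a real program it would come from the CLucene headers), plain float replaces the library's float_t typedef, and the score threshold is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the library's abstract base class; declared here only so the
// sketch builds without CLucene.
class HitCollector {
public:
    virtual ~HitCollector() {}
    virtual void collect(const std::int32_t doc, const float score) = 0;
};

// Collects the ids of all documents whose score is above a threshold.
class MyCollector : public HitCollector {
public:
    explicit MyCollector(float minScore) : minScore_(minScore) {}

    void collect(const std::int32_t doc, const float score) override {
        assert(doc >= 0);  // document ids are expected to be non-negative
        if (score > minScore_) docs_.push_back(doc);
    }

    const std::vector<std::int32_t>& docs() const { return docs_; }

private:
    float minScore_;
    std::vector<std::int32_t> docs_;
};
```

The searcher would then be handed a MyCollector instance and call collect() once per matching document, which is the C++ counterpart of the anonymous inner class used in Java.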