Re: [CLucene-dev] How to create distributed index

2014-12-16 Thread Itamar Syn-Hershko
Don't even try.. just go with Elasticsearch or Solr. My $0.02 -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Tue, Dec 16, 2014 at 3:36 PM, Shailesh Birari sbirar

Re: [CLucene-dev] All remaining memory leaks in tests (hopefully) removed

2013-07-16 Thread Itamar Syn-Hershko
Borek, feel free to merge this into master if all works as expected and tests are all green On Fri, Jul 12, 2013 at 4:21 PM, Kostka Bořivoj kos...@tovek.cz wrote: Hi, New branch memleak_fixes created. All memory leaks produced by tests are fixed. It also contains fixed bug in

Re: [CLucene-dev] BitSet::nextSetBit very inefficient for sparse bit sets

2012-12-12 Thread Itamar Syn-Hershko
Feel free to merge it into master On Wed, Dec 12, 2012 at 4:27 PM, Kostka Bořivoj kos...@tovek.cz wrote: BitSet::nextSetBit is implemented in a very inefficient way for sparse bit sets. It searches for the next set bit by per-bit iteration and bit shifting. See OPTIMIZED_BITSET branch for better
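To illustrate the complaint in the message above: a sparse bit set can be scanned a whole byte at a time, so that runs of zero bytes cost one comparison each instead of eight shifts. The following is only a minimal sketch under stated assumptions. The function name `nextSetBitSketch` and the byte-vector layout are hypothetical, and this is not necessarily what the OPTIMIZED_BITSET branch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical free function, not CLucene's BitSet API.
// Returns the index of the first set bit at or after `from`, or -1 if none.
long nextSetBitSketch(const std::vector<std::uint8_t>& bytes, std::size_t from) {
    const std::size_t totalBits = bytes.size() * 8;
    if (from >= totalBits) return -1;

    std::size_t byteIdx = from / 8;
    assert(byteIdx < bytes.size());
    // In the first byte, ignore bits that lie below `from`.
    unsigned current = bytes[byteIdx] & (0xFFu << (from % 8));

    for (;;) {
        if (current != 0) {
            // A non-zero byte: at most 8 shifts locate the lowest set bit.
            std::size_t bit = 0;
            while ((current & 1u) == 0) { current >>= 1; ++bit; }
            return static_cast<long>(byteIdx * 8 + bit);
        }
        // Zero byte: skip all 8 bits with a single comparison.
        if (++byteIdx == bytes.size()) return -1;
        current = bytes[byteIdx];
    }
}
```

The same idea extends to wider words (32 or 64 bits) where the platform offers a fast "count trailing zeros" instruction; the byte version is shown because it needs nothing beyond standard C++.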

Re: [CLucene-dev] Creating CLucene Index in a Database; Support for Asian languages

2012-11-22 Thread Itamar Syn-Hershko
inline On Thu, Nov 22, 2012 at 11:15 AM, Vitaly Artemov vitalyarte...@gmail.com wrote: Hello all, I am starting to evaluate the CLucene engine for use in our product. I have 2 questions. 1. Is it planned to add support (or does it already exist) for creating the index in a database instead of memory

Re: [CLucene-dev] Finding all the fields used in a query..

2012-05-03 Thread Itamar Syn-Hershko
It looks like you are trying to use Lucene as a database, a document database to be specific, and that isn't really supported out of the box. Take a look at MongoDB, CouchDB or RavenDB. On Thu, May 3, 2012 at 10:56 AM, Mike Aubury m...@aubit.com wrote: I'm writing some code at the minute

Re: [CLucene-dev] Indexing a document

2011-11-19 Thread Itamar Syn-Hershko
*To:* clucene-developers@lists.sourceforge.net *Subject:* Re: [CLucene-dev] Indexing a document Thank you. 2011/11/7 Itamar Syn-Hershko ita...@code972.com Yeah, shouldn't be too hard to pull off On Mon, Nov 7, 2011 at 4:17 PM, Emerson Espínola emersonespin...@gmail.com

Re: [CLucene-dev] Indexing a document

2011-11-07 Thread Itamar Syn-Hershko
That's going to take a while, unfortunately. LPP is already available on github, but we want to have some improvements made to its core before merging it to CLucene On Mon, Nov 7, 2011 at 4:01 PM, Emerson Espínola emersonespin...@gmail.com wrote: Thank you Viet. When will this new version of

Re: [CLucene-dev] Indexing a document

2011-11-07 Thread Itamar Syn-Hershko
/in/emersonespinola http://spaces.live.com/profile.aspx?mem=emersonespin...@hotmail.com http://emersonespinola.blogspot.com http://twitter.com/emersonespinola http://www.myebook.com/emersonespinola/ http://www.myebook.com/emersonespinola/ 2011/11/7 Itamar Syn-Hershko ita...@code972.com Thats going

Re: [CLucene-dev] live query

2011-08-06 Thread Itamar Syn-Hershko
There is no such thing built into CLucene or Java Lucene. You are going to have to keep a list of document IDs that once matched a query, and to perform searches in the background every now and then with that document ID in it (use your IDs, not Lucene's internal docids). On Sat, Aug 6, 2011 at
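The bookkeeping half of that advice can be sketched without any search library: keep the set of application-level IDs already seen for a query, and after each background search report only the IDs that are new. This is a hedged illustration, not CLucene code. The function name `newlyMatched` and the use of string IDs are assumptions, and the search step that produces `current` is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: `seen` holds the application IDs that matched on earlier
// runs, `current` holds the IDs returned by the latest background search.
// Returns the IDs present in `current` but not in `seen`, in sorted order.
std::vector<std::string> newlyMatched(const std::set<std::string>& seen,
                                      const std::set<std::string>& current) {
    std::vector<std::string> fresh;
    std::set_difference(current.begin(), current.end(),
                        seen.begin(), seen.end(),
                        std::back_inserter(fresh));
    return fresh;
}
```

After notifying on the returned IDs, the caller would add them to `seen` before the next run. Using your own stable IDs matters here because Lucene's internal document numbers can change when segments are merged.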

Re: [CLucene-dev] Crash with IndexReader reopen

2011-07-29 Thread Itamar Syn-Hershko
Can you create a failing test? On a side note, next week we will be working on the new CLucene code-base, so hopefully we will have a newer and better version supported soon. On Fri, Jul 29, 2011 at 12:51 PM, Andrew Victor avictor...@gmail.com wrote: hi, I'm consistently having this crash

[CLucene-dev] CLucene hackathon August 1-8. Goal: completing transition to new code-base

2011-07-17 Thread Itamar Syn-Hershko
Hi all, Following (quite) a recent discussion in the mailing list, we are now ready to begin and hopefully complete the transition to the new code base. To do that, we are hosting a virtual hackathon starting August 1st. It will be hosted on IRC (Freenode): #clucene-hackathon . For reference,

Re: [CLucene-dev] CLucene's future

2011-04-05 Thread Itamar Syn-Hershko
Hi Alan, As I mentioned in my previous email, our intention is to profile the library and reduce its size and signature as much as possible. We can do wonders just by removing ref-counting when it's not really necessary, and hopefully we'll find more bottlenecks. We are interested in having an

[CLucene-dev] CLucene's future

2011-03-30 Thread Itamar Syn-Hershko
Hi all, CLucene has grown quite nicely over the last few years, yet we have been unable to keep up with the fast pace of Java Lucene. Our goal has always been to have Lucene on steroids, and have it more maintainable and up to speed with Java Lucene. As Ben mentioned here last week, now there's

Re: [CLucene-dev] Potential bug in SegmentTermEnum.cpp

2011-02-08 Thread Itamar Syn-Hershko
Itamar Syn-Hershko ita...@code972.com mailto:ita...@code972.com Hi, If malloc / realloc returns NULL the indexing process has to be aborted anyway, and the only way I can think of doing this is throwing an exception. Did you have another idea in mind? Also, I'm not sure why
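The approach described in that reply, turning a failed allocation into an exception so the caller can abort cleanly, might look roughly like the sketch below. It is an assumption-laden illustration: `reallocOrThrow` is a hypothetical helper, and `std::bad_alloc` stands in for whatever exception type the library itself would throw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper, not part of CLucene.
// Grows (or allocates) a buffer; throws std::bad_alloc if realloc fails.
// On failure the original block is left untouched, so the caller still owns it.
void* reallocOrThrow(void* block, std::size_t newSize) {
    assert(newSize > 0);  // realloc with size 0 has implementation-defined behaviour
    void* grown = std::realloc(block, newSize);
    if (grown == nullptr) {
        throw std::bad_alloc();
    }
    return grown;
}
```

Assigning the result of `realloc` to a temporary first, as done here, avoids the classic leak where the only pointer to the old block is overwritten with NULL.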

Re: [CLucene-dev] Current branches state

2011-01-29 Thread Itamar Syn-Hershko
more it is safe to delete it. Please feel free to do this, if you can Unfortunately I was busy last few months, so I didn't port other tests. Hope I can spend some time next month Borek -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Thursday

Re: [CLucene-dev] NearSpansUnordered bug fix

2011-01-29 Thread Itamar Syn-Hershko
Just did. On 28/1/2011 10:26 AM, Šplíchal Jiří wrote: Hello, I think we should remove the memore_leaks branch, and start searching for memleaks in the current version. Jiri -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Thursday, January 27

Re: [CLucene-dev] ParallelMultiSearcher support in clucene...

2011-01-27 Thread Itamar Syn-Hershko
Hi, ParallelMultiSearcher wasn't ported yet. You are welcome to port it yourself - have a look at search/ParallelMultiSearcher.java and search/MultiSearcher.java. Itamar. On 8/11/2010 12:23 PM, Rajendra Prasad Murakonda wrote: I can't seem to find ParallelMultiSearcher. I couldn't

Re: [CLucene-dev] MultiLevelSkipListReader bug

2011-01-27 Thread Itamar Syn-Hershko
I mean - I cherry-picked that commit, and also merged TermPositionsQueue_fix into master and deleted it. Itamar. On 27/1/2011 8:15 PM, Itamar Syn-Hershko wrote: I pulled your change and merged to master. Also deleted the fix branch. Thanks. Itamar. On 21/1/2011 2:58 PM, Šplíchal Jiří

Re: [CLucene-dev] Highlighter implementation

2011-01-27 Thread Itamar Syn-Hershko
This looks great! Itamar. On 27/12/2010 5:50 PM, Šplíchal Jiří wrote: Hello, I have extended the current highlighter implementation based on the implementation in Java Lucene 2.4.1 in order to support correct highlighting of Phrase, MultiPhrase and Span queries. This highlighter is now

Re: [CLucene-dev] Current branches state

2011-01-27 Thread Itamar Syn-Hershko
this message in case you missed this fix. Borek -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Thursday, October 07, 2010 11:38 PM To: clucene-developers@lists.sourceforge.net Subject: Re: [CLucene-dev] Current branches state I just had a quick look

Re: [CLucene-dev] NearSpansUnordered bug fix

2011-01-27 Thread Itamar Syn-Hershko
= all tests pass in debug and also in release on win7 64bit. But there are still some memory leaks left. Jiri -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Saturday, November 13, 2010 5:20 PM To: clucene-developers@lists.sourceforge.net Subject: Re

Re: [CLucene-dev] MultiSearcher problem

2010-12-24 Thread Itamar Syn-Hershko
Hi Bill, What kind of sort object are you passing? If it's one you wrote yourself, perhaps it is buggy? Itamar. On 10/12/2010 10:53 PM, Miller, Bill (QuickWire) wrote: Hi all, I've been implementing MultiSearcher and have a problem that may be more of a 'Lucene Conceptual' thing than a bug.

Re: [CLucene-dev] Patch: Fixes memory smasher in KeywordTokenizer

2010-12-24 Thread Itamar Syn-Hershko
This is now merged to master. Thanks for reporting! Itamar. On 5/12/2010 7:43 PM, Matt Ronge wrote: Shoot I wish I had noticed this earlier: http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=commit;h=de5695332badddc264c3e187350463d9d6ee4a8a Looks like someone else had

Re: [CLucene-dev] Wildcardquery bug in constructor

2010-12-24 Thread Itamar Syn-Hershko
Merged to master. Thanks. Itamar. On 18/11/2010 10:37 AM, Šplíchal Jiří wrote: Hello, there is a bug in the constructor of the wildcard query setting the termContainsWildcard member variable. The existing test checked if at least one of the chars *? was NOT contained in the string

Re: [CLucene-dev] Current branches state

2010-09-20 Thread Itamar Syn-Hershko
On 20/9/2010 9:44 AM, Ben van Klinken wrote: I've been testing out the opensuse build service. Very cool in that it allows you to compile your code on different architectures and different distributions. I've picked up a few build problems already and fixed them. Check out the results at

Re: [CLucene-dev] AddIndexesNoOptimize test added to intensive_testing branch, some problems in core detected

2010-09-16 Thread Itamar Syn-Hershko
On 15/9/2010 11:06 PM, Kostka Bořivoj wrote: I don't think so. Test is commented as test for LUCENE-1270 issue, which is not related to any exceptions thrown from directory. No idea, why MockRAMDirectory is used in test, perhaps because it contains some additional checks. In Java version

Re: [CLucene-dev] AddIndexesNoOptimize test added to intensive_testing branch, some problems in core detected

2010-09-14 Thread Itamar Syn-Hershko
the last call by inserting prints and the call writer = _CLNEW IndexWriter4Test(dir2, false,an, true); never returns. Do those test work for you? Could you check my branch? Jiri -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Monday, September 13

Re: [CLucene-dev] wild card query

2010-09-13 Thread Itamar Syn-Hershko
Ben, I think Veit got it right :) On 2/9/2010 12:45 PM, Ben van Klinken wrote: Itamar would know more about this, but I thought the query parser IS used in the new version. Itamar? On Wednesday, September 1, 2010, Veit Jahns nuncupa...@googlemail.com wrote: Hi Mark, in wildcard

[CLucene-dev] Vote for merging of atomicthreads branch into master

2010-08-16 Thread Itamar Syn-Hershko
Hi all, The atomicthreads branch, which brings many fixes related to multithreading in CLucene, is starting to get old. I don't know how well it has been tested, but from what I could see it doesn't seem to present new issues. So I intend to merge it to master soon - that's the only move that

Re: [CLucene-dev] RAMDirectory testing (using MockRAMDirectory)

2010-08-16 Thread Itamar Syn-Hershko
Borek, the intensive_testing branch is consistently crashing in one of the IndexSearcher tests, with an AV / buffer overrun. I'm testing with VC8. Is it something you have seen already? Since it seems to be related to threading, I merged master and then atomicthreads with your branch to see if

Re: [CLucene-dev] RAMDirectory testing

2010-08-09 Thread Itamar Syn-Hershko
-developers@lists.sourceforge.net Subject: Re: [CLucene-dev] RAMDirectory testing OK, I'll try to fix this. Borek -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Friday, July 30, 2010 1:45 PM To: clucene-developers@lists.sourceforge.net Subject: Re: [CLucene

Re: [CLucene-dev] Hit Highlighting (ID:4A3787A70081DF64)

2010-08-09 Thread Itamar Syn-Hershko
Well, the cmake build system does let us deploy on many platforms very easily, and it is also very easy to setup as a VS project. That link I posted has instructions in it, but the following commands in a batch file should do the trick for you: rem Define Boost envt vars set

Re: [CLucene-dev] cloning/modifying existing documents?

2010-08-09 Thread Itamar Syn-Hershko
On 9/8/2010 4:01 PM, John O'Brien wrote: Hi, Apologies if this has already been covered in previous posts but I've not been able to find the answer in the archive so far. We have an application which indexes mail messages. We get the information for each message over IMAP, create the

Re: [CLucene-dev] Hit Highlighting (ID:4A3787A70081DF64)

2010-08-07 Thread Itamar Syn-Hershko
Can i use the highlighter even if i don't store text in clucene index? I'm not familiar with the specifics of the implementation in our contrib, but it should also work without storing the text by looking at the term vectors and using the token offsets generated by the analyzer while

Re: [CLucene-dev] Hit Highlighting (ID:4A3787A70081DF64)

2010-08-07 Thread Itamar Syn-Hershko
Eric, For new development you better work with our git master HEAD. See http://clucene.sourceforge.net/download.shtml You'll find the Highlighter under /src/contribs-lib/CLucene/highlighter. Itamar. I just started looking at the source code, but it sure looks like you can. If you look at

Re: [CLucene-dev] BitSet bug

2010-08-06 Thread Itamar Syn-Hershko
Hello, I added a test to the TestBitSet.cpp file that tests this issue. How can I send it to you? Send it on the list, or to me privately, preferably as a patch / diff. I will now try to write a test for the constant score query, because it is not working correctly. It does not return

Re: [CLucene-dev] ConstantScoreQuery / ConstantWeight problem

2010-08-05 Thread Itamar Syn-Hershko
Jiri, Thanks for the detailed reports. We are following Java Lucene's implementation quite strictly. If you compare \search\ConstantScoreQuery.java from 2.3.2 to CLucene's code you'll see we did exactly what the original code did, except for resolving Java/C++ difference in scoping

Re: [CLucene-dev] BitSet bug

2010-08-05 Thread Itamar Syn-Hershko
It makes sense, and I updated the code accordingly. Can you write a small test proving this issue (and that it is resolved now)? Thanks. Itamar. Hello, I am testing my queries while having following test case: given CL_NS(search)::Query * pQuery then result of pQuery must

Re: [CLucene-dev] Query Interface, MultiPhraseQuery, Array

2010-08-05 Thread Itamar Syn-Hershko
Hello, it seems, that we will need following features for our project: - SpanQueries - extractTerms method - TermsFilter - RegexQuery - MoreLikeThisQuery so if there is no implementation then I will start to implement them. I will probably start with the extractTerms method and the

Re: [CLucene-dev] Query Interface, MultiPhraseQuery, Array

2010-08-04 Thread Itamar Syn-Hershko
Hi, To the best of my knowledge no one is actively working on any of those items at the moment, although some have shown interest in SpanQueries. If someone does - please let us all know. So if you implement any of them and can contribute back this will really help. Re #3 - you are right, I

Re: [CLucene-dev] Bug in BitSet::writeDgaps()

2010-08-03 Thread Itamar Syn-Hershko
The beauty of tests is they speak for themselves... Is it possible to have a test showing the corruption issue you mentioned if the BitSet patch isn't applied? Itamar. On 3/8/2010 4:03 PM, Veit Jahns wrote: Hi, I observed that the index becomes corrupted (Read past EOF) after several

Re: [CLucene-dev] Bug in BitSet::writeDgaps()

2010-08-03 Thread Itamar Syn-Hershko
Ok, thanks. I updated master with your BitSet fix. On 3/8/2010 4:51 PM, Veit Jahns wrote: 2010/8/3 Itamar Syn-Hershko ita...@code972.com: What I'm looking for is a test showing the index corruption scenario you described - if it can be reproduced in a test, and then to see the BitSet

Re: [CLucene-dev] Compile CLucene using mingw under windows

2010-08-01 Thread Itamar Syn-Hershko
It should compile with MinGW. What compile errors do you get? Itamar. On 1/8/2010 10:17 PM, Ahmed wrote: hi, Can I compile CLucene 2.3.2 under Windows using MinGW? I tried that and I get a lot of errors. Ahmed --

Re: [CLucene-dev] Exception during thread finish

2010-07-30 Thread Itamar Syn-Hershko
IndexSearcher is deleted Borek -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Monday, July 26, 2010 8:28 PM To: clucene-developers@lists.sourceforge.net Subject: Re: [CLucene-dev] Exception during thread finish If you're sure this is not a race condition

Re: [CLucene-dev] Performance questions

2010-07-26 Thread Itamar Syn-Hershko
On 26/7/2010 10:32 AM, Ben van Klinken wrote: Itamar, why do you suggest one searcher per thread? The searcher is multithreaded. Being multithreaded means having locks on certain operations. If you have too many users depending on one searcher, performance will drop. This is why I suggested

Re: [CLucene-dev] Indexate numbers

2010-07-26 Thread Itamar Syn-Hershko
Use WhitespaceAnalyzer instead of what you're using now for indexing. Analyzer is what is being used internally to tokenize the stream and filter tokens from it. Depending on your needs, you'll need to choose the right analyzer for you, or write your own. Itamar. On 26/7/2010 11:51 AM, Rui
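As a rough picture of why a whitespace-only analyzer helps with numbers: it splits the stream on whitespace and does nothing else, so a token such as `42` or `2010-07` reaches the index unchanged. The sketch below imitates that behaviour with the standard library only. `splitOnWhitespace` is a hypothetical stand-in for illustration, not the CLucene class.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for whitespace-only tokenization.
// Splits on runs of whitespace; no lowercasing, no stop words, no filtering.
std::vector<std::string> splitOnWhitespace(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Whatever analyzer is used for indexing generally needs to be used for parsing queries as well, otherwise the query terms will not line up with the indexed tokens.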

Re: [CLucene-dev] Exception during thread finish

2010-07-26 Thread Itamar Syn-Hershko
If you're sure this is not a race condition between your threads, try the atomicthreads branch. We fixed several threading errors there. Actually, if this resolves this issue, I might just go ahead and merge it to master and wait no more... Itamar. On 26/7/2010 3:51 PM, Kostka Bořivoj wrote:

Re: [CLucene-dev] status of clucene-contrib-0.9.16a

2010-07-14 Thread Itamar Syn-Hershko
Hi, These are years old. The only one who is likely to know the answer to that is Ben - assuming there was a reason for marking it unstable. I strongly suggest using the latest version from the development branch. See clucene.sourceforge.net/download.shtml

Re: [CLucene-dev] How to use RAMDirectory for index that is 2G

2010-07-11 Thread Itamar Syn-Hershko
. It seems to work fine for me. And I am curious about whether this modification is actually correct. Am I missing something? 2010/7/11 Itamar Syn-Hershko ita...@code972.com mailto:ita...@code972.com On 9/7/2010 11:02 PM, Veit Jahns wrote: That's an internal limit of Java Lucene (see e.g

Re: [CLucene-dev] How to use RAMDirectory for index that is 2G

2010-07-10 Thread Itamar Syn-Hershko
On 9/7/2010 11:02 PM, Veit Jahns wrote: That's an internal limit of Java Lucene (see e.g. [1]) as well as CLucene. That's all I know about this, but Michael and Itamar discussed about a similar issue regarding FSDirectory some time ago [2]. May be this helps you with this issue. From

Re: [CLucene-dev] What is the difference between ascii mode and default mode?

2010-07-05 Thread Itamar Syn-Hershko
Hi, Index format and functionality are the same for both options, this is mainly a switch to allow adding CLucene also for projects using SBCS. For every new development - with or without CLucene, especially such that handles texts extensively, I'd recommend using a MBCS (namely Unicode).

Re: [CLucene-dev] cl_demo memory leaks discovery

2010-07-03 Thread Itamar Syn-Hershko
I tend to agree with Veit. Although JL doesn't have a finalize method, I think it does make sense to automatically call close in the destructor of both Index writer and reader. I'm not sure asserting in a destructor, or throwing an exception, will do any good. But is calling close() from
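The "close from the destructor" idea discussed above is ordinary C++ RAII. A minimal sketch follows, with a made-up `WriterSketch` class standing in for an index writer or reader. The counter parameter exists only so the behaviour can be observed, and nothing here reflects how CLucene actually implements it.

```cpp
#include <cassert>

// Hypothetical stand-in for an index writer/reader; not CLucene's class.
class WriterSketch {
public:
    explicit WriterSketch(int* closeCounter)
        : closeCounter_(closeCounter), closed_(false) {}

    // Copying would lead to a double close, so it is disabled.
    WriterSketch(const WriterSketch&) = delete;
    WriterSketch& operator=(const WriterSketch&) = delete;

    ~WriterSketch() {
        // Destructors must not let exceptions escape, so swallow them here.
        try { close(); } catch (...) {}
    }

    // Idempotent: an explicit close() followed by destruction closes only once.
    void close() {
        if (closed_) return;
        closed_ = true;
        ++*closeCounter_;  // a real writer would flush and release locks here
    }

private:
    int* closeCounter_;
    bool closed_;
};
```

The trade-off raised in the thread still applies: an error during an implicit close cannot be reported from a destructor, so callers who care about failures should keep calling `close()` explicitly.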

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-29 Thread Itamar Syn-Hershko
On 29/6/2010 12:04 PM, Kostka Bořivoj wrote: My cycle starts at this-postingsFreeCountDW, not at 0 Sorry, I misread you. I thought you just replaced the line within the loop. So yes, it seems to be the same, except with my solution you don't have to search for more copy/delete occurrences in

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-29 Thread Itamar Syn-Hershko
I tested it a bit, and overall it seems to work just fine. I checked in my changes into master and I'm signing off this issue. If you could test this further (using your app is just fine, but of course IndexWriter and DocumentsWriter tests are even better), check cl_demo for leaks and try to

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-29 Thread Itamar Syn-Hershko
On 29/6/2010 1:41 PM, Kostka Bořivoj wrote: I agree it works fine (and your way of nulling is definitely better than mine). I already indexed about 1GB of data, but I'm not sure about mem leaks, as my application memory increases constantly during indexing (and it didn't with previous

[CLucene-dev] Lucene In Action CLucene special: free chapter and a discount

2010-06-28 Thread Itamar Syn-Hershko
Hi all, Lucene In Action (2nd Edition), authored by Michael McCandless, Erik Hatcher and Otis Gospodnetić, is hands down the best guide to Lucene, the high-performance search engine library. It can help anyone start using Lucene or CLucene, and understand what is going on under the hood. It

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-28 Thread Itamar Syn-Hershko
Alrighty, seems like I have nailed it. See below + attached patch. On 29/6/2010 12:39 AM, Kostka Bořivoj wrote: I'm quite sure the problem is in postingsFreeListDW management: The postings after postingsFreeCountDW are used somewhere (but are still here in a list). If you remove block of free

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-26 Thread Itamar Syn-Hershko
Borek, a quick update: Apparently I was wrong. The 2 issues mentioned in JIRA 1072 were already fixed in 2.3.2, and the core patches attached to it weren't showing up in the release since other check-ins updated them to work differently. So, what you were experiencing is either a CLucene

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-24 Thread Itamar Syn-Hershko
... Borek -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Thursday, June 24, 2010 12:11 AM To: clucene-developers@lists.sourceforge.net Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing In IndexWriter.h (line 1163

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-24 Thread Itamar Syn-Hershko
the only way I see is to change method to be public. I'm not very happy doing so, but I cannot see any other way... Borek -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Thursday, June 24, 2010 12:11 AM To: clucene-developers

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-23 Thread Itamar Syn-Hershko
] vector subscript out of range exception during indexing I'm not sure which JLucene version I should use (and where to get it) Borek -Original Message- From: Itamar Syn-Hershko [mailto:ita...@code972.com] Sent: Wednesday, June 23, 2010 12:11 AM To: clucene-developers

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-23 Thread Itamar Syn-Hershko
] Sent: Wednesday, June 23, 2010 5:00 PM To: clucene-developers@lists.sourceforge.net Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing I'll try to port the whole TestDocumentsWriter, it is not so big -Original Message- From: Itamar Syn-Hershko

Re: [CLucene-dev] vector subscript out of range exception during indexing

2010-06-22 Thread Itamar Syn-Hershko
by postingsFreeListDW. Until now I was not able to find the reason. Borek -Original Message- From: Itamar Syn-Hershko [mailto:ita...@divrei-tora.com] Sent: Monday, June 21, 2010 2:08 PM To: clucene-developers@lists.sourceforge.net Subject: Re: [CLucene-dev] vector

Re: [CLucene-dev] Problem indexing accented characters.

2010-06-20 Thread Itamar Syn-Hershko
Looks like an encoding issue. Is the file being read correctly (check with your debugger)? Also, please post such questions to the CLucene user group. Itamar. -Original Message- From: Itziar Cortes [mailto:itz...@eleka.net] Sent: Sunday, June 20, 2010 12:21 PM To:

Re: [CLucene-dev] IndexModifier exception during destruction

2010-06-16 Thread Itamar Syn-Hershko
You are right, thanks. This is how JL does this too. I fixed this and committed to git as e75f0c...22e4 [1]. Do you have a way of reproducing this, so we can add a test case to our test suite? By the way, we no longer maintain 0.9.21 or the SVN repository, so you'll need to either pull this

Re: [CLucene-dev] clucene - cl_demo stops with error

2010-06-16 Thread Itamar Syn-Hershko
). The index merging and optimizing process takes an unusually (in my opinion) long time, as the index files combined are maybe a megabyte of disc space. weird. 2010/6/13 Itamar Syn-Hershko ita...@divrei-tora.com: Just to confirm: for both branches, cl_test works fine but cl_demo crashes

Re: [CLucene-dev] user data in an index

2010-06-05 Thread Itamar Syn-Hershko
Hi Norman, From what I could tell, UserData only became available in Lucene 2.9. Since CLucene follows JL's quite strictly, this isn't available yet as our latest code branch conforms to 2.3.2. To have UserData on the index level, you can add a dummy doc (see [1] for some tradeoffs). If you want

Re: [CLucene-dev] Saving RAMDirectory to disk

2010-05-24 Thread Itamar Syn-Hershko
Jiri, Indeed, for some weird reason this wasn't implemented in CLucene. Porting this from the original Java code [1] seems quite straightforward. I won't be able to do so myself in the next few weeks - would you perhaps want to provide us with a patch porting this function with tests for it (in

[CLucene-dev] [ANN] Better threading support in CLucene

2010-05-20 Thread Itamar Syn-Hershko
Hi all, Just to let you know the atomicthreads branch in our git repositories [1] is fixing many threading and cross-platform issues we found in the master branch's code. We have been testing this for a while now on several platforms, but would want some additional feedback from the community

Re: [CLucene-dev] Clucene search - Do not found some words

2010-04-26 Thread Itamar Syn-Hershko
CString is MFC's string object, and is TCHAR. Rui, the function we are actually interested in is m_GetFileContents. The error most likely lies there, in the way you are loading your text documents (which we already established are ANSI). Please also let us know how you compile your app with

Re: [CLucene-dev] Clucene search - Do not found some words

2010-04-23 Thread Itamar Syn-Hershko
Rui, This file is ANSI encoded. Are the other files, the ones you do succeed in finding, perhaps Unicode / UTF8 encoded? If that's the case your routine for loading the files is buggy. You should either have them all encoded using the same encoding, or have more intelligent code to convert incompatible

Re: [CLucene-dev] clucene next version?

2010-02-09 Thread Itamar Syn-Hershko
Alfredo, First, let me welcome you to our community. JFYI, our license is dual - LGPL / Apache 2.0. The latest release (from 2008) - 0.9.21b - is the one being widely used. There is ongoing work in our Git repository on a 2_3_2 branch, which conforms to Java Lucene 2.3.2. See

Re: [CLucene-dev] 2.3.2 build failure on Mac OS X 10.6.x

2010-01-13 Thread Itamar Syn-Hershko
Paul, Attached is the patch you need. It is confirmed to work on SL. I haven't committed it to master yet since I'm waiting on test results from Linux systems. Itamar. -Original Message- From: Paul J. Lucas [mailto:p...@lucasmail.org] Sent: Wednesday, January 13, 2010 22:54 To:

Re: [CLucene-dev] Help with error with xcode and clucene

2010-01-13 Thread Itamar Syn-Hershko
Please have a look at cl_demo, and the various tests in cl_test, for some sample code. Itamar. _ From: Marcelo Torres [mailto:primac...@gmail.com] Sent: Tuesday, January 12, 2010 22:16 To: clucene-developers@lists.sourceforge.net Subject: [CLucene-dev] Help with error with xcode and clucene

Re: [CLucene-dev] Problem tokenizing dash-prefixed words - GIT 2.3.2

2009-12-08 Thread Itamar Syn-Hershko
I will have a look soon. Anyway, JFYI, CLucene's implementation of StandardAnalyzer (mainly StandardTokenizer) differs from the current Java Lucene's one. Porting the current Java implementation shouldn't be too hard a task since it's jflex generated code -- perhaps if someone could contribute

Re: [CLucene-dev] CLucene Memory Management

2009-12-08 Thread Itamar Syn-Hershko
Henning, Both of your points are valid, and being worked on. Once we complete the port, and have a solid set of rules on the various cases where this question arises, we will write that down and have it available to all in the project docs. Implementing boost::shared_ptr is work on

Re: [CLucene-dev] PerFieldAnalyzerWrapper memory leak

2009-11-16 Thread Itamar Syn-Hershko
) These are tiny leaks but perhaps one of them is growing into a much larger leak with bigger input? I will run valgrind on the test program I sent you before and send you the output. Michael Levin wrote: It looks like the leaks have been fixed! Thank you so much!! :) Itamar Syn-Hershko wrote

Re: [CLucene-dev] StandardAnalyzer broken - GIT 364c21b6c3f54fbb90df223621b660197366fb93

2009-11-10 Thread Itamar Syn-Hershko
Celto, Just committed a fix for that, see commit c89f8a39fa1faa34374d8a6e92ae9c2467deeda7. Please test this and let me know. I used search() and Luke to verify this fix. Itamar. -Original Message- From: cel tix44 [mailto:celti...@gmail.com] Sent: Thursday, November 05, 2009 4:21 PM

Re: [CLucene-dev] Crash when StandardAnalyzer used with empty stop list - GIT ba954e917f6ac8d96230b307e7d807ace2ac5c35

2009-11-05 Thread Itamar Syn-Hershko
Hi, Running the following test, I hit no AV. If you can make a minified version of your use-case and send it over, that would help. I will look at the other issue when I have time later. If you or Michael could send a similar test function demonstrating that issue using the minimalistic

Re: [CLucene-dev] Score with no normalization

2009-10-14 Thread Itamar Syn-Hershko
Of course it does, this is Java not C++. In C++, what you'd do is derive your own class from HitCollector and implement the virtual function collect in it, something like: class MyCollector : public HitCollector { void collect(const int32_t doc, const float_t score){
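The snippet in that message is cut off, so the following is not a reconstruction of it. It is a separate, self-contained sketch of the same pattern: derive from a collector base class and override the virtual `collect` to receive each hit's score as the searcher passes it in. The `HitCollector` declared here is a stand-in written so the example compiles on its own, `RawScoreCollector` is a hypothetical name, and plain `float` is assumed in place of the `float_t` used in the message.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in base class so the sketch compiles alone; in real code the base
// would come from the library's headers instead.
class HitCollector {
public:
    virtual ~HitCollector() {}
    virtual void collect(const std::int32_t doc, const float score) = 0;
};

// Hypothetical collector that records every (doc, score) pair it is handed,
// without any post-processing of the score.
class RawScoreCollector : public HitCollector {
public:
    void collect(const std::int32_t doc, const float score) override {
        hits.push_back(std::make_pair(doc, score));
    }

    std::vector<std::pair<std::int32_t, float> > hits;
};
```

A collector like this would be passed to a search call that accepts a `HitCollector`; once the search returns, `hits` holds the scores exactly as they were handed to `collect`.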