Lucene Index backboned by DB

2005-11-15 Thread Karel Tejnora
Hi all, we are using Lucene 1.4.3 in our test application. Thank you guys for the great job. We have an index of around 12 GiB in one (merged) file. Retrieving hits takes a nicely small amount of time, but reading the stored fields takes 10-100 times longer. I think because all the fields

Analyzers, perfect hash, ICU

2006-01-11 Thread Karel Tejnora
Hi all, I'm working on an analyzer for the Slavic Latin-script languages (cs, sk), without stemming at first. I would like to ask you: the StopWord analyzer often uses a HashSet implementation, but the stop words are rarely (if ever) changed from those shipped in the Java code. Do you think that is the

Re: ThaiAnalyzer for Lucene

2006-02-22 Thread Karel Tejnora
Hi guys, I share the same problem: my Czech analyzer has a dependency on icu4j. My opinion is to put an interface between your code and icu4j, because the new JDK 1.6 should include more features from icu4j. Samphan, you can also look at the Stempel algorithm (http://getopt.org/stempel/), even I
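The "interface between your code and icu4j" idea in the message above could be sketched roughly as follows. This is only an illustration: the names TextFolder and JdkTextFolder are hypothetical, and accent folding is an assumed example of the kind of feature an analyzer might take from icu4j (the message does not say which icu4j features are used). The JDK-only implementation relies on java.text.Normalizer, which is available from Java 6 onward; an icu4j-backed class could implement the same interface.

```java
import java.text.Normalizer;

/** Hypothetical seam: the analyzer depends on this interface, not on icu4j classes. */
interface TextFolder {
    String fold(String input);
}

/** JDK-only implementation (assumption: accent folding is the feature needed). */
final class JdkTextFolder implements TextFolder {
    @Override
    public String fold(String input) {
        // Decompose accented letters into base letter + combining marks (NFD).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the combining marks, leaving only the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

With this seam in place, the analyzer would only ever call TextFolder.fold(...), so swapping an icu4j-backed implementation for the JDK one would not touch the analyzer code.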

Re: [jira] Commented: (LUCENE-555) Index Corruption

2006-04-25 Thread Karel Tejnora
OK, then the indexer indexes into a separate directory (a sequence of directories, e.g. 1/ 2/ 3/ 4/) with create=true. [transaction log] Then it merges the newly created index into the 'for-search' index. A backup is a copy of the 'for-search' index; rollforward is then IndexWriter addIndexes(...) of the indexes newer than the backup image. rollbackward

IndexWriter mergeSegments

2006-05-02 Thread Karel Tejnora
Hi, I found a small issue when I add a 10GB index to a 20GB index using addIndexes with useCompoundFile == true. Before the compound file is created, the segments info is written, but it points to a non-existing compound file; then a new .tmp is created and renamed to .cfs. Between the time when the new segments wa

[jira] Created: (LUCENE-592) Create compound file after addIndexes but before rewrite of segments

2006-06-07 Thread Karel Tejnora (JIRA)
: Index Versions: 2.0.0, 1.9 Reporter: Karel Tejnora Priority: Minor When the compound file format is used, the new 'segments' file is written before the cfs is created. If there is an exception (disk full, etc.), or the index is opened before the cfs exists, segments points to a non-existing file
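The ordering problem described in the report above can be illustrated with a minimal sketch, assuming a plain directory where a "segments" pointer file names a data file. This is not Lucene's IndexWriter code and not the attached patch; the class CommitOrderSketch, its methods, and the "segments.new" temporary name are hypothetical, and it uses the modern java.nio.file API rather than anything available in Lucene 1.9/2.0. The point is only the order: the data file is fully written and renamed into place before the pointer that names it is rewritten.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration of "data file first, pointer file last". */
final class CommitOrderSketch {

    /** Writes the data file, then publishes its name in the pointer file. */
    static void commit(Path dir, String dataName, byte[] data) throws IOException {
        Path tmp = dir.resolve(dataName + ".tmp");
        Path dataFile = dir.resolve(dataName);

        // Step 1: write the payload under a temporary name. If this fails
        // (disk full, etc.), the existing pointer file is untouched.
        Files.write(tmp, data);
        // Step 2: rename the payload to its final name.
        Files.move(tmp, dataFile, StandardCopyOption.REPLACE_EXISTING);

        // Step 3: only now rewrite the pointer so it names the new data file.
        Path pointerTmp = dir.resolve("segments.new");
        Files.write(pointerTmp, dataName.getBytes(StandardCharsets.UTF_8));
        Files.move(pointerTmp, dir.resolve("segments"),
                StandardCopyOption.REPLACE_EXISTING);
    }

    /** Returns true if the pointer is absent or names a file that exists. */
    static boolean isConsistent(Path dir) throws IOException {
        Path pointer = dir.resolve("segments");
        if (!Files.exists(pointer)) {
            return true; // nothing has been published yet
        }
        String name = new String(Files.readAllBytes(pointer), StandardCharsets.UTF_8);
        return Files.exists(dir.resolve(name));
    }
}
```

With this ordering, a reader that opens the directory at any point between the steps sees either the old pointer or a new pointer whose target already exists.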

[jira] Updated: (LUCENE-592) Create compound file after addIndexes but before rewrite of segments

2006-06-07 Thread Karel Tejnora (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-592?page=all ] Karel Tejnora updated LUCENE-592: - Attachment: createCfthanSegments.diff The patch swaps the described parts of the code in IndexWriter. This is my first use of JIRA, svn and diff, please be patient if

[jira] Created: (LUCENE-663) New feature rich higlighter for Lucene.

2006-08-22 Thread Karel Tejnora (JIRA)
Reporter: Karel Tejnora Attachments: lucene-hlt-src.jar Well, I refactored (took) some code from the two previous highlighters. This highlighter: + uses TermPositionVector where available + uses the Analyzer if no TermPositionVector is found or if it is forced to + supports all Lucene queries (Term

[jira] Commented: (LUCENE-663) New feature rich higlighter for Lucene.

2006-08-22 Thread Karel Tejnora (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-663?page=comments#action_12429848 ] Karel Tejnora commented on LUCENE-663: -- Hi, yes, as I wrote in the code (which keeps the author), I borrowed small parts of code from this contribution http

[jira] Commented: (LUCENE-663) New feature rich higlighter for Lucene.

2006-10-13 Thread Karel Tejnora (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-663?page=comments#action_12442203 ] Karel Tejnora commented on LUCENE-663: -- [[ Old comment, sent by email on Wed, 23 Aug 2006 02:21:04 +0200 ]] It is too late here... to on text karel