Re: FileDocument - Confused and Need Help.

2007-09-03 Thread Andreas Guther
Blueyben, what you describe is a general Java time conversion problem, not a Lucene-related one. Search for "Java time format", which should lead you, among other links, to http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html. You might also be interested
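
As a rough illustration of the kind of conversion SimpleDateFormat handles, here is a minimal sketch. The original question is not shown in this snippet, so the pattern, the UTC time zone, and the class and method names are all assumptions chosen for the example, not what the poster needed.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class TimeFormatSketch {
    // Assumed pattern; the thread does not say which format is required.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumed time zone
        return fmt;
    }

    // Converts milliseconds since the epoch into a formatted string.
    static String format(long epochMillis) {
        return newFormat().format(new Date(epochMillis));
    }

    // Converts a formatted string back into milliseconds since the epoch.
    static long parse(String text) {
        try {
            return newFormat().parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not in format " + PATTERN + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        String text = format(0L);
        System.out.println(text);        // formatted epoch start
        System.out.println(parse(text)); // round-trips back to milliseconds
    }
}
```

Fixing the locale and time zone explicitly keeps the output independent of the machine's defaults, which is often the source of confusing time conversions.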

RE: Lucene Search Performance

2008-02-29 Thread Andreas Guther
Just a comment, and I understand that you cannot change your index: we organized our index by the creation date of entries. We limit our search to a given number of years, starting from the current year. Organizing the index that way allows us to remove outdated information.
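
The year-based organization described above could be sketched as follows. This is only an illustration under assumptions: the "index-<year>" directory naming and the class and method names are hypothetical, and the message does not say how the indexes are actually laid out or opened.

```java
import java.util.ArrayList;
import java.util.List;

class YearPartitionSketch {
    // Hypothetical naming scheme: one index directory per creation year.
    static String directoryFor(int year) {
        return "index-" + year;
    }

    // Lists the directories to search: the current year first, then going
    // back in time until yearsBack directories have been collected.
    static List<String> directoriesToSearch(int currentYear, int yearsBack) {
        List<String> dirs = new ArrayList<String>();
        for (int i = 0; i < yearsBack; i++) {
            dirs.add(directoryFor(currentYear - i));
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Search only the three most recent years.
        System.out.println(directoriesToSearch(2008, 3));
    }
}
```

With a layout like this, dropping outdated information amounts to no longer listing (or deleting) the directory for an old year, without touching the newer indexes.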

RE: Minalyzer

2006-12-30 Thread Andreas Guther
Hi, I am wary of tiny URL links sent to mailing lists. I would suggest including, in addition to the tiny URL, the full URL at the end of your email so everyone can verify it before clicking on it. Andreas -Original Message- From: Saurabh Dani [mailto:[EMAIL PROTECTED] Sent: Sat

RE: Minalyzer

2006-12-30 Thread Andreas Guther
@lucene.apache.org Cc: Andreas Guther Subject: RE: Minalyzer Andreas: I used tinyurl links as I did not want this e-mail to look like an effort to get back links for search engines. Here are the direct links: Steps to install and index a test log file -> http://www.minalyzer.com/documentatio

Index Write Access Strategies for Distributed Systems to Shared Index

2007-02-27 Thread Andreas Guther
Hi, I am looking for a best-practice recommendation regarding distributed write access to a Lucene index. We have the following scenario: * Our Lucene index is on a shared drive. * The Lucene lock folder is on the same shared drive. * Our web application writing to the index will run on multiple

Can Highlighter handle multiple word terms?

2007-03-02 Thread Andreas Guther
Hi, I am using the Highlighter class to highlight my search results. So far my observation is that the Highlighter does not highlight terms that consist of multiple words. For example if I have a text like "This is an example text for the highlighter" and my search term is "example highlighter"

RE: Can Highlighter handle multiple word terms?

2007-03-03 Thread Andreas Guther
null fragmenter? Are you using the QueryScorer that you can pass a field name to restrict highlighting to a certain field? Do you have any example code of what you are attempting? Have you looked at the test code for examples of how to use the Highlighter? - Mark Andreas Guther wrote: >

Speeding up looping over Hits

2007-03-22 Thread Andreas Guther
Hi, While looking into performance enhancements for our search feature I noticed a significant difference in Document access time while looping over Hits. I wrote a test application that searches for a list of search terms and then, for each returned Hits object, loops twice over every single hits.doc(i).

Index File System Limits

2007-04-24 Thread Andreas Guther
I am currently dealing with Lucene indexes of about 8 GB in size. Searching is fast but retrieving documents slows down the process of returning results to the user. Also the index is updated very frequently, about 3 times a minute and more. This leads to an index that grows very fast in number o

Index Update Strategies

2007-04-25 Thread Andreas Guther
Hi, we have an index of several GB in size which is updated very frequently, about every 2 seconds. Though it is desirable to have changes reflected in the index as soon as possible, I wonder if these frequent updates can have a negative effect on the search and data retrieval performance. Would it make m

Index Locking and NFS

2007-04-26 Thread Andreas Guther
Hi, I found the following recommendation in Lucene in Action by Erik and Otis about where to put Lucene lock files: "Because of known issues with lock files and NFS, choose a directory that doesn't reside on an NFS volume." Could someone please help me understand what those known issues exactl

Locking in Lucene 2.1

2007-05-09 Thread Andreas Guther
I am in the process of migrating from Lucene 2.0 to Lucene 2.1. From reading the Changes document I understand that the write locks are now written into the index folder instead of the java.io.tmpdir. In the "Apache Lucene - Index File Formats" document in section "6.2 Lock File" I read that ther

RE: Locking in Lucene 2.1

2007-05-09 Thread Andreas Guther
I opened an issue: https://issues.apache.org/jira/browse/LUCENE-877 -Original Message- From: Daniel Naber [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 09, 2007 1:37 PM To: java-user@lucene.apache.org Subject: Re: Locking in Lucene 2.1 On Wednesday 09 May 2007 21:18, Andreas Guther

How to re-open the IndexSearcher's IndexReader

2007-05-10 Thread Andreas Guther
Hi, How can I re-use an IndexSearcher and keep track of changes to the index? I am dealing with index directories of several GB. Opening an IndexSearcher is very expensive and can take several seconds. Therefore I am caching the IndexSearcher for re-use. Our indexes are frequently updated.
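
The caching idea in this question can be sketched generically: keep the expensive object around and reopen it only when the underlying index has changed. The sketch below deliberately avoids Lucene classes so that it stays self-contained. ReopeningCache, the version supplier, and the opener are hypothetical stand-ins; in a real application the staleness check would come from the Lucene version in use (IndexReader.isCurrent() is discussed in a later thread on this page).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class ReopeningCache<T> {
    private final LongSupplier currentVersion; // hypothetical: reports the index version on disk
    private final Supplier<T> opener;          // hypothetical: performs the expensive open
    private T cached;
    private long cachedVersion;

    ReopeningCache(LongSupplier currentVersion, Supplier<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Returns the cached instance, reopening only if the version has changed.
    // A real implementation would also have to close the old instance once
    // no in-flight search is still using it; that part is omitted here.
    synchronized T get() {
        long version = currentVersion.getAsLong();
        if (cached == null || version != cachedVersion) {
            cached = opener.get();
            cachedVersion = version;
        }
        return cached;
    }
}
```

The cost of this pattern is dominated by how expensive the version check itself is, so it only pays off when checking is much cheaper than reopening.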

RE: How to re-open the IndexSearcher's IndexReader

2007-05-10 Thread Andreas Guther
Maybe I should add that I am currently using Lucene 2.0. From other threads I get the impression that this might be solved in Lucene 2.1. -Original Message- From: Andreas Guther [mailto:[EMAIL PROTECTED] Sent: Thursday, May 10, 2007 10:33 PM To: java-user@lucene.apache.org Subject: How

IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Andreas Guther
I moved today from Lucene 2.0 to 2.1 and I noticed that the IndexReader.isCurrent() call is very expensive. What took 20 milliseconds in 2.0 now takes seconds in 2.1. I have the following scenario: - 7 index directories of different sizes, ranging from some MB to 5 GB - Some indexes are upgraded

RE: IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Andreas Guther
We have everything on Windows NTFS. Our index folders are on a server and accessed via a shared drive. I haven't optimized the folders yet but after doing optimization on a test folder I noticed that we have very few files left. That might help. I am going to optimize all folders now and then

RE: IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Andreas Guther
Chris, I have optimized our index directories using the compound index format. I have also moved the index directories for testing purposes local to the search process (before it was over network and shared NTFS file system). Now the time for getting the isCurrent information is negligible, i.e.

Re: IndexReader.isCurrent very slow in 2.1

2007-05-12 Thread Andreas Guther
Erick On 5/11/07, Andreas Guther <[EMAIL PROTECTED]> wrote: > > Chris, > > I have optimized our index directories using the compound index format. > I have also moved the index directories for testing purposes local to > the search process (before it was over network

Field.Store.Compress - does it improve performance of document reads?

2007-05-16 Thread Andreas Guther
I am currently exploring how to solve performance problems I encounter with Lucene document reads. We have, amongst other fields, one field (default) storing all searchable fields. This field can become quite large since we are indexing documents and store the content for display within

Re: Field.Store.Compress - does it improve performance of document reads?

2007-05-17 Thread Andreas Guther
y 17 May 2007 08:10, Andreas Guther wrote: >> I am currently exploring how to solve performance problems I >> encounter with >> Lucene document reads. >> >> We have amongst other fields one field (default) storing all >> searchable >> fields. This f

Re: Field.Store.Compress - does it improve performance of document reads?

2007-05-17 Thread Andreas Guther
wrote: - Original Message From: Paul Elschot <[EMAIL PROTECTED]> On Thursday 17 May 2007 08:10, Andreas Guther wrote: > I am currently exploring how to solve performance problems I encounter with > Lucene document reads. > > We have amongst other fields one field (default

Re: Field.Store.Compress - does it improve performance of document reads?

2007-05-20 Thread Andreas Guther
Original Message > From: Paul Elschot <[EMAIL PROTECTED]> > > On Thursday 17 May 2007 08:10, Andreas Guther wrote: > > I am currently exploring how to solve performance problems I encounter with > > Lucene document reads. > > > > We have amongst other f

How to filter fields with hits from result set

2007-05-23 Thread Andreas Guther
Hi, If a search returns a document that has multiple fields with the same name, is there a way to filter only those fields that contain hits? Background: I am indexing documents and we store all content in our index for display reasons. We want to show only those pages containing hits. My fir

RE: How to filter fields with hits from result set

2007-05-23 Thread Andreas Guther
On 5/23/07, Andreas Guther <[EMAIL PROTECTED]> wrote: > > Hi, > > If a search returns a document that has multiple fields with the same > name, is there a way to filter only those fields that contain hits? > > > Background: > > I am indexing documents and we sto

RE: How to filter fields with hits from result set

2007-05-24 Thread Andreas Guther
ents of the application. Any time you can avoid supporting arbitrary boolean logic for the user input, your job is easier But you should be able to run up a demo with simple queries that you control to prove out the methodology in any case. Best Erick On 5/23/07, Andreas Guther <[EMAIL PRO

Re: How to filter fields with hits from result set

2007-05-26 Thread Andreas Guther
of others under the unlikely subject "multiword highlighting" for Mark's cut at the code and other helpful comments. I can't really give you performance numbers since we didn't collect them. It's "fast enough" that the customers aren't complaining, and the

Re: Using Lucene to search Multiple Databases

2007-06-17 Thread Andreas Guther
Rajat, I don't know about the Web Interface you are mentioning but the task can be done with a little bit of coding on your side. I would suggest indexing each database in its own index, which makes access easy to control. To find matches you will need to use a MultiSearcher. All

Re: Lucene index performance

2007-06-17 Thread Andreas Guther
Searching on multiple index files is incredibly fast. We have 10 different index folders with different sizes. All folders together have a size of 7 GB. Results usually come back in less than 50 ms. Getting results out of the index, i.e. reading documents, is expensive and you will have to spe

Luke Enhancement Suggestions

2007-06-22 Thread Andreas Guther
Hi, I am using your tool a lot and it helps me tremendously in analyzing our different indexes. I highly appreciate the work and effort you put into this tool. This is an enhancement suggestion. It would be great if Luke could remember the following: 1) Which Analyzer was selected last time. I

RE: Lucene index performance

2007-06-22 Thread Andreas Guther
1000. 2) Are these index files located in a single machine or distributed into multiple machines? 3) How do you distribute the document into several index files? Thanks a lot, Li -Original Message- From: Andreas Guther [mailto:[EMAIL PROTECTED] Sent: Monday, June 18, 2007 4:00 AM To