RE: Setting the COMMIT lock timeout.

2006-03-14 Thread Jim Bedford-roberts
Thanks for your prompt response! You ask about the use case. We have a series 
of similar intranet sites, each represented by a separate Tomcat application 
instance using the same code base but with different start-up parameters. The 
intranets all provide a common search function based on the same underlying 
index.

Admittedly we could have developed a single central search component, but given 
the way the code has evolved, our current approach is simplest for us. With 
separate application instances sharing access to the same index, we are getting 
occasional COMMIT lock timeouts even while using singleton IndexSearchers in 
each application. 

-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED] 
Sent: 13 March 2006 23:23
To: java-user@lucene.apache.org
Subject: Re: Setting the COMMIT lock timeout.

On Monday 13 March 2006 22:24, Bill Janssen wrote:

> The default value isn't magic.  The appropriate value is
> context-specific.  I've got some people using Lucene on machines with
> slow disks, and we need to be able to increase the WRITE_LOCK_TIMEOUT
> to prevent entirely random lossage.

Here's a patch (I hope it gets through). Let me know if it's okay and I will 
commit it.

Regards
 Daniel

-- 
http://www.danielnaber.de

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



who can tell me how lucene search in the index files

2006-03-14 Thread hu andy
I see there are seven different files with extensions .fnm, .tis, etc. I
can't work out how a term is looked up in the .tis file. Does Lucene use
binary search to locate the term?


Write.lock error with spellchecker

2006-03-14 Thread Madhusudan, Veda (Norcross, DAV)
I am trying to use the spellchecker plugin with Lucene 1.2. I get the
following exception when my SpellIndexer class tries to create the spell
index. The new directory is being created with all the correct
permissions. There is no write.lock file being created. Has anyone run
into a similar issue? Does this have to do with Lucene 1.2?

 

Exception in thread "main" java.io.IOException: couldn't delete write.lock
    at org.apache.lucene.store.FSDirectory.deleteFile(Unknown Source)
    at org.apache.lucene.index.IndexReader.unlock(Unknown Source)
    at org.apache.lucene.search.spell.SpellChecker.indexDictionnary(Unknown Source)
    at com.unisource.ecom.search.lucene.SpellIndexer.createSpellIndex(SpellIndexer.java:35)
    at com.unisource.ecom.search.lucene.SpellIndexer.main(SpellIndexer.java:56)

 

Thanks,

Veda



IndexFiles.java

2006-03-14 Thread Miki Sun
Hiya

I am a beginner with Lucene. I am trying to use IndexFiles.java to index my
text file directories, but it does not work. It always gives me this
error message, even when I comment it out:

Usage: java org.apache.lucene.demo.IndexFiles 

What does "if (args.length == 0) " mean?

Thanks

Miki




Searching in paths

2006-03-14 Thread Java Programmer
Hello,
I have a problem with indexing and querying paths. For example, I put
"/home/users/apache/txt/qqq__docu.txt" in a field called "path". I wanted to
submit a query to find all documents that are provided by my user apache, so
I tried to query Lucene with AND path:/home/users/* but no results were found
by that query. If I query any other field, one without a /, results are
returned, e.g. AND title natio*.
What am I doing wrong? What can I do to query for paths (and everything
below them)?

Best Regards,
Adr


Re: IndexFiles.java

2006-03-14 Thread Otis Gospodnetic
It looks like you are not specifying the directory you want to index.

Otis




Re: IndexFiles.java

2006-03-14 Thread Miki Sun
I think I did. I modified this code:

// create a directory to write the indices to
static final File INDEX_DIR = new File(File.separator + "Bible_index");

// specify the directory to be indexed
final File docDir = new File(File.separator + "Bible/1/");

Where else should I change?

Thanks a lot!



--
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Smartweb Technologies Centre
School of Computing
St Andrew Street
Aberdeen AB25 1HG
Tel: +44 (0)1224 - 262479
Web: http://athena.comp.rgu.ac.uk/staff/ms/
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *




Re: IndexFiles.java

2006-03-14 Thread Joe Scanlon
You need to specify it on the command line, i.e.:

java org.apache.lucene.demo.IndexFiles 'type in your starting directory
here'
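
For context on the earlier question about "if (args.length == 0)": args holds whatever was typed after the class name on the command line, so the check is true exactly when no directory was given, and the usage message is printed. The following is a minimal sketch of that pattern. ArgsCheckSketch and docDirFrom are hypothetical names for illustration, not the demo's actual source.

```java
import java.io.File;

// Hypothetical stand-in for the demo's argument handling; not the real IndexFiles source.
class ArgsCheckSketch {
    /** Returns the directory named on the command line, or null if none was given. */
    static File docDirFrom(String[] args) {
        // args is empty when the program is started with no arguments
        if (args.length == 0) {
            return null;
        }
        return new File(args[0]);
    }

    public static void main(String[] args) {
        File docDir = docDirFrom(args);
        if (docDir == null) {
            // Same idea as the demo's usage message: nothing was passed in
            System.err.println("Usage: java ArgsCheckSketch <directory to index>");
            System.exit(1);
        }
        System.out.println("Would index: " + docDir.getPath());
    }
}
```

Running "java ArgsCheckSketch" alone takes the usage branch, while "java ArgsCheckSketch Bible" does not. Most IDEs have a run setting for program arguments that serves the same purpose; where Kawa keeps that setting is not covered here.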




Re: IndexFiles.java

2006-03-14 Thread Miki Sun
How do you do it using Kawa? I am not familiar with command line operations.

Thanks






RE: Searching in paths

2006-03-14 Thread Mordo, Aviran (EXP N-NANNATEK)
You need to index the field as a keyword, or use an analyzer that will
not strip the / from the string.

Aviran
http://www.aviransplace.com 
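
To illustrate the advice above: a tokenizing analyzer breaks the path into separate terms, so no single indexed term starts with "/home/users/", whereas a keyword field keeps the whole path as one term that a prefix query can match. The sketch below is plain Java, not Lucene code. The tokenize method is only a rough stand-in for an analyzer that drops punctuation, and all names in it are hypothetical. In Lucene itself, the keyword route is, to my knowledge, Field.Keyword(name, value) in 1.4 and Field.Index.UN_TOKENIZED in 1.9.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration; tokenize() only approximates what a tokenizing analyzer does.
class PathFieldSketch {
    /** Lowercases the value and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Keyword-style indexing: the whole value becomes a single term. */
    static List<String> keyword(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    /** A prefix query matches if any single indexed term starts with the prefix. */
    static boolean prefixMatches(List<String> terms, String prefix) {
        for (String t : terms) {
            if (t.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With the path from the question, tokenize yields terms such as "home", "users" and "apache", none of which starts with "/home/users/", while keyword yields the full path, which does.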






Re: Setting the COMMIT lock timeout.

2006-03-14 Thread Daniel Naber
On Tuesday 14 March 2006 10:52, Jim Bedford-roberts wrote:

> Admittedly we could have developed a single central search component,
> but given the way the code has evolved our current approach is simplest
> for us. With separate application instances sharing access to the same
> index we are getting occasional COMMIT lock time outs even while using
> singleton IndexSearchers in each application.

Have you already tried using Lucene 1.9 without my patch? There was 
another bug in 1.4 that made the default timeout not work. From the 
changelog:

7. Getting a lock file with Lock.obtain(long) was supposed to wait for
a given amount of milliseconds, but this didn't work.
(John Wang via Daniel Naber, Bug #33799)

Regards
 Daniel

-- 
http://www.danielnaber.de
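
As background for the changelog entry quoted above, here is a minimal sketch of what "wait for a given amount of milliseconds" means for a file-based lock shared by several processes: keep retrying an atomic file creation until it succeeds or the timeout elapses. This is not Lucene's implementation. LockTimeoutSketch, obtain and release are hypothetical names, and the poll interval is an assumption.

```java
import java.io.File;
import java.io.IOException;

// Illustrative only; not Lucene's Lock class.
class LockTimeoutSketch {
    /** Tries to create lockFile, polling every pollMillis until timeoutMillis has elapsed. */
    static boolean obtain(File lockFile, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                // createNewFile is atomic: only one caller can create the file
                if (lockFile.createNewFile()) {
                    return true;
                }
            } catch (IOException e) {
                return false; // this sketch treats I/O trouble as "could not lock"
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out while another holder kept the lock
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Releases the lock by deleting the lock file. */
    static void release(File lockFile) {
        lockFile.delete();
    }
}
```

If the retry loop is broken and gives up after the first attempt, any brief overlap between two application instances shows up as a timeout, which would be consistent with the occasional failures described in this thread.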




Re: Good MMapDirectory performance

2006-03-14 Thread Peter Keegan
> I read from Peter Keegan's recent postings:
> "The Lucene server is using MMapDirectory. I'm running
>  the jvm with -Xmx16000M. Peak memory usage of the jvm
>  on Linux is about 6GB and 7.8GB on windows."
> We don't have nearly as much memory as Peter but I
> wonder whether he is gaining anything with such
> a large heap.

My application gets better throughput with more VM, but that is probably due
to heavy use of ByteBuffers in the application, not VM for Lucene.

Peter



On 3/12/06, kent.fitch <[EMAIL PROTECTED]> wrote:
>
> I thought I'd post some good news about MMapDirectory as
> the comments in the release notes are quite downbeat about
> its performance.  In some environments MMapDirectory
> provides a big improvement.
>
> Our test application is an index of 11.4 million
> documents which are derived from MARC (bibliographic)
> catalogue records.  Our aim is to build a system
> to demonstrate relevance ranking and result clustering
> for library union catalogue searching (a "union"
> catalogue accumulates/merges records from multiple
> libraries).
>
> Our main index component sizes:
> fdt 17GB
> fdx 91MB
> tis 82MB
> frq 45MB
> prx 11MB
> tii 1.2 MB
>
> We have a separate Lucene index (not discussed further)
> which stores the MARC records.
>
> Each document has many fields.   We'll probably reduce the
> number after we decide on the best search strategies, but
> lots of fields give us lots of flexibility whilst testing
> search and ranking strategies.
>
> Stored and unindexed fields, used for summary results:
>   display title
>   display author
>   display publication details
>   holdingsCount (number of libraries holding)
>
> Tokenized indices:
>   title
>   author
>   subject
>   genre
>   keyword (all text)
>
> Keyword (untokenized) indices:
>   title
>   author
>   subject
>   genre
>   audience
>   Dewey/LC classification
>   language
>   isbn/issn
>   publication date (date range code)
>   unique bibliographic id
>
> "Wildcard" Tokenized indices created by a custom "stub"
> analyzer which reduces a term to its first few characters:
>   title
>   author
>   subject
>   keyword
>
> Field boosts are set for some fields.  For example, "title",
> "sub title", "series title" and "component title" are all
> stored as "title" but with different field boosts (as a
> match on normal title is deemed more relevant than a match
> on series title).
>
> The document boost is set to the sqrt of the holdingsCount
> (favouring "popular" resources).
>
> The user interface supports searching and refining searches
> on specific fields but the most common search is created
> from a single google style search box.  Here's a typical
> query generated from a 2 word search:
>
> +(titleWords:"franz kafka^4.0"
>   authorWords:"franz kafka^3.0"
>   subjectWords:"franz kafka^3.0"
>   keywords:"franz kafka^1.4"
>   title:franz kafka^4.0
>   (+titleWords:franz +titleWords:kafka^3.0)
>   author:franz kafka^3.0
>   (+authorWords:franz +authorWords:kafka^2.0)
>   subject:franz kafka^3.0
>   (+subjectWords:franz +subjectWords:kafka^1.5)
>   (+genreWords:franz +genreWords:kafka^2.0)
>   (+keywords:franz +keywords:kafka)
>   (+titleWildcard:fra +titleWildcard:kaf^0.7)
>   (+authorWildcard:fra +authorWildcard:kaf^0.7)
>   (+subjectWildcard:fra +subjectWildcard:kaf^0.7)
>   (+keywordWildcard:fra +keywordWildcard:kaf^0.2)
> )
>
> It generated 1635 hits.  We then read the first 700
> documents in the hit list and extract the date, subject,
> author, genre, Dewey/LC classification and audience
> fields for each, accumulating the popularity of each.
>
> Using this data, for each of the subject, author, genre,
> Dewey/LC and audience categories, we find the 30 most
> popular field values and for each of these we query the
> index to find their frequency in the entire index.
>
> We then render the first 100 document results (title,
> author, publication details, holdings) and the top 30
> for each of subject, author, genre, Dewey/LC and audience,
> ordering each list by the popularity of the term in the
> hit results (sample of the first 700) and rendering the
> size of the text based on the frequency of the term in
> the entire database (a bit like the Flickr tag popularity
> lists).  We also render a graph of hit results by date
> range.
>
> The initial search is very quick - typically a few
> tens of milliseconds.  The "clustering" takes
> much longer - reading up to 700 records, extracting
> all those fields, sorting to get the top 30 of each
> field category, looking up the frequency of each term
> in the database.
>
> The test machine was a SunFire440 with 2 x 1.593GHz
> UltraSPARC-IIIi processors and 8GB of memory running
> Solaris 9, Java 1.5 in 64 bit mode, Jetty. The Lucene data
> directory is stored on a local 10K SCSI disk.
>
> The benchmark consisted of running 13,142 representative
> and unique search phrases collected from another system.
> The search phrases are unsorted.  The client (testing)
> system is run on a
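
The tallying step described in the quoted post (read the first 700 hits, count how often each subject, author or genre value occurs, keep the 30 most popular) can be sketched as follows. This is a plain-Java illustration under assumptions: the values are already extracted into a list, and TopValuesSketch and topValues are hypothetical names. Reading the stored fields from Lucene hits is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; the real system would feed this from stored fields of the top hits.
class TopValuesSketch {
    /** Counts each value and returns the n most frequent ones, most frequent first. */
    static List<String> topValues(List<String> values, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by descending count; break ties alphabetically so the order is deterministic
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

In the setup described, this would run once per category (subject, author, genre, classification, audience) with n = 30, followed by one index lookup per surviving value to get its frequency in the whole index.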

Re: who can tell me how lucene search in the index files

2006-03-14 Thread Daniel Noll

hu andy wrote:

> I see there are seven different files with extensions .fnm, .tis, etc. I
> can't work out how a term is looked up in the .tis file. Does Lucene use
> binary search to locate the term?


See TermInfosReader.

It loads the .tii file into memory. The .tii file contains every Nth 
entry of the .tis file, each with a pointer to its real location in the 
.tis file.


When Lucene looks for a term, it does a binary search through this 
reduced index to find which section of the .tis file the term falls in, 
and then scans that section linearly until it finds the term.
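
The lookup described above can be sketched as follows. This is a plain-Java illustration, not TermInfosReader itself: the full sorted array plays the role of the .tis file, the sampled array plays the role of the in-memory .tii index, and SparseTermIndexSketch, find and interval are hypothetical names.

```java
// Illustrative only; arrays stand in for the on-disk .tis file and the in-memory .tii index.
class SparseTermIndexSketch {
    /**
     * terms:    the full sorted term list (the .tis role)
     * interval: every interval-th term is sampled into the reduced index (the .tii role)
     * Returns the position of target in terms, or -1 if it is absent.
     */
    static int find(String[] terms, int interval, String target) {
        // Build the reduced index from entries 0, interval, 2*interval, ...
        int sampled = (terms.length + interval - 1) / interval;
        String[] index = new String[sampled];
        for (int i = 0; i < sampled; i++) {
            index[i] = terms[i * interval];
        }

        // Binary search for the last sampled term that is <= target
        int lo = 0, hi = sampled - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid].compareTo(target) <= 0) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (block < 0) {
            return -1; // target sorts before the first term
        }

        // Linear scan through the one section that could contain the term
        int start = block * interval;
        int end = Math.min(start + interval, terms.length);
        for (int i = start; i < end; i++) {
            int cmp = terms[i].compareTo(target);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the spot where the term would be
            }
        }
        return -1;
    }
}
```

The design trades memory for disk reads: only one term in every interval is held in memory, and at most interval entries are scanned per lookup.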


Daniel

--
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/                         Fax: +61 2 9212 6902




Add a module to the lucene

2006-03-14 Thread jason
Hi,

Can we add a module to Lucene so that we are able to use our own similarity
measure to calculate the similarity between documents and queries? Since
Lucene defines its own measure, there is little we can do to change it.

If the documents and queries are represented as vectors, we only
need one class to read the vectors and apply our own measure to
calculate their similarity.

What do you think?

regards
jason




Add more module to the lucene

2006-03-14 Thread jason
Hi,

Can we add more modules to Lucene so that we can easily use our own
measures to calculate similarity between documents and queries? I have read
some of the original Lucene code, and I don't think it is easy to change the
similarity measure used. But I think we can build a module that reads
the document vectors from the index structure. Then we can use our own
similarity measures.


FYI.

Regards

jason.
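
As an illustration of the idea in these posts, here is a minimal sketch of one such measure, cosine similarity, computed over term-frequency vectors. It is plain Java with hypothetical names (CosineSketch, cosine) and assumes the vectors have already been read into maps. On the Lucene side, to my knowledge the existing hooks are the Similarity class (set via Searcher.setSimilarity) and, for reading per-document vectors, IndexReader.getTermFreqVector, which requires term vectors to have been stored at index time.

```java
import java.util.Map;

// Illustrative only; the maps stand in for term vectors read from an index.
class CosineSketch {
    /** Cosine of the angle between two term-frequency vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int freq = e.getValue();
            normA += freq * freq;
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += freq * other; // only shared terms contribute to the dot product
            }
        }
        for (int freq : b.values()) {
            normB += freq * freq;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Any other measure over the same two maps could be swapped in without changing how the vectors are obtained.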


lucene query analysis

2006-03-14 Thread Raghavendra Prabhu
Hi

The problem I am facing is that queries are case sensitive.

If I type in capital letters I get no results, and if I type in
lowercase letters I do.

Is there any way to do a case conversion?

I am currently using a WhitespaceAnalyzer. Which analyzer should I change to?


Rgds
Prabhu
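
No reply to this question appears in this part of the archive. As a hedged illustration of the underlying issue: WhitespaceAnalyzer only splits on whitespace and leaves case untouched, so "BIG" and "big" are different terms. Analyzers that lowercase (SimpleAnalyzer and StandardAnalyzer both do) avoid this, provided the same analyzer is used for indexing and for query parsing. The sketch below is plain Java with hypothetical names, not Lucene code.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only; analyze() stands in for an analyzer with or without lowercasing.
class CaseFoldingSketch {
    /** Splits text on whitespace and optionally lowercases each term. */
    static Set<String> analyze(String text, boolean lowercase) {
        Set<String> terms = new HashSet<>();
        for (String t : text.trim().split("\\s+")) {
            if (t.isEmpty()) {
                continue;
            }
            terms.add(lowercase ? t.toLowerCase(Locale.ROOT) : t);
        }
        return terms;
    }

    /** True if every query term is among the indexed terms, using the same analysis for both. */
    static boolean matches(String indexedText, String query, boolean lowercase) {
        return analyze(indexedText, lowercase).containsAll(analyze(query, lowercase));
    }
}
```

Without lowercasing, the query "BIG" does not match text containing "big"; with lowercasing applied on both sides, it does. Note that an index built without lowercasing would need to be rebuilt after switching analyzers.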