Re: Efficient document spooling and indexing

2001-11-22 Thread Otis Gospodnetic

Yes, I think that you are correct, since I see my index directory
growing as I add documents to the index, even though I don't call
close() until I'm finished adding all documents.
Hm, I wonder what exactly gets written to the disk between add and
close.
I shall rewrite my stuff to use RAMDirectory then.  I like efficient
code and hate wasting any kind of resources, not just computing ones.

Thanks,
Otis


--- Ian Lea [EMAIL PROTECTED] wrote:
 Data may not be committed to disk, buffers flushed, files
 closed, etc. until IndexWriter.close() is called, but file
 IO does happen before then.  So I would expect the answer
 to your question to be no.
 
 
 --
 Ian.
 [EMAIL PROTECTED]
 
 
 Otis Gospodnetic wrote:
  
  Hello,
  
  This is from a thread from about 2 weeks ago.
  What is the answer to this question?
  If data is written to disk only when IndexWriter's close() is
 called,
  wouldn't the sample code below be as efficient as the sample code
 that
  uses RAMDirectory, further down?
  
  Thanks,
  Otis
  
  
  When using the FSWriter, the actual file io doesn't occur until I
 close
  the writer, right?  So wouldn't it be just as efficient to do the
  following:
  
  IndexWriter fsWriter = new IndexWriter(new File(...), analyzer, false);
  while (... more docs to index...) {
    ... add 100,000 docs to fsWriter ...
  }
  fsWriter.optimize();
  fsWriter.close();
  
  -Original Message-
  From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
  Sent: Friday, November 02, 2001 10:47 AM
  To: 'Lucene Users List'
  Subject: RE: Indexing problem
  
  Well, I don't know if there's an archive of the list, so this is what
  Doug wrote:
  A more efficient and slightly more complex approach would be to build
  large indexes in RAM, and copy them to disk with IndexWriter.addIndexes:
  IndexWriter fsWriter = new IndexWriter(new File(...), analyzer, true);
  while (... more docs to index...) {
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
    ... add 100,000 docs to ramWriter ...
    ramWriter.optimize();
    ramWriter.close();
    fsWriter.addIndexes(new Directory[] { ramDir });
  }
  fsWriter.optimize();
  fsWriter.close();
  
  
  Scott
 
 --
 To unsubscribe, e-mail:  
 mailto:[EMAIL PROTECTED]
 For additional commands, e-mail:
 mailto:[EMAIL PROTECTED]
 






Re: Order of Package Compilation

2001-11-28 Thread Otis Gospodnetic

Why not just use Ant to build Lucene?

Otis

--- srinivasa v [EMAIL PROTECTED] wrote:
 
 Hi all,
 
 I got the Lucene source files. When I tried to compile all the
 packages in a particular order, I got an error saying that some
 class was not found. The order in which I compiled is given below.
 
 com\lucene\store\*.java
 com\lucene\util\*.java
 com\lucene\document\*.java
 com\lucene\analysis\standard\*.java
 com\lucene\analysis\*.java
 com\lucene\index\*.java
 com\lucene\search\*.java
 com\lucene\queryParser\*.java
 
 I suspect the order is wrong. If so, in what order do I have to
 compile? Please help me.
 
 Thanks in Advance
 Srini
 
 
 






Re: Indexing other documents type than html and txt

2001-11-29 Thread Otis Gospodnetic

You'd have to write parsers for each of those document types to convert
them to text and then index that text.
Sure, you can feed it something like XML, but then you may consider
something like xmldb.org instead.

Otis

--- Antonio Vazquez [EMAIL PROTECTED] wrote:
 
 Hi all,
 I have a doubt. I know that lucene can index html and text documents,
 but
 can it index other type of documents like pdf,docs, and xls
 documents? if it
 can, how can I implement it? Perhaps can implement it like html and
 txt
 indexing?
 
 regards
 
 Antonio
 
 
 






Re: Industry Use of Lucene?

2001-12-06 Thread Otis Gospodnetic

It looks like a person at Overture (former Goto.com) is using it.
I know ScreamingMedia.com used it at one point.

Otis

--- Jeff Kunkle [EMAIL PROTECTED] wrote:
 Does anyone know of any companies or agencies using Lucene for their
 products/projects?  I am attempting to make a marketing pitch for
 Lucene to
 my manager and I know one of the first questions will be, Who else
 is using
 it?  I know Lucene is a very powerful, fast, and flexible full-text
 search
 engine but my manager will need a little more coercing.  Any help on
 this
 topic is greatly appreciated.
 
 Thanks,
 Jeff
 
 






Re: FW: Installation notes

2001-12-06 Thread Otis Gospodnetic

You need to download and install JavaCC.
Try this:
http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=javacc&q=b

Otis


--- Patrick Codere [EMAIL PROTECTED] wrote:
 
  Dear All,
  
  I just downloaded the latest version of Lucene, and not being too
  familiar with Java, I would like to get some help installing it.
  I downloaded it, and using Ant I got the following message: "could
  not create task of type : javacc".  What does this mean?
  
  Please Help.
  
  Thanks. 
 






RE: existing or not existing

2001-12-06 Thread Otis Gospodnetic

Yes, I would use this, especially the IndexReader methods that you
suggested.

Otis

--- Doug Cutting [EMAIL PROTECTED] wrote:
  From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
  
  You could try looking for a segments file in the index directory.
  If it exists, the index exists, else it does not.
  
  Is there a better way?
 
 I think that's currently the best way.  But it's not great, because
 it
 requires applications to know something about the internal structure
 of the
 index.
 
 Going forward, I'm hesitant to change the semantics of the 'create'
 flag.
 I'm also hesitant to add another flag or constructor method.
 
 Perhaps the addition of the following IndexReader methods would
 suffice:
 
   /** Returns true iff an index exists in the named directory. */
   public static boolean indexExists(String directory);
   public static boolean indexExists(File directory);
   public static boolean indexExists(Directory directory);
 
 These are analogous to the 'lastModified' methods. Internally these
 would
 just check for the existence of the segments file.
 
 Does that sound like a good plan?
 
 Another place that currently requires application knowledge of index
 structure is failure recovery.  Currently if an indexing application
 crashes
 it may leave .lock files in the directory which must be removed
 before the
 index can be altered again.  Perhaps this can be resolved similarly
 by
 adding methods like:
 
   /** Returns true iff the index in the named directory is currently
 locked.*/
   public static boolean isLocked(Directory directory);
 
   /** Forcibly unlocks the index in the named directory. 
* Caution: this should only be used by failure recovery code,
* when it is known that no other process or thread is in fact
* currently accessing this index.
*/
   public static void unlock(Directory directory);
 
 We could also have String and File versions for convenience.
 
 Would folks use something like this?  If so, more fodder for the TODO
 list!
 
 Doug
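
The segments-file check discussed in this thread can be sketched with plain java.io, outside of Lucene. This is only an illustration of the idea: the class IndexPresence and its methods are hypothetical names, not part of the Lucene API, and the proposed IndexReader.indexExists methods may end up behaving differently.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper for illustration; not part of Lucene.
class IndexPresence {

    /** Returns true if the directory contains a regular file named "segments". */
    static boolean indexExists(File directory) {
        return new File(directory, "segments").isFile();
    }

    /** Checks an empty temp directory, then the same directory with a segments file. */
    static boolean[] demo() {
        try {
            File dir = Files.createTempDirectory("index-demo").toFile();
            boolean before = indexExists(dir);
            Files.createFile(new File(dir, "segments").toPath());
            boolean after = indexExists(dir);
            return new boolean[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("empty directory: " + result[0]);
        System.out.println("with segments file: " + result[1]);
    }
}
```

As Doug notes, the drawback of doing this in application code is that the application has to know the name of an internal index file.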
 
 






Re: WildcardQuery

2001-12-11 Thread Otis Gospodnetic

If I understand you correctly, you tried to search for '*new*'.  I
believe you can't use an asterisk (*) as the first character of the
query term. So, new* is valid, while *new and *new* are not.

Otis
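
The rule above can be checked before a term ever reaches the query parser. The following is a minimal sketch, not Lucene code: WildcardTerms and hasValidPrefix are hypothetical names, and treating a leading '?' the same way as a leading '*' is an assumption that goes beyond what was said above.

```java
// Hypothetical pre-check for illustration; not part of Lucene.
class WildcardTerms {

    /** True if the term is non-empty and does not begin with a wildcard character. */
    static boolean hasValidPrefix(String term) {
        if (term == null || term.isEmpty()) {
            return false;
        }
        char first = term.charAt(0);
        // Assumption: '?' is rejected in first position just like '*'.
        return first != '*' && first != '?';
    }

    public static void main(String[] args) {
        for (String term : new String[] { "new*", "*new", "*new*" }) {
            System.out.println(term + " -> " + hasValidPrefix(term));
        }
    }
}
```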

--- Serge A. Redchuk [EMAIL PROTECTED] wrote:
 Hello sampreet,
 
 Tuesday, December 11, 2001, 6:44:29 AM, you wrote:
 
 sic Hi All,
 
 sic This must be simple enough, but can anyone please explain me
 when a
 sic WildcardQuery is created in QueryParser i.e. what special
 characters in the
 sic query string are required to build a WildcardQuery within
 QueryParser?
 
 Moreover, when I achieved complex search like this: path:*new*
 comp*
 by combining WildcardQueries in BooleanQuery (NOT BY QueryParser),
 and
 then got that query using boolq.toString(...); - the QueryParser
 COULD
 NOT parse this string !!!
 
 Is not it strange ? :
 
QueryParser.parse( bquery.toString( ... ) )   - do not work
 :-(
 
 -- 
 Best regards,
  Serge  mailto:[EMAIL PROTECTED]
 
 
 






Re: continue ideo-logic error in QueryParser and in BooleanQuery !

2001-12-17 Thread Otis Gospodnetic

Actually,

I do not think this is a bug.
You cannot make searches with queries that have only the NOT part.
You cannot ask Lucene to match all documents that do not contain a
certain term.  For instance, issuing a 'NOT pretty' will not return
doc1, doc3, doc4.
You have to use that NOT pretty in combination with something else
(AND).  For instance 'love AND NOT pretty' should return doc1 and doc4,
since both contain 'love' and neither contains 'pretty'.

I was about to say that you can check what other search engines do when
you give them just the negation, so I tried av.com and google.com. 
AltaVista does return a bunch of matches, but Google doesn't let you
enter such a query.

Otis
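
To make the "NOT needs a positive clause" point concrete, here is a toy model of the four documents quoted in this thread. It is a sketch only: NotQueryDemo is a hypothetical class, matching is plain lowercase word lookup, and nothing here reflects how Lucene actually scores or executes a BooleanQuery. With these documents, both doc1 and doc4 contain 'love' and lack 'pretty'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model for illustration; not Lucene's query execution.
class NotQueryDemo {
    static final Map<String, String> DOCS = new LinkedHashMap<>();
    static {
        DOCS.put("doc1", "Love is life");
        DOCS.put("doc2", "Java is pretty nice language");
        DOCS.put("doc3", "C++ is powerful, but unsafe");
        DOCS.put("doc4", "Onion and love sometimes are not compatible");
    }

    /** True if the lowercased text contains the term as a whole word. */
    static boolean contains(String text, String term) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z0-9+]+");
        return Arrays.asList(words).contains(term);
    }

    /** Ids of documents that contain `required` and do not contain `prohibited`. */
    static List<String> andNot(String required, String prohibited) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> doc : DOCS.entrySet()) {
            if (contains(doc.getValue(), required) && !contains(doc.getValue(), prohibited)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(andNot("love", "pretty")); // prints [doc1, doc4]
    }
}
```

The negated term only removes documents from the set selected by the positive term, which is why a query consisting of nothing but a NOT clause has no set to start from.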


--- Serge A. Redchuk [EMAIL PROTECTED] wrote:
 ..
   Let we have 4 docs:
   doc1: Love is life
   doc2: Java is pretty nice language
   doc3: C++ is powerful, but unsafe
   doc4: Onion and love sometimes are not compatible
 
   So, if search for love OR NOT onion
 
 Here I was wrong:  (nevertheless it not solve described bug)
   result must be: doc1, doc2, doc3.
 must be:
   result must be: doc1, doc2, doc3, doc4.  (ALL)
 .
 
 Certainly I understand that people will not compose such complex
 queries to search for ALL,
 but lucene still do not finds all.
 
 
 






Re: DateFilter and NullPointerException

2001-12-17 Thread Otis Gospodnetic

Hm, do you know which line in DateFilter.java this NPE comes from?
Could you try compiling Lucene with the -g switch so that we can see
the line numbers in the exception stack trace?

If you want you can also submit a bug report at
http://nagoya.apache.org/bugzilla/

Thanks,
Otis

--- Uroš Jurglič [EMAIL PROTECTED] wrote:
 I'm having a problem when using Query and DateFilter for a search. If
 I
 create DateFilter with DateFilter.After with current timedate as
 parameter,
 then I get NullPointerException when executing search
 (Searcher.search(Query, DateFilter)). Had anyone experienced
 something like
 that? If I set time just a bit in past, it returns empty hits which
 is how
 it should behave all the time.
 
 code snippet:
   // I have java files as documents, consisting of content (Field.Text())
   // and modified (Field.Keyword())
   Query q = new WildcardQuery(new Term("content", "packag*"));
   DateFilter df = DateFilter.After("modified", Calendar.getInstance().getTime());
   Searcher searcher = new IndexSearcher(path);
   Hits hits = searcher.search(q, df); // line 66
 
 exception:
 Exception in thread "main" java.lang.NullPointerException
 at org.apache.lucene.search.DateFilter.bits(Unknown Source)
 at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
 at org.apache.lucene.search.Hits.getMoreDocs(Unknown Source)
 at org.apache.lucene.search.Hits.<init>(Unknown Source)
 at org.apache.lucene.search.Searcher.search(Unknown Source)
 at Search.main(Search.java:66)
 
 Regards,
 Uros.
 






Re: Using a DateFilter without a query

2001-12-17 Thread Otis Gospodnetic

Hello,

--- Jan Stövesand [EMAIL PROTECTED] wrote:
 Hi,
 
 is it possible to use a DateFilter without a query. I would like to
 get all
 Documents from within a certain period of time WITHOUT specifying any
 query except the range of dates.

I don't know, but I'd like to know.  Have you tried it?

 Is there something like query that will always return all documents
 from an index?

This has been asked in the past.  It can't be done, but you could work
around it by adding a field with a known, constant value to each
document.  Then searching for that will give you all documents in the
index.
Is there a better way?

Otis






Re: IndexReader/IndexSearcher

2001-12-19 Thread Otis Gospodnetic

Uh, I don't repeat myself, but I'll repeat others' words :)

It is the analyzer (StandardAnalyzer, I believe) that lowercases text
before indexing it.
If you use the same analyzer to search it will lowercase text before
performing a search, so you'll find the document with bo23 in it even
if you use BO23 in the search.

Otis
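
The effect described above can be reproduced with a toy inverted index. This is a sketch under stated assumptions, not Lucene code: LowercaseIndexDemo is a hypothetical class, analyze() merely lowercases and stands in for whatever analyzer was used at index time, rawCount() plays the role of a direct term lookup with no analysis, and analyzedCount() plays the role of a search that runs the query text through the same analyzer first.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index for illustration; not Lucene's implementation.
class LowercaseIndexDemo {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stand-in for an analyzer that lowercases text. */
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    /** Indexes the value after analysis, so only the lowercased term is stored. */
    void add(int docId, String value) {
        postings.computeIfAbsent(analyze(value), k -> new HashSet<>()).add(docId);
    }

    /** Looks up the term exactly as given, with no analysis. */
    int rawCount(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet()).size();
    }

    /** Analyzes the query text first, then looks it up. */
    int analyzedCount(String queryText) {
        return rawCount(analyze(queryText));
    }

    public static void main(String[] args) {
        LowercaseIndexDemo index = new LowercaseIndexDemo();
        index.add(1, "BO23");
        System.out.println("raw BO23: " + index.rawCount("BO23"));
        System.out.println("raw bo23: " + index.rawCount("bo23"));
        System.out.println("analyzed BO23: " + index.analyzedCount("BO23"));
    }
}
```

The raw lookup misses because the stored term is the lowercased form, while the analyzed lookup finds it regardless of the case typed by the user.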

--- Mike Baroukh [EMAIL PROTECTED] wrote:
 I reply to myself :
 
 It seem that when using IndexReader, keywords must be lower case.
 So, I indexed BO23, I can search BO23 with IndexSearcher, but I must
 use
 bo23 to search with IndexReader.
 
 Am I right ?
 
 Mike
 
 - Original Message -
 From: Mike Baroukh [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Wednesday, December 19, 2001 12:57 PM
 Subject: IndexReader/IndexSearcher
 
 
 
  Hi all.
 
  Can somebody tell me where is my error.
  There is something I don't understand.
 
  If I search something with
 
  IndexReader indexReader = IndexReader.open("/myindex");
  TermDocs docs = indexReader.termDocs(new Term("codman", "BO23"));
  while ((docs != null) && (docs.next())) {
      nbis++;
  }
  if (docs != null) docs.close();
  indexReader.close();
 
  I see that nbis = 0, so termDocs returned nothing.
 
  But, if I use
 
  SimpleAnalyzer analyzer = new SimpleAnalyzer();
  IndexSearcher indexSearcher = new IndexSearcher("/myindex");
  Query query = QueryParser.parse("BO23", "codman", analyzer);
  Hits hits = indexSearcher.search(query);
  nbis = hits.length();
 
  It's exactly the same query and the same index, but this time it
  returns 1 document.
 
  I don't understand where this difference comes from.
  I know that the first way is not a good way of searching, but what I
  want is to delete from the index the document returned by search #2.
 
  Thanks in advance for any help.
 
  Mike
 
 
 
 






Re: About indexing

2002-01-11 Thread Otis Gospodnetic

Parag,

I'm not sure if I understood your question correctly, but it seems like
you want to create a Field that holds the path information (e.g.
TEST/subdir1 or TEST/subdir2, and so on), and then include that in the
query based on which path(s) you want to search.
You could use TEST to search just TEST, TEST/subdir1 to search just
TEST/subdir1, or TEST* to search everything under TEST.

Otis
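
The three kinds of path query mentioned above (exact root, exact subdirectory, prefix) can be modelled in a few lines. This is a sketch only: PathFieldDemo is a hypothetical class, and it assumes the path is stored as a single untokenized value, so that TEST/subdir1 is not split into separate terms by an analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a path field for illustration; not Lucene code.
class PathFieldDemo {

    /** Matches paths exactly, or by prefix when the query ends with '*'. */
    static List<String> search(List<String> paths, String query) {
        boolean prefixQuery = query.endsWith("*");
        String base = prefixQuery ? query.substring(0, query.length() - 1) : query;
        List<String> hits = new ArrayList<>();
        for (String path : paths) {
            if (prefixQuery ? path.startsWith(base) : path.equals(base)) {
                hits.add(path);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("TEST", "TEST/subdir1", "TEST/subdir2");
        System.out.println(search(paths, "TEST"));
        System.out.println(search(paths, "TEST/subdir1"));
        System.out.println(search(paths, "TEST*"));
    }
}
```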

--- Parag Dharmadhikari [EMAIL PROTECTED] wrote:
 Hi all,
 
 If I create the index of files in a different thread (which may be
 invoked at any time), is it possible to index files from the root
 directory and then selectively search on different paths in the
 created index?
 
 For example, first I will index from a root directory, say TEST. Then,
 depending on the selected directory path (which will reside inside the
 root directory TEST), I will search the created index.
 
 Thanx in advance
 
 regards
 parag
 
 
 






Re: I want to search on BOTH -- (1) XML data and (2) Text data.

2002-01-12 Thread Otis Gospodnetic

Hello,

You could write an XML parser (see http://xml.apache.org/ for some XML
tools) and store XML elements as Fields in Lucene Documents.
To search for 'Hello' and 'Hello Mr. President!' you can store the
whole article body as a Text (or maybe UnStored) Field.
You can also look on www.mail-archive.com and search this list's
archive for some related discussions.  Try searching for Philip Ogren
(I think I got the name right), he sent some code that lets you go from
XML -> Lucene Document quickly, I think.

Otis
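
As a rough illustration of the first suggestion, the JDK's built-in DOM parser is enough to turn the child elements of an article into name/value pairs, which could then be added to a Lucene Document as fields. This is a sketch, not the code referred to above: XmlFieldExtractor is a hypothetical class, and the element names author, date and body are assumed for the example because the names in the original question are not valid XML names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; uses only the JDK's XML APIs.
class XmlFieldExtractor {

    /** Maps each direct child element of the root to its trimmed text content. */
    static Map<String, String> extractFields(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = dom.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<article><author>George Washington</author>"
                + "<date>Jan-01-2002</date>"
                + "<body>Hello Mr. President!</body></article>";
        // Each entry could become one field of a Lucene Document:
        // keyword-style fields for author and date, a text field for body.
        System.out.println(extractFields(xml));
    }
}
```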


--- Harun Altay [EMAIL PROTECTED] wrote:
 Hello Friends,
 
 I want to search on BOTH -- (1) XML data and (2) Text data.
 
 
 (1). Text Data -- mostly consist of HTML pages, residing on the
 server...
 example : hundreds of HTML, TXT file, etc...
 
 
 (2). XML Data -- for example, Articles that was stored in XML
 format, lets say like this :
 
 <article>
 <article code> ... </article code>
 <article title> ... </article title>
 <author> ... </author>
 <date> ... </date>
 <etc> ... </etc>
 
 <body of the TEXT>
 .
 .. the article body, TEXT ..
 .
 .
 .
 .
 </body of the TEXT>
 
 </article>
 
 In this type of search, we need to search this XML-based author
 file in two different ways :
 2.a. First Way of searching : Searching XML file through its
 KEYWORDS, like : date = Jan-01-2002 and author = George
 Washington
 2.b. Second Way of Searching : Free search on the article body.
 For example : All the articles, whose body has the word 'Hello', or
 the sentence 'Hello Mr. President!' 
 
 
 Note-1:
 
 XML file may reside either Operating System level, or in a
 XML-supporting DATABASE, as well.
 
 
 Note-2:
 
 If I need to have them, I can write extra java classes to support
 some more functionality, if possible...
 
 
 Thank you very much,
 Harun.
 
 
 
 
 
 






Re: Anyone run Linux JVM 1.4 Beta 3 with lucene ?

2002-01-14 Thread Otis Gospodnetic

Oui :)

Otis

--- Winton Davies [EMAIL PROTECTED] wrote:
 Hi guys,
 
   I'm getting stung by JVM  1.3.1_01 on Linux, max allocation of heap
 
 is about 1.9 gb. Anyway, I'm thinking of going to 1.4 ? Anyone run 
 Lucene under this beta ?
 
   Cheers,
   Winton
 
 
 -- 
 
 Winton Davies
 Lead Engineer, Overture (NSDQ: OVER)
 1820 Gateway Drive, Suite 360
 San Mateo, CA 94404
 work: (650) 403-2259
 cell: (650) 867-1598
 http://www.overture.com/
 
 






Re: My own steammer (brazilian)

2002-02-13 Thread Otis Gospodnetic

That file is created during the build process.
Try building Lucene by typing 'ant compile'.

Otis

--- Bizu de Anúncio [EMAIL PROTECTED] wrote:
   My Brazilian stemmer has the same structure as the German stemmer,
 except for the inner logic.
 
   I created it, tested it, and now I'm trying to compile it with no
 success. The problem is the 'StandardTokenizer.java' class! I can't
 find it in the package org.apache.lucene.analysis.standard.
 
   The only file that exists there is a file named
 'StandardTokenizer.jj'. What is this file for?
 
   I have lucene-1.2-rc2. Can someone help me?
 
 thanks,
 
   jk
 
 
 
 






Re: using lucene with a very large index

2002-02-13 Thread Otis Gospodnetic


--- tal blum [EMAIL PROTECTED] wrote:
 Hi, I'm building a very large index that contains several
 categories.
 I have several questions I hope you can answer.
 1) Is there a way to use lucene with several indexes without merging
 them?

Look at MultiSearcher class.

 2) Does the Document id change after merging indexes, or after adding
 or deleting documents?

Not sure.

 3) Has anyone implemented a GUI for the Lucene index that
 enables deletions by id or SQL-like queries?

I haven't seen anything like it.

 4) Assuming I have a term query that has a large number of hits, say
 10 million, is there a way to get, say, the top 10 results
 without going through all the hits?

See the Javadocs for Searcher and IndexSearcher, I think you'll find
the answer there.

Otis






RE: Index Locked For Write

2002-02-24 Thread Otis Gospodnetic


--- Howk, Michael [EMAIL PROTECTED] wrote:
 Out of curiosity, why didn't we need to close the writer in rc2 or
 rc3?
 
 When you suggest a synchronized keyword, are you suggesting that
 the
 writer is not inherently thread-safe? Do we need to write our own
 thread
 management on top of Lucene?

Sorry, that might have been a wrong suggestion, IndexWriter (at least
the add method) is supposed to be thread safe.

Otis


 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, February 21, 2002 4:07 PM
 To: Lucene Users List
 Subject: RE: Index Locked For Write
 
 
 You could use synchronized keyword and use IndexReader.isLocked() or
 something like that, no?
 
 Otis
 
 --- Howk, Michael [EMAIL PROTECTED] wrote:
  Thank you for your quick responses. But in our application, we're
  working in
  a transactional environment where multiple threads are accessing a
  single
  writer using the recommended singleton pattern. Since no thread has
  exclusive access to the writer, how can we have one thread
  arbitrarily
  decide to close the writer?
  
  Michael
  
  -Original Message-
  From: Mark Tucker [mailto:[EMAIL PROTECTED]]
  Sent: Thursday, February 21, 2002 3:51 PM
  To: Lucene Users List
  Subject: RE: Index Locked For Write
  
  
  You forgot to close your writer after the call to optimize.
  
  -Original Message-
  From: Howk, Michael [mailto:[EMAIL PROTECTED]]
  Sent: Thursday, February 21, 2002 2:49 PM
  To: Lucene Mailing List (E-mail)
  Subject: Index Locked For Write
  
  
  We just got the newest daily build (to try to fix some NullPointer
  errors
  with ? and _ characters), and we're getting the same problem
 that
  Daniel
  Calvo mentioned: Index Locked for Write. Here's basically what our
  code is
  doing:
  IndexWriter writer = new IndexWriter(path, analyzer, create);
  try {
      Document doc = new Document();
      doc.add(Field.Keyword(DOC_ID, "14"));
      doc.add(Field.UnStored(ANY, "mushu"));
      writer.addDocument(doc);
      writer.optimize();
  
      // Search the document for our keyword
      {
          IndexReader reader = IndexReader.open(path);
          IndexSearcher searcher = new IndexSearcher(reader);
          Vector returnStuff = searcher.search("mushu");
      }
  
      // Verify that we got one record back
      assertNotNull(returnStuff);
      assertEquals(1, returnStuff.size());
  }
  finally {
      // Clean up after ourselves
      IndexReader reader = IndexReader.open(path);
      reader.delete(new Term(DOC_ID, "14"));
      reader.close();
  }
  
  And the exception we're getting on the reader.delete line in the
  finally
  clause:
  
  java.io.IOException: Index locked for write:
  Lock@C:\devtools\JBossTomcat\jboss\indexes\marc\write.lock
    at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:245)
    at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:220)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:122)
    at org.jboss.ejb.plugins.jrmp.server.JRMPContainerInvoker_Stub.invoke(Unknown Source)
    at org.jboss.ejb.plugins.jrmp.interfaces.GenericProxy.invokeContainer(GenericProxy.java:357)
    at org.jboss.ejb.plugins.jrmp.interfaces.StatelessSessionProxy.invoke(StatelessSessionProxy.java:123)
    at $Proxy5.deleteDocument(Unknown Source)
  
  Are we using the right approach? Any suggestions? Thank you.
  
  Michael Howk
  
 






Re: Performance Tuning

2002-02-25 Thread Otis Gospodnetic

You could try playing with a merge factor...

Otis

--- Aruna Raghavan [EMAIL PROTECTED] wrote:
 Hi,
 Are there any ways to fine-tune CPU performance with Lucene? I know
 about the use of optimize() calls, but I am wondering if there are any
 other ways to improve CPU time/disk space performance.
 Thanks!
 
 






Re: Boolean Query Parsing with IN keyword

2002-02-26 Thread Otis Gospodnetic

Jonathan,

That's most likely caused by StandardAnalyzer, which you are probably
using.  'in' is listed as one of the stop words:

public static final String[] STOP_WORDS = {
    "a", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "s", "such",
    "t", "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with"
};

Try searching for state:or
It should yield no matches.

But, StandardAnalyzer is no longer final (get the latest build) and you
can write a class that subclasses it and calls this StandardAnalyzer
constructor:

/** Builds an analyzer with the given stop words. */
public StandardAnalyzer(String[] stopWords) {
stopTable = StopFilter.makeStopTable(stopWords);
}

Pass it your own list of stop words and you are done.
If you've already indexed some data you have to be careful which words
you choose as stop words.  I suggest sticking with the above list
(minus 'in', 'or', etc.) for now.
Once you have your class use it instead of StandardAnalyzer.

Otis
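
The effect of trimming the stop list can be seen without Lucene at all. The sketch below is a toy stop filter, not StandardAnalyzer: StopWordDemo, without() and filter() are hypothetical names, and the default list is simply copied from the array quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stop-word filter for illustration; not Lucene's StopFilter.
class StopWordDemo {
    static final String[] DEFAULT_STOP_WORDS = {
        "a", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "s", "such",
        "t", "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"
    };

    /** Copies the base list into a set, leaving out the given words. */
    static Set<String> without(String[] base, String... removed) {
        Set<String> stops = new HashSet<>(Arrays.asList(base));
        stops.removeAll(Arrays.asList(removed));
        return stops;
    }

    /** Drops every token whose lowercased form is in the stop set. */
    static List<String> filter(List<String> tokens, Set<String> stops) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stops.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("fl", "al", "in");
        System.out.println(filter(states, without(DEFAULT_STOP_WORDS)));             // prints [fl, al]
        System.out.println(filter(states, without(DEFAULT_STOP_WORDS, "in", "or"))); // prints [fl, al, in]
    }
}
```

The same custom list would have to be used both when indexing and when searching, otherwise 'in' is kept on one side and dropped on the other.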




--- Jonathan Franzone [EMAIL PROTECTED] wrote:
 
 I'm trying to search on a US State field. The Lucene field name is
 "state", and so I'm building a query like +(state:fl state:al state:in)
 to search for documents in Florida, Alabama, or Indiana. But whenever I
 pass "in" or "IN" to the QueryParser it strips it out. Passing the above
 query to the QueryParser yields +(state:fl state:al). Is there a way to
 escape the "in" keyword? I've tried enclosing it in double and single
 quotes, neither of which worked.
 
 Thanks,
 Jonathan Franzone
 
 
 
 






Re: Software License

2002-02-26 Thread Otis Gospodnetic

Actually, I think ASL doesn't require this, although it is nice when
even commercial entities give credit in some way.
I could be wrong about ASL.

Otis

--- Rafael Luque [EMAIL PROTECTED] wrote:
 Hi all,
 
 I know Lucene is a free project, however I think its use is under
 Apache Software License (ASL) terms, so someone using Lucene should
 reference the project, use the logo 'powered by Lucene', ...
 
 I have suspicions about a company releasing a commercial search engine
 based on Lucene and not mentioning Lucene at all. What kind of
 actions can we take to protect Open Source projects like Lucene from
 this kind of malicious use?
 
 Thanks, 
 






Re: SegmentTermPositions throwing nullpointer

2002-03-01 Thread Otis Gospodnetic

Have you got the latest Lucene?  Nightly build?
Try that, this looks like an old bug that has been fixed.

Otis

--- Charles Harvey [EMAIL PROTECTED] wrote:
 We are having some bizarre instances where SegmentTermPositions is
 throwing NullPointerExceptions. It only happens on certain queries, but
 it happens across different indexes using the same query terms, always
 quoted. It seems to be obscure multi-word terms in quotes that make
 this happen.
 
 "randi cohen" and "wacky tobaccy" threw on a Sun box but did not throw
 on a PC.
 
 "java.lang.null exception pointer" threw on a PC.
 
 Any ideas, anyone? I looked at the class and noticed that no
 NullPointerExceptions are thrown on purpose. I'm not familiar with the
 Lucene code, so I'm not too sure what is happening in this process, and
 the lovely "Unknown Source" doesn't help out too much...
 
 java.lang.NullPointerException
   at org.apache.lucene.index.SegmentTermPositions.seek(Unknown Source)
   at org.apache.lucene.index.SegmentTermDocs.seek(Unknown Source)
   at org.apache.lucene.index.IndexReader.termPositions(Unknown Source)
   at org.apache.lucene.search.PhraseQuery.scorer(Unknown Source)
   at org.apache.lucene.search.Query.scorer(Unknown Source)
   at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
   at org.apache.lucene.search.Hits.getMoreDocs(Unknown Source)
   at org.apache.lucene.search.Hits.<init>(Unknown Source)
   at org.apache.lucene.search.Searcher.search(Unknown Source)
   at org.apache.lucene.search.Searcher.search(Unknown Source)
 
 
 
 _
 
 The trouble with the rat-race is that even if you win you're still a
 rat.
 --Lily Tomlin
 _
 Charles Harvey
 Developer
 http://www.philly.com
 Wk: 215 789 6057
 Cell: 215 588 0851
 
 




Re: TimeOut Exception when Indexing with EJB (Please Help)

2002-03-05 Thread Otis Gospodnetic

Hello,

I think you should just try your two suggestions and see.
The answer depends on how exactly you do it, OS configuration, etc.
Does this happen on an optimized index, too?

Otis

--- Tihon One [EMAIL PROTECTED] wrote:
 Hi all;
 
 I've tried to index a 100K text file into an empty index folder (0 MB
 of indexed files) and it took 0.77 seconds.  However, when my index
 folder gets larger (~20MB of indexed files), the same 100K text file
 can take up to 30 seconds.
 
 I'm using EJB to do the index processing, and my SessionBean gets a
 TimeOutException if it takes longer than 30 seconds.  I prefer not to
 reset the transaction timeout to a longer time.  What will happen if
 the index folder gets larger (~1GB)?
 
 I understand that the indexing process can be slow, but is there a way
 that I can speed up the process no matter what the size of my index
 folder is?
 
 *  If I set IndexWriter.mergeFactor = 1000, will it cause a
 FileNotFoundException (too many open files)? Is there a solution for
 this error?
 
 *  If I use RAMDirectory, will it cause an out-of-memory exception? Is
 there a solution for this error?
 
 Environment:
  WebLogic Server 6.1
  Java 1.3.1
  Document with ( 8 Keyword Fields and 10 Text Fields).
  The files range from 10KB – 3000KB
 
 Thanks
 
 TiHon
 
 




Re: how to parse XHTML

2002-03-05 Thread Otis Gospodnetic

Terry,

These are really not Lucene questions.  Lucene will let you index text,
but you need to figure out how to parse your XHTML files.
Take a look at JTidy on sf.net; I think JTidy can help you with parsing
XHTML, or perhaps Xerces from xml.apache.org can.

Otis

--- Terry McGregor [EMAIL PROTECTED] wrote:
 
 Hi,
 
 I'm new to Lucene, and I was wondering how I should parse XHTML
 files.  Should I name them with the .HTML file extension and use
 org.apache.lucene.demo.IndexHTML, or name them with the .XML file
 extension and use an XML parser?
 
 Also, I would like to keep my XHTML files with a .XHTML file
 extension, if possible, but that's not so important.
 
 Thanks,
 Terry.
 




Re: phrase query and slop factor

2002-03-06 Thread Otis Gospodnetic

Wouldn't that depend on how far from each other you wanted to allow
them to be?  If you have a document with 100 words indexed and you are
searching for "first second", wouldn't you have to set the slop to about
100, just in case the word 'first' is the very first word in the
document, and 'second' is the very last word in your document?
I haven't used slop factor, so this is only theory :)

Otis
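
The reasoning above can be made concrete with a small model. This is a
minimal sketch, not Lucene's scorer: it assumes slop counts how many
positions 'second' must move to sit directly after 'first', which is how
sloppy phrase matching is usually described. The class and method names
are made up for illustration.

```java
// Simplified model of sloppy phrase matching for the two-term phrase
// "first second". Assumption: slop is the number of positions the second
// word must be moved to end up immediately after the first word.
final class SlopModel {
    private SlopModel() {}

    /**
     * Smallest slop at which "first second" could match, given the 0-based
     * token positions of the two words in the document.
     */
    static int requiredSlop(int firstPos, int secondPos) {
        // In an exact phrase match, 'second' sits right after 'first'.
        int expectedSecondPos = firstPos + 1;
        return Math.abs(secondPos - expectedSecondPos);
    }
}
```

Under this model a 100-word document with 'first' at position 0 and
'second' at position 99 needs a slop of 98, which agrees with the "about
100" estimate, and the reversed pair "second first" needs a slop of 2.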

--- Norbert Pabiś [EMAIL PROTECTED] wrote:
 What must the slop factor be to allow any combination of words in a
 phrase?
 
 -- 
 Norbert Pabiś
 




Re: Virtual Index

2002-03-06 Thread Otis Gospodnetic

If you prefer the old way (multiple indices) you can do that with
Lucene, too.  Look at MultiSearcher class.
Lucene also supports range queries which may be helpful.  I haven't
used them, but it sounds like the thing to look at.

Otis
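
To illustrate the range-query suggestion, here is a minimal sketch that
builds a query string with a year range instead of ten OR clauses. The
inclusive `[1990 TO 1999]` form is the syntax documented for later
QueryParser releases; whether the version in use accepts it is an
assumption to verify. It also assumes years are indexed as four-digit
strings, so that term order matches numeric order. The helper name is
hypothetical.

```java
// Builds a query string restricted to one journal and a range of years.
// Assumptions: the QueryParser in use understands "[a TO b]" ranges and
// the "year" field holds four-digit strings.
final class JournalQueries {
    private JournalQueries() {}

    static String restrictedQuery(String titleTerm, String journal,
                                  int fromYear, int toYear) {
        if (fromYear > toYear) {
            throw new IllegalArgumentException("fromYear must not exceed toYear");
        }
        return "+title:" + titleTerm
             + " +journal:" + journal
             + " +year:[" + fromYear + " TO " + toYear + "]";
    }
}
```

The resulting string would be handed to the query parser in place of the
long list of year clauses.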

--- Paul Dlug [EMAIL PROTECTED] wrote:
 We have a relatively large (300,000+ documents) set of XML files to
 index. The files themselves are articles broken up by journal and
 decade so that users can restrict their search to specific journals and
 year ranges. Under our old search engine this was done by creating a
 separate index for each journal/decade and then creating a "virtual
 index" which would search the smaller indexes and put the results
 together (with scoring preserved).
 
 In Lucene it looks like I would have to build one large index and do
 something like this:
 
 title:test && (journal:myjournal && (year:1990 || year:1991 ||
 year:1992 || year:1993 || year:1994 || year:1995 || year:1996 ||
 year:1997 || year:1998 || year:1999))
 
 Is there a better way to do this?
 




Re: Lucene throws an ArrayIndexOutOfBoundsException() if the first term in my query string is a stopWord

2002-03-07 Thread Otis Gospodnetic

Hm, I've got the latest Lucene (from CVS) and don't have this issue.
The query I tried on our index is:
  +title:of +title:someotherwordthatDOESgetmeresults

Otis

--- Biswas, Goutam_Kumar [EMAIL PROTECTED] wrote:
 Dear Lucene Users
 
  Lucene throws an ArrayIndexOutOfBoundsException if the first term in
 my query string is a stop word. Why is that?
 
  I'm making AND the default mode of search, so I'm adding an AND
 operator between each term of my query. That is, if my query is
 'cats dogs' I'm rephrasing it as 'cats AND dogs'. But if the first
 term is a stop word (example: 'of cats ...') I get the
 ArrayIndexOutOfBoundsException.
 
  I tried something like the following to get around this:
 
  //
  // 'of', 'by', 'for' are stop words
  String queryStr = "of AND by AND for AND cats AND dogs";
  Query query = null;
  Analyzer myAnalyzer = new MyAnalyzer(stopWords);
  try {
      // "content" is the default field to search.
      query = QueryParser.parse(queryStr, "content", myAnalyzer);
  } catch (ArrayIndexOutOfBoundsException e) {
      queryStr = queryStr.substring(queryStr.indexOf("AND") + 3);
  }
  //
  // so my final queryStr becomes 'cats AND dogs' which works fine!
  //
 
  Is there a better way to handle this situation? Or can someone point
 out why this error is occurring in the first place?
 
 Thanks in advance
 -Goutam   
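
One alternative to catching the exception is to drop the stop words
before the AND operators are inserted, so the parser never sees a query
that starts with one. The following is a minimal sketch under that
assumption; `AndQueryBuilder` is a hypothetical helper, and the stop-word
set passed in should be the same one given to the analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Joins the user's words with AND, skipping stop words first.
// Assumption: words are separated by whitespace and stop words are
// stored in lower case.
final class AndQueryBuilder {
    private AndQueryBuilder() {}

    static String build(String userInput, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input was blank
            }
            if (!stopWords.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        // An empty result means the query consisted only of stop words.
        return String.join(" AND ", kept);
    }
}
```

A caller would check for an empty result and skip the search rather than
pass an empty string to the parser.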
 
 




2 exceptions

2002-03-08 Thread Otis Gospodnetic

Hello,

Do these 2 exceptions look familiar to anyone:


java.lang.ArrayIndexOutOfBoundsException: 111
    at java.util.Vector.elementAt(Vector.java(Compiled Code))
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:136)
    at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:132)
    at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:134)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:114)
    at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:156)
    at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:205)
    at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:91)
    at org.apache.lucene.search.Similarity.idf(Similarity.java:104)
    at org.apache.lucene.search.TermQuery.sumOfSquaredWeights(TermQuery.java:76)
    at org.apache.lucene.search.BooleanQuery.sumOfSquaredWeights(BooleanQuery.java:105)
    at org.apache.lucene.search.Query.scorer(Query.java:91)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:105)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:91)
    at org.apache.lucene.search.Hits.<init>(Hits.java:81)
    at org.apache.lucene.search.Searcher.search(Searcher.java:75)
    at org.apache.lucene.search.Searcher.search(Searcher.java:69)


The second exception that I am getting is this:

java.io.IOException: Interrupted system call
    at java.io.RandomAccessFile.seek(Native Method)
    at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:271)
    at org.apache.lucene.store.InputStream.refill(InputStream.java:166)
    at org.apache.lucene.store.InputStream.readVInt(InputStream.java(Compiled Code))
    at org.apache.lucene.store.InputStream.readVInt(InputStream.java(Compiled Code))
    at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:127)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:114)
    at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:161)
    at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:205)
    at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:91)
    at org.apache.lucene.search.Similarity.idf(Similarity.java:104)
    at org.apache.lucene.search.TermQuery.sumOfSquaredWeights(TermQuery.java:76)
    at org.apache.lucene.search.BooleanQuery.sumOfSquaredWeights(BooleanQuery.java:105)
    at org.apache.lucene.search.Query.scorer(Query.java:91)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:105)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:91)
    at org.apache.lucene.search.Hits.<init>(Hits.java:81)
    at org.apache.lucene.search.Searcher.search(Searcher.java:75)
    at org.apache.lucene.search.Searcher.search(Searcher.java:69)


Any search I make in a multi-threaded environment seems to fail with one
of these exceptions.
The search code in use looks like this:

try
{
    // if the index has been modified since opened, re-open it.
    if (IndexReader.lastModified(_paIndexDir) >= _paIndexLastMod)
    {
        _paIndexLastMod  = new Date().getTime();
        if (_paIndexSearcher != null)
            _paIndexSearcher.close();
        _paIndexLastMod  = new Date().getTime();
    }
    if (_paIndexSearcher == null)
        _paIndexSearcher = new IndexSearcher(_paIndexDir);
}
catch (IOException e)
{
    _log.error("Could not open/close IndexSearcher: " + e.getMessage());
    return;
}

Query query = null;
Hits  hits  = null;
try {
    query = MultiFieldQueryParser.parse(queryString,
        new String[] {"title", "description"}, _analyzer);
    hits = _paIndexSearcher.search(query);
} catch (ParseException e) {
    _log.warn("QueryParser threw ParseException while parsing: " +
        queryString, e);
} catch (TokenMgrError e) {
    _log.warn("QueryParser threw TokenMgrException while parsing: " +
        queryString, e);
} catch (IOException e) {
    _log.error("IndexSearcher threw IOException while searching for: " +
        queryString, e);
}

I'm about to look at the source, but if any of these exceptions look
familiar to anyone, or if you see a flaw in the code above please let
me know.

Thanks,
Otis






Re: optimize(), delete() calls on IndexWriter

2002-03-08 Thread Otis Gospodnetic

No they don't. Note that delete() is in IndexReader.

Otis

--- Aruna Raghavan [EMAIL PROTECTED] wrote:
 Hi,
 Do calls like optimize() and delete() on the IndexWriter cause a
 separate thread to be kicked off?
 Thanks!
 Aruna.
 




Re: 1.02 download on jakarta.apache.org?

2002-03-08 Thread Otis Gospodnetic

I don't think you are blind.  You could get the latest source from the
CVS, or wait a few weeks when I hope we will get the new release out...

Otis


--- Shannon Booher [EMAIL PROTECTED] wrote:
 
 Maybe I'm just blind, but Lucene v1.02 does not appear to be
 available 
 through jakarta.apache.org.  There is no listing for Lucene under
 Release 
 Builds, only Milestone and Nightly...
 
 thanks,
 
 sjb
 
 




Re: 2 exceptions

2002-03-08 Thread Otis Gospodnetic

Just for the list/knowledge archive:

I found the source of one of the exceptions in my code:

 java.io.IOException: Interrupted system call
   at java.io.RandomAccessFile.seek(Native Method)
   at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:271)
   at ...

 // if the index has been modified since opened, re-open it.
 if (IndexReader.lastModified(_paIndexDir) >= _paIndexLastMod)
 {
     _paIndexLastMod  = new Date().getTime();
     if (_paIndexSearcher != null)
         _paIndexSearcher.close();
     _paIndexLastMod  = new Date().getTime();
 }
 if (_paIndexSearcher == null)
     _paIndexSearcher = new IndexSearcher(_paIndexDir);

BUG: the null check just above.
  And what if it's != null?  It's already close()d above, so the closed
  searcher keeps being used.

The other one might have been a side-effect of the above bug.

Otis
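
The fix implied by the bug note is to replace the searcher whenever it
has been closed, not only when it is null. Below is a minimal sketch of
that pattern. To stay self-contained it uses a stand-in `StubSearcher`
instead of Lucene's IndexSearcher and a `LongSupplier` in place of
IndexReader.lastModified(dir); both, along with the `SearcherHolder`
name, are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Stand-in for IndexSearcher; it only records whether close() was called. */
final class StubSearcher {
    private boolean closed;

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

/** Hands out a searcher and reopens it when the index has changed. */
final class SearcherHolder {
    // Stand-in for IndexReader.lastModified(indexDir).
    private final LongSupplier lastModified;
    private StubSearcher searcher;
    private long openedAtStamp;

    SearcherHolder(LongSupplier lastModified) {
        this.lastModified = lastModified;
    }

    // synchronized so that two threads cannot close/reopen at the same time
    synchronized StubSearcher get() {
        long modified = lastModified.getAsLong();
        if (searcher == null || modified > openedAtStamp) {
            if (searcher != null) {
                searcher.close();
            }
            // Always replace the searcher after closing it, so a closed
            // instance is never handed out again.
            searcher = new StubSearcher();
            openedAtStamp = modified;
        }
        return searcher;
    }
}
```

The sketch records the index's own modification stamp rather than the
current clock time, so a change made between the check and the reopen is
still noticed on the next call. It does not address a thread that is
still searching with the old instance at the moment it is closed; that
would need reference counting or a delayed close.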






Re: QueryParser and Double Quotes

2002-03-10 Thread Otis Gospodnetic

I think there is no way to do that since a double quote is a special
character for query parser.
There was some discussion about introducing an escape character to
allow things like this, but the discussion has not materialized yet.

Otis


--- Tony Biag [EMAIL PROTECTED] wrote:
 
 Is there a way I can search for a phrase containing a double quote?
 For example, the search string is: 6" nail.  Thanks for any answers.
 




Re: Maximum indexable data

2002-03-10 Thread Otis Gospodnetic

I haven't heard of any such limit.  There is a 'limit' of 10,000
characters on a field length, but that is a limit only because that
number is hard coded in the source.
However, shouldn't this be very simple for you to test?
Index something over and over and see if you ever hit the wall :)

Otis

--- Herman Chen [EMAIL PROTECTED] wrote:
 Hi,
 
 Is there a limit for the amount of data indexable by a segment?
 If so is there a limit for searching?  i.e. can I give MultiSearcher
 several indices that are all close to the maximum size.  Thanks.
 
 --
 Herman
 
 






Re: Indexing across multiple servers

2002-03-11 Thread Otis Gospodnetic

This is becoming a FAQ...
Not by itself, so you have to write an application to collect the data
to be indexed yourself, and then feed it to Lucene.

Otis

--- Ryan Ogaard [EMAIL PROTECTED] wrote:
 Does Lucene support the indexing/searching of multiple servers across
 the network (file servers, web servers, databases, ...)?
 
 Thank you,
 Ryan
 
 




Re: special character handling

2002-03-12 Thread Otis Gospodnetic

It depends on the Analyzer used.

Otis

--- Aruna Raghavan [EMAIL PROTECTED] wrote:
 Hi,
 Does lucene replace all special characters with spaces when it adds
 the
 document to the index?
 Thanks!
 




RE: special character handling

2002-03-12 Thread Otis Gospodnetic

This is answered in the FAQ:
http://jguru.com/faq/view.jsp?EID=538308

--- Aruna Raghavan [EMAIL PROTECTED] wrote:
 Otis,
 I am using StandardAnalyzer.
 
 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, March 12, 2002 3:37 PM
 To: Lucene Users List
 Subject: Re: special character handling
 
 
 It depends on the Analyzer used.
 
 Otis
 
 --- Aruna Raghavan [EMAIL PROTECTED] wrote:
  Hi,
  Does lucene replace all special characters with spaces when it adds
  the
  document to the index?
  Thanks!
  




Re: size and nos of documents in the index

2002-03-15 Thread Otis Gospodnetic

Parag,

Indexing time and index size should be proportional to the size of
documents being indexed.  Also, I believe a document containing more
different, unique terms will result in a larger index size increase
than a document containing more duplicates.  For instance, "I am going
to bed in a few moments because I am tired" will result in more unique
terms than "Good night".
As for the maximum number of documents that can be indexed, I think
there is virtually no limit, other than your hardware and things like
that.

Otis
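
The unique-terms point can be checked with a few lines of code. This is
a minimal sketch that lower-cases the text and splits on anything that
is not a letter or digit; a real analyzer may also drop stop words or
stem, so the counts are only indicative. `TermCounter` is a made-up name.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Counts distinct terms in a piece of text using a deliberately simple
// tokenizer (lower-case, split on non-alphanumeric characters).
final class TermCounter {
    private TermCounter() {}

    static int uniqueTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms.size();
    }
}
```

With this tokenizer the longer sentence from the message has 13 words
but 11 distinct terms, because "I" and "am" each occur twice, while
"Good night" has 2.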

--- Parag Dharmadhikari [EMAIL PROTECTED] wrote:
 Hi all,
 
 How is indexing affected by the size of documents, and what is
 the maximum number of documents that can be indexed?
 
 regards
 parag
 
 






Re: Wildcard Searching

2002-03-16 Thread Otis Gospodnetic

Hello,

This was a thread on lucene-user initially, but I'm copying lucene-dev
as well.  Sorry about duplicates.

--- Stefan Bergstrand [EMAIL PROTECTED] wrote:
 Doug Cutting [EMAIL PROTECTED] writes:
 
 Just noticed this problem in my program.
 
 It seems as if the analyzer passed to QueryParser.parse(), never is
 passed to PrefixQuery (which is what my test case is parsed to).
 
 A quick look in QueryParser.jj confirms this: 
 
  q = new PrefixQuery(new Term(field, term.image.substring
   (0, term.image.length()-1)));

I thought that queries such as 'rou?d' are considered wildcard queries
by QueryParser.jj, and not prefix queries, no?
In the default definition of tokens in QueryParser.jj I see this:

| <PREFIXTERM:  <_TERM_START_CHAR> (<_TERM_CHAR>)* "*" >
| <WILDTERM:  <_TERM_START_CHAR>
              (<_TERM_CHAR> | ( [ "*", "?" ] ))* >

Then further down in QueryParser.jj we have this:

   if (wildcard)
     q = new WildcardQuery(new Term(field, term.image));

So a WildcardQuery is being constructed, not a PrefixQuery, I think.

What I don't understand is why the definition of _TERM_START_CHAR looks
like this:

| <#_TERM_START_CHAR: ~[ " ", "\t", "+", "-", "!", "(", ")", ":", "^",
                         "[", "]", "\"", "{", "}", "~", "*" ] >

Maybe the name is misleading, but it seems like _TERM_START_CHAR are
the characters that a TERM can start with, because later in
QueryParser.jj we have TERM defined as:

| <TERM:  <_TERM_START_CHAR> (<_TERM_CHAR>)* >

and _TERM_CHAR has this definition:

| <#_TERM_CHAR: <_TERM_START_CHAR> >

So how can we have a "*" in _TERM_START_CHAR when terms are not allowed
to start with a "*", and if we do have "*", how come we do not have "?"
as well?

Can somebody correct me in every place where I made false statements,
assumptions, and conclusions?

Thanks,
Otis
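
The token kinds discussed above can be modelled in a few lines. This is
a minimal sketch of the classification as the message reads the grammar,
not of what JavaCC actually does when several token definitions match
the same input. It looks only at the wildcard characters and ignores the
other characters the grammar excludes. `TermClassifier` is a made-up
name.

```java
// Classifies a single query term by its wildcard characters only.
// Assumption: this mirrors the intent of TERM, PREFIXTERM and WILDTERM
// as quoted from QueryParser.jj, not the generated tokenizer's behavior.
final class TermClassifier {
    enum Kind { TERM, PREFIXTERM, WILDTERM, INVALID }

    private TermClassifier() {}

    static Kind classify(String term) {
        // "*" is excluded from the characters a term may start with.
        if (term.isEmpty() || term.charAt(0) == '*') {
            return Kind.INVALID;
        }
        int firstWildcard = -1;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                firstWildcard = i;
                break;
            }
        }
        if (firstWildcard < 0) {
            return Kind.TERM;
        }
        // A prefix term has exactly one wildcard: a "*" at the very end.
        boolean onlyTrailingStar = firstWildcard == term.length() - 1
                && term.charAt(firstWildcard) == '*';
        return onlyTrailingStar ? Kind.PREFIXTERM : Kind.WILDTERM;
    }
}
```

Under this reading 'rou*' is a prefix term, while 'rou*d' and 'rou?d'
are both wildcard terms.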

   From: Howk, Michael [mailto:[EMAIL PROTECTED]]
   
   Also, Lucene returns the parsed version of each of our searches.
   When we search by rou*d, Lucene parses it as rou*d (which is what
   we would expect). But when we search by rou?d, Lucene parses it as
   "rou d". It seems to wrap the term in quotes and replace the
   question mark with a space. Any ideas? Or can someone give us an
   idea of how to understand WildcardQuery or WildcardTermEnum?
  
  It sounds like the problem is in the query parser.  Brian?
  
  Doug
  
  
  
 
 -- 
 ---
 Stefan Bergstrand
 Polopoly - Cultivating the information garden
 Ph:   +46 8 506 782 67
 Cell: +46 704 47 82 67
 Fax:  +46 8 506 782 51
 [EMAIL PROTECTED], http://www.polopoly.com
 




Re: corrupted index

2002-03-16 Thread Otis Gospodnetic

Oh, I just thought of something (wine does body good).
Perhaps one could use Runtime (the class) to catch the JVM shutdown and
do whatever is needed to prevent index corruption.  I believe there are
some shutdown hook methods in there that may let you do that.  I'm too
lazy to look up the API docs now, but I remember reading about that
once, and perhaps it was even mentioned on one of the 2 Lucene mailing
lists.

On the other hand, it would be great to have a tool that can verify an
existing index.  I don't know enough about the actual file structure
yet to write something like that, but maybe somebody else has done that
already or would like to contribute.

Otis
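
The shutdown-hook idea refers to Runtime.addShutdownHook, which has been
part of the JDK since 1.3. Here is a minimal sketch that registers a
hook to close a resource when the JVM exits. The resource is typed as a
generic AutoCloseable rather than an IndexWriter, and
`ShutdownCloser` is a made-up name. Hooks are not run if the process is
killed outright or the JVM crashes, so this narrows the window for a
stale lock but does not close it.

```java
// Registers a JVM shutdown hook that closes the given resource.
final class ShutdownCloser {
    private ShutdownCloser() {}

    /** Returns the registered hook so the caller can remove it later. */
    static Thread closeOnShutdown(AutoCloseable resource) {
        Thread hook = new Thread(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                // Nothing sensible can be done this late; just report it.
                System.err.println("close on shutdown failed: " + e);
            }
        }, "close-on-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

If the resource is closed normally before exit, the hook can be dropped
again with Runtime.getRuntime().removeShutdownHook(hook).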


--- Steven J. Owens [EMAIL PROTECTED] wrote:
 Otis,
 
  You can remove the .lock file and try re-indexing or continuing
  indexing where you left off.
  I am not sure about the corrupt index.  I have never seen it happen,
  and I believe I recall reading some messages from Doug Cutting saying
  that index should never be left in an inconsistent state.
 
  Obviously "never should be", but if something's pulling the rug
 out from under his JRE, changes could be only partially written,
 right?
 
  Or is the writing format in some sense transactionally safe?
 I've never worked directly on something like this, but I worked at a
 database software company where they used transaction semantics and a
 journaling scheme to fake a bulletproof file system.  Is this how
 the index-writing code is implemented?
 
  In general, I can guess Doug's response - just torch the old
 index directory and rebuild it; Lucene's indexing is fast enough that
 you don't need to get clever.  This seems to be Doug's stance in
 general (i.e. don't get fancy, I already put all the fanciness you'll
 need into extremely fast indexing and searching).  So far, it seems
 to work :-).
 
  I could be making this up, though, so I suggest you search through
  lucene-user and lucene-dev archives on www.mail-archive.com.
  A search for corrupt should do it.
  Once you figure things out maybe you can post a summary here.
 
  I got a little curious, so I went and did the searches.  There is
 exactly one message in each list archive (dev and users) with the
 keyword "corrupt" in it.  The lucene-users instance is irrelevant:
 
 http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html
 
  The lucene-dev instance is more useful:
 
 http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html
 
  It's a post from Doug, dated Sept 27, 2001, about adding not just
 thread-safety but process-safety:
 
   It should be impossible to corrupt an index through the Lucene API.
   However if a Lucene process exits unexpectedly it can leave the index
   locked.  The remedy is simply to, at a time when it is certain that no
   processes are accessing the index, remove all lock files.
   
  So it sounds like it's worth trying just removing the lock files.
 Hm, is there a way to come up with a sanity check you can run on an
 index to make sure it's not corrupted?  This might be an excellent
 thing to reassure yourself with: something went wrong?  Run a sanity
 check, if it fails just reindex.
 
 Steven J. Owens
 [EMAIL PROTECTED]






Re: Lucene Bugs

2002-03-16 Thread Otis Gospodnetic

Hola,

I don't have years of search engine writing experience either, but I did
look at your reports on Sourceforge earlier and I will try to look at
the source to see if they are the right fixes.  I haven't used
DateFilter, which, I think, you said contains the bug, so no promises,
but I'll look.
That part of the code might have changed since your reports, and I may
have trouble locating the lines you mentioned, so I may ask you to point
me to the right lines in the new source.
Tomorrow or Monday.  Right now I have to go kill some crapes and go
to bed.

Otis

--- David Smiley [EMAIL PROTECTED] wrote:
 Oh I *have* downloaded the CVS source and I actually did *fix*
 (maybe) two of these three bugs and I did *submit* what I did exactly
 to fix them to the sourceforge / mailing-list for public review (but
 not in diff/patch format since they were one-liners).  The problem is
 that much of Lucene is very complicated (understandably so) and I
 never got someone more familiar with Lucene's more complicated parts
 (like Doug, or perhaps some others here) to respond to see if my fix
 was correct and completely addresses the issue.  Not one person
 responded except for some other guy to say he experienced the same
 bug and that nobody responded to his bug report either :-(.  The 3rd
 bug, the one that I didn't fix, I took the time to write a test
 program that showed the bug.  What's needed now for these bugs to be
 squashed is someone who really knows Lucene's complicated parts to
 verify if my 2 fixes are sufficient and to at least investigate the
 3rd bug.  I'm not the one with years of search-engine writing
 experience ;-).
 
 I really appreciate your response by the way, it's a welcome 
 change... and an initial step.
 
 ~ Dave Smiley
 
 On Saturday, March 16, 2002, at 08:59  PM, Andrew C. Oliver wrote:
 
  You need not be asked, help is always wanted.  How about instead of
  submitting bugs, submit patches.  Simply get the sources via CVS
 (click
  on CVS Repository on the Jakarta front page), fix the bugs and then
 do
  cvs diff -u to create patches.  Post those into bugzilla and put 
  [PATCH]
  on the summary line and I think you'll find them applied rather 
  quickly.
 
  -Andy
 
 




Re: Multiple field searching

2002-03-19 Thread Otis Gospodnetic

I'm using MultiFieldQueryParser and it works for me.

Otis

--- Tate Jones [EMAIL PROTECTED] wrote:
 hi,
 
 I am trying to search across multiple fields using the following
 query
 
 +keyword:computers +subject:News content:xml
 or
 +(keyword:{computers}) +(subject:{News}) content:xml
 
 i have added the fields to the document correctly. 
 
 I have also tried using the MultiFieldQueryParser, without success.
 
 The only query that works is the following, which is not what I want
 because the clauses are ORed:
 keyword:computers subject:IT content:xml
 
 Is anyone having the same problems?
 
 Thanks in advance
 Tate
 
 




RE: Question Deleting/Reindexing Files

2002-03-20 Thread Otis Gospodnetic

The standard answer is try deleting/adding in batches instead of
individually.  Seems more efficient, too, if you can write your
application that way.
That is what you are essentially doing by writing to a separate index
and then doing a bunch of deletions, followed by re-additions.
I know I'm stating the obvious, but I wanted to get this out of the way
:)

Otis
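
The batch approach described above can be sketched as two passes keyed
on a primary key. To stay self-contained this example models the index
as an in-memory list; with Lucene the delete pass would go through an
IndexReader and the add pass through an IndexWriter, opened one after
the other rather than at the same time. `BatchUpdater` and `Doc` are
made-up names, and the sketch assumes at most one update per key in a
batch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Applies a batch of updated documents to an in-memory stand-in for an
// index: first delete every old version, then add the new versions.
final class BatchUpdater {
    /** A document identified by its URL, used here as the primary key. */
    static final class Doc {
        final String url;
        final String body;

        Doc(String url, String body) {
            this.url = url;
            this.body = body;
        }
    }

    private BatchUpdater() {}

    static void applyBatch(List<Doc> index, List<Doc> updates) {
        Set<String> keys = new HashSet<>();
        for (Doc d : updates) {
            keys.add(d.url);
        }
        // Pass 1: delete all existing versions of the updated documents.
        index.removeIf(d -> keys.contains(d.url));
        // Pass 2: add the new versions (and any brand-new documents).
        index.addAll(updates);
    }
}
```

Because every delete happens before any add, the index never needs a
reader and a writer open for modification at the same moment, and each
key ends up in the index exactly once.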

--- Spencer, Dave [EMAIL PROTECTED] wrote:
 [1] There's no update, so delete and then add is what you want.
 [2] I have had the same problems w/ using an IndexWriter and
 IndexReader at the same time and getting a locking problem when
 deleting. I think I sent mail to the list w/ a test case a week ago
 [disclaimer: this is not a complaint!] and I think the issue is still
 open. Maybe I should turn this into a bug report? I know fixing bugs
 is encouraged, but I don't have enough context about the right
 solution, or how the locking apparently changed to foul this up,
 though I did look thru things.
 My workaround was to write new entries to a new index and then run
 a separate merge utility that 1st does a delete pass, and then reopens
 and does adds, based on a primary key (the URL of each doc in my
 case).
 
 
 -Original Message-
 From: Joe Hajek [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, March 20, 2002 12:28 AM
 To: [EMAIL PROTECTED]
 Subject: Question Deleting/Reindexing Files
 
 
 Hi,
 
 I am using Lucene for indexing a relatively large article-based system
 where articles change from time to time, so I have to reindex them.
 Reindexing had the effect that a query would return the hit for a file
 multiple times (according to the number of updates).
 
 The only solution to that problem I found was to delete the file to be
 updated before indexing it again. Is there another possibility?
 
 As the system is large, I am collecting the articles that have to be
 updated together, then I open a writer and add the documents to the index.
 This solution worked fine for me using rc1; in rc4 it seems that it is no
 longer possible to delete a file from an index while the index is
 opened for writing.
 
 Do you know any solutions to that problem?
 
 thanx a lot in advance
 
 regards joe
 




Re: Multiple field searching

2002-03-21 Thread Otis Gospodnetic


--- Kelvin Tan [EMAIL PROTECTED] wrote:
 hmmm...really?
 
 My impression was that the ANDs are treated equivalently with +s by the
 parser, so they're redundant.

Correct.

 The { and }s aren't part of the syntax, are they?

I was wondering where those came from.
I don't think I've seen them in QueryParser.jj.

Otis

 - Original Message -
 From: Mehran Mehr [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]; Kelvin
 Tan
 [EMAIL PROTECTED]
 Sent: Thursday, March 21, 2002 8:11 PM
 Subject: Re: Multiple field searching
 
 
  this is the right syntax:
 
  +(keyword:{computers}) AND +(subject:{News}) AND
  content:xml
 
 




Re: Older versions of Lucene?

2002-03-21 Thread Otis Gospodnetic

Maybe you can find something in Lucene's old repository on
Sourceforge.net.

Otis

--- Robert A. Decker [EMAIL PROTECTED] wrote:
 I'm on Java 1.1.8, and can't upgrade beyond that for quite some time
 due
 to testing requirements.
 
 I've managed to compile in and use the 1.2 StringBuffer class that is
 required by Lucene. However, I'm getting tons of 'Integer constant
 out of
 range' errors when building. For example:
0xfffeL, 0xL, 0xL,
 0xL
 
 Are all out of range...
 
 Did the size of a long change from 1.1.8 to 1.2? If so, is there a
 way to
 use 1.1.8 and lucene?
 
 If not, is it possible to use an older version?
 
 thanks,
 rob
 
 




Re: TokenManager's longs too long

2002-03-21 Thread Otis Gospodnetic

Sorry, no experience with JDK 1.1.8 and Lucene nor JavaCC.
Sounds like a question for WebGain folks.

Otis
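
On Rob's related question of whether the size of a long changed between releases: as far as the Java language definition goes, long has always been a 64-bit signed type, so a hex literal of up to 16 digits with an L suffix should fit. The sketch below only illustrates that point. It does not diagnose the compile error, and the constant shown is a made-up example value, since the constants quoted in this thread are truncated in the archive and are not reconstructed here. LongLiteralSketch and bitSet are hypothetical names.

```java
// Hypothetical example; the value below is illustrative, not the generated
// jjbitVec0 contents, which are truncated in the archived message.
final class LongLiteralSketch {
    // A 16-hex-digit literal needs the L suffix to be read as a 64-bit long.
    static final long ALL_BUT_LOWEST_BIT = 0xfffffffffffffffeL;

    /** Tests bit i (0..63) of a 64-bit word, the way a bit-vector lookup would. */
    static boolean bitSet(long word, int i) {
        return (word & (1L << i)) != 0L;
    }

    public static void main(String[] args) {
        System.out.println(bitSet(ALL_BUT_LOWEST_BIT, 0));   // lowest bit is clear
        System.out.println(bitSet(ALL_BUT_LOWEST_BIT, 63));  // highest bit is set
    }
}
```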

--- Robert A. Decker [EMAIL PROTECTED] wrote:
 I'm stuck on jdk 1.1.8 and can't upgrade for some time.
 
 I'm using javacc to create some java code from a .jj file provided by
 the
 Lucene project at lucene.jakarta.org.
 
 I'm running into a problem where the long data types found in the
 XXXTokenManager.java files are too long for my version of java.
 
 For example, these are all too long:
 static final long[] jjbitVec0 = {
   0xfffeL, 0xL, 0xL, 
   0xL
 };
 
 Is this a familiar problem? I just joined the mailing list. I've been
 looking around the documentation at webgain, but can't find a mention
 of
 this.
 
 Is there a solution to this?
 
 thanks,
 rob
 
 
 




Re: Lucene Bugs

2002-03-19 Thread Otis Gospodnetic

Hello,

--- David Smiley [EMAIL PROTECTED] wrote:
 I have reported bugs about Lucene in the fall of 2001 but no Lucene 
 developer has responded.  I am sending this summary as a reminder.
 
 My original message to the mailing list is here:
 
 [Lucene-dev] More bugs
 http://www.geocrawler.com/archives/3/2626/2001/8/0/6409669/
 
 The bugs at SourceForge are here:
 
 DateFilter: call enum.next() first

DateFilter.java has changed since the report, but I think I found the
piece of code that you were referring to.
After looking at DateFilter, TermEnum, and FilteredTermEnum it seems to
me that next() does not need to be called first.  This is not
java.util.Enumeration enum, it is TermEnum's enum.
Also, if you look at methods next() and term() in FilteredTermEnum
you'll see that term() does need to be called first, otherwise the
first term would get skipped.
I'm not very familiar with this code, but this is what it seems like
from looking at it for 7:32 minutes.

Otis






Re: Lucene Bugs

2002-03-19 Thread Otis Gospodnetic

Hello,

 SegmentTermEnum.clone(), term == null

http://sourceforge.net/tracker/index.php?func=detail&aid=451315&group_id=3922&atid=103922

Aha, this was a bug, indeed, but it looks like it was fixed
about 6 months ago:

revision 1.2
date: 2001/10/11 15:14:14;  author: scottganyo;  state: Exp;  lines: +1
-1
Fix NullPointerException in clone() method when the Term is null.

Otis






Re: Lucene Bugs

2002-03-19 Thread Otis Gospodnetic

Hello,
Has anyone else observed this behaviour?

 Wrong ordering from Document.fields()

http://sourceforge.net/tracker/index.php?func=detail&aid=451317&group_id=3922&atid=103922

It looks like java.util.Enumeration is used to store the fields, so if
Enumeration guarantees order, then this should, too.
Could you please provide a self-contained test case that I can just put
somewhere, compile, and run?
I can't compile the snippet in the above bug report.

 No software is bug free; I just want to help make Lucene better.  If 
 I can be of any help, please ask.

Thanks!
Otis






Re: Field search matching exact and partial occurence

2002-03-22 Thread Otis Gospodnetic

Aero*
Look at Wildcard and Prefix queries.

Otis
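
What a prefix query such as aero* is expected to match can be sketched in plain Java. This is a minimal illustration under stated assumptions: a sorted set of strings stands in for the index's term list, PrefixSketch and termsWithPrefix are hypothetical names, and whether a term like aero-finder survives as a single term depends on the analyzer used at indexing time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration: a sorted term set stands in for an index's term list.
final class PrefixSketch {
    /** Returns every term that starts with prefix, in sorted order. */
    static List<String> termsWithPrefix(SortedSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, so start at the
        // prefix itself and stop at the first term that no longer begins with it.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break;
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        terms.add("aeronef");
        terms.add("aerosol");
        terms.add("aero-finder");
        terms.add("air");
        terms.add("zero");
        System.out.println(termsWithPrefix(terms, "aero"));
    }
}
```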

--- RAYMOND Romain [EMAIL PROTECTED] wrote:
 
 Hello,
 Is there a way to do a query on a field XX that retrieves the exact or
 partially matching values? For example, a query on aero
 will return aeronef, aerosol, aero-finder ...
 
 
 Thanks.
 




Re: TokenManager's longs too long

2002-03-22 Thread Otis Gospodnetic

www.webgain.com

--- Robert A. Decker [EMAIL PROTECTED] wrote:
 Aren't the webgain people on this mailing list? If not, how do I
 contact
 them? I've been looking around the javacc pages, but can only find
 the
 email address for this mailing list...
 
 thanks,
 rob
 
 On Thu, 21 Mar 2002, Otis Gospodnetic wrote:
 
  Sorry, no experience with JDK 1.1.8 and Lucene nor JavaCC.
  Sounds like a question for WebGain folks.
  
  Otis
  
  --- Robert A. Decker [EMAIL PROTECTED] wrote:
   I'm stuck on jdk 1.1.8 and can't upgrade for some time.
   
   I'm using javacc to create some java code from a .jj file
 provided by
   the
   Lucene project at lucene.jakarta.org.
   
   I'm running into a problem where the long data types found in the
   XXXTokenManager.java files are too long for my version of java.
   
   For example, these are all too long:
   static final long[] jjbitVec0 = {
 0xfffeL, 0xL, 0xL, 
 0xL
   };
   
   Is this a familiar problem? I just joined the mailing list. I've
 been
   looking around the documentation at webgain, but can't find a
 mention
   of
   this.
   
   Is there a solution to this?
   
   thanks,
   rob
   
   
   




Re: StopFilter-troubles

2002-03-27 Thread Otis Gospodnetic


--- [EMAIL PROTECTED] wrote:
 Dear Lucene-users,
 has someone an answer to the following question:
 If I add a StopFilter to my Analyzer, the stopwords I gave it will be left
 out of the query. So far, so good. But when my query is like this one:
 (field1 : x) AND (field2 : stopword) AND (field 1 : y)
 the StopFilter will do its work, but the resulting query is a big mess:
 (field1 : x) AND ( ) AND (field 1 : y), and because of that the
 search results are no good. I hoped it would search for
 (field1 : x) AND (field 1 : y).
 I think the StopFilter does a poor job here. Is anyone familiar with this
 problem and has an answer for me?
 Puk Witte.

I tried something like this on one Lucene index:
description:travel AND description:a

The results were the same as this query:
description:travel

This seems right to me.

Otis







RE: StopFilter-troubles

2002-03-27 Thread Otis Gospodnetic

I don't know enough about the query parser to be able to answer that
question, but why do you really need those parentheses?
It would also be great if you could submit this as a bug at
http://jakarta.apache.org/lucene/

Thanks,
Otis


--- [EMAIL PROTECTED] wrote:
 Dear all, especially Otis Gospodnetic (thanks for your answer),
 without ( )'s the StopFilter is doing a good job indeed, but if I put them
 around parts of the query, then the searchResult is wrong. 
 For example:
 (field1 : x) AND (field2 : stopword) AND (field 1 : y)
 So I'm afraid my problem is not solved yet. But maybe someone can try it
 with the ()'s with his own tool and tell me if they've got the same problem.
 Then I know whether I made a mistake. 
 
 Puk Witte
 
 
 
 
 
 
 
 
 
 




Re: What do reader-valued Fields do?

2002-03-31 Thread Otis Gospodnetic

This means that you can make searches against that field, but cannot
retrieve its original value.

Otis

--- Robert A. Decker [EMAIL PROTECTED] wrote:
 What should I use to store and add to my Document a long
 String? (thousands of characters)
 
 I'm still having difficulty understanding what it means to create a
 field
 with a reader value:
 
 String aString = fieldName;
 StringReader aStringReader = new StringReader(someLongText);
 Field field = Field.Text(aString, aStringReader);
 
 The documentation says that this will be tokenized and indexed, but is
 not stored in the index verbatim. 
 
 I'm using this to store a long text field - an entire document.
 
 However, in my case, nothing appears to be stored in the index! What do
 they mean by not being stored verbatim? I assumed this to mean that it
 would run the text through my analyzer, at the least, and perhaps,
 further, store it as a serialized form.
 
 
 thanks
 rob
 
 
 




Re: corrupted index

2002-04-02 Thread Otis Gospodnetic

Hello,

Nobody has contributed a tool that verifies index integrity, yet.
Is this the latest version of Lucene?
Are you hitting the 2GB/file limit?
Just some ideas.

Otis


--- H S [EMAIL PROTECTED] wrote:
 Dear All,
 
 We are experiencing a problem with
 index updates. We have a fairly
 large index (10 gigabytes). There
 are no problems searching it. But
 when we add a single file and then
 try to optimize, optimization fails
 with a null pointer exception in
 RandomAccessFile.seek.
 
 Has anybody come across this problem?
 Is there a way to tell whether an index
 is corrupted?
 
 Thanks very much -
 
 Hinrich Schuetze






Re: compiling lucene

2002-04-03 Thread Otis Gospodnetic

JavaCC 2.1 works, too.
This is how I have it set up:

[otis@linux2 otis]$ ls -al /usr/local/.version/javacc2.1/
total 44
drwxrwxr-x    6 otis otis   4096 Jan 28 06:50 .
drwxr-xr-x   20 otis otis   4096 Apr  2 23:32 ..
drwxrwxr-x    3 otis otis   4096 Jan 28 06:50 bin
-rw-rw-r--    1 otis otis   8518 Jan 28 06:50 COPYRIGHT
drwxrwxr-x    2 otis otis   4096 Jan 28 06:50 doc
drwxrwxr-x   21 otis otis   4096 Jan 28 06:50 examples
-rw-rw-r--    1 otis otis   5599 Jan 28 06:50 README
drwxrwxr-x    5 otis otis   4096 Jan 28 06:50 src
[otis@linux2 otis]$ ls -al ~/cvs-repositories/jakarta/jakarta-lucene/lib/
total 132
drwxrwxr-x    3 otis otis   4096 Jan 28 15:28 .
drwxrwxr-x    9 otis otis   4096 Mar 27 23:28 ..
drwxrwxr-x    2 otis otis   4096 Jan 28 15:29 CVS
lrwxrwxrwx    1 otis otis     36 Jan 28 06:55 JavaCC.zip -> /usr/local/javacc/bin/lib/JavaCC.zip
-rw-rw-r--    1 otis otis 117522 Jan 28 15:23 junit_37.jar

Otis


--- Victor Hadianto [EMAIL PROTECTED] wrote:
 Hi list,
 
 I'm having a problem compiling lucene from scratch. I checked out lucene
 1.2 rc4 from cvs and I am missing one vital component: JavaCC 2.0.
 
 The latest JavaCC that I can get from webgain is 2.1, and just dropping
 it into the lucene/lib directory does not work quite well. I had a look,
 and the class name expected by the lucene build file is quite different
 from the one in JavaCC 2.1.
 
 Is there someplace where I can get JavaCC 2.0 that works with lucene?
 
 
 Thanks,
 
 -- 
 Victor Hadianto
 
 




Re: storing index in third party database.

2002-04-03 Thread Otis Gospodnetic

If you want to store indices in a database search the mailing list
archives for SqlDirectory.

Once I considered using it for one application at work, so I asked its
author about performance.  The answer was that it doesn't perform all
that well when the index grows, if I recall correctly.  Consequently,
we chose to use file-based indices instead.

Otis

--- [EMAIL PROTECTED] wrote:
 Hi all
 
 I want to index the data which I have already stored in a third-party
 database table and develop a search facility using Lucene. I am
 thinking of storing these indexes back to the database in another
 table. I know that for this we have to create a 'directory' which does all
 the indexing operations,
 
 for example
 
 IndexWriter indwriter = new IndexWriter(dirStore, null, create);
 
 where dirStore is the directory and create is a boolean.
 
 But I don't know the format to be followed for the
 directory (dirStore). Please help me if anybody has done a similar
 thing.
 TIA
 Amith
 
 




Re: Custom queries

2002-04-05 Thread Otis Gospodnetic

name != pradeep == -name:pradeep

I think there is also support for the date query below, but I haven't
used it yet, so I don't want to give you any wrong information.

Otis


--- Pradeep Kumar K [EMAIL PROTECTED] wrote:
 Hi lucene friends!
 
 Is there any way to create custom queries.
 Just for example I want to create a query like name != 'pradeep' 
 creationDatedateVar.
 
 TIA
 Pradeep
 
 
 --
 Robosoft Technologies, Mangalore, India
 
 
 




Re: JavaCC error when installing with Ant

2002-04-10 Thread Otis Gospodnetic

And you have Ant's optional.jar in Ant's lib directory?

--- David Black [EMAIL PROTECTED] wrote:
 
 Ant returns the following error. Any ideas?
 ...
 lucene-1.2-rc4-src/build.xml:92: Could not create task of type:
 javacc. 
 Common solutions are to use taskdef to declare your task, or, if this
 is 
 an optional task, to put the optional.jar in the lib directory of
 your 
 ant installation (ANT_HOME).
 ...
 
 
 
 I altered the build.properties file to reflect my version of javacc
 
 # Home directory of JavaCC
 javacc.home = /usr/local/java/javacc2.1
 javacc.zip.dir = ${javacc.home}/lib
 javacc.zip = ${javacc.zip.dir}/JavaCC.zip
 
 
 




HTML parser

2002-04-18 Thread Otis Gospodnetic

Hello,

I need to select an HTML parser for the application that I'm writing
and I'm not sure what to choose.
The HTML parser included with Lucene looks flimsy, JTidy looks like a
hack and an overkill, using classes written for Swing
(javax.swing.text.html.parser) seems wrong, and I haven't tried David
McNicol's parser (included with Spindle).

Somebody on this list must have done some research on this subject.
Can anyone share some experiences?
Have you found a better HTML parser than any of those I listed above?
If your application deals with HTML, what do you use for parsing it?

Thanks,
Otis






Re: HTML parser

2002-04-18 Thread Otis Gospodnetic

Hello Terrence,

Ah, you got me.
I guess I need a bit of both.
I need to just strip HTML and get raw body text so that I can stick it
in Lucene's index.
I would also like something that can extract at least the
<title>...</title> stuff, so that I can stick that in a separate field
in Lucene index.
While doing that I, like you, need to be able to handle poorly
formatted web pages.

In the future I may need something that has the ability to extract HREFs,
but I'll stick to one of the XP principles and just look for something
that meets current needs :)

I looked for ANTLR-based HTML parser a few days ago, but must have
missed the one you pointed out.  I'll take a look at it now.
Can you share or describe your stripHTML method?  Simple java that
looks for <s and >s or something smarter?

Thanks,
Otis
P.S.
This type of thing makes me wish I can use Perl or Python :)
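
A minimal sketch of the "simple java" end of that spectrum, assuming regex-based filtering is good enough for poorly formatted pages. It is not Terence's stripHTML(), which is not shown in this thread, and it is a filter rather than a parser. HtmlTextSketch, stripHtml, and extractTitle are hypothetical names, java.util.regex is assumed to be available, and character entities such as &amp; are left undecoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: regex-based filtering, not a real HTML parser.
final class HtmlTextSketch {
    // Whole <script>/<style> elements, whose contents are never body text.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern TITLE =
        Pattern.compile("(?is)<title[^>]*>(.*?)</title\\s*>");

    /** Returns the text inside the first <title> element, or "" if there is none. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? collapse(m.group(1)) : "";
    }

    /** Drops scripts, styles, comments and tags, keeping only the text. */
    static String stripHtml(String html) {
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        s = COMMENT.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        return collapse(s);
    }

    // Collapses runs of whitespace left behind by removed markup.
    private static String collapse(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Round Window</title></head>"
                    + "<body><p>Some <b>bold</b> text<br>and an unclosed <i>tag</body>";
        System.out.println(extractTitle(page));
        System.out.println(stripHtml(page));
    }
}
```

The title and the stripped text could then go into separate fields of the same document. Note that stripHtml() as written keeps the title text in its output as well.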


--- Terence Parr [EMAIL PROTECTED] wrote:
 
 On Thursday, April 18, 2002, at 10:28  PM, Otis Gospodnetic wrote:
 
  Hello,
 
  I need to select an HTML parser for the application that I'm
 writing
  and I'm not sure what to choose.
  The HTML parser included with Lucene looks flimsy, JTidy looks like
 a
  hack and an overkill, using classes written for Swing
  (javax.swing.text.html.parser) seems wrong, and I haven't tried
 David
  McNicol's parser (included with Spindle).
 
  Somebody on this list must have done some research on this subject.
  Can anyone share some experiences?
  Have you found a better HTML parser than any of those I listed
 above?
  If your application deals with HTML, what do you use for parsing
 it?
 
 Hi Otis,
 
 I have an HTML parser built for ANTLR, but it's pretty strict in what
 it 
 accepts.  Not sure how useful it will be for you, but here it is:
 
 http://www.antlr.org/grammars/HTML
 
 I am not sure what your goal is, but I personally have to scarf all 
 sorts of HTML from various websites to suck them into the jGuru search 
 engine.  I use a simple stripHTML() method I wrote to handle it.  Works 
 great.  Kills everything but the text.  Is that the kind of thing you 
 are looking for, or do you really want to parse, not filter?
 
 Terence
 --
 Co-founder, http://www.jguru.com
 Creator, ANTLR Parser Generator: http://www.antlr.org
 
 




Re: Wildcard query problem with ?

2002-04-19 Thread Otis Gospodnetic

Hm, I just went through all the diffs after RC2 (QueryParser.jj
revision 1.3) and I can't see where '?' was dropped.
However, one user reported this on February 27th:

We just tried adding the ? character to QueryParser.jj under
#_TERM_START_CHAR. We noticed that the * was in that list, so we
figured
we'd just give it a try. It seems to have worked. Now when we search on
rou?d, we get hits on the word round. We're going to try searching
for
some other variations to make sure that we've done the right thing.
We'd still be interested to know exactly why this worked (assuming it
continues to solve our problem). What is a TERM_START_CHAR and how is
it
used? Obviously it does something important. :-)


So I'll try your code and if wildcards really don't work I'll try this
person's suggestion and if it works I'll commit it.


Otis


--- Ralf Hettesheimer [EMAIL PROTECTED] wrote:
 Hello,
 
 I have been using RC2 until yesterday when I tried the latest nightly
 build.
 Now it seems that I can no longer search for wildcard-queries with a
 question mark.
 For example in my index there are two documents, one containing the
 word
 meier and another one containing the word maier. With RC2 I could
 search
 for m?ier and got two hits. With anything later (I tried RC3, RC4
 and the
 nightly builds from 1704 and 1804) I get 0 hits. When searching for
 mei?r
 the same, 1 hit with RC2 and 0 hits with RC4.
 The QueryParser from RC2 generated a BooleanQuery and the QueryParser
 from
 RC4 generates a PhraseQuery. I have attached the source code of a
 little
 test program and output from the debugger.
 Could somebody explain this behaviour?
 
 Thanks
 Ralf Hettesheimer
 
 
 

 ATTACHMENT part 2 application/octet-stream name=TestQueryParser.java


 ATTACHMENT part 3 image/gif name=debugrc2.gif


 ATTACHMENT part 4 image/gif name=debugrc4.gif




RE: HTML parser

2002-04-19 Thread Otis Gospodnetic

Such classes are not included with Lucene.
This was _just_ mentioned on this list earlier today.
Look at the archives and search for crawler, URL, lucene sandbox, etc.

Otis

--- Ian Forsyth [EMAIL PROTECTED] wrote:
 
 Are there core classes part of lucene that allow one to feed lucene
 links,
 and 'it' will capture the contents of those urls into the index..
 
 or does one write a file capture class to seek out the url store the
 file in
 a directory, then index the local directory..
 
 Ian
 
 
 -Original Message-
 From: Terence Parr [mailto:[EMAIL PROTECTED]]
 Sent: Friday, April 19, 2002 1:38 AM
 To: Lucene Users List
 Subject: Re: HTML parser
 
 
 
 On Thursday, April 18, 2002, at 10:28  PM, Otis Gospodnetic wrote:
 
 :snip
 
 Hi Otis,
 
 I have an HTML parser built for ANTLR, but it's pretty strict in what
 it
 accepts.  Not sure how useful it will be for you, but here it is:
 
 http://www.antlr.org/grammars/HTML
 
 I am not sure what your goal is, but I personally have to scarf all
 sorts of HTML from various websites to suck them into the jGuru search
 engine.  I use a simple stripHTML() method I wrote to handle it.  Works
 great.  Kills everything but the text.  Is that the kind of thing you
 are looking for, or do you really want to parse, not filter?
 
 Terence
 --
 Co-founder, http://www.jguru.com
 Creator, ANTLR Parser Generator: http://www.antlr.org
 
 




RE: Wildcard Searching

2002-04-19 Thread Otis Gospodnetic

Did the change that you mentioned below really work for you?
I wrote this class:
http://nagoya.apache.org/bugzilla/showattachment.cgi?attach_id=1638

and it looks like the bug is not in QueryParser, but in some Java class
(could it be WildcardTermEnum?), since the class does not make use of
QueryParser and still demonstrates that WildcardQuery doesn't work
properly.

Thanks,
Otis


--- Howk, Michael [EMAIL PROTECTED] wrote:
 We just tried adding the ? character to QueryParser.jj under
 #_TERM_START_CHAR. We noticed that the * was in that list, so we
 figured
 we'd just give it a try. It seems to have worked. Now when we search
 on
 rou?d, we get hits on the word round. We're going to try searching
 for
 some other variations to make sure that we've done the right thing.
 
 We'd still be interested to know exactly why this worked (assuming it
 continues to solve our problem). What is a TERM_START_CHAR and how is
 it
 used? Obviously it does something important. :-)
 
 -Original Message-
 From: Howk, Michael [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, February 27, 2002 11:14 AM
 To: 'Lucene Users List'
 Subject: RE: Wildcard Searching
 
 
 The StandardAnalyzer uses a lowercase filter, but we tried indexing
 the
 round hat, just to make sure. The * still worked, but the ? still
 failed.
 
 We noticed that the ? character is listed in the QueryParser as a
 WILDTERM.
 But after that, the code heads into the WildcardQuery class, and we
 get lost
 amidst setEnum() and wildcardEquals() stuff. :-)
 
 Seriously though, we're using the StandardAnalyzer directly from
 Lucene. I
 suppose it's possible that the ? is a special character that's
 getting
 stripped out. But we need help to find out exactly where the special
 characters are defined or filtered.
 
 Michael
 
 -Original Message-
 From: Aruna Raghavan [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, February 27, 2002 11:00 AM
 To: 'Lucene Users List'
 Subject: RE: Wildcard Searching
 
 
 From my experience with wildcards,
 1. They are case sensitive while the regular queries aren't.
 2. Only one wildcard is allowed in a word. If you are using this with a
 bool query, you can use something like the following:
 (asas*) AND (fhg*fd). This is acceptable.
 3. There is a requirement of using at least one character before the
 wildcard in a query (*fhhd is not valid).
 4. Special characters are not supported (? may be a special character).
 Hope this helps!
 
 -Original Message-
 From: Howk, Michael [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, February 27, 2002 10:56 AM
 To: Lucene Mailing List (E-mail)
 Subject: Wildcard Searching
 
 
 We're really struggling with trying to understand why the
 WildcardQuery
 seems to strip out the question mark by replacing it with a space.
 We're
 using the daily build, and a StandardAnalyzer. We've got the text
 The Round
 Window in our index. If we search on roun* the Lucene QueryParser
 returns
 a hit. When we search on roun?, we don't get any hits. We don't
 even know
 how to make heads or tails of the WildcardQuery or WildcardTermEnum
 classes.
 
 Also, Lucene returns the parsed version of each of our searches. When
 we
 search by rou*d, Lucene parses it as rou*d (which is what we would
 expect).
 But when we search by rou?d, Lucene parses it as rou d. It seems to
 wrap
 the term in quotes and replace the question mark with a space. Any
 ideas? Or
 can someone give us an idea of how to understand WildcardQuery or
 WildcardTermEnum?
 
 Michael
 

WildcardQuestionmarkTest.java
Description: WildcardQuestionmarkTest.java



Re: HTML parser

2002-04-21 Thread Otis Gospodnetic

Laura,

http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=Spindle&q=b

Oops, it's JoBo, not MoJo :)
http://www.matuschek.net/software/jobo/

Otis

--- [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hi Otis,
 
 thanks for your reply. I have been looking for Spindle and MoJo for 2
 hours but I haven't found anything.
 
 Can you help me? Where can I find something?
 
 Thanks for your help and time
 
 
 Laura
 
 
   
 
  Laura,
  
  Search the lucene-user and lucene-dev archives for things like:
  crawler
  spider
  spindle
  lucene sandbox
  
  Spindle is something you may want to look at, as is MoJo (not mentioned
  on lucene lists, use Google).
  
  Otis
  
   Did someone solve the problem of recursively spidering web pages?
  
    While trying to research the same thing, I found the following.
    Here's a good example of link extraction.
  
    Try http://www.quiotix.com/opensource/html-parser
  
    It's easy to write a Visitor which extracts the links; it should take
    about ten lines of code.
  
  




Re: Error with StandardTokenizer.java and Token.java

2002-04-23 Thread Otis Gospodnetic

Hello,

Get the latest version, try again, paste the error if you get it, and
use the lucene-user list instead; more eyeballs and brains will see your
problem on that list.

Thanks,
Otis



--- Jacob Gutierrez [EMAIL PROTECTED] wrote:
 Hi there...
 
 Using the latest version of StandardTokenizer.jj and using JavaCC
 (ver 
 2.1) I get 7 java files, among them StandardTokenizer.java and
 Token.java
 
 The Token class has these attributes:
 
 public final class Token {
   String termText;        // the text of the term
   int startOffset;        // start in source text
   int endOffset;          // end in source text
   String type = "word";   // lexical type
 
 
 }
 
 And the StandardTokenizer, in its next() function, has this code:
 
 new org.apache.lucene.analysis.Token(token.image,
                                      token.beginColumn, token.endColumn,
                                      tokenImage[token.kind]);
 
 This gives a "Variable not found" error.
 Why is this error happening? Do I have to manually modify the file
 created by JavaCC?
 
 Any help will be appreciated.
 
 
 
 
 Jacob Gutiérrez R.
 Cochabamba - Bolivia
 
 
 




Re: Cannot compile Lucene

2002-04-24 Thread Otis Gospodnetic

Just curious, what exactly do people need to do to 'fix up the
exceptions'?  Which files need editing, and what needs to change?

I'd just like to document that somewhere, that's why I'm asking...
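
For the first error quoted below (the generated next() declaring
ParseException, which the overridden method does not throw), one possible
fix is to catch the checked exception inside next() and rethrow it as an
IOException. This is only a sketch under that assumption; it is not
necessarily what Robert did, and every class name in it is a hypothetical
stand-in rather than a real Lucene or JavaCC class.

```java
import java.io.IOException;

/** Hypothetical stand-ins only: not the real Lucene or JavaCC classes. */
final class OverrideSketch {

    /** Stand-in for the checked exception thrown by a generated parser. */
    static final class ParseFailure extends Exception {
        ParseFailure(String message) { super(message); }
    }

    /** Stand-in for a base class whose next() may only throw IOException. */
    abstract static class Stream {
        abstract String next() throws IOException;
    }

    /** Subclass that keeps the override legal by translating the exception. */
    static final class Tokenizer extends Stream {
        private final String input;

        Tokenizer(String input) { this.input = input; }

        /** Generated-style method that declares the extra checked exception. */
        private String parse() throws ParseFailure {
            if (input.isEmpty()) {
                throw new ParseFailure("nothing to parse");
            }
            return input.trim();
        }

        @Override
        String next() throws IOException {
            try {
                return parse();
            } catch (ParseFailure e) {
                // Rethrow as the only checked exception the base class allows.
                throw new IOException("parse error: " + e.getMessage());
            }
        }
    }

    /** Convenience for callers: returns the token, or null if next() failed. */
    static String nextOrNull(String input) {
        try {
            return new Tokenizer(input).next();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(nextOrNull(" round "));
    }
}
```

An overriding method may declare fewer checked exceptions than the method
it overrides, but never more, which is why javac rejects the generated
signature.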

Otis

--- Robert A. Decker [EMAIL PROTECTED] wrote:
 I got it working under Project Builder. You just have to fix up the
 exceptions yourself. Also, you'll get some warnings (121 warnings, to be
 exact) during the linking stage stating that an Integer Constant is too
 large - just ignore these - they're wrong.
 
 thanks,
 rob
 
 http://www.robdecker.com/
 http://www.planetside.com/
 
 On Wed, 24 Apr 2002, Avi Drissman wrote:
 
  I'm using Lucene rc4 and JavaCC 2.1. I'm trying to compile Lucene
  without Ant, by tossing the files into Project Builder (Mac OS X). I
  ran JavaCC on StandardTokenizer.jj with the standard options, tossed
  the resulting files into the project, and now I'm running into a few
  errors:
  
  1. StandardTokenizer.jj:173 is
  
  org.apache.lucene.analysis.Token next() throws IOException
  
  which is JavaCC'd into StandardTokenizer.java:26 as
  
  final public org.apache.lucene.analysis.Token next() throws 
  ParseException, IOException
  
  which isn't a valid override. javac says
  
  next() in org.apache.lucene.analysis.standard.StandardTokenizer 
  cannot override next() in org.apache.lucene.analysis.TokenStream; 
  overridden method does not throw 
  org.apache.lucene.analysis.standard.ParseException
  
  2. StandardTokenizer.java:26 says
  
  token.beginColumn,token.endColumn
  
  and there are no such member variables.
  
  Am I totally missing something here?
  
  Avi
  
  -- 
  Avi Drissman
  [EMAIL PROTECTED]
  Argh! This darn mailserver is trunca
  




Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Otis Gospodnetic

Morning,

 I'm starting to wonder how bulletproof Lucene indexes are. Do they
 get corrupted easily? If so, is there a way to rebuild them?

There is no tool for detecting index corruption, fixing an index, or
rebuilding one.
The last one anyone can/has to do on their own.

 I've started to get the following exception left and right...
 
 04/25 18:34:39 (Warning) Indexer.indexObjectWithValues: 
 java.io.IOException: _91.fnm already exists

I've seen people asking about this on the list, but I never encountered
this particular exception.

 I built a little app (http://homepage.mac.com/zoe_info/) that uses
 Lucene quite extensively, and I would like to keep it that way.
 However, I'm starting to have second thoughts about Lucene's
 reliability... :-(
 
 I'm sure I'm doing something wrong somewhere, but I really cannot see
 what...

Maybe it's not a Lucene issue then, although I've seen this mentioned
so often, which means that documentation could be improved to prevent
people from making the same mistakes that others have already made.

Otis






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Otis Gospodnetic

Hello,

  There is no tool for detecting index corruption, fixing an index, or
  rebuilding one.
  The last one anyone can/has to do on their own.
 
 :-( Well, that's *very* sad, to say the least... How do I know my
 indexes are not corrupted even if everything seems to be working fine?
 Don't tell me I'm the first one to run into this kind of issue?!? How
 can I trust an index if there is *no* way of checking its integrity?
 And even if you happen to notice that something is fishy, there is no
 way to rebuild the index, short of re-indexing everything from scratch?
 That does not sound like a very healthy situation to me. "Fragile"
 would be a kind way of describing it...

Yes, that's all unfortunate.  If you come up with anything, please
share it.  Or, you can use Lucene Sandbox and develop stuff there.

  I've seen people asking about this on the list, but I never
 encountered
  this particular exception.
 
 Lucky you...

:)

  Maybe it's not a Lucene issue then, although I've seen this
 mentioned
  so often, which means that documentation could be improved to
 prevent
  people from making the same mistakes that others have already made.
 
 Maybe, maybe not. And most likely I'm doing something odd. In any
 case, 
 could you point me to the mistakes that others have already made?
 Or 
 did I miss something obvious here?

Nah, the only thing I can suggest is check the lists' archives, that is
where mistakes of others would be recorded.

Otis






Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread Otis Gospodnetic


--- petite_abeille [EMAIL PROTECTED] wrote:
  I don't know what environment you're using Lucene in.
 
 The problem seems to be especially bad on OS X (10.1.4 + JRE 1.3.1 +
 latest updates).

Does this mean you tried it on other OSs and it worked?
Which ones?
What JDK did those have and what was their ulimit and what is the
ulimit on your OSX machine?
Just curious.

Otis






Re: FileNotFoundException: code example

2002-04-29 Thread Otis Gospodnetic

Hello,

I'll put my comments inline...

--- petite_abeille [EMAIL PROTECTED] wrote:
 Hello again,
 
 attached is the source code of the only class interacting directly
 with 
 Lucene in my app. Sorry for not providing a complete test case as
 it's 
 hard for me to come up with something self contained. Maybe there is 
 something that's obviously wrong in what I'm doing.
 
 Thanks for any help.
 
 PA
 
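 //
 // ===========================================================================
 //
 //    Title:          SZIndex.java
 //    Description:    [Description]
 //    Author:         Raphael Szwarc [EMAIL PROTECTED]
 //    Creation Date:  Wed Sep 12 2001
 //    Legal:          Copyright (C) 2001 Raphael Szwarc. All Rights Reserved.
 //
 // ---------------------------------------------------------------------------
 //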
 
 package alt.dev.szobject;
 
 import com.lucene.store.Directory;
 import com.lucene.store.FSDirectory;
 import com.lucene.store.RAMDirectory;
 import com.lucene.document.Field;
 import com.lucene.document.DateField;
 import com.lucene.document.Document;
 import com.lucene.analysis.Analyzer;
 import com.lucene.analysis.standard.StandardAnalyzer;
 import com.lucene.index.IndexWriter;
 import com.lucene.index.IndexReader;
 import com.lucene.index.Term;
 import com.lucene.search.IndexSearcher;
 import com.lucene.search.MultiSearcher;
 import com.lucene.search.Searcher;
 import com.lucene.search.Query;
 import com.lucene.search.Hits;
 
 import java.io.FilenameFilter;
 import java.io.File;
 import java.io.IOException;
 
 import java.util.Map;
 import java.util.Collection;
 import java.util.Date;
 import java.util.Iterator;
 
 import alt.dev.szfoundation.SZHexCoder;
 import alt.dev.szfoundation.SZDate;
 import alt.dev.szfoundation.SZSystem;
 import alt.dev.szfoundation.SZLog;
 
 final class SZIndex extends Object
 {
 
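 // ===========================================================================
 //    Constant(s)
 // ---------------------------------------------------------------------------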
 
    private static final String Extension = ".index";
 
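 // ===========================================================================
 //    Class variable(s)
 // ---------------------------------------------------------------------------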
 
   private static final Filter _filter = new Filter();
 
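 // ===========================================================================
 //    Instance variable(s)
 // ---------------------------------------------------------------------------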
 
   private String  _path = null;
   private transient File  _directory = null;
   private transient Directory _indexDirectory = null;
   private transient IndexWriter   _writer = null;
   
   private transient IndexReader   _reader = null;
   private transient Searcher  _searcher = null;
 
   private transient Directory _ramDirectory = null;
   private transient IndexWriter   _ramWriter = null;
   private transient int   _counter = 0;
 
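 // ===========================================================================
 //    Constructor method(s)
 // ---------------------------------------------------------------------------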
 
   private SZIndex()
   {
   super();
   }
 
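 // ===========================================================================
 //    Class method(s)
 // ---------------------------------------------------------------------------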
 
   static FilenameFilter filter()
   {
   return _filter;
   }
   
   static String stringByDeletingPathExtension(String aPath)
   {
   if ( aPath != null )
   {
   int anIndex = aPath.lastIndexOf( SZIndex.Extension );
   
    if ( anIndex > 0 )
   {
   aPath = aPath.substring( 0, anIndex );
   }
   
   return aPath;
   }
   
    throw new IllegalArgumentException(
        "SZIndex.stringByDeletingPathExtension: null path." );
   }
 
   static SZIndex indexWithNameInDirectory(String aName, File
 aDirectory)
   {
   if ( aName != null )
   {
   if ( aDirectory != null )
   {
    String  anEncodedName = SZHexCoder.encode( aName.getBytes() );
    //String  aPath = aDirectory.getPath() + File.separator +
    //        anEncodedName + SZIndex.Extension + File.separator;
    String  aPath = aDirectory.getPath() + File.separator + aName +
            SZIndex.Extension + File.separator;
   SZIndex anIndex = new SZIndex();
   
   anIndex.setPath( aPath );
   
   

Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread Otis Gospodnetic

Hello,

   and what was their ulimit and what is the
  ulimit on your OSX machine?
  Just curious.
 
 I don't know. Does it matter?

Of course it does - a low (u)limit is a part of your problem, perhaps.

Otis
P.S.
I don't know how Winblows deals with file descriptors.  Try your
application on some other flavour of Unix, if possible.






Re: Options for sorting on an integer or date

2002-05-01 Thread Otis Gospodnetic

Hello,

--- Joel Bernstein [EMAIL PROTECTED] wrote:
 At my company we are trying to decide on a new search engine.
 I am very impressed with what I see in Lucene and am thinking very
 seriously of not going with AltaVista, FAST, etc.

:)

 One of the things that is very important to us is sorting by an
 integer or by a date, which Lucene currently cannot do.
 
 So I am thinking about some options I might have here.  I would
 welcome comments from the lucene developers on the options below:
 
 1)  We could wait for the sorting to be added to Lucene.  Is there an
 idea of when this will be added?

There was not much/any discussion about this functionality, so one can
draw a conclusion from that easily :)

 2)  Have my company commission a project from the Lucene team to add
 this functionality soon.  Does the Lucene team do commissioned work?

Commission in what sense?  The $en$e?
I think payment is out of the question, but I would encourage you to take
the current Lucene snapshot, or maybe the next release, which is
imminent, and add this functionality to Lucene.
It sounds like if Lucene doesn't have this functionality you'll have to
spend a good amount of dollars anyway.
Damn, I'm not a very good salesman :)

 3)  Add the sorting code with guidance from the Lucene team and from
 a search engine expert that works with our company.

I can't help with that, but maybe somebody else can.

 4) Re-sort the results in the application that is using Lucene.  This
 is the least attractive because our result-sets can be very
 large and I think we will have performance problems.

That's the simplest and the 'hackiest' solution. :(
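
For what it's worth, option 4 can be sketched in plain Java as follows. The
Result class and its rank field are hypothetical (nothing here is a Lucene
class); the idea is just to copy the hits into a list and sort that list by
the integer value. Note that every hit has to be loaded into memory first,
which is exactly the performance concern raised above for large result
sets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Application-side re-sort of search results; Result is a hypothetical holder. */
final class ResortSketch {

    static final class Result {
        final String id;     // document identifier
        final int rank;      // hypothetical integer field stored with each document
        final float score;   // relevance score as returned by the search

        Result(String id, int rank, float score) {
            this.id = id;
            this.rank = rank;
            this.score = score;
        }
    }

    /** Returns a new list ordered by the integer field, ascending. */
    static List<Result> sortByRank(List<Result> hits) {
        List<Result> sorted = new ArrayList<>(hits);
        // List.sort is stable, so hits with equal rank keep their relevance order.
        sorted.sort(Comparator.comparingInt((Result r) -> r.rank));
        return sorted;
    }

    /** Extracts the ids, in order, for display or checking. */
    static List<String> ids(List<Result> results) {
        List<String> out = new ArrayList<>();
        for (Result r : results) {
            out.add(r.id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Result> hits = new ArrayList<>();
        hits.add(new Result("a", 30, 0.9f));
        hits.add(new Result("b", 10, 0.5f));
        hits.add(new Result("c", 20, 0.7f));
        System.out.println(ids(sortByRank(hits)));
    }
}
```

A date can be handled the same way by storing it as a long (milliseconds)
and using Comparator.comparingLong instead.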

Otis






Re: indexing PDF files

2002-05-01 Thread Otis Gospodnetic

  Hm, this should be a FAQ.
 
 Maybe it should... ;-)

It is now.

  Check Lucene contributions page, there are some starting points
 there,
 
 Well, this seems to be a very popular request... In fact I need 
 something like that also. Unfortunately, there seems to be no 
 authoritative answer as far as converting pdf files to text in a pure
 
 Java environment... Maybe I'm missing something here as usual?
 
 Also, on a related note, what would be a good approach to converting any
 random document into PDF? I was thinking of a two-step process for
 document indexing in Lucene:
 
 - First, convert everything to pdf (with Acrobat or something)
 - Second, convert pdf to text and index it.
 
 Any practical suggestions about how to do that in a pure Java 
 environment very welcome.

Wouldn't you want to convert to XML instead and use XSLT to transform
the XML representation to any desired format by just applying a style
sheet?
Sounds like less work with bigger document type coverage.

Otis






Re: term search speeds

2002-05-01 Thread Otis Gospodnetic

Caching?
The OSes usually cache recently opened files...

Otis

--- a person [EMAIL PROTECTED] wrote:
 Does anyone know exactly why, when searching for a term, the engine is
 much slower on the first search for a term than on subsequent searches
 for the same term?
 
 Thanks
 
 




Re: 3 Times Isn't a Charm for me and Lucene

2002-05-02 Thread Otis Gospodnetic

Uh, this is a very broad question.
A number of things could be wrong.
Look at your Tomcat log files.
Write a class that you can run from the command line, not as a servlet,
that may be easier to debug.  You can use one of the demo ones to get
started.
Log things, don't catch exceptions and ignore them, etc.
Check that your index directory exists, that it is readable by the user
doing the searches, etc. etc.

Otis

--- James Rozee [EMAIL PROTECTED] wrote:
 I've just recently recoded my entire website and search engine to use
 Tomcat 4.0.3, Velocity, MySQL and Lucene 1.2-rc4.  I have been using
 MySQL
 and servlets for a few years now.  However, I only recently started
 using
 Lucene.  I've built a Lucene index from my document collection and
 now I
 need to be able to search it from a servlet.
 
 My first attempt to do this causes Tomcat to return a page that is
 empty.
 Can anyone give me some advice on how to track down my problem?  My
 hardware is an SS1000E with 5 SM81s and 1.2GB RAM.  Thanks.
 
 James
 
 *
 The Game Development Search Engine
 and DQuest E-zine
 http://www.gdse.com/
 
 A Member of the Future Games Network
 http://www.fgn.com/
 
 
 




Re: Stemming

2002-05-02 Thread Otis Gospodnetic

You could have a single index with both stemmed and non-stemmed terms,
using different field names for each and searching a different set of
fields depending on the type of search.
You'd also have to use 2 types of analyzers/filters, I think.
Roughly :)
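
Very roughly, the idea looks like this. The sketch below is a toy in-memory
index in plain Java, not Lucene code: the field names "exact" and "stemmed"
are made up, and toyStem is a deliberately naive stand-in for a real
stemmer. Each word is posted under both fields, and the search picks the
field according to the user's stemming switch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy index with one exact and one stemmed field per document (not Lucene). */
final class DualFieldSketch {

    // field name -> term -> ids of documents containing that term
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();

    /** Naive stand-in for a stemmer: strips a trailing "ing" or "s". */
    static String toyStem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    private void post(String field, String term, int docId) {
        index.computeIfAbsent(field, k -> new HashMap<>())
             .computeIfAbsent(term, k -> new TreeSet<>())
             .add(docId);
    }

    /** Indexes every word twice: as written, and in stemmed form. */
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            post("exact", word, docId);
            post("stemmed", toyStem(word), docId);
        }
    }

    /** Searches the stemmed field when stemming is on, the exact field otherwise. */
    Set<Integer> search(String word, boolean stemming) {
        String lower = word.toLowerCase(Locale.ROOT);
        String field = stemming ? "stemmed" : "exact";
        String term = stemming ? toyStem(lower) : lower;
        Map<String, Set<Integer>> postings = index.get(field);
        if (postings == null || !postings.containsKey(term)) {
            return Collections.emptySet();
        }
        return postings.get(term);
    }

    public static void main(String[] args) {
        DualFieldSketch idx = new DualFieldSketch();
        idx.add(1, "indexing documents");
        idx.add(2, "index document");
        System.out.println(idx.search("documents", false));
        System.out.println(idx.search("documents", true));
    }
}
```

The cost of this design is a larger index, since every term is stored
twice, but only one index has to be maintained.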

Otis


--- Joel Bernstein [EMAIL PROTECTED] wrote:
 In our search application the user can turn stemming off and on.
 
 With Lucene will I have to maintain two sets of indexes to create
 this functionality, one
 stemming and one non-stemming index?
 
 Or
 
 Is there a way to query a stemming index so that it does not return
 stems?
 
 
 Thanks,
 Joel
 






Re: Lucene Book

2002-05-03 Thread Otis Gospodnetic

I don't think there are any on the market.
A perfect opportunity for somebody :)

Otis

--- William W [EMAIL PROTECTED] wrote:
 
 Hi All,
 Do you know some book about Lucene ?
 Thanks,
 William.
 




Re: Any one used websearch - Need Help Please

2002-05-06 Thread Otis Gospodnetic

Hello,

The host that you are trying to crawl cannot be looked up:

bash-2.04$ nslookup www.violet-arcana.com
Server:  localhost.apache.org
Address:  127.0.0.1

*** localhost.apache.org can't find www.violet-arcana.com: Non-existent
host/domain


This is not a Lucene issue, but more of a networking issue, so I
suggest you talk to some network/system administrators about this.
They'll have an answer for you.
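
If you want to check from Java itself whether a host name resolves (the
same lookup that fails with UnknownHostException in the crawler output
below), a small standalone sketch is shown here. The class and method names
are made up for illustration; only java.net.InetAddress is a real API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Checks whether the local resolver can turn a host name into an address. */
final class HostCheckSketch {

    /** Returns true if the name (or literal IP address) can be resolved. */
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            // Same exception type the crawler reported for the unknown host.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " resolvable: " + canResolve(host));
    }
}
```

Running it with the host name from the crawler output as the argument
should tell you whether the JVM sees the same lookup failure as nslookup.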

Good luck,
Otis


--- Moturu,Praveen [EMAIL PROTECTED] wrote:
 Hi All, has anyone used websearch? If so, can you please help me?
 
 I am trying to use the demo files. When I index the demo site I get
 the following message, and when I try the example search form and
 enter rock or red as described, I don't get any search results...
 
 START CRAWLING
 index exists, delete all files
 deleting 0 records
 SCANNING : http://localhost/websearch/bot.jsp *status: bad
 SCANNING : http://www.violet-arcana.com/ *status:
 java.net.UnknownHostException: www.violet-arcana.com
 DONE CRAWLING
 links crawled
 http://localhost/websearch/bot.jsp
 http://www.violet-arcana.com/
 
 Any help is highly appreciated
 
 Thanks
  Praveen Moturu
  
  
  
 
 




Re: WildcardQuery

2002-05-07 Thread Otis Gospodnetic

Yes, me too.  I just tried it on some Lucene index (the search at
blink.com) and it doesn't seem to work (try searching for travel and
then *vel).
I'm assuming the original poster confused something...
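
To make the expected semantics concrete, here is a small standalone matcher
in plain Java, where * stands for any run of characters (including none)
and ? stands for exactly one character. It is only a sketch of the pattern
rules being discussed; it is not Lucene's WildcardTermEnum and says nothing
about how Lucene actually enumerates terms.

```java
/** Standalone sketch of glob-style term matching; not Lucene code. */
final class WildcardSketch {

    /** True if the whole term matches: '*' is any run of chars, '?' is exactly one. */
    static boolean matches(String pattern, String term) {
        int p = 0;          // position in pattern
        int t = 0;          // position in term
        int star = -1;      // index of the most recent '*' in the pattern
        int mark = 0;       // term position where that '*' was last tried
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;          // remember the star, first try matching nothing
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;                 // single-character match
                t++;
            } else if (star >= 0) {
                p = star + 1;        // backtrack: let the star absorb one more char
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;                     // trailing stars may match the empty string
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("*vel", "travel"));
        System.out.println(matches("*lucene*", "lucene"));
        System.out.println(matches("rou?d", "round"));
    }
}
```

Under these rules *vel matches travel and *lucene* matches lucene, so any
difference seen in a real index would come from how the query is parsed or
how terms are enumerated, not from the pattern rules themselves.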

Otis

--- Joel Bernstein [EMAIL PROTECTED] wrote:
 I thought Lucene didn't support left wildcards like the following:
 
 *ucene
 
 - Original Message -
 From: Christian Schrader [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Monday, May 06, 2002 7:14 PM
 Subject: WildcardQuery
 
 
  I am pretty happy with the results of WildcardQueries like *ucen*, which
  matches lucene, but *lucene* doesn't match lucene. Is there a reason for
  this? And what would be the patch?
  It should be in WildcardTermEnum. I am wondering if somebody has already
  patched it?
 
  Thanks, Chris
 
 




Re: Searching greater than/less than

2002-05-21 Thread Otis Gospodnetic

Hello,

I believe that is not directly possible with Lucene, although there is
something called a RangeQuery, which may be helpful.

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/RangeQuery.html
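
One thing to keep in mind if you try a range over numbers: terms are
compared as strings, so "9" sorts after "50". A common workaround, sketched
here in plain Java, is to zero-pad the values to a fixed width before
indexing them and to pad the range bounds the same way. The helper name and
the width of 5 are assumptions made for the example, not anything Lucene
prescribes.

```java
/** Zero-pads non-negative integers so that string order equals numeric order. */
final class PaddedNumberSketch {

    /** Returns the value left-padded with zeros to the given width. */
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Integer.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        // Unpadded, "9" compares greater than "50"; padded, the order is numeric.
        System.out.println("9".compareTo("50") > 0);
        System.out.println(pad(9, 5).compareTo(pad(50, 5)) < 0);
    }
}
```

With the score field indexed as pad(score, 5), a range starting at
pad(50, 5) would cover the documents whose score is 50 or more.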

Otis

--- Victor Hadianto [EMAIL PROTECTED] wrote:
 Can I use Lucene to search for greater than / less than a value in a
 field? I have a field in the document that functions as a score. I would
 need to be able to search the index with the option of saying a
 field > 50
 
 Regards,
 
 -- 
 Victor Hadianto
 




Re: Merging (adding) indices

2002-05-27 Thread Otis Gospodnetic

The source code looks like this:

  public final synchronized void addIndexes(Directory[] dirs)
      throws IOException {
    optimize();                                // start with zero or 1 seg
    for (int i = 0; i < dirs.length; i++) {
      SegmentInfos sis = new SegmentInfos();   // read infos from dir
      sis.read(dirs[i]);
      for (int j = 0; j < sis.size(); j++) {
        segmentInfos.addElement(sis.info(j));  // add each info
      }
    }
    optimize();                                // final cleanup
  }

So I think the original directories/indices should not be modified in
any way.  Are you sure your application is not deleting them?

Otis



--- Lex Lawrence [EMAIL PROTECTED] wrote:
 Hello-
 I am using org.apache.lucene.index.IndexWriter.addIndexes(Directory[]
 dirs) 
 to merge several indices into one.  The resulting index appears to
 work 
 fine, but afterward the original indices seem to have been completely
 
 emptied.
 
 I can deal with that, but I just wanted to check: Is this method
 supposed to 
 alter the indices in the 'dirs' parameter?  It's not mentioned in the
 
 javadoc.
 
 Thanks- Lex
 




Re: Partial word search with unicode contents

2002-06-04 Thread Otis Gospodnetic

Hello,

A query for india should not be returning southindia (one word).
It sounds like something else is happening in your application.

Otis

--- Harpreet S Walia [EMAIL PROTECTED] wrote:
 Hi,
 
 We are using Lucene to index and search Unicode (UTF-8) content in
 Devanagari (Hindi).
 
 What we have observed is that our query fetches results which have a
 partial word match, i.e. if it were English then a query for india would
 return words like indian, southindia and so on.
 
 Is there a way to instruct Lucene to search only for complete words
 and not word parts?
 
 TIA
 
 Regards
 harpreet
 
 




Re: Opening and index as ready only

2002-06-04 Thread Otis Gospodnetic

I believe what you are referring to is on Lucene's TODO list, possibly
for the next release.
One or two people have already contributed some code for Lucene on
read-only media such as CD-ROM, so you may want to check the mailing
list archives for the code if this is urgent for you.

Otis


--- Paul Dlug [EMAIL PROTECTED] wrote:
 Is there any way to open an index as read-only? I get an IOException
 with
 Permission Denied when I change the index to a set of read-only file
 permissions. I have a cluster of search servers with the index on an
 NFS
 mount. I'd like to be able to have them all open and search the index
 at
 the same time. A single IndexWriter would be used to add new
 documents.
 Is there any way to do this?
 
 Thanks,
 Paul
 




Re: searching with wild cards ignoers analyzer?

2002-06-04 Thread Otis Gospodnetic

Good morning,

Dario, maybe this answers your question:
http://www.jguru.com/faq/view.jsp?EID=538312

Otis

--- Dario Novakovic [EMAIL PROTECTED] wrote:
 I index/search with an analyzer which converts all characters to
 lowercase. It works correctly until I use *; then I must use query
 strings with exact capitalization. Why is that? Am I doing something
 wrong?
 
 thanks for any answer
 
 dario
 




Re: lucene and java naming conventions

2002-06-04 Thread Otis Gospodnetic

Dario,

Yes, we may improve coding style over time, but there are no plans for
doing that in the immediate future.  I know, it's not ideal, so we all
have to get used to those few exceptions.

Otis

--- Dario Novakovic [EMAIL PROTECTED] wrote:
 I noticed that some method names in Lucene start with uppercase, and it
 is really confusing for me because I always think they are inner
 classes. Java naming conventions suggest that method names start with
 lowercase, and Lucene is my first source code experience that goes
 against naming conventions.
 I don't want to teach developers how to code; I just want to ask whether
 there are any reasons for that, and to suggest that they consider
 changing the source code to comply with the conventions.
 
 thanks
 
 dario
 




Re: lucen compared to other open source solutions

2002-06-04 Thread Otis Gospodnetic

I haven't used Swish-e, but I remember looking at it years ago, and
from what I remember it wasn't nearly as scalable as Lucene,
and it did not support various types of queries that Lucene supports.
Maybe things have changed since then.
You can look at http://www.searchtools.com/ for some additional
information.

Otis


--- degetel [EMAIL PROTECTED] wrote:
 Hi,
 
 I have a small question.
 I am quite new to this field of indexing and searching content.
 I have already used Lucene in a project and it was successful!
 
 Now I have to consider other solutions.
 Do you know where I can find some arguments for choosing Lucene over
 the Swish-e solution?
 Functional differences?
 Scalability?
 Performance?
 
 Are there any benchmarks somewhere?
 
 thanks
 roland
 
 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, June 5, 2002 00:23
 To: Lucene Users List
 Subject: Re: lucene and java naming conventions
 
 
 Dario,
 
 Yes, we may improve coding style over time, but there are no plans
 for
 doing that in the immediate future.  I know, it's not ideal, so we
 all
 have to get used to those few exceptions.
 
 Otis
 
 --- Dario Novakovic [EMAIL PROTECTED] wrote:
  I noticed that some method names in Lucene start with uppercase, and it
  is really confusing for me because I always think they are inner
  classes. Java naming conventions suggest that method names start with
  lowercase, and Lucene is my first source code experience that goes
  against naming conventions.
  I don't want to teach developers how to code; I just want to ask whether
  there are any reasons for that, and to suggest that they consider
  changing the source code to comply with the conventions.
 
  thanks
 
  dario
 
 






Re: Document Object

2002-06-05 Thread Otis Gospodnetic

As far as I know there is no generic way to do that.
You can parse the String in your application, form Fields, add them to
a Document, and there you go, but there is nothing generic.  Besides
field names and values, your String would also have to contain meta
data about each field, whether it is to be indexed, unindexed,
tokenized or not tokenized, etc.
e.g.
field1:value1Keyword, field2:value2UnStored
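
The parsing idea above can be sketched in plain Java with no Lucene dependency. This is only an illustration under stated assumptions: the `name:value:kind` layout, the `FieldSpec` class, and the kind names are hypothetical and are not a Lucene format. Turning each parsed spec into a real Field is left to the application.

```java
import java.util.ArrayList;
import java.util.List;

/** One parsed field: name, value, and how it should be indexed (hypothetical format). */
final class FieldSpec {
    final String name;
    final String value;
    final String kind; // e.g. "Keyword" or "UnStored"; the application decides what each means

    FieldSpec(String name, String value, String kind) {
        this.name = name;
        this.value = value;
        this.kind = kind;
    }
}

final class FieldSpecParser {
    /** Parses "name:value:kind, name:value:kind". Values must not contain ':' or ','. */
    static List<FieldSpec> parse(String text) {
        List<FieldSpec> specs = new ArrayList<>();
        for (String entry : text.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected name:value:kind but got: " + entry);
            }
            specs.add(new FieldSpec(parts[0], parts[1], parts[2]));
        }
        return specs;
    }

    public static void main(String[] args) {
        for (FieldSpec spec : parse("field1:value1:Keyword, field2:value2:UnStored")) {
            System.out.println(spec.name + " = " + spec.value + " (" + spec.kind + ")");
        }
    }
}
```

A real implementation would also need an escaping rule for values that contain the separator characters.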

Maybe there are better approaches.  This is just the first thing that
came to mind.

Good luck, and if you implement something generic please contribute it
to the project.

Thanks,
Otis


--- Pradeep Kumar K [EMAIL PROTECTED] wrote:
 Hi all
 
 Is there any way to type cast a String object to a Document object?
 
 That is, a Document object can be converted to its String form by using
 the method 'toString()'. How can we convert it back to a Document object?
 
 Any help will be greatly appreciated.
 
 Regards
 Pradeep
 
 
 --
 Robosoft Technologies, Mangalore, India
 
 
 




Re: status of ? wildcard queries in rc5

2002-06-09 Thread Otis Gospodnetic

David,

As far as I can tell the '?' character works as it should with
WildcardQuery.  See
src/test/org/apache/lucene/search/TestWildcard.java.
The tests there use SimpleAnalyzer and WildcardQuery directly (i.e. not
QueryParser).  All tests pass.  Try comparing your code with the code
in the above test class.
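
For reference, the matching rules being discussed ('?' stands for exactly one character, '*' for any run of characters) can be sketched in plain Java. This is an illustration of the intended semantics only, not Lucene's WildcardQuery implementation; the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

final class WildcardSketch {
    /** Translates a wildcard pattern to a regex: '?' is any one character, '*' is any run of characters. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                // Quote everything else so regex metacharacters are matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** True if the whole term matches the wildcard pattern. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("biosynth?sis", "biosynthesis"));
        System.out.println(matches("biosynth*sis", "biosynthesis"));
    }
}
```

Under these rules the term "biosynthesis" satisfies both patterns from the quoted message, so a difference in results points at how the query reaches the index rather than at the pattern itself.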

Otis


--- [EMAIL PROTECTED] wrote:
 I've searched the mail archive and I'm still a bit confused as to the
 current status of ? wildcard queries.  My experience, using
 lucene-1.2-RC5, is that ? wildcard queries are unsupported using
 the
 StandardAnalyzer or SimpleAnalyzer.
 
 For example, the following search on two fields (go_id and go_desc)
 (using StandardAnalyzer for indexing and searching):
 
 %java Search ./index +go_id:5737 +go_desc:biosynthesis
 Result:
 go_id:4853, 6783, 5737
 go_desc:uroporphyrinogen decarboxylase, heme biosynthesis, cytoplasm
 Score: 1.0
 
 using * wildcard:
 %java Search ./index +go_id:5737 +go_desc:biosynth*sis
 Result:
 go_id:4853, 6783, 5737
 go_desc:uroporphyrinogen decarboxylase, heme biosynthesis, cytoplasm
 Score: 1.0
 
 using ? wildcard:
 %java Search ./index +go_id:5737 +go_desc:biosynth?sis
 Noresults
 
 Is this the expected behavior for RC5, a reported bug, or an
 unreported bug?
 
 thanks,
 --David M. Goodstein







Re: Problem in unicode field value retrival

2002-06-10 Thread Otis Gospodnetic

Hello,

 That was the problem, thanks :-). Still, I am struggling to get Lucene
 to search non-English Unicode content. It works partially with the
 simple analyser but doesn't return any results with the standard
 analyser. Is there a way I can output the exact contents that are going
 into the index?

Perhaps something like this will help.  This is a very recent post from
the searchable mailing list archives at http://nagoya.apache.org/:

http://nagoya.apache.org/eyebrowse/ReadMsg?[EMAIL PROTECTED]msgId=352570

Otis






Re: Within Search

2002-06-10 Thread Otis Gospodnetic

Hello,

I'm sending this to lucene-user list, as that seems more appropriate.
I haven't used Lucene's slop feature, but it looks like both
QueryParser and PhraseQuery have support for slop.  I am not sure what
the syntax for it is, but if nothing else you should be able to call
the setSlop(int) method on an instance of PhraseQuery.

Oh, it looks like you missed it in the Query Parser Syntax document:
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
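
To make the requested "within N words" behavior concrete, here is a plain-Java sketch of the check itself. It is not how PhraseQuery computes slop internally, and the `WithinSketch` class is hypothetical; it only shows what a query like "car w/10 rent" is expected to accept.

```java
final class WithinSketch {
    /** True if some occurrence of a and some occurrence of b are at most maxDistance word positions apart. */
    static boolean within(String text, String a, String b, int maxDistance) {
        String[] words = text.toLowerCase().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (!words[i].equals(a)) {
                continue;
            }
            for (int j = 0; j < words.length; j++) {
                if (words[j].equals(b) && Math.abs(i - j) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "I want to rent a car today";
        System.out.println(within(text, "car", "rent", 10));
        System.out.println(within(text, "car", "rent", 1));
    }
}
```

In the sample sentence "rent" and "car" are two positions apart, so the check passes for a distance of 10 and fails for a distance of 1.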

Otis


--- none none [EMAIL PROTECTED] wrote:
 hi,
 I asked for help with this feature some time ago, but got no answer.
 What I need is a WithinPhraseSearch. An example:
 
 search for:  car w/10 rent
 
 This means: look for documents that contain 'car' and, within 10 words
 of it, 'rent'. So what I think I need is:
 
 1. Change QueryParser.jj to recognize the operator w/xx as the
 within operator.
 
 2. The QueryParser should return a PhraseQuery with a slop factor
 equal to 10 for the example above. It should also ignore w/xx if xx
 is not numeric.
 
 Another question: what should I do if I want the query operators
 (AND, OR, NOT, etc.) to be case insensitive? What should I change inside
 QueryParser.jj?
 
 PLEASE HELP, because I really don't know how to use the JavaCC
 utility.
 
 Thanks,
 bye.
 
  
 
 




Re: How does simple analyser work

2002-06-11 Thread Otis Gospodnetic


--- Harpreet S Walia [EMAIL PROTECTED] wrote:
 Hi,
 
 Are there any resources available which explain how the simple
 analyser processes the data given to it?
 What I want to know is: given a set of words, what exact rules are
 applied to tokenize and index these words, and how can I customize
 them?
 
 My requirement is that the words be broken only at spaces and not at
 any other character. I understand that this can be done by writing
 a parser in JavaCC, but is there any simpler way of achieving this?

Actually, this can be done by writing your own custom Analyzer.
Check this:
./org/apache/lucene/analysis/standard/StandardAnalyzer.java
./org/apache/lucene/analysis/Analyzer.java
./org/apache/lucene/analysis/de/GermanAnalyzer.java
./org/apache/lucene/analysis/SimpleAnalyzer.java
./org/apache/lucene/analysis/StopAnalyzer.java
./org/apache/lucene/analysis/WhitespaceAnalyzer.java

Maybe this last one is what you are looking for.
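
The difference between the two tokenizing rules can be sketched in plain Java. This emulates the behavior as I understand it (SimpleAnalyzer breaking text at non-letters and lowercasing, WhitespaceAnalyzer breaking only at whitespace); it is not the Lucene code, and the class and method names are made up, so check the source files listed above for the authoritative rules.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenizeSketch {
    /** Splits only at whitespace and keeps each token exactly as written. */
    static List<String> onWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Splits at every non-letter character and lowercases the letters that remain. */
    static List<String> onNonLettersLowercased(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(onWhitespace("Foo-bar  baz_1"));
        System.out.println(onNonLettersLowercased("Foo-bar  baz_1"));
    }
}
```

With the input "Foo-bar  baz_1", the whitespace rule keeps "Foo-bar" and "baz_1" intact, while the letter-based rule yields "foo", "bar", and "baz".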

Otis






RE: Question about RangeQuery and strings...

2002-06-11 Thread Otis Gospodnetic

James,

I haven't used RangeQueries, but what you describe does sound confusing
to me.  I'll enter it as a bug, just so this information doesn't get
lost, because I am not certain that this is really a bug, even though
it sounds like one to me.
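
The behavior James describes in the quoted message below can be reproduced with plain string comparison. This sketch does not use Lucene; `RangeSketch` and its two methods are hypothetical and only contrast the two readings of a missing lower term: "no lower bound" versus "a default lower bound of the empty string with the inclusive flag applied to both ends".

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSketch {
    /** A null lower bound means "no lower bound": keep everything below (or equal to, if inclusive) upper. */
    static List<String> unboundedBelow(List<String> values, String upper, boolean inclusive) {
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            int cmp = v.compareTo(upper);
            if (cmp < 0 || (inclusive && cmp == 0)) {
                hits.add(v);
            }
        }
        return hits;
    }

    /** A null lower bound is replaced by "" and the inclusive flag is applied to both ends. */
    static List<String> defaultEmptyLower(List<String> values, String upper, boolean inclusive) {
        String lower = "";
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            int lo = v.compareTo(lower);
            int hi = v.compareTo(upper);
            boolean aboveLower = inclusive ? lo >= 0 : lo > 0;
            boolean belowUpper = inclusive ? hi <= 0 : hi < 0;
            if (aboveLower && belowUpper) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = java.util.Arrays.asList("", "A", "A1", "ABC");
        System.out.println(unboundedBelow(values, "AB", false));
        System.out.println(defaultEmptyLower(values, "AB", false));
    }
}
```

With an exclusive upper bound of "AB", the first reading keeps the empty string and the second drops it, which is the discrepancy reported. The comparison here is on raw strings; in a real index the terms also depend on how the field was analyzed, including case.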

Thanks,
Otis


--- James Ricci [EMAIL PROTECTED] wrote:
 I'm replying to my own message because I think I now understand the
 problem, and part of it is, in my opinion, a bad implementation of
 RangeQuery.
 
 When you create a range query and omit the lower term, my expectation
 would be that I would find everything less than the upper term. Now if
 I pass false for the inclusive term, then I would expect that I would
 find all terms less than the upper term, excluding the upper term
 itself.
 
 What is happening in the case of lower_term=null, upper_term=x,
 inclusive=false is that empty strings are being excluded because
 inclusive is set to false, and the implementation of RangeQuery creates
 a default lower term of Term(fieldName, ""). Since it's not inclusive,
 it excludes "". This isn't what I intended, and I don't think it's what
 most people would imagine RangeQuery would do in the case I've
 mentioned.
 
 I equate lower=null, upper=x, inclusive=false to Field < x.
 lower=null, upper=x, inclusive=true would be Field <= x. In both cases,
 the only difference should be whether or not Field = x is true for the
 query.
 
 I'm still quite new to Lucene, so maybe I'm wrong about all this
 because I just don't understand it well enough. If so, could someone
 tell me where I've gone astray?
 
 Thanks much,
 
 James
 
 PS: The rest of the problems I had below I was able to fix by
 changing how
 the fields were tokenized and indexed.
 
   -Original Message-
  From:   James Ricci  
  Sent:   Thursday, June 06, 2002 11:16 AM
  To: '[EMAIL PROTECTED]'
  Subject:Question about RangeQuery and strings...
  
  Hi all,
  
  I've been having some problems using RangeQuery. I have a simple
  query which is essentially document.field < AB. Field values are:
  
  ""  // Empty string
  A SPACE
  A123456
  ABC
  
  Now I expected to find the first three of the four values (and I do
  with another commercial search engine product I've worked with). With
  Lucene I get nothing. Part of the problem, I think, is that there are
  some issues with case here. Changing my query to document.field < ab
  returns:
  
  A123456
  
  Now I would have expected A SPACE to get returned, and I was really
  surprised that "" wasn't returned. I'm guessing that "" wasn't
  returned because no term in the field passed the query criteria, and
  the empty string is not considered a term.
  
  How should I go about getting what I expect? What is going on here?
  
  Thanks much,
  
  James
  
  
  
  
 




Re: Are IndexReader objects always up to date?

2002-06-11 Thread Otis Gospodnetic

Hm, this sounds an awful lot like a FAQ, yet I don't see it in Lucene's
FAQ at jGuru.com.
You need to close and reopen the index(reader) if you want to see the
latest changes.
There is a method that you can use to figure out if the index has been
modified since you opened it.
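
The close-and-reopen pattern can be sketched generically. The `ReopenOnChange` class below is hypothetical and has no Lucene dependency: the version supplier stands in for whatever reports the index's modification time (the follow-up message names IndexReader.lastModified), and the opener stands in for opening a new reader.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Holds a resource and reopens it whenever the observed version changes. */
final class ReopenOnChange<T> {
    private final LongSupplier version; // e.g. a last-modified timestamp of the index
    private final Supplier<T> opener;   // e.g. code that opens a fresh reader
    private T current;
    private long openedVersion;

    ReopenOnChange(LongSupplier version, Supplier<T> opener) {
        this.version = version;
        this.opener = opener;
        this.openedVersion = version.getAsLong();
        this.current = opener.get();
    }

    /** Returns the current resource, reopening it first if the version has moved on. */
    synchronized T get() {
        long now = version.getAsLong();
        if (now != openedVersion) {
            // A real reader should also be closed once no search is still using it.
            current = opener.get();
            openedVersion = now;
        }
        return current;
    }
}
```

Checking the version on every call keeps the logic simple; an application under heavy load might check on a timer instead.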

Otis

--- James Ricci [EMAIL PROTECTED] wrote:
 Hi,
 
 If I have an IndexReader object open, and someone else is using an
 IndexWriter to update the contents of an index, will my IndexReader
 automatically reflect the current contents of the index? If not, what
 must I
 do to refresh it?
 
 James
 




RE: Are IndexReader objects always up to date?

2002-06-11 Thread Otis Gospodnetic

I don't think there is anything else.  That is how I wrote applications
that used Lucene at my previous job.  It worked, but those indices
changed only hourly.

Otis

--- James Ricci [EMAIL PROTECTED] wrote:
 Otis,
 
 Thanks. This seems to agree with what I've seen myself. The system
 I'm
 working on is extremely dynamic, so this will be an issue for me. The
 method
 I think you're talking about is IndexReader.lastModified. I'm not
 sure this
 actually tells me if the IndexReader I have is up to date, but it
 would tell
 me if there has been a change since I opened it (assuming I have
 saved off
 the open time). Is there something a little more direct?
 
 James
 
 






Re: Thread safety

2002-06-11 Thread Otis Gospodnetic

Thanks for this table.  It's part of the Lucene FAQ at jGuru now:
http://www.jguru.com/forums/view.jsp?EID=910778

Otis

--- Mark Harwood [EMAIL PROTECTED] wrote:
 I've been trying to understand the multithreaded behaviour of Lucene
 too.
 
 I have a test rig and the observed results are available here:
 
 http://home.clara.net/markharwood/lucene/threads.htm
 
 I would be interested in having these observations verified.
 
 
 Cheers
 Mark
 
 




Re: Memory-based indexing

2002-06-12 Thread Otis Gospodnetic

Yes, there are a few things one can do.  See
http://nagoya.apache.org/eyebrowse/ReadMsg?[EMAIL PROTECTED]msgId=117057

Otis


--- James Ricci [EMAIL PROTECTED] wrote:
 I've been doing a few tests, and I'm finding creating an index in
 Lucene to
 be somewhat slower than other engines I've worked with. Is there a
 way to
 cache, batch, or otherwise speed up indexing of a large number of
 documents?
 This is mainly a problem when creating the index for the first time.
 
 James
 




Re: Thread safety

2002-06-12 Thread Otis Gospodnetic

Yeah, I think you are right, that matrix isn't 100% correct.
I'll have to change it...thanks for checking.

Otis

--- David Smiley [EMAIL PROTECTED] wrote:
 Maybe I'm just not with it right now... but that matrix doesn't seem
 to make sense to me.  From my understanding, two write requests
 cannot happen concurrently, yet there's a Y in that box on the
 matrix.  Also, /shouldn't/ the matrix be symmetric?  It isn't.  If it
 is intended to be, I think only half of the matrix should be there so
 as not to be confusing.
 
 ~ Dave Smiley
 
 On Tuesday, June 11, 2002, at 10:12  PM, Otis Gospodnetic wrote:
 
  Thanks for this table.  It's part of the Lucene FAQ at jGuru now:
  http://www.jguru.com/forums/view.jsp?EID=910778
 
  Otis
 
 




RE: Boolean Query + Memory Monster

2002-06-15 Thread Otis Gospodnetic

I don't know about Resin, but Tomcat allows one to set the CATALINA_OPTS
(or some other _OPTS) environment variable, whose value is then used to
invoke Java.  I would imagine Resin has something similar.
This then becomes a Resin question.
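
Whichever way the option is passed, it helps to confirm from inside the servlet container that the larger heap actually took effect. A minimal sketch, assuming nothing beyond the standard Runtime API (the `HeapCheck` class name is made up):

```java
final class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with, for example, java -Xmx512m HeapCheck and compare the printed value.
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Calling the same method from a servlet and logging the result shows whether the container passed the setting through.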

Otis

--- Nader S. Henein [EMAIL PROTECTED] wrote:
 I'm all ears .. I'm running the search from a servlet on 
 a resin web server, any suggestions as to increasing the heap
 size in this case ?
 
 
 
 -Original Message-
 From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, June 13, 2002 9:47 PM
 To: 'Lucene Users List'
 Subject: RE: Boolean Query + Memory Monster
 
 
 Use the java -Xmx option to increase your heap size.
 
 Scott
 
  -Original Message-
  From: Nader S. Henein [mailto:[EMAIL PROTECTED]]
  Sent: Thursday, June 13, 2002 12:20 PM
  To: [EMAIL PROTECTED]
  Subject: Boolean Query + Memory Monster
  
  
  
  I have 1 GB of memory on the machine with the application.
  When I use a normal query it goes well, but when I use a range
  query it sucks the memory out of the machine and throws a servlet
  out of memory error.
  I have 80,000 records in the index and it's 43 MB large.
  
  anything people ?
  
  
  Nader S. Henein
  Bayt.com , Dubai Internet City
  Tel. +9714 3911900
  Fax. +9714 3911915
  GSM. +9715 05659557
  www.bayt.com
  
  
 
 




Re: Deleting document from index

2002-06-22 Thread Otis Gospodnetic

Hello,

First of all, the machine from which you sent this email has the date
set incorrectly - it thinks it's 22. 6. 2000.

--- [EMAIL PROTECTED] wrote:

 I searched the archive of this list for more info on how to delete a
 document from the Lucene index.
 Most of the postings talk about IndexReader.delete(docNum), but when
 we tried to delete a single document entry from the index, we found
 that the whole index got deleted.

You must be doing something wrong.  Send the relevant piece of code.

 1) Can anyone help us on how we can handle this ?

http://www.jguru.com/faq/view.jsp?EID=492423


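// Deletes every document containing the given term; returns the number deleted.
// mIndexDir is the index directory held by the enclosing class.
public int delete(final String fieldName, final String fieldValue)
        throws IOException
{
    final IndexReader reader = IndexReader.open(mIndexDir);
    final int deleteCount = reader.delete(new Term(fieldName, fieldValue));
    reader.close();
    return deleteCount;
}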

 2) When will the search results reflect that the particular document
 which I deleted is gone?
 Do I need to optimize the index for this?

You don't need to optimize the index, but I believe you need to close
the IndexReader and re-open IndexSearcher when you detect that the
index has changed.

 3) After adding a few more documents to an existing index, what effect
 will it have on search if I don't optimize the index immediately?
 Will these new documents be searchable before optimization?

Yes.

Otis






Re: Retrieve documents from index by document number

2002-06-25 Thread Otis Gospodnetic

Check the Hits class API:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Hits.html

Otis

--- Chris Sibert [EMAIL PROTECTED] wrote:
 Anybody know how to retrieve a stored document from an index by its
 document number? I have a list of search hits, and when the user
 clicks on one, I want to pull the stored document up out of the
 index.
 
 






Re: IndexReader Pool

2002-06-27 Thread Otis Gospodnetic

I don't think Lucene contains anything to help you create this pool.
However, if you look at Jakarta Commons project you will find a
subproject there that allows you to create pools of any kind of Java
object.  You can probably use that to save yourself development and
debug time.
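
To show the shape of such a pool, here is a minimal fixed-size pool in plain Java. It is a sketch only: `SimplePool` is a hypothetical class, not the Jakarta Commons API, and the factory stands in for whatever code opens the pooled object.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** A fixed-size pool: borrow() blocks until an item is free, giveBack() returns it. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Takes an idle item, waiting if all items are currently borrowed. */
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a pooled item", e);
        }
    }

    /** Returns a borrowed item; throws IllegalStateException if the pool is already full. */
    void giveBack(T item) {
        idle.add(item);
    }

    /** Number of items currently idle in the pool. */
    int available() {
        return idle.size();
    }
}
```

A library pool adds what this sketch leaves out, such as validation of returned objects, eviction, and growth limits, which is the development and debug time saved.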

Otis


--- Nader S. Henein [EMAIL PROTECTED] wrote:
 
 I was going through the lucene-user posts on the web and I came
 across a posting by Scott Oshima

http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00693.html
 
 which talks about creating an IndexReader pool to speed up the
 search. I've looked into that but I can't figure out what to use for a
 DataSource, as one would when creating a pool for DB connections. Is
 there an equivalent in the Lucene architecture, or should one just take
 the initiative?
 
 Nader S. Henein
 Bayt.com , Dubai Internet City
 Tel. +9714 3911900
 Fax. +9714 3911915
 GSM. +9715 05659557
 www.bayt.com
 



