Re: Efficient document spooling and indexing

2001-11-22 Thread Otis Gospodnetic
then. So I would expect the answer to your question to be no. -- Ian. [EMAIL PROTECTED] Otis Gospodnetic wrote: Hello, This is from a thread from about 2 weeks ago. What is the answer to this question? If data is written to disk only when IndexWriter's close() is called

Re: Order of Package Compilation

2001-11-28 Thread Otis Gospodnetic
Why not just use Ant to build Lucene? Otis --- srinivasa v [EMAIL PROTECTED] wrote: Hi all, I got the Lucene source files. When I started to compile all packages again in some order, it gave an error saying some class was not found. The order in which I compiled is given below.

Re: Indexing other documents type than html and txt

2001-11-29 Thread Otis Gospodnetic
You'd have to write parsers for each of those document types to convert them to text and then index them. Sure, you can feed it something like XML, but then you may consider something like xmldb.org instead. Otis --- Antonio Vazquez [EMAIL PROTECTED] wrote: Hi all, I have a question. I know that

Re: Industry Use of Lucene?

2001-12-06 Thread Otis Gospodnetic
It looks like a person at Overture (formerly Goto.com) is using it. I know ScreamingMedia.com used it at one point. Otis --- Jeff Kunkle [EMAIL PROTECTED] wrote: Does anyone know of any companies or agencies using Lucene for their products/projects? I am attempting to make a marketing pitch

Re: FW: Installation notes

2001-12-06 Thread Otis Gospodnetic
You need to download and install JavaCC. Try this: http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=javacc&q=b Otis --- Patrick Codere [EMAIL PROTECTED] wrote: Dear All, I just downloaded the latest version of Lucene, and not being too familiar with Java, I would like to get some

RE: existing or not existing

2001-12-06 Thread Otis Gospodnetic
Yes, I would use this, especially the IndexReader methods that you suggested. Otis --- Doug Cutting [EMAIL PROTECTED] wrote: From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] You could try looking for a segments file in the index directory. If it exists, the index exists, else it does
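The segments-file check quoted above can be sketched in plain Java. It assumes the Lucene 1.x directory layout, where an index holds a file literally named `segments`; the class and method names are hypothetical, and the IndexReader methods Doug suggested are likely the sturdier choice.

```java
import java.io.File;

// Hypothetical helper, not part of Lucene: treats a directory as an index
// if it contains the "segments" file a Lucene 1.x index is assumed to have.
class IndexExistenceCheck {
    static boolean looksLikeIndex(File indexDir) {
        File segments = new File(indexDir, "segments");
        return indexDir.isDirectory() && segments.isFile();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "index");
        System.out.println(dir + (looksLikeIndex(dir) ? " looks like an index" : " has no segments file"));
    }
}
```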

Re: WildcardQuery

2001-12-11 Thread Otis Gospodnetic
If I understand you correctly, you tried to search for '*new*'. I believe you can't use an asterisk (*) as the first character of the query term. So, new* is valid, while *new or *new* is not. Otis --- Serge A. Redchuk [EMAIL PROTECTED] wrote: Hello sampreet, Tuesday, December 11, 2001,
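One plausible reason for the restriction, sketched in plain Java rather than with Lucene's WildcardQuery: Lucene keeps its terms in sorted order, so a trailing wildcard like `new*` can seek to the first candidate term and stop as soon as terms stop matching, while `*new` gives no starting point and forces a scan of every term. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java illustration, not Lucene's WildcardQuery or PrefixQuery.
class PrefixVsLeadingWildcard {
    // "new*": seek to the first term >= "new", stop at the first term that no longer matches.
    static List<String> prefixMatches(SortedSet<String> sortedTerms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : sortedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break; // sorted order: no later term can match
            out.add(term);
        }
        return out;
    }

    // "*new": sorted order gives no starting point, so every term must be examined.
    static List<String> suffixMatches(SortedSet<String> sortedTerms, String suffix) {
        List<String> out = new ArrayList<>();
        for (String term : sortedTerms) {
            if (term.endsWith(suffix)) out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList("anew", "new", "news", "newt", "renew", "zebra"));
        System.out.println(prefixMatches(terms, "new")); // [new, news, newt]
        System.out.println(suffixMatches(terms, "new")); // [anew, new, renew]
    }
}
```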

Re: continue ideo-logic error in QueryParser and in BooleanQuery !

2001-12-17 Thread Otis Gospodnetic
Actually, I do not think this is a bug. You cannot make searches with queries that have only the NOT part. You cannot ask Lucene to match all documents that do not contain a certain term. For instance, issuing a 'NOT pretty' will not return doc1, doc3, doc4. You have to use that NOT pretty in
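The set logic behind this can be sketched in plain Java, with integer ids standing in for documents (hypothetical names, not Lucene's BooleanQuery): a prohibited clause only subtracts from whatever the other clauses matched, so a query made of nothing but `NOT pretty` starts from an empty set and stays empty.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Sketch of boolean retrieval semantics (not Lucene's BooleanQuery):
// prohibited clauses remove documents, they never contribute any.
class PureNotQuery {
    static Set<Integer> search(Set<Integer> matchedByPositiveClauses, Set<Integer> matchedByProhibited) {
        Set<Integer> result = new TreeSet<>(matchedByPositiveClauses);
        result.removeAll(matchedByProhibited);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> pretty = new TreeSet<>(Arrays.asList(2));
        System.out.println(search(new TreeSet<Integer>(), pretty));                  // []
        System.out.println(search(new TreeSet<>(Arrays.asList(1, 2, 3, 4)), pretty)); // [1, 3, 4]
    }
}
```

In the sketch, the NOT clause only has an effect once some positive clause supplies candidate documents.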

Re: DateFilter and NullPointerException

2001-12-17 Thread Otis Gospodnetic
Hm, do you know which line in DateFilter.java this NPE comes from? Could you try compiling Lucene with the -g switch so that we can see the line numbers in the exception stack trace? If you want you can also submit a bug report at http://nagoya.apache.org/bugzilla/ Thanks, Otis ---

Re: Using a DateFilter without a query

2001-12-17 Thread Otis Gospodnetic
Hello, --- Jan_Stövesand [EMAIL PROTECTED] wrote: Hi, is it possible to use a DateFilter without a query. I would like to get all Documents from within a certain period of time WITHOUT specifying any query except the range of dates. I don't know, but I'd like to know. Have you tried it?

Re: IndexReader/IndexSearcher

2001-12-19 Thread Otis Gospodnetic
Uh, I don't repeat myself, but I'll repeat others' words :) It is the analyzer (StandardAnalyzer, I believe) that lowercases text before indexing it. If you use the same analyzer to search it will lowercase text before performing a search, so you'll find the document with bo23 in it even if you
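Why the same analyzer matters on both sides can be shown with a toy inverted index in plain Java. The lowercasing `analyze` method below is a hypothetical stand-in for an analyzer such as StandardAnalyzer, not Lucene's API: a raw query for `BO23` misses the indexed term `bo23`, while the same query passed through `analyze` finds it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index; analyze() is a hypothetical lowercasing analyzer, not Lucene's API.
class SameAnalyzerSketch {
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // term -> ids of the documents containing it, built with analyze()
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> inverted = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                inverted.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return inverted;
    }

    static Set<Integer> search(Map<String, Set<Integer>> inverted, String term) {
        return inverted.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> idx = index(Arrays.asList("Part BO23 in stock", "Nothing here"));
        System.out.println(search(idx, "BO23"));                 // [] : raw query misses
        System.out.println(search(idx, analyze("BO23").get(0))); // [0] : analyzed query hits
    }
}
```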

Re: About indexing

2002-01-11 Thread Otis Gospodnetic
Parag, I'm not sure if I understood your question correctly, but it seems like you want to create a Field that holds the path information (e.g. TEST/subdir1 or TEST/subdir2, and so on), and then include that in the query based on which path(s) you want to search. You could use TEST to search

Re: I want to search on BOTH -- (1) XML data and (2) Text data.

2002-01-12 Thread Otis Gospodnetic
Hello, You could write an XML parser (see http://xml.apache.org/ for some XML tools) and store XML elements as Fields in Lucene Documents. To search for 'Hello' and 'Hello Mr. President!' you can store the whole article body as a Text (or maybe UnStored) Field. You can also look on

Re: Anyone run Linux JVM 1.4 Beta 3 with lucene ?

2002-01-14 Thread Otis Gospodnetic
Oui (yes) :) Otis --- Winton Davies [EMAIL PROTECTED] wrote: Hi guys, I'm getting stung by JVM 1.3.1_01 on Linux: max allocation of heap is about 1.9 GB. Anyway, I'm thinking of going to 1.4? Has anyone run Lucene under this beta? Cheers, Winton -- Winton Davies Lead

Re: My own steammer (brazilian)

2002-02-13 Thread Otis Gospodnetic
That file is created during the build process. Try building Lucene by typing 'ant compile'. Otis --- Bizu_de_Anúncio [EMAIL PROTECTED] wrote: My Brazilian stemmer has the same structure as the German stemmer, except for the inner logic. I created it, tested it and now I'm

Re: using lucene with a very large index

2002-02-13 Thread Otis Gospodnetic
--- tal blum [EMAIL PROTECTED] wrote: Hi, I'm building a very large index that contains several categories. I have several questions I hope you can answer. 1) Is there a way to use Lucene with several indexes without merging them? Look at the MultiSearcher class. 2) Does the Document id

RE: Index Locked For Write

2002-02-24 Thread Otis Gospodnetic
? Sorry, that might have been a wrong suggestion, IndexWriter (at least the add method) is supposed to be thread safe. Otis -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Thursday, February 21, 2002 4:07 PM To: Lucene Users List Subject: RE: Index Locked

Re: Performance Tuning

2002-02-25 Thread Otis Gospodnetic
You could try playing with the merge factor... Otis --- Aruna Raghavan [EMAIL PROTECTED] wrote: Hi, Are there any ways to fine-tune the CPU performance with Lucene? I know of the usage of optimize() calls but I am wondering if there are any other ways to improve the CPU time/Disk space

Re: Boolean Query Parsing with IN keyword

2002-02-26 Thread Otis Gospodnetic
Jonathan, That's most likely caused by StandardAnalyzer, which you are probably using. 'in' is listed as one of the stop words: public static final String[] STOP_WORDS = { "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such",
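The effect on an `in` query can be reproduced with a small stop-word filter in plain Java. The word set below is only the part of STOP_WORDS visible in the quote above (the full list is longer), and the tokenizing rule is a simplification, not StandardAnalyzer's grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for an analyzer with a stop filter (not StandardAnalyzer itself).
class StopWordSketch {
    // Only the entries visible in the truncated quote; the real list is longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "s", "such"));

    // Lowercase, split on anything that is not a letter or digit, then drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Sales IN Europe")); // [sales, europe]
        System.out.println(analyze("in"));              // []
    }
}
```

Because `in` is dropped before the query is built, a clause on that word has nothing left to search for; indexing and querying with an analyzer whose stop list omits `in` is one way around it.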

Re: Software License

2002-02-26 Thread Otis Gospodnetic
Actually, I think ASL doesn't require this, although it is nice when even commercial entities give credit in some way. I could be wrong about ASL. Otis --- Rafael Luque [EMAIL PROTECTED] wrote: Hi all, I know Lucene is a free project, however I think its use is under Apache Software

Re: SegmentTermPositions throwing nullpointer

2002-03-01 Thread Otis Gospodnetic
Have you got the latest Lucene? Nightly build? Try that, this looks like an old bug that has been fixed. Otis --- Charles Harvey [EMAIL PROTECTED] wrote: We are having some bizarre instances where SegmentTermPositions is throwing nullpointers. It only happens on certain queries, but it

Re: TimeOut Exception when Indexing with EJB (Please Help)

2002-03-05 Thread Otis Gospodnetic
Hello, I think you should just try your two suggestions and see. The answer depends on how exactly you do it, OS configuration, etc. Does this happen on an optimized index, too? Otis --- Tihon One [EMAIL PROTECTED] wrote: Hi all; I've tried to index a 100K text file on an empty Index folder

Re: how to parse XHTML

2002-03-05 Thread Otis Gospodnetic
Terry, These are really not Lucene questions. Lucene will let you index text, but you need to figure out how to parse your XHTML files. Take a look at JTidy on sf.net; I think JTidy can help you with parsing XHTML, or perhaps Xerces from xml.apache.org can. Otis --- Terry McGregor [EMAIL

Re: phrase query and slop factor

2002-03-06 Thread Otis Gospodnetic
Wouldn't that depend on how far from each other you wanted to allow them to be? If you have a document with 100 words indexed and you are searching for first second wouldn't you have to set the slop to about 100, just in case the word 'first' is the very first word in the document, and 'second'
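For two terms, the slop needed can be computed from their positions. This sketch assumes slop counts position moves, as the PhraseQuery.setSlop documentation describes (reversing two adjacent words costs two moves); the class and method names are hypothetical. For the 100-word example above, `first` at position 0 and `second` at position 99 needs a slop of 98.

```java
// Hypothetical helper illustrating sloppy phrase distance for the two-term
// phrase "first second"; positions are 0-based token offsets in the field.
class TwoTermSlop {
    static int slopNeeded(int posOfFirst, int posOfSecond) {
        // In an exact match "second" sits exactly one position after "first".
        return Math.abs(posOfSecond - posOfFirst - 1);
    }

    public static void main(String[] args) {
        System.out.println(slopNeeded(0, 1));  // 0: adjacent, in order
        System.out.println(slopNeeded(1, 0));  // 2: adjacent, reversed
        System.out.println(slopNeeded(0, 99)); // 98: opposite ends of a 100-word field
    }
}
```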

Re: Virtual Index

2002-03-06 Thread Otis Gospodnetic
If you prefer the old way (multiple indices) you can do that with Lucene, too. Look at MultiSearcher class. Lucene also supports range queries which may be helpful. I haven't used them, but it sounds like the thing to look at. Otis --- Paul Dlug [EMAIL PROTECTED] wrote: We have a relatively

Re: Lucene throws an ArrayIndexOutOfBoundsException() if the first term in my query string is a stopWord

2002-03-07 Thread Otis Gospodnetic
Hm, I've got the latest Lucene (from CVS) and don't have this issue. The query I tried on our index is: +title:of +title:someotherwordthatDOESgetmeresults Otis --- Biswas, Goutam_Kumar [EMAIL PROTECTED] wrote: Dear Lucene Users Lucene throws an ArrayIndexOutOfBoundsException() if the

2 exceptions

2002-03-08 Thread Otis Gospodnetic
Hello, Do these 2 exceptions look familiar to anyone: java.lang.ArrayIndexOutOfBoundsException: 111 at java.util.Vector.elementAt(Vector.java(Compiled Code)) at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:136) at

Re: optimize(), delete() calls on IndexWriter

2002-03-08 Thread Otis Gospodnetic
No, they don't. Note that delete() is in IndexReader. Otis --- Aruna Raghavan [EMAIL PROTECTED] wrote: Hi, Do calls like optimize() and delete() on the IndexWriter cause a separate thread to be kicked off? Thanks! Aruna.

Re: 1.02 download on jakarta.apache.org?

2002-03-08 Thread Otis Gospodnetic
I don't think you are blind. You could get the latest source from the CVS, or wait a few weeks when I hope we will get the new release out... Otis --- Shannon Booher [EMAIL PROTECTED] wrote: Maybe I'm just blind, but Lucene v1.02 does not appear to be available through

Re: 2 exceptions

2002-03-08 Thread Otis Gospodnetic
Just for the list/knowledge archive: I found the source of one of the exceptions in my code: java.io.IOException: Interrupted system call at java.io.RandomAccessFile.seek(Native Method) at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:271) at

Re: QueryParser and Double Quotes

2002-03-10 Thread Otis Gospodnetic
I think there is no way to do that since a double quote is a special character for query parser. There was some discussion about introducing an escape character to allow things like this, but the discussion has not materialized yet. Otis --- Tony Biag [EMAIL PROTECTED] wrote: Is there a way

Re: Maximum indexable data

2002-03-10 Thread Otis Gospodnetic
I haven't heard of any such limit. There is a 'limit' of 10,000 terms on a field length, but that is a limit only because that number is hard coded in the source. However, shouldn't this be very simple for you to test? Index something over and over and see if you ever hit the wall :) Otis
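The practical consequence of a per-field cap can be sketched in plain Java: tokens past the limit are simply not indexed, so a word that only appears late in a long field is never found. The constant mirrors the 10,000 figure mentioned above and is treated here as a count of terms, as Lucene's maxFieldLength setting is documented; the helper itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a per-field term cap, not Lucene's DocumentWriter.
class FieldLengthCap {
    static final int MAX_FIELD_LENGTH = 10000; // figure quoted in the thread

    // Keep at most maxTerms tokens; anything beyond the cap is dropped silently.
    static List<String> indexableTerms(List<String> tokens, int maxTerms) {
        return new ArrayList<>(tokens.subList(0, Math.min(tokens.size(), maxTerms)));
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < MAX_FIELD_LENGTH; i++) tokens.add("filler");
        tokens.add("needle"); // the 10,001st token
        System.out.println(indexableTerms(tokens, MAX_FIELD_LENGTH).contains("needle")); // false
    }
}
```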

Re: Indexing across multiple servers

2002-03-11 Thread Otis Gospodnetic
This is becoming a FAQ... Not by itself, so you have to write an application to collect the data to be indexed yourself, and then feed it to Lucene. Otis --- Ryan Ogaard [EMAIL PROTECTED] wrote: Does Lucene support the indexing/searching of multiple servers across the network (file servers,

Re: special character handling

2002-03-12 Thread Otis Gospodnetic
It depends on the Analyzer used. Otis --- Aruna Raghavan [EMAIL PROTECTED] wrote: Hi, Does Lucene replace all special characters with spaces when it adds the document to the index? Thanks!

RE: special character handling

2002-03-12 Thread Otis Gospodnetic
This is answered in the FAQ: http://jguru.com/faq/view.jsp?EID=538308 --- Aruna Raghavan [EMAIL PROTECTED] wrote: Otis, I am using StandardAnalyzer. -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Tuesday, March 12, 2002 3:37 PM To: Lucene Users List

Re: size and nos of documents in the index

2002-03-15 Thread Otis Gospodnetic
Parag, Indexing time and index size should be proportional to the size of documents being indexed. Also, I believe a document containing more distinct terms will result in a larger index size increase than a document containing more duplicates. For instance I am going to bed in a few

Re: Wildcard Searching

2002-03-16 Thread Otis Gospodnetic
Hello, This was a thread on lucene-user initially, but I'm copying lucene-dev as well. Sorry about duplicates. --- Stefan Bergstrand [EMAIL PROTECTED] wrote: Doug Cutting [EMAIL PROTECTED] writes: Just noticed this problem in my program. It seems as if the analyzer passed to

Re: corrupted index

2002-03-16 Thread Otis Gospodnetic
Oh, I just thought of something (wine does body good). Perhaps one could use Runtime (the class) to catch the JVM shutdown and do whatever is needed to prevent index corruption. I believe there are some shutdown hook methods in there that may let you do that. I'm too lazy to look up the API
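The shutdown-hook idea can be sketched with the standard `Runtime.addShutdownHook` method (available since JDK 1.3). The `Closeable` passed in is a hypothetical stand-in for whatever closes your IndexWriter. A hook only runs on an orderly JVM exit (normal termination, System.exit, or a signal such as Ctrl-C), not on `kill -9` or a JVM crash, so it narrows rather than removes the risk of corruption.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: close a resource (e.g. a wrapper around an IndexWriter) when the JVM exits normally.
class CloseOnShutdown {
    // Returns the hook thread so callers can deregister it after closing explicitly.
    static Thread register(Closeable resource) {
        Thread hook = new Thread(() -> {
            try {
                resource.close();
            } catch (IOException e) {
                System.err.println("close on shutdown failed: " + e);
            }
        });
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        register(() -> System.out.println("closing index writer (stand-in)"));
        System.out.println("working...");
    }
}
```

If the application closes the writer itself, calling `Runtime.getRuntime().removeShutdownHook(hook)` afterwards avoids a second close at exit.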

Re: Lucene Bugs

2002-03-16 Thread Otis Gospodnetic
Hola (hello), I don't have years of search engine writing experience either, but I did look at your reports on SourceForge earlier and I will try to look at the source to see if they are the right fixes. I haven't used DateFilter, which, I think, you said contains the bug, so no promises, but I'll look.

Re: Multiple field searching

2002-03-19 Thread Otis Gospodnetic
I'm using MultiFieldQueryParser and it works for me. Otis --- Tate Jones [EMAIL PROTECTED] wrote: hi, I am trying to search across multiple fields using the following query +keyword:computers +subject:News content:xml or +(keyword:{computers}) +(subject:{News}) content:xml i have

RE: Question Deleting/Reindexing Files

2002-03-20 Thread Otis Gospodnetic
The standard answer is try deleting/adding in batches instead of individually. Seems more efficient, too, if you can write your application that way. That is what you are essentially doing by writing to a separate index and then doing a bunch of deletions, followed by re-additions. I know I'm

Re: Multiple field searching

2002-03-21 Thread Otis Gospodnetic
--- Kelvin Tan [EMAIL PROTECTED] wrote: hmmm...really? My impression was that the ANDs are treated equivalently with +s by the parser, so they're redundant. Correct. The { and }s aren't part of the syntax, are they? I was wondering where those came from. I don't think I've seen them

Re: Older versions of Lucene?

2002-03-21 Thread Otis Gospodnetic
Maybe you can find something in Lucene's old repository on Sourceforge.net. Otis --- Robert A. Decker [EMAIL PROTECTED] wrote: I'm on Java 1.1.8, and can't upgrade beyond that for quite some time due to testing requirements. I've managed to compile in and use the 1.2 StringBuffer class

Re: TokenManager's longs too long

2002-03-21 Thread Otis Gospodnetic
Sorry, I have no experience using Lucene or JavaCC with JDK 1.1.8. Sounds like a question for the WebGain folks. Otis --- Robert A. Decker [EMAIL PROTECTED] wrote: I'm stuck on jdk 1.1.8 and can't upgrade for some time. I'm using javacc to create some java code from a .jj file provided by the Lucene

Re: Lucene Bugs

2002-03-19 Thread Otis Gospodnetic
Hello, --- David Smiley [EMAIL PROTECTED] wrote: I have reported bugs about Lucene in the fall of 2001 but no Lucene developer has responded. I am sending this summary as a reminder. My original message to the mailing list is here: [Lucene-dev] More bugs

Re: Lucene Bugs

2002-03-19 Thread Otis Gospodnetic
Hello, SegmentTermEnum.clone(), term == null http://sourceforge.net/tracker/index.php?func=detail&aid=451315&group_id=3922&atid=103922 Aha, this was a bug, indeed, but it looks like this bug was fixed about 6 months ago: revision 1.2 date: 2001/10/11 15:14:14; author: scottganyo;

Re: Lucene Bugs

2002-03-19 Thread Otis Gospodnetic
Hello, Has anyone else observed this behaviour? Wrong ordering from Document.fields() http://sourceforge.net/tracker/index.php?func=detail&aid=451317&group_id=3922&atid=103922 It looks like java.util.Enumeration is used to store the fields, so if Enumeration guarantees order then this should,

Re: Field search matching exact and partial occurence

2002-03-22 Thread Otis Gospodnetic
Aero* Look at Wildcard and Prefix queries. Otis --- RAYMOND Romain [EMAIL PROTECTED] wrote: Hello, Is there a way to do a query on a field XX that retrieves exact or partial matches... for example a query on aero will return aeronef, aerosol, aero-finder

Re: TokenManager's longs too long

2002-03-22 Thread Otis Gospodnetic
www.webgain.com --- Robert A. Decker [EMAIL PROTECTED] wrote: Aren't the webgain people on this mailing list? If not, how do I contact them? I've been looking around the javacc pages, but can only find the email address for this mailing list... thanks, rob On Thu, 21 Mar 2002, Otis

Re: StopFilter-troubles

2002-03-27 Thread Otis Gospodnetic
--- [EMAIL PROTECTED] wrote: Dear Lucene-users, does anyone have an answer to the following question: If I add a StopFilter to my Analyzer, the stopwords I gave it will be left out of the query. So far, so good. But when my query is like this one: (field1 : x) AND (field2 : stopword) AND

RE: StopFilter-troubles

2002-03-27 Thread Otis Gospodnetic
I don't know enough about the query parser to be able to answer that question, but why do you really need those parentheses? It would also be great if you could submit this as a bug at http://jakarta.apache.org/lucene/ Thanks, Otis --- [EMAIL PROTECTED] wrote: Dear all, especially Otis

Re: What do reader-valued Fields do?

2002-03-31 Thread Otis Gospodnetic
This means that you can make searches against that field, but cannot retrieve its original value. Otis --- Robert A. Decker [EMAIL PROTECTED] wrote: What should I use to store and add to my Document a long String? (thousands of characters) I'm still having difficulty understanding what it

Re: corrupted index

2002-04-02 Thread Otis Gospodnetic
Hello, Nobody has contributed a tool that verifies index integrity yet. Is this the latest version of Lucene? Are you hitting the 2GB/file limit? Just some ideas. Otis --- H S [EMAIL PROTECTED] wrote: Dear All, We are experiencing a problem with index updates. We have a fairly large

Re: compiling lucene

2002-04-03 Thread Otis Gospodnetic
JavaCC 2.1 works, too. This is how I have it set up: [otis@linux2 otis]$ ls -al /usr/local/.version/javacc2.1/ total 44 drwxrwxr-x  6 otis otis 4096 Jan 28 06:50 . drwxr-xr-x 20 otis otis 4096 Apr  2 23:32 .. drwxrwxr-x  3 otis otis 4096 Jan 28 06:50 bin

Re: storing index in third party database.

2002-04-03 Thread Otis Gospodnetic
If you want to store indices in a database search the mailing list archives for SqlDirectory. Once I considered using it for one application at work, so I asked its author about performance. The answer was that it doesn't perform all that well when the index grows, if I recall correctly.

Re: Custom queries

2002-04-05 Thread Otis Gospodnetic
name != pradeep == -name:pradeep I think there is also support for the date query below, but I haven't used it yet, so I don't want to give you any wrong information. Otis --- Pradeep Kumar K [EMAIL PROTECTED] wrote: Hi lucene friends! Is there any way to create custom queries. Just for

Re: JavaCC error when installing with Ant

2002-04-10 Thread Otis Gospodnetic
Do you have Ant's optional.jar in Ant's lib directory? --- David Black [EMAIL PROTECTED] wrote: Ant returns the following error. Any ideas? ... lucene-1.2-rc4-src/build.xml:92: Could not create task of type: javacc. Common solutions are to use taskdef to declare your task, or, if this

HTML parser

2002-04-18 Thread Otis Gospodnetic
Hello, I need to select an HTML parser for the application that I'm writing and I'm not sure what to choose. The HTML parser included with Lucene looks flimsy, JTidy looks like a hack and an overkill, using classes written for Swing (javax.swing.text.html.parser) seems wrong, and I haven't tried

Re: HTML parser

2002-04-18 Thread Otis Gospodnetic
PROTECTED] wrote: On Thursday, April 18, 2002, at 10:28 PM, Otis Gospodnetic wrote: Hello, I need to select an HTML parser for the application that I'm writing and I'm not sure what to choose. The HTML parser included with Lucene looks flimsy, JTidy looks like a hack and an overkill

Re: Wildcard query problem with ?

2002-04-19 Thread Otis Gospodnetic
Hm, I just went through all the diffs after RC2 (QueryParser.jj revision 1.3) and I can't see where '?' was dropped. However, one user reported this on February 27th: We just tried adding the ? character to QueryParser.jj under #_TERM_START_CHAR. We noticed that the * was in that list, so we

RE: HTML parser

2002-04-19 Thread Otis Gospodnetic
19, 2002 1:38 AM To: Lucene Users List Subject: Re: HTML parser On Thursday, April 18, 2002, at 10:28 PM, Otis Gospodnetic wrote: :snip Hi Otis, I have an HTML parser built for ANTLR, but it's pretty strict in what it accepts. Not sure how useful it will be for you, but here

RE: Wildcard Searching

2002-04-19 Thread Otis Gospodnetic
Did the change that you mentioned below really work for you? I wrote this class: http://nagoya.apache.org/bugzilla/showattachment.cgi?attach_id=1638 and it looks like the bug is not in QueryParser, but in some Java class (could it be WildcardTermEnum?), since the class does not make use of

Re: HTML parser

2002-04-21 Thread Otis Gospodnetic
Laura, http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=Spindle&q=b Oops, it's JoBo, not MoJo :) http://www.matuschek.net/software/jobo/ Otis --- [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi Otis, thanks for your reply. I have been looking for Spindle and Mojo for 2 hours but I don't

Re: Error with StandardTokenizer.java and Token.java

2002-04-23 Thread Otis Gospodnetic
Hello, Get the latest version, try again, paste the error if you get it, and use the lucene-user list instead: more eyeballs and brains will see your problem on that list. Thanks, Otis --- Jacob Gutierrez [EMAIL PROTECTED] wrote: Hi there... Using the latest version of StandardTokenizer.jj

Re: Cannot compile Lucene

2002-04-24 Thread Otis Gospodnetic
Just curious, what exactly people need to do to 'fix up the exceptions'? Editing of which files to change what to what? I'd just like to document that somewhere, that's why I'm asking... Otis --- Robert A. Decker [EMAIL PROTECTED] wrote: I got it working under Project Builder. You just have

Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Otis Gospodnetic
Morning, I'm starting to wonder how bulletproof Lucene indexes are. Do they get corrupted easily? If so, is there a way to rebuild them? There is no tool for detecting index corruption, fixing an index, or rebuilding an index. The last one anyone can (and has to) do on their own. I'm starting to

Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Otis Gospodnetic
Hello, There is no tool for detecting index corruption, fixing an index, or rebuilding an index. The last one anyone can (and has to) do on their own. :-( Well, that's *very* sad to say the least... How do I know if my indexes are not corrupted even if everything seems to be working fine?

Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread Otis Gospodnetic
--- petite_abeille [EMAIL PROTECTED] wrote: I don't know what environment you're using Lucene in. The problem seems to be especially bad on OS X (10.1.4 + JRE 1.3.1 + latest updates). Does this mean you tried it on other OSs and it worked? Which ones? What JDK did those have and what was

Re: FileNotFoundException: code example

2002-04-29 Thread Otis Gospodnetic
Hello, I'll put my comments inline... --- petite_abeille [EMAIL PROTECTED] wrote: Hello again, attached is the source code of the only class interacting directly with Lucene in my app. Sorry for not providing a complete test case as it's hard for me to come up with something self

Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread Otis Gospodnetic
Hello, and what was their ulimit and what is the ulimit on your OSX machine? Just curious. I don't know. Does it matter? Of course it does - a low (u)limit is a part of your problem, perhaps. Otis P.S. I don't know how Winblows deals with file descriptors. Try your application on

Re: Options for sorting on an integer or date

2002-05-01 Thread Otis Gospodnetic
Hello, --- Joel Bernstein [EMAIL PROTECTED] wrote: At my company we are trying to decide on a new search engine. I am very impressed with what I see with Lucene and am thinking very seriously of not going with AltaVista, FAST etc... :) One of the things that is very important to us is sorting by

Re: indexing PDF files

2002-05-01 Thread Otis Gospodnetic
Hm, this should be a FAQ. Maybe it should... ;-) It is now. Check Lucene contributions page, there are some starting points there, Well, this seems to be a very popular request... In fact I need something like that also. Unfortunately, there seems to be no authoritative answer as

Re: term search speeds

2002-05-01 Thread Otis Gospodnetic
Caching? The OSes usually cache recently opened files... Otis --- a person [EMAIL PROTECTED] wrote: Does anyone know exactly why, when searching for a term, the engine is much slower on the first search of a term than on subsequent searches of the same term? Thanks

Re: 3 Times Isn't a Charm for me and Lucene

2002-05-02 Thread Otis Gospodnetic
Uh, this is a very broad question. A number of things could be wrong. Look at your Tomcat log files. Write a class that you can run from the command line, not as a servlet, that may be easier to debug. You can use one of the demo ones to get started. Log things, don't catch exceptions and ignore

Re: Stemming

2002-05-02 Thread Otis Gospodnetic
You could have a single index with both stemmed and non-stemmed terms, using different field names for each and searching a different set of fields depending on the type of search. You'd also have to use 2 types of analyzers/filters, I think. Roughly :) Otis --- Joel Bernstein [EMAIL

Re: Lucene Book

2002-05-03 Thread Otis Gospodnetic
I don't think there are any on the market. A perfect opportunity for somebody :) Otis --- William W [EMAIL PROTECTED] wrote: Hi All, Do you know of any book about Lucene? Thanks, William.

Re: Any one used websearch - Need Help Please

2002-05-06 Thread Otis Gospodnetic
Hello, The host that you are trying to crawl cannot be looked up: bash-2.04$ nslookup www.violet-arcana.com Server: localhost.apache.org Address: 127.0.0.1 *** localhost.apache.org can't find www.violet-arcana.com: Non-existent host/domain This is not a Lucene issue, but more of a

Re: WildcardQuery

2002-05-07 Thread Otis Gospodnetic
Yes, me too. I just tried it on some Lucene index (the search at blink.com) and it doesn't seem to work (try searching for travel and then *vel). I'm assuming the original poster confused something... Otis --- Joel Bernstein [EMAIL PROTECTED] wrote: I thought Lucene didn't support left

Re: Searching greater than/less than

2002-05-21 Thread Otis Gospodnetic
Hello, I believe that is not possible with Lucene, although there is something called a RangeQuery, which may be helpful. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/RangeQuery.html Otis --- Victor Hadianto [EMAIL PROTECTED] wrote: Can I use lucene to search greater
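RangeQuery compares terms as strings, so numbers generally need a fixed-width form for greater-than/less-than ranges to behave numerically. The zero-padding helper below is a common workaround sketched in plain Java, not part of Lucene; it handles only non-negative values, and the width of 10 is an assumption that must cover your largest value.

```java
// Hypothetical helper: encode non-negative numbers so string order matches numeric order.
class PaddedNumbers {
    static String pad(long value, int width) {
        if (value < 0) throw new IllegalArgumentException("negative values need a different encoding");
        String digits = Long.toString(value);
        if (digits.length() > width) throw new IllegalArgumentException("value too wide: " + value);
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) sb.append('0');
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println("7".compareTo("10") > 0);               // true: unpadded strings sort wrong
        System.out.println(pad(7, 10).compareTo(pad(10, 10)) < 0); // true: padded strings sort numerically
    }
}
```

The same padded form has to be used both when indexing the field and when building the range bounds.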

Re: Merging (adding) indices

2002-05-27 Thread Otis Gospodnetic
The source code looks like this: public final synchronized void addIndexes(Directory[] dirs) throws IOException { optimize(); // start with zero or 1 seg for (int i = 0; i < dirs.length; i++) { SegmentInfos sis = new SegmentInfos(); //

Re: Partial word search with unicode contents

2002-06-04 Thread Otis Gospodnetic
Hello, A query for india should not be returning southindia (one word). It sounds like something else is happening in your application. Otis --- Harpreet S Walia [EMAIL PROTECTED] wrote: Hi, We are using Lucene to index and search Unicode (UTF-8) content in Devanagari (Hindi).

Re: Opening and index as ready only

2002-06-04 Thread Otis Gospodnetic
I believe what you are referring to is on Lucene's TODO list, possibly for the next release. One or two people have already contributed some code for Lucene on read-only media such as CD-ROM, so you may want to check the mailing list archives for the code if this is urgent for you. Otis ---

Re: searching with wild cards ignoers analyzer?

2002-06-04 Thread Otis Gospodnetic
Dobro jutro (good morning), Dario, maybe this answers your question: http://www.jguru.com/faq/view.jsp?EID=538312 Otis --- Dario Novakovic [EMAIL PROTECTED] wrote: I index/search with an analyzer which converts all characters to lowercase. It works correctly until I use *; then I must use query strings with

Re: lucene and java naming conventions

2002-06-04 Thread Otis Gospodnetic
Dario, Yes, we may improve coding style over time, but there are no plans for doing that in the immediate future. I know, it's not ideal, so we all have to get used to those few exceptions. Otis --- Dario Novakovic [EMAIL PROTECTED] wrote: i noticed that some method names in lucene start

Re: lucen compared to other open source solutions

2002-06-04 Thread Otis Gospodnetic
arguments for choosing Lucene over the swish-e solution? Functional differences? Scalability? Performance? Are there any benchmarks somewhere? Thanks, Roland -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Wednesday, June 5, 2002 00:23

Re: Document Object

2002-06-05 Thread Otis Gospodnetic
As far as I know there is no generic way to do that. You can parse the String in your application, form Fields, add them to a Document, and there you go, but there is nothing generic. Besides field names and values, your String would also have to contain meta data about each field, whether it is

Re: status of ? wildcard queries in rc5

2002-06-09 Thread Otis Gospodnetic
David, As far as I can tell the '?' character works as it should with WildcardQuery. See src/test/org/apache/lucene/search/TestWildcard.java. The tests there use SimpleAnalyzer and WildcardQuery directly (i.e. not QueryParser). All tests pass. Try comparing your code with the code in the

Re: Problem in unicode field value retrival

2002-06-10 Thread Otis Gospodnetic
Hello, That was the problem, thanks :-). I am still struggling to get Lucene to search non-English Unicode content. It works partially with the simple analyser but doesn't return any results with the standard analyser. Is there a way by which I can output the exact contents that are going into

Re: Within Search

2002-06-10 Thread Otis Gospodnetic
Hello, I'm sending this to the lucene-user list, as that seems more appropriate. I haven't used Lucene's slop feature, but it looks like both QueryParser and PhraseQuery have support for slop. I am not sure what the syntax for it is, but if nothing else you should be able to call setSlop(int)

Re: How does simple analyser work

2002-06-11 Thread Otis Gospodnetic
--- Harpreet S Walia [EMAIL PROTECTED] wrote: Hi, Are there any resources available which explain how the simple analyser processes the data given to it . what i want to know is that suppose i have a set of words , what exact rules are applied to tokenize and index these words and how

RE: Question about RangeQuery and strings...

2002-06-11 Thread Otis Gospodnetic
James, I haven't used RangeQueries, but what you describe does sound confusing to me. I'll enter it as a bug just so this information doesn't get lost, although I am not certain that it really is a bug, even if it sounds like one to me. Thanks, Otis --- James Ricci [EMAIL PROTECTED]

Re: Are IndexReader objects always up to date?

2002-06-11 Thread Otis Gospodnetic
Hm, this sounds an awful lot like a FAQ, yet I don't see it in Lucene's FAQ at jGuru.com. You need to close and reopen the index(reader) if you want to see the latest changes. There is a method that you can use to figure out if the index has been modified since you opened it. Otis --- James
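The close-and-reopen pattern can be sketched generically in plain Java. Everything here is hypothetical: `versionSource` stands in for whatever tells you the index changed (the method Otis mentions, which the snippet does not name), and `opener` stands in for opening a fresh IndexReader.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder: reopens its resource whenever the reported index version changes.
class ReopeningHolder<R> {
    private final LongSupplier versionSource;
    private final Supplier<R> opener;
    private R current;
    private long openedVersion;

    ReopeningHolder(LongSupplier versionSource, Supplier<R> opener) {
        this.versionSource = versionSource;
        this.opener = opener;
        // Read the version before opening, so a change made during the open triggers a later reopen.
        this.openedVersion = versionSource.getAsLong();
        this.current = opener.get();
    }

    synchronized R get() {
        long latest = versionSource.getAsLong();
        if (latest != openedVersion) {
            // The caller is responsible for closing the previous instance once no search still uses it.
            current = opener.get();
            openedVersion = latest;
        }
        return current;
    }
}
```

Checking on each `get()` keeps searches current at the cost of one version lookup per call; checking on a timer instead would trade freshness for fewer lookups.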

RE: Are IndexReader objects always up to date?

2002-06-11 Thread Otis Gospodnetic
saved off the open time). Is there something a little more direct? James -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Tuesday, June 11, 2002 2:23 PM To: Lucene Users List Subject: Re: Are IndexReader objects always up to date? Hm, this sounds

Re: Thread safety

2002-06-11 Thread Otis Gospodnetic
Thanks for this table. It's part of the Lucene FAQ at jGuru now: http://www.jguru.com/forums/view.jsp?EID=910778 Otis --- Mark Harwood [EMAIL PROTECTED] wrote: I've been trying to understand the multithreaded behaviour of Lucene too. I have a test rig and the observed results are

Re: Memory-based indexing

2002-06-12 Thread Otis Gospodnetic
Yes, there are a few things one can do. See http://nagoya.apache.org/eyebrowse/ReadMsg?[EMAIL PROTECTED]msgId=117057 Otis --- James Ricci [EMAIL PROTECTED] wrote: I've been doing a few tests, and I'm finding creating an index in Lucene to be somewhat slower than other engines I've worked

Re: Thread safety

2002-06-12 Thread Otis Gospodnetic
cannot happen concurrently, yet there's a Y in that box on the matrix. Also, /shouldn't/ the matrix be symmetric? It isn't. If it is intended to be, I think only half of the matrix should be there so as not to be confusing. ~ Dave Smiley On Tuesday, June 11, 2002, at 10:12 PM, Otis

RE: Boolean Query + Memory Monster

2002-06-15 Thread Otis Gospodnetic
I don't know about Resin, but Tomcat allows one to set the CATALINA_OPTS (or some other _OPTS) environment variable, whose value is then used to invoke Java. I would imagine Resin has something similar. This then becomes a Resin question. Otis --- Nader S. Henein [EMAIL PROTECTED] wrote: I'm

Re: Deleting document from index

2002-06-22 Thread Otis Gospodnetic
Hello, First of all, the machine from which you sent this email has the date set incorrectly - it thinks it's 22. 6. 2000. --- [EMAIL PROTECTED] wrote: I had searched the archive of this list for getting more info on How to delete a document from the lucene index. But most of the postings

Re: Retrieve documents from index by document number

2002-06-25 Thread Otis Gospodnetic
Check the Hits class API: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Hits.html Otis --- Chris Sibert [EMAIL PROTECTED] wrote: Anybody know how to retrieve a stored document from an index by its document number? I have a list of search hits, and when the user clicks

Re: IndexReader Pool

2002-06-27 Thread Otis Gospodnetic
I don't think Lucene contains anything to help you create this pool. However, if you look at Jakarta Commons project you will find a subproject there that allows you to create pools of any kind of Java object. You can probably use that to save yourself development and debug time. Otis ---
