Re: Writing a stemmer

2004-06-03 Thread Leo Galambos
Erik Hatcher [EMAIL PROTECTED] wrote:

 How proficient must I be in a language for which I wish to write the 
 stemmer?
I would venture to say you would need to be an expert in a language to 
write a decent stemmer.

I'm sorry for the self-promotion ;), but the stemmer of the Egothor
project can be adapted to any language, and you needn't be a language
expert. Moreover, the stemmer achieves a better F-measure than
Porter's stemmers.

Cheers,
Leo



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Tool for analyzing analyzers

2004-06-02 Thread Leo Galambos


Zilverline [EMAIL PROTECTED] wrote:

get more out of  lucene, such as incremental indexing, to name one. On 

Hello,

as far as I know, incremental indexing can become a real bottleneck
if you implement your system without some knowledge of Lucene's
internals.

The respective test is here:
http://www.egothor.org/twiki/bin/view/Know/LuceneIssue

Cheers,
Leo






Re: thanks for your mail

2004-02-16 Thread Leo Galambos
Could an admin filter out hema's e-mails, please?

THX
Leo
[EMAIL PROTECTED] wrote:

Received your mail we will get back to you shortly



Re: Index advice...

2004-02-10 Thread Leo Galambos
Otis Gospodnetic napsal(a):

Thus I do not know how it could be O(1).

~ O(1) is what I have observed through experiments with indexing of
several million documents.

What exactly did you measure? Just the time of the insert operation
(including merge(), of course)? Was it a test on real documents?

THX
Leo


Re: Index advice...

2004-02-10 Thread Leo Galambos
Otis Gospodnetic napsal(a):

--- Leo Galambos [EMAIL PROTECTED] wrote:

Otis Gospodnetic napsal(a):

Thus I do not know how it could be O(1).

~ O(1) is what I have observed through experiments with indexing of
several million documents.

What exactly did you measure? Just the time of the insert operation
(including merge(), of course)? Was it a test on real documents?
I didn't really measure anything, I only observed this, as my focus was
something else, not performance measurements.
It is true that every time an insert/add triggers a merge operation,
things will slow down, but from what I recall (and this was about 1
year ago), the overall performance was steady as the index grew.
 

Try the same test with mergeFactor=2, and you will see the difference.

Leo
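To see why mergeFactor changes the picture, here is a toy cost model
(my own simplification, not Lucene code) that counts how many times
each document gets rewritten during segment merges:

```python
# Simplified model of Lucene-style segment merging (for illustration
# only): every added document starts as a tiny segment, and whenever
# mergeFactor segments of the same size exist, they are merged into one
# segment of the next size up.
def merge_work(num_docs, merge_factor):
    levels = {}  # level -> count of segments holding merge_factor**level docs
    work = 0     # total documents written (adds + rewrites during merges)
    for _ in range(num_docs):
        levels[0] = levels.get(0, 0) + 1
        work += 1
        lvl = 0
        while levels.get(lvl, 0) >= merge_factor:
            levels[lvl] -= merge_factor
            levels[lvl + 1] = levels.get(lvl + 1, 0) + 1
            work += merge_factor ** (lvl + 1)  # docs rewritten by this merge
            lvl += 1
    return work

# A low mergeFactor merges far more often, so total write work grows
# even though each individual merge pause is shorter.
print(merge_work(10_000, 2), merge_work(10_000, 10))
```

With these made-up numbers, mergeFactor=2 does well over twice the
write work of the default 10, which is the difference Leo is pointing at.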



Re: Lucene with Postgres db

2004-02-01 Thread Leo Galambos
Have you tried tsearch, a special add-on for PostgreSQL?
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

Lucene is faster than tsearch (I hope so), but tsearch need not be
synchronized with the main DB... up to you.

Cheers,
Leo
Ankur Goel wrote:

Hi,

I have to search the documents which are stored in postgres db. 

Can someone give a clue how to go about it?

Thanks

Ankur Goel
Brickred Technologies
B-2 IInd Floor, Sector-31
Noida,India
P:+91-1202456361
C:+91-9810161323
E:[EMAIL PROTECTED]
http://www.brickred.com




Re: IndexHTML example on Jakarta Site

2004-01-02 Thread Leo Galambos
Colin McGuigan wrote:

It creates an index, but when I search using
http://localhost:8000/luceneweb/
The page works but I do not get any replies.
 

Can it read your index? See indexLocation in configuration.jsp

1. How do you specify which directory is to be searched
snip
 

I agree with Erik that you would rather use an application which is
ready for use in a minute. IMHO Lucene is a library/API, and unless you
are a Java developer, it does not fit your needs. Some applications are
listed here:
http://dmoz.org/Computers/Programming/Languages/Java/Server-Side/Search_Engines/
Omit the Lucene link, or else you will be in an endless loop... ;-)

If you must use Lucene, try to find something for you here:
http://jakarta.apache.org/lucene/docs/powered.html
You may be interested in i2a, but their demo (@24.9.177.111) is dead 
right now.

Cheers,
Leo


Re: What about Spindle

2003-12-03 Thread Leo Galambos
You can try Capek (it needs JDK 1.4, because it uses NIO). It can crawl 
whatever you like.

API:
http://www.egothor.org/api/robot/
Console - demo (*.dundee.ac.uk):
http://www.egothor.org/egothor/index.jsp?q=http%3A%2F%2Fwww.compbio.dundee.ac.uk%2F
Leo

Zhou, Oliver wrote:

I think it is a common task to index a JSP-based web site.  A lot of
people ask how to do so on this mailing list.  However, Lucene does not
have a ready-to-use web crawler.  My question is: has anybody used
Spindle to index a JSP-based web site, or are there any other tools out
there?
Thanks,
Oliver


-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 03, 2003 11:25 AM
To: Lucene Users List
Subject: Re: What about Spindle
You should ask Spindle author(s).  The error doesn't look like
something that is related to Lucene, really.
Otis

--- Zhou, Oliver [EMAIL PROTECTED] wrote:
 

What about Spindle? Has anybody used it to crawl a jsp based web
site? Do I need to install listlib.jar to do so?

I got the error message Jsp Translate: Unable to find setter method for
attribute: class when I tried to run listlib-example.jsp in wsad.
Thanks,
Oliver






Re: Vector Space Model in Lucene?

2003-11-14 Thread Leo Galambos
Really? And what model is used/implemented by Lucene?

THX
Leo
Otis Gospodnetic wrote:

Lucene does not implement vector space model.

Otis

--- [EMAIL PROTECTED] wrote:
 

Hi,

does Lucene implement a Vector Space Model? If yes, does anybody have
an
example of how using it?
Cheers,
Ralf
--
NEU FÜR ALLE - GMX MediaCenter - für Fotos, Musik, Dateien...
Fotoalbum, File Sharing, MMS, Multimedia-Gruß, GMX FotoService
Jetzt kostenlos anmelden unter http://www.gmx.net

+++ GMX - die erste Adresse für Mail, Message, More! +++



Re: Vector Space Model in Lucene?

2003-11-14 Thread Leo Galambos
The model determines the quality, thus it does matter.

As for the several important models: are any of them implemented in Lucene?

Chong, Herb wrote:

does it matter? vector space is only one of several important ones.

Herb

-Original Message-
From: Leo Galambos [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 4:00 AM
To: Lucene Users List
Subject: Re: Vector Space Model in Lucene?
Really? And what model is used/implemented by Lucene?

THX
Leo


Re: Document Clustering

2003-11-11 Thread Leo Galambos
Marcel Stör wrote:

Hi

As everybody seems to be so exited about it, would someone please be so kind to explain 
what document based clustering is?
 

Hi

they are trying to implement what you can see in the right panel here:
http://www.egothor.dundee.ac.uk/egothor/q2c.jsp?q=protein
They may also analyze identical pages (hits #9 and #10) - this could 
also be taken as clustering, AFAIK.

For instance, Doug wrote some papers about clustering (if I remember 
correctly) - see his bibliography.

Leo



Re: Lucene features

2003-09-11 Thread Leo Galambos
Doug Cutting wrote:

Erik Hatcher wrote:

Yes, you're right.  Getting the scores of a second query based on the 
scores of the first query is probably not trivial, but probably 
possible with Lucene.  And that combined with a QueryFilter would do 
the trick I suspect.  Somehow the scores of the first query could be 
remembered and used as a boost (or other type of factor) the scores 
of the second query.


Why not just AND together the first and second query?  That way 
they're both incorporated in the ranking.  Filters are good when you 
don't want it to affect the ranking, and also when the first query is 
a criterion that you'll reuse for many queries (e.g., 
language=french), since the bit vectors can be cached (as by 
QueryFilter).


You probably missed the start of our discussion - we are talking about 
this: q1 -> q2, which means NOT q1 OR q2, versus q2 -> q1, which 
means q1 OR NOT q2. That causes the issue, and it also shows why you 
cannot use a simple AND, because q1 AND q2 != NOT q1 OR q2 != 
q1 OR NOT q2.

Leo

BTW: I haven't seen these logic formulas for many years, so this is
without any guarantee ;-)





Re: Lucene features

2003-09-11 Thread Leo Galambos
Doug Cutting wrote:

I have some extensions to Lucene that I've not yet commited which make 
it possible to easily define synthetic IndexReaders (not currently 
supported).  So you could do things that way, once I check these in. 
But is this really better than just ANDing the clauses together?  It 
would take some big experiments to know, but my guess is that it 
doesn't make much difference to compute a local IDF for such things.


In this case, I think that the operator would be evaluated as an 
implication and not AND (score = 1 - (((1-q1)^p + (1-q2)^p)/2)^(1/p)). 
Obviously, you have to use a filter to drop false hits (in the case 
of q1 -> q2, the formula is true when q1 is false, so it is not what you 
really need), but that is not an issue with the auxiliary index. On the 
other hand, it is a feeling and it needs a test; you are right.

Leo
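To make the formula above concrete, here is a small sketch of the
extended Boolean (p-norm) operators it comes from (Salton's model; the
function names and the choice of p=2 are mine, not from the thread):

```python
# p-norm (extended Boolean) combinators over per-document scores in [0, 1].
def pnorm_and(a, b, p=2.0):
    # The formula quoted in the mail:
    # 1 - (((1-q1)^p + (1-q2)^p) / 2)^(1/p)
    return 1.0 - (((1.0 - a) ** p + (1.0 - b) ** p) / 2.0) ** (1.0 / p)

def pnorm_or(a, b, p=2.0):
    return ((a ** p + b ** p) / 2.0) ** (1.0 / p)

def implies(a, b, p=2.0):
    # q1 -> q2 read as (NOT q1) OR q2: it scores high whenever q1 scores
    # low, which is exactly why a filter is still needed to drop false hits.
    return pnorm_or(1.0 - a, b, p)

print(implies(0.0, 0.0))  # high although q2 failed: the "false hit" case
print(implies(1.0, 0.0))  # q1 matched, q2 did not: low
```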





Re: Lucene features

2003-09-07 Thread Leo Galambos
Erik Hatcher wrote:

On Friday, September 5, 2003, at 07:45  PM, Leo Galambos wrote:

And for the second time today QueryFilter.  It allows narrowing 
the documents queried to only the documents from a previous Query.


I guess, it would not be an ideal solution - the first query does two 
things a) it selects a subset from the corpus; b) it assigns a 
relevance to each document of this subset. Your solution omits the 
second point. It implies, the solution will not return good hit 
lists, because you will not consider the information value of the 
first query which was given to you by a user.


Yes, you're right.  Getting the scores of a second query based on the 
scores of the first query is probably not trivial, but probably 
possible with Lucene.  And that combined with a QueryFilter would do 
the trick I suspect.  Somehow the scores of the first query could be 
remembered and used as a boost (or other type of factor) the scores of 
the second query.


Well, I do not want to be a pessimist, but the boost vector is not a 
good solution due to CWI statistics. On the other hand, it is much 
better than the simple QueryFilter, which in fact works as a 0/1 boost.

Example: I use this notation - inverted_list_term:{list of W values; '-' 
denotes W=0; for 12 documents in a collection}
A:{23[16]--27}
B:{--[38]}
C:{18[2-]45239812}
If your first query is B, a subset of documents (denoted by brackets - 
namely, the 3rd and 4th docs) is selected, and if your second query is A 
C, then you cannot use global IDFs, because in the subset the IDF 
factors are different. Globally, A is the better discriminator, but in the 
subset, C is better. This fact is then reflected in the hit list you 
generate, and I guess the quality will also be affected by it.

The example shows that you would rather export the subset to an 
auxiliary index (RAMDirectory?) and then use this structure instead of 
the original index. Obviously, that would also solve the issue of speed you 
mentioned.
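Leo's point about global versus subset IDF can be reproduced with a few
lines of toy code (the document frequencies below are invented to force
the disagreement):

```python
import math

def idf(doc_freq, num_docs):
    # Simplified IDF; real scoring formulas add smoothing.
    return math.log(num_docs / doc_freq)

# Full collection: 12 docs. Term A is rarer than C, so A looks like the
# better discriminator globally.
N = 12
df_global = {"A": 6, "C": 10}
# Subset selected by the first query: 3 docs, where C is the rare term.
n_subset = 3
df_subset = {"A": 3, "C": 1}

best_global = max(df_global, key=lambda t: idf(df_global[t], N))
best_subset = max(df_subset, key=lambda t: idf(df_subset[t], n_subset))
print(best_global, best_subset)  # the rankings disagree
```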

Unfortunately, I am not sure you can export the inverted lists while 
you read them. In Egothor, I would use a listener in the Rider class; in 
Lucene, I would have to rewrite some classes, and that could be a real 
problem. Maybe there is a solution I do not see...

Your turn ;-)
Cheers,
Leo
Am I off base here?

Thus I think, Chris would implement something more complex than 
QueryFilter. If not, the results will be poorer than with the 
commercial packages he may get. He could use a different model where 
AND is not an associative operator (i.e. some modification of the 
extended Boolean model). It implies, he would implement it in 
Similarity.java (if I remember that class name correctly).


Right... but you'd still need the filtering capability as well, I 
would think - at least for performance reasons.

Erik



Re: Lucene features

2003-09-05 Thread Leo Galambos

But Drill Down searching is very desirable. It's where you're able to 
search
within the results of a previous search. I'm assuming that I'll have to
implement that myself, by keeping a copy of the previous Hits list, 
and only
returning results that are in both lists.


And for the second time today QueryFilter.  It allows narrowing 
the documents queried to only the documents from a previous Query.


I guess it would not be an ideal solution - the first query does two 
things: a) it selects a subset of the corpus; b) it assigns a relevance 
to each document of this subset. Your solution omits the second point. 
This implies the solution will not return good hit lists, because you 
will not consider the information value of the first query, which was 
given to you by a user.

For instance, neologism -> George Bush (1st -> 2nd query) would return 
a different order of hits than George Bush -> neologism. Other 
examples: Prague Berlin -> flight (I must go there, and I prefer an 
airplane) versus flight -> Prague Berlin (I must fly, and I prefer 
Berlin).

Thus I think Chris would have to implement something more complex than 
QueryFilter. If not, the results will be poorer than with the commercial 
packages he may get. He could use a different model where AND is not 
an associative operator (i.e., some modification of the extended Boolean 
model). This implies he would implement it in Similarity.java (if I 
remember that class name correctly).

Leo





Re: Fastest batch indexing with 1.3-rc1

2003-08-20 Thread Leo Galambos
Isn't it better for Dan to skip the optimization phase before merging? I 
am not sure, but he could save some time on this (if he has enough file 
handles for that, of course). What strategy do you use in Nutch?

THX

-g-

Doug Cutting wrote:

As the index grows, disk i/o becomes the bottleneck.  The default 
indexing parameters do a pretty good job of optimizing this.  But if 
you have lots of CPUs and lots of disks, you might try building 
several indexes in parallel, each containing a subset of the 
documents, optimize each index and finally merge them all into a 
single index at the end. But you need lots of i/o capacity for this to 
pay off.

Doug

Dan Quaroni wrote:

Looks like I spoke too soon... As the index gets larger, time to merge
becomes prohibitably high.  It appears to increase linearly.
Oh well.  I guess I'll just have to go with about 3ms/doc.



Re: How can I index JSP files?

2003-07-27 Thread Leo Galambos
If I understand the Enigma code well, they are saying that you must
write a crawler ;-)

-g-

To index the content of JSPs that a user would see using a Web browser,
you would need to write an application that acts as a Web client, in
order to mimic the Web browser behaviour. Once you have such an
application, you should be able to point it to the desired JSP, retrieve
the contents that the JSP generates, parse it, and feed it to Lucene.






I am a newbie to lucene and I would like to enable searching capability
to my website which is written entirely with JSP and servlets.  Does
anyone have any experience parsing JSP files in order to create in index
for/by Lucene?   I would greatly appreciate any help with the matter.
THanx


Russ

 





Re: High Capacity (Distributed) Crawler

2003-06-10 Thread Leo Galambos
Otis Gospodnetic wrote:

What interface do you need for Lucene? Will you use PUSH (=the robot 
will modify Lucene's index) or PULL (=the engine will get deltas from

the robot) mode? Tell me what you need and I will try to do all my
best.
   

I'd imagine one would want to use it in the PUSH mode (e.g. the crawler
fetches a web page and adds it to the searchable index).
How does PULL mode work?  I've never heard of web crawlers being used
in the PULL mode.  What exactly does that mean, could you please
describe it?
 

It is a long story, so I will assume that everything runs on a single 
box - it is the simplest case.
[x] will denote points where Lucene may have problems with a fast 
implementation, I guess.

Crawler: the crawler stores the meta and body of all documents. If you 
want to retrieve a document's meta or body (knowing its URI), it costs 
O(1) (2 seeks and 2 read requests in auxiliary data structures). After 
this retrieval you also get a direct handle to the meta and body - then 
the price of retrieval is still O(1), but with no extra seeks in any 
structures. The handle is persistent and tied to the URI. The meta and 
body are updated as soon as the crawler fetches a fresh copy.

Engine: the engine stores the handle of each document. Moreover, it 
knows the last (highest) handle which is stored in the main index. So 
the trick is this:
1) Build up an auxiliary index from new documents. The new documents are 
documents whose handle is greater than the last handle known to the 
engine, thus you can iterate over them easily - this process can run in 
a separate thread.
2) Consult the changes. You read the meta which are stored in the index, 
and test whether they are obsolete (note: you have already got the 
handle, so it smokes). If so, you mark the respective document as 
deleted, and its new version (if any) is appended to another index - the 
index of changes. The insertion into that index runs in a separate 
thread, so the main thread is not blocked. BTW: [x] The documents which 
are not modified may still change their ranks (depthrank, pagerank, 
frequencyrank, etc.) in this round.

[x] The two auxiliary indices are then merged with the main index.

Obviously, the weak point is the test of whether anything has changed. 
This can be easily solved with the index dynamization I use. Unlike 
Lucene, I order barrels (segments in your terminology) by their size. I 
do not want to describe all the details - I hate long e-mails ;-) - but 
the dynamization guarantees that:
a) the query time is never worse than 8x compared with a fully optimized 
index (if you buy 8x faster HW, you overcome this easily);
b) the documents which are often modified are stored in small barrels of 
the main index, which means that bringing them up to date is fast.

So I process only the small barrels once a day, and the larger ones 
less often. If we say that 5M docs are updated daily, PULL mode can 
handle this load in a few minutes. Unfortunately, the slowest point is 
the HTML parser, which may run a few hours :-(.

If you want to refresh another 10^10 crap pages once a month, it can be 
done too, but that is outside my first assumption above ;-).

-g-
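The handle bookkeeping described above can be sketched in a few lines
(the data structures and field names here are hypothetical, just to show
steps 1 and 2):

```python
# Crawler side: persistent integer handles map to (uri, meta, body).
crawler_store = {
    1: ("http://a/", {"etag": "x1"}, "body a"),
    2: ("http://b/", {"etag": "y2"}, "body b (refetched)"),
    3: ("http://c/", {"etag": "z1"}, "body c"),
}

def new_documents(store, last_handle):
    # Step 1: documents the engine has never indexed are exactly those
    # with a handle greater than the highest handle in the main index.
    return [h for h in sorted(store) if h > last_handle]

def obsolete_documents(store, indexed_meta):
    # Step 2: compare the meta stored in the index against the crawler's
    # current meta; a mismatch means delete + reinsert via the delta index.
    return [h for h, meta in indexed_meta.items() if store[h][1] != meta]

engine_last_handle = 2
engine_meta = {1: {"etag": "x1"}, 2: {"etag": "y1"}}  # doc 2 was refetched

print(new_documents(crawler_store, engine_last_handle))  # doc 3 is new
print(obsolete_documents(crawler_store, engine_meta))    # doc 2 is stale
```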



Re: High Capacity (Distributed) Crawler

2003-06-09 Thread Leo Galambos
Hi Otis.

The first beta is done (without NIO). It needs, however, further 
testing. Unfortunately, I could not find enough servers which I may hit.

I wanted to commit the robot as a part of Egothor (which will use it in 
PULL mode), but we have nice weather here, so I have lost any motivation 
to play with the PC ;-).

What interface do you need for Lucene? Will you use PUSH (=the robot 
will modify Lucene's index) or PULL (=the engine will get deltas from 
the robot) mode? Tell me what you need and I will try to do all my best.

-g-

Otis Gospodnetic wrote:

Leo,

Have you started this project?  Where is it hosted?
It would be nice to see a few alternative implementations of a robust
and scalable java web crawler with the ability to index whatever it
fetches.
Thanks,
Otis
--- Leo Galambos [EMAIL PROTECTED] wrote:
 

Hi.

I would like to write $SUBJ (HCDC), because LARM does not offer many 
options which are required by web/http crawling IMHO. Here is my
list:

1. I would like to manage the decision what will be gathered first - 
this would be based on pageRank, number of errors, connection speed
etc. 
etc.
2. pure JAVA solution without any DBMS/JDBC
3. better configuration in case of an error
4. NIO style as it is suggested by LARM specification
5. egothor's filters for automatic processing of various data formats
6. management of Expires HTTP-meta headers, heuristic rules which
will 
describe how fast a page can expire (.php often expires faster than
.html)
7. reindexing without any data exports from a full-text index
8. open protocol between the crawler and a full-text engine

If anyone wants to join (or just extend the wish list), let me know,
please.
-g-



Re: String similarity search vs. typcial IR application...

2003-06-06 Thread Leo Galambos
I see. Are you looking for this: 
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html

On the other hand, if n is not fixed, you still have a problem. As far 
as I read this list, it seems that Lucene reads a dictionary (of terms) 
into memory, and it also allocates one file handle for each of the 
acting terms. This implies you could not break the terms up into n-grams 
and, as a result, you would have to use a slow look-up over the 
dictionary. I do not know if I am expressing it correctly, but my 
personal feeling is that you would rather write your application from 
scratch.

BTW: If you have nice terms, you could find all their n-gram 
occurrences in the dictionary and compute a boost factor for each of 
the inverted lists. I.e., bbc is a term in a query; for the i-list of 
abba, the factor is 1 (bigram bb is there); for the i-list of bbb, the 
factor is 2 (bb 2x). Then you use the Similarity class, and it is 
solved. Nevertheless, if the n-grams are not nice and the query is long, 
you will lose a lot of time in the dictionary look-up phase.

-g-

PS: I'm sorry for my English, just learning...

Jim Hargrave wrote:

Probably shouldn't have added that last bit. Our app isn't a DNA searcher. But DASG+Lev does look interesting.

Our app is a linguistic application. We want to search for sentences which have many ngrams in common and rank them based on the score below. Similar to the TELLTALE system (do a google search TELLTALE + ngrams) - but we are not interested in IR per se - we want to compute a score based on pure string similarity. Sentences are docs, ngrams are terms.

Jim

 

[EMAIL PROTECTED] 06/05/03 03:55PM 
   

AFAIK Lucene is not able to look DNA strings up effectively. You would 
use DASG+Lev (see my previous post - 05/30/2003 1916CEST).

-g-

Jim Hargrave wrote:

 

Our application is a string similarity searcher where the query is an input string and we want to find all fuzzy variants of the input string in the DB.  The Score is basically dice's coefficient: 2C/Q+D, where C is the number of terms (n-grams) in common, Q is the number of unique query terms and D is the number of unique document terms. Our documents will be sentences.

I know Lucene has a fuzzy search capability - but I assume this would be very slow since it must search through the entire term list to find candidates.

In order to do the calculation I will need to have 'C' - the number of terms in common between query and document. Is there an API that I can call to get this info? Any hints on what it will take to modify Lucene to handle these kinds of queries? 
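The score Jim describes is easy to prototype outside Lucene; a minimal
sketch over character trigrams (the choice of n=3 and the sample
sentences are arbitrary):

```python
# Dice's coefficient 2C/(Q+D) over unique character n-grams, with
# sentences playing the role of documents and n-grams the role of terms.
def ngrams(text, n=3):
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def dice(query, doc, n=3):
    q, d = ngrams(query, n), ngrams(doc, n)
    c = len(q & d)                    # C: n-grams in common
    return 2.0 * c / (len(q) + len(d))

print(dice("string similarity", "string similarity search"))  # high
print(dice("string similarity", "completely different"))      # 0.0
```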

   





Re: String similarity search vs. typcial IR application...

2003-06-06 Thread Leo Galambos
Exact matches are not ideal for DNA applications, I guess. I am not a 
DNA expert, but those guys often need the feature that is termed 
``fuzzy''[*] in Lucene. They need Levenshtein's and Hamming's metrics, 
and I think that Lucene has many drawbacks which prevent effective 
implementations. On the other hand, I am very interested in the method 
you mentioned. Could you give me a reference, please? Thank you.

-g-

[*] why do you use the label ``fuzzy''? It has nothing to do with fuzzy 
logic or fuzzy IR, I guess.

Frank Burough wrote:

I have seen some interesting work done on storing DNA sequence as a set of common patterns with unique sequence between them. If one uses an analyzer to break sequence into its set of patterns and unique sequence then Lucene could be used to search for exact pattern matches. I know of only one sequence search tool that was based on this approach. I don't know if it ever left the lab and made it into the mainstream. If I have time I will explore this a bit.

Frank Burough



 

-Original Message-
From: Leo Galambos [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 05, 2003 5:55 PM
To: Lucene Users List
Subject: Re: String similarity search vs. typcial IR application...

AFAIK Lucene is not able to look DNA strings up effectively. 
You would 
use DASG+Lev (see my previous post - 05/30/2003 1916CEST).

-g-

Jim Hargrave wrote:

Our application is a string similarity searcher where the query is an
input string and we want to find all fuzzy variants of the input string
in the DB.  The Score is basically dice's coefficient: 2C/Q+D, where C
is the number of terms (n-grams) in common, Q is the number of unique
query terms and D is the number of unique document terms. Our documents
will be sentences.

I know Lucene has a fuzzy search capability - but I assume this would be
very slow since it must search through the entire term list to find
candidates.

In order to do the calculation I will need to have 'C' - the number of
terms in common between query and document. Is there an API that I can
call to get this info? Any hints on what it will take to modify Lucene
to handle these kinds of queries?



Re: Where to get stopword lists?

2003-06-06 Thread Leo Galambos
Ulrich Mayring wrote:

Hello,

does anyone know of good stopword lists for use with Lucene? I'm 
interested in English and German lists.
What does ``good'' mean? It depends on your corpus, IMHO. The best way 
to get a ``good'' stop-list is an analysis based on idf. Thus: index 
your documents, list out all the terms with low idf, save them in a 
file, and use them in the next indexing round.

Just a thought...

-g-
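The idf-driven recipe above fits in a few lines; a sketch with a made-up
three-document corpus and an arbitrary threshold:

```python
import math

def low_idf_terms(docs, threshold=0.3):
    # Terms whose idf = log(N/df) falls below the threshold are so common
    # that they carry little information: stop-list candidates.
    n = len(docs)
    vocab = {t for d in docs for t in d.split()}
    def idf(term):
        df = sum(term in d.split() for d in docs)
        return math.log(n / df)
    return sorted(t for t in vocab if idf(t) < threshold)

docs = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "a cat and a dog met on the road",
]
print(low_idf_terms(docs))  # only "the" appears in every document
```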





Re: Lowercasing wildcards - why?

2003-05-31 Thread Leo Galambos
I'm sorry, I did not read the complete thread. Do you mean analyzer == 
stemmer? Does it really work? If I were a stemmer, I would leave 
searche intact. ;-)

-g-

[EMAIL PROTECTED] wrote:

Hi Les,

We ended up modifying the QueryParser to pass prefix and suffix queries
through the Analyzer.  For us, it was about stemming.  If you decide to use
an analyzer that incorporated stemming, there are cases where wildcard
queries will not return the expected results.
Example:  searcher will probably get stemmed to search.  A search on
searche* should hit the term searcher, but, it won't, all instances of
searcher having been stemmed to search at index time.  Our solution was
to remove the trailing wildcard and send searche to the analyzer, then
tack the wildcard character back on there and create the PrefixQuery object
with the new search string search*.
DaveB
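DaveB's workaround is straightforward to sketch; the toy stemmer below
is invented for illustration (a real setup would call the same analyzer
used at index time):

```python
def toy_stem(term):
    # Stand-in for a real stemmer: "searcher" -> "search", "searche" -> "search".
    for suffix in ("er", "e"):
        if term.endswith(suffix):
            return term[: -len(suffix)]
    return term

def rewrite_prefix_query(raw):
    # Strip the trailing wildcard, analyze the rest, re-attach the '*',
    # so "searche*" becomes "search*" and matches stemmed index terms.
    if raw.endswith("*"):
        return toy_stem(raw[:-1]) + "*"
    return toy_stem(raw)

print(rewrite_prefix_query("searche*"))  # search*
```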




From: Leslie Hughes [EMAIL PROTECTED]
To: '[EMAIL PROTECTED]'
Date: 05/30/03 01:09 AM
Subject: Lowercasing wildcards - why?
(Please respond to Lucene Users List)





Hi,

I was just wondering what the rationale is behind lowercasing wildcard
queries produced by QueryParser? It's just that my data is all upper case
and my analyser doesn't lowercase so it seems a bit odd that I have to call
setLowercaseWildcardTerms(false). Couldn't queryparser leave the terms
unnormalised or better still pass them through the analyser?
I'm sure there's a good reason for it though.

Les





Re: Lowercasing wildcards - why?

2003-05-31 Thread Leo Galambos
Ah, I got it. THX. In the good old days, wildcards were used as a 
fix for a missing stemming module. I am not sure you can combine these 
two opposite approaches successfully. I see the following drawbacks in 
your solution.

Example:
built* (-> built) could be changed to build* (no built, but -> builder, 
building, etc.), and precision will go down drastically.

You probably use a stemmer with one important bug (a.k.a. feature) - 
overstemming - so here is another example:
political* (-> political, politically) is transformed to polic* 
(-> policer, policy, policies, policement etc.) by Porter's algorithm, 
and precision is again affected drastically.

-g-

[EMAIL PROTECTED] wrote:

Your analyzers can optionally incorporate stemming, along with the other
things that analyzers do (lowercasing, etc...).  The stemming algorithms
are all different.  This searcher example was made up, but, there are
instances where stemming at index time and not stemming wildcard searches
will result in lost hits.  Specifically, we encountered this situation
using the optional Snoball analyzers (which work great, by the way).
DaveB



 
Leo Galambos [EMAIL PROTECTED] wrote on 05/30/03 10:26 AM
To: Lucene Users List [EMAIL PROTECTED]
Subject: Re: Lowercasing wildcards - why?
 
 



I'm sorry, I did not read the complete thread. Do you mean - analyzer ==
stemmer? Does it really work? If I was a stemmer, I would let searche
intact. ;-)
-g-

[EMAIL PROTECTED] wrote:

 

Hi Les,

We ended up modifying the QueryParser to pass prefix and suffix queries
through the Analyzer.  For us, it was about stemming.  If you decide to use
an analyzer that incorporated stemming, there are cases where wildcard
queries will not return the expected results.
Example:  searcher will probably get stemmed to search.  A search on
searche* should hit the term searcher, but, it won't, all instances of
searcher having been stemmed to search at index time.  Our solution was
to remove the trailing wildcard and send searche to the analyzer, then
tack the wildcard character back on there and create the PrefixQuery object
with the new search string search*.

DaveB





   

 

Leslie Hughes [EMAIL PROTECTED] wrote on 05/30/03 01:09 AM
To: '[EMAIL PROTECTED]'
Subject: Lowercasing wildcards - why?

Hi,

I was just wondering what the rationale is behind lowercasing wildcard
queries produced by QueryParser? It's just that my data is all upper case
and my analyser doesn't lowercase so it seems a bit odd that I have to call
setLowercaseWildcardTerms(false). Couldn't queryparser leave the terms
unnormalised or better still pass them through the analyser?
I'm sure there's a good reason for it though.

Les





Re: Search for similar terms

2003-05-31 Thread Leo Galambos
http://cs.felk.cvut.cz/psc/members.html
http://cs.felk.cvut.cz/psc/event/1998/p13.html
or contact prof. Melichar for more details:
http://webis.felk.cvut.cz/people/melichar.html
-g-
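For a quick feel of what such a "did you mean" feature needs, a brute-force sketch (plain Java, names mine; a real implementation would use the DASG/Levenshtein-automaton construction instead of scanning the whole dictionary):

```java
public class DidYouMean {

    // Classic two-row dynamic-programming Levenshtein edit distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Suggest the dictionary term closest to the query (first one wins ties;
    // a real engine would also weight candidates, e.g. by term frequency).
    static String suggest(String query, String[] dictionary) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String term : dictionary) {
            int d = distance(query, term);
            if (d < bestDist) { bestDist = d; best = term; }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] dict = {"notebook", "laptop", "tablet"};
        System.out.println("Did you mean: " + suggest("notebok", dict) + "?");
    }
}
```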

Dario Dentale wrote:

Hi,
can you suggest a link with an overview document of this method?
I couldn't find one.
Thanks,
Dario
- Original Message - 
From: Leo Galambos [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, May 30, 2003 4:25 PM
Subject: Re: Search for similar terms

 

You need DASG+Lev over the dictionary. The boundary could be the highest
idf of the terms. It was solved by prof. Melichar, you can find the
construction of the automaton in his papers.
-g-

Dario Dentale wrote:

   

Hi,
anybody knows which is the best way to implement in Lucene a
functionality (that Google has) like this:

Search text- notebok

Answer- Did you mean: notebook ?

Thanks,
Dario


Re: I: incremental index

2003-03-28 Thread Leo Galambos
 Adding a new document does not immediately modify an index, so the time
 it takes to add a new document to an existing index is not proportional
 to the index size.  It is constant.  The execution time of optimize()
 is proportional to the index size, so you want to do that only if you
 really need it.  The Lucene article on http://www.onjava.com/ from
 March 5th describes this in more detail.

Otis,

I am not sure, if anything about constants is constant in non-constant IR 
systems :-)

I think that the correct answer is O((t/k)*(1+log_m(k))), where t is the time
you need to create and write one monolithic segment of k documents, m is the
merge factor you use, and k is the number of documents which are already
in the index. As you can see, the function grows with k.

Can you explain me, why addition of one document takes constant time?

Thank you

-g-
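The growth is easy to see in a toy simulation of merge-based indexing (my own model, not actual Lucene code: whenever mergeFactor equal-sized segments pile up they are merged, and every merged document is rewritten):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MergeCost {

    // Count how many documents get written in total (including rewrites
    // during merges) while adding numDocs documents one by one.
    static long totalDocsWritten(int numDocs, int mergeFactor) {
        Deque<Integer> segments = new ArrayDeque<>(); // segment sizes, newest first
        long written = 0;
        for (int i = 0; i < numDocs; i++) {
            segments.push(1);
            written++;                                // the new one-doc segment
            while (topRunIsFull(segments, mergeFactor)) {
                int size = 0;
                for (int j = 0; j < mergeFactor; j++) size += segments.pop();
                segments.push(size);
                written += size;                      // merged docs are rewritten
            }
        }
        return written;
    }

    // True when the mergeFactor newest segments all have the same size.
    static boolean topRunIsFull(Deque<Integer> segments, int mergeFactor) {
        if (segments.size() < mergeFactor) return false;
        Integer first = null;
        int run = 0;
        for (int s : segments) {
            if (first == null) first = s;
            if (s == first) run++; else break;
        }
        return run >= mergeFactor;
    }

    public static void main(String[] args) {
        // Amortized writes per document grow with log_m(collection size):
        System.out.println(totalDocsWritten(1000, 10) / 1000.0);     // 4.0
        System.out.println(totalDocsWritten(100000, 10) / 100000.0); // 6.0
    }
}
```

With mergeFactor 10, each document has been written 4 times by the time the index holds 1,000 documents and 6 times at 100,000 - logarithmic, not constant.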



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Regarding Setup Lucine for my site

2003-03-06 Thread Leo Galambos
  1. 2 threads per request may improve speed up to 50%
 Hmm? Could you clarify? During indexing, multithreading may speed things
 up (splitting docs to index in 2 or more sets, indexing separately, combining
 the indexes). But... isn't that a good thing? Or are you saying that it'd be good 
 to have multi-threaded search functionality for single search? (in my 
 experience searching is seldom the slow part)

you may improve indexing and searching. Indexing, because the merge
operation will lock just one thread and smaller part of an index while
other threads are still working;  searching, because you can distribute
the query to more barrels. In both cases you save up to 50% of time (I
assume mergefactor=2).

  2. Merger is hard coded
 
 In a way that is bad because... ?
 (ie. what is the specific problem... I assume you mean index merging
 functionality?)

Because you cannot process local and/or remote barrels -- all must be
local in the Lucene object model. That is a serious bug IMHO.

  4. you cannot implement dissemination + wrappers for internet servers
  which would serve as static barrels.
 Could you explain this bit more thoroughly (or pointers on longer 
 explanation)?

Read more about dissemination, metasearch engines (i.e. Savvysearch),
dDIRs (i.e. Harvest). BTW, let's go to a pub and we can talk til morning
:) (it is a serious offer, because I would like to know more about IR).

This example is about metasearch (the simplest case of dDIRs): Can Lucene
allow that a barrel (index segment?) is static and a query is solved via
wrapper, that sends the query ${QUERY} to
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=${QUERY} and then
reads the HTML output as a result?
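The query-to-URL half of such a wrapper is just string plumbing; a minimal sketch (the fetch-and-parse half, which would read the HTML result page back, is omitted):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class WrapperBarrel {

    // Expand ${QUERY} in a wrapper template with the URL-encoded query.
    static String buildUrl(String template, String query) {
        try {
            return template.replace("${QUERY}", URLEncoder.encode(query, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String template =
            "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=${QUERY}";
        System.out.println(buildUrl(template, "lucene merge factor"));
    }
}
```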

  5. Document metadata cannot be stored as a programmer wants, he must
  translate the object to a set of fields
 Yes? I'd think that possibility of doing separate fields is a good thing; 
 after all, all a plain text search engine needs to provide (to be considered 
 one) is indexing of plain text data, right?

I talked about metadata. When the metadata object knows how to achieve its 
persistence, why would one translate anything to fields and then back?
Why would you touch the user's metadata at all? You need flat fields for
indexing, and what's around -- it is not your problem :). Lucene is
something between CMS and CIS, you say that it's closer to CIS, but when
you need metadata in fields, you are closer to CMS IMHO.

  6. Lucene cannot implement your own dynamization
 
 (sorry, I must sound real thick here).
 Could you elaborate on this... what do you mean by dynamization?

Read more about Dynamization of Decomposable Searching Problems.

-g-
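For the curious, the logarithmic method from that literature fits in a page of plain Java (a toy membership structure, my own naming, not Lucene code; note how an insert merges equal-sized static parts - the same idea as segment merging with mergeFactor 2):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LogMethod {

    // Bentley-Saxe logarithmic method for a decomposable search problem
    // (here: membership in a set of ints). Level i holds either null or
    // a sorted array of 2^i elements.
    private final List<int[]> levels = new ArrayList<>();

    void insert(int x) {
        int[] carry = {x};
        int i = 0;
        // Merge equal-sized structures upward, like carry propagation.
        while (i < levels.size() && levels.get(i) != null) {
            carry = merge(levels.get(i), carry);
            levels.set(i, null);
            i++;
        }
        if (i == levels.size()) levels.add(carry); else levels.set(i, carry);
    }

    // Decomposable: query every static part and combine with OR.
    boolean contains(int x) {
        for (int[] a : levels)
            if (a != null && Arrays.binarySearch(a, x) >= 0) return true;
        return false;
    }

    private static int[] merge(int[] a, int[] b) {
        int[] r = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length)
            r[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        while (i < a.length) r[k++] = a[i++];
        while (j < b.length) r[k++] = b[j++];
        return r;
    }

    public static void main(String[] args) {
        LogMethod s = new LogMethod();
        for (int v : new int[]{5, 1, 9, 7, 3}) s.insert(v);
        System.out.println(s.contains(7) && !s.contains(4)); // true
    }
}
```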


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Potential Lucene drawbacks

2003-03-06 Thread Leo Galambos
 If I understand you correctly, then maybe you are not aware of
 RemoteSearchable in Lucene.

That class cannot be used in Merger. RemoteSearchable is a class that
allows you to pass a query to another node, nothing less and nothing more
AFAIK.

 This is the point that's more clear to me now.  There is confusion
 about what Lucene is and what it is not.  Lucene does not even try to
 be what those services you mentioned are.  Their goals are different,
 they are a different set of tools.  Lucene's focus is on indexing text
 and searching it.  It is not a tool to query other existing search

I do not think so. It is all about the object model you use. If you are
not able to solve the simplest case, how can you distribute the engine
across the network? I do not mean the simple RMI gateways which marshall
parameters and send them through a network pipe, I mean the true system
that could beat google (and it is another topic...).

Moreover, I think that Lucene can do much more than you think Otis :). 
Egothor can do that, so why not Lucene?

-g-


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Regarding Setup Lucine for my site

2003-03-05 Thread Leo Galambos
On Tue, 4 Mar 2003, Otis Gospodnetic wrote:

 Even if you could replace C:\. with http:// it wouldn't be a
 good solution, as directory structures and file paths do not always map
 directly to URLs.

Yes, but it is not the case of Samuel's configuration and 99.99% of 
others.

The fact is, that Lucene is only a library, and sandbox utilities which
are of different quality. :-)

-g-



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Regarding Setup Lucine for my site

2003-03-05 Thread Leo Galambos
 org.apache.lucene.demo.IndexHTML which was provided with the
 documentation. Is there any problem using this demo class for a web
 production site? I'm an application developer and it would be hard to
 understand the whole Lucene code to use it. It would be almost impossible

You can use it, but: if you need something special (snippets, coloring,
different URL mapping, handling of your local charset, etc. etc.) you must
include code from sandbox or write it from scratch AFAIK.

 for my development phase timings to try to do this. * Regarding your comment:
 Lucene does not index web pages. I thought Lucene's main goal was to index
 web pages ¿? and as an afterthought it should be able to index text
 files or some other information (for example mail databases). Regards

Lucene *can* index HTML pages, if you use programs which build Lucene 
index from HTML documents. The programs exist.

On the other hand, if you extend Lucene with your hacks, you will find out
that the model of Lucene is unknown and many parts are hard-coded. It
boosts speed, but it disallows future enhancements (I could name the
parts, I hope we do not start a flamewar here).

 and thanks for your comments!!! I'm considering egothor search
 engine. I successfully set a web application for searching my web site
 but I didn't see a mailing list or a forum with the level of

I had a PhD exam, and many questions went through ICQ, you know, it is 
faster for me than e-mails...

-g-


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



A thought: netique

2003-02-28 Thread Leo Galambos
Hi,

I was away and when I read what I missed, well...ehm... have you read 
http://sustainability.open.ac.uk/gary/papers/netique.htm?

i.e., see Caution when quoting other messages while replying to them.

BTW: I would also vote for a strict standard, when Re: prefix must be
used in replies.

Just a thought.

-g-



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Wildchar based search?? |

2003-02-02 Thread Leo Galambos
On Sat, 1 Feb 2003, Rishabh Bajpai wrote:

 also, I remember reading somewhere that one had to build the index in
 some special way, but since you say no, I will take that. I anyway don't
 remember where I read it, so no point asking about something if I am
 myself not sure

I remember only one problem that is related to indexing phase - it is 
``optimize'' function. If you update your index, one cannot tell you if 
you must also call optimize() or not.

If you do not call it, it may slow down queries (I do not know how much,
but Otis mentioned it). If you call it, it slows down the indexing phase (I
have tested it and it is significant).

AFAIK Lucene cannot tell you when the index becomes dirty so that you must 
call optimize. On the other hand it does not affect small indexes, where 
optimize() costs nothing.

Otis, I think that this still holds. Right?

-g-



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: Stop-word in phrase (BUG?)

2003-01-27 Thread Leo Galambos
Hi.

 In this phrase word 'and' occurs which is a stop-word.

they may take AND as a keyword in a query. IMHO your query is taken as 
boolean query.

I hope this helps.

-g-






Re: Lucene Benchmarks and Information

2002-12-21 Thread Leo Galambos
On Fri, 20 Dec 2002, Doug Cutting wrote:

 The max a reader will keep open is:
 
mergeFactor * log_base_mergeFactor(N) * files_per_segment
 
 A writer will open:
 
(1 + mergeFactor) * files_per_segment

I am not sure if you must open all files (i.e. the writer would need just
2*f_p_s if you keep A-Z order in DocUIDs??). IMHO it is a bug and the
reason why Lucene does not scale well on huge collections of documents. I
am talking about my previous tests when I used live index and concurrent
query+insert+delete (I wanted to simulate real application).

BTW, your mail is also an answer to the previous topic of how often one
could call optimize(). The method would be called before the index goes to
the production state. And it also means that the tests are irrelevant until
they are made with a lower mergeFactor.

...but it is possible that I missed something (I do not know Lucene as 
good as you).

-g-
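Plugging numbers into Doug's two formulas (files_per_segment depends on the index format, so the 7 below is only an assumption):

```java
public class OpenFiles {

    // Reader max: mergeFactor * log_base_mergeFactor(N) * files_per_segment.
    static double readerMax(int mergeFactor, long numDocs, int filesPerSegment) {
        return mergeFactor * (Math.log(numDocs) / Math.log(mergeFactor))
                * filesPerSegment;
    }

    // Writer: (1 + mergeFactor) * files_per_segment.
    static int writerMax(int mergeFactor, int filesPerSegment) {
        return (1 + mergeFactor) * filesPerSegment;
    }

    public static void main(String[] args) {
        // A million-doc index with mergeFactor 10 and 7 files per segment:
        System.out.println(Math.round(readerMax(10, 1000000L, 7))); // 420
        System.out.println(writerMax(10, 7));                       // 77
    }
}
```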






HTML saga continues...

2002-12-12 Thread Leo Galambos
So, I have tried this with Lucene:
1) original JavaCC LL(k) HTML parser
2) SWING's HTML parser

In case of (1) I could process about 300K of HTML documents. In case of 
(2) more than 400K.

But I cannot process complete collection (5M) and finish my hard stress
tests of Lucene.

Is there anyone who has an HTML parser that really works with Lucene? :) If
you think that you have one, please let me know. I wanted to try Neko, but 
it looks complicated and I do not want to affect the results by a ``robust'' 
parser.

THX

-g-






Re: SV: Indexing HTML

2002-12-07 Thread Leo Galambos
 I'm not sure this is a solution to your problem. However, it seems that the
 HTMLParser used by the IndexHTML class has problems parsing the document
 (there is a test class included in the jar):
 
 
 java -cp C:\projects\lucene\jakarta-lucene\bin\lucene-demos.jar
 org.apache.lucene.demo.html.Test f01529.txt
 Title: Webcz.cz - Power of search
 Parse Aborted: Encountered \' at line 106, column 27.
 Was expecting one of:
 ArgName ...
 TagEnd ...
 /Ronnie

Hi Ronnie!

I know about it and the exception is handled well (see the log file below). I
have found a better example than 1529, try this:
http://com-os2.ms.mff.cuni.cz/bugs/f00034.txt This file cannot go through
the Lucene HTML parser (I have tried 1.2 and IBM JDK 1.3.1r3). The file is
specific, i.e. it has two titles, two base tags etc.

I have no debugger here, so I cannot find the line where the bug is. If
you try your magic, please, let me know about the patch. :) THX

-g-



adding save/d00320/f01516.html
Parse Aborted: Lexical error at line 68, column 11.  Encountered: \u0178 
(376), after : 
:
adding save/d00320/f01527.html
Parse Aborted: Encountered = at line 83, column 48.
Was expecting one of:
ArgName ...
TagEnd ...

adding save/d00320/f01528.html







Re: Lucene Speed under diff JVMs

2002-12-05 Thread Leo Galambos
On Thu, 5 Dec 2002, Armbrust, Daniel C. wrote:

 I'm using the class that Otis wrote (see message from about 3 weeks ago)
 for testing the scalability of lucene (more results on that later) and I

May I ask you where one can get the source code? I cannot find it in the
archive. Thank you

-g-







Performance (figures)

2002-11-30 Thread Leo Galambos
The first round of tests is presented here (more will come later):

1) http://com-os2.ms.mff.cuni.cz/proof.png

Price per insert (time, space).
Doc base: 5M HTML *.CZ
Collection size: 300K docs were processed; then Lucene crashed (it may be
my fault, but I haven't time to debug it now)
optimize() after every 2000 docs (IMHO this simulates a dynamic IR 
environment, i.e. indexing emails, news groups etc.).

For instance (see Fig. 1):
collection size/time per insert()
2000/25ms
160000/33ms
300000/48ms

It means that for a collection of 160000 docs you need 160000*33ms=5280s.

2) http://com-os2.ms.mff.cuni.cz/draw.png

Absolute values



If someone is able to say how often I would call optimize(), I can 
recalculate the results. Now the 2nd round of tests is running (without 
optimize()).

-g-

BTW: All figures, (C) 2002 Leo Galambos. Do not copy until I am sure that 
the test values are correct.






optimize()

2002-11-26 Thread Leo Galambos
How does it affect overall performance, when I do not call optimize()?

THX

-g-







Re: optimize()

2002-11-26 Thread Leo Galambos
Did you try any tests in this area? (figures, charts...)

AFAIK the reader reads an identical number of (giga)bytes. BTW, it could read
segments in many threads. I do not see why it would be slower (until you
do many delete()-s). If the reader opens 1 or 50 files, it is still nothing.

-g-

On Tue, 26 Nov 2002, Otis Gospodnetic wrote:

 This was just mentioned a few days ago. Check the archives.
 Not needed for indexing, good to do after you are done indexing, as the
 index reader needs to open and search through less files.
 
 Otis
 
 --- Leo Galambos [EMAIL PROTECTED] wrote:
  How does it affect overall performance, when I do not call
  optimize()?
  
  THX
  
  -g-
  
  
  
  
 
 
 

