Google search algorithm

2004-01-29 Thread Ardor Wei
We all know the Lucene algorithm (thanks to open source :).
Does anybody have a general idea of how the Google search
algorithm works? How is the page ranking (I don't mean
the paid ones) determined by Google? I am very
interested to know this. Any ideas or feedback will be
appreciated. Thanks!

Ardor


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Google search algorithm

2004-01-29 Thread Dror Matalon
This is not quite related to Lucene but I found a web page that
has quite a few links about this subject:

http://www.google.com/search?q=google+page+rank&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8

:-).


On Wed, Jan 28, 2004 at 11:10:28PM -0800, Ardor Wei wrote:
 We all know Lucene algorithm (thanks to open source:).
 Anybody has a general idea of how Google search
 algorithm works? How is the page ranking (I don't mean
 the paid ones) determined by Google? I have strong
 interest to know this. Any idea or feedback will be
 appreciated. Thanks!
 
 Ardor
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com




Re: Google search algorithm

2004-01-29 Thread Magnus Johansson
I read somewhere that it uses a Markov chain model.

It checks each page and gives each link a click probability.
It also gives a probability that the user will enter a new
address instead of clicking a link.

Then, using the Markov chain, it calculates the
probability that the user will be at a particular page
after an infinite time of random browsing according
to the probabilities found.

This probability is then used as a basis for ranking
results.

Magnus Johansson
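
The random-browsing idea described above can be sketched in a few lines of Java. This is only an illustration of a random-surfer Markov chain solved by repeated iteration, not Google's actual implementation. The class name RandomSurferSketch, the damping value 0.85, and the tiny three-page link graph are assumptions made for the example, and every link on a page is assumed to be equally likely to be clicked.

```java
import java.util.Arrays;

public class RandomSurferSketch {
    /**
     * links[i] lists the pages that page i links to.
     * damping is the probability of clicking a link; 1 - damping is the
     * probability of entering a new address (jumping to a random page).
     */
    public static double[] rank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start with the surfer equally likely anywhere

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // probability mass on pages without links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;
                    }
                }
            }
            for (int j = 0; j < n; j++) {
                // Random jump, plus followed links, plus an even share of dangling mass.
                next[j] = (1 - damping) / n + damping * (next[j] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical graph: page 0 links to 1 and 2, page 1 to 2, page 2 to 0.
        int[][] links = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(rank(links, 0.85, 50)));
    }
}
```

The returned values sum to 1 and approximate the long-run probability of being on each page; sorting pages by that value gives the ranking basis the post describes.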


 We all know Lucene algorithm (thanks to open source:).
 Anybody has a general idea of how Google search
 algorithm works? How is the page ranking (I don't mean
 the paid ones) determined by Google? I have strong
 interest to know this. Any idea or feedback will be
 appreciated. Thanks!

 Ardor








Date Range support

2004-01-29 Thread tom wa
Hi,

I'm trying to create an index which can also be searched with date ranges. My first
attempt, using the Lucene date format, ran into trouble once my index grew: I
couldn't search over more than a few days.

I saw some other posts explaining why this happens, and the suggestion seemed to be to
use strings of the format MMdd. That format worked great until I remembered
that my search needs to support different timezones. Adding the hour to my
field causes the same problem as above, and my queries stop working with a range of
about 2 months.

I briefly looked at using the DateFilter, but a good thread in the archive suggests
this won't work too well under my conditions (http://java2.5341.com/msg/5138.html).
I'm looking to index about 1000 documents for each day, and my search ranges could be
as narrow as one day or as broad as a year.

At the moment I'm thinking of having two date fields, one formatted as MMdd and
the other as MMddHHmm, so that Lucene does a rough match to within
one day either side of the range, and then processing the more detailed date outside of
Lucene (to cope with timezones).

I'm going to try it out, but if there is a simpler method I've missed I'd be happy
to know.

Thanks
Tom.
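
A rough sketch of the two-step approach described above, in Java. It assumes the coarse field holds a year-first day key such as yyyyMMdd computed in UTC (the post writes MMdd), and that the exact timestamp is available for checking outside Lucene. The class and method names are hypothetical and no Lucene calls are shown. As far as I understand, a range query in Lucene of this vintage expands into one clause per distinct term in the range, so a day-resolution field keeps the coarse query small.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DateRangeSketch {
    // Assumed day-resolution key for the coarse field, always computed in UTC.
    private static final DateTimeFormatter DAY_KEY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter PLAIN_DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Value to store in the coarse date field for a document. */
    public static String dayKey(Instant documentTime) {
        return DAY_KEY.format(documentTime);
    }

    /** Lower and upper day keys for the coarse range, widened by one day each side. */
    public static String[] coarseBounds(LocalDate from, LocalDate to) {
        return new String[] {
            PLAIN_DAY.format(from.minusDays(1)),
            PLAIN_DAY.format(to.plusDays(1))
        };
    }

    /** Exact check done outside the index: is the time within [from, to] in the user's zone? */
    public static boolean inRange(Instant documentTime, LocalDate from, LocalDate to, ZoneId zone) {
        Instant start = from.atStartOfDay(zone).toInstant();
        Instant endExclusive = to.plusDays(1).atStartOfDay(zone).toInstant();
        return !documentTime.isBefore(start) && documentTime.isBefore(endExclusive);
    }

    public static void main(String[] args) {
        Instant doc = Instant.parse("2004-01-29T23:30:00Z");
        LocalDate day = LocalDate.of(2004, 1, 30);
        ZoneId userZone = ZoneOffset.ofHours(9); // a user nine hours ahead of UTC

        String[] bounds = coarseBounds(day, day);
        System.out.println("coarse range " + bounds[0] + " to " + bounds[1]);
        System.out.println("document key " + dayKey(doc));
        System.out.println("exact match  " + inRange(doc, day, day, userZone));
    }
}
```

The one-day widening matters in the example: the document is stored under the UTC day before the day the user asked for, yet it falls inside that day in the user's own timezone.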



Re: use Lucene LOCAL (looking for a frontend)

2004-01-29 Thread Peter Becker
Hi Sebastian,

there are not too many Lucene features used, and some rather orthogonal 
mixin of Formal Concept Analysis, but let me still advertise our little 
Docco tool:

 http://tockit.sourceforge.net/docco/index.html

It is based on Lucene, comes with a couple of indexing tools (including 
HTML) and is Open Source (BSD licence). Source can be found here:

 http://sourceforge.net/cvs/?group_id=37081 (module name is docco)

You can run Luke (http://www.getopt.org/luke/) on any index created by 
Docco to check out some more advanced features.

HTH,
  Peter


Sebastian Fey wrote:

hi,

my task is to implement a search engine for documentation in HTML. The files are not 
online but local.
But the getting started guide on the Lucene home page only explains how to set up Lucene with 
Tomcat. (I've never set up a web server.)
I was able to create an index of my files, but now the web front end is missing. I think it's in luceneweb.war, right?
So, my question: how can I use Lucene locally? Can someone provide an HTML front end?

thx in advance,

Sebastian



Performance difference between 1.2 and 1.3?

2004-01-29 Thread Weir, Michael
I am fairly new to Lucene and I have noticed a difference between Lucene
1.2RC1 (which came with our build of Cocoon) and the new Lucene 1.3Final.  

I am indexing about 400 very small documents, each in 10 languages.  The
document contents are basically a product name and description.  With Lucene
1.2 my little test takes about 13.2 seconds and when I change to using the
Lucene 1.3 jar file the test takes 38 seconds.  I am not using the Snowball
stemmers, and my code is as vanilla as it gets (I think).

Is this a known problem?  Or is there a known fix?

Thanks for any help.

Michael Weir · Transform Research Inc. · 613.238.1363 x.114


This message may contain privileged and/or confidential information.  If you
have received this e-mail in error or are not the intended recipient, you
may not use, copy, disseminate or distribute it; do not open any
attachments, delete it immediately from your system and notify the sender
promptly by e-mail that you have done so.  Thank you.


Re: Performance difference between 1.2 and 1.3?

2004-01-29 Thread Erik Hatcher
On Jan 29, 2004, at 9:00 AM, Weir, Michael wrote:
 I am fairly new to Lucene and I have noticed a difference between Lucene
 1.2RC1 (which came with our build of Cocoon) and the new Lucene 1.3Final.

 I am indexing about 400 very small documents, each in 10 languages.  The
 document contents are basically a product name and description.  With Lucene
 1.2 my little test takes about 13.2 seconds and when I change to using the
 Lucene 1.3 jar file the test takes 38 seconds.  I am not using the Snowball
 stemmers, and my code is as vanilla as it gets (I think).

 Is this a known problem?  Or is there a known fix?

There is no known issue.  Could you provide an easy-to-run example that 
demonstrates this difference in speed?

	Erik



Paid support for Lucene

2004-01-29 Thread Boris Goldowsky
Strangely, the web site does not seem to list any vendors who provide
incident support for Lucene.  That can't be right, can it?

Can anyone point me to organizations that would be willing to provide
support for Lucene issues?

Thanks,
Boris
-- 
Boris Goldowsky
[EMAIL PROTECTED]
www.goldowsky.com/consulting





Re: Performance difference between 1.2 and 1.3?

2004-01-29 Thread Otis Gospodnetic
Hello,

This is not a known problem.
The mention of Cocoon makes me think XML.
What format are your documents in?
If they are in XML, the first place to look for performance-related
problems is the XML parser.  It looks like you got a new version of
Cocoon, so maybe this new version includes a different (version of a)
XML parser.

Otis
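
One way to check whether the time goes to parsing or to Lucene is to time the two phases separately. A minimal sketch in plain Java follows. PhaseTimer and the phase names are hypothetical, and the lambdas are placeholders where the real parsing and indexing calls would go.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PhaseTimer {
    // Accumulated nanoseconds per named phase, in first-use order.
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the named phase, and returns its result. */
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total time spent in a phase so far, in milliseconds (0 if never timed). */
    public long millis(String phase) {
        return nanos.getOrDefault(phase, 0L) / 1_000_000;
    }

    public Map<String, Long> totals() {
        return nanos;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 400; i++) {
            // Placeholder for reading and parsing one source document.
            String text = timer.time("parse", () -> "product name and description");
            // Placeholder for adding the parsed text to the index.
            timer.time("index", () -> text.length());
        }
        System.out.println("parse ms: " + timer.millis("parse"));
        System.out.println("index ms: " + timer.millis("index"));
    }
}
```

Running the same harness once with each Lucene jar would show whether the extra seconds land in the indexing phase or somewhere else.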

--- Weir, Michael [EMAIL PROTECTED] wrote:
 I am fairly new to Lucene and I have noticed a difference between Lucene
 1.2RC1 (which came with our build of Cocoon) and the new Lucene 1.3Final.
 
 I am indexing about 400 very small documents, each in 10 languages.  The
 document contents are basically a product name and description.  With Lucene
 1.2 my little test takes about 13.2 seconds and when I change to using the
 Lucene 1.3 jar file the test takes 38 seconds.  I am not using the Snowball
 stemmers, and my code is as vanilla as it gets (I think).
 
 Is this a known problem?  Or is there a known fix?
 
 Thanks for any help.
 
 Michael Weir · Transform Research Inc. · 613.238.1363 x.114
 
 



Japanese Analyzer

2004-01-29 Thread Weir, Michael
Is the CJKAnalyzer the best to use for Japanese?  If not, which is?  If so,
from where can I download it?
Thanks.

Michael Weir . Transform Research Inc. . 613.238.1363 x.114



Re: Japanese Analyzer

2004-01-29 Thread Otis Gospodnetic
I think that's the only one we've got.
You can browse the Lucene Sandbox contributions directory, it's there.

Otis
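
For context, CJKAnalyzer does not segment Japanese into dictionary words; to my knowledge it indexes runs of CJK characters as overlapping two-character tokens. The sketch below imitates only that bigram idea in plain Java. BigramSketch is a hypothetical name, this is not the sandbox code, and the handling of a single leftover character is an assumption. The sample string, written with Unicode escapes, is 日本語.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    /** Splits a run of CJK characters into overlapping two-character tokens. */
    public static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.length() == 1) {
            tokens.add(run); // assumption: a lone character becomes its own token
            return tokens;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Three characters give two tokens: characters 1-2 and characters 2-3.
        List<String> tokens = bigrams("\u65E5\u672C\u8A9E");
        System.out.println(tokens.size() + " tokens: " + tokens);
    }
}
```

Because the query text is split the same way, a two-character query matches wherever that pair occurs, which is what makes the approach workable without a dictionary.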

--- Weir, Michael [EMAIL PROTECTED] wrote:
 Is the CJKAnalyzer the best to use for Japanese?  If not, which is?  If so,
 from where can I download it?
 Thanks.
 
 Michael Weir . Transform Research Inc. . 613.238.1363 x.114
 



Re: Paid support for Lucene

2004-01-29 Thread Erik Hatcher
and eHatcher Solutions would be happy to as well :))



On Jan 29, 2004, at 12:16 PM, Ryan Ackley wrote:
I know of two:

http://superlinksoftware.com
http://jboss.org
- Original Message -
From: Boris Goldowsky [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, January 29, 2004 12:04 PM
Subject: Paid support for Lucene

Strangely, the web site does not seem to list any vendors who provide
incident support for Lucene.  That can't be right, can it?
Can anyone point me to organizations that would be willing to provide
support for Lucene issues?
Thanks,
Boris
--
Boris Goldowsky
[EMAIL PROTECTED]
www.goldowsky.com/consulting


Re: Paid support for Lucene

2004-01-29 Thread Dror Matalon
On Thu, Jan 29, 2004 at 01:46:12PM -0500, Erik Hatcher wrote:
 and eHatcher Solutions would be happy to as well :))

Recommended. Erik knows Lucene well and is very responsive.

 
 
 
 
 On Jan 29, 2004, at 12:16 PM, Ryan Ackley wrote:
 I know of two:
 
 http://superlinksoftware.com
 http://jboss.org
 
 - Original Message -
 From: Boris Goldowsky [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Thursday, January 29, 2004 12:04 PM
 Subject: Paid support for Lucene
 
 
 Strangely, the web site does not seem to list any vendors who provide
 incident support for Lucene.  That can't be right, can it?
 
 Can anyone point me to organizations that would be willing to provide
 support for Lucene issues?
 
 Thanks,
 Boris
 -- 
 Boris Goldowsky
 [EMAIL PROTECTED]
 www.goldowsky.com/consulting
 
 
 





Re: Paid support for Lucene

2004-01-29 Thread Otis Gospodnetic
Otis Gospodnetic


--- Boris Goldowsky [EMAIL PROTECTED] wrote:
 Strangely, the web site does not seem to list any vendors who provide
 incident support for Lucene.  That can't be right, can it?
 
 Can anyone point me to organizations that would be willing to provide
 support for Lucene issues?
 
 Thanks,
 Boris
 -- 
 Boris Goldowsky
 [EMAIL PROTECTED]
 www.goldowsky.com/consulting
 
 



Re: Paid support for Lucene

2004-01-29 Thread Dror Matalon
On Thu, Jan 29, 2004 at 10:59:36AM -0800, Otis Gospodnetic wrote:
 Otis Gospodnetic

Same as with Erik. Otis knows Lucene well and is very responsive.

Should have gone with my gut and recommended you guys in the first place, but 
didn't know if you were available for support. Would have saved 3 emails
to the list :-).

 
 
 --- Boris Goldowsky [EMAIL PROTECTED] wrote:
  Strangely, the web site does not seem to list any vendors who provide
  incident support for Lucene.  That can't be right, can it?
  
  Can anyone point me to organizations that would be willing to provide
  support for Lucene issues?
  
  Thanks,
  Boris
  -- 
  Boris Goldowsky
  [EMAIL PROTECTED]
  www.goldowsky.com/consulting
  
  
 





Re: Paid support for Lucene

2004-01-29 Thread Erik Hatcher
On Jan 29, 2004, at 1:56 PM, Dror Matalon wrote:
 On Thu, Jan 29, 2004 at 01:46:12PM -0500, Erik Hatcher wrote:
  and eHatcher Solutions would be happy to as well :))

 Recommended. Erik knows Lucene well and is very responsive.

That should read "very expensive" :))  But we all know you get what you 
pay for.

	Erik



Re: Paid support for Lucene

2004-01-29 Thread Stefan Groschupf
I will not, but I would work to get a degree from mit.edu. B-)
Just kidding, I wouldn't do that.
http://www.ai.mit.edu/research/sponsors/sponsors.shtml
Peace!
Stefan



I am willing as well.

Scott

On Jan 29, 2004, at 12:04 PM, Boris Goldowsky wrote:

Strangely, the web site does not seem to list any vendors who provide
incident support for Lucene.  That can't be right, can it?
Can anyone point me to organizations that would be willing to provide
support for Lucene issues?
Thanks,
Boris
--
Boris Goldowsky
[EMAIL PROTECTED]
www.goldowsky.com/consulting

open technology:   www.media-style.com
open source:   www.weta-group.net
open discussion:www.text-mining.org


Can I remove commit.lock while searching in my application?

2004-01-29 Thread umamahesh bayireddya
Hi all,

I am using Lucene in my web application with two index directories. The first
time, I create the index in the index1 directory, and the next time in index2,
flip-flopping between the two directories.
After an index is built, all users who are searching are pointed to the new
index directory (I handle this with flags).

So I never search and add/delete documents in the same index directory at the
same time.

Can I remove the commit.lock file while I am searching documents?
Under peak traffic, searches occasionally time out while obtaining
commit.lock. This is rare, but I have to eliminate it.

As I understand it, commit.lock is there to prevent anything being written to
the index directory while someone is searching. Am I right?
Please let me know if this is a wrong assumption.

So I am thinking of removing commit.lock when searching, since multiple
concurrent reads are allowed on any operating system.
If that is not possible, please suggest an alternative.

I create an IndexSearcher object whenever there is a search request.
I rebuild the index every day after adding and removing files in the source
directory where my files reside.

Please give your suggestions or alternatives.

Thanking you,
mahesh
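
An alternative to deleting commit.lock is to stop opening a new searcher on every request. As far as I know, the lock is taken briefly whenever an index is opened or committed, so one searcher per request means one lock acquisition per request, while a single IndexSearcher can be shared by all request threads. Below is a minimal sketch of such a holder in plain Java, kept generic so that it runs without Lucene on the classpath. SearcherHolder, swap, and get are hypothetical names, and S would be IndexSearcher in the real application.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SearcherHolder<S> {
    private final AtomicReference<S> current = new AtomicReference<>();

    /**
     * Called once after each index build, with a searcher opened on the
     * freshly built directory. Returns the previous searcher (or null) so
     * the caller can close it once in-flight searches have finished.
     */
    public S swap(S fresh) {
        return current.getAndSet(fresh);
    }

    /** Called on every search request; nothing is opened or locked here. */
    public S get() {
        S searcher = current.get();
        if (searcher == null) {
            throw new IllegalStateException("no index has been published yet");
        }
        return searcher;
    }

    public static void main(String[] args) {
        // Strings stand in for searchers opened on index1 and index2.
        SearcherHolder<String> holder = new SearcherHolder<>();
        holder.swap("searcher on index1");
        System.out.println(holder.get());
        String old = holder.swap("searcher on index2");
        System.out.println(holder.get() + " (replaced " + old + ")");
    }
}
```

With the flip-flop scheme in the post, the daily build would call swap once, and the index would then be opened once per day instead of once per search.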

