Google search algorithm
We all know the Lucene algorithm (thanks to open source :). Does anybody have a general idea of how the Google search algorithm works? How does Google determine page ranking (I don't mean the paid results)? I have a strong interest in knowing this. Any ideas or feedback would be appreciated.
Thanks!
Ardor
To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Google search algorithm
This is not quite related to Lucene, but I found a web page that has quite a few links on this subject: http://www.google.com/search?q=google+page+rank&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8 :-)
On Wed, Jan 28, 2004 at 11:10:28PM -0800, Ardor Wei wrote:
> Anybody has a general idea of how Google search algorithm works? How is the page ranking (I don't mean the paid ones) determined by Google?
-- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com
Re: Google search algorithm
I read somewhere that it uses a Markov chain model. It examines each page and assigns each link a click probability. It also assigns a probability that the user will enter a new address instead of clicking a link. It then uses the model to calculate the probability that a user will be at a particular page after an infinite time of random browsing according to those probabilities. This probability is then used as a basis for ranking results.
Magnus Johansson
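The random-browsing process described above can be approximated with a simple power iteration. The following is a minimal sketch, not Google's actual implementation: the function name `pagerank`, the toy link graph, the equal click probability for every link on a page, and the 0.85 probability of following a link rather than typing a new address are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the long-run probability of a random browser being on each page.

    links: dict mapping a page to the list of pages it links to (assumed input shape).
    damping: assumed probability of clicking a link instead of entering a new address.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Mass from "entering a new address": spread uniformly over all pages.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Assumption: every link on a page is equally likely to be clicked.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without links: treat the next step as a random jump.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical): "c" is linked from three pages, "d" from none.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(example_links)
```

In this toy graph the page with the most incoming links ends up with the highest probability, and the page nobody links to keeps only the share that comes from random jumps.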
Date Range support
Hi,
I'm trying to create an index that can also be searched by date range. My first attempt, using the Lucene date format, ran into trouble once my index grew: I couldn't search over more than a few days. I saw some other posts explaining why this happens, and the suggestion seemed to be to use strings of the format MMdd. That format worked great until I remembered that my search needs to support different timezones. Adding the hour to my field causes the same problem as above, and my queries stop working with a range of about two months.
I briefly looked at using the DateFilter, but a good thread in the archive suggests it won't work too well under my conditions (http://java2.5341.com/msg/5138.html). I'm looking to index about 1000 documents for each day, and my search ranges could be as narrow as one day or as broad as a year.
At the moment I'm thinking of having two date fields, one formatted as MMdd and the other as MMddHHmm, so that Lucene does a rough match accurate to within one day either side of the range, and then processing the more detailed date outside of Lucene (to cope with timezones). I'm going to try it out, but if there is a simpler method I've missed I'd be happy to know.
Thanks
Tom.
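The two-step idea in this message (a coarse day-level match in the index, then an exact timezone-aware check outside it) can be sketched as follows. This is an illustration only and does not call Lucene: the helper names `day_key`, `coarse_range`, and `refine` are hypothetical, the list comprehension stands in for the index's range query, and the year-prefixed `%Y%m%d` key is an assumption chosen so that keys sort chronologically across years.

```python
from datetime import datetime, timedelta, timezone

DAY_FORMAT = "%Y%m%d"  # assumed key layout; sorts as text in chronological order


def day_key(moment):
    """Day-resolution key for a timezone-aware datetime, always computed in UTC."""
    return moment.astimezone(timezone.utc).strftime(DAY_FORMAT)


def coarse_range(start, end):
    """Inclusive day-key bounds, widened by one day on each side as a safety margin."""
    return day_key(start - timedelta(days=1)), day_key(end + timedelta(days=1))


def refine(candidates, start, end):
    """Exact check done outside the index: keep start <= timestamp < end."""
    return [c for c in candidates if start <= c["timestamp"] < end]


# Hypothetical documents with UTC timestamps and a precomputed day key.
docs = [
    {"id": 1, "timestamp": datetime(2004, 1, 28, 14, 59, tzinfo=timezone.utc)},
    {"id": 2, "timestamp": datetime(2004, 1, 28, 15, 0, tzinfo=timezone.utc)},
    {"id": 3, "timestamp": datetime(2004, 1, 29, 14, 59, tzinfo=timezone.utc)},
    {"id": 4, "timestamp": datetime(2004, 1, 29, 15, 0, tzinfo=timezone.utc)},
    {"id": 5, "timestamp": datetime(2004, 2, 15, 8, 0, tzinfo=timezone.utc)},
]
for doc in docs:
    doc["day"] = day_key(doc["timestamp"])

# A user at UTC+9 asks for everything on their local 29 January 2004.
user_tz = timezone(timedelta(hours=9))
start = datetime(2004, 1, 29, tzinfo=user_tz)
end = datetime(2004, 1, 30, tzinfo=user_tz)

low, high = coarse_range(start, end)
candidates = [d for d in docs if low <= d["day"] <= high]  # stand-in for the index query
hits = [d["id"] for d in refine(candidates, start, end)]
```

The coarse step only ever compares a handful of distinct day keys, however wide the range, while the exact boundaries for the user's timezone are enforced by `refine` on the much smaller candidate set.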
Re: use Lucene LOCAL (looking for a frontend)
Hi Sebastian,
it doesn't use many Lucene features, and it mixes in some rather orthogonal Formal Concept Analysis, but let me still advertise our little Docco tool: http://tockit.sourceforge.net/docco/index.html
It is based on Lucene, comes with a couple of indexing tools (including HTML), and is open source (BSD licence). The source can be found here: http://sourceforge.net/cvs/?group_id=37081 (the module name is docco).
You can run Luke (http://www.getopt.org/luke/) on any index created by Docco to check out some more advanced features.
HTH,
Peter
Sebastian Fey wrote:
> Hi, my task is to implement a search engine for documentation in HTML. The files are not online but local. But the getting started guide on the Lucene home page just explains how to set up Lucene with Tomcat (I've never set up a web server). I was able to create an index of my files, but now the web front end is missing. I think it's in the luceneweb.war, right? So, my question: how can I use Lucene locally? Can someone provide an HTML front end? Thanks in advance, Sebastian
Performance difference between 1.2 and 1.3?
I am fairly new to Lucene and I have noticed a difference between Lucene 1.2RC1 (which came with our build of Cocoon) and the new Lucene 1.3Final. I am indexing about 400 very small documents, each in 10 languages. The document contents are basically a product name and description. With Lucene 1.2 my little test takes about 13.2 seconds and when I change to using the Lucene 1.3 jar file the test takes 38 seconds. I am not using the Snowball stemmers, and my code is as vanilla as it gets (I think). Is this a known problem? Or is there a known fix? Thanks for any help.
Michael Weir · Transform Research Inc. · 613.238.1363 x.114
Re: Performance difference between 1.2 and 1.3?
On Jan 29, 2004, at 9:00 AM, Weir, Michael wrote:
> With Lucene 1.2 my little test takes about 13.2 seconds and when I change to using the Lucene 1.3 jar file the test takes 38 seconds. [...] Is this a known problem? Or is there a known fix?
There is no known issue. Could you provide an easy-to-run example that demonstrates this difference in speed?
Erik
Paid support for Lucene
Strangely, the web site does not seem to list any vendors who provide incident support for Lucene. That can't be right, can it? Can anyone point me to organizations that would be willing to provide support for Lucene issues?
Thanks,
Boris
-- Boris Goldowsky [EMAIL PROTECTED] www.goldowsky.com/consulting
Re: Performance difference between 1.2 and 1.3?
Hello,
This is not a known problem. The mention of Cocoon makes me think XML. What format are your documents in? If they are in XML, the first place to look for performance-related problems is the XML parser. It looks like you got a new version of Cocoon, so maybe this new version includes a different (version of a) XML parser.
Otis
Japanese Analyzer
Is the CJKAnalyzer the best to use for Japanese? If not, which is? If so, from where can I download it? Thanks.
Michael Weir
Re: Japanese Analyzer
I think that's the only one we've got. You can browse the Lucene Sandbox contributions directory; it's there.
Otis
Re: Paid support for Lucene
and eHatcher Solutions would be happy to as well :))
On Jan 29, 2004, at 12:16 PM, Ryan Ackley wrote:
> I know of two: http://superlinksoftware.com http://jboss.org
Re: Paid support for Lucene
On Thu, Jan 29, 2004 at 01:46:12PM -0500, Erik Hatcher wrote:
> and eHatcher Solutions would be happy to as well :))
Recommended. Erik knows Lucene well and is very responsive.
-- Dror Matalon
Re: Paid support for Lucene
Otis Gospodnetic
--- Boris Goldowsky [EMAIL PROTECTED] wrote:
> Can anyone point me to organizations that would be willing to provide support for Lucene issues?
Re: Paid support for Lucene
On Thu, Jan 29, 2004 at 10:59:36AM -0800, Otis Gospodnetic wrote:
> Otis Gospodnetic
Same as with Erik. Otis knows Lucene well and is very responsive. I should have gone with my gut and recommended you guys in the first place, but I didn't know if you were available for support. That would have saved three emails to the list :-).
-- Dror Matalon
Re: Paid support for Lucene
On Jan 29, 2004, at 1:56 PM, Dror Matalon wrote:
> Recommended. Erik knows Lucene well and is very responsive.
That should read "very expensive" :)) But we all know you get what you pay for.
Erik
Re: Paid support for Lucene
I will not, but I would work to get a degree from mit.edu. B-) Just kidding, I wouldn't do that. http://www.ai.mit.edu/research/sponsors/sponsors.shtml
Peace!
Stefan
> I am willing as well. Scott
> On Jan 29, 2004, at 12:04 PM, Boris Goldowsky wrote:
>> Can anyone point me to organizations that would be willing to provide support for Lucene issues?
open technology: www.media-style.com
open source: www.weta-group.net
open discussion: www.text-mining.org
Can I remove commit.lock while searching in my application?
Hi all,
I am using Lucene in my web application with two index directories. The first time, I create the index in the index1 directory, and the next time in index2; I flip-flop between the two directories. After an index is created, all users who are searching are pointed to the new index directory (I handle this using flags). So I won't have searching and adding/deleting documents happening simultaneously in one index directory.
Now, can I remove the commit.lock file while I am searching documents? In peak traffic, searching sometimes times out obtaining commit.lock. This is rare, but I have to eliminate it. As I understand it, commit.lock exists to prevent anything from being written to the index directory while someone is searching. Am I right? Please let me know if this is a wrong assumption. So I am thinking of removing commit.lock when searching, since multiple concurrent reads are allowed on any operating system. If that is not possible, please suggest an alternative.
I create an IndexSearcher object whenever there is a search request. I will be creating the index every day, after adding and removing files in the source directory where my files reside.
Please give your suggestions or alternatives.
Thank you,
Mahesh
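The flip-flop arrangement described above can be sketched with a small holder object. This is a language-neutral illustration, not Lucene code: `FlipFlopIndex`, `open_searcher`, and `fake_open` are hypothetical names, and the sketch assumes that only one rebuild job ever calls `flip()`. It also shows one possible alternative to deleting the lock file: since the message says a searcher is created for every request, opening one searcher per rebuilt index and sharing it across requests should mean the index is opened far less often, which may reduce contention on the lock.

```python
import threading


class FlipFlopIndex:
    """Hypothetical holder for two index directories, only one of which is searched."""

    def __init__(self, dir_a, dir_b, open_searcher):
        self._dirs = (dir_a, dir_b)
        self._open_searcher = open_searcher  # hypothetical factory: path -> searcher
        self._lock = threading.Lock()
        self._active = 0
        self._searcher = open_searcher(dir_a)  # opened once, then shared

    def inactive_dir(self):
        """Directory that no request is reading, so it can be rebuilt."""
        with self._lock:
            return self._dirs[1 - self._active]

    def searcher(self):
        """Shared searcher for the active directory; nothing is opened per request."""
        with self._lock:
            return self._searcher

    def flip(self):
        """Switch searches to the freshly rebuilt directory.

        Assumes a single rebuild job calls this. The old searcher is not closed
        here because in-flight requests may still be using it.
        """
        target = 1 - self._active
        fresh = self._open_searcher(self._dirs[target])
        with self._lock:
            self._active, self._searcher = target, fresh


opened = []  # records every open performed by the stand-in factory


def fake_open(path):
    opened.append(path)
    return "searcher:" + path


index = FlipFlopIndex("index1", "index2", fake_open)
first = [index.searcher() for _ in range(3)]  # three "requests", one open
rebuild_target = index.inactive_dir()         # the daily job rebuilds here
index.flip()                                  # then points searches at it
second = index.searcher()
```

With this shape, the number of index opens depends on how often the index is rebuilt, not on how many searches arrive at peak traffic.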