Re: New Lucene-powered Website
Could you add a Lucene logo somewhere on your search results, as noted here: http://jakarta.apache.org/lucene/docs/powered.html ? Thanks! Otis

--- Ulrich Mayring [EMAIL PROTECTED] wrote:

Hello,

we (DENIC) are the world's second largest domain registry (.de-zone has almost 6.9 million domains) and are using Lucene to index and search our website in a high-traffic scenario. Most of our web pages are available in English in addition to our native language German.

If you want to try our Lucene-based search engine, please start here: http://www.denic.de/en/special/index.jsp Use the input field on the page to search our website. Don't use the input field at the top right; that is only for searching domains in our domain database and has nothing to do with Lucene. The indexes for German and English are separate, so from that page you should find only English pages.

A somewhat interesting feature is the summarizer: on the results page you'll get a short summary of each page. These are not hand-written blurbs; they are generated automatically from the HTML pages at indexing time. I'd be especially interested in suggestions for improvement in this area. Naturally, the automatically generated texts don't have the same quality as hand-written ones. But they're better than nothing and, in my eyes, more useful than Google-style excerpts. How many times has a Google excerpt told you nothing useful because it was totally out of context? Summaries tell you what the whole page is about, regardless of the context in which your search terms appear. After reading the summary you should (hopefully) be able to decide whether the page contains the info you're looking for. Comments welcome!

We're using the snowball stemmers/analyzers for German and English, custom stopword lists and the HTML parser from the Sourceforge htmlparser project. Apart from that it's vanilla Lucene.

cheers, Ulrich
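The thread never says how DENIC's summaries are actually computed, so the following is only a sketch of one common technique, frequency-based extractive summarization, in plain Java without any Lucene dependency. It scores each sentence by the average document frequency of its non-stopword terms and keeps the top k in their original order. The class name `Summarizer`, the tiny per-language stopword lists and the naive sentence splitter are all illustrative assumptions, not DENIC's code. In a setup like the one described, such a summary would be computed once per page at indexing time, using the stopword list of that page's language index.

```java
import java.util.*;

// Illustrative sketch only: a generic frequency-based extractive summarizer,
// NOT DENIC's actual summarizer (its algorithm is not described in the thread).
class Summarizer {

    // Hypothetical stopword lists; a real setup would load its own custom
    // lists, one per language index (e.g. "en" and "de").
    static final Map<String, Set<String>> STOPWORDS = Map.of(
        "en", Set.of("the", "a", "an", "and", "of", "to", "is", "in", "for", "on"),
        "de", Set.of("der", "die", "das", "und", "ist", "in", "zu", "den", "von", "mit"));

    // Lower-cases a sentence, splits on non-letter/non-digit characters
    // and drops stopwords.
    static List<String> terms(String sentence, Set<String> stop) {
        List<String> out = new ArrayList<>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !stop.contains(t)) out.add(t);
        }
        return out;
    }

    // Returns the k highest-scoring sentences of `text`, in original order.
    static String summarize(String text, String lang, int k) {
        Set<String> stop = STOPWORDS.getOrDefault(lang, Set.of());
        // Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        String[] sentences = text.trim().split("(?<=[.!?])\\s+");

        // Term frequencies over the whole document.
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences)
            for (String t : terms(s, stop)) freq.merge(t, 1, Integer::sum);

        // Score = average document frequency of the sentence's terms.
        double[] score = new double[sentences.length];
        Integer[] order = new Integer[sentences.length];
        for (int i = 0; i < sentences.length; i++) {
            order[i] = i;
            List<String> ts = terms(sentences[i], stop);
            for (String t : ts) score[i] += freq.get(t);
            if (!ts.isEmpty()) score[i] /= ts.size();
        }

        // Highest score first; this sort is stable, so ties keep document order.
        Arrays.sort(order, (a, b) -> Double.compare(score[b], score[a]));
        Integer[] top = Arrays.copyOf(order, Math.min(k, sentences.length));
        Arrays.sort(top); // restore original sentence order

        StringJoiner summary = new StringJoiner(" ");
        for (int i : top) summary.add(sentences[i]);
        return summary.toString();
    }

    public static void main(String[] args) {
        String page = "Lucene indexes the website. Lucene searches the website quickly. "
                    + "The weather is nice.";
        System.out.println(summarize(page, "en", 2));
    }
}
```

On the sample page the off-topic third sentence scores lowest and is dropped. Unlike a query-dependent excerpt, the output does not depend on the search terms, which matches the "what the whole page is about" goal described above.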
Re: New Lucene-powered Website
Otis Gospodnetic wrote: Could you add a Lucene logo somewhere on your search results, as noted here: http://jakarta.apache.org/lucene/docs/powered.html ? Will suggest that to the powers that be :) Ulrich
Re: New Lucene-powered Website
Ok, let us know if you can add it. Otis
RE: New Lucene-powered Website
Hi,

I am very keen on using the new luceneweb. Has anyone managed to run luceneweb successfully on Windows? The instructions in luceneweb seem to support Unix more than Windows. Does anyone have install instructions for running luceneweb on Windows? I cannot even see the first page when I start Tomcat, even though I have weblucene in the webapps directory. Can anyone help, please?

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 02, 2003 8:35 PM
To: Lucene Users List
Subject: Re: New Lucene-powered Website

Could you add a Lucene logo somewhere on your search results, as noted here: http://jakarta.apache.org/lucene/docs/powered.html ? Thanks! Otis
Re: New Lucene-powered Website
On Tuesday, December 2, 2003, at 07:34 AM, Otis Gospodnetic wrote: Could you add a Lucene logo somewhere on your search results, as noted here: http://jakarta.apache.org/lucene/docs/powered.html ? I thought we were going to loosen up the requirement to have the logo on a search results page?
RE: New Lucene-powered Website
Hello,

This is the first time I've noticed this. Is the 'powered by Lucene' logo a legal requirement, or just a suggestion? Does it apply to any system embedding Lucene (web pages, applications, etc.)? That is not covered in the Apache Software License, I believe. Just curious...

Tate

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 02, 2003 9:26 AM
To: Lucene Users List
Subject: Re: New Lucene-powered Website

There was discussion about it, yes. I don't think we ever reached any conclusions, and the powered.html still says 'include the logo'. Otis
Re: New Lucene-powered Website
On Tuesday, December 2, 2003, at 09:32 AM, Tate Avery wrote: Hello, This is the first time that I noticed this. Is the 'powered by Lucene' a legal requirement? Or just a suggestion? Does it apply to any system embedding Lucene (web pages, applications, etc)? That is not covered in the Apache Software License, I believe. It's only a (loose) requirement in order to be listed on the powered by page, that is all. Erik
Re: New Lucene-powered Website (TO Tun Lin)
Anyone has the install instructions for windows to run luceneweb? I cannot even see the first page when I start tomcat though I have the weblucene in the webapps directory. Can anyone help? Please.

It's a bug in the weblucene tarball. We'll fix it ASAP, and a small index will be added to the tarball. Please be patient! Good luck!
RE: New Lucene-powered Website (TO Tun Lin)
Hi,

It's ok. Take your time. :-)

-----Original Message-----
From: lhelper [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 03, 2003 9:29 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: New Lucene-powered Website (TO Tun Lin)
Re: New Lucene-powered Website
Chong, Herb wrote: can you share a description of the heuristics you used to clean up the text? i am facing the same problem right now handling email. i'm not interested in the rules you use as much as the tools you use to implement the rules.

The tools... well, Java ;-) The search engine is a custom Java application that uses Lucene. The heuristics are not very general at this point; they are tailored to our domain. So what you are hinting at (a generic rule-description language that can be customized to the local domain) seems appropriate. Our rules are things like: anything within <h1>...</h1> is an important sentence, and we add a full stop at the end.

Ulrich
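The one concrete rule Ulrich gives (text inside <h1>...</h1> becomes an important sentence ending in a full stop) can be sketched roughly as follows. DENIC used the Sourceforge htmlparser; this sketch substitutes `java.util.regex` to stay self-contained, which is fine for a demonstration but fragile on real-world HTML (it also does not decode entities such as `&amp;`). The class `HeadingRule` and method `importantSentences` are hypothetical names, not part of any library or of DENIC's code.

```java
import java.util.*;
import java.util.regex.*;

// Illustrative sketch of ONE rule described in the thread: text inside
// <h1>...</h1> becomes an "important sentence" ending in a full stop.
// Uses regexes instead of a real HTML parser, so it is NOT robust.
class HeadingRule {
    // <h1 ...>...</h1>, case-insensitive; DOTALL lets headings span lines.
    private static final Pattern H1 =
        Pattern.compile("<h1\\b[^>]*>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any tag nested inside the heading, e.g. <b> or <a href=...>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static List<String> importantSentences(String html) {
        List<String> sentences = new ArrayList<>();
        Matcher m = H1.matcher(html);
        while (m.find()) {
            // Replace nested tags with spaces, then collapse runs of whitespace.
            String text = TAG.matcher(m.group(1)).replaceAll(" ")
                             .replaceAll("\\s+", " ").trim();
            if (text.isEmpty()) continue;
            // Add a full stop unless the heading already ends a sentence.
            if (!text.endsWith(".") && !text.endsWith("!") && !text.endsWith("?")) {
                text += ".";
            }
            sentences.add(text);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String html = "<H1 class=\"title\">Domain <b>Registration</b></H1>"
                    + "<p>Body text.</p><h1>FAQ!</h1>";
        System.out.println(importantSentences(html)); // [Domain Registration., FAQ!]
    }
}
```

Appending the full stop matters because a sentence-based summarizer would otherwise glue an unterminated heading onto the first sentence of the body text. Rules like this one are easy to express in ordinary Java, which fits Ulrich's point that the rules are domain-specific rather than written in a generic rule language.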
Re: New Lucene-powered Website
Nice and fast ;-) It would be interesting, though, to know how you implemented the summarizer.

regards
Akmal

----- Original Message -----
From: Ulrich Mayring [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, November 27, 2003 12:29 PM
Subject: New Lucene-powered Website