Re: New Lucene-powered Website

2003-12-02 Thread Otis Gospodnetic
Could you add a Lucene logo somewhere on your search results, as noted
here:
http://jakarta.apache.org/lucene/docs/powered.html ?

Thanks!
Otis


--- Ulrich Mayring [EMAIL PROTECTED] wrote:
 Hello,
 
 we (DENIC) are the world's second largest domain registry (.de-zone has
 almost 6.9 million domains) and are using Lucene to index and search our
 website in a high-traffic scenario. Most of our web pages are available
 in English in addition to our native language German. If you want to try
 our Lucene-based search engine, please start here:
 
 http://www.denic.de/en/special/index.jsp
 
 Use the input field on the page to search our website. Don't use the
 input field at the top right, that is only for searching domains in our
 domain database, it has nothing to do with Lucene.
 
 The indexes for German and English are separate, so you should find only
 English pages from that page.
 
 A somewhat interesting feature is the summarizer, on the results page
 you'll get a short summary of the page. These are not hand-written
 blurbs, rather they are generated automatically from the HTML pages at
 indexing time. I'd be especially interested in improvement suggestions
 in this area.
 
 Naturally, the automatically generated texts don't have the same quality
 as hand-written ones. But they're better than nothing and in my eyes
 more useful than Google-style excerpts. How many times has it happened
 to you that the Google excerpt doesn't really tell you anything, because
 it's totally out of context? Summaries tell you what the whole page is
 about, regardless of the context within which your search terms may
 appear. After reading the summary you should (hopefully) be able to
 decide whether the page contains the info you're looking for. Comments
 welcome!
 
 We're using the snowball stemmers/analyzers for German and English,
 custom stopword lists and the HTML parser from the Sourceforge
 htmlparser project. Apart from that it's vanilla Lucene.
 
 cheers,
 
 Ulrich
 
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 





Re: New Lucene-powered Website

2003-12-02 Thread Ulrich Mayring
Otis Gospodnetic wrote:
 Could you add a Lucene logo somewhere on your search results, as noted
 here:
 http://jakarta.apache.org/lucene/docs/powered.html ?

Will suggest that to the powers that be :)

Ulrich





Re: New Lucene-powered Website

2003-12-02 Thread Otis Gospodnetic
Ok, let us know if you can add it.

Otis

--- Ulrich Mayring [EMAIL PROTECTED] wrote:
 Otis Gospodnetic wrote:
  Could you add a Lucene logo somewhere on your search results, as noted
  here:
  http://jakarta.apache.org/lucene/docs/powered.html ?
 
 Will suggest that to the powers that be :)
 
 Ulrich
 
 
 



RE: New Lucene-powered Website

2003-12-02 Thread Tun Lin
Hi,

I am very keen on using the new luceneweb. Has anyone managed to run luceneweb
successfully on Windows?

The instructions in luceneweb seem to support Unix more than Windows.

Does anyone have install instructions for running luceneweb on Windows? I cannot
even see the first page when I start Tomcat, though I have the weblucene in the
webapps directory.

Can anyone help? Please.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2003 8:35 PM
To: Lucene Users List
Subject: Re: New Lucene-powered Website

Could you add a Lucene logo somewhere on your search results, as noted
here:
http://jakarta.apache.org/lucene/docs/powered.html ?

Thanks!
Otis







Re: New Lucene-powered Website

2003-12-02 Thread Erik Hatcher
On Tuesday, December 2, 2003, at 07:34  AM, Otis Gospodnetic wrote:
 Could you add a Lucene logo somewhere on your search results, as noted
 here:
 http://jakarta.apache.org/lucene/docs/powered.html ?

I thought we were going to loosen up the requirement to have the logo 
on a search results page?



RE: New Lucene-powered Website

2003-12-02 Thread Tate Avery
Hello,

This is the first time that I noticed this.

Is the 'powered by Lucene' a legal requirement?  Or just a suggestion?
Does it apply to any system embedding Lucene (web pages, applications, etc)?
That is not covered in the Apache Software License, I believe.

Just curious...

Tate



-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 02, 2003 9:26 AM
To: Lucene Users List
Subject: Re: New Lucene-powered Website


There was discussion about it, yes.  I don't think we ever reached any
conclusions, and the powered.html still says 'include the logo'.

Otis

--- Erik Hatcher [EMAIL PROTECTED] wrote:
 On Tuesday, December 2, 2003, at 07:34  AM, Otis Gospodnetic wrote:
  Could you add a Lucene logo somewhere on your search results, as noted
  here:
  http://jakarta.apache.org/lucene/docs/powered.html ?
 
 I thought we were going to loosen up the requirement to have the logo
 on a search results page?
 
 



Re: New Lucene-powered Website

2003-12-02 Thread Erik Hatcher
On Tuesday, December 2, 2003, at 09:32  AM, Tate Avery wrote:
 Hello,
 
 This is the first time that I noticed this.
 
 Is the 'powered by Lucene' a legal requirement?  Or just a suggestion?
 Does it apply to any system embedding Lucene (web pages, applications, etc)?
 That is not covered in the Apache Software License, I believe.

It's only a (loose) requirement in order to be listed on the powered 
by page, that is all.

	Erik



Re: New Lucene-powered Website (TO Tun Lin)

2003-12-02 Thread lhelper
 Anyone has the install instructions for windows to run luceneweb? I cannot even
 see the first page when I start tomcat though I have the weblucene in the
 webapps directory.
 
 Can anyone help? Please.
 
It's a bug in the weblucene tarball. We'll fix it ASAP, and a small index
will be added to the tarball.
Please be patient!

Good Luck!

RE: New Lucene-powered Website (TO Tun Lin)

2003-12-02 Thread Tun Lin
Hi,

It's ok. Take your time. :-) 

-Original Message-
From: lhelper [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 03, 2003 9:29 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: New Lucene-powered Website (TO Tun Lin)

 Anyone has the install instructions for windows to run luceneweb? I 
 cannot even see the first page when I start tomcat though I have the 
 weblucene in the webapps directory.
 
 Can anyone help? Please.
 
it's a bug with the tar ball of weblucene, we'll fix the bug asap, and some
little index will be added into the tar ball.
please be patient!

Good Luck!






Re: New Lucene-powered Website

2003-12-01 Thread Ulrich Mayring
Chong, Herb wrote:
 can you share a description of the heuristics you used to clean up the
 text? i am facing the same problem right now handling email. i'm not
 interested in the rules you use as much as the tools you use to
 implement the rules.

The tools... well, Java ;-)

The search engine is a custom Java application, which uses Lucene. The 
heuristics are not very general at this point; they are tailored to our 
domain. So what you are hinting at (a generic rules description language 
to customize to the local domain) seems appropriate. Our rules are 
things like "anything within <h1>...</h1> is an important sentence and 
we add a full stop at the end".

Ulrich
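
A minimal sketch of the kind of rule mentioned above (heading text is
treated as an important sentence and gets a full stop appended). This is
not the DENIC code: the class name HeadingSummarizer, its method names
and the regex-based extraction are assumptions made for illustration.
The real application used the Sourceforge htmlparser project, and a
proper HTML parser is the safer choice for messy markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class HeadingSummarizer {
    // Matches <h1 ...>...</h1>, ignoring case and spanning line breaks.
    private static final Pattern H1 = Pattern.compile(
            "<h1\\b[^>]*>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches any remaining tag inside the heading, e.g. <b> or <a href=...>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns the text of each non-empty h1 element as a terminated sentence. */
    static List<String> importantSentences(String html) {
        List<String> sentences = new ArrayList<>();
        Matcher m = H1.matcher(html);
        while (m.find()) {
            // Drop nested tags, collapse whitespace, trim the ends.
            String text = TAG.matcher(m.group(1)).replaceAll(" ")
                    .replaceAll("\\s+", " ").trim();
            if (text.isEmpty()) {
                continue;
            }
            // Append a full stop unless the heading already ends a sentence.
            char last = text.charAt(text.length() - 1);
            if (last != '.' && last != '!' && last != '?') {
                text = text + ".";
            }
            sentences.add(text);
        }
        return sentences;
    }

    /** Joins at most maxSentences heading sentences into one summary string. */
    static String summarize(String html, int maxSentences) {
        List<String> s = importantSentences(html);
        return String.join(" ", s.subList(0, Math.min(maxSentences, s.size())));
    }
}
```

If a summary is computed this way at indexing time, it could be kept
alongside the indexed document and shown on the results page without
reparsing the HTML for each query.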





Re: New Lucene-powered Website

2003-11-28 Thread Akmal Sarhan
nice and fast ;-)

would be interesting though to know how you implemented the summarizer.

regards
Akmal
- Original Message - 
From: Ulrich Mayring [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, November 27, 2003 12:29 PM
Subject: New Lucene-powered Website


 Hello,
 
 we (DENIC) are the world's second largest domain registry (.de-zone has 
 almost 6.9 million domains) and are using Lucene to index and search our 
 website in a high-traffic scenario. Most of our web pages are available 
 in English in addition to our native language German. If you want to try 
 our Lucene-based search engine, please start here:
 
 http://www.denic.de/en/special/index.jsp
 
 Use the input field on the page to search our website. Don't use the 
 input field at the top right, that is only for searching domains in our 
 domain database, it has nothing to do with Lucene.
 
 The indexes for German and English are separate, so you should find only 
 English pages from that page.
 
 A somewhat interesting feature is the summarizer, on the results page 
 you'll get a short summary of the page. These are not hand-written 
 blurbs, rather they are generated automatically from the HTML pages at 
 indexing time. I'd be especially interested in improvement suggestions 
 in this area.
 
 Naturally, the automatically generated texts don't have the same quality 
 as hand-written ones. But they're better than nothing and in my eyes 
 more useful than Google-style excerpts. How many times has it happened 
 to you that the Google excerpt doesn't really tell you anything, because 
 it's totally out of context? Summaries tell you what the whole page is 
 about, regardless of the context within which your search terms may 
 appear. After reading the summary you should (hopefully) be able to 
 decide whether the page contains the info you're looking for. Comments 
 welcome!
 
 We're using the snowball stemmers/analyzers for German and English, 
 custom stopword lists and the HTML parser from the Sourceforge 
 htmlparser project. Apart from that it's vanilla Lucene.
 
 cheers,
 
 Ulrich
 
 
 