Re: Re: HTML parser

2002-04-24 Thread Kelvin Tan

Otis, what's the final conclusion you've arrived at regarding HTML
filtering/parsing?

I have pretty much the same requirements as you do right now (extract the
text and obtain the title).
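Both requirements (plain text plus the title) can be met with the HTML parser that ships with the JDK, `javax.swing.text.html.parser.ParserDelegator`, driven by an `HTMLEditorKit.ParserCallback`. This is a minimal sketch under that assumption, not any of the tools named in this thread; the class name `HtmlTextExtractor` is hypothetical, and the `<script>`/`<style>` skipping is a best-effort guard.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: collects the <title> and the body text of an HTML page. */
class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder title = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false;
    private int skipDepth = 0; // > 0 while inside <script> or <style>

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = true;
        } else if (tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) {
            skipDepth++;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = false;
        } else if ((tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) && skipDepth > 0) {
            skipDepth--;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inTitle) {
            title.append(data);            // text inside <title> goes to the title only
        } else if (skipDepth == 0) {
            if (text.length() > 0) {
                text.append(' ');          // separate adjacent text chunks
            }
            text.append(data);
        }
    }

    String getTitle() { return title.toString().trim(); }

    String getText() { return text.toString().trim(); }

    /** Parses the page; ignoreCharSet=true stops a <meta> charset from aborting the parse. */
    static HtmlTextExtractor parse(Reader in) {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        try {
            new ParserDelegator().parse(in, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }
}
```

The two strings can then go into separate index fields (for example, one for the title and one for the body), so that title matches can be searched or boosted on their own.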

Kelvin

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, April 22, 2002 12:27 AM
Subject: Re: HTML parser


 Laura,

 http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=Spindle&q=b

 Oops, it's JoBo, not MoJo :)
 http://www.matuschek.net/software/jobo/

 Otis





--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: HTML parser

2002-04-22 Thread [EMAIL PROTECTED]

Hi all,

has anyone tried JoBo?

It seems like good software that can be extended.

Does anyone have experience with it?

Laura


 Laura,
 
 http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=Spindle&q=b
 
 Oops, it's JoBo, not MoJo :)
 http://www.matuschek.net/software/jobo/
 
 Otis
 


Re: HTML parser

2002-04-21 Thread Otis Gospodnetic

Laura,

http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=Spindle&q=b

Oops, it's JoBo, not MoJo :)
http://www.matuschek.net/software/jobo/

Otis

--- [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hi Otis,
 
 thanks for your reply. I have been looking for Spindle and Mojo for two
 hours but I haven't found anything.
 
 Can you help me? Where can I find them?
 
 Thanks for your help and time
 
 
 Laura
 
 
   
 
  Laura,
  
  Search the lucene-user and lucene-dev archives for things like:
  crawler
  spider
  spindle
  lucene sandbox
  
  Spindle is something you may want to look at, as is MoJo (not
  mentioned on lucene lists, use Google).
  
  Otis
  
   Has anyone solved the problem of spidering web pages recursively?
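Recursive spidering boils down to a breadth-first walk over links with a visited set, a depth limit, and relative-URL resolution. The sketch below is a minimal illustration of that idea, not how Spindle or JoBo work internally; `SimpleCrawler`, `fetch` and `extractLinks` are hypothetical names, and fetching and link extraction are passed in as functions so the walk itself can be exercised without network access.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical breadth-first, same-host crawler. */
class SimpleCrawler {
    private static final class Pending {
        final URI uri;
        final int depth;
        Pending(URI uri, int depth) { this.uri = uri; this.depth = depth; }
    }

    /**
     * Returns the URLs that were fetched successfully, in visit order.
     * fetch: URL -> HTML, or null if the page is unavailable (assumed hook).
     * extractLinks: HTML -> raw href values (assumed hook).
     */
    static List<String> crawl(String startUrl, int maxDepth,
                              Function<String, String> fetch,
                              Function<String, List<String>> extractLinks) {
        URI start = URI.create(startUrl).normalize();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        Deque<Pending> queue = new ArrayDeque<>();
        seen.add(start.toString());
        queue.add(new Pending(start, 0));

        while (!queue.isEmpty()) {
            Pending page = queue.poll();                  // FIFO order gives breadth-first
            String html = fetch.apply(page.uri.toString());
            if (html == null) continue;                   // fetch failed: skip this page
            visited.add(page.uri.toString());
            if (page.depth >= maxDepth) continue;         // depth limit: don't follow its links

            for (String href : extractLinks.apply(html)) {
                URI next;
                try {
                    next = page.uri.resolve(href.trim()).normalize(); // relative -> absolute
                } catch (IllegalArgumentException e) {
                    continue;                             // malformed href
                }
                // Stay on the start host; mailto: and javascript: links have no host.
                if (!Objects.equals(start.getHost(), next.getHost())) continue;
                String key = next.toString();
                int hash = key.indexOf('#');
                if (hash >= 0) key = key.substring(0, hash); // page.html#top == page.html
                if (seen.add(key)) queue.add(new Pending(URI.create(key), page.depth + 1));
            }
        }
        return visited;
    }
}
```

In a real spider, `fetch` would wrap an HTTP client and `extractLinks` would be a parser-based link extractor; the visited set is what keeps the walk from looping on pages that link to each other.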
  
While trying to research the same thing, I found the following; here's a
good example of link extraction.

Try http://www.quiotix.com/opensource/html-parser

It's easy to write a Visitor which extracts the links; it should take
about ten lines of code.
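The quiotix parser's Visitor API isn't shown in this thread, so the sketch below swaps in the JDK's built-in `HTMLEditorKit.ParserCallback` to illustrate the same idea: react to each `<a>` start tag and record its `href`. The class name `LinkExtractor` is hypothetical; the core really does come out at roughly ten lines.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: returns the raw href values of all <a> tags, in document order. */
class LinkExtractor {
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString()); // <a name="..."> without href is skipped
                    }
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

The hrefs come back unresolved; turning them into absolute URLs (for example with `java.net.URI.resolve` against the page's own URL) is left to the caller.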
  
  
  
  






Re: Re: HTML parser

2002-04-21 Thread James Cooper

On Sun, 21 Apr 2002, [EMAIL PROTECTED] wrote:

 thanks for your reply. I have been looking for Spindle and Mojo for two
 hours but I haven't found anything.

Spindle is at:

http://www.bitmechanic.com/projects/spindle/

cheers

-- James

