Re: Re: HTML parser
Otis, what's the final conclusion you've arrived at regarding HTML filtering/parsing? I have pretty much the same requirements as you do right now (extract the text and obtain the title).

Kelvin

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, April 22, 2002 12:27 AM
Subject: Re: HTML parser

Laura,

http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=Spindle&q=b

Oops, it's JoBo, not MoJo :)
http://www.matuschek.net/software/jobo/

Otis

--- [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Hi Otis,
thanks for your reply. I have been looking for Spindle and MoJo for 2 hours but I haven't found anything. Can you help me? Where can I find something?
Thanks for your help and time,
Laura

Laura,
Search the lucene-user and lucene-dev archives for things like:
crawler spider spindle lucene sandbox
Spindle is something you may want to look at, as is MoJo (not mentioned on the Lucene lists; use Google).
Otis

Did someone solve the problem of recursively spidering web pages?

While trying to research the same thing, I found the following... here's a good example of link extraction.

Try http://www.quiotix.com/opensource/html-parser
It's easy to write a Visitor which extracts the links; it should take about ten lines of code.

--
To unsubscribe, e-mail: mailto:lucene-user- [EMAIL PROTECTED]
For additional commands, e-mail: mailto:lucene-user- [EMAIL PROTECTED]
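For readers with the same requirements (extract the text, obtain the title, pull out the links), here is a minimal sketch of the callback/visitor idea. It does not use any of the parsers named in this thread; it swaps in the HTML parser that ships with the JDK (javax.swing.text.html), and the class name HtmlSummary and its fields are hypothetical names chosen for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: collects the title, visible text and link targets of one page. */
class HtmlSummary extends HTMLEditorKit.ParserCallback {
    final StringBuilder title = new StringBuilder();
    final StringBuilder text = new StringBuilder();
    final List<String> links = new ArrayList<>();
    private boolean inTitle = false;
    private boolean inStyle = false;

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = true;
        } else if (tag == HTML.Tag.STYLE) {
            inStyle = true;
        } else if (tag == HTML.Tag.A) {
            // Record the href of every anchor that has one.
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = false;
        } else if (tag == HTML.Tag.STYLE) {
            inStyle = false;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inStyle) {
            return; // skip stylesheet content
        }
        // Route text to the title or to the running page text.
        StringBuilder target = inTitle ? title : text;
        if (target.length() > 0) {
            target.append(' ');
        }
        target.append(data);
    }

    /** Parses an HTML string and returns the filled-in summary. */
    static HtmlSummary parse(String html) {
        HtmlSummary summary = new HtmlSummary();
        try {
            // 'true' tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), summary, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return summary;
    }
}
```

Calling HtmlSummary.parse(html) on a fetched page gives the title and text as strings that could be indexed as separate fields, and the links list is the starting point for a recursive spider (resolve each href against the page URL, then fetch the ones not yet visited).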
Re: HTML parser
Hi all,
has anyone tried JoBo? It seems to be good software that can be extended. Does anyone have experience with it?

Laura
Re: HTML parser
Laura,

http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=Spindle&q=b

Oops, it's JoBo, not MoJo :)
http://www.matuschek.net/software/jobo/

Otis
Re: Re: HTML parser
On Sun, 21 Apr 2002, [EMAIL PROTECTED] wrote:

> thanks for your reply. I have been looking for Spindle and Mojo for 2 hours but I haven't found anything.

Spindle is at: http://www.bitmechanic.com/projects/spindle/

cheers
--
James