Re: About HBase Integration

2010-02-08 Thread Ryan Smith
FWIW, there is a plugin for Heritrix that writes to HBase as a backend store. Maybe it will help with making a Nutch plugin? http://code.google.com/p/hbase-writer -Ryan On Mon, Feb 8, 2010 at 4:32 AM, Hua Su huas...@gmail.com wrote: Hi all, Any recent progress on HBase integration? There is a

Re: [ANNOUNCE] Apache Nutch 1.0

2009-03-29 Thread Ryan Smith
Dennis, Thanks a lot. -Ryan 2009/3/28 Tony Wang ivyt...@gmail.com Hi Sami, Thank you so much for the good news. Is there going to be documentation for Solr integration? Sorry to Otis, I know you are going to ask me to try to find it out by myself ;) Thanks! - Tony On Sat, Mar 28, 2009

Re: [ANNOUNCE] Apache Nutch 1.0

2009-03-28 Thread Ryan Smith
Is it possible to use Heritrix as Nutch's crawler? On Sat, Mar 28, 2009 at 3:53 PM, Sami Siren ssi...@gmail.com wrote: I am pleased to announce the availability of Apache Nutch 1.0. Apache Nutch, a subproject of Apache Lucene, is open source web-search software. It builds on Lucene Java,

Re: [ANNOUNCE] Apache Nutch 1.0

2009-03-28 Thread Ryan Smith
to convert the arc files to segments. From there you can run other tools on the segments as normal. What you won't get is Heritrix access to the crawldb. -Dennis Ryan Smith wrote: Is it possible to use Heritrix as Nutch's crawler? On Sat, Mar 28, 2009 at 3:53 PM, Sami Siren ssi

Re: httpclient and cookies

2008-12-11 Thread Ryan Smith
One option is to enable debug logging for httpclient in log4j, so you can see the headers it passes back and forth to the web server. On Thu, Dec 11, 2008 at 10:29 AM, George Herlin [EMAIL PROTECTED] wrote: I have read that if one sets the plugin.includes property to use

Re: Indexing static html files

2008-07-06 Thread Ryan Smith
OK, so you merge your other crawls into the same search dir; that's understood, thanks. My other question concerns searching in Nutch. Right now, a search returns links to file:///x/y/z/.../foo.html, and I was wondering if there was a simple way to change that link to be

Re: Indexing static html files

2008-07-05 Thread Ryan Smith
tell. Don't understand the logic, but there you are. Note: if you use a web server, be aware you will have to disable the IGNORE.INTERNAL setting in nutch-site.xml (you'll be messing around a lot in here). Cheers, Winton At 2:40 PM -0400 7/3/08, Ryan Smith wrote: Is there a simple way to have

Re: Indexing static html files

2008-07-05 Thread Ryan Smith
at 7:17 PM, Winton Davies [EMAIL PROTECTED] wrote: Hi Ryan, I just used the regular intranet crawl, didn't try to do the inject. W At 6:16 PM -0400 7/5/08, Ryan Smith wrote: Winton, I added the override property to nutch-site.xml (I saw the one in nutch-default.xml after your email

Indexing static html files

2008-07-03 Thread Ryan Smith
Is there a simple way to have Nutch index a folder full of subfolders and HTML files? I was hoping to avoid running Apache to serve the HTML files and then having Nutch crawl the site through Apache. Thank you, -Ryan
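One way to skip the web server is to point Nutch at the files directly through file: URLs. The Java sketch below only builds a seed list: it walks a directory tree and writes one file: URL per HTML file, one per line, which is the plain-text format Nutch's inject step reads from its URL directory. It assumes the protocol-file plugin is listed in plugin.includes and that your URL filter does not reject file: URLs; the class name SeedListBuilder and the two command-line arguments are hypothetical, not part of Nutch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Nutch): builds a file: URL seed list for a folder of HTML files.
public class SeedListBuilder {

    /** Returns file: URLs for every .html/.htm file under root, sorted for stable output. */
    public static List<String> htmlFileUrls(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // recursive walk, closed automatically
            return paths
                .filter(Files::isRegularFile) // skip the directories themselves
                .filter(p -> {
                    String name = p.getFileName().toString().toLowerCase();
                    return name.endsWith(".html") || name.endsWith(".htm");
                })
                .map(p -> p.toAbsolutePath().toUri().toString()) // e.g. file:///x/y/z/foo.html
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SeedListBuilder <html-root-dir> <seed-file>");
            System.exit(1);
        }
        List<String> urls = htmlFileUrls(Paths.get(args[0]));
        // One URL per line, written as UTF-8.
        Files.write(Paths.get(args[1]), urls);
        System.out.println("wrote " + urls.size() + " URLs");
    }
}
```

Listing every file explicitly keeps the crawl independent of how directory listings are rendered; the trade-off is that the seed list must be regenerated whenever files are added.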