Re: [jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2012-01-19 Thread Lewis John Mcgibbney
Thanks for dropping this on, Remi. For future reference you might want to check out this online book on Subversion [1]. Here at Nutch we use Subversion for SCM, so it is the program we use to create patches, apply them and hopefully improve Nutch in the process ;0) It's straight

Get target URL of redirects

2012-01-19 Thread Markus Jelsma
Hi, why is it so hard to get the target URL of a redirect? I have to get the ProtocolStatus out of the crawl datum's metadata and then get the first arg of ProtocolStatus' args? Can it have more than one arg? Is there a decent method to get the URL? At first I assumed _repr_ key would return

make nutch plugin to get termfreqvectors

2012-01-19 Thread Ale
Hi, I'm quite new to working with Nutch plugins. I'm trying to save the termfreqvectors of the documents. I'm using Nutch 1.4. I've seen that I have to use, in the plugin class, the method addFieldOption, like: -- public void addIndexBackendOptions(Configuration conf) {    //add lucene