[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-03-02 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1024: Attachment: NUTCH-1024-1.5-1.patch New patch for trunk! This also includes a change to the

[jira] [Created] (NUTCH-1295) nutchgora restlet dependencies failing when remote repos is down

2012-03-02 Thread Ferdy Galema (Created) (JIRA)
nutchgora restlet dependencies failing when remote repos is down Key: NUTCH-1295 URL: https://issues.apache.org/jira/browse/NUTCH-1295 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1295) nutchgora restlet dependencies failing when remote repos is down

2012-03-02 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1295: Attachment: NUTCH-1295.patch nutchgora restlet dependencies failing when remote repos is down

[jira] [Closed] (NUTCH-1295) nutchgora restlet dependencies failing when remote repos is down

2012-03-02 Thread Ferdy Galema (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1295. Resolution: Fixed committed nutchgora restlet dependencies failing when remote

Drawing an analogy between AdaptiveFetchSchedule and AdaptiveCrawlDelay

2012-03-02 Thread Lewis John Mcgibbney
Hi Guys, As there were some comments on the user list, I recently got digging with http redirects then stumbled across NUTCH-1042. Although these are individual issues e.g. redirects and crawl delays, I think they are certainly linked, however what is interesting is that users 'usually' don't

Re: Drawing an analogy between AdaptiveFetchSchedule and AdaptiveCrawlDelay

2012-03-02 Thread Andrzej Bialecki
On 02/03/2012 12:45, Lewis John Mcgibbney wrote: Hi Guys, As there were some comments on the user list, I recently got digging with http redirects then stumbled across NUTCH-1042. Although these are individual issues e.g. redirects and crawl delays, I think they are certainly linked, however

[jira] [Updated] (NUTCH-1273) Fix [deprecation] javac warnings

2012-03-02 Thread Lewis John McGibbney (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1273: Attachment: NUTCH-1273-v2-trunk.patch This patch goes some length to address the

Re: Drawing an analogy between AdaptiveFetchSchedule and AdaptiveCrawlDelay

2012-03-02 Thread Lewis John Mcgibbney
Hi Andrzej, On Fri, Mar 2, 2012 at 12:37 PM, Andrzej Bialecki a...@getopt.org wrote: Fetcher2 is the current Fetcher. The original Fetcher was temporarily renamed OldFetcher and then removed. So looks like this 'might' be more straightforward to implement than I originally thought. When I

Nutch with Letor

2012-03-02 Thread varunpandeyengg
Hey Guys, I am new to Nutch. I am part of an IR research team and need to create a setup wherein I need to crawl Microsoft's LETOR Dataset with Nutch. After googling for a while, I didn't find any tutorial or help. Could anyone guide me with this? I am using Nutch 1.4 on Ubuntu 11.10 Eclipse

[jira] [Created] (NUTCH-1296) nutchgora fetcher does not show correct 'threads' and 'resuming' properties

2012-03-02 Thread Ferdy Galema (Created) (JIRA)
nutchgora fetcher does not show correct 'threads' and 'resuming' properties Key: NUTCH-1296 URL: https://issues.apache.org/jira/browse/NUTCH-1296 Project: Nutch

[jira] [Closed] (NUTCH-1296) nutchgora fetcher does not show correct 'threads' and 'resuming' properties

2012-03-02 Thread Ferdy Galema (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1296. Resolution: Fixed committed nutchgora fetcher does not show correct 'threads' and

Re: Nutch with Letor

2012-03-02 Thread Lewis John Mcgibbney
Hi, Would be great if you could provide some links to the dataset, exactly what it is etc. Thank you On Fri, Mar 2, 2012 at 1:19 PM, varunpandeyengg varunpandeye...@gmail.com wrote: Hey Guys, I am new to Nutch. I am part of an IR research team and need to create a setup wherein I need to crawl

Re: Nutch with Letor

2012-03-02 Thread Lewis John Mcgibbney
Also please ship this discussion to user@ as it seems to be more relevant there. Thanks On Fri, Mar 2, 2012 at 2:13 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, Would be great if you could provide some links to the dataset, exactly what it is etc. Thank you On Fri,

[jira] [Closed] (NUTCH-1263) FetcherJob must put 'fetchTime' on input

2012-03-02 Thread Ferdy Galema (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1263. Resolution: Fixed Fix Version/s: nutchgora This one slipped under the radar. Committed.

[jira] [Closed] (NUTCH-1292) Better exception logging and debugging during fetch.

2012-03-02 Thread Ferdy Galema (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1292. Resolution: Fixed committed Better exception logging and debugging during fetch.

[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-03-02 Thread Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220984#comment-13220984 ] Ferdy Galema commented on NUTCH-1253: I'll give this one a go..

[jira] [Updated] (NUTCH-475) Adaptive crawl delay

2012-03-02 Thread Lewis John McGibbney (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-475: Attachment: NUTCH-475.patch Updated patch which brings this issue up to speed as of

[jira] [Commented] (NUTCH-1292) Better exception logging and debugging during fetch.

2012-03-02 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221506#comment-13221506 ] Hudson commented on NUTCH-1292: Integrated in Nutch-nutchgora #181 (See

[jira] [Commented] (NUTCH-1263) FetcherJob must put 'fetchTime' on input

2012-03-02 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221507#comment-13221507 ] Hudson commented on NUTCH-1263: Integrated in Nutch-nutchgora #181 (See

[jira] [Commented] (NUTCH-1296) nutchgora fetcher does not show correct 'threads' and 'resuming' properties

2012-03-02 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221508#comment-13221508 ] Hudson commented on NUTCH-1296: Integrated in Nutch-nutchgora #181 (See

[jira] [Commented] (NUTCH-1295) nutchgora restlet dependencies failing when remote repos is down

2012-03-02 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221509#comment-13221509 ] Hudson commented on NUTCH-1295: Integrated in Nutch-nutchgora #181 (See