[jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's

2015-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315647#comment-14315647 ] Lewis John McGibbney commented on NUTCH-1941: - Perfect example of where this s

[jira] [Created] (NUTCH-1941) Optional rolling http.agent.name's

2015-02-10 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1941: --- Summary: Optional rolling http.agent.name's Key: NUTCH-1941 URL: https://issues.apache.org/jira/browse/NUTCH-1941 Project: Nutch Issue Type: Bu

Re: Nutch-Selenium in Nutch 1.10

2015-02-10 Thread Mattmann, Chris A (3980)
Perfect, that’s what I suggested, thanks guys! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-52

Re: Nutch-Selenium in Nutch 1.10

2015-02-10 Thread Sapnashri Suresh
Hi Shuo Li, We were facing a similar issue. Prof. Mattmann suggested we look into this patch for Selenium on Nutch 1.10: https://issues.apache.org/jira/browse/NUTCH-1933. Hope this helps! Thanks, Sapna On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li wrote: > Yop, > > I'm trying to install selenium i

Nutch-Selenium in Nutch 1.10

2015-02-10 Thread Shuo Li
Yop, I'm trying to install selenium in Nutch 1.10. However, this error pops out: *error: package org.apache.nutch.storage does not exist* I can only find this package in Nutch 2.x. Is there a way to use Selenium in 1.10? Any advice would be appreciated. Regards, Shuo Li

[jira] [Created] (NUTCH-1940) Port HTTP POST Authentication to 2.X

2015-02-10 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1940: --- Summary: Port HTTP POST Authentication to 2.X Key: NUTCH-1940 URL: https://issues.apache.org/jira/browse/NUTCH-1940 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1940) Port HTTP POST Authentication to 2.X

2015-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1940: Issue Type: New Feature (was: Bug) > Port HTTP POST Authentication to 2.X > ---

[jira] [Updated] (NUTCH-827) HTTP POST Authentication

2015-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-827: --- Fix Version/s: (was: 2.4) > HTTP POST Authentication > > >

org.mortbay.proxy package not found in nutch 1.x, Ref Class - ProxyTestbed

2015-02-10 Thread Preetam Pradeepkumar Shingavi
Hi, I am trying to configure Nutch 1.X in Eclipse, and configured the build path to include all jars from the build->lib folder. There is a class, ProxyTestbed.java, which has an error when importing the following package: import *org.mortbay.proxy.*AsyncProxyServlet; (proxy package not found) I tri

[jira] [Comment Edited] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315374#comment-14315374 ] lufeng edited comment on NUTCH-1939 at 2/11/15 2:16 AM: I think th

[jira] [Commented] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315374#comment-14315374 ] lufeng commented on NUTCH-1939: --- Hi Sebastian One question. How do you use the FetchItem re

[Nutch Wiki] Update of "SujenShah" by SujenShah

2015-02-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "SujenShah" page has been changed by SujenShah: https://wiki.apache.org/nutch/SujenShah New page: ##language:en == Sujen Shah == Email: <> ... CategoryHomepage

[Nutch Wiki] Trivial Update of "ContributorsGroup" by LewisJohnMcgibbney

2015-02-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ContributorsGroup" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/ContributorsGroup?action=diff&rev1=19&rev2=20 * WayneBurke * MichaelJoyce * Ch

[Nutch Wiki] Update of "FrontPage" by LewisJohnMcgibbney

2015-02-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=294&rev2=295 Please contribute your knowledge about Nutch here!

RE: Why the protocol-httpclient Does Handle URL with Special Characters

2015-02-10 Thread Markus Jelsma
Indeed! You need to make some improvements to the basic URL normalizer, or to any custom normalizer, so that it properly encodes URLs. It will not do it for you. This is still an open issue. -Original message- From: Renxia Wang Sent: Wednesday 11th February 2015 0:00 To: dev@nutch.apache.org
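To make the reply above concrete, here is a minimal sketch of the kind of percent-encoding a normalizer would need to apply to the characters named in this thread (space, [, ] and |). The class SpecialCharEncoder and its encode method are hypothetical names invented for illustration; they are not part of Nutch or of any URL normalizer plugin, and a real normalizer would need to treat each URL component separately (for example, square brackets are legal around an IPv6 host).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Nutch class.
final class SpecialCharEncoder {
    // Assumption: only the characters mentioned in the thread are encoded.
    private static final String TO_ENCODE = " []|";

    // Percent-encodes each occurrence of the characters above and
    // copies every other character through unchanged.
    static String encode(String url) {
        StringBuilder out = new StringBuilder(url.length() + 16);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (TO_ENCODE.indexOf(c) >= 0) {
                // Encode the UTF-8 bytes of the character as %XX.
                for (byte b : String.valueOf(c).getBytes(StandardCharsets.UTF_8)) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Example URL is made up for the demonstration.
        System.out.println(encode("http://example.com/a b/[x]|y"));
    }
}
```

Because the sketch encodes the whole string blindly, it is only safe for URLs whose host part contains none of these characters; it shows the encoding step, not a complete normalizer.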

Why the protocol-httpclient Does Handle URL with Special Characters

2015-02-10 Thread Renxia Wang
Hi, I used protocol-httpclient to deal with https and noticed that it does not handle special characters, such as spaces, [, ], and |, automatically, while protocol-http does. Is there a reason why this plugin doesn't support this feature? Can any improvement be made to it? Thanks, Zhique

[jira] [Updated] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1939: --- Attachment: NUTCH-1939.patch Patch was tested: redirects are followed by Fetcher if http.redir

[jira] [Commented] (NUTCH-1735) code dedup fetcher queue redirects

2015-02-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315071#comment-14315071 ] Sebastian Nagel commented on NUTCH-1735: Hi [~leoyey], you are absolutely right. T

[jira] [Created] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1939: -- Summary: Fetcher fails to follow redirects Key: NUTCH-1939 URL: https://issues.apache.org/jira/browse/NUTCH-1939 Project: Nutch Issue Type: Bug

[jira] [Assigned] (NUTCH-1735) code dedup fetcher queue redirects

2015-02-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-1735: -- Assignee: Sebastian Nagel > code dedup fetcher queue redirects > --

[jira] [Commented] (NUTCH-1323) AjaxNormalizer

2015-02-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314532#comment-14314532 ] Chris A. Mattmann commented on NUTCH-1323: -- +1 > AjaxNormalizer > --

[jira] [Commented] (NUTCH-1323) AjaxNormalizer

2015-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314461#comment-14314461 ] Lewis John McGibbney commented on NUTCH-1323: - [~markus17] I am +1 on this, if

Re: Reverse Geocoding for the Masses - Apache Nutch Guest Post - Revised - STF - Invitation to comment

2015-02-10 Thread Lewis John Mcgibbney
Fantastic, Susan. I'll look forward to your feedback. Thank you for the notice. Have a great day, Lewis On Tue, Feb 10, 2015 at 7:25 AM, Susan Fendrock wrote: > Hi Lewis, > > I went through and accepted all your suggested changes. > > As it stands, your blog post is about twice our targeted length of

Re: 572:Crawl statistics for each repository ?

2015-02-10 Thread feng lu
Hi Jaydeep, you can use the following command to get statistics for each host when using one database to crawl multiple repositories: bin/nutch readdb crawldb/crawldb/ -stats -sort On Mon, Feb 9, 2015 at 12:01 PM, Jaydeep Bagrecha wrote: > Thanks. > > *P.S* > The question was:- > *Given M (repo)repositor
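As a rough illustration of what "statistics for each host" means when several repositories share one database, the sketch below groups a list of URLs by host and counts them. It is not how readdb is implemented; the class HostCounts, the method countByHost, and the sample URLs are all assumptions made up for this example.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Nutch code.
final class HostCounts {
    // Returns the number of URLs seen per host, sorted by host name.
    static Map<String, Integer> countByHost(List<String> urls) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String url : urls) {
            String host = URI.create(url).getHost();
            if (host == null) {
                continue; // skip URLs without a host component
            }
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample URLs are invented; each host stands in for one repository.
        List<String> urls = Arrays.asList(
            "http://a.example.org/page1",
            "http://a.example.org/page2",
            "http://b.example.org/index.html");
        System.out.println(countByHost(urls));
    }
}
```

Each host here stands in for one crawled repository, so the per-host counts separate the repositories even though they live in a single crawl database.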