[
https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315647#comment-14315647
]
Lewis John McGibbney commented on NUTCH-1941:
-
Perfect example of where this s
Lewis John McGibbney created NUTCH-1941:
---
Summary: Optional rolling http.agent.name's
Key: NUTCH-1941
URL: https://issues.apache.org/jira/browse/NUTCH-1941
Project: Nutch
Issue Type: Bu
Perfect, that’s what I suggested, thanks guys!
++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-52
Hi Shuo Li,
We were facing a similar issue. Prof. Mattmann suggested we look into this
patch for Selenium on Nutch 1.10:
https://issues.apache.org/jira/browse/NUTCH-1933.
Hope this helps!
Thanks,
Sapna
On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li wrote:
> Yop,
>
> I'm trying to install selenium i
Yop,
I'm trying to install Selenium in Nutch 1.10. However, this error pops up:
*error: package org.apache.nutch.storage does not exist*
I can only find this package in Nutch 2.x. Is there a way to use Selenium
in 1.10?
Any advice would be appreciated.
Regards,
Shuo Li
Lewis John McGibbney created NUTCH-1940:
---
Summary: Port HTTP POST Authentication to 2.X
Key: NUTCH-1940
URL: https://issues.apache.org/jira/browse/NUTCH-1940
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1940:
Issue Type: New Feature (was: Bug)
> Port HTTP POST Authentication to 2.X
> ---
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-827:
---
Fix Version/s: (was: 2.4)
> HTTP POST Authentication
>
>
>
Hi,
I am trying to configure Nutch 1.X in Eclipse, and have configured the build
path to include all jars from the build->lib folder.
The class ProxyTestbed.java has an error importing the
following package:
import org.mortbay.proxy.AsyncProxyServlet; (proxy package not found)
I tri
[
https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315374#comment-14315374
]
lufeng edited comment on NUTCH-1939 at 2/11/15 2:16 AM:
I think th
[
https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315374#comment-14315374
]
lufeng commented on NUTCH-1939:
---
Hi Sebastian
One question. How do you use the FetchItem re
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The "SujenShah" page has been changed by SujenShah:
https://wiki.apache.org/nutch/SujenShah
New page:
##language:en
== Sujen Shah ==
Email: <>
...
CategoryHomepage
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The "ContributorsGroup" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/ContributorsGroup?action=diff&rev1=19&rev2=20
* WayneBurke
* MichaelJoyce
* Ch
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=294&rev2=295
Please contribute your knowledge about Nutch here!
Indeed! You need to make some improvements to the basic URL normalizer or any
custom normalizer so it will properly encode URLs. It will not do it for you.
This is still an open issue.
-Original message-
From: Renxia Wang
Sent: Wednesday 11th February 2015 0:00
To: dev@nutch.apache.org
Hi,
I used protocol-httpclient to deal with https and I noticed that it
does not automatically handle special characters such as spaces, [, ] and |,
while protocol-http does. Is there a reason why this plugin doesn't
support this feature? Can it be improved?
Thanks,
Zhique
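
A minimal sketch of the kind of encoding step discussed above, assuming only
the four characters named in the question (space, [, ], |) need escaping.
The class and method names are hypothetical; this is not the code of Nutch's
basic URL normalizer or of either protocol plugin, and it would also escape
the brackets of an IPv6 host literal, so a real normalizer should restrict
it to the path and query.

```java
// Hypothetical helper, not part of Nutch: percent-encodes a fixed set of
// characters that the thread reports as unhandled by protocol-httpclient.
final class UnsafeCharEscaper {

    // Assumption: only the characters named in the mail are treated as unsafe.
    private static final String UNSAFE = " []|";

    static String escape(String url) {
        StringBuilder out = new StringBuilder(url.length() + 16);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (UNSAFE.indexOf(c) >= 0) {
                // All four characters are ASCII, so one %XX triplet is enough.
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints https://example.com/a%20b%5B1%5D%7Cc
        System.out.println(escape("https://example.com/a b[1]|c"));
    }
}
```

A step like this would sit in a custom normalizer, so that every URL is
encoded before the protocol plugin sees it.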
[
https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1939:
---
Attachment: NUTCH-1939.patch
Patch was tested: redirects are followed by Fetcher if http.redir
[
https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315071#comment-14315071
]
Sebastian Nagel commented on NUTCH-1735:
Hi [~leoyey], you are absolutely right. T
Sebastian Nagel created NUTCH-1939:
--
Summary: Fetcher fails to follow redirects
Key: NUTCH-1939
URL: https://issues.apache.org/jira/browse/NUTCH-1939
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-1735:
--
Assignee: Sebastian Nagel
> code dedup fetcher queue redirects
> --
[
https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314532#comment-14314532
]
Chris A. Mattmann commented on NUTCH-1323:
--
+1
> AjaxNormalizer
> --
[
https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314461#comment-14314461
]
Lewis John McGibbney commented on NUTCH-1323:
-
[~markus17] I am +1 on this, if
Fantastic Susan.
I'll look forward to your feedback.
Thank you for the notice.
Have a great day
Lewis
On Tue, Feb 10, 2015 at 7:25 AM, Susan Fendrock
wrote:
> Hi Lewis,
>
> I went through and accepted all your suggested changes.
>
> As it stands, your blog post is about twice our targeted length of
Hi Jaydeep
you can use the following command to get statistics for each host when using one
database to crawl multiple repositories.
bin/nutch readdb crawldb/crawldb/ -stats -sort
On Mon, Feb 9, 2015 at 12:01 PM, Jaydeep Bagrecha wrote:
> Thanks.
>
> *P.S*
> The question was:-
> *Given M (repo)repositor