My main concern with the Nutch2Tutorial was that it didn't stand by
itself. As a newcomer to Nutch I treated the NutchTutorial (for 1.x) with
suspicion because I didn't know what was relevant for Nutch 2 and what wasn't.
And the Nutch2Tutorial alone is not enough to get you going.
I think
Hi Lewis,
On Tue, Jan 21, 2014 at 9:03 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi d_k,
On Tue, Jan 21, 2014 at 11:20 AM, dev-digest-h...@nutch.apache.org wrote:
I'm working on porting NUTCH-1622 to Nutch 2
Excellent
and the path I took was to add a MapWritable field
[
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879847#comment-13879847
]
Sebastian Nagel commented on NUTCH-1253:
+1 tested with a collection of
[
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879853#comment-13879853
]
Lewis John McGibbney commented on NUTCH-1253:
-
I'll post the patches today
[
https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sertac TURKEL updated NUTCH-1164:
-
Attachment: NUTCH-1164.patch
Hi [~tejas.patil], I updated the patch file; I think it's OK. Could
[
https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sertac TURKEL updated NUTCH-1164:
-
Attachment: (was: NUTCH-1158.patch)
Write JUnit tests for protocol-http
[
https://issues.apache.org/jira/browse/NUTCH-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879881#comment-13879881
]
Alparslan Avcı commented on NUTCH-1709:
---
+1 on this issue. The Avro generated
[
https://issues.apache.org/jira/browse/NUTCH-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879883#comment-13879883
]
Lewis John McGibbney commented on NUTCH-1709:
-
I will probably submit a patch
Markus Jelsma created NUTCH-1711:
Summary: Normalizer does not encode exclamation mark
Key: NUTCH-1711
URL: https://issues.apache.org/jira/browse/NUTCH-1711
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879914#comment-13879914
]
Markus Jelsma commented on NUTCH-1711:
--
Well, perhaps it is best to stick with the
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879955#comment-13879955
]
Lewis John McGibbney edited comment on NUTCH-1465 at 1/23/14 2:38 PM:
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879955#comment-13879955
]
Lewis John McGibbney commented on NUTCH-1465:
-
Hey [~tejasp]. Again, great
Tejas Patil created NUTCH-1712:
--
Summary: Use MultipleInputs in Injector to make it a single
mapreduce job
Key: NUTCH-1712
URL: https://issues.apache.org/jira/browse/NUTCH-1712
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1712:
---
Description:
Currently Injector creates two mapreduce jobs:
1. sort job: get the urls from seeds
[
https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1712:
---
Attachment: NUTCH-1712-trunk.v1.patch
Use MultipleInputs in Injector to make it a single mapreduce
[
https://issues.apache.org/jira/browse/NUTCH-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1713:
Attachment: NUTCH-1713-trunk.patch
Patch contributed by [~wal]. I forgot to open a
[
https://issues.apache.org/jira/browse/NUTCH-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1713:
Fix Version/s: 1.8
2.3
IpAddressResolver and DNSCache
[
https://issues.apache.org/jira/browse/NUTCH-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879981#comment-13879981
]
Lewis John McGibbney commented on NUTCH-1660:
-
[~icebergx5] and [~talat] we
Lewis John McGibbney created NUTCH-1713:
---
Summary: IpAddressResolver and DNSCache
Key: NUTCH-1713
URL: https://issues.apache.org/jira/browse/NUTCH-1713
Project: Nutch
Issue Type: New
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alparslan Avcı updated NUTCH-1714:
--
Attachment: NUTCH-1714.patch
I've uploaded a patch that makes Nutch 2.x suitable to use
[
https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880026#comment-13880026
]
Markus Jelsma commented on NUTCH-1113:
--
I have tried running long sequences with
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880008#comment-13880008
]
Alparslan Avcı edited comment on NUTCH-1714 at 1/23/14 4:18 PM:
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880008#comment-13880008
]
Alparslan Avcı edited comment on NUTCH-1714 at 1/23/14 4:17 PM:
[
https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880007#comment-13880007
]
Sebastian Nagel commented on NUTCH-1113:
Great! I'll try to verify it within the
On Thu, Jan 23, 2014 at 1:36 PM, d_k mail...@gmail.com wrote:
My main concern with the Nutch2Tutorial was that it didn't stand by
itself. As a newcomer to Nutch I treated the NutchTutorial (for 1.x) with
suspicion because I didn't know what was relevant for Nutch 2 and what wasn't.
And the
[
https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil resolved NUTCH-1164.
Resolution: Fixed
The patch is better now and all tests pass. It needed little modification: you
[
https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880288#comment-13880288
]
Tejas Patil commented on NUTCH-1712:
The performance gains due to this patch won't be
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1465:
---
Fix Version/s: (was: 1.9)
1.8
Support sitemaps in Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880295#comment-13880295
]
Tejas Patil commented on NUTCH-1465:
Hi [~lewismc],
+1 for the first two suggestions.
Correction: the subject of this message should have read:
Right way to run crawl script in deploy mode
~tejas
On Wed, Jan 22, 2014 at 7:56 PM, Tejas Patil tejas.patil...@gmail.com wrote:
Hi nutch-dev,
I was assuming that the commands to run the bin/crawl script in both local
and deploy mode
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880305#comment-13880305
]
Lewis John McGibbney commented on NUTCH-1465:
-
hey [~tejasp] no probs. RE: #3,
What I was missing when I first started with Nutch, and one can claim that a
little research would have solved it, was how to configure nutch-site.xml.
When looking at the NutchTutorial you can't be sure what applies to Nutch
2.x and what doesn't without prior knowledge that the nutch-site.xml is the
[
https://issues.apache.org/jira/browse/NUTCH-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880356#comment-13880356
]
Lewis John McGibbney commented on NUTCH-1645:
-
hey [~msertacturkel] thank you
[
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1253:
Attachment: TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt
Actually, I
[
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880362#comment-13880362
]
Lewis John McGibbney edited comment on NUTCH-1253 at 1/23/14 9:10 PM:
[
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1253:
---
Attachment: nutch1253test.html
nutch1253parsed.html
It's likely a regression
[
https://issues.apache.org/jira/browse/NUTCH-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1677:
Patch Info: Patch Available
ORIGINAL_CHAR_ENCODING and
Hi d_k,
On Thu, Jan 23, 2014 at 11:06 AM, dev-digest-h...@nutch.apache.org wrote:
I attached the patch. If you think it's ready I can add it to JIRA.
Yes please open an issue and we can take the conversation there. dev@ is
quite busy these days and some mail gets lost in the digest emails I
[
https://issues.apache.org/jira/browse/NUTCH-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880529#comment-13880529
]
Lewis John McGibbney commented on NUTCH-1677:
-
hi [~ilhamikalkan], thank you
[
https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Kugel updated NUTCH-1622:
Attachment: NUTCH-1622-2.x.patch
A patch for Nutch 2.x was added.
Create Outlinks with metadata