[
https://issues.apache.org/jira/browse/NUTCH-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-2009.
-
Resolution: Duplicate
These MongoDB issues have been resolved in Gora 0.6.1 and
[
https://issues.apache.org/jira/browse/NUTCH-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-2080.
-
Resolution: Invalid
This has to do with ivy/ivy.xml configuration and should be
Lewis John McGibbney created NUTCH-2101:
---
Summary: Upgrade Nutch 2.X to Hadoop 2.4.0
Key: NUTCH-2101
URL: https://issues.apache.org/jira/browse/NUTCH-2101
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746866#comment-14746866
]
Hudson commented on NUTCH-1679:
---
SUCCESS: Integrated in Nutch-nutchgora #1535 (See
[
https://issues.apache.org/jira/browse/NUTCH-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-2029.
-
Resolution: Fixed
This issue has been resolved as it was fixed over in GORA-423.
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746807#comment-14746807
]
Lewis John McGibbney commented on NUTCH-1679:
-
I've tested this with Nutch 2.X HEAD, Gora 0.5
[
https://issues.apache.org/jira/browse/NUTCH-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-1922.
-
Resolution: Duplicate
This issue is a clone of NUTCH-1679 for which I just
[
https://issues.apache.org/jira/browse/NUTCH-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1572:
Fix Version/s: (was: 2.4)
2.3.1
> Nutch 2.x should use
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1679:
Attachment: NUTCH-1679_4.patch
Patch which sorts out some trivial formatting and
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-1679.
-
Resolution: Fixed
Committed @revision 1703331 in 2.X HEAD
> UpdateDb using
[
https://issues.apache.org/jira/browse/NUTCH-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney reassigned NUTCH-1572:
---
Assignee: Lewis John McGibbney
> Nutch 2.x should use
[
https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kim Whitehall closed NUTCH-2100.
Resolution: Invalid
The command was used incorrectly. There is no bug.
> Nutch dump command
Hi Everyone,
I would like to thank the members of the Apache Nutch PMC for bringing me
on board and giving me the opportunity to become a member and committer.
I am a graduate student at the University of Southern California, majoring
in Computer Science. I have been working with Chris Mattmann
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744364#comment-14744364
]
Lewis John McGibbney edited comment on NUTCH-2097 at 9/15/15 6:51 AM:
--
[
https://issues.apache.org/jira/browse/NUTCH-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2093.
--
Resolution: Fixed
Assignee: Markus Jelsma
Committed to trunk in revision 1703111.
>
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744983#comment-14744983
]
Lewis John McGibbney commented on NUTCH-2097:
-
Hi [~markus17] thanks for initial comments. I
[
https://issues.apache.org/jira/browse/NUTCH-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-2094.
-
Resolution: Not A Problem
This issue is already resolved in 2.X branch
[
https://issues.apache.org/jira/browse/NUTCH-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744959#comment-14744959
]
Markus Jelsma commented on NUTCH-2064:
--
I think having it in CC makes sense indeed. I shall commit
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744953#comment-14744953
]
Markus Jelsma commented on NUTCH-2097:
--
Interesting! What does 'Complete Ant + Ivy build system
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744953#comment-14744953
]
Markus Jelsma edited comment on NUTCH-2097 at 9/15/15 6:50 AM:
---
Interesting!
Github user prernasatija closed the pull request at:
https://github.com/apache/nutch/pull/57
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The "AdvancedAjaxInteraction" page has been changed by MichaelJoyce:
https://wiki.apache.org/nutch/AdvancedAjaxInteraction?action=diff=4=5
Comment:
Updates regarding available selenium
Github user jnioche commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39492460
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Attachment: NUTCH-1932.patch
Eeh, patch with the scoring filter itself. Apparently it is possible
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745061#comment-14745061
]
Sebastian Nagel commented on NUTCH-2097:
Yes, looks promising.
- maven could simplify the build,
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Attachment: NUTCH-1932.patch
New and much simpler patch. This relies on a scoring filter to mark
Github user jnioche commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39492479
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import
[
https://issues.apache.org/jira/browse/NUTCH-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745663#comment-14745663
]
ASF GitHub Bot commented on NUTCH-2099:
---
GitHub user sujen1412 opened a pull request:
Sujen Shah created NUTCH-2099:
-
Summary: Refactoring the REST endpoints for integration with webui
Key: NUTCH-2099
URL: https://issues.apache.org/jira/browse/NUTCH-2099
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745313#comment-14745313
]
Nadeem Douba commented on NUTCH-2097:
-
I'm not entirely married to the package structure to be honest.
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745330#comment-14745330
]
Nadeem Douba commented on NUTCH-2097:
-
Re: maven migration
Would building each tool into a separate
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Attachment: NUTCH-1932.patch
> Automatically remove orphaned pages
>
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745330#comment-14745330
]
Nadeem Douba edited comment on NUTCH-2097 at 9/15/15 12:23 PM:
---
Re: maven
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745330#comment-14745330
]
Nadeem Douba edited comment on NUTCH-2097 at 9/15/15 12:22 PM:
---
Re: maven
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Attachment: NUTCH-1932.patch
First proper working patch. Tests pass
> Automatically remove
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Description: Orphan scoring filter that determines whether a page has
become orphaned, e.g. it
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745322#comment-14745322
]
Markus Jelsma commented on NUTCH-2097:
--
Yes, having them as separate mapper and reducer class files,
[
https://issues.apache.org/jira/browse/NUTCH-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745755#comment-14745755
]
Hudson commented on NUTCH-2093:
---
SUCCESS: Integrated in Nutch-trunk #3271 (See
GitHub user sujen1412 opened a pull request:
https://github.com/apache/nutch/pull/59
Fix for NUTCH-2099 Contributed by Sujen Shah
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sujen1412/nutch NUTCH-2099
Alternatively you can
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1932:
---
Attachment: NUTCH-1932-add.patch
> Automatically remove orphaned pages
>
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746004#comment-14746004
]
Sebastian Nagel commented on NUTCH-1932:
Hi Markus, understood.
- didn't we have the problem
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746034#comment-14746034
]
Markus Jelsma commented on NUTCH-1932:
--
Hello Sebastian. I am not sure about that being on the list.
Kim Whitehall created NUTCH-2100:
Summary: Nutch dump command doesnt dump anything
Key: NUTCH-2100
URL: https://issues.apache.org/jira/browse/NUTCH-2100
Project: Nutch
Issue Type: Bug
Dear all,
on behalf of the Nutch PMC it is my pleasure to announce
that Sujen Shah has been voted in as committer and member
of the Nutch PMC. Sujen, would you mind introducing
yourself to the Nutch community and telling us in just a few
words about your interests and your plans regarding Nutch?
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746062#comment-14746062
]
Sebastian Nagel commented on NUTCH-1932:
Correct, it was about 404 pages not about duplicates, see
[
https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann reassigned NUTCH-2100:
Assignee: Chris A. Mattmann
> Nutch dump command doesnt dump anything
>
[
https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746141#comment-14746141
]
Chris A. Mattmann commented on NUTCH-2100:
--
Kim I think that the directory expects a path to the
[
https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746210#comment-14746210
]
Kim Whitehall commented on NUTCH-2100:
--
LOL! how dumb of me! yeap, it works. Of all the things ...
Do
[
https://issues.apache.org/jira/browse/NUTCH-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aron Ahmadia updated NUTCH-2098:
Attachment: 0001-Default-SeedURL-constructor.patch
> Add null SeedUrl constructor
>
Aron Ahmadia created NUTCH-2098:
---
Summary: Add null SeedUrl constructor
Key: NUTCH-2098
URL: https://issues.apache.org/jira/browse/NUTCH-2098
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745205#comment-14745205
]
Sebastian Nagel commented on NUTCH-1932:
Hi Markus, that looks quite simple
- do we still need a
Github user jorgelbg commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39509063
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import
Github user jorgelbg commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39509273
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import
Github user jnioche commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39509421
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import