[
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089405#comment-13089405
]
Andrzej Bialecki commented on NUTCH-1087:
--
IIRC we had this discussion in the
[
https://issues.apache.org/jira/browse/NUTCH-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067972#comment-13067972
]
Andrzej Bialecki commented on NUTCH-1014:
--
java.util.regex has the advantage of
[
https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034724#comment-13034724
]
Andrzej Bialecki commented on NUTCH-985:
-
We should use Solr's DateUtil in all
[
https://issues.apache.org/jira/browse/NUTCH-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-951.
-
Resolution: Fixed
Backport changes from 2.0 into 1.3
--
[
https://issues.apache.org/jira/browse/NUTCH-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004488#comment-13004488
]
Andrzej Bialecki commented on NUTCH-951:
-
* Ported NUTCH-872 in rev. 1079746.
*
[
https://issues.apache.org/jira/browse/NUTCH-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-962.
-
Resolution: Fixed
Fix Version/s: 2.0
1.3
Assignee:
[
https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-955.
-
Resolution: Fixed
Fix Version/s: 2.0
Assignee: Andrzej Bialecki
Ivy
[
https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-939.
-
Resolution: Fixed
Assignee: Andrzej Bialecki
I modified the patch slightly to
Remove Lucene dependencies
--
Key: NUTCH-948
URL: https://issues.apache.org/jira/browse/NUTCH-948
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.3
Reporter: Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-948.
-
Resolution: Fixed
Committed in rev. 1051509.
Remove Lucene dependencies
[
https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973915#action_12973915
]
Andrzej Bialecki commented on NUTCH-939:
-
1.2 release is out, and branch-1.2 is
[
https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12936047#action_12936047
]
Andrzej Bialecki commented on NUTCH-939:
-
Please note that trunk uses a very
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: NUTCH-932-4.patch
Final version of the patch.
Bulk REST API to retrieve crawl
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-932.
-
Resolution: Fixed
Fix Version/s: 2.0
Committed in rev. 1039014.
Bulk REST API to
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: NUTCH-932-3.patch
NutchTool is an abstract class in this patch. This actually
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928909#action_12928909
]
Andrzej Bialecki commented on NUTCH-880:
-
Thanks - this issue is already fixed in
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: NUTCH-932.patch
This patch adds bulk retrieval of crawl results. This is still
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: db.formatted.gz
Example DB content (this was passed through a JSON
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928355#action_12928355
]
Andrzej Bialecki commented on NUTCH-932:
-
Examples (with the db equivalent to the
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: NUTCH-932.patch
Updated patch - this now recognizes URL parameters such as
[
https://issues.apache.org/jira/browse/NUTCH-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-931.
-
Resolution: Fixed
Committed in rev. 1028736 with some changes.
Simple admin API to
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-880:
Summary: REST API for Nutch (was: REST API (and webapp) for Nutch)
The webapp part is
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-880.
-
Resolution: Fixed
Fix Version/s: 2.0
Committed in rev. 1028235. The webapp part of
Remove remaining dependencies on Lucene API
---
Key: NUTCH-930
URL: https://issues.apache.org/jira/browse/NUTCH-930
Project: Nutch
Issue Type: Improvement
Affects Versions: 2.0
[
https://issues.apache.org/jira/browse/NUTCH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-930:
Attachment: NUTCH-930.patch
Patch to fix the issue. I'll commit this shortly.
Remove
[
https://issues.apache.org/jira/browse/NUTCH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-930.
-
Resolution: Fixed
Fix Version/s: 2.0
Committed in rev. 1028474.
Remove remaining
Simple admin API to fetch status and stop the service
-
Key: NUTCH-931
URL: https://issues.apache.org/jira/browse/NUTCH-931
Project: Nutch
Issue Type: Improvement
Components:
[
https://issues.apache.org/jira/browse/NUTCH-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925543#action_12925543
]
Andrzej Bialecki commented on NUTCH-926:
-
bq. Nutch continues to crawl the WRONG
[
https://issues.apache.org/jira/browse/NUTCH-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924659#action_12924659
]
Andrzej Bialecki commented on NUTCH-913:
-
+1, let's commit it - I want to start
[
https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924154#action_12924154
]
Andrzej Bialecki commented on NUTCH-923:
-
This doesn't solve the problem of
[
https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923845#action_12923845
]
Andrzej Bialecki commented on NUTCH-924:
-
The functionality is useful, +1. But the
[
https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923896#action_12923896
]
Andrzej Bialecki commented on NUTCH-923:
-
This sounds useful, though the
[
https://issues.apache.org/jira/browse/NUTCH-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-921:
Attachment: NUTCH-921.patch
Patch that implements reading config parameters from
[
https://issues.apache.org/jira/browse/NUTCH-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920610#action_12920610
]
Andrzej Bialecki commented on NUTCH-913:
-
There are formatting issues in
[
https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916870#action_12916870
]
Andrzej Bialecki commented on NUTCH-907:
-
Hi Sertan,
Thanks for the patch, this
[
https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916874#action_12916874
]
Andrzej Bialecki commented on NUTCH-882:
-
Doğacan, I missed your previous
[
https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916912#action_12916912
]
Andrzej Bialecki commented on NUTCH-864:
-
I think the difficulty comes from the
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913118#action_12913118
]
Andrzej Bialecki commented on NUTCH-880:
-
bq. I think we can combine the approach
[
https://issues.apache.org/jira/browse/NUTCH-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912474#action_12912474
]
Andrzej Bialecki commented on NUTCH-909:
-
bq. It might be better to see the message
[
https://issues.apache.org/jira/browse/NUTCH-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki reassigned NUTCH-862:
---
Assignee: Andrzej Bialecki
HttpClient null pointer exception
[
https://issues.apache.org/jira/browse/NUTCH-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-906.
-
Fix Version/s: 1.2
Resolution: Fixed
Fixed in rev. 998261. Thanks!
Nutch
[
https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910109#action_12910109
]
Andrzej Bialecki commented on NUTCH-907:
-
That's very good news - in that case I'm
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-880:
Attachment: API.patch
Initial patch for discussion. This is a work in progress, so only
DataStore API doesn't support multiple storage areas for multiple disjoint
crawls
-
Key: NUTCH-907
URL: https://issues.apache.org/jira/browse/NUTCH-907
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909757#action_12909757
]
Andrzej Bialecki commented on NUTCH-882:
-
+1 to NutchContext. See also NUTCH-907
[
https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908791#action_12908791
]
Andrzej Bialecki commented on NUTCH-893:
-
+1 and +1.
DataStore.put() silently
[
https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907297#action_12907297
]
Andrzej Bialecki commented on NUTCH-893:
-
Very good catch - yes, the test now
[
https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904226#action_12904226
]
Andrzej Bialecki commented on NUTCH-893:
-
Dogacan, flush() doesn't help - there are
DataStore.put() silently loses records when executed from multiple processes
Key: NUTCH-893
URL: https://issues.apache.org/jira/browse/NUTCH-893
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-893:
Attachment: NUTCH-893.patch
Unit test to illustrate the issue.
DataStore.put() silently
Nutch build should not depend on unversioned local deps
---
Key: NUTCH-891
URL: https://issues.apache.org/jira/browse/NUTCH-891
Project: Nutch
Issue Type: Bug
Reporter: Andrzej
[
https://issues.apache.org/jira/browse/NUTCH-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900455#action_12900455
]
Andrzej Bialecki commented on NUTCH-891:
-
Yes, this would help.
Nutch build
[
https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899810#action_12899810
]
Andrzej Bialecki commented on NUTCH-882:
-
This functionality is very useful for
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-880:
Description:
This issue is for discussing a REST-style API for accessing Nutch.
Here's an
FetcherJob should run more reduce tasks than default
Key: NUTCH-884
URL: https://issues.apache.org/jira/browse/NUTCH-884
Project: Nutch
Issue Type: Improvement
Components:
[
https://issues.apache.org/jira/browse/NUTCH-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-872.
-
Fix Version/s: 2.0
Resolution: Fixed
I changed the name of the option to -parse to
[
https://issues.apache.org/jira/browse/NUTCH-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-884:
Attachment: NUTCH-884.patch
Patch with the change. I also rearranged the arguments to
URL-s getting lost
--
Key: NUTCH-879
URL: https://issues.apache.org/jira/browse/NUTCH-879
Project: Nutch
Issue Type: Bug
Affects Versions: 2.0
Environment: * Ubuntu 10.4 x64, Sun JDK 1.6
* using 1-node Hadoop +
[
https://issues.apache.org/jira/browse/NUTCH-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-876:
Attachment: NUTCH-876.patch
Patch to fix the issue. If there are no objections I'll commit
[
https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895377#action_12895377
]
Andrzej Bialecki commented on NUTCH-858:
-
It was r960064, but I have to admit I
[
https://issues.apache.org/jira/browse/NUTCH-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-867:
Attachment: benchmark.patch
Ported benchmark that uses HSQLDB as the store impl. If there
Port Nutch benchmark to Nutchbase
-
Key: NUTCH-867
URL: https://issues.apache.org/jira/browse/NUTCH-867
Project: Nutch
Issue Type: New Feature
Affects Versions: nutchbase
Reporter: Andrzej
[
https://issues.apache.org/jira/browse/NUTCH-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-863.
-
Fix Version/s: 2.0
Resolution: Fixed
Committed in rev. 980932.
Benchmark and a
[
https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-858:
Assignee: Andrzej Bialecki
Fix Version/s: 1.2
No longer able to set per-field
[
https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890873#action_12890873
]
Andrzej Bialecki commented on NUTCH-858:
-
Unfortunately no. The patch was included
[
https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-844:
Attachment: NUTCH-844.patch
Updated patch. This also addresses an issue in PluginRepository
[
https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-844.
-
Resolution: Fixed
Committed in r964063. Thanks for review!
Improve NutchConfiguration
[
https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-844:
Attachment: conf.patch
Improve NutchConfiguration
--
[
https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886318#action_12886318
]
Andrzej Bialecki commented on NUTCH-843:
-
runtime/local doesn't need Hadoop
[
https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886330#action_12886330
]
Andrzej Bialecki commented on NUTCH-843:
-
Pseudo-distributed (i.e. a real
[
https://issues.apache.org/jira/browse/NUTCH-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-845.
-
Fix Version/s: 2.0
Resolution: Fixed
Committed in rev. 961778. Thanks for review!
Separate the build and runtime environments
---
Key: NUTCH-843
URL: https://issues.apache.org/jira/browse/NUTCH-843
Project: Nutch
Issue Type: Improvement
Components: build
Affects
[
https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-843:
Attachment: NUTCH-843.patch
This patch moves bin/nutch to src/bin/nutch, and creates
[
https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886015#action_12886015
]
Andrzej Bialecki commented on NUTCH-843:
-
We need to create the job file anyway.
[
https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-843:
Attachment: NUTCH-843.patch
Updated patch that moves nutch.jar to lib/ for the local
[
https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885583#action_12885583
]
Andrzej Bialecki commented on NUTCH-821:
-
+1 for this patch for now - all good
[
https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885188#action_12885188
]
Andrzej Bialecki commented on NUTCH-821:
-
I think this patch refers to some parts
[
https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-696:
Attachment: timeout.patch
A simple patch that implements the strategy outlined here
[
https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885257#action_12885257
]
Andrzej Bialecki commented on NUTCH-696:
-
Yes - this patch is a quick solution that
[
https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki reopened NUTCH-696:
-
This may be useful after all - let's gather more comments.
Timeout for Parser
[
https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885295#action_12885295
]
Andrzej Bialecki commented on NUTCH-696:
-
I agree, ultimately that's the way to go.
[
https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-837:
Attachment: NUTCH-837.patch
Updated patch against r959954 (after NUTCH-836).
Remove
[
https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-837:
Attachment: (was: NUTCH-837.patch)
Remove search servers and Lucene dependencies
[
https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884729#action_12884729
]
Andrzej Bialecki commented on NUTCH-837:
-
bq. So, I think we should still have a
[
https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-837.
-
Resolution: Fixed
Committed in r960064. Thanks for review!
Remove search servers and
[
https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki reassigned NUTCH-837:
---
Assignee: Andrzej Bialecki
Remove search servers and Lucene dependencies
[
https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883559#action_12883559
]
Andrzej Bialecki commented on NUTCH-650:
-
So far as one can digest such a giant
87 matches