[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

2011-08-23 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089405#comment-13089405 ] Andrzej Bialecki commented on NUTCH-1087: -- IIRC we had this discussion in the

[jira] [Commented] (NUTCH-1014) Migrate from Apache ORO to java.util.regex

2011-07-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067972#comment-13067972 ] Andrzej Bialecki commented on NUTCH-1014: -- java.util.regex has the advantage of

[jira] [Commented] (NUTCH-985) MoreIndexingFilter doesn't use properly formatted date fields for Solr

2011-05-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034724#comment-13034724 ] Andrzej Bialecki commented on NUTCH-985: - We should use the Solr's DateUtil in all

[jira] Resolved: (NUTCH-951) Backport changes from 2.0 into 1.3

2011-03-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-951. - Resolution: Fixed Backport changes from 2.0 into 1.3 --

[jira] Commented: (NUTCH-951) Backport changes from 2.0 into 1.3

2011-03-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004488#comment-13004488 ] Andrzej Bialecki commented on NUTCH-951: - * Ported NUTCH-872 in rev. 1079746. *

[jira] Resolved: (NUTCH-962) max. redirects not handled correctly: fetcher stops at max-1 redirects

2011-03-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-962. - Resolution: Fixed Fix Version/s: 2.0 1.3 Assignee:

[jira] Resolved: (NUTCH-955) Ivy configuration

2011-03-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-955. - Resolution: Fixed Fix Version/s: 2.0 Assignee: Andrzej Bialecki Ivy

[jira] Resolved: (NUTCH-939) Added -dir command line option to Indexer and SolrIndexer, allowing to specify directory containing segments

2010-12-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-939. - Resolution: Fixed Assignee: Andrzej Bialecki I modified the patch slightly to

[jira] Created: (NUTCH-948) Remove Lucene dependencies

2010-12-21 Thread Andrzej Bialecki (JIRA)
Remove Lucene dependencies -- Key: NUTCH-948 URL: https://issues.apache.org/jira/browse/NUTCH-948 Project: Nutch Issue Type: Improvement Affects Versions: 1.3 Reporter: Andrzej Bialecki

[jira] Resolved: (NUTCH-948) Remove Lucene dependencies

2010-12-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-948. - Resolution: Fixed Committed in rev. 1051509. Remove Lucene dependencies

[jira] Commented: (NUTCH-939) Added -dir command line option to Indexer and SolrIndexer, allowing to specify directory containing segments

2010-12-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973915#action_12973915 ] Andrzej Bialecki commented on NUTCH-939: - 1.2 release is out, and branch-1.2 is

[jira] Commented: (NUTCH-939) Added -dir command line option to Indexer and SolrIndexer, allowing to specify directory containing segments

2010-11-26 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936047#action_12936047 ] Andrzej Bialecki commented on NUTCH-939: - Please note that trunk uses a very

[jira] Updated: (NUTCH-932) Bulk REST API to retrieve crawl results as JSON

2010-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-932: Attachment: NUTCH-932-4.patch Final version of the patch. Bulk REST API to retrieve crawl

[jira] Resolved: (NUTCH-932) Bulk REST API to retrieve crawl results as JSON

2010-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-932. - Resolution: Fixed Fix Version/s: 2.0 Committed in rev. 1039014. Bulk REST API to

[jira] Updated: (NUTCH-932) Bulk REST API to retrieve crawl results as JSON

2010-11-12 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-932: Attachment: NUTCH-932-3.patch NutchTool is an abstract class in this patch. This actually

[jira] Commented: (NUTCH-880) REST API for Nutch

2010-11-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928909#action_12928909 ] Andrzej Bialecki commented on NUTCH-880: - Thanks - this issue is already fixed in

[jira] Updated: (NUTCH-932) Bulk REST API to retrieve crawl results as JSON

2010-11-04 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-932: Attachment: NUTCH-932.patch This patch adds bulk retrieval of crawl results. This is still

[jira] Updated: (NUTCH-932) Bulk REST API to retrieve crawl results as JSON

2010-11-04 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-932: Attachment: db.formatted.gz Example DB content (this was passed through a JSON

[jira] Commented: (NUTCH-932) Bulk REST API to retrieve crawl results as JSON

2010-11-04 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928355#action_12928355 ] Andrzej Bialecki commented on NUTCH-932: - Examples (with the db equivalent to the

[jira] Updated: (NUTCH-932) Bulk REST API to retrieve crawl results as JSON

2010-11-04 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-932: Attachment: NUTCH-932.patch Updated patch - this recognizes now URL parameters such as

[jira] Resolved: (NUTCH-931) Simple admin API to fetch status and stop the service

2010-10-29 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-931. - Resolution: Fixed Committed in rev. 1028736 with some changes. Simple admin API to

[jira] Updated: (NUTCH-880) REST API for Nutch

2010-10-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-880: Summary: REST API for Nutch (was: REST API (and webapp) for Nutch) The webapp part is

[jira] Resolved: (NUTCH-880) REST API for Nutch

2010-10-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-880. - Resolution: Fixed Fix Version/s: 2.0 Committed in rev. 1028235. The webapp part of

[jira] Created: (NUTCH-930) Remove remaining dependencies on Lucene API

2010-10-28 Thread Andrzej Bialecki (JIRA)
Remove remaining dependencies on Lucene API --- Key: NUTCH-930 URL: https://issues.apache.org/jira/browse/NUTCH-930 Project: Nutch Issue Type: Improvement Affects Versions: 2.0

[jira] Updated: (NUTCH-930) Remove remaining dependencies on Lucene API

2010-10-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-930: Attachment: NUTCH-930.patch Patch to fix the issue. I'll commit this shortly. Remove

[jira] Resolved: (NUTCH-930) Remove remaining dependencies on Lucene API

2010-10-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-930. - Resolution: Fixed Fix Version/s: 2.0 Committed in rev. 1028474. Remove remaining

[jira] Created: (NUTCH-931) Simple admin API to fetch status and stop the service

2010-10-28 Thread Andrzej Bialecki (JIRA)
Simple admin API to fetch status and stop the service - Key: NUTCH-931 URL: https://issues.apache.org/jira/browse/NUTCH-931 Project: Nutch Issue Type: Improvement Components:

[jira] Commented: (NUTCH-926) Nutch follows wrong url in META http-equiv=refresh tag

2010-10-27 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925543#action_12925543 ] Andrzej Bialecki commented on NUTCH-926: - bq. Nutch continues to crawl the WRONG

[jira] Commented: (NUTCH-913) Nutch should use new namespace for Gora

2010-10-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924659#action_12924659 ] Andrzej Bialecki commented on NUTCH-913: - +1, let's commit it - I want to start

[jira] Commented: (NUTCH-923) Multilingual support for Solr-index-mapping

2010-10-23 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924154#action_12924154 ] Andrzej Bialecki commented on NUTCH-923: - This doesn't solve the problem of

[jira] Commented: (NUTCH-924) Static field in solr mapping

2010-10-22 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923845#action_12923845 ] Andrzej Bialecki commented on NUTCH-924: - The functionality is useful, +1. But the

[jira] Commented: (NUTCH-923) Multilingual support for Solr-index-mapping

2010-10-22 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923896#action_12923896 ] Andrzej Bialecki commented on NUTCH-923: - This sounds useful, though the

[jira] Updated: (NUTCH-921) Reduce dependency of Nutch on config files

2010-10-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-921: Attachment: NUTCH-921.patch Patch that implements reading config parameters from

[jira] Commented: (NUTCH-913) Nutch should use new namespace for Gora

2010-10-13 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920610#action_12920610 ] Andrzej Bialecki commented on NUTCH-913: - There are formatting issues in

[jira] Commented: (NUTCH-907) DataStore API doesn't support multiple storage areas for multiple disjoint crawls

2010-10-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916870#action_12916870 ] Andrzej Bialecki commented on NUTCH-907: - Hi Sertan, Thanks for the patch, this

[jira] Commented: (NUTCH-882) Design a Host table in GORA

2010-10-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916874#action_12916874 ] Andrzej Bialecki commented on NUTCH-882: - Doğacan, I missed your previous

[jira] Commented: (NUTCH-864) Fetcher generates entries with status 0

2010-10-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916912#action_12916912 ] Andrzej Bialecki commented on NUTCH-864: - I think the difficulty comes from the

[jira] Commented: (NUTCH-880) REST API (and webapp) for Nutch

2010-09-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913118#action_12913118 ] Andrzej Bialecki commented on NUTCH-880: - bq. I think we can combine the approach

[jira] Commented: (NUTCH-909) Add alternative search-provider to Nutch site

2010-09-20 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912474#action_12912474 ] Andrzej Bialecki commented on NUTCH-909: - bq. It might be better to see the message

[jira] Assigned: (NUTCH-862) HttpClient null pointer exception

2010-09-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reassigned NUTCH-862: --- Assignee: Andrzej Bialecki HttpClient null pointer exception

[jira] Resolved: (NUTCH-906) Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not being valid XML tag names

2010-09-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-906. - Fix Version/s: 1.2 Resolution: Fixed Fixed in rev. 998261. Thanks! Nutch

[jira] Commented: (NUTCH-907) DataStore API doesn't support multiple storage areas for multiple disjoint crawls

2010-09-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910109#action_12910109 ] Andrzej Bialecki commented on NUTCH-907: - That's very good news - in that case I'm

[jira] Updated: (NUTCH-880) REST API (and webapp) for Nutch

2010-09-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-880: Attachment: API.patch Initial patch for discussion. This is a work in progress, so only

[jira] Created: (NUTCH-907) DataStore API doesn't support multiple storage areas for multiple disjoint crawls

2010-09-15 Thread Andrzej Bialecki (JIRA)
DataStore API doesn't support multiple storage areas for multiple disjoint crawls - Key: NUTCH-907 URL: https://issues.apache.org/jira/browse/NUTCH-907 Project: Nutch

[jira] Commented: (NUTCH-882) Design a Host table in GORA

2010-09-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909757#action_12909757 ] Andrzej Bialecki commented on NUTCH-882: - +1 to NutchContext. See also NUTCH-907

[jira] Commented: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-09-13 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908791#action_12908791 ] Andrzej Bialecki commented on NUTCH-893: - +1 and +1. DataStore.put() silently

[jira] Commented: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-09-08 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907297#action_12907297 ] Andrzej Bialecki commented on NUTCH-893: - Very good catch - yes, the test now

[jira] Commented: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-08-30 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904226#action_12904226 ] Andrzej Bialecki commented on NUTCH-893: - Dogacan, flush() doesn't help - there are

[jira] Created: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-08-25 Thread Andrzej Bialecki (JIRA)
DataStore.put() silently loses records when executed from multiple processes Key: NUTCH-893 URL: https://issues.apache.org/jira/browse/NUTCH-893 Project: Nutch

[jira] Updated: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-08-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-893: Attachment: NUTCH-893.patch Unit test to illustrate the issue. DataStore.put() silently

[jira] Created: (NUTCH-891) Nutch build should not depend on unversioned local deps

2010-08-19 Thread Andrzej Bialecki (JIRA)
Nutch build should not depend on unversioned local deps --- Key: NUTCH-891 URL: https://issues.apache.org/jira/browse/NUTCH-891 Project: Nutch Issue Type: Bug Reporter: Andrzej

[jira] Commented: (NUTCH-891) Nutch build should not depend on unversioned local deps

2010-08-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900455#action_12900455 ] Andrzej Bialecki commented on NUTCH-891: - Yes, this would help. Nutch build

[jira] Commented: (NUTCH-882) Design a Host table in GORA

2010-08-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899810#action_12899810 ] Andrzej Bialecki commented on NUTCH-882: - This functionality is very useful for

[jira] Updated: (NUTCH-880) REST API (and webapp) for Nutch

2010-08-11 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-880: Description: This issue is for discussing a REST-style API for accessing Nutch. Here's an

[jira] Created: (NUTCH-884) FetcherJob should run more reduce tasks than default

2010-08-11 Thread Andrzej Bialecki (JIRA)
FetcherJob should run more reduce tasks than default Key: NUTCH-884 URL: https://issues.apache.org/jira/browse/NUTCH-884 Project: Nutch Issue Type: Improvement Components:

[jira] Resolved: (NUTCH-872) Change the default fetcher.parse to FALSE

2010-08-11 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-872. - Fix Version/s: 2.0 Resolution: Fixed I changed the name of the option to -parse to

[jira] Updated: (NUTCH-884) FetcherJob should run more reduce tasks than default

2010-08-11 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-884: Attachment: NUTCH-884.patch Patch with the change. I also rearranged the arguments to

[jira] Created: (NUTCH-879) URL-s getting lost

2010-08-10 Thread Andrzej Bialecki (JIRA)
URL-s getting lost -- Key: NUTCH-879 URL: https://issues.apache.org/jira/browse/NUTCH-879 Project: Nutch Issue Type: Bug Affects Versions: 2.0 Environment: * Ubuntu 10.4 x64, Sun JDK 1.6 * using 1-node Hadoop +

[jira] Updated: (NUTCH-876) Remove remaining robots/IP blocking code in lib-http

2010-08-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-876: Attachment: NUTCH-876.patch Patch to fix the issue. If there are no objections I'll commit

[jira] Commented: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-08-04 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895377#action_12895377 ] Andrzej Bialecki commented on NUTCH-858: - It was r960064, but I have to admit I

[jira] Updated: (NUTCH-867) Port Nutch benchmark to Nutchbase

2010-08-04 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-867: Attachment: benchmark.patch Ported benchmark that uses HSQLDB as the store impl. If there

[jira] Created: (NUTCH-867) Port Nutch benchmark to Nutchbase

2010-07-31 Thread Andrzej Bialecki (JIRA)
Port Nutch benchmark to Nutchbase - Key: NUTCH-867 URL: https://issues.apache.org/jira/browse/NUTCH-867 Project: Nutch Issue Type: New Feature Affects Versions: nutchbase Reporter: Andrzej

[jira] Resolved: (NUTCH-863) Benchmark and a testbed proxy server

2010-07-30 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-863. - Fix Version/s: 2.0 Resolution: Fixed Committed in rev. 980932. Benchmark and a

[jira] Updated: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-07-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-858: Assignee: Andrzej Bialecki Fix Version/s: 1.2 No longer able to set per-field

[jira] Commented: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-07-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890873#action_12890873 ] Andrzej Bialecki commented on NUTCH-858: - Unfortunately no. The patch was included

[jira] Updated: (NUTCH-844) Improve NutchConfiguration

2010-07-14 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-844: Attachment: NUTCH-844.patch Updated patch. This also addresses an issue in PluginRepository

[jira] Resolved: (NUTCH-844) Improve NutchConfiguration

2010-07-14 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-844. - Resolution: Fixed Committed in r964063. Thanks for review! Improve NutchConfiguration

[jira] Updated: (NUTCH-844) Improve NutchConfiguration

2010-07-08 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-844: Attachment: conf.patch Improve NutchConfiguration --

[jira] Commented: (NUTCH-843) Separate the build and runtime environments

2010-07-08 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886318#action_12886318 ] Andrzej Bialecki commented on NUTCH-843: - runtime/local doesn't need Hadoop

[jira] Commented: (NUTCH-843) Separate the build and runtime environments

2010-07-08 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886330#action_12886330 ] Andrzej Bialecki commented on NUTCH-843: - Pseudo-distributed (i.e. a real

[jira] Resolved: (NUTCH-845) Native hadoop libs not available through maven

2010-07-08 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-845. - Fix Version/s: 2.0 Resolution: Fixed Committed in rev. 961778. Thanks for review!

[jira] Created: (NUTCH-843) Separate the build and runtime environments

2010-07-07 Thread Andrzej Bialecki (JIRA)
Separate the build and runtime environments --- Key: NUTCH-843 URL: https://issues.apache.org/jira/browse/NUTCH-843 Project: Nutch Issue Type: Improvement Components: build Affects

[jira] Updated: (NUTCH-843) Separate the build and runtime environments

2010-07-07 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-843: Attachment: NUTCH-843.patch This patch moves bin/nutch to src/bin/nutch, and creates

[jira] Commented: (NUTCH-843) Separate the build and runtime environments

2010-07-07 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886015#action_12886015 ] Andrzej Bialecki commented on NUTCH-843: - We need to create the job file anyway.

[jira] Updated: (NUTCH-843) Separate the build and runtime environments

2010-07-07 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-843: Attachment: NUTCH-843.patch Updated patch that moves nutch.jar to lib/ for the local

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-06 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885583#action_12885583 ] Andrzej Bialecki commented on NUTCH-821: - +1 for this patch for now - all good

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885188#action_12885188 ] Andrzej Bialecki commented on NUTCH-821: - I think this patch refers to some parts

[jira] Updated: (NUTCH-696) Timeout for Parser

2010-07-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-696: Attachment: timeout.patch A simple patch that implements the strategy outlined here

[jira] Commented: (NUTCH-696) Timeout for Parser

2010-07-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885257#action_12885257 ] Andrzej Bialecki commented on NUTCH-696: - Yes - this patch is a quick solution that

[jira] Reopened: (NUTCH-696) Timeout for Parser

2010-07-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reopened NUTCH-696: - This may be useful after all - let's gather more comments. Timeout for Parser

[jira] Commented: (NUTCH-696) Timeout for Parser

2010-07-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885295#action_12885295 ] Andrzej Bialecki commented on NUTCH-696: - I agree, ultimately that's the way to go.

[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-837: Attachment: NUTCH-837.patch Updated patch against r959954 (after NUTCH-836). Remove

[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-837: Attachment: (was: NUTCH-837.patch) Remove search servers and Lucene dependencies

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884729#action_12884729 ] Andrzej Bialecki commented on NUTCH-837: - bq. So, I think we should still have a

[jira] Resolved: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-837. - Resolution: Fixed Committed in r960064. Thanks for review! Remove search servers and

[jira] Assigned: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reassigned NUTCH-837: --- Assignee: Andrzej Bialecki Remove search servers and Lucene dependencies

[jira] Commented: (NUTCH-650) Hbase Integration

2010-06-29 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883559#action_12883559 ] Andrzej Bialecki commented on NUTCH-650: - So far as one can digest such a giant