[jira] Updated: (NUTCH-814) SegmentMerger bug

2010-04-27 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-814: Attachment: merger.patch Patch fixing the issue, and a unit test. I will commit this

[jira] Work stopped: (NUTCH-466) Flexible segment format

2010-04-27 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-466 stopped by Andrzej Bialecki. Flexible segment format --- Key: NUTCH-466 URL:

[jira] Updated: (NUTCH-812) Crawl.java incorrectly uses the Generator API resulting in NPE

2010-04-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-812: Affects Version/s: 1.1 Priority: Critical (was: Major) Crawl.java

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-03-30 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851331#action_12851331 ] Andrzej Bialecki commented on NUTCH-789: - There are no diffs, so it's difficult to

[jira] Commented: (NUTCH-784) CrawlDBScanner

2010-03-29 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850896#action_12850896 ] Andrzej Bialecki commented on NUTCH-784: - This should have been reviewed first - I

[jira] Commented: (NUTCH-785) Fetcher : copy metadata from origin URL when redirecting + call scfilters.initialScore on newly created URL

2010-03-29 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850931#action_12850931 ] Andrzej Bialecki commented on NUTCH-785: - +1. The scoring api should allow us to

[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-03-29 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850939#action_12850939 ] Andrzej Bialecki commented on NUTCH-779: - CrawlDbReducer, the cramped line {{if

[jira] Commented: (NUTCH-762) Alternative Generator which can generate several segments in one parse of the crawlDB

2010-03-22 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848090#action_12848090 ] Andrzej Bialecki commented on NUTCH-762: - I just noticed that the new Generator

[jira] Commented: (NUTCH-762) Alternative Generator which can generate several segments in one parse of the crawlDB

2010-03-22 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848110#action_12848110 ] Andrzej Bialecki commented on NUTCH-762: - bq. If we want to replace the old

[jira] Commented: (NUTCH-762) Alternative Generator which can generate several segments in one parse of the crawlDB

2010-03-22 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848173#action_12848173 ] Andrzej Bialecki commented on NUTCH-762: - bq. The change of prefix also reflected

[jira] Commented: (NUTCH-693) Add configurable option for treating nofollow behaviour.

2010-03-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847291#action_12847291 ] Andrzej Bialecki commented on NUTCH-693: - Thanks for the pointer to the article.

[jira] Updated: (NUTCH-693) Add configurable option for treating nofollow behaviour.

2010-03-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-693: Assignee: (was: Otis Gospodnetic) Add configurable option for treating nofollow

[jira] Assigned: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reassigned NUTCH-797: --- Assignee: Andrzej Bialecki parse-tika is not properly constructing URLs when the

[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847300#action_12847300 ] Andrzej Bialecki commented on NUTCH-797: - If there are no further comments I'm going

[jira] Updated: (NUTCH-787) Upgrade Lucene to 3.0.1.

2010-03-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-787: Assignee: Andrzej Bialecki Summary: Upgrade Lucene to 3.0.1. (was: Upgrade Lucene to

[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

2010-03-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847315#action_12847315 ] Andrzej Bialecki commented on NUTCH-787: - Using Lucene 3.0.1 artifacts I verified

[jira] Closed: (NUTCH-787) Upgrade Lucene to 3.0.1.

2010-03-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-787. --- Resolution: Fixed Committed. Thanks Dawid! Upgrade Lucene to 3.0.1.

[jira] Created: (NUTCH-803) Upgrade Hadoop to 0.20.2

2010-03-19 Thread Andrzej Bialecki (JIRA)
Upgrade Hadoop to 0.20.2 Key: NUTCH-803 URL: https://issues.apache.org/jira/browse/NUTCH-803 Project: Nutch Issue Type: Improvement Affects Versions: 1.1 Reporter: Andrzej Bialecki

[jira] Closed: (NUTCH-803) Upgrade Hadoop to 0.20.2

2010-03-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-803. --- Resolution: Fixed All tests pass - committed. Upgrade Hadoop to 0.20.2

[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846923#action_12846923 ] Andrzej Bialecki commented on NUTCH-797: - That's one option, at least until the

[jira] Commented: (NUTCH-762) Alternative Generator which can generate several segments in one parse of the crawlDB

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846927#action_12846927 ] Andrzej Bialecki commented on NUTCH-762: - In my experience the IP-based fetching

[jira] Reopened: (NUTCH-802) Problems managing outlinks with large url length

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reopened NUTCH-802: - Assignee: Andrzej Bialecki Submitting a patch is not fixing, it's fixed when the patch

[jira] Commented: (NUTCH-802) Problems managing outlinks with large url length

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846932#action_12846932 ] Andrzej Bialecki commented on NUTCH-802: - We already have a general way to control

[jira] Closed: (NUTCH-796) Zero results problems difficult to troubleshoot due to lack of logging

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-796. --- Resolution: Fixed Fix Version/s: 1.1 Assignee: Andrzej Bialecki Patch

[jira] Commented: (NUTCH-800) Generator builds a URL list that is not encoded

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847071#action_12847071 ] Andrzej Bialecki commented on NUTCH-800: - I'm puzzled by your problem description.

[jira] Commented: (NUTCH-693) Add configurable option for treating nofollow behaviour.

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847074#action_12847074 ] Andrzej Bialecki commented on NUTCH-693: - This patch is controversial in the sense

[jira] Commented: (NUTCH-795) Add ability to maintain nofollow attribute in linkdb

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847075#action_12847075 ] Andrzej Bialecki commented on NUTCH-795: - Please see my comment to that issue. Or

[jira] Commented: (NUTCH-780) Nutch crawler did not read configuration files

2010-03-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847094#action_12847094 ] Andrzej Bialecki commented on NUTCH-780: - Is the purpose of this issue to make

[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846402#action_12846402 ] Andrzej Bialecki commented on NUTCH-797: - Thanks for reporting this, and providing

[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846418#action_12846418 ] Andrzej Bialecki commented on NUTCH-797: - Hm, actually the picture is more

[jira] Updated: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-797: Attachment: pureQueryUrl-2.patch Updated patch with some refactoring and unit tests. If no

[jira] Updated: (NUTCH-796) Zero results problems difficult to troubleshoot due to lack of logging

2010-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-796: Attachment: logging.patch I propose this patch. If there are no objections I'll commit it

[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

2010-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846428#action_12846428 ] Andrzej Bialecki commented on NUTCH-787: - Lucene 3.0.1 is out now .. I'll test this

[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846437#action_12846437 ] Andrzej Bialecki commented on NUTCH-797: - Unfortunately the way your fix was

[jira] Assigned: (NUTCH-774) Retry interval in crawl date is set to 0

2010-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reassigned NUTCH-774: --- Assignee: Andrzej Bialecki Retry interval in crawl date is set to 0

[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846527#action_12846527 ] Andrzej Bialecki commented on NUTCH-797: - A few issues with this: * does this mean

[jira] Commented: (NUTCH-762) Alternative Generator which can generate several segments in one parse of the crawlDB

2010-03-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846133#action_12846133 ] Andrzej Bialecki commented on NUTCH-762: - It appears this class is not a strict

[jira] Commented: (NUTCH-762) Alternative Generator which can generate several segments in one parse of the crawlDB

2010-03-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846174#action_12846174 ] Andrzej Bialecki commented on NUTCH-762: - In case of users generating just 1

[jira] Commented: (NUTCH-798) Upgrade to SOLR1.4

2010-03-10 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843555#action_12843555 ] Andrzej Bialecki commented on NUTCH-798: - +1, preferably before the 1.1 freeze so

[jira] Commented: (NUTCH-801) Remove RTF and MP3 parse plugins

2010-03-10 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843587#action_12843587 ] Andrzej Bialecki commented on NUTCH-801: - Definitely +1, the only reason they

[jira] Commented: (NUTCH-799) SOLRIndexer to commit once all reducers have finished

2010-03-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841790#action_12841790 ] Andrzej Bialecki commented on NUTCH-799: - I think it's ok to do it this way - the

[jira] Commented: (NUTCH-766) Tika parser

2010-02-10 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832250#action_12832250 ] Andrzej Bialecki commented on NUTCH-766: - +1 to commit this - please remember to

[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-02-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830065#action_12830065 ] Andrzej Bialecki commented on NUTCH-673: - +1 on both counts. Upgrade to Lucene 3.0

[jira] Commented: (NUTCH-775) Enhance Searcher interface

2010-01-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806031#action_12806031 ] Andrzej Bialecki commented on NUTCH-775: - IMHO this could go as it is ... one

[jira] Commented: (NUTCH-766) Tika parser

2010-01-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804558#action_12804558 ] Andrzej Bialecki commented on NUTCH-766: - I agree with Chris, +1 on keeping the old

[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-01-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802175#action_12802175 ] Andrzej Bialecki commented on NUTCH-779: - Personally I would use ScoringFilters

[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-01-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801875#action_12801875 ] Andrzej Bialecki commented on NUTCH-779: - You can already achieve this with

[jira] Commented: (NUTCH-655) Injecting Crawl metadata

2010-01-05 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797013#action_12797013 ] Andrzej Bialecki commented on NUTCH-655: - I'm not sure about the latest addition

[jira] Commented: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2009-12-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791979#action_12791979 ] Andrzej Bialecki commented on NUTCH-666: - Do you think it was related to the

[jira] Commented: (NUTCH-775) Enhance Searcher interface

2009-12-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791411#action_12791411 ] Andrzej Bialecki commented on NUTCH-775: - +1. I would suggest creating a subclass

[jira] Commented: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2009-12-14 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790225#action_12790225 ] Andrzej Bialecki commented on NUTCH-666: - Dennis, what's the status of this patch

[jira] Updated: (NUTCH-767) Update Tika to v0.5 for the MimeType detection

2009-12-04 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-767: Remaining Estimate: 0h Original Estimate: 0h I applied the patch, and I'm closing this

[jira] Reopened: (NUTCH-767) Update Tika to v0.5 for the MimeType detection

2009-12-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reopened NUTCH-767: - Update Tika to v0.5 for the MimeType detection

[jira] Commented: (NUTCH-767) Update Tika to v0.5 for the MimeType detection

2009-12-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784790#action_12784790 ] Andrzej Bialecki commented on NUTCH-767: - Reopening this issue, because TestContent

[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

2009-12-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784206#action_12784206 ] Andrzej Bialecki commented on NUTCH-768: - +1. Minor nit: file

[jira] Commented: (NUTCH-770) Timebomb for Fetcher

2009-12-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784250#action_12784250 ] Andrzej Bialecki commented on NUTCH-770: - Fixed in rev. 885776. Thank you!

[jira] Closed: (NUTCH-770) Timebomb for Fetcher

2009-12-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-770. --- Resolution: Fixed Fix Version/s: 1.1 Assignee: Andrzej Bialecki Timebomb

[jira] Commented: (NUTCH-769) Fetcher to skip queues for URLS getting repeated exceptions

2009-12-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784260#action_12784260 ] Andrzej Bialecki commented on NUTCH-769: - I had to apply this patch by hand, due to

[jira] Closed: (NUTCH-769) Fetcher to skip queues for URLS getting repeated exceptions

2009-12-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-769. --- Resolution: Fixed Fix Version/s: 1.1 Assignee: Andrzej Bialecki Fetcher to

[jira] Closed: (NUTCH-767) Update Tika to v0.5 for the MimeType detection

2009-12-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-767. --- Resolution: Fixed Fix Version/s: 1.1 Assignee: Andrzej Bialecki (was: Chris

[jira] Commented: (NUTCH-767) Update Tika to v0.5 for the MimeType detection

2009-12-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784337#action_12784337 ] Andrzej Bialecki commented on NUTCH-767: - Fixed in rev. 885869. Thank you! Update

[jira] Commented: (NUTCH-770) Timebomb for Fetcher

2009-11-30 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783638#action_12783638 ] Andrzej Bialecki commented on NUTCH-770: - bq. time limit is definitely better

[jira] Commented: (NUTCH-770) Timebomb for Fetcher

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783283#action_12783283 ] Andrzej Bialecki commented on NUTCH-770: - I propose to change the name of this

[jira] Closed: (NUTCH-746) NutchBeanConstructor does not close NutchBean upon contextDestroyed, causing resource leak in the container.

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-746. --- Resolution: Fixed Assignee: Andrzej Bialecki NutchBeanConstructor does not close

[jira] Commented: (NUTCH-746) NutchBeanConstructor does not close NutchBean upon contextDestroyed, causing resource leak in the container.

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783287#action_12783287 ] Andrzej Bialecki commented on NUTCH-746: - Fixed in rev. 885148. Thanks!

[jira] Closed: (NUTCH-738) Close SegmentUpdater when FetchedSegments is closed

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-738. --- Resolution: Fixed Assignee: Andrzej Bialecki Close SegmentUpdater when

[jira] Closed: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-739. --- Resolution: Fixed Assignee: Andrzej Bialecki SolrDeleteDuplications too slow when

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783290#action_12783290 ] Andrzej Bialecki commented on NUTCH-739: - Fixed in rev. 885152. Thank you!

[jira] Closed: (NUTCH-755) DomainURLFilter crashes on malformed URL

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-755. --- Resolution: Cannot Reproduce Assignee: Andrzej Bialecki DomainURLFilter crashes on

[jira] Commented: (NUTCH-755) DomainURLFilter crashes on malformed URL

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783299#action_12783299 ] Andrzej Bialecki commented on NUTCH-755: - I could not verify that the filter indeed

[jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783302#action_12783302 ] Andrzej Bialecki commented on NUTCH-692: - We should review this issue after the

[jira] Commented: (NUTCH-741) Job file includes multiple copies of nutch config files.

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783304#action_12783304 ] Andrzej Bialecki commented on NUTCH-741: - Fixed in rev. 885156. Thank you! Job

[jira] Closed: (NUTCH-741) Job file includes multiple copies of nutch config files.

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-741. --- Resolution: Fixed Fix Version/s: 1.1 Assignee: Andrzej Bialecki Job file

[jira] Closed: (NUTCH-712) ParseOutputFormat should catch java.net.MalformedURLException coming from normalizers

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-712. --- Resolution: Fixed Fix Version/s: 1.1 Assignee: Andrzej Bialecki

[jira] Commented: (NUTCH-712) ParseOutputFormat should catch java.net.MalformedURLException coming from normalizers

2009-11-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783306#action_12783306 ] Andrzej Bialecki commented on NUTCH-712: - Fixed in rev. 885159. Thank you!

[jira] Created: (NUTCH-772) Upgrade Nutch to use Lucene 2.9.1

2009-11-25 Thread Andrzej Bialecki (JIRA)
Upgrade Nutch to use Lucene 2.9.1 - Key: NUTCH-772 URL: https://issues.apache.org/jira/browse/NUTCH-772 Project: Nutch Issue Type: Improvement Affects Versions: 1.1 Reporter: Andrzej Bialecki

[jira] Closed: (NUTCH-773) some minor bugs in AbstractFetchSchedule.java

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-773. --- Resolution: Fixed Assignee: Andrzej Bialecki some minor bugs in

[jira] Commented: (NUTCH-773) some minor bugs in AbstractFetchSchedule.java

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782509#action_12782509 ] Andrzej Bialecki commented on NUTCH-773: - That was a nasty bug - fixed in rev.

[jira] Commented: (NUTCH-753) Prevent new Fetcher to retrieve the robots twice

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782516#action_12782516 ] Andrzej Bialecki commented on NUTCH-753: - Fixed in rev. 884203 - thanks! Prevent

[jira] Closed: (NUTCH-753) Prevent new Fetcher to retrieve the robots twice

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-753. --- Resolution: Fixed Fix Version/s: 1.1 Assignee: Andrzej Bialecki Prevent new

[jira] Commented: (NUTCH-762) Alternative Generator which can generate several segments in one parse of the crawlDB

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782524#action_12782524 ] Andrzej Bialecki commented on NUTCH-762: - This class offers a strict superset of

[jira] Closed: (NUTCH-761) Avoid cloningCrawlDatum in CrawlDbReducer

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-761. --- Resolution: Fixed Fix Version/s: 1.1 Assignee: Andrzej Bialecki Avoid

[jira] Commented: (NUTCH-761) Avoid cloningCrawlDatum in CrawlDbReducer

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782537#action_12782537 ] Andrzej Bialecki commented on NUTCH-761: - I applied the patch with some changes -

[jira] Closed: (NUTCH-760) Allow field mapping from nutch to solr index

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-760. --- Resolution: Fixed Fix Version/s: 1.1 Allow field mapping from nutch to solr index

[jira] Commented: (NUTCH-760) Allow field mapping from nutch to solr index

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782617#action_12782617 ] Andrzej Bialecki commented on NUTCH-760: - I reworked the patch to get rid of any

[jira] Commented: (NUTCH-772) Upgrade Nutch to use Lucene 2.9.1

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782624#action_12782624 ] Andrzej Bialecki commented on NUTCH-772: - Fixed in rev. 884277. Upgrade Nutch to

[jira] Closed: (NUTCH-772) Upgrade Nutch to use Lucene 2.9.1

2009-11-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-772. --- Resolution: Fixed Fix Version/s: 1.1 Upgrade Nutch to use Lucene 2.9.1

[jira] Commented: (NUTCH-771) Add WebGraph classes to the bin/nutch script

2009-11-24 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782177#action_12782177 ] Andrzej Bialecki commented on NUTCH-771: - +1 to adding these to the script. The

[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

2009-11-24 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782179#action_12782179 ] Andrzej Bialecki commented on NUTCH-768: - Are there any source code changes

[jira] Commented: (NUTCH-764) Add support for vfsfile:// loading of plugins for JBoss

2009-11-10 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775365#action_12775365 ] Andrzej Bialecki commented on NUTCH-764: - First question is: why is it sometimes

[jira] Commented: (NUTCH-764) Add support for vfsfile:// loading of plugins for JBoss

2009-11-10 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776064#action_12776064 ] Andrzej Bialecki commented on NUTCH-764: - Thanks for the explanation. Well, it's

[jira] Commented: (NUTCH-760) Allow field mapping from nutch to solr index

2009-10-20 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767918#action_12767918 ] Andrzej Bialecki commented on NUTCH-760: - A few comments to the latest patch: *

[jira] Commented: (NUTCH-251) Administration GUI

2009-10-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765971#action_12765971 ] Andrzej Bialecki commented on NUTCH-251: - You can create a tar of everything and

[jira] Commented: (NUTCH-760) Allow field mapping from nutch to solr index

2009-10-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766213#action_12766213 ] Andrzej Bialecki commented on NUTCH-760: - Thanks David, this is a good start. We

[jira] Closed: (NUTCH-707) Generation of multiple segments in multiple runs returns only 1 segment

2009-10-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-707. --- Resolution: Fixed Assignee: Andrzej Bialecki Generation of multiple segments in

[jira] Commented: (NUTCH-707) Generation of multiple segments in multiple runs returns only 1 segment

2009-10-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763994#action_12763994 ] Andrzej Bialecki commented on NUTCH-707: - Fixed - the bug was actually present in

[jira] Closed: (NUTCH-730) NPE in LinkRank if no nodes with which to create the WebGraph

2009-10-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-730. --- Resolution: Fixed Assignee: Andrzej Bialecki (was: Dennis Kubes) NPE in LinkRank if

[jira] Commented: (NUTCH-730) NPE in LinkRank if no nodes with which to create the WebGraph

2009-10-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763998#action_12763998 ] Andrzej Bialecki commented on NUTCH-730: - Fixed in rev. 823532, thanks! NPE in

[jira] Commented: (NUTCH-731) Redirection of robots.txt in RobotRulesParser

2009-10-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12764009#action_12764009 ] Andrzej Bialecki commented on NUTCH-731: - Fixed in rev. 823540 - I applied a

[jira] Closed: (NUTCH-731) Redirection of robots.txt in RobotRulesParser

2009-10-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-731. --- Resolution: Fixed Assignee: Andrzej Bialecki (was: Otis Gospodnetic) Redirection of
