[jira] [Closed] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1340. --- Resolution: Fixed Increase scalability by only removing markers when they actually exist for

[jira] [Updated] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1340: Attachment: NUTCH-1340-v2.txt v2 of patch, including javadoc. This patch increases performance,

[jira] [Commented] (NUTCH-882) Design a Host table in GORA

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262488#comment-13262488 ] Ferdy Galema commented on NUTCH-882: Committed. I realize that the current state is far

[jira] [Resolved] (NUTCH-882) Design a Host table in GORA

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema resolved NUTCH-882. Resolution: Fixed Design a Host table in GORA ---

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262496#comment-13262496 ] Ferdy Galema commented on NUTCH-902: I think nutch-default.xml does not correctly use

[jira] [Commented] (NUTCH-1189) add commented out default settings to gora.properties files

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262506#comment-13262506 ] Ferdy Galema commented on NUTCH-1189: - FYI: I just committed a change to update the

[jira] [Closed] (NUTCH-882) Design a Host table in GORA

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-882. -- Ok. Thanks to anyone who was involved. Design a Host table in GORA

[jira] [Closed] (NUTCH-1290) crawlId not supported by all Tools

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1290. --- Resolution: Fixed crawlId not supported by all Tools --

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262548#comment-13262548 ] Ferdy Galema commented on NUTCH-902: Alright I'll change and commit the

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262556#comment-13262556 ] Ferdy Galema commented on NUTCH-902: Ok done. (Note that I did not actually check the

[jira] [Commented] (NUTCH-879) URL-s getting lost

2012-04-26 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262558#comment-13262558 ] Ferdy Galema commented on NUTCH-879: This is a pretty old issue. Nevertheless the bug

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-04-27 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: NUTCH-1205-v7.patch OK I got the tests working now. The problem is the fact that

[jira] [Commented] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-04-27 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263596#comment-13263596 ] Ferdy Galema commented on NUTCH-1205: - (Also I reformatted the ivy.xml to only include

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-04-27 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: NUTCH-1205-v8.patch Oops there still was a failure in a test later on.

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-05-02 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: (was: NUTCH-1205-v7.patch) Upgrade gora modules to 0.2 in ivy/ivy.xml

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-05-02 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: NUTCH-1205-v10.patch The tests now work and TestGoraStorage uses a proper standalone

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-05-02 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: (was: NUTCH-1205-v9.patch) Upgrade gora modules to 0.2 in ivy/ivy.xml

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-05-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: NUTCH-1205-v11.patch Upgrade gora modules to 0.2 in ivy/ivy.xml

[jira] [Resolved] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-05-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema resolved NUTCH-1205. - Resolution: Fixed Attached new patch v11. Committed. -Fixed the jdom issue. (Added test dep

[jira] [Updated] (NUTCH-896) Gora-based tests need to have their own config files

2012-05-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-896: --- Affects Version/s: (was: nutchgora) Fix Version/s: (was: 2.1)

[jira] [Commented] (NUTCH-1205) Upgrade gora modules to 0.2 in ivy/ivy.xml

2012-05-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267581#comment-13267581 ] Ferdy Galema commented on NUTCH-1205: - I committed a minor addition, that fixes the

[jira] [Commented] (NUTCH-1349) Make batchId explcit within debug logging.

2012-05-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267758#comment-13267758 ] Ferdy Galema commented on NUTCH-1349: - +1 This will also benefit other jobs depending

[jira] [Created] (NUTCH-1350) remove unused dependancy because of access restriction

2012-05-04 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1350: --- Summary: remove unused dependancy because of access restriction Key: NUTCH-1350 URL: https://issues.apache.org/jira/browse/NUTCH-1350 Project: Nutch Issue

[jira] [Commented] (NUTCH-1349) Make batchId explcit within debug logging and improve CLI

2012-05-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268397#comment-13268397 ] Ferdy Galema commented on NUTCH-1349: - Good work on improving the CLI. About the

[jira] [Created] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-05-07 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1352: --- Summary: Improve regex urlfilters/normalizers synchronization Key: NUTCH-1352 URL: https://issues.apache.org/jira/browse/NUTCH-1352 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1352: Attachment: NUTCH-1352.patch Improve regex urlfilters/normalizers synchronization

[jira] [Updated] (NUTCH-1353) nutchgora DomainStatistics support crawlId, counter bug and reformatting

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1353: Attachment: NUTCH-1353.patch nutchgora DomainStatistics support crawlId, counter bug and

[jira] [Closed] (NUTCH-1353) nutchgora DomainStatistics support crawlId, counter bug and reformatting

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1353. --- Resolution: Fixed committed nutchgora DomainStatistics support crawlId, counter

[jira] [Created] (NUTCH-1354) nutchgora support fetcher.queue.depth.multiplier property

2012-05-07 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1354: --- Summary: nutchgora support fetcher.queue.depth.multiplier property Key: NUTCH-1354 URL: https://issues.apache.org/jira/browse/NUTCH-1354 Project: Nutch Issue

[jira] [Updated] (NUTCH-1354) nutchgora support fetcher.queue.depth.multiplier property

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1354: Attachment: NUTCH-1354.patch nutchgora support fetcher.queue.depth.multiplier property

[jira] [Closed] (NUTCH-1354) nutchgora support fetcher.queue.depth.multiplier property

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1354. --- Resolution: Fixed committed nutchgora support fetcher.queue.depth.multiplier

[jira] [Updated] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1352: Fix Version/s: 1.5 Improve regex urlfilters/normalizers synchronization

[jira] [Commented] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269497#comment-13269497 ] Ferdy Galema commented on NUTCH-1352: - This indeed applies to trunk too. (Except for a

[jira] [Updated] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1352: Fix Version/s: (was: 1.5) 1.6 On second thought, I will hold commit for

[jira] [Created] (NUTCH-1355) nutchgora Configure minimum throughput for fetcher

2012-05-07 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1355: --- Summary: nutchgora Configure minimum throughput for fetcher Key: NUTCH-1355 URL: https://issues.apache.org/jira/browse/NUTCH-1355 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1355) nutchgora Configure minimum throughput for fetcher

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1355: Attachment: NUTCH-1355.patch nutchgora Configure minimum throughput for fetcher

[jira] [Created] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-07 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1356: --- Summary: ParseUtil use ExecutorService instead of manually thread handling. Key: NUTCH-1356 URL: https://issues.apache.org/jira/browse/NUTCH-1356 Project: Nutch

[jira] [Updated] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1356: Attachment: NUTCH-1356.patch ParseUtil use ExecutorService instead of manually thread

[jira] [Updated] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1356: Fix Version/s: 1.6 Sure will create patch for 1.x too. (Seems not that different).

[jira] [Updated] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1356: Attachment: NUTCH-1356-trunk.patch Patch for trunk. ParseUtil use

[jira] [Commented] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269619#comment-13269619 ] Ferdy Galema commented on NUTCH-1352: - Thanks. Improve regex

[jira] [Updated] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1356: Attachment: NUTCH-1356-trunk-v2.patch It was working though, I guess that is because of a

[jira] [Closed] (NUTCH-1355) nutchgora Configure minimum throughput for fetcher

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1355. --- Resolution: Fixed committed nutchgora Configure minimum throughput for fetcher

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269695#comment-13269695 ] Ferdy Galema commented on NUTCH-1356: - committed at nutchgora

[jira] [Commented] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-05-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269697#comment-13269697 ] Ferdy Galema commented on NUTCH-1352: - committed at nutchgora

[jira] [Created] (NUTCH-1357) All gora mapreduce functionality should go through StorageUtils

2012-05-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1357: --- Summary: All gora mapreduce functionality should go through StorageUtils Key: NUTCH-1357 URL: https://issues.apache.org/jira/browse/NUTCH-1357 Project: Nutch

[jira] [Created] (NUTCH-1358) Do not accept bogus arguments

2012-05-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1358: --- Summary: Do not accept bogus arguments Key: NUTCH-1358 URL: https://issues.apache.org/jira/browse/NUTCH-1358 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1358) Do not accept bogus arguments

2012-05-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1358: Attachment: NUTCH-1358.patch Do not accept bogus arguments -

[jira] [Closed] (NUTCH-1358) Do not accept bogus arguments

2012-05-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1358. --- Resolution: Fixed Committed. Do not accept bogus arguments

[jira] [Commented] (NUTCH-1357) All gora mapreduce functionality should go through StorageUtils

2012-05-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271421#comment-13271421 ] Ferdy Galema commented on NUTCH-1357: - Side note: It seems some tools do need to call

[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.

2012-05-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271445#comment-13271445 ] Ferdy Galema commented on NUTCH-1363: - Hey Lewis, This does work, with the

[jira] [Issue Comment Edited] (NUTCH-1363) Make parsing in FetcherJob actually work.

2012-05-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271445#comment-13271445 ] Ferdy Galema edited comment on NUTCH-1363 at 5/9/12 2:27 PM: -

[jira] [Updated] (NUTCH-1357) All gora mapreduce functionality should go through StorageUtils

2012-05-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1357: Fix Version/s: (was: nutchgora) On second thought, this can be solved later.

[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.

2012-05-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272150#comment-13272150 ] Ferdy Galema commented on NUTCH-1363: - I'm not sure I follow. What makes this property

[jira] [Created] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-05-10 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1365: --- Summary: Fix crawlId functionalilty by making using of new gora configuration Key: NUTCH-1365 URL: https://issues.apache.org/jira/browse/NUTCH-1365 Project: Nutch

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-05-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: NUTCH-1365.patch Fix crawlId functionalilty by making using of new gora

[jira] [Commented] (NUTCH-1306) Commit after finished writing to solr index

2012-05-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272231#comment-13272231 ] Ferdy Galema commented on NUTCH-1306: - Lewis, Do you suggest to add the commit as

[jira] [Closed] (NUTCH-1026) Strip UTF-8 non-character codepoints

2012-05-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1026. --- Resolution: Fixed Fix Version/s: (was: 2.1) nutchgora When indexing a

[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index

2012-05-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Attachment: NUTCH-1306-v2.patch NUTCH-1306-trunk.patch Agree with trying to make

[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Attachment: NUTCH-1306-trunk-v2.patch Heh indeed that's not ready for committing yet. Weird though

[jira] [Updated] (NUTCH-1362) Fix error handling of urls with empty fields

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1362: Attachment: NUTCH-1362.patch Hey Lewis, This patch fixes the problem and makes the reversing a

[jira] [Commented] (NUTCH-1362) Fix error handling of urls with empty fields

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273129#comment-13273129 ] Ferdy Galema commented on NUTCH-1362: - Btw this is a duplicate of NUTCH-1077.

[jira] [Closed] (NUTCH-1077) Nutch 2 DbUpdateMapper throws ArrayOutOfBoundsException when running update

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1077. --- Resolution: Duplicate Fix Version/s: (was: 2.1) Will be fixed with NUTCH-1362. (Use

[jira] [Closed] (NUTCH-1362) Fix error handling of urls with empty fields

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1362. --- Resolution: Fixed Done! Thanks. Fix error handling of urls with empty fields

[jira] [Created] (NUTCH-1366) speed up indexing by eliminating the indexreducer

2012-05-11 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1366: --- Summary: speed up indexing by eliminating the indexreducer Key: NUTCH-1366 URL: https://issues.apache.org/jira/browse/NUTCH-1366 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1366) speed up indexing by eliminating the indexreducer

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1366: Attachment: NUTCH-1366.patch speed up indexing by eliminating the indexreducer

[jira] [Commented] (NUTCH-1366) speed up indexing by eliminating the indexreducer

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273335#comment-13273335 ] Ferdy Galema commented on NUTCH-1366: - The cool part about Nutchgora is that inlinks

[jira] [Commented] (NUTCH-1367) Port ParserChecker to Nutchgora

2012-05-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274590#comment-13274590 ] Ferdy Galema commented on NUTCH-1367: - Hey Lewis, This tool is already present in

[jira] [Closed] (NUTCH-1366) speed up indexing by eliminating the indexreducer

2012-05-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1366. --- Resolution: Fixed committed speed up indexing by eliminating the indexreducer

[jira] [Commented] (NUTCH-879) URL-s getting lost

2012-05-23 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281459#comment-13281459 ] Ferdy Galema commented on NUTCH-879: Agree to fix this issue later. Although I could

[jira] [Created] (NUTCH-1378) HostDb NullPointerException

2012-05-23 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1378: --- Summary: HostDb NullPointerException Key: NUTCH-1378 URL: https://issues.apache.org/jira/browse/NUTCH-1378 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1378) HostDb NullPointerException

2012-05-23 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1378: Attachment: NUTCH-1378.patch HostDb NullPointerException ---

[jira] [Closed] (NUTCH-1378) HostDb NullPointerException

2012-05-23 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1378. --- Resolution: Fixed HostDb NullPointerException ---

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285510#comment-13285510 ] Ferdy Galema commented on NUTCH-1356: - I find it difficult to believe those exceptions

[jira] [Created] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1379: --- Summary: NPE when reprUrl is null in ParseUtil Key: NUTCH-1379 URL: https://issues.apache.org/jira/browse/NUTCH-1379 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1379: Attachment: NUTCH-1379.patch committed NPE when reprUrl is null in ParseUtil

[jira] [Reopened] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema reopened NUTCH-1379: - NPE when reprUrl is null in ParseUtil -

[jira] [Closed] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1379. --- Resolution: Fixed NPE when reprUrl is null in ParseUtil -

[jira] [Updated] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1379: Description: Sometimes reprUrl is null in ParseUtil. Exact cause is still fuzzy but this is a

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-06-12 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293510#comment-13293510 ] Ferdy Galema commented on NUTCH-1356: - Thanks. The parser threads you refer to, is

[jira] [Created] (NUTCH-1387) All parsers should respond to cancellation.

2012-06-12 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1387: --- Summary: All parsers should respond to cancellation. Key: NUTCH-1387 URL: https://issues.apache.org/jira/browse/NUTCH-1387 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1387) All parsers should respond to cancellation / interrupts.

2012-06-12 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1387: Component/s: parser Summary: All parsers should respond to cancellation / interrupts.

[jira] [Commented] (NUTCH-1342) Read time out protocol-http

2012-06-13 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294429#comment-13294429 ] Ferdy Galema commented on NUTCH-1342: - Do you have any clue as to why

[jira] [Updated] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob

2012-06-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1392: Attachment: NUTCH-1392.patch -force and -resume arguments being ignored in ParserJob

[jira] [Commented] (NUTCH-1081) ant tests fail

2012-06-15 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295676#comment-13295676 ] Ferdy Galema commented on NUTCH-1081: - Yes this one should be closed.

[jira] [Created] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-06-27 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1411: --- Summary: nutchgora fetcher.store.content does not work Key: NUTCH-1411 URL: https://issues.apache.org/jira/browse/NUTCH-1411 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Summary: Add option to not commit and clarify existing solr.commit.size (was: Commit after

[jira] [Commented] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406363#comment-13406363 ] Ferdy Galema commented on NUTCH-1306: - New option added solr.commit.index Defaults to

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406495#comment-13406495 ] Ferdy Galema commented on NUTCH-1360: - Sorry for the late response, but this issue is

[jira] [Comment Edited] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406495#comment-13406495 ] Ferdy Galema edited comment on NUTCH-1360 at 7/4/12 1:41 PM: -

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406561#comment-13406561 ] Ferdy Galema commented on NUTCH-1360: - Just one more thing: Should the IP not be

[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Attachment: NUTCH-1306-trunk-v3.patch minor bug in prev. patch. uploaded v3 of trunk patch.

[jira] [Created] (NUTCH-1423) Remove unused fields in LanguageIndexingFilter

2012-07-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1423: --- Summary: Remove unused fields in LanguageIndexingFilter Key: NUTCH-1423 URL: https://issues.apache.org/jira/browse/NUTCH-1423 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1423) Remove unused fields in LanguageIndexingFilter

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1423: Attachment: NUTCH-1423.patch Remove unused fields in LanguageIndexingFilter

[jira] [Updated] (NUTCH-1424) fix fetcher timelimit logging

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1424: Attachment: NUTCH-1424.patch fix fetcher timelimit logging --

[jira] [Closed] (NUTCH-1424) fix fetcher timelimit logging

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1424. --- Resolution: Fixed Committed. fix fetcher timelimit logging

[jira] [Created] (NUTCH-1425) DbUpdaterJob declares PREV_SIGNATURE on input twice

2012-07-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1425: --- Summary: DbUpdaterJob declares PREV_SIGNATURE on input twice Key: NUTCH-1425 URL: https://issues.apache.org/jira/browse/NUTCH-1425 Project: Nutch Issue Type:

[jira] [Closed] (NUTCH-1425) DbUpdaterJob declares PREV_SIGNATURE on input twice

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1425. --- Resolution: Fixed Committed. DbUpdaterJob declares PREV_SIGNATURE on input twice
