[jira] [Commented] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once

2013-07-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723956#comment-13723956 ] Ferdy Galema commented on NUTCH-1457: - Hi, Thanks for submitting the patch. It seems

[jira] [Commented] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2013-01-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13545748#comment-13545748 ] Ferdy Galema commented on NUTCH-1508: - Hi, Is this related to?

[jira] [Comment Edited] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2013-01-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13545748#comment-13545748 ] Ferdy Galema edited comment on NUTCH-1508 at 1/7/13 10:15 AM: --

[jira] [Commented] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2013-01-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13545766#comment-13545766 ] Ferdy Galema commented on NUTCH-1508: - NUTCH-1431 (aka 'distance' concept) only

[jira] [Commented] (NUTCH-1495) -normalize and -filter for updatedb command in nutch 2.x

2012-11-19 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500895#comment-13500895 ] Ferdy Galema commented on NUTCH-1495: - Hi, Nice one! I took a glance at your patch

[jira] [Commented] (NUTCH-1484) TableUtil unreverseURL fails on file:// URLs

2012-11-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493820#comment-13493820 ] Ferdy Galema commented on NUTCH-1484: - Hi, I checked the patch (attached in

[jira] [Commented] (NUTCH-1489) elasticindex should report the indexed documents like solrindex does

2012-11-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493829#comment-13493829 ] Ferdy Galema commented on NUTCH-1489: - Agree with Lewis, it seems there is already

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

2012-11-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493885#comment-13493885 ] Ferdy Galema commented on NUTCH-1370: - Hi, I checked the patch, it seems you are

[jira] [Commented] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once

2012-11-08 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493565#comment-13493565 ] Ferdy Galema commented on NUTCH-1457: - There is a limited description of the Nutch2

[jira] [Commented] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once

2012-11-06 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491289#comment-13491289 ] Ferdy Galema commented on NUTCH-1457: - Hi, Not really because with a partial update

[jira] [Commented] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once

2012-10-08 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471480#comment-13471480 ] Ferdy Galema commented on NUTCH-1457: - Included effort is resolving the conflict of

[jira] [Resolved] (NUTCH-1468) Redirects that are external links not adhering to db.ignore.external.links

2012-09-17 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema resolved NUTCH-1468. - Resolution: Fixed Fix Version/s: 2.1 Committed @ Nutch2.x ref 1386526 Thanks for the

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445849#comment-13445849 ] Ferdy Galema commented on NUTCH-1445: - Hi Matt, Sure we can resolve your issue here.

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445850#comment-13445850 ] Ferdy Galema commented on NUTCH-1445: - (feature requests should be future requests

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445871#comment-13445871 ] Ferdy Galema commented on NUTCH-1445: - Ah I got it now. It's definitely a bug. When

[jira] [Created] (NUTCH-1462) Elasticsearch not indexing when type==null in NutchDocument metadata

2012-08-31 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1462: --- Summary: Elasticsearch not indexing when type==null in NutchDocument metadata Key: NUTCH-1462 URL: https://issues.apache.org/jira/browse/NUTCH-1462 Project: Nutch

[jira] [Updated] (NUTCH-1462) Elasticsearch not indexing when type==null in NutchDocument metadata

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1462: Attachment: nutch-1462.patch Elasticsearch not indexing when type==null in NutchDocument

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445878#comment-13445878 ] Ferdy Galema commented on NUTCH-1445: - Created NUTCH-1462 for a fix. For a quick-fix

[jira] [Closed] (NUTCH-1462) Elasticsearch not indexing when type==null in NutchDocument metadata

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1462. --- Resolution: Fixed committed Elasticsearch not indexing when type==null in

[jira] [Created] (NUTCH-1463) Elasticsearch indexer should wait and check response for last flush

2012-08-31 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1463: --- Summary: Elasticsearch indexer should wait and check response for last flush Key: NUTCH-1463 URL: https://issues.apache.org/jira/browse/NUTCH-1463 Project: Nutch

[jira] [Updated] (NUTCH-1463) Elasticsearch indexer should wait and check response for last flush

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1463: Attachment: nutch-1463.patch Elasticsearch indexer should wait and check response for last

[jira] [Closed] (NUTCH-1463) Elasticsearch indexer should wait and check response for last flush

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1463. --- Resolution: Fixed committed. Elasticsearch indexer should wait and check response

[jira] [Closed] (NUTCH-1448) Redirected urls should be handled more cleanly (more like an outlink url)

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1448. --- Resolution: Fixed Committed. Redirected urls should be handled more cleanly (more

[jira] [Closed] (NUTCH-1431) Introduce link 'distance' and add configurable max distance in the generator

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1431. --- Resolution: Fixed committed Introduce link 'distance' and add configurable max

[jira] [Commented] (NUTCH-872) Change the default fetcher.parse to FALSE

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446511#comment-13446511 ] Ferdy Galema commented on NUTCH-872: Yes that is correct. Change the

[jira] [Commented] (NUTCH-1448) Redirected urls should be handled more cleanly (more like an outlink url)

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446515#comment-13446515 ] Ferdy Galema commented on NUTCH-1448: - Yes it does show up as an outlink. About your

[jira] [Commented] (NUTCH-1461) Problem with TableUtil

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446518#comment-13446518 ] Ferdy Galema commented on NUTCH-1461: - Added comment in NUTCH-1448.

[jira] [Updated] (NUTCH-1448) Redirected urls should be handled more cleanly (more like an outlink url)

2012-08-28 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1448: Attachment: nutch-1448.txt Thank you for your interest Christian. This issue should indeed prevent

[jira] [Created] (NUTCH-1459) Remove dead code (phase2) from InjectorJob

2012-08-17 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1459: --- Summary: Remove dead code (phase2) from InjectorJob Key: NUTCH-1459 URL: https://issues.apache.org/jira/browse/NUTCH-1459 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1442) indexingfilter.order is property is misread in code

2012-08-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433955#comment-13433955 ] Ferdy Galema commented on NUTCH-1442: - Thanks. Looks fine. Assertions should not

[jira] [Commented] (NUTCH-1444) Indexing should not create temporary files (do not extend from FileOutputFormat)

2012-08-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429986#comment-13429986 ] Ferdy Galema commented on NUTCH-1444: - Just to add: The following exception is fixed

[jira] [Created] (NUTCH-1446) Port NUTCH-1444 to trunk (Indexing should not create temporary files)

2012-08-06 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1446: --- Summary: Port NUTCH-1444 to trunk (Indexing should not create temporary files) Key: NUTCH-1446 URL: https://issues.apache.org/jira/browse/NUTCH-1446 Project: Nutch

[jira] [Updated] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1445: Attachment: NUTCH-1445-addPropsToConfig.patch Final addition that adds the properties to

[jira] [Closed] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1445. --- Resolution: Fixed Add ElasticIndexerJob that indexes to elasticsearch

[jira] [Created] (NUTCH-1438) ParserJob support for option -reparse

2012-07-26 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1438: --- Summary: ParserJob support for option -reparse Key: NUTCH-1438 URL: https://issues.apache.org/jira/browse/NUTCH-1438 Project: Nutch Issue Type: New Feature

[jira] [Updated] (NUTCH-1437) HostInjectorJob to accept lines with or without protocol

2012-07-25 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1437: Attachment: NUTCH-1437.patch HostInjectorJob to accept lines with or without protocol

[jira] [Closed] (NUTCH-1437) HostInjectorJob to accept lines with or without protocol

2012-07-25 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1437. --- Resolution: Cannot Reproduce committed HostInjectorJob to accept lines with or

[jira] [Reopened] (NUTCH-1437) HostInjectorJob to accept lines with or without protocol

2012-07-25 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema reopened NUTCH-1437: - HostInjectorJob to accept lines with or without protocol

[jira] [Closed] (NUTCH-1437) HostInjectorJob to accept lines with or without protocol

2012-07-25 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1437. --- Resolution: Fixed reopening/closing to set correct resolve status (FIXED).

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-07-20 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: NUTCH-1365-v3.patch Small improvement of the patch by showing the crawlId name in the

[jira] [Created] (NUTCH-1432) property storage.schema does not work anymore, should be storage.schema.webpage and storage.schema.host

2012-07-19 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1432: --- Summary: property storage.schema does not work anymore, should be storage.schema.webpage and storage.schema.host Key: NUTCH-1432 URL:

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-07-19 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: NUTCH-1365-v2.patch Updated patch for new version of GORA-150. Fix

[jira] [Created] (NUTCH-1431) Introduce link 'distance' and add configurable max distance in the generator

2012-07-18 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1431: --- Summary: Introduce link 'distance' and add configurable max distance in the generator Key: NUTCH-1431 URL: https://issues.apache.org/jira/browse/NUTCH-1431 Project:

[jira] [Updated] (NUTCH-1431) Introduce link 'distance' and add configurable max distance in the generator

2012-07-18 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1431: Attachment: NUTCH-1431.patch Introduce link 'distance' and add configurable max distance in

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-07-18 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: (was: NUTCH-1365.patch) Fix crawlId functionalilty by making using of new

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-07-18 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: NUTCH-1365.patch The updated patch. (Because of the splitting up of the corresponding

[jira] [Created] (NUTCH-1427) Reuse SelectorEntry in Generator.

2012-07-10 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1427: --- Summary: Reuse SelectorEntry in Generator. Key: NUTCH-1427 URL: https://issues.apache.org/jira/browse/NUTCH-1427 Project: Nutch Issue Type: Improvement

[jira] [Closed] (NUTCH-1427) Reuse SelectorEntry in Generator.

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1427. --- Resolution: Fixed Reuse SelectorEntry in Generator. -

[jira] [Updated] (NUTCH-1427) Reuse SelectorEntry in Generator.

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1427: Attachment: NUTCH-1427.patch Committed patch. Reuse SelectorEntry in Generator.

[jira] [Created] (NUTCH-1428) GeneratorMapper should not initialize filters/normalizers when they are disabled

2012-07-10 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1428: --- Summary: GeneratorMapper should not initialize filters/normalizers when they are disabled Key: NUTCH-1428 URL: https://issues.apache.org/jira/browse/NUTCH-1428

[jira] [Updated] (NUTCH-1428) GeneratorMapper should not initialize filters/normalizers when they are disabled

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1428: Attachment: NUTCH-1428.patch GeneratorMapper should not initialize filters/normalizers when

[jira] [Closed] (NUTCH-1428) GeneratorMapper should not initialize filters/normalizers when they are disabled

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1428. --- Resolution: Fixed committed. GeneratorMapper should not initialize

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13410884#comment-13410884 ] Ferdy Galema commented on NUTCH-1360: - Thanks! Keep up the good work!

[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Attachment: NUTCH-1306-trunk-v3.patch minor bug in prev. patch. uploaded v3 of trunk patch.

[jira] [Created] (NUTCH-1423) Remove unused fields in LanguageIndexingFilter

2012-07-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1423: --- Summary: Remove unused fields in LanguageIndexingFilter Key: NUTCH-1423 URL: https://issues.apache.org/jira/browse/NUTCH-1423 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1423) Remove unused fields in LanguageIndexingFilter

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1423: Attachment: NUTCH-1423.patch Remove unused fields in LanguageIndexingFilter

[jira] [Updated] (NUTCH-1424) fix fetcher timelimit logging

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1424: Attachment: NUTCH-1424.patch fix fetcher timelimit logging --

[jira] [Closed] (NUTCH-1424) fix fetcher timelimit logging

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1424. --- Resolution: Fixed Committed. fix fetcher timelimit logging

[jira] [Created] (NUTCH-1425) DbUpdaterJob declares PREV_SIGNATURE on input twice

2012-07-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1425: --- Summary: DbUpdaterJob declares PREV_SIGNATURE on input twice Key: NUTCH-1425 URL: https://issues.apache.org/jira/browse/NUTCH-1425 Project: Nutch Issue Type:

[jira] [Closed] (NUTCH-1425) DbUpdaterJob declares PREV_SIGNATURE on input twice

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1425. --- Resolution: Fixed Committed. DbUpdaterJob declares PREV_SIGNATURE on input twice

[jira] [Updated] (NUTCH-1425) DbUpdaterJob declares PREV_SIGNATURE on input twice

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1425: Attachment: NUTCH-1425.patch DbUpdaterJob declares PREV_SIGNATURE on input twice

[jira] [Commented] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409366#comment-13409366 ] Ferdy Galema commented on NUTCH-1306: - Committed in trunk and nutchgora. Thanks anyone

[jira] [Resolved] (NUTCH-1025) Add option not to commit to Solr

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema resolved NUTCH-1025. - Resolution: Fixed Fixed per NUTCH-1306. Add option not to commit to Solr

[jira] [Updated] (NUTCH-1426) HostDb close() should close store instead of flush

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1426: Attachment: NUTCH-1426.patch HostDb close() should close store instead of flush

[jira] [Closed] (NUTCH-1426) HostDb close() should close store instead of flush

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1426. --- Resolution: Fixed Fix Version/s: 2.1 Committed. HostDb close() should close

[jira] [Created] (NUTCH-1426) HostDb close() should close store instead of flush

2012-07-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1426: --- Summary: HostDb close() should close store instead of flush Key: NUTCH-1426 URL: https://issues.apache.org/jira/browse/NUTCH-1426 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409395#comment-13409395 ] Ferdy Galema commented on NUTCH-1411: - +1 Nice and clean implementation. Tested with

[jira] [Closed] (NUTCH-628) Host database to keep track of host-level information

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-628. -- Resolution: Duplicate This one should be closed as it is already implemented by various related

[jira] [Closed] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1411. --- nutchgora fetcher.store.content does not work -

[jira] [Resolved] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema resolved NUTCH-1411. - Resolution: Fixed Committed. Thanks Alexander for the patch. nutchgora

[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Summary: Add option to not commit and clarify existing solr.commit.size (was: Commit after

[jira] [Commented] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406363#comment-13406363 ] Ferdy Galema commented on NUTCH-1306: - New option added solr.commit.index Defaults to

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406495#comment-13406495 ] Ferdy Galema commented on NUTCH-1360: - Sorry for the late response, but this issue is

[jira] [Comment Edited] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406495#comment-13406495 ] Ferdy Galema edited comment on NUTCH-1360 at 7/4/12 1:41 PM: -

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406561#comment-13406561 ] Ferdy Galema commented on NUTCH-1360: - Just one more thing: Should the IP not be

[jira] [Created] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-06-27 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1411: --- Summary: nutchgora fetcher.store.content does not work Key: NUTCH-1411 URL: https://issues.apache.org/jira/browse/NUTCH-1411 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-1081) ant tests fail

2012-06-15 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295676#comment-13295676 ] Ferdy Galema commented on NUTCH-1081: - Yes this one should be closed.

[jira] [Updated] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob

2012-06-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1392: Attachment: NUTCH-1392.patch -force and -resume arguments being ignored in ParserJob

[jira] [Commented] (NUTCH-1342) Read time out protocol-http

2012-06-13 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294429#comment-13294429 ] Ferdy Galema commented on NUTCH-1342: - Do you have any clue as to why

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-06-12 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293510#comment-13293510 ] Ferdy Galema commented on NUTCH-1356: - Thanks. The parser threads you refer to, is

[jira] [Created] (NUTCH-1387) All parsers should respond to cancellation.

2012-06-12 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1387: --- Summary: All parsers should respond to cancellation. Key: NUTCH-1387 URL: https://issues.apache.org/jira/browse/NUTCH-1387 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1387) All parsers should respond to cancellation / interrupts.

2012-06-12 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1387: Component/s: parser Summary: All parsers should respond to cancellation / interrupts.

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285510#comment-13285510 ] Ferdy Galema commented on NUTCH-1356: - I find it difficult to believe those exceptions

[jira] [Created] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1379: --- Summary: NPE when reprUrl is null in ParseUtil Key: NUTCH-1379 URL: https://issues.apache.org/jira/browse/NUTCH-1379 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1379: Attachment: NUTCH-1379.patch committed NPE when reprUrl is null in ParseUtil

[jira] [Reopened] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema reopened NUTCH-1379: - NPE when reprUrl is null in ParseUtil -

[jira] [Closed] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1379. --- Resolution: Fixed NPE when reprUrl is null in ParseUtil -

[jira] [Closed] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1379. --- Resolution: Fixed NPE when reprUrl is null in ParseUtil -

[jira] [Updated] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1379: Description: Sometimes reprUrl is null in ParseUtil. Exact cause is still fuzzy but this is a

[jira] [Commented] (NUTCH-879) URL-s getting lost

2012-05-23 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281459#comment-13281459 ] Ferdy Galema commented on NUTCH-879: Agree to fix this issue later. Although I could

[jira] [Created] (NUTCH-1378) HostDb NullPointerException

2012-05-23 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1378: --- Summary: HostDb NullPointerException Key: NUTCH-1378 URL: https://issues.apache.org/jira/browse/NUTCH-1378 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1378) HostDb NullPointerException

2012-05-23 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1378: Attachment: NUTCH-1378.patch HostDb NullPointerException ---

[jira] [Closed] (NUTCH-1378) HostDb NullPointerException

2012-05-23 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1378. --- Resolution: Fixed HostDb NullPointerException ---

[jira] [Commented] (NUTCH-1367) Port ParserChecker to Nutchgora

2012-05-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274590#comment-13274590 ] Ferdy Galema commented on NUTCH-1367: - Hey Lewis, This tool is already present in

[jira] [Closed] (NUTCH-1366) speed up indexing by eliminating the indexreducer

2012-05-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1366. --- Resolution: Fixed committed speed up indexing by eliminating the indexreducer

[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Attachment: NUTCH-1306-trunk-v2.patch Heh indeed that's not ready for committing yet. Weird though

[jira] [Updated] (NUTCH-1362) Fix error handling of urls with empty fields

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1362: Attachment: NUTCH-1362.patch Hey Lewis, This patch fixes the problem and makes the reversing a

[jira] [Commented] (NUTCH-1362) Fix error handling of urls with empty fields

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273129#comment-13273129 ] Ferdy Galema commented on NUTCH-1362: - Btw this is a duplicate of NUTCH-1077.

[jira] [Closed] (NUTCH-1077) Nutch 2 DbUpdateMapper throws ArrayOutOfBoundsException when running update

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1077. --- Resolution: Duplicate Fix Version/s: (was: 2.1) Will be fixed with NUTCH-1362. (Use

[jira] [Closed] (NUTCH-1362) Fix error handling of urls with empty fields

2012-05-11 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1362. --- Resolution: Fixed Done! Thanks. Fix error handling of urls with empty fields
