[jira] [Commented] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once

2013-07-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723956#comment-13723956 ] Ferdy Galema commented on NUTCH-1457: - Hi, Thanks for submitting the patch. It seems

[jira] [Commented] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2013-01-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545748#comment-13545748 ] Ferdy Galema commented on NUTCH-1508: - Hi, Is this related to? https

[jira] [Comment Edited] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2013-01-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545748#comment-13545748 ] Ferdy Galema edited comment on NUTCH-1508 at 1/7/13 10:15 AM

[jira] [Commented] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2013-01-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545766#comment-13545766 ] Ferdy Galema commented on NUTCH-1508: - NUTCH-1431 (aka 'distance' concept) only

[jira] [Commented] (NUTCH-1495) -normalize and -filter for updatedb command in nutch 2.x

2012-11-19 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500895#comment-13500895 ] Ferdy Galema commented on NUTCH-1495: - Hi, Nice one! I took a glance at your patch

[jira] [Commented] (NUTCH-1484) TableUtil unreverseURL fails on file:// URLs

2012-11-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493820#comment-13493820 ] Ferdy Galema commented on NUTCH-1484: - Hi, I checked the patch (attached in NUTCH

[jira] [Commented] (NUTCH-1489) elasticindex should report the indexed documents like solrindex does

2012-11-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493829#comment-13493829 ] Ferdy Galema commented on NUTCH-1489: - Agree with Lewis, it seems there is already

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

2012-11-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493885#comment-13493885 ] Ferdy Galema commented on NUTCH-1370: - Hi, I checked the patch, it seems you

[jira] [Commented] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once

2012-11-08 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493565#comment-13493565 ] Ferdy Galema commented on NUTCH-1457: - There is a limited description of the Nutch2

[jira] [Commented] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once

2012-11-06 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491289#comment-13491289 ] Ferdy Galema commented on NUTCH-1457: - Hi, Not really because with a partial update

[jira] [Commented] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once

2012-10-08 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471480#comment-13471480 ] Ferdy Galema commented on NUTCH-1457: - Included effort is resolving the conflict

Re: Nutch 2.1 Release???

2012-09-17 Thread Ferdy Galema
2.1 sounds good! On Sun, Sep 16, 2012 at 12:14 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, On Sat, Sep 15, 2012 at 10:38 PM, Markus Jelsma markus.jel...@openindex.io wrote: Trunk has some unresolved issues that are eligible for 1.6. Someone here can create a 1.7 version

[jira] [Resolved] (NUTCH-1468) Redirects that are external links not adhering to db.ignore.external.links

2012-09-17 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema resolved NUTCH-1468. - Resolution: Fixed Fix Version/s: 2.1 Committed @ Nutch2.x ref 1386526 Thanks

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445849#comment-13445849 ] Ferdy Galema commented on NUTCH-1445: - Hi Matt, Sure we can resolve your issue here

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445850#comment-13445850 ] Ferdy Galema commented on NUTCH-1445: - (feature requests should be future requests ofc

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445871#comment-13445871 ] Ferdy Galema commented on NUTCH-1445: - Ah I got it now. It's definitely a bug. When

[jira] [Created] (NUTCH-1462) Elasticsearch not indexing when type==null in NutchDocument metadata

2012-08-31 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1462: --- Summary: Elasticsearch not indexing when type==null in NutchDocument metadata Key: NUTCH-1462 URL: https://issues.apache.org/jira/browse/NUTCH-1462 Project: Nutch

[jira] [Updated] (NUTCH-1462) Elasticsearch not indexing when type==null in NutchDocument metadata

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1462: Attachment: nutch-1462.patch Elasticsearch not indexing when type==null in NutchDocument

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445878#comment-13445878 ] Ferdy Galema commented on NUTCH-1445: - Created NUTCH-1462 for a fix. For a quick-fix

[jira] [Closed] (NUTCH-1462) Elasticsearch not indexing when type==null in NutchDocument metadata

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1462. --- Resolution: Fixed committed Elasticsearch not indexing when type==null

[jira] [Created] (NUTCH-1463) Elasticsearch indexer should wait and check response for last flush

2012-08-31 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1463: --- Summary: Elasticsearch indexer should wait and check response for last flush Key: NUTCH-1463 URL: https://issues.apache.org/jira/browse/NUTCH-1463 Project: Nutch

[jira] [Updated] (NUTCH-1463) Elasticsearch indexer should wait and check response for last flush

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1463: Attachment: nutch-1463.patch Elasticsearch indexer should wait and check response for last

[jira] [Closed] (NUTCH-1463) Elasticsearch indexer should wait and check response for last flush

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1463. --- Resolution: Fixed committed. Elasticsearch indexer should wait and check response

[jira] [Closed] (NUTCH-1448) Redirected urls should be handled more cleanly (more like an outlink url)

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1448. --- Resolution: Fixed Committed. Redirected urls should be handled more cleanly (more

[jira] [Closed] (NUTCH-1431) Introduce link 'distance' and add configurable max distance in the generator

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1431. --- Resolution: Fixed committed Introduce link 'distance' and add configurable max

[jira] [Commented] (NUTCH-872) Change the default fetcher.parse to FALSE

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446511#comment-13446511 ] Ferdy Galema commented on NUTCH-872: Yes that is correct. Change

[jira] [Commented] (NUTCH-1448) Redirected urls should be handled more cleanly (more like an outlink url)

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446515#comment-13446515 ] Ferdy Galema commented on NUTCH-1448: - Yes it does show up as an outlink. About your

[jira] [Commented] (NUTCH-1461) Problem with TableUtil

2012-08-31 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446518#comment-13446518 ] Ferdy Galema commented on NUTCH-1461: - Added comment in NUTCH-1448

[jira] [Updated] (NUTCH-1448) Redirected urls should be handled more cleanly (more like an outlink url)

2012-08-28 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1448: Attachment: nutch-1448.txt Thank you for your interest Christian. This issue should indeed prevent

[jira] [Created] (NUTCH-1459) Remove dead code (phase2) from InjectorJob

2012-08-17 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1459: --- Summary: Remove dead code (phase2) from InjectorJob Key: NUTCH-1459 URL: https://issues.apache.org/jira/browse/NUTCH-1459 Project: Nutch Issue Type

Re: DbUpdateReducer could not mark it's batchid

2012-08-15 Thread Ferdy Galema
Hi, This bug was already remarked some posts ago on the mailing list, but thanks anyway for reporting. I have created an issue for keeping track: https://issues.apache.org/jira/browse/NUTCH-1456 Ferdy. On Wed, Aug 15, 2012 at 1:59 PM, lin weijian linweiji...@gmail.com wrote: Hi,

[jira] [Commented] (NUTCH-1442) indexingfilter.order is property is misread in code

2012-08-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433955#comment-13433955 ] Ferdy Galema commented on NUTCH-1442: - Thanks. Looks fine. Assertions should

[jira] [Commented] (NUTCH-1444) Indexing should not create temporary files (do not extend from FileOutputFormat)

2012-08-07 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429986#comment-13429986 ] Ferdy Galema commented on NUTCH-1444: - Just to add: The following exception is fixed

hadoop.job.history.user.location in nutch-default with CDH rendering job history useless

2012-08-07 Thread Ferdy Galema
Hi, There still is a property in nutch-default 'hadoop.job.history.user.location' that redirects the creation of history files from job output locations to a custom location. I noticed that the current value does not work well with CDH, because ${hadoop.log.dir} is not defined. This actually

[jira] [Created] (NUTCH-1446) Port NUTCH-1444 to trunk (Indexing should not create temporary files)

2012-08-06 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1446: --- Summary: Port NUTCH-1444 to trunk (Indexing should not create temporary files) Key: NUTCH-1446 URL: https://issues.apache.org/jira/browse/NUTCH-1446 Project: Nutch

[jira] [Updated] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1445: Attachment: NUTCH-1445-addPropsToConfig.patch Final addition that adds the properties to nutch

[jira] [Closed] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-03 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1445. --- Resolution: Fixed Add ElasticIndexerJob that indexes to elasticsearch

[jira] [Created] (NUTCH-1438) ParserJob support for option -reparse

2012-07-26 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1438: --- Summary: ParserJob support for option -reparse Key: NUTCH-1438 URL: https://issues.apache.org/jira/browse/NUTCH-1438 Project: Nutch Issue Type: New Feature

[jira] [Updated] (NUTCH-1437) HostInjectorJob to accept lines with or without protocol

2012-07-25 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1437: Attachment: NUTCH-1437.patch HostInjectorJob to accept lines with or without protocol

[jira] [Closed] (NUTCH-1437) HostInjectorJob to accept lines with or without protocol

2012-07-25 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1437. --- Resolution: Cannot Reproduce committed HostInjectorJob to accept lines

[jira] [Reopened] (NUTCH-1437) HostInjectorJob to accept lines with or without protocol

2012-07-25 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema reopened NUTCH-1437: - HostInjectorJob to accept lines with or without protocol

[jira] [Closed] (NUTCH-1437) HostInjectorJob to accept lines with or without protocol

2012-07-25 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1437. --- Resolution: Fixed reopening/closing to set correct resolve status (FIXED

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-07-20 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: NUTCH-1365-v3.patch Small improvement of the patch by showing the crawlId name

[jira] [Created] (NUTCH-1432) property storage.schema does not work anymore, should be storage.schema.webpage and storage.schema.host

2012-07-19 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1432: --- Summary: property storage.schema does not work anymore, should be storage.schema.webpage and storage.schema.host Key: NUTCH-1432 URL: https://issues.apache.org/jira/browse/NUTCH

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-07-19 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: NUTCH-1365-v2.patch Updated patch for new version of GORA-150. Fix

[jira] [Created] (NUTCH-1431) Introduce link 'distance' and add configurable max distance in the generator

2012-07-18 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1431: --- Summary: Introduce link 'distance' and add configurable max distance in the generator Key: NUTCH-1431 URL: https://issues.apache.org/jira/browse/NUTCH-1431 Project

[jira] [Updated] (NUTCH-1431) Introduce link 'distance' and add configurable max distance in the generator

2012-07-18 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1431: Attachment: NUTCH-1431.patch Introduce link 'distance' and add configurable max distance

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-07-18 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: (was: NUTCH-1365.patch) Fix crawlId functionalilty by making using of new

[jira] [Updated] (NUTCH-1365) Fix crawlId functionalilty by making using of new gora configuration

2012-07-18 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1365: Attachment: NUTCH-1365.patch The updated patch. (Because of the splitting up of the corresponding

[jira] [Created] (NUTCH-1427) Reuse SelectorEntry in Generator.

2012-07-10 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1427: --- Summary: Reuse SelectorEntry in Generator. Key: NUTCH-1427 URL: https://issues.apache.org/jira/browse/NUTCH-1427 Project: Nutch Issue Type: Improvement

[jira] [Closed] (NUTCH-1427) Reuse SelectorEntry in Generator.

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1427. --- Resolution: Fixed Reuse SelectorEntry in Generator

[jira] [Updated] (NUTCH-1427) Reuse SelectorEntry in Generator.

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1427: Attachment: NUTCH-1427.patch Committed patch. Reuse SelectorEntry in Generator

[jira] [Created] (NUTCH-1428) GeneratorMapper should not initialize filters/normalizers when they are disabled

2012-07-10 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1428: --- Summary: GeneratorMapper should not initialize filters/normalizers when they are disabled Key: NUTCH-1428 URL: https://issues.apache.org/jira/browse/NUTCH-1428 Project

[jira] [Updated] (NUTCH-1428) GeneratorMapper should not initialize filters/normalizers when they are disabled

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1428: Attachment: NUTCH-1428.patch GeneratorMapper should not initialize filters/normalizers when

[jira] [Closed] (NUTCH-1428) GeneratorMapper should not initialize filters/normalizers when they are disabled

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1428. --- Resolution: Fixed committed. GeneratorMapper should not initialize filters

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-10 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410884#comment-13410884 ] Ferdy Galema commented on NUTCH-1360: - Thanks! Keep up the good work

Re: [PROPOSAL] Rename branch nutchgora into 2.x

2012-07-09 Thread Ferdy Galema
+1 Makes sense. On Mon, Jul 9, 2012 at 12:37 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Guys, Now that we've released 2.0, wouldn't it be better to rename the 'nutchgora' branch into something like 'branch-2.x'? Any thoughts on this? Julien -- * *Open Source Solutions for

[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Attachment: NUTCH-1306-trunk-v3.patch minor bug in prev. patch. uploaded v3 of trunk patch

[jira] [Created] (NUTCH-1423) Remove unused fields in LanguageIndexingFilter

2012-07-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1423: --- Summary: Remove unused fields in LanguageIndexingFilter Key: NUTCH-1423 URL: https://issues.apache.org/jira/browse/NUTCH-1423 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1423) Remove unused fields in LanguageIndexingFilter

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1423: Attachment: NUTCH-1423.patch Remove unused fields in LanguageIndexingFilter

[jira] [Updated] (NUTCH-1424) fix fetcher timelimit logging

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1424: Attachment: NUTCH-1424.patch fix fetcher timelimit logging

[jira] [Closed] (NUTCH-1424) fix fetcher timelimit logging

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1424. --- Resolution: Fixed Committed. fix fetcher timelimit logging

[jira] [Created] (NUTCH-1425) DbUpdaterJob declares PREV_SIGNATURE on input twice

2012-07-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1425: --- Summary: DbUpdaterJob declares PREV_SIGNATURE on input twice Key: NUTCH-1425 URL: https://issues.apache.org/jira/browse/NUTCH-1425 Project: Nutch Issue Type

[jira] [Closed] (NUTCH-1425) DbUpdaterJob declares PREV_SIGNATURE on input twice

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1425. --- Resolution: Fixed Committed. DbUpdaterJob declares PREV_SIGNATURE on input twice

[jira] [Updated] (NUTCH-1425) DbUpdaterJob declares PREV_SIGNATURE on input twice

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1425: Attachment: NUTCH-1425.patch DbUpdaterJob declares PREV_SIGNATURE on input twice

[jira] [Commented] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409366#comment-13409366 ] Ferdy Galema commented on NUTCH-1306: - Committed in trunk and nutchgora. Thanks anyone

[jira] [Resolved] (NUTCH-1025) Add option not to commit to Solr

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema resolved NUTCH-1025. - Resolution: Fixed Fixed per NUTCH-1306. Add option not to commit to Solr

[jira] [Updated] (NUTCH-1426) HostDb close() should close store instead of flush

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1426: Attachment: NUTCH-1426.patch HostDb close() should close store instead of flush

[jira] [Closed] (NUTCH-1426) HostDb close() should close store instead of flush

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1426. --- Resolution: Fixed Fix Version/s: 2.1 Committed. HostDb close() should close

[jira] [Created] (NUTCH-1426) HostDb close() should close store instead of flush

2012-07-09 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1426: --- Summary: HostDb close() should close store instead of flush Key: NUTCH-1426 URL: https://issues.apache.org/jira/browse/NUTCH-1426 Project: Nutch Issue Type

[jira] [Commented] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409395#comment-13409395 ] Ferdy Galema commented on NUTCH-1411: - +1 Nice and clean implementation. Tested

[jira] [Closed] (NUTCH-628) Host database to keep track of host-level information

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-628. -- Resolution: Duplicate This one should be closed as it is already implemented by various related issues

[jira] [Closed] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1411. --- nutchgora fetcher.store.content does not work

[jira] [Resolved] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-07-09 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema resolved NUTCH-1411. - Resolution: Fixed Committed. Thanks Alexander for the patch. nutchgora

[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1306: Summary: Add option to not commit and clarify existing solr.commit.size (was: Commit after

[jira] [Commented] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406363#comment-13406363 ] Ferdy Galema commented on NUTCH-1306: - New option added solr.commit.index Defaults

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406495#comment-13406495 ] Ferdy Galema commented on NUTCH-1360: - Sorry for the late response, but this issue

[jira] [Comment Edited] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406495#comment-13406495 ] Ferdy Galema edited comment on NUTCH-1360 at 7/4/12 1:41 PM

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-04 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406561#comment-13406561 ] Ferdy Galema commented on NUTCH-1360: - Just one more thing: Should the IP

[jira] [Created] (NUTCH-1411) nutchgora fetcher.store.content does not work

2012-06-27 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1411: --- Summary: nutchgora fetcher.store.content does not work Key: NUTCH-1411 URL: https://issues.apache.org/jira/browse/NUTCH-1411 Project: Nutch Issue Type: Bug

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-06-27 Thread Ferdy Galema
+1 Crawling with HBaseStore works from injecting to indexing. Great work Lewis. On Mon, Jun 25, 2012 at 6:32 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Everyone, A candidate for the Apache Nutch 2.0 RC3 is available at:

Re: [VOTE] Apache Nutch 2.0 RC2

2012-06-18 Thread Ferdy Galema
Hi, Tested it with HBase but there is a slight issue with the dependencies. After building rc2 with ivy-enabled HBase, it seems a test HBase jar is deployed in local/lib, even though it's called hbase-0.90.4.jar. (I do not know yet how this is caused!) But since a user should have a separate

Re: [VOTE] Apache Nutch 2.0 RC2

2012-06-18 Thread Ferdy Galema
(dependency wise for example), let alone multiple versions. Sebastian Thanks, Ferdy. On 06/18/2012 12:27 PM, Ferdy Galema wrote: Hi, Tested it with HBase but there is a slight issue with the dependencies. After building rc2 with ivy-enabled HBase, it seems a test HBase jar is deployed

Re: VOTE Apache Nutch 2.0 RC1

2012-06-15 Thread Ferdy Galema
Agree with only releasing src. On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Or just not ship a bin release at all. Src is the only thing we really VOTE on legally though bin is provided for convenience purposes. Will type more on this

[jira] [Commented] (NUTCH-1081) ant tests fail

2012-06-15 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295676#comment-13295676 ] Ferdy Galema commented on NUTCH-1081: - Yes this one should be closed

[jira] [Updated] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob

2012-06-14 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1392: Attachment: NUTCH-1392.patch -force and -resume arguments being ignored in ParserJob

Re: VOTE Apache Nutch 2.0 RC1

2012-06-14 Thread Ferdy Galema
Maybe just 1392? I went ahead and made a patch that should fix this. Feel free to commit or ignore prior to RC2. On Thu, Jun 14, 2012 at 1:44 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Sebastian, On Wed, Jun 13, 2012 at 11:30 PM, Sebastian Nagel wastl.na...@googlemail.com

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Ferdy Galema
Findings about Nutch-2.0 RC 1. The Nutch job jar is not present in the binary archive. This means distributed running of jobs is not supported. I'm not sure if this is a problem (since users can always build one themselves), merely pointing it out. The recently released 1.5 also lacks this job

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Ferdy Galema
:00 AM, Ferdy Galema ferdy.gal...@kalooga.com wrote: Findings about Nutch-2.0 RC 1. The Nutch job jar is not present in the binary archive. This means distributed running of jobs is not supported. I'm not sure if this is a problem (since users can always build one themselves), merely pointing

Re: Suitable Nutch 2.0 Project Description

2012-06-13 Thread Ferdy Galema
Hi, I would remove the 'experimental' notion. Aside from that it's fine with me. Ferdy. On Wed, Jun 13, 2012 at 2:29 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, Seeing as we have the ball rolling with the 2.0 RC. I thought I'd ask about a suitable project descriptor.

[jira] [Commented] (NUTCH-1342) Read time out protocol-http

2012-06-13 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294429#comment-13294429 ] Ferdy Galema commented on NUTCH-1342: - Do you have any clue as to why protocol

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-06-12 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293510#comment-13293510 ] Ferdy Galema commented on NUTCH-1356: - Thanks. The parser threads you refer

[jira] [Created] (NUTCH-1387) All parsers should respond to cancellation.

2012-06-12 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1387: --- Summary: All parsers should respond to cancellation. Key: NUTCH-1387 URL: https://issues.apache.org/jira/browse/NUTCH-1387 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1387) All parsers should respond to cancellation / interrupts.

2012-06-12 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1387: Component/s: parser Summary: All parsers should respond to cancellation / interrupts

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285510#comment-13285510 ] Ferdy Galema commented on NUTCH-1356: - I find it difficult to believe those exceptions

Re: stackoverflow / stackexchange for user problems

2012-05-30 Thread Ferdy Galema
Hi, Sure no problem I was just polling some opinions and past experiences. We'll have to see what works out best. Thanks. On Tue, May 29, 2012 at 9:43 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Hi Is there any experience with using stackoverflow or stackexchange for solving

[jira] [Created] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1379: --- Summary: NPE when reprUrl is null in ParseUtil Key: NUTCH-1379 URL: https://issues.apache.org/jira/browse/NUTCH-1379 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1379: Attachment: NUTCH-1379.patch committed NPE when reprUrl is null in ParseUtil

[jira] [Reopened] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema reopened NUTCH-1379: - NPE when reprUrl is null in ParseUtil

[jira] [Closed] (NUTCH-1379) NPE when reprUrl is null in ParseUtil

2012-05-30 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema closed NUTCH-1379. --- Resolution: Fixed NPE when reprUrl is null in ParseUtil
