[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2012-04-18 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256398#comment-13256398 ] Ferdy Galema commented on NUTCH-1314: I assume you mean an URLFilter? Or do you want

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2012-04-18 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256431#comment-13256431 ] Ferdy Galema commented on NUTCH-1314: I understand. I think the problem with

[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-04-06 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248223#comment-13248223 ] Ferdy Galema commented on NUTCH-1253: Wow this issue keeps getting more and more

[jira] [Commented] (NUTCH-1311) Add response headers to datastore for the protocol-httpclient plugin

2012-03-16 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231102#comment-13231102 ] Ferdy Galema commented on NUTCH-1311: This is indeed useful. I can commit this when I

[jira] [Commented] (NUTCH-1311) Add response headers to datastore for the protocol-httpclient plugin

2012-03-16 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231140#comment-13231140 ] Ferdy Galema commented on NUTCH-1311: Tested and committed at Nutchgora branch.

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2012-03-16 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231373#comment-13231373 ] Ferdy Galema commented on NUTCH-1314: Good one, I overlooked those but they should

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-03-13 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228301#comment-13228301 ] Ferdy Galema commented on NUTCH-902: Committed change to the gora-hbase line in ivy:

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-03-13 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228303#comment-13228303 ] Ferdy Galema commented on NUTCH-902: Just made a second commit regarding gora-hbase:

[jira] [Commented] (NUTCH-1278) Fetch Improvement in threads per host

2012-03-08 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225545#comment-13225545 ] Ferdy Galema commented on NUTCH-1278: I noticed you used the diff command this time,

[jira] [Commented] (NUTCH-1298) Pass numTasks to FetcherJob

2012-03-06 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223105#comment-13223105 ] Ferdy Galema commented on NUTCH-1298: I'm not certain what the status and support of

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-03-06 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223132#comment-13223132 ] Ferdy Galema commented on NUTCH-902: I just committed a minor change to the sql

[jira] [Commented] (NUTCH-1302) nutchgora job failures should be noticed by submitter

2012-03-06 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223231#comment-13223231 ] Ferdy Galema commented on NUTCH-1302: NutchJob is a nice wrapper of Hadoop's Job, so

[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-03-05 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1314#comment-1314 ] Ferdy Galema commented on NUTCH-1289: This is a showstopper for the upcoming release.

[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-03-05 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222324#comment-13222324 ] Ferdy Galema commented on NUTCH-1289: Committed. Dan, could you verify this issue

[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-03-05 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222404#comment-13222404 ] Ferdy Galema commented on NUTCH-1253: It indeed seems broken for trunk. When running

[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-03-02 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220984#comment-13220984 ] Ferdy Galema commented on NUTCH-1253: I'll give this one a go.

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-03-01 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220097#comment-13220097 ] Ferdy Galema commented on NUTCH-902: Note I changed gora-hbase-mapping.xml slightly: I

[jira] [Commented] (NUTCH-1291) Fetcher to stringify exception on // unexpected exception

2012-02-29 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219289#comment-13219289 ] Ferdy Galema commented on NUTCH-1291: Yeah I noticed the exact same too during

[jira] [Commented] (NUTCH-1286) Refactoring/reimplementing crawling API (NutchApp)

2012-02-26 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216808#comment-13216808 ] Ferdy Galema commented on NUTCH-1286: Thanks for updating the list. As a side note,

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-24 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215530#comment-13215530 ] Ferdy Galema commented on NUTCH-965: Hi Markus, For nutchtrunk I performed the

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-24 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215583#comment-13215583 ] Ferdy Galema commented on NUTCH-965: Ok that's it, I have reverted the changes

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-24 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215591#comment-13215591 ] Ferdy Galema commented on NUTCH-965: Ok I will recommit it. Luckily I did notice a

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-24 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215650#comment-13215650 ] Ferdy Galema commented on NUTCH-965: Recommitted. Thanks all. Skip

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-23 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214443#comment-13214443 ] Ferdy Galema commented on NUTCH-965: OK should be done now. I cross-checked both

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-23 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214595#comment-13214595 ] Ferdy Galema commented on NUTCH-965: Please note that the failure in Nutch-nutchgora

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-22 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213462#comment-13213462 ] Ferdy Galema commented on NUTCH-965: Tested, verified and committed with both trunk and

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-22 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214383#comment-13214383 ] Ferdy Galema commented on NUTCH-965: Will fix the test right away.

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-22 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214390#comment-13214390 ] Ferdy Galema commented on NUTCH-965: The test works now. The pretty obvious fix was

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-22 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214415#comment-13214415 ] Ferdy Galema commented on NUTCH-965: Double-checked and it seems I made a few other

[jira] [Commented] (NUTCH-965) Skip parsing for truncated documents

2012-02-20 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211875#comment-13211875 ] Ferdy Galema commented on NUTCH-965: Hi Lewis, FYI: I'm currently looking into this

[jira] [Commented] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise.

2012-02-15 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208354#comment-13208354 ] Ferdy Galema commented on NUTCH-1279: Hi Lewis, I can confirm that trunk already uses

[jira] [Commented] (NUTCH-1205) Upgrade gora modules to 0.2-SNAPSHOT in ivy/ivy.xml

2012-02-13 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206929#comment-13206929 ] Ferdy Galema commented on NUTCH-1205: Could you please elaborate on what exactly the

[jira] [Commented] (NUTCH-1205) Upgrade gora modules to 0.2-SNAPSHOT in ivy/ivy.xml

2012-02-13 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206949#comment-13206949 ] Ferdy Galema commented on NUTCH-1205: Haven't got the slightest clue about why this

[jira] [Commented] (NUTCH-1081) ant tests fail

2012-02-01 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197700#comment-13197700 ] Ferdy Galema commented on NUTCH-1081: Everything works fine except for

[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2012-01-25 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193007#comment-13193007 ] Ferdy Galema commented on NUTCH-1086: Seems like a JVM bug, perhaps you could

[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-01-25 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193042#comment-13193042 ] Ferdy Galema commented on NUTCH-1253: Hi, Looking at the revision history it seems

[jira] [Commented] (NUTCH-1205) Upgrade gora modules to 0.2-incubating in ivy/ivy.xml

2012-01-20 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189857#comment-13189857 ] Ferdy Galema commented on NUTCH-1205: Just to clarify, after applying v3 patch (and

[jira] [Commented] (NUTCH-1184) Fetcher to parse and follow Nth degree outlinks

2011-11-15 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150451#comment-13150451 ] Ferdy Galema commented on NUTCH-1184: Hi Markus, This functionality is very useful.

[jira] [Commented] (NUTCH-1081) ant tests fail

2011-11-15 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151073#comment-13151073 ] Ferdy Galema commented on NUTCH-1081: TestAPI is troublesome:

[jira] [Commented] (NUTCH-1202) Fetcher timebomb kills long waiting fetch jobs

2011-11-14 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149694#comment-13149694 ] Ferdy Galema commented on NUTCH-1202: Do you mean initialization as in

[jira] [Commented] (NUTCH-1202) Fetcher timebomb kills long waiting fetch jobs

2011-11-14 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149722#comment-13149722 ] Ferdy Galema commented on NUTCH-1202: Exactly. So the question is: How bad is it to

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144025#comment-13144025 ] Ferdy Galema commented on NUTCH-1098: Radim, that's funny. I don't believe that is

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144132#comment-13144132 ] Ferdy Galema commented on NUTCH-1098: Like I said before, I'm up for converting

[jira] [Commented] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-04 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144274#comment-13144274 ] Ferdy Galema commented on NUTCH-1196: Thanks Andrzej. When I have the chance I will

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-02 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142291#comment-13142291 ] Ferdy Galema commented on NUTCH-1098: @Markus/Radim I certainly do not want to

[jira] [Commented] (NUTCH-1189) add commented out default settings to gora.properties files

2011-11-02 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142446#comment-13142446 ] Ferdy Galema commented on NUTCH-1189: You are right Lewis, HBase does not need any

[jira] [Commented] (NUTCH-1189) add commented out default settings to gora.properties files

2011-11-02 Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142475#comment-13142475 ] Ferdy Galema commented on NUTCH-1189: I see what you're getting at. We could indeed