[jira] [Commented] (NUTCH-1920) Upgrade Nutch to use Java 1.7

2015-01-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281171#comment-14281171 ] Hudson commented on NUTCH-1920: --- SUCCESS: Integrated in Nutch-trunk #2937 (See

[jira] [Commented] (NUTCH-1913) LinkDB to implement db.ignore.external.links

2015-02-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317874#comment-14317874 ] Hudson commented on NUTCH-1913: --- SUCCESS: Integrated in Nutch-trunk #2971 (See

[jira] [Commented] (NUTCH-1323) AjaxNormalizer

2015-02-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317873#comment-14317873 ] Hudson commented on NUTCH-1323: --- SUCCESS: Integrated in Nutch-trunk #2971 (See

[jira] [Commented] (NUTCH-1893) Parse-tika fails to parse feed files

2015-01-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294265#comment-14294265 ] Hudson commented on NUTCH-1893: --- SUCCESS: Integrated in Nutch-trunk #2950 (See

[jira] [Commented] (NUTCH-1920) Upgrade Nutch to use Java 1.7

2015-01-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294493#comment-14294493 ] Hudson commented on NUTCH-1920: --- SUCCESS: Integrated in Nutch-nutchgora #1318 (See

[jira] [Commented] (NUTCH-1893) Parse-tika fails to parse feed files

2015-01-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294343#comment-14294343 ] Hudson commented on NUTCH-1893: --- SUCCESS: Integrated in Nutch-nutchgora #1316 (See

[jira] [Commented] (NUTCH-1660) Index filter for Page's latitude and longitude

2015-01-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272733#comment-14272733 ] Hudson commented on NUTCH-1660: --- SUCCESS: Integrated in Nutch-trunk #2927 (See

[jira] [Commented] (NUTCH-1856) Document webpage.avsc and host.avsc

2015-01-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270520#comment-14270520 ] Hudson commented on NUTCH-1856: --- SUCCESS: Integrated in Nutch-nutchgora #1295 (See

[jira] [Commented] (NUTCH-1907) Incorrect output of Outlinks to Hosts within HostDbUpdateReducer

2015-01-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270641#comment-14270641 ] Hudson commented on NUTCH-1907: --- SUCCESS: Integrated in Nutch-nutchgora #1296 (See

[jira] [Commented] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318099#comment-14318099 ] Hudson commented on NUTCH-1939: --- SUCCESS: Integrated in Nutch-trunk #2972 (See

[jira] [Commented] (NUTCH-827) HTTP POST Authentication

2015-02-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321194#comment-14321194 ] Hudson commented on NUTCH-827: -- SUCCESS: Integrated in Nutch-trunk #2976 (See

[jira] [Commented] (NUTCH-1904) Schema for Solr4 doesn't include _version_ field

2015-01-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264002#comment-14264002 ] Hudson commented on NUTCH-1904: --- SUCCESS: Integrated in Nutch-trunk #2919 (See

[jira] [Commented] (NUTCH-1140) index-more plugin, resetTitle method creates multiple values in the Title field

2015-01-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268444#comment-14268444 ] Hudson commented on NUTCH-1140: --- SUCCESS: Integrated in Nutch-trunk #2923 (See

[jira] [Commented] (NUTCH-1967) Possible SIooBE in MimeAdaptiveFetchSchedule

2015-03-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366841#comment-14366841 ] Hudson commented on NUTCH-1967: --- SUCCESS: Integrated in Nutch-trunk #3021 (See

[jira] [Commented] (NUTCH-1957) FileDumper output file name collisions

2015-03-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362208#comment-14362208 ] Hudson commented on NUTCH-1957: --- SUCCESS: Integrated in Nutch-trunk #3017 (See

[jira] [Commented] (NUTCH-1966) Configuration endpoint for 1x REST API [A sub-issue of NUTCH-1931]

2015-03-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368476#comment-14368476 ] Hudson commented on NUTCH-1966: --- SUCCESS: Integrated in Nutch-trunk #3022 (See

[jira] [Commented] (NUTCH-1968) File Name too long issue of DumpFileUtil.java file

2015-03-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368509#comment-14368509 ] Hudson commented on NUTCH-1968: --- SUCCESS: Integrated in Nutch-trunk #3023 (See

[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351431#comment-14351431 ] Hudson commented on NUTCH-1954: --- SUCCESS: Integrated in Nutch-trunk #3005 (See

[jira] [Commented] (NUTCH-1962) Need to have mimetype-filter.txt file available by default

2015-03-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359935#comment-14359935 ] Hudson commented on NUTCH-1962: --- SUCCESS: Integrated in Nutch-trunk #3012 (See

[jira] [Commented] (NUTCH-1956) Members to be public in URLCrawlDatum

2015-03-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360532#comment-14360532 ] Hudson commented on NUTCH-1956: --- SUCCESS: Integrated in Nutch-trunk #3013 (See

[jira] [Commented] (NUTCH-1955) ByteWritable missing in NutchWritable

2015-03-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360533#comment-14360533 ] Hudson commented on NUTCH-1955: --- SUCCESS: Integrated in Nutch-trunk #3013 (See

[jira] [Commented] (NUTCH-1976) Allow Users to Set Hostname for Server

2015-03-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385639#comment-14385639 ] Hudson commented on NUTCH-1976: --- SUCCESS: Integrated in Nutch-trunk #3037 (See

[jira] [Commented] (NUTCH-1970) Pretty print JSON output in config resource

2015-03-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385640#comment-14385640 ] Hudson commented on NUTCH-1970: --- SUCCESS: Integrated in Nutch-trunk #3037 (See

[jira] [Commented] (NUTCH-1979) CrawlDbReader to implement Tool

2015-03-31 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389269#comment-14389269 ] Hudson commented on NUTCH-1979: --- SUCCESS: Integrated in Nutch-trunk #3041 (See

[jira] [Commented] (NUTCH-1979) CrawlDbReader to implement Tool

2015-03-31 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388924#comment-14388924 ] Hudson commented on NUTCH-1979: --- FAILURE: Integrated in Nutch-trunk #3040 (See

[jira] [Commented] (NUTCH-1949) Dump out the Nutch data into the Common Crawl format

2015-03-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347600#comment-14347600 ] Hudson commented on NUTCH-1949: --- SUCCESS: Integrated in Nutch-trunk #3001 (See

[jira] [Commented] (NUTCH-1950) File name too long when bin/nutch dump

2015-03-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346320#comment-14346320 ] Hudson commented on NUTCH-1950: --- SUCCESS: Integrated in Nutch-trunk #2999 (See

[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332350#comment-14332350 ] Hudson commented on NUTCH-1925: --- SUCCESS: Integrated in Nutch-nutchgora #1347 (See

[jira] [Commented] (NUTCH-1928) Indexing filter of documents by the MIME type

2015-02-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332504#comment-14332504 ] Hudson commented on NUTCH-1928: --- SUCCESS: Integrated in Nutch-trunk #2986 (See

[jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384756#comment-14384756 ] Hudson commented on NUTCH-1941: --- SUCCESS: Integrated in Nutch-trunk #3034 (See

[jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384826#comment-14384826 ] Hudson commented on NUTCH-1941: --- SUCCESS: Integrated in Nutch-nutchgora #1381 (See

[jira] [Commented] (NUTCH-1974) keyPrefix option for CommonCrawlDataDumper tool

2015-03-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381308#comment-14381308 ] Hudson commented on NUTCH-1974: --- SUCCESS: Integrated in Nutch-trunk #3030 (See

[jira] [Commented] (NUTCH-1959) Improving CommonCrawlFormat implementations

2015-03-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381365#comment-14381365 ] Hudson commented on NUTCH-1959: --- SUCCESS: Integrated in Nutch-trunk #3032 (See

[jira] [Commented] (NUTCH-1974) keyPrefix option for CommonCrawlDataDumper tool

2015-03-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381364#comment-14381364 ] Hudson commented on NUTCH-1974: --- SUCCESS: Integrated in Nutch-trunk #3032 (See

[jira] [Commented] (NUTCH-1912) Dump tool -mimetype parameter needs to be optional to prevent NPE

2015-01-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275928#comment-14275928 ] Hudson commented on NUTCH-1912: --- SUCCESS: Integrated in Nutch-trunk #2932 (See

[jira] [Commented] (NUTCH-1918) TikaParser specifies a default namespace when generating DOM

2015-01-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298420#comment-14298420 ] Hudson commented on NUTCH-1918: --- SUCCESS: Integrated in Nutch-trunk #2956 (See

[jira] [Commented] (NUTCH-1975) New configuration for CommonCrawlDataDumper tool

2015-04-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394527#comment-14394527 ] Hudson commented on NUTCH-1975: --- SUCCESS: Integrated in Nutch-trunk #3045 (See

[jira] [Commented] (NUTCH-1960) JUnit test for dump method of CommonCrawlDataDumper

2015-04-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490763#comment-14490763 ] Hudson commented on NUTCH-1960: --- SUCCESS: Integrated in Nutch-trunk #3056 (See

[jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504282#comment-14504282 ] Hudson commented on NUTCH-1987: --- SUCCESS: Integrated in Nutch-trunk #3074 (See

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509899#comment-14509899 ] Hudson commented on NUTCH-1994: --- FAILURE: Integrated in Nutch-trunk #3083 (See

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509995#comment-14509995 ] Hudson commented on NUTCH-1994: --- SUCCESS: Integrated in Nutch-nutchgora #1412 (See

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510167#comment-14510167 ] Hudson commented on NUTCH-1927: --- FAILURE: Integrated in Nutch-trunk #3084 (See

[jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool

2015-04-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504641#comment-14504641 ] Hudson commented on NUTCH-1697: --- SUCCESS: Integrated in Nutch-trunk #3075 (See

[jira] [Commented] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512588#comment-14512588 ] Hudson commented on NUTCH-1991: --- FAILURE: Integrated in Nutch-trunk #3089 (See

[jira] [Commented] (NUTCH-1997) Add CBOR magic header to CommonCrawlDataDumper output

2015-04-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512589#comment-14512589 ] Hudson commented on NUTCH-1997: --- FAILURE: Integrated in Nutch-trunk #3089 (See

[jira] [Commented] (NUTCH-1969) URL Normalizer properly handling slashes

2015-04-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513409#comment-14513409 ] Hudson commented on NUTCH-1969: --- FAILURE: Integrated in Nutch-trunk #3091 (See

[jira] [Commented] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513410#comment-14513410 ] Hudson commented on NUTCH-2001: --- FAILURE: Integrated in Nutch-trunk #3091 (See

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506286#comment-14506286 ] Hudson commented on NUTCH-1973: --- FAILURE: Integrated in Nutch-trunk #3077 (See

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506388#comment-14506388 ] Hudson commented on NUTCH-1973: --- SUCCESS: Integrated in Nutch-trunk #3078 (See

[jira] [Commented] (NUTCH-1996) Make protocol-selenium README part of plugin

2015-04-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507411#comment-14507411 ] Hudson commented on NUTCH-1996: --- SUCCESS: Integrated in Nutch-trunk #3081 (See

[jira] [Commented] (NUTCH-1990) Use URI.normalise() in BasicURLNormalizer

2015-04-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507974#comment-14507974 ] Hudson commented on NUTCH-1990: --- SUCCESS: Integrated in Nutch-nutchgora #1410 (See

[jira] [Commented] (NUTCH-1990) Use URI.normalise() in BasicURLNormalizer

2015-04-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506814#comment-14506814 ] Hudson commented on NUTCH-1990: --- SUCCESS: Integrated in Nutch-trunk #3080 (See

[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510388#comment-14510388 ] Hudson commented on NUTCH-1985: --- FAILURE: Integrated in Nutch-trunk #3086 (See

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520080#comment-14520080 ] Hudson commented on NUTCH-1994: --- SUCCESS: Integrated in Nutch-trunk #3095 (See

[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-05-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533844#comment-14533844 ] Hudson commented on NUTCH-1934: --- SUCCESS: Integrated in Nutch-trunk #3107 (See

[jira] [Commented] (NUTCH-2004) ParseChecker does not handle redirects

2015-05-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531712#comment-14531712 ] Hudson commented on NUTCH-2004: --- SUCCESS: Integrated in Nutch-trunk #3104 (See

[jira] [Commented] (NUTCH-1988) Make nested output directory dump optional

2015-05-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536887#comment-14536887 ] Hudson commented on NUTCH-1988: --- FAILURE: Integrated in Nutch-trunk #3111 (See

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-05-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538399#comment-14538399 ] Hudson commented on NUTCH-1927: --- FAILURE: Integrated in Nutch-trunk #3114 (See

[jira] [Commented] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper

2015-05-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538712#comment-14538712 ] Hudson commented on NUTCH-1998: --- SUCCESS: Integrated in Nutch-trunk #3115 (See

[jira] [Commented] (NUTCH-2004) ParseChecker does not handle redirects

2015-05-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536851#comment-14536851 ] Hudson commented on NUTCH-2004: --- SUCCESS: Integrated in Nutch-trunk #3110 (See

[jira] [Commented] (NUTCH-1873) Solr IndexWriter/Job to report number of docs indexed.

2015-05-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535940#comment-14535940 ] Hudson commented on NUTCH-1873: --- SUCCESS: Integrated in Nutch-trunk #3108 (See

[jira] [Commented] (NUTCH-2006) IndexingFiltersChecker to take custom metadata as input

2015-05-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545618#comment-14545618 ] Hudson commented on NUTCH-2006: --- SUCCESS: Integrated in Nutch-trunk #3121 (See

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-05-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547426#comment-14547426 ] Hudson commented on NUTCH-1973: --- SUCCESS: Integrated in Nutch-trunk #3125 (See

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-05-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547425#comment-14547425 ] Hudson commented on NUTCH-1854: --- SUCCESS: Integrated in Nutch-trunk #3125 (See

[jira] [Commented] (NUTCH-2008) IndexerMapReduce to use single instance of NutchIndexAction for deletions

2015-05-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543477#comment-14543477 ] Hudson commented on NUTCH-2008: --- SUCCESS: Integrated in Nutch-trunk #3119 (See

[jira] [Commented] (NUTCH-2014) Fetcher hang-up on completion

2015-05-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549319#comment-14549319 ] Hudson commented on NUTCH-2014: --- SUCCESS: Integrated in Nutch-trunk #3127 (See

[jira] [Commented] (NUTCH-2013) Fetcher: missing logs fetching ... on stdout

2015-05-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549320#comment-14549320 ] Hudson commented on NUTCH-2013: --- SUCCESS: Integrated in Nutch-trunk #3127 (See

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545988#comment-14545988 ] Hudson commented on NUTCH-2011: --- SUCCESS: Integrated in Nutch-trunk #3122 (See

[jira] [Commented] (NUTCH-1906) Typo in CrawlDbReader command line help

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500430#comment-14500430 ] Hudson commented on NUTCH-1906: --- SUCCESS: Integrated in Nutch-trunk #3065 (See

[jira] [Commented] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500431#comment-14500431 ] Hudson commented on NUTCH-1911: --- SUCCESS: Integrated in Nutch-trunk #3065 (See

[jira] [Commented] (NUTCH-1988) Make nested output directory dump optional

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500754#comment-14500754 ] Hudson commented on NUTCH-1988: --- SUCCESS: Integrated in Nutch-trunk #3067 (See

[jira] [Commented] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500647#comment-14500647 ] Hudson commented on NUTCH-1986: --- SUCCESS: Integrated in Nutch-trunk #3066 (See

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500753#comment-14500753 ] Hudson commented on NUTCH-1927: --- SUCCESS: Integrated in Nutch-trunk #3067 (See

[jira] [Commented] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501465#comment-14501465 ] Hudson commented on NUTCH-1989: --- SUCCESS: Integrated in Nutch-trunk #3069 (See

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501580#comment-14501580 ] Hudson commented on NUTCH-1854: --- SUCCESS: Integrated in Nutch-trunk #3070 (See

[jira] [Commented] (NUTCH-1981) Upgrade icu4j

2015-04-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491238#comment-14491238 ] Hudson commented on NUTCH-1981: --- SUCCESS: Integrated in Nutch-nutchgora #1398 (See

[jira] [Commented] (NUTCH-1981) Upgrade icu4j

2015-04-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491242#comment-14491242 ] Hudson commented on NUTCH-1981: --- SUCCESS: Integrated in Nutch-trunk #3058 (See

[jira] [Commented] (NUTCH-1983) CommonCrawlDumper and FileDumper don't dump correct JSON

2015-04-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490558#comment-14490558 ] Hudson commented on NUTCH-1983: --- SUCCESS: Integrated in Nutch-trunk #3055 (See

[jira] [Commented] (NUTCH-2017) Remove debug log from MimeUtil

2015-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577911#comment-14577911 ] Hudson commented on NUTCH-2017: --- SUCCESS: Integrated in Nutch-trunk #3155 (See

[jira] [Commented] (NUTCH-2037) Job endpoint to support Indexing from the REST API

2015-06-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578436#comment-14578436 ] Hudson commented on NUTCH-2037: --- SUCCESS: Integrated in Nutch-trunk #3157 (See

[jira] [Commented] (NUTCH-2016) Remove unused class OldFetcher

2015-06-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601853#comment-14601853 ] Hudson commented on NUTCH-2016: --- SUCCESS: Integrated in Nutch-trunk #3176 (See

[jira] [Commented] (NUTCH-2041) indexer fails if linkdb is missing

2015-06-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601852#comment-14601852 ] Hudson commented on NUTCH-2041: --- SUCCESS: Integrated in Nutch-trunk #3176 (See

[jira] [Commented] (NUTCH-2036) Adding some continuous crawl goodies to the crawl script

2015-06-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601291#comment-14601291 ] Hudson commented on NUTCH-2036: --- SUCCESS: Integrated in Nutch-trunk #3174 (See

[jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-06-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601734#comment-14601734 ] Hudson commented on NUTCH-2000: --- SUCCESS: Integrated in Nutch-trunk #3175 (See

[jira] [Commented] (NUTCH-2039) Relevance based scoring filter

2015-06-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593818#comment-14593818 ] Hudson commented on NUTCH-2039: --- SUCCESS: Integrated in Nutch-trunk #3167 (See

[jira] [Commented] (NUTCH-2045) index-basic incorrect assignment of next fetch time (page.getFetchTime()) as page fetch time

2015-06-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598511#comment-14598511 ] Hudson commented on NUTCH-2045: --- SUCCESS: Integrated in Nutch-nutchgora #1477 (See

[jira] [Commented] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service

2015-06-02 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570243#comment-14570243 ] Hudson commented on NUTCH-2031: --- FAILURE: Integrated in Nutch-trunk #3148 (See

[jira] [Commented] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service

2015-06-02 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570281#comment-14570281 ] Hudson commented on NUTCH-2031: --- SUCCESS: Integrated in Nutch-trunk #3149 (See

[jira] [Commented] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561555#comment-14561555 ] Hudson commented on NUTCH-1995: --- SUCCESS: Integrated in Nutch-trunk #3138 (See

[jira] [Commented] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560237#comment-14560237 ] Hudson commented on NUTCH-1995: --- FAILURE: Integrated in Nutch-trunk #3136 (See

[jira] [Commented] (NUTCH-2007) add test libs to classpath of bin/nutch junit

2015-05-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561772#comment-14561772 ] Hudson commented on NUTCH-2007: --- SUCCESS: Integrated in Nutch-trunk #3139 (See

[jira] [Commented] (NUTCH-208) http: proxy exception list:

2015-05-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562063#comment-14562063 ] Hudson commented on NUTCH-208: -- SUCCESS: Integrated in Nutch-trunk #3140 (See

[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-06-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568517#comment-14568517 ] Hudson commented on NUTCH-2015: --- SUCCESS: Integrated in Nutch-trunk #3147 (See

[jira] [Commented] (NUTCH-1684) ParseMeta to be added before fetch schedulers are run

2015-07-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609695#comment-14609695 ] Hudson commented on NUTCH-1684: --- SUCCESS: Integrated in Nutch-trunk #3187 (See

[jira] [Commented] (NUTCH-1980) Jexl expressions for CrawlDbReader

2015-07-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609693#comment-14609693 ] Hudson commented on NUTCH-1980: --- SUCCESS: Integrated in Nutch-trunk #3187 (See

[jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode

2015-07-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609694#comment-14609694 ] Hudson commented on NUTCH-1692: --- SUCCESS: Integrated in Nutch-trunk #3187 (See

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613602#comment-14613602 ] Hudson commented on NUTCH-2052: --- FAILURE: Integrated in Nutch-trunk #3189 (See

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614121#comment-14614121 ] Hudson commented on NUTCH-2052: --- SUCCESS: Integrated in Nutch-trunk #3191 (See

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614120#comment-14614120 ] Hudson commented on NUTCH-2059: --- SUCCESS: Integrated in Nutch-trunk #3191 (See

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-07-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609625#comment-14609625 ] Hudson commented on NUTCH-2038: --- SUCCESS: Integrated in Nutch-trunk #3186 (See
