[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509167#comment-14509167 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1985: --- Should we

Re: [PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015

2015-04-23 Thread Mattmann, Chris A (3980)
s/1.8/1.10/ right? If so +1! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email:

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509501#comment-14509501 ] Lewis John McGibbney commented on NUTCH-1994: - Would like to commit by EoB

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509514#comment-14509514 ] Tyler Palsulich commented on NUTCH-1994: Happy to help, [~lewismc]! Upgrade to

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509393#comment-14509393 ] Lewis John McGibbney commented on NUTCH-1994: - Anyone to review? I can roll a

[PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015

2015-04-23 Thread Lewis John Mcgibbney
Hi Folks, Does anyone have an issue with the above proposal? Thanks Lewis -- *Lewis*

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509522#comment-14509522 ] Lewis John McGibbney commented on NUTCH-1994: - Dynamite [~tpalsulich] I'll get

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509678#comment-14509678 ] Sebastian Nagel commented on NUTCH-1994: +1 Upgrade to Apache Tika 1.8

Unsubscribe

2015-04-23 Thread Mengxian Li
Hi, I want to unsubscribe from the email list. Best, Mengxian

Unsubscribe

2015-04-23 Thread Zhaohui Zhang
Hi, I want to unsubscribe from the email list. Best, Zhaohui -- Zhaohui Zhang Dept. of Chemical Engineering, University of Southern California Addr: 2611 Portland Street, Los Angeles, CA, USA 90007 Mobile:(+1)213-880-8321 Email: zhaoh...@usc.edu; happy...@gmail.com;

[jira] [Resolved] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1994. - Resolution: Fixed Committed revision 1675723 in trunk Committed revision 1675724

[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509873#comment-14509873 ] Lewis John McGibbney commented on NUTCH-1985: - [~jorgelbg] +1 please commit

[jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509898#comment-14509898 ] Julien Nioche commented on NUTCH-2000: -- [~lewismc] reverted to 1.10 as this is a

Build failed in Jenkins: Nutch-trunk #3083

2015-04-23 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/3083/changes Changes: [lewismc] NUTCH-1994 Upgrade to Apache Tika 1.8 -- [...truncated 5538 lines...] [echo] Testing plugin: urlfilter-validator [junit] WARNING: multiple versions of ant detected in

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509899#comment-14509899 ] Hudson commented on NUTCH-1994: --- FAILURE: Integrated in Nutch-trunk #3083 (See

[jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509916#comment-14509916 ] Lewis John McGibbney commented on NUTCH-2000: - ACK Link inversion fails with

Re: Unsubscribe

2015-04-23 Thread Michael Joyce
Email dev-unsubscr...@nutch.apache.org You unsub the same way you subbed. It's just a different email. -- Jimmy On Thu, Apr 23, 2015 at 1:23 PM, Zhaohui Zhang happy...@gmail.com wrote: Hi, I want to unsubscribe the email list. Best, Zhaohui -- Zhaohui Zhang Dept. of Chemical

Unsubscribe

2015-04-23 Thread Zhaohui Zhang
Hi, I want to unsubscribe from the email list. Best, Zhaohui -- Zhaohui Zhang PhD Student at University of Southern California Mobile: (213)-880-8321 Email: zhaoh...@usc.edu yuan...@usc.edu

[jira] [Created] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-23 Thread Jeff Cocking (JIRA)
Jeff Cocking created NUTCH-2001: --- Summary: SubCollection Field Name incorrect in nutch-default.xml Key: NUTCH-2001 URL: https://issues.apache.org/jira/browse/NUTCH-2001 Project: Nutch Issue

[jira] [Updated] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-23 Thread Jeff Cocking (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Cocking updated NUTCH-2001: Attachment: NUTCH-2001-1.x.patch SubCollection Field Name incorrect in nutch-default.xml

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509995#comment-14509995 ] Hudson commented on NUTCH-1994: --- SUCCESS: Integrated in Nutch-nutchgora #1412 (See

[jira] [Updated] (NUTCH-1958) Remove scoring-opic from nutch-default.xml

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1958: Fix Version/s: (was: 1.10) 1.11 Remove scoring-opic from

[jira] [Updated] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2000: Fix Version/s: (was: 1.10) 1.11 Link inversion fails with

[jira] [Updated] (NUTCH-1947) Overhaul o.a.n.parse.OutlinkExtractor.java

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1947: Fix Version/s: (was: 1.10) 1.11 Overhaul

[jira] [Commented] (NUTCH-1963) CommonsCrawlDataDumper is too long ( 100 bytes) when -gzip option invoked

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509876#comment-14509876 ] Lewis John McGibbney commented on NUTCH-1963: - [~gostep] is this issue

[jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509920#comment-14509920 ] Lewis John McGibbney commented on NUTCH-2000: - Julien... I wonder if the 2nd

[jira] [Commented] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-23 Thread Jeff Cocking (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509997#comment-14509997 ] Jeff Cocking commented on NUTCH-2001: - Attached is a patch I created from a clean

[jira] [Commented] (NUTCH-1969) URL Normalizer properly handling slashes

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509880#comment-14509880 ] Lewis John McGibbney commented on NUTCH-1969: - +1 for commit

[jira] [Updated] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-2000: - Priority: Blocker (was: Major) Link inversion fails with .locked already exists.

[jira] [Updated] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-2000: - Fix Version/s: (was: 1.11) 1.10 Link inversion fails with .locked already

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509492#comment-14509492 ] Tyler Palsulich commented on NUTCH-1994: Applied and tested both patches, both

Build failed in Jenkins: Nutch-trunk #3087

2015-04-23 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/3087/ -- [...truncated 5611 lines...] test: [echo] Testing plugin: urlfilter-validator [junit] WARNING: multiple versions of ant detected in path for junit [junit]

[jira] [Resolved] (NUTCH-1963) CommonsCrawlDataDumper is too long ( 100 bytes) when -gzip option invoked

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1963. - Resolution: Fixed Assignee: Giuseppe Totaro Addressed within NUTCH-1959

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510097#comment-14510097 ] Lewis John McGibbney commented on NUTCH-1973: - This commit accidentally removed

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510167#comment-14510167 ] Hudson commented on NUTCH-1927: --- FAILURE: Integrated in Nutch-trunk #3084 (See

Build failed in Jenkins: Nutch-trunk #3084

2015-04-23 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/3084/changes Changes: [lewismc] Add back in NUTCH-1927 property to nutch-default as revoved during commit @1675022 -- [...truncated 5373 lines...] [junit] WARNING: multiple versions of ant detected in

[jira] [Commented] (NUTCH-1963) CommonsCrawlDataDumper is too long ( 100 bytes) when -gzip option invoked

2015-04-23 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510062#comment-14510062 ] Giuseppe Totaro commented on NUTCH-1963: Hi [~lewismc]. Yes,

[jira] [Commented] (NUTCH-1997) Add CBOR magic header to CommonCrawlDataDumper output

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508541#comment-14508541 ] Luke sh commented on NUTCH-1997: I am working on the update. Add CBOR magic header to

[jira] [Commented] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508520#comment-14508520 ] Luke sh commented on NUTCH-1998: Hi [~gostep], this patch works. I run a quick tested it

[jira] [Commented] (NUTCH-1997) Add CBOR magic header to CommonCrawlDataDumper output

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508522#comment-14508522 ] Luke sh commented on NUTCH-1997: Thanks a lot [~gostep], highly appreciated, this patch

[jira] [Commented] (NUTCH-1997) Add CBOR magic header to CommonCrawlDataDumper output

2015-04-23 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508549#comment-14508549 ] Giuseppe Totaro commented on NUTCH-1997: Great. Thanks [~Lukeliush]. Please let me

[jira] [Commented] (NUTCH-1997) Add CBOR magic header to CommonCrawlDataDumper output

2015-04-23 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508540#comment-14508540 ] Giuseppe Totaro commented on NUTCH-1997: Thanks [~Lukeliush]. Do you verify if

[jira] [Created] (NUTCH-1999) Add http://nutch.apache.org/robots.txt

2015-04-23 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1999: Summary: Add http://nutch.apache.org/robots.txt Key: NUTCH-1999 URL: https://issues.apache.org/jira/browse/NUTCH-1999 Project: Nutch Issue Type: Improvement

[jira] [Assigned] (NUTCH-1999) Add http://nutch.apache.org/robots.txt

2015-04-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-1999: Assignee: Julien Nioche Add http://nutch.apache.org/robots.txt

[jira] [Created] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-2000: Summary: Link inversion fails with .locked already exists. Key: NUTCH-2000 URL: https://issues.apache.org/jira/browse/NUTCH-2000 Project: Nutch Issue Type:

Build failed in Jenkins: Nutch-trunk #3085

2015-04-23 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/3085/ -- [...truncated 5536 lines...] [echo] Testing plugin: urlfilter-validator [junit] WARNING: multiple versions of ant detected in path for junit [junit]

[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510341#comment-14510341 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1985: --- Committed

Re: [MASSMAIL]Re: [PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015

2015-04-23 Thread Jorge Luis Betancourt González
+1 - Original Message - From: Chris A Mattmann (3980) chris.a.mattm...@jpl.nasa.gov To: dev@nutch.apache.org Sent: Thursday, April 23, 2015 2:16:09 PM Subject: [MASSMAIL]Re: [PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015 s/1.8/1.10/ right? If so +1!

[jira] [Commented] (NUTCH-1997) Add CBOR magic header to CommonCrawlDataDumper output

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510380#comment-14510380 ] Luke sh commented on NUTCH-1997: Notes: The attached cbor file contains both magic bytes

[jira] [Resolved] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Luis Betancourt Gonzalez resolved NUTCH-1985. --- Resolution: Fixed Adding a main() method to the

[jira] [Issue Comment Deleted] (NUTCH-1997) Add CBOR magic header to CommonCrawlDataDumper output

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated NUTCH-1997: --- Comment: was deleted (was: Notes: The attached cbor file contains both magic bytes for type xhtml and type

Build failed in Jenkins: Nutch-trunk #3086

2015-04-23 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/3086/changes Changes: [jorgelbg] NUTCH-1985 Adding a main() method to the MimeTypeIndexingFilter -- [...truncated 5373 lines...] copy-generated-lib: test: [echo] Testing plugin: urlfilter-validator

[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510388#comment-14510388 ] Hudson commented on NUTCH-1985: --- FAILURE: Integrated in Nutch-trunk #3086 (See