[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501113#comment-14501113 ] Chris A. Mattmann commented on NUTCH-1973: -- [~sujenshah] this patch doesn't apply

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501110#comment-14501110 ] Chris A. Mattmann commented on NUTCH-1973: -- ok going to commit this now. > Job A

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500753#comment-14500753 ] Hudson commented on NUTCH-1927: --- SUCCESS: Integrated in Nutch-trunk #3067 (See [https://bui

[jira] [Commented] (NUTCH-1988) Make nested output directory dump optional

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500754#comment-14500754 ] Hudson commented on NUTCH-1988: --- SUCCESS: Integrated in Nutch-trunk #3067 (See [https://bui

[GitHub] nutch pull request: NUTCH-1988 - Add optional flat directory flag ...

2015-04-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/19 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enable

[GitHub] nutch pull request: NUTCH-1986 - Update and clarify default Elasti...

2015-04-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/17

[jira] [Commented] (NUTCH-1988) Make nested output directory dump optional

2015-04-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500657#comment-14500657 ] ASF GitHub Bot commented on NUTCH-1988: --- Github user asfgit closed the pull request

[jira] [Resolved] (NUTCH-1988) Make nested output directory dump optional

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1988. -- Resolution: Fixed Thanks [~jo...@apache.org] {noformat} [chipotle:~/tmp/nutch-1.10-tru

[jira] [Assigned] (NUTCH-1988) Make nested output directory dump optional

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1988: Assignee: Chris A. Mattmann > Make nested output directory dump optional >

DARPA Memex

2015-04-17 Thread Mattmann, Chris A (3980)
Hey Everyone, Here’s what we’ve been involved in: http://www.forbes.com/sites/thomasbrewster/2015/04/17/darpa-nasa-and-partners-show-off-memex/ :) Nutch, Tika, Solr FTW! Cheers, Chris ++ Chris Mattmann, Ph.D. Chief Architect In

[GitHub] nutch pull request: NUTCH-1986 - Update and clarify default Elasti...

2015-04-17 Thread MJJoyce
GitHub user MJJoyce opened a pull request: https://github.com/apache/nutch/pull/17 NUTCH-1986 - Update and clarify default Elasticsearch conf values - Host value is now defaulted to 'localhost'. - Update port description to make it apparent that 9300 is more likely the valu

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500652#comment-14500652 ] Sebastian Nagel commented on NUTCH-1927: Committed to trunk r1674399. Should be ea

[GitHub] nutch pull request: NUTCH-1988 - Add optional flat directory flag ...

2015-04-17 Thread MJJoyce
GitHub user MJJoyce opened a pull request: https://github.com/apache/nutch/pull/19 NUTCH-1988 - Add optional flat directory flag to dump command - Add optional flatdir flag to dump command so that a user can dump their crawl data to a flat directory instead of the nested struct

[GitHub] nutch pull request: NUTCH-1987 - Make bin/crawl indexer agnostic

2015-04-17 Thread MJJoyce
GitHub user MJJoyce opened a pull request: https://github.com/apache/nutch/pull/18 NUTCH-1987 - Make bin/crawl indexer agnostic - Add solr.server.url property to nutch-default and set to value consistent with URL used in the Nutch Tutorial. - Change SOLRURL references to IN

[jira] [Commented] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500647#comment-14500647 ] Hudson commented on NUTCH-1986: --- SUCCESS: Integrated in Nutch-trunk #3066 (See [https://bui

[jira] [Resolved] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1986. -- Resolution: Fixed Thanks [~jo...@apache.org]! {noformat} [chipotle:~/tmp/nutch-1.10-tru

[jira] [Commented] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500613#comment-14500613 ] ASF GitHub Bot commented on NUTCH-1986: --- Github user asfgit closed the pull request

Re: [jira] [Updated] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Mattmann, Chris A (3980)
+1 please commit! Thanks seb

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500604#comment-14500604 ] Mattmann, Chris A (388J) commented on NUTCH-1927: - +1 please commit! Thank

[jira] [Updated] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1927: --- Attachment: test_NUTCH-1927.2015-04-17.txt NUTCH-1927.2015-04-17.patch Patch t

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500544#comment-14500544 ] Sebastian Nagel commented on NUTCH-1927: Hi, Chris: agreed to log more verbosely.

[jira] [Commented] (NUTCH-1906) Typo in CrawlDbReader command line help

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500430#comment-14500430 ] Hudson commented on NUTCH-1906: --- SUCCESS: Integrated in Nutch-trunk #3065 (See [https://bui

[jira] [Commented] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing

2015-04-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500431#comment-14500431 ] Hudson commented on NUTCH-1911: --- SUCCESS: Integrated in Nutch-trunk #3065 (See [https://bui

[GitHub] nutch pull request: NUTCH-1906 - Remove duplicate stats flag listi...

2015-04-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/20

[jira] [Commented] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500366#comment-14500366 ] Chris A. Mattmann commented on NUTCH-1986: -- oops, you already did this! OK will c

[jira] [Assigned] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1986: Assignee: Chris A. Mattmann > Clarify Elastic Search Indexer Plugin Settings >

[jira] [Work started] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1986 started by Chris A. Mattmann. > Clarify Elastic Search Indexer Plugin Settings > --

[GitHub] nutch pull request: NUTCH-1906 - Remove duplicate stats flag listi...

2015-04-17 Thread MJJoyce
GitHub user MJJoyce opened a pull request: https://github.com/apache/nutch/pull/20 NUTCH-1906 - Remove duplicate stats flag listing in readdb help You can merge this pull request into a Git repository by running: $ git pull https://github.com/MJJoyce/nutch NUTCH-1906 Alternat

[jira] [Commented] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500364#comment-14500364 ] Chris A. Mattmann commented on NUTCH-1986: -- +1, Mike if you can work up a PR for

[jira] [Resolved] (NUTCH-1906) Typo in CrawlDbReader command line help

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1906. -- Resolution: Fixed Fix Version/s: (was: 1.11) 1.10 Thanks [

[jira] [Commented] (NUTCH-1906) Typo in CrawlDbReader command line help

2015-04-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500359#comment-14500359 ] ASF GitHub Bot commented on NUTCH-1906: --- Github user asfgit closed the pull request

[jira] [Work started] (NUTCH-1906) Typo in CrawlDbReader command line help

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1906 started by Chris A. Mattmann. > Typo in CrawlDbReader command line help > -

[jira] [Assigned] (NUTCH-1906) Typo in CrawlDbReader command line help

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1906: Assignee: Chris A. Mattmann (was: Lewis John McGibbney) > Typo in CrawlDbReader co

[GitHub] nutch pull request: NUTCH-1911 - Make domainstatics help info a sm...

2015-04-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/21

[jira] [Assigned] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1911: Assignee: Chris A. Mattmann > Imeprove DomainStatistics tool command line parsing >

[jira] [Commented] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing

2015-04-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500325#comment-14500325 ] ASF GitHub Bot commented on NUTCH-1911: --- Github user asfgit closed the pull request

[jira] [Resolved] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1911. -- Resolution: Fixed Fix Version/s: (was: 1.11) 1.10 Thanks [

[jira] [Commented] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing

2015-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500055#comment-14500055 ] Chris A. Mattmann commented on NUTCH-1911: -- awesome mike going to commit this now

[GitHub] nutch pull request: NUTCH-1911 - Make domainstatics help info a sm...

2015-04-17 Thread MJJoyce
GitHub user MJJoyce opened a pull request: https://github.com/apache/nutch/pull/21 NUTCH-1911 - Make domainstatics help info a smidge more helpful You can merge this pull request into a Git repository by running: $ git pull https://github.com/MJJoyce/nutch NUTCH-1911 Alternat

[jira] [Updated] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-17 Thread Iain Lopata (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iain Lopata updated NUTCH-1991: --- Description: From Nutch Version 1.5 onwards the MimeUtil.java class that acts as a facade to Tika to p