[jira] [Commented] (NUTCH-2284) Basic Authentication Support for REST API

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342693#comment-15342693
 ] 

ASF GitHub Bot commented on NUTCH-2284:
---

GitHub user kamaci opened a pull request:

https://github.com/apache/nutch/pull/124

NUTCH-2284 Basic Authentication support for Nutch 2.X REST API.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamaci/nutch NUTCH-2284

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #124


commit 52ffc5a983f261570fd25f10f2f8fcf70d543c88
Author: Furkan KAMACI 
Date:   2016-06-19T20:27:15Z

NUTCH-2284 Basic Authentication support for Nutch 2.X REST API.




> Basic Authentication Support for REST API
> -
>
> Key: NUTCH-2284
> URL: https://issues.apache.org/jira/browse/NUTCH-2284
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> Add Basic Authentication for Nutch REST API.
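For readers unfamiliar with the scheme, here is a minimal Java sketch of the client side of HTTP Basic authentication as defined in RFC 7617. The client sends `Authorization: Basic <Base64(user:password)>`. This is not taken from PR #124, and it does not describe how the Nutch REST server stores or checks users. The class name, method name and credentials are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Generic HTTP Basic authentication header (RFC 7617); not Nutch code. */
public class BasicAuthHeader {

  // Builds the value of the "Authorization" request header:
  // "Basic " followed by Base64("user:password").
  static String authorizationValue(String user, String password) {
    String credentials = user + ":" + password;
    String encoded = Base64.getEncoder()
        .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    return "Basic " + encoded;
  }

  public static void main(String[] args) {
    // Hypothetical placeholder credentials, for illustration only.
    System.out.println("Authorization: " + authorizationValue("admin", "secret"));
  }
}
```

A client could attach the value with `HttpURLConnection.setRequestProperty("Authorization", value)`. Base64 is an encoding, not encryption, so Basic authentication is normally used over TLS.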



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (NUTCH-2199) Documentation for Nutch 2.X REST API

2016-06-21 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342644#comment-15342644
 ] 

Furkan KAMACI commented on NUTCH-2199:
--

This issue can be closed because it is a duplicate of NUTCH-2243.

> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2199
> URL: https://issues.apache.org/jira/browse/NUTCH-2199
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.5
>
>
> The work done on NUTCH-1800 needs to be ported to the 2.X branch. This is 
> trivial; I thought I had already done it, but obviously not.




[jira] [Commented] (NUTCH-2243) Documentation for Nutch 2.X REST API

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342621#comment-15342621
 ] 

ASF GitHub Bot commented on NUTCH-2243:
---

GitHub user kamaci opened a pull request:

https://github.com/apache/nutch/pull/123

NUTCH-2243 REST API documentation for Nutch 2.X



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamaci/nutch NUTCH-2243

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #123


commit 728d0de8bac399ac8dff5d0a0eee89f5c53428b9
Author: Furkan KAMACI 
Date:   2016-06-19T15:16:30Z

NUTCH-2243 REST API documentation for Nutch 2.X




> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2243
> URL: https://issues.apache.org/jira/browse/NUTCH-2243
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> This issue should build on NUTCH-1769 with full Java documentation for all 
> classes in the following packages:
> org.apache.nutch.api.*
> for Nutch 2.x, as was done in NUTCH-1800 for Nutch 1.x.




[jira] [Commented] (NUTCH-2281) Support non-default FileSystem

2016-06-21 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341680#comment-15341680
 ] 

Sebastian Nagel commented on NUTCH-2281:


I tried to fix all tools but haven't tested all of them yet. Yes, there may be 
some I've overlooked :(. I didn't fix unit tests, rarely used tools (Benchmark, 
DmozParser) or some main() methods which are intended for debugging or 
explicitly take the file system as an argument (ParseData, ParseText). I'll 
continue testing over the next few days, but help is welcome!

> Support non-default FileSystem
> --
>
> Key: NUTCH-2281
> URL: https://issues.apache.org/jira/browse/NUTCH-2281
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.12
>Reporter: Sebastian Nagel
> Fix For: 1.13
>
>
> If a path (input or output) does not belong to the configured default 
> FileSystem various Nutch tools may raise an exception like
> {noformat}
>   Exception in ... java.lang.IllegalArgumentException: Wrong FS: s3a://..., 
> expected: hdfs://...
> {noformat}
> This is fixed by getting a reference to the FileSystem from the Path object
> {noformat}
>   FileSystem fs = path.getFileSystem(getConf());
> {noformat}
> instead of
> {noformat}
>   FileSystem fs = FileSystem.get(getConf());
> {noformat}
> A given path (e.g., {{s3a://...}}) may not belong to the default file system 
> ({{hdfs://}}, or {{file://}} in local mode), in which case simple checks such as 
> {{fs.exists(path)}} will fail. Cf. 
> [FileSystem.checkPath(path)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#checkPath(org.apache.hadoop.fs.Path)], 
> and [FileSystem.get(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(org.apache.hadoop.conf.Configuration)] 
> vs. [FileSystem.get(URI,conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(java.net.URI,%20org.apache.hadoop.conf.Configuration)], 
> which is called by [Path.getFileSystem(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/Path.html#getFileSystem%28org.apache.hadoop.conf.Configuration%29].
>
> Note that the FileSystem for input and output may be different, e.g., read 
> from HDFS and write to S3.
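You can see why the "Wrong FS" error appears without running a Hadoop cluster. The Java sketch below is a simplified model, not Hadoop's implementation. It treats a file system as identified by the scheme and authority of its URI and rejects paths that do not match, in the spirit of FileSystem.checkPath(path). It also derives the file system from the path itself, which is what Path.getFileSystem(conf) achieves through FileSystem.get(URI, conf). Real Hadoop adds details this model ignores, such as default ports and case-insensitive host names. The class name, host and bucket are hypothetical. In actual Nutch code the fix remains path.getFileSystem(getConf()), as stated in the issue.

```java
import java.net.URI;
import java.util.Objects;

/** Simplified model (not Hadoop code) of scheme/authority-based FS checks. */
public class WrongFsDemo {

  // True if the path belongs to the file system identified by fsUri.
  // Scheme-less (relative) paths are resolved against that file system.
  static boolean belongsTo(URI fsUri, URI path) {
    if (path.getScheme() == null) {
      return true;
    }
    return path.getScheme().equalsIgnoreCase(fsUri.getScheme())
        && Objects.equals(path.getAuthority(), fsUri.getAuthority());
  }

  // Mimics the check that raises "Wrong FS: ..., expected: ...".
  static void checkPath(URI fsUri, URI path) {
    if (!belongsTo(fsUri, path)) {
      throw new IllegalArgumentException(
          "Wrong FS: " + path + ", expected: " + fsUri);
    }
  }

  // Mimics path.getFileSystem(conf): the FS comes from the path itself,
  // falling back to the default FS only for scheme-less paths.
  static URI fileSystemFor(URI path, URI defaultFs) {
    if (path.getScheme() == null) {
      return defaultFs;
    }
    String authority = path.getAuthority() == null ? "" : path.getAuthority();
    return URI.create(path.getScheme() + "://" + authority + "/");
  }

  public static void main(String[] args) {
    URI defaultFs = URI.create("hdfs://namenode:8020/");
    URI output = URI.create("s3a://bucket/crawl/segments");

    // Like FileSystem.get(conf): the default (HDFS) FS rejects the s3a:// path.
    try {
      checkPath(defaultFs, output);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }

    // Like path.getFileSystem(conf): the resolved FS matches the path.
    URI fs = fileSystemFor(output, defaultFs);
    checkPath(fs, output);
    System.out.println("resolved: " + fs);
  }
}
```

Resolving the file system separately for each path also covers the case mentioned in the issue, where input and output live on different file systems.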




[jira] [Commented] (NUTCH-2281) Support non-default FileSystem

2016-06-21 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341651#comment-15341651
 ] 

Markus Jelsma commented on NUTCH-2281:
--

Hello Sebastian, interesting issue! It seems this patch doesn't apply to all 
jobs. Is there a particular reason, or is the patch just not complete yet?
M.


[jira] [Commented] (NUTCH-1800) Documentation for Nutch 1.X REST API

2016-06-21 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341646#comment-15341646
 ] 

Markus Jelsma commented on NUTCH-1800:
--

cool!

> Documentation for Nutch 1.X REST API
> 
>
> Key: NUTCH-1800
> URL: https://issues.apache.org/jira/browse/NUTCH-1800
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.11
>
> Attachments: NUTCH-1800.patch
>
>
> This issue should build on NUTCH-1769 with full Java documentation for all 
> classes in the following packages
> org.apache.nutch.api.*
> I am assigning this one to [~fjodor.vershinin] as he is doing an excellent 
> job on the REST API. His UML graphic in [0] and commentary show that he has 
> a good understanding of the REST API and its functionality.
> Thank you [~fjodor.vershinin], great work.
> [0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic




RE: [ANNOUNCE] Apache Nutch 1.12 Release

2016-06-21 Thread Markus Jelsma
To those who upgrade,

The release announcement is missing some additional upgrade notes. If you use 
the db.ignore.internal.links or db.ignore.external.links parameters, read the 
points below.

Regards,
Markus

-

Fellow committers, Nutch 1.12 contains a breaking change, NUTCH-2220. Please use 
the note below in the release announcement and keep it at the top of 
CHANGES.txt for the Nutch 1.12 release.

* replace your old conf/nutch-default.xml with the conf/nutch-default.xml from 
  the Nutch 1.12 release

* if you use the LinkDB (e.g. invertlinks) and modified the parameters 
  db.max.inlinks, db.max.anchor.length and/or db.ignore.internal.links, rename 
  them to linkdb.max.inlinks, linkdb.max.anchor.length and 
  linkdb.ignore.internal.links respectively

* db.ignore.internal.links and db.ignore.external.links now operate on the 
  CrawlDB only

* linkdb.ignore.internal.links and linkdb.ignore.external.links now operate on 
  the LinkDB only
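Checking an existing nutch-site.xml for the renamed LinkDB parameters is a mechanical task. The Java sketch below is a hypothetical helper, not part of Nutch. It takes the property overrides as a plain map and leaves out parsing nutch-site.xml. It then reports which of the keys listed above may need the linkdb. prefix. db.ignore.internal.links still exists and now applies to the CrawlDB, so a hint only means the override should be renamed if it was meant for the LinkDB.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical upgrade check for the Nutch 1.12 LinkDB parameter renames. */
public class LinkDbRenameCheck {

  // Old name -> LinkDB-specific name, per the 1.12 upgrade notes (NUTCH-2220).
  static final Map<String, String> LINKDB_RENAMES = new LinkedHashMap<>();
  static {
    LINKDB_RENAMES.put("db.max.inlinks", "linkdb.max.inlinks");
    LINKDB_RENAMES.put("db.max.anchor.length", "linkdb.max.anchor.length");
    LINKDB_RENAMES.put("db.ignore.internal.links", "linkdb.ignore.internal.links");
  }

  // Returns one "old -> new" hint per overridden property that may need renaming.
  static List<String> renameHints(Map<String, String> siteProperties) {
    List<String> hints = new ArrayList<>();
    for (Map.Entry<String, String> e : LINKDB_RENAMES.entrySet()) {
      if (siteProperties.containsKey(e.getKey())) {
        hints.add(e.getKey() + " -> " + e.getValue());
      }
    }
    return hints;
  }

  public static void main(String[] args) {
    // Hypothetical overrides taken from an old nutch-site.xml.
    Map<String, String> site = new LinkedHashMap<>();
    site.put("db.max.inlinks", "5000");
    site.put("db.ignore.external.links", "true");
    for (String hint : renameHints(site)) {
      System.out.println(hint);
    }
  }
}
```

db.ignore.external.links is not reported because it keeps its name. It now only affects the CrawlDB.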

 
-Original message-
> From:lewis john mcgibbney 
> Sent: Monday 20th June 2016 4:01
> To: u...@nutch.apache.org; dev@nutch.apache.org; annou...@apache.org
> Subject: [ANNOUNCE] Apache Nutch 1.12 Release
> 
> The Apache Nutch PMC are pleased to announce the immediate release of Apache 
> Nutch v1.12. We advise all current users and developers of the 1.X series to 
> upgrade to this release.
> Nutch is a well-matured, production-ready Web crawler. Nutch 1.x enables 
> fine-grained configuration, relying on Apache Hadoop™ data structures, which 
> are great for batch processing.
> This release is the result of many months of work and over 40 issues 
> addressed. For a complete overview of these issues please see the release 
> report.
> As usual in the 1.X series, release artifacts are made available as both 
> source and binary and are also available within Maven Central as a Maven 
> dependency.
> The release is available from our DOWNLOADS PAGE.
> The Nutch DOAP can be found at 
> https://svn.apache.org/repos/asf/nutch/cms_site/trunk/content/doap.rdf 
> 
> Lewis
> (On behalf of the Nutch PMC)