Re: need suggestion for GSoC 2016

2016-01-22 Thread Ammar Shadiq
Hi Lewis,

My Nutch wiki username is "AmmarShadiq".

My interest in Nutch so far is in precise crawling.

\Ammar

On Sat, Jan 23, 2016 at 1:49 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Ammar,
> CC dev@
> Apologies I must have missed the post!
> Well... I've created a new entry on the wiki for you to register your
> interest. Please provide me with your wiki username and I'll grant you
> write access to the wiki.
> It would be great if we could hash out here what you are interested in and
> what would make a good project.
> Let's do a bit of brainstorming here and see where we get.
> Lewis
>
>
> On Fri, Jan 22, 2016 at 2:59 PM, Ammar Shadiq 
> wrote:
>
>> Hi Lewis,
>>
>> I wrote to the dev list several months ago (
>> http://www.mail-archive.com/dev%40nutch.apache.org/msg19783.html) and
>> haven't had any reply so far.
>> I would appreciate your suggestions.
>>
>> Warmest regards
>> Ammar Shadiq
>>
>> On Tue, Nov 3, 2015 at 3:28 AM, Lewis John Mcgibbney <
>> lewis.mcgibb...@gmail.com> wrote:
>>
>>> Hi Ammar,
>>> I have a few suggestions but in all honesty I would write to the Nutch
>>> dev@ list and ask there.
>>> The PMC have not really started thinking about GSoC yet so your
>>> conversation would be really good.
>>> Let's take it from there.
>>>
>>>
>>> On Sunday, November 1, 2015, Ammar Shadiq 
>>> wrote:
>>>
>>>> Hi Lewis,
>>>>
>>>> Several years ago I submitted a GSoC proposal for development of a Nutch
>>>> screen scraper plugin (https://issues.apache.org/jira/browse/NUTCH-978) and
>>>> couldn't re-participate in GSoC 2012 because I was no longer a student. I am
>>>> currently pursuing my master's degree and am eligible to participate in
>>>> next year's GSoC. I'm interested in contributing to Apache Nutch for GSoC
>>>> 2016 and need suggestions on which projects/features are available. Could you
>>>> give any advice?
>>>>
>>>> --
>>>> Thank you,
>>>> Ammar Shadiq
>>>> http://ammarshadiq.web.id/
>>>>
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>>
>>
>>
>> --
>> Thank you,
>> Ammar Shadiq
>> http://ammarshadiq.web.id/
>>
>
>
>
> --
> *Lewis*
>



-- 
Thank you,
Ammar Shadiq
http://ammarshadiq.web.id/


[jira] [Commented] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-22 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113423#comment-15113423
 ] 

Lewis John McGibbney commented on NUTCH-1741:
-

I'm nearly finished updating the v6 patch for 2.X and will commit once this is 
done. This was not suitable for inclusion in 2.3.1 as it was not a bug fix. It 
is now good for inclusion in 2.4.

> Support of Sitemaps in Nutch 2.x
> 
>
> Key: NUTCH-1741
> URL: https://issues.apache.org/jira/browse/NUTCH-1741
> Project: Nutch
>  Issue Type: New Feature
>  Components: fetcher, generator
>Reporter: Alparslan Avcı
>Assignee: cihad güzel
>  Labels: gsoc2015
> Fix For: 2.4
>
> Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, 
> NUTCH-1741-v4.patch, NUTCH-1741.patch, NUTCH-1741v5.patch, 
> NUTCH-1741v6.patch, SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for 2.x branch. It is being discussed 
> in NUTCH-1465 for trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-22 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1741:

Assignee: cihad güzel

> Support of Sitemaps in Nutch 2.x
> 
>
> Key: NUTCH-1741
> URL: https://issues.apache.org/jira/browse/NUTCH-1741
> Project: Nutch
>  Issue Type: New Feature
>  Components: fetcher, generator
>Reporter: Alparslan Avcı
>Assignee: cihad güzel
>  Labels: gsoc2015
> Fix For: 2.4
>
> Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, 
> NUTCH-1741-v4.patch, NUTCH-1741.patch, NUTCH-1741v5.patch, 
> NUTCH-1741v6.patch, SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for 2.x branch. It is being discussed 
> in NUTCH-1465 for trunk. 





Re: need suggestion for GSoC 2016

2016-01-22 Thread Lewis John Mcgibbney
Hi Ammar,
CC dev@
Apologies I must have missed the post!
Well... I've created a new entry on the wiki for you to register your
interest. Please provide me with your wiki username and I'll grant you
write access to the wiki.
It would be great if we could hash out here what you are interested in and
what would make a good project.
Let's do a bit of brainstorming here and see where we get.
Lewis


On Fri, Jan 22, 2016 at 2:59 PM, Ammar Shadiq 
wrote:

> Hi Lewis,
>
> I wrote to the dev list several months ago (
> http://www.mail-archive.com/dev%40nutch.apache.org/msg19783.html) and
> haven't had any reply so far.
> I would appreciate your suggestions.
>
> Warmest regards
> Ammar Shadiq
>
> On Tue, Nov 3, 2015 at 3:28 AM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi Ammar,
>> I have a few suggestions but in all honesty I would write to the Nutch
>> dev@ list and ask there.
>> The PMC have not really started thinking about GSoC yet so your
>> conversation would be really good.
>> Let's take it from there.
>>
>>
>> On Sunday, November 1, 2015, Ammar Shadiq  wrote:
>>
>>> Hi Lewis,
>>>
>>> Several years ago I submitted a GSoC proposal for development of a Nutch
>>> screen scraper plugin (https://issues.apache.org/jira/browse/NUTCH-978) and
>>> couldn't re-participate in GSoC 2012 because I was no longer a student. I am
>>> currently pursuing my master's degree and am eligible to participate in
>>> next year's GSoC. I'm interested in contributing to Apache Nutch for GSoC
>>> 2016 and need suggestions on which projects/features are available. Could you
>>> give any advice?
>>>
>>> --
>>> Thank you,
>>> Ammar Shadiq
>>> http://ammarshadiq.web.id/
>>>
>>
>>
>> --
>> *Lewis*
>>
>>
>
>
> --
> Thank you,
> Ammar Shadiq
> http://ammarshadiq.web.id/
>



-- 
*Lewis*


[Nutch Wiki] Trivial Update of "GoogleSummerOfCode" by LewisJohnMcgibbney

2016-01-22 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "GoogleSummerOfCode" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/GoogleSummerOfCode?action=diff&rev1=14&rev2=15

  === Ideas ===
  You can see GSoC ideas from 
[[https://wiki.apache.org/nutch/GoogleSummerOfCode/Ideas|this page.]]
  == Projects ==
+ 
+ === 2016 ===
+ List of accepted projects for GSoC 2016 are listed below. Both students and 
mentors are encouraged to sign up as well as 
[[http://nutch.apache.org/mailing_lists.html|discuss ideas on the community 
mailing lists]].
+ 
+ ||'''Student'''||'''Project'''||'''Mentor(s)'''||
+ 
+ 
+ 
+ -
  
  {{attachment:gsoc2015.png}}
  === 2015 ===


[jira] [Commented] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2016-01-22 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113380#comment-15113380
 ] 

Lewis John McGibbney commented on NUTCH-2171:
-

Hey [~jorgelbg], feel free to assign this to yourself. It would be a reasonably 
large patch touching a number of files, but it would be a really valuable 
contribution.

> Upgrade Nutch Trunk to Java 1.8
> ---
>
> Key: NUTCH-2171
> URL: https://issues.apache.org/jira/browse/NUTCH-2171
> Project: Nutch
>  Issue Type: Task
>Reporter: Lewis John McGibbney
>
> Lambda expressions are fantastic. I tried to undertake a small exercise which 
> would indicate how many we could implement however this was a fruitless 
> effort. A patch is going to be a better approach. This task involves 
> upgrading various properties in default.properties as well as a systemic 
> source code analysis with the aim of implementing Java 8 goodies throughout.
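
As an illustration of the kind of change this task implies, here is a minimal sketch of replacing an anonymous inner class with a Java 8 method reference. It is not taken from the Nutch code base: the class name LambdaMigrationSketch, both method names and the sample URLs are hypothetical and only serve to show the before/after shape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, not part of Nutch.
public class LambdaMigrationSketch {

  // Pre-Java 8 style: an anonymous inner class implements Comparator.
  static List<String> sortByLengthLegacy(List<String> urls) {
    List<String> copy = new ArrayList<>(urls);
    Collections.sort(copy, new Comparator<String>() {
      @Override
      public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
      }
    });
    return copy;
  }

  // Java 8 style: the same ordering expressed with a method reference
  // and the stream API. The input list is left untouched.
  static List<String> sortByLengthJava8(List<String> urls) {
    return urls.stream()
        .sorted(Comparator.comparingInt(String::length))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> urls = Arrays.asList(
        "http://a.example/long/path",
        "http://b.example/",
        "http://c.example/x");
    System.out.println(sortByLengthLegacy(urls));
    System.out.println(sortByLengthJava8(urls));
  }
}
```

Both methods produce the same ordering, so a rewrite of this kind can be checked by running the existing unit tests before and after the change.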





[jira] [Commented] (NUTCH-2204) Remove junit lib from runtime

2016-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113206#comment-15113206
 ] 

Hudson commented on NUTCH-2204:
---

SUCCESS: Integrated in Nutch-trunk #3341 (See 
[https://builds.apache.org/job/Nutch-trunk/3341/])
NUTCH-2204 : revert erroneous commit (snagel: 
[http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1726318])
* trunk/conf/regex-normalize.xml.template
NUTCH-2204 Remove junit lib from runtime (snagel: 
[http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1726314])
* trunk/CHANGES.txt
* trunk/conf/regex-normalize.xml.template
* trunk/ivy/ivy.xml


> Remove junit lib from runtime
> -
>
> Key: NUTCH-2204
> URL: https://issues.apache.org/jira/browse/NUTCH-2204
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.11
>Reporter: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.12
>
> Attachments: NUTCH-2204.patch
>
>
> The junit library is shipped in the Nutch bin package as an unnecessary 
> dependency (apache-nutch-1.11/lib/junit-3.8.1.jar). Unit tests use a 
> different library version:
> {noformat}
> % ls build/lib/junit* build/test/lib/junit*
> build/lib/junit-3.8.1.jar  build/test/lib/junit-4.11.jar
> {noformat}





[jira] [Commented] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2016-01-22 Thread Jorge Luis Betancourt Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113192#comment-15113192
 ] 

Jorge Luis Betancourt Gonzalez commented on NUTCH-2171:
---

Perhaps an approach using Checkstyle, combined with this recipe 
http://www.puppycrawl.com/blog/2015/09/03/checkstyle-force-lambdas.html, could 
help us move forward. This could address at least the code analysis part.

> Upgrade Nutch Trunk to Java 1.8
> ---
>
> Key: NUTCH-2171
> URL: https://issues.apache.org/jira/browse/NUTCH-2171
> Project: Nutch
>  Issue Type: Task
>Reporter: Lewis John McGibbney
>
> Lambda expressions are fantastic. I tried to undertake a small exercise which 
> would indicate how many we could implement however this was a fruitless 
> effort. A patch is going to be a better approach. This task involves 
> upgrading various properties in default.properties as well as a systemic 
> source code analysis with the aim of implementing Java 8 goodies throughout.





[jira] [Resolved] (NUTCH-2204) remove junit lib from runtime

2016-01-22 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2204.

Resolution: Fixed

Committed to trunk, r1726318.

> remove junit lib from runtime
> -
>
> Key: NUTCH-2204
> URL: https://issues.apache.org/jira/browse/NUTCH-2204
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.11
>Reporter: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.12
>
> Attachments: NUTCH-2204.patch
>
>
> The junit library is shipped in the Nutch bin package as an unnecessary 
> dependency (apache-nutch-1.11/lib/junit-3.8.1.jar). Unit tests use a 
> different library version:
> {noformat}
> % ls build/lib/junit* build/test/lib/junit*
> build/lib/junit-3.8.1.jar  build/test/lib/junit-4.11.jar
> {noformat}





[jira] [Updated] (NUTCH-2204) Remove junit lib from runtime

2016-01-22 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-2204:
---
Summary: Remove junit lib from runtime  (was: remove junit lib from runtime)

> Remove junit lib from runtime
> -
>
> Key: NUTCH-2204
> URL: https://issues.apache.org/jira/browse/NUTCH-2204
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.11
>Reporter: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.12
>
> Attachments: NUTCH-2204.patch
>
>
> The junit library is shipped in the Nutch bin package as an unnecessary 
> dependency (apache-nutch-1.11/lib/junit-3.8.1.jar). Unit tests use a 
> different library version:
> {noformat}
> % ls build/lib/junit* build/test/lib/junit*
> build/lib/junit-3.8.1.jar  build/test/lib/junit-4.11.jar
> {noformat}





[jira] [Commented] (NUTCH-2204) remove junit lib from runtime

2016-01-22 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113021#comment-15113021
 ] 

Julien Nioche commented on NUTCH-2204:
--

+1

> remove junit lib from runtime
> -
>
> Key: NUTCH-2204
> URL: https://issues.apache.org/jira/browse/NUTCH-2204
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.11
>Reporter: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.12
>
> Attachments: NUTCH-2204.patch
>
>
> The junit library is shipped in the Nutch bin package as an unnecessary 
> dependency (apache-nutch-1.11/lib/junit-3.8.1.jar). Unit tests use a 
> different library version:
> {noformat}
> % ls build/lib/junit* build/test/lib/junit*
> build/lib/junit-3.8.1.jar  build/test/lib/junit-4.11.jar
> {noformat}





[jira] [Updated] (NUTCH-2204) remove junit lib from runtime

2016-01-22 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-2204:
---
Attachment: NUTCH-2204.patch

> remove junit lib from runtime
> -
>
> Key: NUTCH-2204
> URL: https://issues.apache.org/jira/browse/NUTCH-2204
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.11
>Reporter: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.12
>
> Attachments: NUTCH-2204.patch
>
>
> The junit library is shipped in the Nutch bin package as an unnecessary 
> dependency (apache-nutch-1.11/lib/junit-3.8.1.jar). Unit tests use a 
> different library version:
> {noformat}
> % ls build/lib/junit* build/test/lib/junit*
> build/lib/junit-3.8.1.jar  build/test/lib/junit-4.11.jar
> {noformat}





[jira] [Created] (NUTCH-2204) remove junit lib from runtime

2016-01-22 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2204:
--

 Summary: remove junit lib from runtime
 Key: NUTCH-2204
 URL: https://issues.apache.org/jira/browse/NUTCH-2204
 Project: Nutch
  Issue Type: Improvement
  Components: build
Affects Versions: 1.11
Reporter: Sebastian Nagel
Priority: Trivial
 Fix For: 1.12


The junit library is shipped in the Nutch bin package as an unnecessary 
dependency (apache-nutch-1.11/lib/junit-3.8.1.jar). Unit tests use a different 
library version:
{noformat}
% ls build/lib/junit* build/test/lib/junit*
build/lib/junit-3.8.1.jar  build/test/lib/junit-4.11.jar
{noformat}
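
The same check as the ls call above could also be scripted, for instance to verify after a build that only one junit version ends up on disk. The following is a rough sketch only: the class JunitJarCheck and the method findJunitJars are hypothetical names, and the two directory paths are simply the ones from the listing above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the Nutch build.
public class JunitJarCheck {

  // Collects the names of all junit*.jar files found directly inside
  // the given directories. Directories that do not exist are skipped.
  static Set<String> findJunitJars(Path... dirs) throws IOException {
    Set<String> jars = new TreeSet<>();
    for (Path dir : dirs) {
      if (!Files.isDirectory(dir)) {
        continue;
      }
      try (DirectoryStream<Path> stream =
          Files.newDirectoryStream(dir, "junit*.jar")) {
        for (Path jar : stream) {
          jars.add(jar.getFileName().toString());
        }
      }
    }
    return jars;
  }

  public static void main(String[] args) throws IOException {
    // Paths as in the listing above; adjust to the local checkout.
    Set<String> jars = findJunitJars(
        Paths.get("build/lib"), Paths.get("build/test/lib"));
    System.out.println("junit jars found: " + jars);
    if (jars.size() > 1) {
      System.out.println("More than one junit jar is present.");
    }
  }
}
```

Run from the top of a built checkout, this reports every junit jar it finds and flags the case where more than one is present.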



