Re: Nutch Plugins Source Control

2017-04-07 Thread lewis john mcgibbney
Hi Folks,

The Maven build is actually pretty close now. We need to bring the following
branch up to date with 1.14 and then stabilize the tests; after that it is
ready to propose as a PR for 1.14-SNAPSHOT.
Transferring this work over to 2.x will be much easier than the work done
for master branch.
I'm over on JIRA discussing this on the ticket.
Lewis

On Fri, Apr 7, 2017 at 2:24 PM, <user-digest-h...@nutch.apache.org> wrote:

>
> From: Chris Mattmann <mattm...@apache.org>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Date: Fri, 07 Apr 2017 10:03:46 -0700
> Subject: Re: Nutch Plugins Source Control
> Thanks Julien.
>
> We do intend to publish the artifacts to Central, so they should be
> available in the org.apache tree.
>
> Thamme, Lewis, any update on the Mavenization?
>
>


Re: Nutch Plugins Source Control

2017-04-07 Thread Chris Mattmann
Thanks Julien.

We do intend to publish the artifacts to Central, so they should be
available in the org.apache tree.

Thamme, Lewis, any update on the Mavenization?

Cheers,
Chris




On 4/7/17, 7:28 AM, "Julien Nioche"  wrote:

Hi Ben

On 7 April 2017 at 15:10, Ben Vachon  wrote:

> Hi lsroudi,
>
> I am not working with an install of Nutch, I'm just working with the jar I
> got via maven, and it doesn't have any of the plugins.
>
> I could build the plugins into the project myself, but to do that I would
> need to download them and copy their source files into my project. I don't
> want to own the source for the plugins.
>

That's one of the current limitations of Nutch and a major difference with
e.g. StormCrawler. You'll probably need the shell scripts as well as the
jars + the config files etc... so in practice you'd download the whole code
and build.


>
> Ideally, these plugin jars would be available as maven dependencies the
> same way that Nutch is.
>
> Is there a plan to deploy the default plugins as maven artifacts? If not,
> I would like to request that this happen.
>

There are plans to build with Maven; see

https://issues.apache.org/jira/browse/NUTCH-1371 (now closed)

and more recently

https://issues.apache.org/jira/browse/NUTCH-2292

Not sure whether the artefacts will be published. Feel free to discuss it
on JIRA 

HTH

Julien



>
> Thanks very much,
>
> Ben V.
>
>
>
> On 04/07/2017 09:48 AM, lsroudi abdel wrote:
>
>> hi,
>> I think you should add it in ivy/ivy.xml and then just run ant runtime
>>
>> On Thu, Apr 6, 2017 at 9:35 PM, Ben Vachon  wrote:
>>
>>> Hi all,
>>>
>>> I'm working on a project that gets Nutch 2.3.1 from maven and uses it to
>>> set off crawl jobs which are configurable in our own UI and through our
>>> own search platform's properties. To allow specific configuration of
>>> crawlers, I want to use many of the default plugins that come with a
>>> Nutch 2.x install.
>>>
>>> I have not been able to find the plugins in the org.apache.nutch jar or
>>> anywhere on maven. How do you recommend getting these plugins into our
>>> platform? Should I just copy the jars from an install and upload them to
>>> our own source control?
>>>
>>> Thanks,
>>>
>>> Ben V.
>>>
>>>
>>
>>
>


-- 

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble 





Re: Nutch Plugins Source Control

2017-04-07 Thread Ben Vachon

Hi Julien,

Thanks for the quick response! I will add a comment to the ticket 
requesting that the work happen for Nutch 2.x as well as 1.14 and that 
the plugins be publicly available when this work is done.



On 04/07/2017 10:28 AM, Julien Nioche wrote:

Hi Ben

On 7 April 2017 at 15:10, Ben Vachon  wrote:


Hi lsroudi,

I am not working with an install of Nutch, I'm just working with the jar I
got via maven, and it doesn't have any of the plugins.

I could build the plugins into the project myself, but to do that I would
need to download them and copy their source files into my project. I don't
want to own the source for the plugins.


That's one of the current limitations of Nutch and a major difference with
e.g. StormCrawler. You'll probably need the shell scripts as well as the
jars + the config files etc... so in practice you'd download the whole code
and build.

I've actually been having success without the shell scripts or very many
config files. I've been building the configuration out of properties set
through our own platform code in the process that's running Nutch and
then just instantiating and running each of the required Nutch job
classes in a loop (similar to how the crawl script works).


My only real pain point is the lack of plugins.
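
A minimal sketch of that kind of loop, under stated assumptions. The
CrawlStep interface, the CrawlLoopSketch class, the "crawler." property prefix
and the plugin path are all hypothetical and only stand in for the real Nutch
job classes and platform code, so that the example runs without Nutch on the
classpath. plugin.folders and http.agent.name are existing Nutch property
names; the values shown are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class CrawlLoopSketch {

    /** Hypothetical stand-in for one Nutch job class; NOT a Nutch type. */
    interface CrawlStep {
        String name();

        /** Returns 0 on success and non-zero on failure. */
        int run(Map<String, String> conf);
    }

    /**
     * Copies every platform property that starts with the given prefix
     * into a crawler configuration, with the prefix stripped.
     */
    static Map<String, String> buildConf(Properties platformProps, String prefix) {
        Map<String, String> conf = new TreeMap<>();
        for (String key : platformProps.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                conf.put(key.substring(prefix.length()), platformProps.getProperty(key));
            }
        }
        return conf;
    }

    /**
     * Runs all steps in order, once per round, and stops at the first step
     * that reports a non-zero exit code. Returns "round:step:exitCode" entries.
     */
    static List<String> runCrawl(List<CrawlStep> steps, Map<String, String> conf, int rounds) {
        List<String> log = new ArrayList<>();
        for (int round = 1; round <= rounds; round++) {
            for (CrawlStep step : steps) {
                int exitCode = step.run(conf);
                log.add(round + ":" + step.name() + ":" + exitCode);
                if (exitCode != 0) {
                    return log;
                }
            }
        }
        return log;
    }

    /** Builds a dummy step that ignores the configuration and returns a fixed exit code. */
    static CrawlStep step(String name, int exitCode) {
        return new CrawlStep() {
            @Override public String name() { return name; }
            @Override public int run(Map<String, String> conf) { return exitCode; }
        };
    }

    public static void main(String[] args) {
        // Properties as they might arrive from the surrounding platform (values are made up).
        Properties platformProps = new Properties();
        platformProps.setProperty("crawler.plugin.folders", "/opt/platform/nutch-plugins");
        platformProps.setProperty("crawler.http.agent.name", "example-agent");
        platformProps.setProperty("ui.theme", "dark"); // unrelated key, filtered out

        Map<String, String> conf = buildConf(platformProps, "crawler.");
        List<CrawlStep> steps = List.of(
                step("generate", 0), step("fetch", 0), step("parse", 0), step("updatedb", 0));
        runCrawl(steps, conf, 2).forEach(System.out::println);
    }
}
```

In a real setup each dummy step would presumably wrap one of the Nutch job
classes, and the map would be copied into the configuration object those
jobs expect. The plugin jars would still have to exist in the directory that
plugin.folders points to, which is the gap discussed in this thread.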




Ideally, these plugin jars would be available as maven dependencies the
same way that Nutch is.

Is there a plan to deploy the default plugins as maven artifacts? If not,
I would like to request that this happen.


There are plans to build with Maven; see

https://issues.apache.org/jira/browse/NUTCH-1371 (now closed)

and more recently

https://issues.apache.org/jira/browse/NUTCH-2292

Not sure whether the artefacts will be published. Feel free to discuss it
on JIRA 

HTH

Julien




Thanks very much,

Ben V.



On 04/07/2017 09:48 AM, lsroudi abdel wrote:


hi,
I think you should add it in ivy/ivy.xml and then just run ant runtime

On Thu, Apr 6, 2017 at 9:35 PM, Ben Vachon  wrote:

Hi all,

I'm working on a project that gets Nutch 2.3.1 from maven and uses it to
set off crawl jobs which are configurable in our own UI and through our
own search platform's properties. To allow specific configuration of
crawlers, I want to use many of the default plugins that come with a
Nutch 2.x install.

I have not been able to find the plugins in the org.apache.nutch jar or
anywhere on maven. How do you recommend getting these plugins into our
platform? Should I just copy the jars from an install and upload them to
our own source control?

Thanks,

Ben V.










Re: Nutch Plugins Source Control

2017-04-07 Thread Julien Nioche
Hi Ben

On 7 April 2017 at 15:10, Ben Vachon  wrote:

> Hi lsroudi,
>
> I am not working with an install of Nutch, I'm just working with the jar I
> got via maven, and it doesn't have any of the plugins.
>
> I could build the plugins into the project myself, but to do that I would
> need to download them and copy their source files into my project. I don't
> want to own the source for the plugins.
>

That's one of the current limitations of Nutch and a major difference with
e.g. StormCrawler. You'll probably need the shell scripts as well as the
jars + the config files etc... so in practice you'd download the whole code
and build.


>
> Ideally, these plugin jars would be available as maven dependencies the
> same way that Nutch is.
>
> Is there a plan to deploy the default plugins as maven artifacts? If not,
> I would like to request that this happen.
>

There are plans to build with Maven; see

https://issues.apache.org/jira/browse/NUTCH-1371 (now closed)

and more recently

https://issues.apache.org/jira/browse/NUTCH-2292

Not sure whether the artefacts will be published. Feel free to discuss it
on JIRA 

HTH

Julien



>
> Thanks very much,
>
> Ben V.
>
>
>
> On 04/07/2017 09:48 AM, lsroudi abdel wrote:
>
>> hi,
>> I think you should add it in ivy/ivy.xml and then just run ant runtime
>>
>> On Thu, Apr 6, 2017 at 9:35 PM, Ben Vachon  wrote:
>>
>>> Hi all,
>>>
>>> I'm working on a project that gets Nutch 2.3.1 from maven and uses it to
>>> set off crawl jobs which are configurable in our own UI and through our
>>> own search platform's properties. To allow specific configuration of
>>> crawlers, I want to use many of the default plugins that come with a
>>> Nutch 2.x install.
>>>
>>> I have not been able to find the plugins in the org.apache.nutch jar or
>>> anywhere on maven. How do you recommend getting these plugins into our
>>> platform? Should I just copy the jars from an install and upload them to
>>> our own source control?
>>>
>>> Thanks,
>>>
>>> Ben V.
>>>
>>>
>>
>>
>


-- 

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble 


Re: Nutch Plugins Source Control

2017-04-07 Thread Ben Vachon

Hi lsroudi,

I am not working with an install of Nutch, I'm just working with the jar 
I got via maven, and it doesn't have any of the plugins.


I could build the plugins into the project myself, but to do that I 
would need to download them and copy their source files into my project. 
I don't want to own the source for the plugins.


Ideally, these plugin jars would be available as maven dependencies the 
same way that Nutch is.


Is there a plan to deploy the default plugins as maven artifacts? If 
not, I would like to request that this happen.


Thanks very much,

Ben V.


On 04/07/2017 09:48 AM, lsroudi abdel wrote:

hi,
I think you should add it in ivy/ivy.xml and then just run ant runtime

On Thu, Apr 6, 2017 at 9:35 PM, Ben Vachon  wrote:


Hi all,

I'm working on a project that gets Nutch 2.3.1 from maven and uses it to
set off crawl jobs which are configurable in our own UI and through our
own search platform's properties. To allow specific configuration of
crawlers, I want to use many of the default plugins that come with a
Nutch 2.x install.

I have not been able to find the plugins in the org.apache.nutch jar or
anywhere on maven. How do you recommend getting these plugins into our
platform? Should I just copy the jars from an install and upload them to
our own source control?

Thanks,

Ben V.








Re: Nutch Plugins Source Control

2017-04-07 Thread lsroudi abdel
hi,
I think you should add it in ivy/ivy.xml and then just run ant runtime

On Thu, Apr 6, 2017 at 9:35 PM, Ben Vachon  wrote:

> Hi all,
>
> I'm working on a project that gets Nutch 2.3.1 from maven and uses it to
> set off crawl jobs which are configurable in our own UI and through our
> own search platform's properties. To allow specific configuration of
> crawlers, I want to use many of the default plugins that come with a
> Nutch 2.x install.
>
> I have not been able to find the plugins in the org.apache.nutch jar or
> anywhere on maven. How do you recommend getting these plugins into our
> platform? Should I just copy the jars from an install and upload them to
> our own source control?
>
> Thanks,
>
> Ben V.
>



-- 
Concepteur et développeur web symfony2
https://github.com/lsroudi
http://lsroudi.com/


Nutch Plugins Source Control

2017-04-06 Thread Ben Vachon

Hi all,

I'm working on a project that gets Nutch 2.3.1 from maven and uses it to
set off crawl jobs which are configurable in our own UI and through our
own search platform's properties. To allow specific configuration of
crawlers, I want to use many of the default plugins that come with a
Nutch 2.x install.

I have not been able to find the plugins in the org.apache.nutch jar or
anywhere on maven. How do you recommend getting these plugins into our
platform? Should I just copy the jars from an install and upload them to
our own source control?

Thanks,

Ben V.