Re: Unable to get regex-urlfilter working

2018-10-11 Thread lewis john mcgibbney
Hi Gajanan,
Since you are using 2.x, have you made sure that the project was built with the
correct regex-urlfilter.txt on the classpath, and that this file is included in
the job jar you are using?
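
One way to check the second point is to list the contents of the job file and look for the filter file. The sketch below assumes that the .job file is an ordinary zip archive, that unzip and awk are installed, and that regex-urlfilter.txt sits at the root of the archive; the helper name job_contains is hypothetical and not part of Nutch.

```shell
#!/bin/sh
# job_contains JOB_FILE ENTRY   (hypothetical helper, not part of Nutch)
# Succeeds only if the archive listing of JOB_FILE has an entry named exactly
# ENTRY, so regex-urlfilter.txt.template does not count as a match.
job_contains() {
  job_file=$1
  entry=$2
  # unzip -l prints one entry per line with the name in the last column;
  # awk compares that column with the requested entry name.
  unzip -l "$job_file" 2>/dev/null |
    awk -v e="$entry" '$NF == e { found = 1 } END { exit !found }'
}
```

Calling job_contains with the path to your job file (in a source build it is usually somewhere under runtime/deploy/) and regex-urlfilter.txt succeeds only when the file is packaged. To compare the packaged rules with the ones you edited, unzip -p prints a single entry to standard output.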

On Thu, Oct 11, 2018 at 12:19 AM  wrote:

> From: Gajanan Watkar 
> To: user@nutch.apache.org
> Date: Wed, 10 Oct 2018 17:19:24 +0530
> Subject: Re: Unable to get regex-urlfilter working
> I am using Nutch 2.x with HBase as backend storage.
>
> *-Gajanan*
>


RE: Nutch 1.15: Solr indexing issue

2018-10-11 Thread hany . nasr
Thank you so much.

They changed it dramatically.
It no longer accepts solr.server.url, or even the old Solr mapping XML file.
Everything is now configured in index-writers.xml.
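
For anyone hitting the same behaviour: in 1.15 the Solr endpoint appears to be read from the url parameter of the Solr writer in conf/index-writers.xml, and a default of http://localhost:8983/solr/nutch there would explain the core that was being used. A minimal sketch of changing it from the shell follows. It assumes the Solr writer is declared on a line containing SolrIndexWriter and has a param element named url; the helper name set_solr_url is hypothetical and not part of Nutch.

```shell
#!/bin/sh
# set_solr_url CONF_FILE NEW_URL   (hypothetical helper, not part of Nutch)
# Rewrites the value of <param name="url" .../> inside the Solr writer block.
set_solr_url() {
  conf_file=$1
  new_url=$2

  # Escape characters that are special in a sed replacement (&, | and \).
  escaped=$(printf '%s' "$new_url" | sed 's/[&|\\]/\\&/g')

  # Only touch lines between the SolrIndexWriter declaration and the next
  # closing </writer>; the original file is kept as CONF_FILE.bak.
  sed -i.bak \
    "/SolrIndexWriter/,/<\/writer>/ s|\(<param name=\"url\" value=\"\)[^\"]*|\1${escaped}|" \
    "$conf_file"
}
```

For example, set_solr_url conf/index-writers.xml http://localhost:8983/solr/website would point the writer at the website core, after which the crawl can be started without the -D solr.server.url option. Editing the file by hand works just as well; the function only shows which value matters.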

Kind regards, 
Hany Shehata
Solutions Architect, Marketing and Communications IT 
Corporate Functions | HSBC Operations, Services and Technology (HOST)
ul. Kapelanka 42A, 30-347 Kraków, Poland
__ 

Tie line: 7148 7689 4698 
External: +48 123 42 0698 
Mobile: +48 723 680 278 
E-mail: hany.n...@hsbc.com 
__ 
Protect our environment - please only print this if you have to!


-----Original Message-----
From: Yossi Tamari [mailto:yossi.tam...@pipl.com] 
Sent: Thursday, October 11, 2018 9:33 AM
To: user@nutch.apache.org
Subject: RE: Nutch 1.15: Solr indexing issue

I'm using 1.15, but not with Solr. However, the configuration of IndexWriters 
changed in 1.15, so you may want to read 
https://wiki.apache.org/nutch/IndexWriters#Solr_indexer_properties.

Yossi.

> -----Original Message-----
> From: hany.n...@hsbc.com 
> Sent: 11 October 2018 10:20
> To: user@nutch.apache.org
> Subject: Nutch 1.15: Solr indexing issue
> 
> Hi All,
> 
> Is anyone using Nutch 1.15?
> 
> I am trying to index my crawled URLs into Solr, but they are only indexed 
> into http://localhost:8983/solr/nutch. Is that URL hard-coded somewhere in 
> the code?
> 
> When I created a nutch core, my URLs were indexed into it and my 
> solr.server.url property was ignored.
> 
> My crawl command is:
> 
> sudo bin/crawl -i -D solr.server.url=http://localhost:8983/solr/website -s urls /home/hany.nasr/apache-nutch-1.15/crawl 1
> 
> Kind regards,
> Hany Shehata
> -
> SAVE PAPER - THINK BEFORE YOU PRINT!
> 
> This E-mail is confidential.
> 
> It may also be legally privileged. If you are not the addressee you 
> may not copy, forward, disclose or use any part of it. If you have 
> received this message in error, please delete it and all copies from 
> your system and notify the sender immediately by return E-mail.
> 
> Internet communications cannot be guaranteed to be timely secure, 
> error or virus-free.
> The sender does not accept liability for any errors or omissions.





 




RE: Nutch 1.15: Solr indexing issue

2018-10-11 Thread Yossi Tamari
I'm using 1.15, but not with Solr. However, the configuration of IndexWriters 
changed in 1.15, so you may want to read 
https://wiki.apache.org/nutch/IndexWriters#Solr_indexer_properties.

Yossi.

> -----Original Message-----
> From: hany.n...@hsbc.com 
> Sent: 11 October 2018 10:20
> To: user@nutch.apache.org
> Subject: Nutch 1.15: Solr indexing issue
> 
> Hi All,
> 
> Is anyone using Nutch 1.15?
> 
> I am trying to index my crawled URLs into Solr, but they are only indexed
> into http://localhost:8983/solr/nutch. Is that URL hard-coded somewhere in
> the code?
> 
> When I created a nutch core, my URLs were indexed into it and my
> solr.server.url property was ignored.
> 
> My crawl command is:
> 
> sudo bin/crawl -i -D solr.server.url=http://localhost:8983/solr/website -s urls /home/hany.nasr/apache-nutch-1.15/crawl 1
> 
> Kind regards,
> Hany Shehata



Nutch 1.15: Solr indexing issue

2018-10-11 Thread hany . nasr
Hi All,

Is anyone using Nutch 1.15?

I am trying to index my crawled URLs into Solr, but they are only indexed into 
http://localhost:8983/solr/nutch. Is that URL hard-coded somewhere in the code?

When I created a nutch core, my URLs were indexed into it and my 
solr.server.url property was ignored.

My crawl command is:

sudo bin/crawl -i -D solr.server.url=http://localhost:8983/solr/website -s urls /home/hany.nasr/apache-nutch-1.15/crawl 1

Kind regards,
Hany Shehata