RE: Depth option

2015-01-05 Thread Markus Jelsma
I would recommend using the domain-urlfilter; it is the most straightforward 
way to control the list of hosts in the crawldb.
M
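
[Editor's sketch] The idea behind a domain-based URL filter can be illustrated with a few lines of Python. This is only a toy model of the concept, not Nutch's implementation; the names ALLOWED_DOMAINS and host_allowed are hypothetical. In Nutch 1.x, as far as I know, the equivalent is the urlfilter-domain plugin (enabled through plugin.includes), which reads its allowed domains from conf/domain-urlfilter.txt.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; in Nutch this role is played by the entries
# in the domain filter's configuration file.
ALLOWED_DOMAINS = {"example.com"}


def host_allowed(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


urls = [
    "http://www.example.com/page.html",
    "http://www.example2.com/",
    "http://example.com.evil.org/",
]
# Only URLs whose host belongs to an allowed domain survive the filter.
kept = [u for u in urls if host_allowed(u)]
print(kept)
```

Matching on the parsed host, rather than on a substring of the whole URL, is what keeps look-alike hosts such as example.com.evil.org out.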

 
 
-Original message-
 From:Shadi Saleh propat...@gmail.com
 Sent: Sunday 4th January 2015 16:23
 To: user user@nutch.apache.org
 Subject: Depth option
 
 Hello,
 
 I want to check this point please.
 
 I am using crawl to crawl www.example.com with the depth=1 option. If that
 website contains a URL to another website, e.g. www.example2.com, Nutch will not
 crawl it. Is it enough to use the depth option, or should I use a URL filter?
 
 
 Best
 
 
 -- 
 Shadi Saleh
 Ph.D Student
 Institute of Formal and Applied Linguistics
 Faculty of Mathematics and Physics
 Charles University in Prague
 16017 Prague 6 - Czech Republic
 Mob +420773515578
 


Depth option

2015-01-04 Thread Shadi Saleh
Hello,

I want to check this point please.

I am using crawl to crawl www.example.com with the depth=1 option. If that
website contains a URL to another website, e.g. www.example2.com, Nutch will not
crawl it. Is it enough to use the depth option, or should I use a URL filter?


Best




Re: Depth option

2015-01-04 Thread Adil Ishaque Abbasi
Yes, you are correct; there is no need to use the URL filter. But this will work
only if your crawldb remains empty.

Regards
Adil I. Abbasi




Re: Depth option

2015-01-04 Thread Shadi Saleh
Thanks Adil,

The crawldb is not empty; it now contains an old and a current folder. Should I
clean it before I start a new crawl? What is the proper way?

Best



Re: Depth option

2015-01-04 Thread Meraj A. Khan
Shadi,

I am not sure what will happen if example.com itself has external
links; I think it will fetch those with depth 1. But if you want to disable
the fetching of external links, just set the db.ignore.external.links property to
true; you don't need any URL filter set up if you do so.
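
[Editor's sketch] To make the interaction between depth and external links concrete, here is a toy Python model of generate/fetch/update rounds. It is a simplification under stated assumptions, not Nutch code: the link graph LINKS, the function crawl, and the flag ignore_external are all hypothetical. The flag stands in for the property discussed above, which in Nutch 1.x is, to my knowledge, db.ignore.external.links.

```python
from urllib.parse import urlsplit

# Hypothetical link graph: page URL -> outlinks found on that page.
LINKS = {
    "http://www.example.com/": [
        "http://www.example.com/about",
        "http://www.example2.com/",
    ],
    "http://www.example.com/about": [],
    "http://www.example2.com/": ["http://www.example2.com/deep"],
}


def crawl(seeds, rounds, ignore_external=False):
    """Run `rounds` generate/fetch/update cycles; return (fetched, crawldb)."""
    crawldb = set(seeds)  # every URL known so far
    fetched = set()
    for _ in range(rounds):
        batch = crawldb - fetched  # "generate": known but unfetched URLs
        for url in sorted(batch):
            fetched.add(url)  # "fetch"
            host = urlsplit(url).hostname
            for out in LINKS.get(url, []):
                if ignore_external and urlsplit(out).hostname != host:
                    continue  # drop outlinks that point to another host
                crawldb.add(out)  # "update": remember the outlink
    return fetched, crawldb


seed = "http://www.example.com/"
fetched_1, db_1 = crawl([seed], rounds=1)
fetched_2, _ = crawl([seed], rounds=2)
fetched_2_internal, _ = crawl([seed], rounds=2, ignore_external=True)
```

In this model a single round fetches only the seed, but the external link is still recorded in the crawldb, so a second round (or a later crawl that reuses the same crawldb) fetches www.example2.com unless external outlinks are ignored or filtered.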



Depth option

2015-01-04 Thread Adil Ishaque Abbasi
I believe you need to clean it.

Regards
Adil I. Abbasi




