Too many fetch failures

2009-07-12 Thread lei wang
This is also how I fixed this problem.

On 6/21/08, Sayali Kulkarni sayali_s_kulka...@yahoo.co.in wrote:

 Hi!

 My problem of Too many fetch failures as well as shuffle error was
 resolved when I added the list of all the slave machines in the /etc/hosts
 file.

 Earlier, on every slave I just had the entries for the master and the
 slave's own machine in the /etc/hosts file. But now I have updated all
 the /etc/hosts files to include the IP addresses and names of all the
 machines in the cluster, and my problem is resolved.
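To illustrate the fix described above, a minimal /etc/hosts on each node of a small cluster might look like this (the hostnames and slave addresses below are made up for illustration; only the master address 10.105.41.25 appears in the configuration quoted later in this thread):

```
127.0.0.1     localhost
10.105.41.25  master
10.105.41.26  slave1
10.105.41.27  slave2
10.105.41.28  slave3
10.105.41.29  slave4
10.105.41.30  slave5
```

Every node carries the same full list, so that any node can resolve any other node's hostname.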

 One question still:
 I currently have just 5-6 nodes. But when Hadoop is deployed on a larger
 cluster, say of 1000+ nodes, is it expected that every time a new machine
 is added to the cluster, you add an entry in the /etc/hosts of all the
 (1000+) machines in the cluster?


 Regards,
 Sayali

 Sayali Kulkarni sayali_s_kulka...@yahoo.co.in wrote:
  Can you post the reducer logs. How many nodes are there in the cluster?
 There are 6 nodes in the cluster - 1 master and 5 slaves
 I tried to reduce the number of nodes, and found that the problem is
 solved only if there is a single node in the cluster. So I can deduce
 that the problem lies in some configuration.

 Configuration file:
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 <!-- Put site-specific property overrides in this file. -->

 <configuration>

 <property>
   <name>hadoop.tmp.dir</name>
   <value>/extra/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-${user.name}</value>
   <description>A base for other temporary directories.</description>
 </property>

 <property>
   <name>fs.default.name</name>
   <value>hdfs://10.105.41.25:54310</value>
   <description>The name of the default file system.  A URI whose
   scheme and authority determine the FileSystem implementation.  The
   uri's scheme determines the config property (fs.SCHEME.impl) naming
   the FileSystem implementation class.  The uri's authority is used to
   determine the host, port, etc. for a filesystem.</description>
 </property>

 <property>
   <name>mapred.job.tracker</name>
   <value>10.105.41.25:54311</value>
   <description>The host and port that the MapReduce job tracker runs
   at.  If "local", then jobs are run in-process as a single map
   and reduce task.
   </description>
 </property>

 <property>
   <name>dfs.replication</name>
   <value>2</value>
   <description>Default block replication.
   The actual number of replications can be specified when the file is
   created.
   The default is used if replication is not specified at create time.
   </description>
 </property>

 <property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx1048M</value>
 </property>

 <property>
   <name>mapred.local.dir</name>
   <value>/extra/HADOOP/hadoop-0.16.3/tmp/mapred</value>
 </property>

 <property>
   <name>mapred.map.tasks</name>
   <value>53</value>
   <description>The default number of map tasks per job.  Typically set
   to a prime several times greater than the number of available hosts.
   Ignored when mapred.job.tracker is "local".
   </description>
 </property>

 <property>
   <name>mapred.reduce.tasks</name>
   <value>7</value>
   <description>The default number of reduce tasks per job.  Typically set
   to a prime close to the number of available hosts.  Ignored when
   mapred.job.tracker is "local".
   </description>
 </property>

 </configuration>


 
 This is the output that I get when running the tasks with 2 nodes in the
 cluster:

 08/06/20 11:07:45 INFO mapred.FileInputFormat: Total input paths to process : 1
 08/06/20 11:07:45 INFO mapred.JobClient: Running job: job_200806201106_0001
 08/06/20 11:07:46 INFO mapred.JobClient:  map 0% reduce 0%
 08/06/20 11:07:53 INFO mapred.JobClient:  map 8% reduce 0%
 08/06/20 11:07:55 INFO mapred.JobClient:  map 17% reduce 0%
 08/06/20 11:07:57 INFO mapred.JobClient:  map 26% reduce 0%
 08/06/20 11:08:00 INFO mapred.JobClient:  map 34% reduce 0%
 08/06/20 11:08:01 INFO mapred.JobClient:  map 43% reduce 0%
 08/06/20 11:08:04 INFO mapred.JobClient:  map 47% reduce 0%
 08/06/20 11:08:05 INFO mapred.JobClient:  map 52% reduce 0%
 08/06/20 11:08:08 INFO mapred.JobClient:  map 60% reduce 0%
 08/06/20 11:08:09 INFO mapred.JobClient:  map 69% reduce 0%
 08/06/20 11:08:10 INFO mapred.JobClient:  map 73% reduce 0%
 08/06/20 11:08:12 INFO mapred.JobClient:  map 78% reduce 0%
 08/06/20 11:08:13 INFO mapred.JobClient:  map 82% reduce 0%
 08/06/20 11:08:15 INFO mapred.JobClient:  map 91% reduce 1%
 08/06/20 11:08:16 INFO mapred.JobClient:  map 95% reduce 1%
 08/06/20 11:08:18 INFO mapred.JobClient:  map 99% reduce 3%
 08/06/20 11:08:23 INFO mapred.JobClient:  map 100% reduce 3%
 08/06/20 11:08:25 INFO mapred.JobClient:  map 100% reduce 7%
 08/06/20 11:08:28 INFO mapred.JobClient:  map 100% reduce 10%
 08/06/20 11:08:30 INFO mapred.JobClient:  map 100% reduce 11%
 08/06/20 11:08:33 INFO mapred.JobClient:  map 100% reduce 12%
 08/06/20 11:08:35 INFO mapred.JobClient:  map 100% reduce 14%
 08/06/20 11:08:38 INFO mapred.JobClient:  map 100% reduce 15%
 08/06/20 11:09:54 INFO mapred.JobClient:  map 100% reduce 13%
 08/06/20 11:09:54 INFO mapred.JobClient: Task Id :
 

Search results return 0

2009-07-12 Thread Zaihan
Hi All,

I have installed Nutch on my server, and crawling yields results; however,
when I search on the site, I get 0 results.

I do not know where to put the crawl directory relative to the Tomcat
folder, so if you can give me a hint, please do, as that part is skipped
in the docs.

On the command line, the results are returned as follows:

[u...@ogn003 engine]$ bin/nutch org.apache.nutch.searcher.NutchBean linux
Total hits: 14
 0 20090709133038/http://www.amjad.ws/
 ... تصنيفات How To Linux Network php VB.NET Web ... to Windows Server 2008 Ubuntu Linux For Novices Building ...
 1 20090709133038/http://www.aramcode.net/vb/
 ... قسم نظام تشغيل Unix & Linux (يشاهده 1 زائر) ... توزيعاته . ارشيف .:. Linux Vs Windows ::... بواسطة مشهور007 07 ...
 2 20090709133038/http://www.bowlfr.org/
 ... et SMTP sous Mozilla-Thunerbird (Linux), Outlook (Microsoft), ainsi que POP3 ...
 3 20090709133038/http://emaus.czest.pl/
 ... 6,7), FireFox, Opera (Win, Linux) Uwagi dotyczące funkcjonalności i wyglądu ...
 4 20090709133038/http://www.geocities.com/ivan_penkov/
 ... MS Windows 2000/NT/9x, Linux (RedHat), DOS / Win 3.11 ...
 5 20090709133038/http://www.mentor-it.dk/
 ... 6 cell Li-Ion, Full Linux 2.246,- ex moms   Mentor ...
 6 20090709133038/http://www.webmatters.co.uk/
 ... Cocoa/Objective-C, MySQL, Apache, Linux/Solaris/Mac OS X, C++ ...
 7 20090709133038/http://www.yale.edu/its/help/cmc.html
 ... FAS IT support Educational Technologies Linux Systems Design & Support Social ...
 8 20090709133038/http://www.project-open.com/whitepapers/localization/
 Multilingual Architecture Primer  This site has been restructured in order to make it more accessible ...
 9 20090709133038/http://m4rtin.com/
 ... notebooku, kde mi běží perfektní linux ubuntu . O tom, že lze ... jsem nikde nic lovit. Prostě linux pro lidi. Share/Save no ...

Regards,
Zaihan





Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2009-07-12 Thread Andrzej Bialecki

lei wang wrote:

anyone help? so disappointed.

On Fri, Jul 10, 2009 at 4:29 PM, lei wang nutchmaill...@gmail.com wrote:


Yes, I am also running into this problem. Can anyone help?


On Sun, Jul 5, 2009 at 11:33 PM, xiao yang yangxiao9...@gmail.com wrote:


I often get this error message while crawling the intranet.
Is it a network problem? What can I do about it?

$bin/nutch crawl urls -dir crawl -depth 3 -topN 4

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 4
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20090705212324
Generator: filtering: true
Generator: topN: 4
Generator: Partitioning selected urls by host, for politeness.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Exception in thread "main" java.io.IOException: Job failed!
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
   at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
   at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
   at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)







If you are running a large crawl on a single machine, you could be 
running out of file descriptors - please check ulimit -n, the value 
should be much much larger than 1024.
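To check and raise the limit, something like the following works on most Linux systems (the 65536 figure and the "hadoop" account name below are just examples, not values prescribed by Hadoop itself):

```shell
# Show the current per-process open-file limit:
ulimit -n

# Raise the soft limit for the current shell; daemons started from this
# shell inherit it (may fail if the hard limit is lower):
ulimit -n 65536 2>/dev/null || true

# To make the change permanent, add entries to /etc/security/limits.conf
# for whichever account runs the Hadoop daemons, e.g.:
#   hadoop  soft  nofile  65536
#   hadoop  hard  nofile  65536
```

Remember to restart the Hadoop daemons afterwards, since a process reads its limits at startup.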


Also, please check the hadoop.log for clues why shuffle fetching failed 
- this could be something trivial as a blocked port, or routing problem, 
or DNS resolution problem, or the problem I mentioned above.


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



How to search part of words?

2009-07-12 Thread stefan . kaifer
Hi

how can I search for parts of words? For example, when I search for
"ghost", I would like to also get all pages containing the word
"ghostbusters". I only get the pages with the exact word. Where is the
option for that?

Best regards

Stefan

Problem with nutch

2009-07-12 Thread Pranay Gunna
Hi, this is Pranay.

I want to join the Nutch forum. I am new to Nutch.

I have a problem with the ontology: I am not able to clear the cached data,
so can you please help me with that?

Problem: I have created a different ontology for each user, but when I
search for some word in Semantic Search, it also gives me the previous
user's results. So I want the cached data to be cleared.

When I stop the server and restart it, it works fine.



  

Changing fieldsNorm at query time

2009-07-12 Thread ilayaraja
 Hi,

   I observe that my search results are bad only because the fieldNorm is
so high for the bad results.

   Can anyone please suggest how we can change the fieldNorm factor at
search time, though I know it is an indexed field?
   Is there a way to set fieldNorm to a constant value while using Nutch
for searching over the Lucene index?


 Thanks
 Ilayaraja

Nutch Character encoding converter

2009-07-12 Thread Saurabh Suman

Hi,
Nutch has an auto-detector for character encoding. Does it convert
characters to a standard encoding automatically, after detecting it?
-- 
View this message in context: 
http://www.nabble.com/Nutch-Character-encoding-converter-tp24456144p24456144.html
Sent from the Nutch - User mailing list archive at Nabble.com.



Re: Nutch Character encoding converter

2009-07-12 Thread Ken Krugler

Nutch has an auto-detector for character encoding. Does it convert
characters to a standard encoding automatically, after detecting it?


Yes - Nutch converts text to Unicode for all subsequent processing.
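As a quick illustration of what that conversion looks like outside of Nutch (this is plain iconv on the command line; Nutch does the equivalent internally in Java):

```shell
# A page served as ISO-8859-1 uses the single byte 0xE9 for "é".
# Once the charset has been detected, the bytes are converted, here to UTF-8:
printf 'caf\xe9' | iconv -f ISO-8859-1 -t UTF-8
# prints: café
```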

-- Ken
--
Ken Krugler
+1 530-210-6378