Markus,

There seem to be three logfiles inside the userlogs/container_1386136488805_0971_01_000003/ directory
<http://localhost.localdomain:50075/logs/userlogs/application_1386136488805_0971/container_1386136488805_0971_01_000003/>. Those three are:

stderr    783 bytes    Dec 4, 2013 9:08:59 AM
  <http://localhost.localdomain:50075/logs/userlogs/application_1386136488805_0971/container_1386136488805_0971_01_000003/stderr>
stdout      0 bytes    Dec 4, 2013 9:08:59 AM
  <http://localhost.localdomain:50075/logs/userlogs/application_1386136488805_0971/container_1386136488805_0971_01_000003/stdout>
syslog  12300 bytes    Dec 4, 2013 9:09:03 AM
  <http://localhost.localdomain:50075/logs/userlogs/application_1386136488805_0971/container_1386136488805_0971_01_000003/syslog>

I am thinking stdout should have the log statements; however, it is empty (0 bytes), as can be seen above, while stderr has the statements below. Can you please advise if this is the right place to look for logs and, if so, why I can't see anything being logged here?

Statements in stderr:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/general/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1386136488805_0971/filecache/10/job.jar/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1386136488805_0971/filecache/10/job.jar/lib/DynaOCrawlerUtils.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

On Tue, Dec 3, 2013 at 4:24 AM, Markus Jelsma <[email protected]> wrote:

> Hmm, so it seems to work after all! I see a lot of deprecated warnings, but
> that can be easily fixed. It is mostly about job setup. The logs are in your
> Hadoop directory under the same name as the job name.
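A side note on the three files above: Nutch logs through log4j, and Hadoop task containers direct log4j output to the syslog file, while stdout only receives whatever is written to System.out directly, which is typically why stdout stays at 0 bytes. A minimal sketch that mimics that layout and lists which files actually received output (the paths here are illustrative; on a real node the directories sit under the locations configured by yarn.nodemanager.log-dirs):

```shell
#!/bin/sh
# Recreate the container log layout described above (illustrative paths only)
# and list which of the three log files actually received any output.
set -e
LOGDIR="$(mktemp -d)/container_1386136488805_0971_01_000003"
mkdir -p "$LOGDIR"
printf 'SLF4J: Class path contains multiple SLF4J bindings.\n' > "$LOGDIR/stderr"
: > "$LOGDIR/stdout"   # empty, like the 0-byte stdout above
printf '2013-12-04 09:09:03,000 INFO example - log4j output lands here\n' > "$LOGDIR/syslog"
# Only non-empty files, i.e. the ones that received log output:
find "$LOGDIR" -type f -size +0c -exec basename {} \; | sort
# prints: stderr, then syslog
```

If log aggregation is enabled on the cluster, `yarn logs -applicationId application_1386136488805_0971` should also fetch all three files for every container in one go.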
> > -----Original message-----
> > From: S.L <[email protected]>
> > Sent: Tuesday 3rd December 2013 0:34
> > To: [email protected]; [email protected]
> > Subject: Re: Nutch 1.7 and Hadoop Release 2.2.0
> >
> > I am running it on a single machine. I have no idea how to get to the
> > Nutch logs and see what's going on.
> >
> > Sent from my HTC Inspire™ 4G on AT&T
> >
> > ----- Reply message -----
> > From: "Paul Inventado" <[email protected]>
> > To: <[email protected]>
> > Subject: Nutch 1.7 and Hadoop Release 2.2.0
> > Date: Mon, Dec 2, 2013 2:21 am
> >
> > Wow, you got it working! I can see that you're using Nutch 1.8? Are you
> > running on a single machine or a distributed cluster?
> >
> > On Mon, Dec 2, 2013 at 3:22 AM, S.L <[email protected]> wrote:
> >
> > > Markus,
> > >
> > > I was finally able to set up Nutch 1.7 on Hadoop 2.2. I am using the
> > > following command to start it up:
> > >
> > > bin/hadoop jar /home/general/workspace/nutch/runtime/deploy/apache-nutch-1.8-SNAPSHOT.job org.apache.nutch.crawl.Crawl urls -dir crawldirectory -depth 1000 -topN 30000
> > >
> > > The initial log output I get from Hadoop is as follows; however, I am
> > > not sure if the crawl is happening at all or if it is happening at a
> > > dead slow pace.
> > >
> > > If I use the same single URL to crawl as I use in my local crawl from
> > > Eclipse to test, I get the proper crawling speed; however, the Hadoop
> > > job is, as I mentioned, either not running at all or running at a slow
> > > pace.
> > >
> > > You will see a WARN message that the Solr URL is not set; this is
> > > because I internally log the data into Solr and I don't have a Solr URL
> > > to pass in the arguments, so that warning message can be ignored.
> > >
> > > Please see the log below.
> > > ----------------------------------------------------------------------------------------------------------
> > >
> > > 13/12/01 14:15:45 WARN crawl.Crawl: solrUrl is not set, indexing will be skipped...
> > > 13/12/01 14:15:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > 13/12/01 14:15:46 INFO crawl.Crawl: crawl started in: crawldirectory
> > > 13/12/01 14:15:46 INFO crawl.Crawl: rootUrlDir = urls
> > > 13/12/01 14:15:46 INFO crawl.Crawl: threads = 30
> > > 13/12/01 14:15:46 INFO crawl.Crawl: depth = 1000
> > > 13/12/01 14:15:46 INFO crawl.Crawl: solrUrl=null
> > > 13/12/01 14:15:46 INFO crawl.Crawl: topN = 30000
> > > 13/12/01 14:15:46 INFO crawl.Injector: Injector: starting at 2013-12-01 14:15:46
> > > 13/12/01 14:15:46 INFO crawl.Injector: Injector: crawlDb: crawldirectory/crawldb
> > > 13/12/01 14:15:46 INFO crawl.Injector: Injector: urlDir: urls
> > > 13/12/01 14:15:46 INFO Configuration.deprecation: mapred.temp.dir is deprecated. Instead, use mapreduce.cluster.temp.dir
> > > 13/12/01 14:15:46 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
> > > 13/12/01 14:15:46 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> > > 13/12/01 14:15:46 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> > > 13/12/01 14:15:47 INFO mapred.FileInputFormat: Total input paths to process : 1
> > > 13/12/01 14:15:47 INFO mapreduce.JobSubmitter: number of splits:2
> > > 13/12/01 14:15:47 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> > > 13/12/01 14:15:47 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
> > > 13/12/01 14:15:47 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
> > > 13/12/01 14:15:48 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
> > > 13/12/01 14:15:48 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> > > 13/12/01 14:15:48 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> > > 13/12/01 14:15:48 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> > > 13/12/01 14:15:48 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
> > > 13/12/01 14:15:48 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
> > > 13/12/01 14:15:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1385868843066_1747
> > > 13/12/01 14:15:48 INFO impl.YarnClientImpl: Submitted application application_1385868843066_1747 to ResourceManager at /0.0.0.0:8032
> > > 13/12/01 14:15:48 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1385868843066_1747/
> > > 13/12/01 14:15:48 INFO mapreduce.Job: Running job: job_1385868843066_1747
> > >
> > > On Mon, Nov 25, 2013 at 6:51 AM, Markus Jelsma <[email protected]> wrote:
> > >
> > > > I'm not sure it works, I think I've seen some issues with it.
> > > > You can try though.
> > > >
> > > > -----Original message-----
> > > > > From: S.L <[email protected]>
> > > > > Sent: Monday 25th November 2013 2:38
> > > > > To: [email protected]
> > > > > Subject: Nutch 1.7 and Hadoop Release 2.2.0
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I am trying to set up a single-node Hadoop cluster and noticed that
> > > > > the Hadoop 2.2 release has been made available as a GA release
> > > > > candidate.
> > > > >
> > > > > As I am using Nutch 1.7, I was wondering if it is compatible with
> > > > > Hadoop 2.2; please let me know.
> > > > >
> > > > > Thanks in advance!
>
> --
> Paul Inventado
> Waagle
> [email protected]
> www.waagle.com
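Coming back to the Configuration.deprecation warnings in the job log earlier in the thread: they are plain key renames and harmless, though worth fixing in mapred-site.xml or in the job code. A small sketch with the old-to-new pairs copied verbatim from that log (the lookup helper itself is just illustrative, not a Hadoop API):

```shell
#!/bin/sh
# Map deprecated Hadoop 1.x property names to their 2.x replacements,
# using the pairs reported by Configuration.deprecation in the log above.
lookup() {
  case "$1" in
    mapred.temp.dir)           echo mapreduce.cluster.temp.dir ;;
    user.name)                 echo mapreduce.job.user.name ;;
    mapred.jar)                echo mapreduce.job.jar ;;
    mapred.output.value.class) echo mapreduce.job.output.value.class ;;
    mapred.job.name)           echo mapreduce.job.name ;;
    mapred.input.dir)          echo mapreduce.input.fileinputformat.inputdir ;;
    mapred.output.dir)         echo mapreduce.output.fileoutputformat.outputdir ;;
    mapred.map.tasks)          echo mapreduce.job.maps ;;
    mapred.output.key.class)   echo mapreduce.job.output.key.class ;;
    mapred.working.dir)        echo mapreduce.job.working.dir ;;
    *)                         echo "$1" ;;  # not in the deprecation list
  esac
}

lookup mapred.map.tasks   # prints: mapreduce.job.maps
```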

