Thanks Sebastian,

Now that I have looked into the Jira issue, I understand the cause of the
confusion. I have used all previous versions of Nutch up to 1.13, and I only
recall seeing these messages in pseudo-distributed or cluster mode. The
logging of these job messages was added only recently, which is why they
confused me.

Do you think it would be better to use the Nutch server to monitor the jobs
and their statuses? I would then delete the failed ones.
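Short of the Nutch server, a rough alternative is to scan the local log for job outcomes. This is a Python sketch, not a Nutch server API. It assumes the log is logs/hadoop.log under runtime/local, and it assumes failed jobs are logged as "Job <id> failed with state <STATE>"; that wording is not confirmed anywhere in this thread. The helper names are hypothetical.

```python
import re

# Patterns for mapreduce.Job lines in hadoop.log, e.g.
#   INFO  mapreduce.Job - Running job: job_local2035597620_0001
#   INFO  mapreduce.Job - Job job_local2035597620_0001 completed successfully
# ASSUMPTION: failed jobs are logged as "Job <id> failed with state <STATE> ...".
RUNNING = re.compile(r"mapreduce\.Job - Running job: (job_\S+)")
SUCCEEDED = re.compile(r"mapreduce\.Job - Job (job_\S+) completed successfully")
FAILED = re.compile(r"mapreduce\.Job - Job (job_\S+) failed with state (\S+)")


def job_statuses(lines):
    """Map each job id to the last status seen, in order of first appearance."""
    statuses = {}
    for line in lines:
        if m := RUNNING.search(line):
            statuses[m.group(1)] = "RUNNING"
        elif m := SUCCEEDED.search(line):
            statuses[m.group(1)] = "SUCCEEDED"
        elif m := FAILED.search(line):
            statuses[m.group(1)] = m.group(2)  # e.g. FAILED or KILLED
    return statuses


def failed_jobs(lines):
    """Job ids that ended in a state other than SUCCEEDED.

    Jobs still marked RUNNING (never finished, or the JVM died) are
    not included.
    """
    return [job for job, state in job_statuses(lines).items()
            if state not in ("RUNNING", "SUCCEEDED")]
```

From runtime/local, passing the open log file works, since a file object yields its lines: failed_jobs(open("logs/hadoop.log")).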


Regards
Ameer



On Wed, Feb 20, 2019 at 8:58 PM Sebastian Nagel <wastl.na...@googlemail.com> wrote:

> Hi Ameer,
>
> (bringing this back to user@nutch - sorry, I hit the wrong reply to)
>
> > So, does that mean we do not have the standalone mode anymore, as it used
> > to be in the past?
>
> Nutch has been based on Hadoop since the beginning, and the "local" mode is
> an emulated Hadoop system running in a single process (JVM).
> There has been no change to this behavior in recent Nutch versions.
>
> > Any thoughts on getting back the old behavior, with no jobs being created
> > in the *tmp* directory?
>
> The issues with the /tmp directory have always existed in local mode, see
>   http://lucene.472066.n3.nabble.com/tmp-folder-problem-td4008834.html
>
> In local mode, you can change the temporary folder used by Hadoop via the
> Java option
>   -Dhadoop.tmp.dir
>
> With bin/nutch or bin/crawl, this is done by setting the environment
> variable NUTCH_OPTS:
>
>   export NUTCH_OPTS=-Dhadoop.tmp.dir=/my/nutch/tmpdir
>
> Then all temporary data is written to /my/nutch/tmpdir, but you're still
> responsible for cleaning up this folder.
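A minimal Python sketch of that clean-up, assuming Nutch is launched from a script: each run gets its own hadoop.tmp.dir through NUTCH_OPTS, and the directory is deleted when the run ends, whether it succeeded or not. It assumes bin/nutch and bin/crawl forward NUTCH_OPTS to the JVM as described above, and that no other process shares the directory; run_with_private_tmp is a hypothetical name.

```python
import os
import shutil
import subprocess
import tempfile


def run_with_private_tmp(cmd, base_dir=None):
    """Run cmd (e.g. ["bin/crawl", <your usual arguments>]) with Hadoop's
    temporary data in a fresh directory, then delete that directory,
    even if the run fails.

    Assumption: bin/nutch and bin/crawl forward NUTCH_OPTS to the JVM;
    run_with_private_tmp itself is a hypothetical helper.
    """
    tmp_dir = tempfile.mkdtemp(prefix="nutch-tmp-", dir=base_dir)
    env = dict(os.environ)
    # Append to any NUTCH_OPTS already set rather than overwriting it.
    opts = env.get("NUTCH_OPTS", "")
    env["NUTCH_OPTS"] = (opts + " -Dhadoop.tmp.dir=" + tmp_dir).strip()
    try:
        return subprocess.run(cmd, env=env).returncode
    finally:
        # Remove everything Hadoop wrote there (job cache, staging, ...).
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Using a fresh directory per run, rather than one shared /my/nutch/tmpdir, means the clean-up never has to guess which job data is still in use.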
>
>
> > It confuses me to see these messages
>
> You can suppress them by removing the following lines in
> conf/log4j.properties:
>
> # log mapreduce job messages and counters
> log4j.logger.org.apache.hadoop.mapreduce.Job=INFO
>
> However, these messages are really useful for debugging, especially the
> job counters. See https://issues.apache.org/jira/browse/NUTCH-2519
>
>
> Best,
> Sebastian
>
>
>
> On 2/19/19 11:01 PM, Ameer Tawfik wrote:
> > Thanks Sebastian for the reply.
> >
> > So, does that mean we do not have the standalone mode anymore, as it used
> > to be in the past? It confuses me to see these messages:
> >
> >  The url to track the job: http://localhost:8080/
> > 2019-02-20 04:48:08,156 INFO  mapreduce.Job - Running job: job_local2035597620_0001
> > 2019-02-20 04:48:09,159 INFO  mapreduce.Job - Job job_local2035597620_0001 running in uber mode : false
> > 2019-02-20 04:48:09,161 INFO  mapreduce.Job -  map 0% reduce 100%
> > 2019-02-20 04:48:09,163 INFO  mapreduce.Job - Job job_local2035597620_0001 completed successfully
> > 2019-02-20 04:48:09,194 INFO  mapreduce.Job - Counters: 24
> >
> > In addition, it starts to cause problems, as these jobs accumulate in
> > the */tmp/hadoop-ameer/mapred/local/localRunner/ameer/jobcache/* directory
> > and eat up hard disk space. Any thoughts on getting back the old behavior,
> > with no jobs being created in the *tmp* directory? It also seems slow to
> > me.
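For job directories that have already piled up, here is a rough Python sketch for pruning old ones. It assumes that a directory not modified for max_age_hours belongs to a finished job, which only holds when no crawl is running. prune_stale_dirs is a hypothetical helper, and dry_run defaults to True so you can first check what would be deleted.

```python
import shutil
import time
from pathlib import Path


def prune_stale_dirs(root, max_age_hours=24.0, dry_run=True):
    """Delete subdirectories of root whose mtime is older than max_age_hours.

    ASSUMPTION: old mtime means the job is finished. Only run this when no
    Nutch/Hadoop job is active, since a running job's directory may not have
    been touched recently. Returns the directories removed (or that would be
    removed when dry_run is True).
    """
    root = Path(root)
    if not root.is_dir():
        return []
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for child in root.iterdir():
        # Skip symlinks so we never follow a link out of the cache directory.
        if child.is_dir() and not child.is_symlink() and child.stat().st_mtime < cutoff:
            if not dry_run:
                shutil.rmtree(child)
            removed.append(child)
    return removed
```

For the directory mentioned above, that would be prune_stale_dirs("/tmp/hadoop-ameer/mapred/local/localRunner/ameer/jobcache", dry_run=False), after a dry run to review the list.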
> >
> > Regards
> > Ameer
> >
> >
> >
> > On Wed, Feb 20, 2019 at 6:10 AM Sebastian Nagel <wastl.na...@googlemail.com> wrote:
> >
> >     Hi Ameer,
> >
> >     yes, you're correct.  If launched by
> >       runtime/local/bin/nutch
> >     or
> >       runtime/local/bin/crawl
> >     Nutch runs in "local" mode: Hadoop is "emulated", running HDFS, job
> >     and task clients in a single process (JVM).
> >
> >     The other options are:
> >      - pseudo-distributed mode: HDFS namenode and datanode, job and task
> >        clients as multiple processes on a single node
> >      - fully distributed mode: multiple processes on multiple nodes
> >
> >     Best,
> >     Sebastian
> >
> >
> >
> >     On 2/19/19 7:03 PM, atawfik wrote:
> >     > Hi all,
> >     >
> >     > I downloaded Nutch 1.15 and built it using *ant runtime*. When I
> >     > issue the following crawl command from *runtime/local*
> >     >
> >     >
> >     >
> >     > Nutch generates Hadoop jobs and Hadoop single-node logs. See the
> >     > content of the *hadoop.log* file below:
> >     >
> >     >
> >     >
> >     > If I understand correctly, it seems that Nutch is running in
> >     > single-node mode. We are not running Nutch in a cluster; we are just
> >     > running it locally.
> >     >
> >     > Please correct me if I misunderstood anything.
> >     >
> >     > Regards
> >     > Ameer
> >     >
> >     >
> >     >
> >     >
> >     >
> >
>
>
