Re: Drawbacks of Hadoop Pipes

2014-03-04 Thread Silvina Caíno Lores
Hi there,

I've been working with Pipes for some months and I've finally managed to
get it working as I wanted with some legacy code I had. However, I had many
issues, not only with my implementation (it had to be adapted in several
ways to fit Pipes, which is very restrictive) but with Pipes itself (bugs,
obscure errors, and a lack of proper logging, with the maddening debugging
that follows).

I also tried Streaming, but I found it even more complex to debug, and I hit
some deal-breaker errors around buffering and the like that I couldn't
overcome. I also tried a SWIG interface to wrap my code into a Java library;
I'd never recommend that, because you may end up introducing a lot of memory
issues and potential bugs into your already working code, and you basically
don't get anything useful out of it.
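
For reference, the Java side of a hand-written JNI binding (the route the
original question also mentions) is little more than a native declaration plus
a library load, as in the sketch below; the class, method, and library names
are purely illustrative, and the matching C++ implementation has to manage its
own memory, which is exactly where the memory issues mentioned above tend to
creep in.

public class LegacyKernel {
    // Loads liblegacykernel.so (or legacykernel.dll) from java.library.path.
    static {
        System.loadLibrary("legacykernel");
    }

    // Declared native; the body lives in the C++ library built against the
    // javah-generated header. Input and output cross the JNI boundary as Java
    // arrays, which the C++ side must copy or release explicitly.
    public static native double[] process(double[] input);
}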

I've never worked with CUDA, but it shouldn't be any different from my
Hadoop Pipes deployment apart from the specific libraries you need. Be
prepared to deal with configuration issues and many esoteric logs,
nevertheless.

My advice, based on my experience, is to be 99% sure that your original
code is solid before migrating to Hadoop Pipes; you will have enough
problems there anyway.

Good luck on your work :)
Regards,
Silvina


On 3 March 2014 16:11, Basu,Indrashish  wrote:

>
> Hello,
>
> Can anyone help with the below query?
>
> Regards,
> Indrashish
>
>
> On Sat, 01 Mar 2014 13:52:11 -0500, Basu,Indrashish wrote:
>
>> Hello,
>>
>> I am trying to execute a CUDA benchmark in a Hadoop framework, using
>> Hadoop Pipes to invoke the CUDA code, which is written behind a C++
>> interface, from the Hadoop framework. I am interested in knowing what the
>> drawbacks of using Hadoop Pipes for this might be, and whether an
>> implementation using Hadoop Streaming and a JNI interface would be a
>> better choice. I am a bit unclear on this, so I would appreciate it if
>> anyone could shed some light on this and clarify.
>>
>> Regards,
>> Indrashish
>>
>
> --
> Indrashish Basu
> Graduate Student
> Department of Electrical and Computer Engineering
> University of Florida
>


Re: Unable to export hadoop trunk into eclipse

2014-03-04 Thread nagarjuna kanamarlapudi
Yes, I installed it.

mvn clean install -DskipTests was successful. Only the import into Eclipse is
failing.


On Tue, Mar 4, 2014 at 12:51 PM, Azuryy Yu  wrote:

> Have you installed protobuf on your computer?
>
> https://code.google.com/p/protobuf/downloads/list
>
>
>
> On Tue, Mar 4, 2014 at 3:08 PM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlap...@gmail.com> wrote:
>
>> Hi Ted,
>>
>> I didn't do that earlier.
>>
>> Now, I ran
>> mvn eclipse:eclipse
>> and tried importing the same projects into Eclipse. Now this is
>> throwing the following errors:
>>
>>
>> 1. No marketplace entries found to handle Execution compile-protoc, in
>> hadoop-common/pom.xml in Eclipse.  Please see Help for more information.
>> 2. No marketplace entries found to handle Execution compile-protoc, in
>> hadoop-hdfs/src/contrib/bkjournal/pom.xml in Eclipse.  Please see Help for
>> more information.
>>
>>
>> Any idea  ??
>>
>>
>> On Tue, Mar 4, 2014 at 10:59 AM, Ted Yu  wrote:
>>
>>> Have you run the following command under the root of your workspace ?
>>>
>>> mvn eclipse:eclipse
>>>
>>> On Mar 3, 2014, at 9:18 PM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlap...@gmail.com> wrote:
>>>
>>> Hi,
>>> I checked out the hadoop trunck from
>>> http://svn.apache.org/repos/asf/hadoop/common/trunk.
>>>
>>> I set up protobuf-2.5.0 and then did the Maven build.
>>> mvn clean install -DskipTests worked well. The Maven build was
>>> successful.
>>>
>>> So, I tried importing the project into eclipse.
>>>
>>> It is showing errors in the pom.xml of the hadoop-common project. Below are the
>>> errors. Can someone help me here?
>>>
>>> Plugin execution not covered by lifecycle configuration:
>>> org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:version-info
>>> (execution: version-info, phase: generate-resources)
>>>
>>>
>>> The error is at line 299 of pom.xml in the hadoop-common project.
>>>
>>>
>>> <execution>
>>>   <id>version-info</id>
>>>   <phase>generate-resources</phase>
>>>   <goals>
>>>     <goal>version-info</goal>
>>>   </goals>
>>>   <configuration>
>>>     <source>
>>>       <directory>${basedir}/src/main</directory>
>>>       <includes>
>>>         <include>java/**/*.java</include>
>>>         <include>proto/**/*.proto</include>
>>>       </includes>
>>>     </source>
>>>   </configuration>
>>> </execution>
>>>
>>> There are multiple projects that fail with that error; hadoop-common is
>>> one such project.
>>>
>>> Regards,
>>> Nagarjuna K
>>>
>>>
>>
>


decommissioning a node

2014-03-04 Thread John Lilley
Our cluster has a node that reboots randomly.  So I've gone to Ambari, 
decommissioned its HDFS service, stopped all services, and deleted the node 
from the cluster.  I expected an fsck to immediately show under-replicated 
blocks, but everything comes up fine.  How do I tell the cluster that this node 
is really gone, and that it should start replicating the missing blocks?
Thanks
John




RE: decommissioning a node

2014-03-04 Thread John Lilley
OK, after restarting all services, fsck now shows under-replication.  Was it the 
NameNode restart?
John

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 5:47 AM
To: user@hadoop.apache.org
Subject: decommissioning a node

Our cluster has a node that reboots randomly.  So I've gone to Ambari, 
decommissioned its HDFS service, stopped all services, and deleted the node 
from the cluster.  I expected an fsck to immediately show under-replicated 
blocks, but everything comes up fine.  How do I tell the cluster that this node 
is really gone, and that it should start replicating the missing blocks?
Thanks
John




Re: class org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto overrides final method getUnknownFields

2014-03-04 Thread Margusja

Thank you for the reply, I got it to work.

[hduser@vm38 ~]$ /usr/lib/hadoop-yarn/bin/yarn version
Hadoop 2.2.0.2.0.6.0-101
Subversion g...@github.com:hortonworks/hadoop.git -r 
b07b2906c36defd389c8b5bd22bebc1bead8115b

Compiled by jenkins on 2014-01-09T05:18Z
Compiled with protoc 2.5.0
From source with checksum 704f1e463ebc4fb89353011407e965
This command was run using 
/usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-101.jar

[hduser@vm38 ~]$

The main problem, I think, was that I had the yarn binary in two places and I
used the wrong one, which didn't pick up my yarn-site.xml.
Every time I looked into .staging/job.../job.xml there were values from
yarn-default.xml, even though I had set them in yarn-site.xml.


Typical mess up :)

Tervitades, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE "(serialNumber=37303140314)"
-BEGIN PUBLIC KEY-
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvbeg7LwEC2SCpAEewwpC3ajxE
5ZsRMCB77L8bae9G7TslgLkoIzo9yOjPdx2NN6DllKbV65UjTay43uUDyql9g3tl
RhiJIcoAExkSTykWqAIPR88LfilLy1JlQ+0RD8OXiWOVVQfhOHpQ0R/jcAkM2lZa
BjM8j36yJvoBVsfOHQIDAQAB
-END PUBLIC KEY-

On 04/03/14 05:14, Rohith Sharma K S wrote:

Hi

   The reason for "org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto
overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet" is
that Hadoop is compiled with protoc 2.5.0, but a lower version of protobuf is
present in the classpath.

1. Check the MRAppMaster classpath to see which version of protobuf is on it.
It is expected to be 2.5.0.
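
As a quick way to check, a throwaway class like the one below (the class name
is arbitrary) can be run with the same classpath the MRAppMaster gets; it
prints which jar the protobuf classes are actually loaded from, so you can see
which version wins.

import com.google.protobuf.UnknownFieldSet;

public class WhichProtobuf {
    public static void main(String[] args) {
        // Prints the jar file that UnknownFieldSet was loaded from; it should
        // point at protobuf-java-2.5.0.jar if the classpath is correct.
        System.out.println(
            UnknownFieldSet.class.getProtectionDomain().getCodeSource().getLocation());
    }
}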



Thanks & Regards
Rohith Sharma K S



-Original Message-
From: Margusja [mailto:mar...@roo.ee]
Sent: 03 March 2014 22:45
To: user@hadoop.apache.org
Subject: Re: class org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto 
overrides final method getUnknownFields

Hi

2.2.0 and 2.3.0 gave me the same container log.

A little bit more detail:
I'm trying to use an external Java client that submits the job.
Some lines from the Maven pom.xml file:
  
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>

lines from external client:
...
2014-03-03 17:36:01 INFO  FileInputFormat:287 - Total input paths to process : 1
2014-03-03 17:36:02 INFO  JobSubmitter:396 - number of splits:1
2014-03-03 17:36:03 INFO  JobSubmitter:479 - Submitting tokens for job:
job_1393848686226_0018
2014-03-03 17:36:04 INFO  YarnClientImpl:166 - Submitted application
application_1393848686226_0018
2014-03-03 17:36:04 INFO  Job:1289 - The url to track the job:
http://vm38.dbweb.ee:8088/proxy/application_1393848686226_0018/
2014-03-03 17:36:04 INFO  Job:1334 - Running job: job_1393848686226_0018
2014-03-03 17:36:10 INFO  Job:1355 - Job job_1393848686226_0018 running in uber 
mode : false
2014-03-03 17:36:10 INFO  Job:1362 -  map 0% reduce 0%
2014-03-03 17:36:10 INFO  Job:1375 - Job job_1393848686226_0018 failed with 
state FAILED due to: Application application_1393848686226_0018 failed 2 times 
due to AM Container for
appattempt_1393848686226_0018_02 exited with  exitCode: 1 due to:
Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
  at org.apache.hadoop.util.Shell.run(Shell.java:379)
  at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
  at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
  at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
  at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
...

Lines from namenode:
...
14/03/03 19:12:42 INFO namenode.FSEditLog: Number of transactions: 900 Total 
time for transactions(ms): 69 Number of transactions batched in
Syncs: 0 Number of syncs: 542 SyncTimes(ms): 9783
14/03/03 19:12:42 INFO BlockStateChange: BLOCK* addToInvalidates:
blk_1073742050_1226 90.190.106.33:50010
14/03/03 19:12:42 INFO hdfs.StateChange: BLOCK* allocateBlock:
/user/hduser/input/data666.noheader.data.
BP-802201089-90.190.106.33-1393506052071
blk_1073742056_1232{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[90.190.106.33:50010|RBW]]}
14/03/03 19:12:44 INFO hdfs.StateChange: BLOCK* InvalidateBlocks: ask
90.190.106.33:50010 to delete [blk_1073742050_1226]
14/03/03 19:12:53 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap
updated: 90.190.106.33:50010 is added to 
blk_1073742056_1232{blockU

Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
I have a file system with some missing/corrupt blocks.  However, running hdfs 
fsck -delete also fails with errors.  How do I get around this?
Thanks
John

[hdfs@metallica yarn]$ hdfs fsck -delete 
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld
Connecting to namenode via http://anthrax.office.datalever.com:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.57.110 for path 
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld at Tue Mar 04 
06:05:40 MST 2014
.
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld: CORRUPT 
blockpool BP-1827033441-192.168.57.112-1384284857542 block blk_1074200714

/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld: CORRUPT 
blockpool BP-1827033441-192.168.57.112-1384284857542 block blk_1074200741

/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld: CORRUPT 
blockpool BP-1827033441-192.168.57.112-1384284857542 block blk_1074200778

/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld: MISSING 3 blocks of total size 299116266 B.
Status: CORRUPT
Total size:299116266 B
Total dirs:0
Total files:   1
Total symlinks:0
Total blocks (validated):  3 (avg. block size 99705422 B)
  
  CORRUPT FILES:1
  MISSING BLOCKS:   3
  MISSING SIZE: 299116266 B
  CORRUPT BLOCKS:   3
  
Minimally replicated blocks:   0 (0.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 0.0
Corrupt blocks:3
Missing replicas:  0
Number of data-nodes:  8
Number of racks:   1
FSCK ended at Tue Mar 04 06:05:40 MST 2014 in 1 milliseconds
fsck encountered internal errors!


Fsck on path '/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld' 
FAILED


Question on DFS Balancing

2014-03-04 Thread divye sheth
Hi,

I am new to the mailing list.

I am using Hadoop 0.20.2-append (r1056497). The question I
have is related to balancing. I have a 5-datanode cluster and each node has
2 disks attached to it. The second disk was added when the first disk was
reaching its capacity.

The scenario I am facing is that when the new disk was added, Hadoop
automatically moved some data over to the new disk. But over time I
notice that data is no longer being written to the second disk. I have also
faced an issue on a datanode where the first disk had 100% utilization.

How can I overcome such a scenario? Is it not Hadoop's job to balance the
disk utilization between multiple disks on a single datanode?

Thanks
Divye Sheth


Node manager or Resource Manager crash

2014-03-04 Thread Krishna Kishore Bonagiri
Hi,
  I am running an application on a 2-node cluster, which tries to acquire
all the containers that are available on one of those nodes and the remaining
containers from the other node in the cluster. When I run this application
continuously in a loop, either the NM or the RM gets killed at a random
point. There is no corresponding message in the log files.

One of the times the NM got killed today, the tail of its log looked
like this:

2014-03-04 02:42:44,386 DEBUG
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
isredeng:52867 sending out status for 16 containers
2014-03-04 02:42:44,386 DEBUG
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
health-status : true,


And at the time of NM's crash, the RM's log has the following entries:

2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
isredeng:52867 of type STATUS_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
NODE_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server
Responder: responding to
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
nodeUpdate: isredeng:52867 clusterResources:

2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Node being looked for scheduling isredeng:52867
availableResource: 
2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151


Note: the name of the node on which the NM got killed is isredeng; does the
above message indicate anything about why it got killed?

Thanks,
Kishore


Meaning of messages in log and debugging

2014-03-04 Thread Yves Weissig
Hello list,

I'm currently debugging my Hadoop MR application and I have some general
questions about the messages in the log and the debugging process.

- What does "Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143" mean? What does 143 stand
for?

- I also see the following exception in the log: "Exception from
container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)". What does this mean? It
originates from a "Diagnostics report" from a container and the log4j
message level is set to INFO.

- Are there any related links which describe the life cycle of a container?

- Is there a "golden rule" to debug a Hadoop MR application?

- My application is very memory intensive... is there any way to profile
the memory consumption of a single container?

Thanks!
Best regards
Yves





Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-04 Thread John Pauley
Outside hadoop: avro-1.7.6
Inside hadoop:  avro-mapred-1.7.6-hadoop2

From: Stanley Shi <s...@gopivotal.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Monday, March 3, 2014 at 8:30 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

which avro version are you using when running outside of hadoop?

Regards,
Stanley Shi,


On Mon, Mar 3, 2014 at 11:49 PM, John Pauley <john.pau...@threattrack.com> wrote:
This is cross posted to avro-user list 
(http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e).

Hello all,

I’m having an issue using AvroMultipleOutputs in a map/reduce job.  The issue
occurs when using a schema that has a field whose type is a union of null and a
fixed (among other complex types), defaulting to null, when the value is not
null.  Please find the full stack trace below and a sample map/reduce job that
generates an Avro container file and uses that for the m/r input.  Note that I
can serialize/deserialize without issue using
GenericDatumWriter/GenericDatumReader outside of hadoop…  Any insight would be helpful.
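
For concreteness, a schema of the shape described above can be built as in the
sketch below (the field name baz and the record name com.foo.bar.simple_schema
match the stack trace; the fixed type's name and size are made up for
illustration):

import org.apache.avro.Schema;

public class SimpleSchemaSketch {
    public static void main(String[] args) {
        // A record with one field, "baz", whose type is a union of null and a
        // fixed type, with a default of null.
        String json = "{\"type\":\"record\",\"name\":\"simple_schema\","
                + "\"namespace\":\"com.foo.bar\",\"fields\":[{\"name\":\"baz\","
                + "\"type\":[\"null\",{\"type\":\"fixed\",\"name\":\"MyFixed\",\"size\":16}],"
                + "\"default\":null}]}";
        Schema schema = new Schema.Parser().parse(json);
        System.out.println(schema.toString(true));
    }
}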

Stack trace:
java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of 
union in field baz of com.foo.bar.simple_schema
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of 
union in field baz of com.foo.bar.simple_schema
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
at 
org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
at 
org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
at 
com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
at 
com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema in 
union null of union in field baz of com.foo.bar.simple_schema
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
... 16 more
Caused by: java.lang.NullPointerException
at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
at 
org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
at 
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at 
org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)

Sample m/r job:

package com.tts.ox.mapreduce.example.avro;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.g

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
More information from the NameNode log.  I don't understand... it is saying 
that I cannot delete the corrupted file until the NameNode leaves safe mode, 
but it won't leave safe mode until the file system is no longer corrupt.  How 
do I get there from here?
Thanks
john

2014-03-04 06:02:51,584 ERROR namenode.NameNode 
(NamenodeFsck.java:deleteCorruptedFile(446)) - Fsck: error deleting corrupted 
file /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete 
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld. Name node is in 
safe mode.
The reported blocks 169302 needs additional 36 blocks to reach the threshold 
1. of total blocks 169337.
Safe mode will be turned off automatically
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1063)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3141)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3101)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3085)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.deleteCorruptedFile(NamenodeFsck.java:443)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:426)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:206)
at 
org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:67)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.hadoop.hdfs.server.namenode.FsckServlet.doGet(FsckServlet.java:58)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 6:08 AM
To: user@hadoop.apache.org
Subject: Need help: fsck FAILs, refuses to clean up corrupt fs

I have a file system with some missing/corrupt blocks.  However, running hdfs 
fsck -delete also fails with errors.  How do I get around this?
Thanks
John

[hdfs@metallica yarn]$ hdfs fsck -delete 
/rpdm/tmp/Proj

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
Ah... found the answer.  I had to manually leave safe mode to delete the 
corrupt files.
john

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 9:33 AM
To: user@hadoop.apache.org
Subject: RE: Need help: fsck FAILs, refuses to clean up corrupt fs

More information from the NameNode log.  I don't understand... it is saying 
that I cannot delete the corrupted file until the NameNode leaves safe mode, 
but it won't leave safe mode until the file system is no longer corrupt.  How 
do I get there from here?
Thanks
john

2014-03-04 06:02:51,584 ERROR namenode.NameNode 
(NamenodeFsck.java:deleteCorruptedFile(446)) - Fsck: error deleting corrupted 
file /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete 
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld. Name node is in 
safe mode.
The reported blocks 169302 needs additional 36 blocks to reach the threshold 
1. of total blocks 169337.
Safe mode will be turned off automatically
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1063)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3141)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3101)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3085)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.deleteCorruptedFile(NamenodeFsck.java:443)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:426)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:206)
at 
org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:67)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.hadoop.hdfs.server.namenode.FsckServlet.doGet(FsckServlet.java:58)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 6:08 AM
To: user@hadoop.apache.

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread divye sheth
You can force the namenode to leave safemode.

hadoop dfsadmin -safemode leave

Then run the hadoop fsck.

Thanks
Divye Sheth
On Mar 4, 2014 10:03 PM, "John Lilley"  wrote:

>  More information from the NameNode log.  I don't understand... it is
> saying that I cannot delete the corrupted file until the NameNode leaves
> safe mode, but it won't leave safe mode until the file system is no longer
> corrupt.  How do I get there from here?
>
> Thanks
>
> john
>
>
>
> 2014-03-04 06:02:51,584 ERROR namenode.NameNode
> (NamenodeFsck.java:deleteCorruptedFile(446)) - Fsck: error deleting
> corrupted file
> /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld
>
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
> /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld. Name node
> is in safe mode.
>
> The reported blocks 169302 needs additional 36 blocks to reach the
> threshold 1. of total blocks 169337.
>
> Safe mode will be turned off automatically
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1063)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3141)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3101)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3085)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.deleteCorruptedFile(NamenodeFsck.java:443)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:426)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:206)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:67)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FsckServlet.doGet(FsckServlet.java:58)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>
> at
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>
> at
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
>
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>
> at
> org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>
> at
> org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>
> at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>
> at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>
> at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>
> at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>
> at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>
> at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>
> at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>
> at org.mortbay.jetty.Server.handle(Server.java:326)
>
> at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>
> at
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>
> at
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>
> at
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(Q

Re: Hadoop Jobtracker cluster summary of heap size and OOME

2014-03-04 Thread Pabale Vikas
join the group


On Fri, Oct 11, 2013 at 10:28 PM, Viswanathan J
wrote:

> Hi,
>
> I'm running a 14 nodes Hadoop cluster with tasktrackers running in all
> nodes.
>
> Have set the jobtracker default memory size in hadoop-env.sh
>
> *HADOOP_HEAPSIZE="1024"*
>
> Have set the mapred.child.java.opts value in mapred-site.xml as,
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx2048m</value>
> </property>
>
> --
> Regards,
> Viswa.J
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "CDH Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to cdh-user+unsubscr...@cloudera.org.
> For more options, visit
> https://groups.google.com/a/cloudera.org/groups/opt_out.
>



-- 


 Regards.
Vikas S Pabale.
+919730198004


Re: Node manager or Resource Manager crash

2014-03-04 Thread Vinod Kumar Vavilapalli
I remember you asking this question before. Check if your OS' OOM killer is 
killing it.

+Vinod

On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri  
wrote:

> Hi,
>   I am running an application on a 2-node cluster, which tries to acquire all 
> the containers that are available on one of those nodes and remaining 
> containers from the other node in the cluster. When I run this application 
> continuously in a loop, one of the NM or RM is getting killed at a random 
> point. There is no corresponding message in the log files.
> 
> One of the times that NM had got killed today, the tail of the it's log is 
> like this:
> 
> 2014-03-04 02:42:44,386 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
> isredeng:52867 sending out status for 16 containers
> 2014-03-04 02:42:44,386 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's 
> health-status : true,
> 
> 
> And at the time of NM's crash, the RM's log has the following entries:
> 
> 2014-03-04 02:42:40,371 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing 
> isredeng:52867 of type STATUS_UPDATE
> 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
>  NODE_UPDATE
> 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server 
> Responder: responding to 
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 
> 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
> 2014-03-04 02:42:40,371 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  nodeUpdate: isredeng:52867 clusterResources: 
> 
> 2014-03-04 02:42:40,371 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Node being looked for scheduling isredeng:52867 
> availableResource: 
> 2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151
> 
> 
> Note: the name of the node on which NM has got killed is isredeng, does it 
> indicate anything from the above message as to why it got killed?
> 
> Thanks,
> Kishore
> 
> 
> 




Re: Not information in Job History UI

2014-03-04 Thread SF Hadoop
That explains a lot.  Thanks for the information.  I appreciate your help.


On Mon, Mar 3, 2014 at 7:47 PM, Jian He  wrote:

> > You said, "there are no job logs generated on the server that is
> running the job.".
> That was quoting your previous sentence to answer your question.
>
> > If I were to run a job and I wanted to tail the job log as it was
> running, where would I find that log?
> 1) set yarn.nodemanager.delete.debug-delay-sec to be a larger value, and
> look for logs in local dirs specified by yarn.nodemanager.log-dirs.
> Or
> 2) enable log aggregation (yarn.log-aggregation-enable). Log aggregation
> aggregates those NM local logs and uploads them to HDFS once the application
> is finished. Then you can use the yarn logs command or simply go to the history UI
> to see the logs.
> You can find good explanation from
> http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
>
> Thanks.
>
>
> On Mon, Mar 3, 2014 at 4:29 PM, SF Hadoop  wrote:
>
>> Thanks for that info Jian.
>>
>> You said, "there are no job logs generated on the server that is running
>> the job.".  So am I correct in assuming the logs will be in the dir
>> specified by yarn.nodemanager.log-dirs on the datanodes?
>>
>> I am quite confused as to where the logs for each specific part of the
>> ecosystem reside.
>>
>> If I were to run a job and I wanted to tail the job log as it was
>> running, where would I find that log?
>>
>> Thanks for your help.
>>
>>
>>  On Mon, Mar 3, 2014 at 11:46 AM, Jian He  wrote:
>>
>>>  Note that node manager will not keep the finished applications and
>>> only show running apps,  so the UI won't show the finished apps.
>>>  Conversely, job history server UI will only show the finished apps but
>>> not the running apps.
>>>
>>> bq. there are no job logs generated on the server that is running the
>>> job.
>>> by default, the local logs will be deleted after job finished.  you can
>>> config yarn.nodemanager.delete.debug-delay-sec, to delay the deletion
>>> of the logs.
>>>
>>> Jian
>>>
>>>
>>> On Mon, Mar 3, 2014 at 10:45 AM, SF Hadoop  wrote:
>>>
 Hadoop 2.2.0
 CentOS 6.4
 Viewing UI in various browsers.

 I am having a problem where no information is visible in my Job History
 UI.  I run test jobs, they complete without error, but no information ever
 populates the nodemanager or jobhistory server UI.

 Also, there are no job logs generated on the server that is running the
 job.

 I have the following settings configured:
 yarn.nodemanager.local-dirs
 yarn.nodemanager.log-dirs
 yarn.log.server.url

 ...plus the basic yarn log dir.  I get output in regards to the daemons
 but very little in regards to the job.  All I get that refers to the
 jobhistory server is the following (so it appears to be functioning
 properly):

 2014-02-18 11:43:06,824 INFO org.apache.hadoop.http.HttpServer: Jetty
 bound to port 19888
 2014-02-18 11:43:06,824 INFO org.mortbay.log: jetty-6.1.26
 2014-02-18 11:43:06,847 INFO org.mortbay.log: Extract
 jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.1.0.2.0.5.0-67.jar!/webapps/jobhistory
 to /tmp/Jetty_server_19888_jobhistoryv7gnnv/webapp
 2014-02-18 11:43:07,085 INFO org.mortbay.log: Started
 SelectChannelConnector@server:19888
 2014-02-18 11:43:07,085 INFO org.apache.hadoop.yarn.webapp.WebApps: Web
 app /jobhistory started at 19888
 2014-02-18 11:43:07,477 INFO org.apache.hadoop.yarn.webapp.WebApps:
 Registered webapp guice modules

 I have a feeling this is a misconfiguration but I cannot figure out
 what setting is missing or wrong.

 Other than not being able to see any of the jobs in the UIs, everything
 appears to be working correctly so this is quite confusing.

 Any help is appreciated.

>>>
>>>
>>
>>
>>
>

Re: Meaning of messages in log and debugging

2014-03-04 Thread Zhijie Shen
bq. Container killed by the ApplicationMaster. Container killed on request.
Exit code is 143" mean? What does 143 stand for?

It's the diagnostic message generated by YARN, which indicates that the
container was killed by MR's ApplicationMaster. 143 is an exit code of a
YARN container, which indicates the termination of the container.

bq. Are there any related links which describe the life cycle of a
container?

This is what I found online:
http://diggerk.wordpress.com/2013/09/19/lifecycle-of-yarn-resource-manager-containers/.
Otherwise, you can have a look at ContainerImpl.java if you want to know
the detail.

bq. My application is very memory intense... is there any way to profile the
memory consumption of a single container?

You can find the metrics info on the RM and NM web UIs, or you
can programmatically access their RESTful APIs.
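
As a rough sketch (the host, port, and container id in the comment below are
placeholders, and the NodeManager's default web port 8042 is assumed), the
NM's container REST resource can be polled while the job runs, for example:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ContainerMetricsProbe {
    public static void main(String[] args) throws Exception {
        // e.g. http://nm-host:8042/ws/v1/node/containers/container_1393848686226_0018_01_000002
        URL url = new URL(args[0]);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON describing the container and its state
            }
        }
    }
}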

- Zhijie


On Tue, Mar 4, 2014 at 7:24 AM, Yves Weissig  wrote:

> Hello list,
>
> I'm currently debugging my Hadoop MR application and I have some general
> questions to the messages in the log and the debugging process.
>
> - What does "Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143" mean? What does 143 stand
> for?
>
> - I also see the following exception in the log: "Exception from
> container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)". What does this mean? It
> originates from a "Diagnostics report" from a container and the log4j
> message level is set to INFO.
>
> - Are there any related links which describe the life cycle of a container?
>
> - Is there a "golden rule" to debug a Hadoop MR application?
>
> - My application is very memory intense... is there any way to profile
> the memory consumption of a single container?
>
> Thanks!
> Best regards
> Yves
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Benchmarking Hive Changes

2014-03-04 Thread Anthony Mattas
I’ve been trying to benchmark some of the Hive enhancements in Hadoop 2.0 using 
the HDP Sandbox. 

I took one of their example queries and executed it with the tables stored as 
TEXTFILE, RCFILE, and ORC. I also tried enabling vectorized execution 
and predicate pushdown.

SELECT s07.description, s07.salary, s08.salary,
  s08.salary - s07.salary
FROM
  sample_07 s07 JOIN sample_08 s08
ON ( s07.code = s08.code)
WHERE
 s07.salary < s08.salary
SORT BY s08.salary-s07.salary DESC

Ultimately there was not much difference in performance across the executions. 
Can someone clarify whether I need an actual full cluster to see performance 
improvements, or whether I'm missing something else? I thought at minimum I would 
have seen an improvement moving from TEXTFILE to ORC.

Re: Node manager or Resource Manager crash

2014-03-04 Thread Krishna Kishore Bonagiri
Yes Vinod, I asked this question some time back, and I have come back to
resolve the issue again.

I tried to see if the OOM killer is killing them, but it is not. I checked the free
swap space on my box while my test was going on, and it doesn't seem to be
the issue. I have also verified whether the OOM score goes high for any of
these processes, because that is when the OOM killer kills them, and it does not
go high either.

Thanks,
Kishore


On Tue, Mar 4, 2014 at 10:51 PM, Vinod Kumar Vavilapalli  wrote:

> I remember you asking this question before. Check if your OS' OOM killer
> is killing it.
>
> +Vinod
>
> On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri <
> write2kish...@gmail.com> wrote:
>
> Hi,
>   I am running an application on a 2-node cluster, which tries to acquire
> all the containers that are available on one of those nodes and remaining
> containers from the other node in the cluster. When I run this application
> continuously in a loop, one of the NM or RM is getting killed at a random
> point. There is no corresponding message in the log files.
>
> One of the times that NM had got killed today, the tail of the it's log is
> like this:
>
> 2014-03-04 02:42:44,386 DEBUG
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
> isredeng:52867 sending out status for 16 containers
> 2014-03-04 02:42:44,386 DEBUG
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
> health-status : true,
>
>
> And at the time of NM's crash, the RM's log has the following entries:
>
> 2014-03-04 02:42:40,371 DEBUG
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
> isredeng:52867 of type STATUS_UPDATE
> 2014-03-04 02:42:40,371 DEBUG
> org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
> NODE_UPDATE
> 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server
> Responder: responding to
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
> 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
> 2014-03-04 02:42:40,371 DEBUG
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> nodeUpdate: isredeng:52867 clusterResources:
> 
> 2014-03-04 02:42:40,371 DEBUG
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Node being looked for scheduling isredeng:52867
> availableResource: 
> 2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151
>
>
> Note: the name of the node on which NM has got killed is isredeng, does it
> indicate anything from the above message as to why it got killed?
>
> Thanks,
> Kishore
>
>
>
>
>


Re: Question on DFS Balancing

2014-03-04 Thread Harsh J
You're probably looking for https://issues.apache.org/jira/browse/HDFS-1804

On Tue, Mar 4, 2014 at 5:54 AM, divye sheth  wrote:
> Hi,
>
> I am new to the mailing list.
>
> I am using Hadoop 0.20.2 with an append r1056497 version. The question I
> have is related to balancing. I have a 5 datanode cluster and each node has
> 2 disks attached to it. The second disk was added when the first disk was
> reaching its capacity.
>
> Now the scenario that I am facing is, when the new disk was added hadoop
> automatically moved over some data to the new disk. But over the time I
> notice that data is no longer being written to the second disk. I have also
> faced an issue on the datanode where the first disk had 100% utilization.
>
> How can I overcome such scenario, is it not hadoop's job to balance the disk
> utilization between multiple disks on single datanode?
>
> Thanks
> Divye Sheth



-- 
Harsh J


Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-04 Thread Stanley Shi
Which version of Hadoop are you using?
There's a possibility that the Hadoop environment already has an avro*.jar
in place, which caused the jar conflict.

Regards,
*Stanley Shi,*



On Tue, Mar 4, 2014 at 11:25 PM, John Pauley wrote:

>  Outside hadoop: avro-1.7.6
> Inside hadoop:  avro-mapred-1.7.6-hadoop2
>
>
>   From: Stanley Shi 
> Reply-To: "user@hadoop.apache.org" 
> Date: Monday, March 3, 2014 at 8:30 PM
> To: "user@hadoop.apache.org" 
> Subject: Re: [hadoop] AvroMultipleOutputs
> org.apache.avro.file.DataFileWriter$AppendWriteException
>
>   which avro version are you using when running outside of hadoop?
>
>  Regards,
> *Stanley Shi,*
>
>
>
> On Mon, Mar 3, 2014 at 11:49 PM, John Pauley 
> wrote:
>
>>   This is cross posted to avro-user list (
>> http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e
>> ).
>>
>>   Hello all,
>>
>>  I’m having an issue using AvroMultipleOutputs in a map/reduce job.  The
>> issue occurs when using a schema that has a union of null and a fixed
>> (among other complex types), default to null, and it is not null.
>>  Please find the full stack trace below and a sample map/reduce job that
>> generates an Avro container file and uses that for the m/r input.  Note
>> that I can serialize/deserialize without issue using
>> GenericDatumWriter/GenericDatumReader outside of hadoop…  Any insight would
>> be helpful.
>>
>>  Stack trace:
>>  java.lang.Exception:
>> org.apache.avro.file.DataFileWriter$AppendWriteException:
>> java.lang.NullPointerException: in com.foo.bar.simple_schema in union null
>> of union in field baz of com.foo.bar.simple_schema
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
>> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException:
>> java.lang.NullPointerException: in com.foo.bar.simple_schema in union null
>> of union in field baz of com.foo.bar.simple_schema
>> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
>> at
>> org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
>> at
>> org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
>> at
>> org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
>> at
>> org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
>> at
>> com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
>> at
>> com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> at java.lang.Thread.run(Thread.java:695)
>> Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema
>> in union null of union in field baz of com.foo.bar.simple_schema
>> at
>> org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
>> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
>> ... 16 more
>> Caused by: java.lang.NullPointerException
>> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
>> at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
>> at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
>> at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
>> at
>> org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
>> at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
>> at
>> org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
>> at
>> org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
>> at
>> org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
>> at
>> org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
>>
>>  Sample 

Re: Question on DFS Balancing

2014-03-04 Thread divye sheth
Thanks Harsh. The JIRA is fixed in version 2.1.0, whereas I am using Hadoop
0.20.2 (we are in the process of upgrading). Is there a workaround for the
short term to balance the disk utilization? If the patch in the JIRA is
applied to the version that I am using, will it break anything?

Thanks
Divye Sheth


On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:

> You're probably looking for
> https://issues.apache.org/jira/browse/HDFS-1804
>
> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth  wrote:
> > Hi,
> >
> > I am new to the mailing list.
> >
> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
> > have is related to balancing. I have a 5 datanode cluster and each node
> has
> > 2 disks attached to it. The second disk was added when the first disk was
> > reaching its capacity.
> >
> > Now the scenario that I am facing is, when the new disk was added hadoop
> > automatically moved over some data to the new disk. But over the time I
> > notice that data is no longer being written to the second disk. I have
> also
> > faced an issue on the datanode where the first disk had 100% utilization.
> >
> > How can I overcome such scenario, is it not hadoop's job to balance the
> disk
> > utilization between multiple disks on single datanode?
> >
> > Thanks
> > Divye Sheth
>
>
>
> --
> Harsh J
>


Re: Question on DFS Balancing

2014-03-04 Thread Azuryy Yu
Hi,
That will probably break something if you apply the patch from 2.x to 0.20.x,
but it depends.

AFAIK, the Balancer had a major refactoring in HDFS v2, so you'd better fix it
yourself based on HDFS-1804.



On Wed, Mar 5, 2014 at 3:47 PM, divye sheth  wrote:

> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using Hadoop
> 0.20.2 (we are in a process of upgrading) is there a workaround for the
> short term to balance the disk utilization? The patch in the Jira, if
> applied to the version that I am using, will it break anything?
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:
>
>> You're probably looking for
>> https://issues.apache.org/jira/browse/HDFS-1804
>>
>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth  wrote:
>> > Hi,
>> >
>> > I am new to the mailing list.
>> >
>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
>> > have is related to balancing. I have a 5 datanode cluster and each node
>> has
>> > 2 disks attached to it. The second disk was added when the first disk
>> was
>> > reaching its capacity.
>> >
>> > Now the scenario that I am facing is, when the new disk was added hadoop
>> > automatically moved over some data to the new disk. But over the time I
>> > notice that data is no longer being written to the second disk. I have
>> also
>> > faced an issue on the datanode where the first disk had 100%
>> utilization.
>> >
>> > How can I overcome such scenario, is it not hadoop's job to balance the
>> disk
>> > utilization between multiple disks on single datanode?
>> >
>> > Thanks
>> > Divye Sheth
>>
>>
>>
>> --
>> Harsh J
>>
>
>