No luck. I get the same error even when using a single reducer. I'm
attaching the job configuration as shown in the web UI.

When I look at the job tracker for the job, it has no map tasks. Is that
expected? I've never heard of a reduce-only job.

J


On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <[email protected]> wrote:

> This is my first time on a cluster. I'll try what Josh suggests now.
>
> J
>
>
> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <[email protected]> wrote:
>
>>
>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <[email protected]> wrote:
>>
>>> Hi Jeremy,
>>>
>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <[email protected]> wrote:
>>> > Hi
>>> >
>>> > I'm hitting the exception pasted below when using AvroPathPerKeyTarget.
>>> > Interestingly, my code works just fine when I run on a small dataset
>>> using
>>> > the LocalJobTracker. However, when I run on a large dataset using a
>>> hadoop
>>> > cluster I hit the exception.
>>> >
>>>
>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
>>> on a real cluster, or is this the first try with it?
>>>
>>> I'm wondering if this could be a problem that's always been around (as
>>> the integration test for AvroPathPerKeyTarget also runs in the local
>>> jobtracker), or if this could be something new.
>>>
>>
>> +1-- Jeremy, if you force the job to run w/a single reducer on the
>> cluster (i.e., via groupByKey(1)), does it work?
>>
>>
>>>
>>> - Gabriel
>>>
>>
>>
>
Job Configuration: JobId - job_201312300132_0420


name = value
job.end.retry.interval = 30000
avro.map.output.schema = {"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields":[{"name":"key","type":"string","doc":""},{"name":"value","type":{"type":"record","name":"MatePair","namespace":"contrail.sequences","doc":"Mate File Record Structure","fields":[{"name":"left","type":{"type":"record","name":"FastQRecord","doc":"A FastQ Read","fields":[{"name":"id","type":"string"},{"name":"read","type":"string"},{"name":"qvalue","type":"string"}]}},{"name":"right","type":"FastQRecord"}]},"doc":"","order":"ignore"}]}
io.bytes.per.checksum = 512
mapred.job.tracker.retiredjobs.cache.size = 1000
mapreduce.jobhistory.cleaner.interval-ms = 86400000
mapred.queue.default.acl-administer-jobs = *
dfs.image.transfer.bandwidthPerSec = 0
mapred.task.profile.reduces = 0-2
mapreduce.jobtracker.staging.root.dir = ${hadoop.tmp.dir}/mapred/staging
mapreduce.job.cache.files.visibilities = true,true
mapred.job.reuse.jvm.num.tasks = 1
dfs.block.access.token.lifetime = 600
mapred.reduce.tasks.speculative.execution = true
mapred.job.name = contrail.scaffolding.FilterReads: Avro(/tmp/crunch-797729802/p6)+GBK+ungroup+AvroFilePerKey... (4/4)
hadoop.http.authentication.kerberos.keytab = ${user.home}/hadoop.keytab
dfs.permissions.supergroup = supergroup
io.seqfile.sorter.recordlimit = 1000000
hadoop.relaxed.worker.version.check = false
mapred.task.tracker.http.address = 0.0.0.0:50060
dfs.namenode.delegation.token.renew-interval = 86400000
fs.ramfs.impl = org.apache.hadoop.fs.InMemoryFileSystem
mapred.system.dir = ${hadoop.tmp.dir}/mapred/system
dfs.namenode.edits.toleration.length = 0
mapred.task.tracker.report.address = 127.0.0.1:0
mapreduce.reduce.shuffle.connect.timeout = 180000
mapreduce.job.counters.max = 120
dfs.datanode.readahead.bytes = 4193404
mapred.healthChecker.interval = 60000
mapreduce.job.complete.cancel.delegation.tokens = true
dfs.namenode.replication.work.multiplier.per.iteration = 2
fs.trash.interval = 0
hadoop.jetty.logs.serve.aliases = true
mapred.skip.map.auto.incr.proc.count = true
hadoop.http.authentication.kerberos.principal = HTTP/localhost@LOCALHOST
mapred.child.tmp = ./tmp
fs.gsb.impl = com.google.cloud.hadoop.fs.gcs.GoogleHadoopBucketRootedFileSystem
fs.gs.enable.service.account.auth = true
mapred.tasktracker.taskmemorymanager.monitoring-interval = 5000
crunch.work.dir = /tmp/crunch-797729802/p9
dfs.datanode.http.address = 0.0.0.0:50075
mapred.output.key.comparator.class = org.apache.avro.mapred.AvroKeyComparator
io.sort.spill.percent = 0.80
dfs.namenode.write.stale.datanode.ratio = 0.5f
dfs.client.use.datanode.hostname = false
mapred.job.shuffle.input.buffer.percent = 0.70
dfs.max.objects = 0
hadoop.skip.worker.version.check = false
hadoop.security.instrumentation.requires.admin = false
mapred.skip.map.max.skip.records = 0
mapreduce.reduce.shuffle.maxfetchfailures = 10
hadoop.security.authorization = false
user.name = hadoop
mapred.task.profile.maps = 0-2
dfs.datanode.sync.behind.writes = false
dfs.https.server.keystore.resource = ssl-server.xml
dfs.replication.interval = 3
mapred.local.dir = ${hadoop.tmp.dir}/mapred/local
mapred.merge.recordsBeforeProgress = 10000
mapred.job.tracker.http.address = 0.0.0.0:50030
mapred.compress.map.output = false
mapred.userlog.retain.hours = 24
mapred.used.genericoptionsparser = true
mapred.tasktracker.reduce.tasks.maximum = 8
dfs.namenode.safemode.min.datanodes = 0
hadoop.security.uid.cache.secs = 14400
mapred.disk.healthChecker.interval = 60000
fs.har.impl.disable.cache = true
mapred.cluster.map.memory.mb = -1
crunch.avro.mode = SPECIFIC
dfs.data.dir = /mnt/ed0/hadoop/dfs/data,/mnt/ed1/hadoop/dfs/data,/mnt/pd0/hadoop/dfs/data
dfs.access.time.precision = 3600000
dfs.replication.min = 1
mapreduce.job.submithost = desktop-0.c.biocloudops.internal
fs.checkpoint.dir = ${hadoop.tmp.dir}/dfs/namesecondary
fs.s3n.impl = org.apache.hadoop.fs.s3native.NativeS3FileSystem
mapreduce.tasktracker.outofband.heartbeat = false
mapreduce.tasktracker.outofband.heartbeat.damper = 1000000
mapred.jobtracker.restart.recover = false
hadoop.logfile.size = 10000000
hadoop.security.token.service.use_ip = true
mapred.inmem.merge.threshold = 1000
ipc.client.connection.maxidletime = 10000
fs.checkpoint.size = 67108864
dfs.namenode.invalidate.work.pct.per.iteration = 0.32f
read_id_parser = contrail.sequences.ReadIdUtil$ReadParserUsingUnderscore
dfs.blockreport.intervalMsec = 3600000
fs.s3.sleepTimeSeconds = 10
mapreduce.job.counters.counter.name.max = 64
dfs.client.block.write.retries = 3
mapred.reduce.tasks = 1
mapred.queue.names = default
crunch.planner.dotfile = digraph G { "Avro(/speciesA/scaffolding_2014_0218/FilteredBowtieAlignments/*avro)" [label="Avro(/speciesA/scaffolding_2014_0218/FilteredBowtieAlignments/*avro)" shape=folder]; "Avro(/tmp/crunch-797729802/p1)" [label="Avro(/tmp/crunch-797729802/p1)" shape=folder]; subgraph "cluster-job1" { label="Crunch Job 1"; subgraph "cluster-job1-map" { label = Map; color = blue; "S5@899134710@1644373942" [label="S5" shape=box]; "Aggregate.count@1348188201@1644373942" [label="Aggregate.count" shape=box]; } subgraph "cluster-job1-reduce" { label = Reduce; color = red; "PTables.values@1661020955@1644373942" [label="PTables.values" shape=box]; "GBK@729117068@1644373942" [label="GBK" shape=box]; "combine@2070479808@1644373942" [label="combine" shape=box]; } } "PTables.values@1661020955@1644373942" -> "Avro(/tmp/crunch-797729802/p1)"; "Avro(/speciesA/scaffolding_2014_0218/FilteredBowtieAlignments/*avro)" -> "S5@899134710@1644373942"; "combine@2070479808@1644373942" -> "PTables.values@1661020955@1644373942"; "GBK@729117068@1644373942" -> "combine@2070479808@1644373942"; "Aggregate.count@1348188201@1644373942" -> "GBK@729117068@1644373942"; "S5@899134710@1644373942" -> "Aggregate.count@1348188201@1644373942"; }
io.seqfile.lazydecompress = true
dfs.https.enable = false
dfs.replication = 3
mapred.jobtracker.blacklist.fault-timeout-window = 180
ipc.client.tcpnodelay = false
crunch.outputs.dir = out0;N29yZy5hcGFjaGUuY3J1bmNoLnR5cGVzLmF2cm8uQXZyb1BhdGhQZXJLZXlPdXRwdXRGb3JtYXQA AAACF2F2cm8ub3V0cHV0LnNjaGVtYS5vdXQwjgFWeyJ0eXBlIjoicmVjb3JkIiwibmFtZSI6Ik1h dGVQYWlyIiwibmFtZXNwYWNlIjoiY29udHJhaWwuc2VxdWVuY2VzIiwiZG9jIjoiTWF0ZSBGaWxl IFJlY29yZCBTdHJ1Y3R1cmUiLCJmaWVsZHMiOlt7Im5hbWUiOiJsZWZ0IiwidHlwZSI6eyJ0eXBl IjoicmVjb3JkIiwibmFtZSI6IkZhc3RRUmVjb3JkIiwiZG9jIjoiQSBGYXN0USBSZWFkIiwiZmll bGRzIjpbeyJuYW1lIjoiaWQiLCJ0eXBlIjoic3RyaW5nIn0seyJuYW1lIjoicmVhZCIsInR5cGUi OiJzdHJpbmcifSx7Im5hbWUiOiJxdmFsdWUiLCJ0eXBlIjoic3RyaW5nIn1dfX0seyJuYW1lIjoi cmlnaHQiLCJ0eXBlIjoiRmFzdFFSZWNvcmQifV19EGNydW5jaC5hdnJvLm1vZGUIU1BFQ0lGSUM= ;org.apache.avro.mapred.AvroWrapper;org.apache.hadoop.io.NullWritable
mapred.acls.enabled = false
mapred.tasktracker.dns.nameserver = default
mapred.submit.replication = 10
io.compression.codecs = org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec
io.file.buffer.size = 4096
mapred.map.tasks.speculative.execution = true
bowtie_alignments = /speciesA/scaffolding_2014_0218/FilteredBowtieAlignments/*avro
mapreduce.job.split.metainfo.maxsize = 10000000
mapred.map.max.attempts = 4
mapred.job.shuffle.merge.percent = 0.66
fs.har.impl = org.apache.hadoop.fs.HarFileSystem
hadoop.security.authentication = simple
fs.s3.buffer.dir = ${hadoop.tmp.dir}/s3
mapred.skip.reduce.auto.incr.proc.count = true
dfs.http.address = 0.0.0.0:50070
mapred.job.tracker.jobhistory.lru.cache.size = 5
dfs.namenode.avoid.read.stale.datanode = false
dfs.datanode.drop.cache.behind.writes = false
dfs.replication.considerLoad = true
mapred.jobtracker.blacklist.fault-bucket-width = 15
dfs.block.access.token.enable = false
mapreduce.job.acl-view-job =
mapred.job.queue.name = default
dfs.permissions = true
mapred.job.tracker.persist.jobstatus.hours = 0
fs.gs.impl = com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
fs.file.impl = org.apache.hadoop.fs.LocalFileSystem
dfs.block.size = 67108864
dfs.https.address = 0.0.0.0:50470
ipc.client.kill.max = 10
fs.gs.system.bucket = contrail-ghfs
mapred.healthChecker.script.timeout = 600000
mapred.tasktracker.map.tasks.maximum = 8
mapred.job.tracker.persist.jobstatus.dir = /jobtracker/jobsInfo
mapreduce.jobhistory.max-age-ms = 2592000000
dfs.default.chunk.view.size = 32768
mapred.reduce.slowstart.completed.maps = 0.05
mapreduce.reduce.class = org.apache.crunch.impl.mr.run.CrunchReducer
help = false
io.sort.mb = 100
dfs.datanode.failed.volumes.tolerated = 0
dfs.https.need.client.auth = false
hadoop.http.authentication.type = simple
mapreduce.inputformat.class = org.apache.crunch.impl.mr.run.CrunchInputFormat
dfs.datanode.data.dir.perm = 755
ipc.server.listen.queue.size = 128
io.mapfile.bloom.size = 1048576
fs.hsftp.impl = org.apache.hadoop.hdfs.HsftpFileSystem
mapred.cache.files.timestamps = 1396015124294,1396015124313
mapred.combine.recordsBeforeProgress = 10000
dfs.datanode.dns.nameserver = default
mapred.child.java.opts = -Xms512m -Xmx1024m
dfs.replication.max = 512
mapred.queue.default.state = RUNNING
map.sort.class = org.apache.hadoop.util.QuickSort
hadoop.util.hash.type = murmur
topology.node.switch.mapping.impl = org.apache.hadoop.net.ScriptBasedMapping
dfs.block.access.key.update.interval = 600
dfs.datanode.dns.interface = default
dfs.datanode.use.datanode.hostname = false
mapred.output.compression.type = RECORD
hadoop.security.use-weak-http-crypto = false
mapred.reducer.new-api = true
mapred.skip.attempts.to.start.skipping = 2
mapreduce.job.dir = hdfs://biocloud-nn-0:8020/mnt/pd0/hadoop/tmp/mapred/staging/hadoop/.staging/job_201312300132_0420
io.map.index.skip = 0
crunch.inputs.dir = LG9yZy5hcGFjaGUuY3J1bmNoLnR5cGVzLmF2cm8uQXZyb0lucHV0Rm9ybWF0AAAABBFhdnJvLmlu cHV0LnNjaGVtYY4CAHsidHlwZSI6InJlY29yZCIsIm5hbWUiOiJQYWlyIiwibmFtZXNwYWNlIjoi b3JnLmFwYWNoZS5hdnJvLm1hcHJlZCIsImZpZWxkcyI6W3sibmFtZSI6ImtleSIsInR5cGUiOiJz dHJpbmciLCJkb2MiOiIifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOnsidHlwZSI6InJlY29yZCIs Im5hbWUiOiJNYXRlUGFpciIsIm5hbWVzcGFjZSI6ImNvbnRyYWlsLnNlcXVlbmNlcyIsImRvYyI6 Ik1hdGUgRmlsZSBSZWNvcmQgU3RydWN0dXJlIiwiZmllbGRzIjpbeyJuYW1lIjoibGVmdCIsInR5 cGUiOnsidHlwZSI6InJlY29yZCIsIm5hbWUiOiJGYXN0UVJlY29yZCIsImRvYyI6IkEgRmFzdFEg UmVhZCIsImZpZWxkcyI6W3sibmFtZSI6ImlkIiwidHlwZSI6InN0cmluZyJ9LHsibmFtZSI6InJl YWQiLCJ0eXBlIjoic3RyaW5nIn0seyJuYW1lIjoicXZhbHVlIiwidHlwZSI6InN0cmluZyJ9XX19 LHsibmFtZSI6InJpZ2h0IiwidHlwZSI6IkZhc3RRUmVjb3JkIn1dfSwiZG9jIjoiIiwib3JkZXIi OiJpZ25vcmUifV19GWNydW5jaC5yZWZsZWN0ZGF0YWZhY3Rvcnkvb3JnLmFwYWNoZS5jcnVuY2gu dHlwZXMuYXZyby5SZWZsZWN0RGF0YUZhY3RvcnkVYXZyby5pbnB1dC5pcy5yZWZsZWN0BWZhbHNl EGNydW5jaC5hdnJvLm1vZGUIU1BFQ0lGSUM= ;-1;/tmp/crunch-797729802/p6
mapred.cluster.max.map.memory.mb = -1
fs.s3.maxRetries = 4
dfs.namenode.logging.level = info
mapred.task.tracker.task-controller = org.apache.hadoop.mapred.DefaultTaskController
mapred.userlog.limit.kb = 0
mapreduce.ifile.readahead.bytes = 4194304
hadoop.http.authentication.simple.anonymous.allowed = true
mapred.jobtracker.nodegroup.aware = false
hadoop.rpc.socket.factory.class.default = org.apache.hadoop.net.StandardSocketFactory
fs.hftp.impl = org.apache.hadoop.hdfs.HftpFileSystem
dfs.namenode.handler.count = 10
fs.kfs.impl = org.apache.hadoop.fs.kfs.KosmosFileSystem
mapreduce.job.submithostaddress = 10.240.0.142
mapred.map.tasks = 0
mapred.local.dir.minspacekill = 0
fs.hdfs.impl = org.apache.hadoop.hdfs.DistributedFileSystem
mapred.job.map.memory.mb = -1
mapred.jobtracker.completeuserjobs.maximum = 100
dfs.namenode.avoid.write.stale.datanode = false
dfs.blockreport.initialDelay = 0
mapred.min.split.size = 0
hadoop.http.authentication.token.validity = 36000
dfs.namenode.delegation.token.max-lifetime = 604800000
fs.ftp.impl = org.apache.hadoop.fs.ftp.FTPFileSystem
dfs.secondary.http.address = 0.0.0.0:50090
mapred.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
mapred.cache.files = /tmp/crunch-797729802/p9/REDUCE,/tmp/crunch-797729802/p9/MAP
mapred.cluster.max.reduce.memory.mb = -1
mapred.cluster.reduce.memory.mb = -1
dfs.web.ugi = webuser,webgroup
mapred.task.profile = false
mapred.reduce.parallel.copies = 5
dfs.heartbeat.interval = 3
net.topology.impl = org.apache.hadoop.net.NetworkTopology
local.cache.size = 10737418240
io.sort.factor = 10
mapreduce.map.class = org.apache.crunch.impl.mr.run.CrunchMapper
mapreduce.job.counters.groups.max = 50
mapred.task.timeout = 600000
dfs.safemode.extension = 30000
ipc.client.idlethreshold = 4000
ipc.server.tcpnodelay = false
hadoop.logfile.count = 10
dfs.namenode.stale.datanode.interval = 30000
mapreduce.job.restart.recover = true
mapred.output.dir = /tmp/crunch-797729802/p9/output
mapred.heartbeats.in.second = 100
fs.s3.block.size = 67108864
mapred.jobtracker.jobSchedulable = org.apache.hadoop.mapred.JobSchedulable
mapred.map.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
mapred.task.cache.levels = 2
mapred.tasktracker.dns.interface = default
dfs.secondary.namenode.kerberos.internal.spnego.principal = ${dfs.web.authentication.kerberos.principal}
mapred.job.reduce.memory.mb = -1
mapred.mapoutput.value.class = org.apache.avro.mapred.AvroValue
mapred.max.tracker.failures = 4
hadoop.http.authentication.signature.secret.file = ${user.home}/hadoop-http-auth-signature-secret
dfs.df.interval = 60000
mapreduce.reduce.shuffle.read.timeout = 180000
mapred.tasktracker.tasks.sleeptime-before-sigkill = 5000
mapred.max.tracker.blacklists = 4
fs.gs.project.id = biocloudops
jobclient.output.filter = FAILED
mapreduce.ifile.readahead = true
io.serializations = org.apache.hadoop.io.serializer.WritableSerialization,org.apache.crunch.types.avro.SafeAvroSerialization
io.seqfile.compress.blocksize = 1000000
mapred.jobtracker.taskScheduler = org.apache.hadoop.mapred.JobQueueTaskScheduler
job.end.retry.attempts = 0
ipc.client.connect.max.retries = 10
dfs.namenode.delegation.key.update-interval = 86400000
webinterface.private.actions = false
mapred.tasktracker.indexcache.mb = 10
fs.checkpoint.edits.dir = ${fs.checkpoint.dir}
mapreduce.reduce.input.limit = -1
mapred.mapper.new-api = true
tasktracker.http.threads = 40
dfs.namenode.kerberos.internal.spnego.principal = ${dfs.web.authentication.kerberos.principal}
mapreduce.job.counters.group.name.max = 128
mapred.job.tracker.handler.count = 10
keep.failed.task.files = false
mapred.output.compress = false
hadoop.security.group.mapping = org.apache.hadoop.security.ShellBasedUnixGroupsMapping
dfs.https.client.keystore.resource = ssl-client.xml
mapred.cache.files.filesizes = 4614,4452
mapred.jobtracker.job.history.block.size = 3145728
mapred.skip.reduce.max.skip.groups = 0
dfs.datanode.address = 0.0.0.0:50010
dfs.datanode.max.xcievers = 4096
dfs.datanode.https.address = 0.0.0.0:50475
fs.s3.impl = org.apache.hadoop.fs.s3.S3FileSystem
reads_fastq = /speciesA/scaffolding_2014_0218/reversed_reads/*fastq,/speciesA/original_data_2013_1215/speciesA_200i_40x*fastq,/speciesA/original_data_2013_1215/speciesA_300i_40x*fastq
dfs.datanode.drop.cache.behind.reads = false
mapred.jar = hdfs://biocloud-nn-0:8020/mnt/pd0/hadoop/tmp/mapred/staging/hadoop/.staging/job_201312300132_0420/job.jar
hadoop.tmp.dir = /mnt/ed0/hadoop/tmp
mapred.line.input.format.linespermap = 1
dfs.webhdfs.enabled = false
dfs.datanode.du.reserved = 0
topology.script.number.args = 100
fs.default.name = hdfs://biocloud-nn-0:8020/
dfs.balance.bandwidthPerSec = 1048576
mapred.local.dir.minspacestart = 0
mapred.jobtracker.maxtasks.per.job = -1
mapred.user.jobconf.limit = 5242880
mapred.reduce.max.attempts = 4
mapred.job.tracker = biocloud-nn-0:9101
dfs.namenode.decommission.interval = 30
dfs.name.edits.dir = ${dfs.name.dir}
io.mapfile.bloom.error.rate = 0.005
mapred.tasktracker.expiry.interval = 600000
io.sort.record.percent = 0.05
dfs.safemode.threshold.pct = 0.999f
mapred.job.tracker.persist.jobstatus.active = false
outputpath = /speciesA/scaffolding_2014_0218/FilteredReads
dfs.name.dir = ${hadoop.tmp.dir}/dfs/name
mapreduce.job.acl-modify-job =
fs.checkpoint.period = 3600
io.skip.checksum.errors = false
log_file = /users/jlewi/speciesA/FilterReads.0328_0658.log.txt
dfs.datanode.handler.count = 3
dfs.namenode.decommission.nodes.per.interval = 5
mapred.temp.dir = ${hadoop.tmp.dir}/mapred/temp
mapred.mapoutput.key.class = org.apache.avro.mapred.AvroKey
ipc.client.fallback-to-simple-auth-allowed = false
hadoop.native.lib = true
fs.webhdfs.impl = org.apache.hadoop.hdfs.web.WebHdfsFileSystem
dfs.datanode.ipc.address = 0.0.0.0:50020
mapred.working.dir = hdfs://biocloud-nn-0:8020/user/hadoop
mapred.job.reduce.input.buffer.percent = 0.0


This is Apache Hadoop release 1.2.1
