[ https://issues.apache.org/jira/browse/TEZ-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan reassigned TEZ-4244:
-------------------------------------

    Assignee: Rajesh Balamohan

> Consider using RawLocalFileSystem in LocalDiskFetchedInput
> ----------------------------------------------------------
>
>                 Key: TEZ-4244
>                 URL: https://issues.apache.org/jira/browse/TEZ-4244
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>              Labels: performance
>         Attachments: TEZ-4244.1.patch
>
>
> Using RawLocalFileSystem (LocalFSFileInputStream) should avoid the native FS
> call in seek(); its seek() only checks the (pos < 0) condition.
>
> {noformat}
> "TezTR-348763_0_9_6_172_0" #68186 daemon prio=5 os_prio=0 tid=0x000055d7afbce800 nid=0x3877 runnable [0x00007f645019c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>         at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
>         at java.io.File.exists(File.java:821)
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:646)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:939)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:640)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
>         at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1781)
>         at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.getFileLength(ChecksumFileSystem.java:294)
>         at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:337)
>         - locked <0x00007f9f10196f00> (a org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream)
>         at org.apache.tez.runtime.library.common.shuffle.LocalDiskFetchedInput.getInputStream(LocalDiskFetchedInput.java:73)
>         at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.openIFileReader(UnorderedKVReader.java:226)
>         at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput(UnorderedKVReader.java:212)
>         at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next(UnorderedKVReader.java:125)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:144)
>         at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:386)
>         at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:455)
>         at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:242)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:555)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:111)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:193)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>         at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>         at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
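
The thread dump in the issue shows seek() on the checksum-wrapped stream reaching File.exists() through getFileLength(), while the description says the raw stream's seek() only needs a (pos < 0) check. The following is a simplified, JDK-only model of that difference, not Hadoop's or the patch's code. The class SeekSketch, its methods, and the statCalls counter are hypothetical names introduced for illustration.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical model of the two seek() strategies; not Hadoop code. */
final class SeekSketch {
    /** Counts file-metadata lookups so the two strategies can be compared. */
    static int statCalls = 0;

    /** Stands in for the getFileLength() path seen in the thread dump. */
    static long lengthOf(File f) {
        statCalls++;
        return f.length(); // goes to the file system on every call
    }

    /** Bounded seek: looks up the file length before every seek. */
    static void boundedSeek(FileInputStream in, File f, long pos) throws IOException {
        if (pos > lengthOf(f)) {
            throw new EOFException("Cannot seek after EOF");
        }
        in.getChannel().position(pos);
    }

    /** Raw seek: only rejects negative offsets, no metadata lookup. */
    static void rawSeek(FileInputStream in, long pos) throws IOException {
        if (pos < 0) {
            throw new EOFException("Cannot seek to a negative offset");
        }
        in.getChannel().position(pos);
    }

    /** Returns {metadata lookups with bounded seek, metadata lookups with raw seek}. */
    static int[] demo() {
        try {
            File f = File.createTempFile("seek-sketch", ".bin");
            f.deleteOnExit();
            Files.write(f.toPath(), new byte[1024]);
            long[] offsets = {0L, 128L, 512L};
            try (FileInputStream in = new FileInputStream(f)) {
                statCalls = 0;
                for (long pos : offsets) {
                    boundedSeek(in, f, pos);
                }
                int bounded = statCalls;

                statCalls = 0;
                for (long pos : offsets) {
                    rawSeek(in, pos);
                }
                int raw = statCalls;
                return new int[] {bounded, raw};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("bounded seek metadata lookups: " + counts[0]);
        System.out.println("raw seek metadata lookups: " + counts[1]);
    }
}
```

In this model the bounded variant performs one metadata lookup per seek and the raw variant performs none. The trade-off is that the raw variant no longer rejects seeks past the end of the file, so it only suits callers that already know their offsets are valid.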