[ https://issues.apache.org/jira/browse/TEZ-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan reassigned TEZ-4244:
-------------------------------------

    Assignee: Rajesh Balamohan

> Consider using RawLocalFileSystem in LocalDiskFetchedInput
> ----------------------------------------------------------
>
>                 Key: TEZ-4244
>                 URL: https://issues.apache.org/jira/browse/TEZ-4244
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>              Labels: performance
>         Attachments: TEZ-4244.1.patch
>
>
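> Using RawLocalFileSystem (LocalFSFileInputStream) should avoid the native FS
> call made on seek(): its seek() only checks the (pos < 0) condition, whereas
> ChecksumFileSystem$FSDataBoundedInputStream.seek() calls getFileLength(), which
> ends in a native File.exists() call, as the stack trace below shows.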
>  
> {noformat}
> "TezTR-348763_0_9_6_172_0" #68186 daemon prio=5 os_prio=0 tid=0x000055d7afbce800 nid=0x3877 runnable [0x00007f645019c000]
>    java.lang.Thread.State: RUNNABLE
>       at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>       at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
>       at java.io.File.exists(File.java:821)
>       at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:646)
>       at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:939)
>       at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:640)
>       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
>       at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1781)
>       at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.getFileLength(ChecksumFileSystem.java:294)
>       at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:337)
>       - locked <0x00007f9f10196f00> (a org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream)
>       at org.apache.tez.runtime.library.common.shuffle.LocalDiskFetchedInput.getInputStream(LocalDiskFetchedInput.java:73)
>       at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.openIFileReader(UnorderedKVReader.java:226)
>       at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput(UnorderedKVReader.java:212)
>       at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next(UnorderedKVReader.java:125)
>       at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:144)
>       at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:386)
>       at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:455)
>       at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:242)
>       at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:555)
>       at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:111)
>       at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:193)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>  {noformat}
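
To illustrate the difference described in the issue, here is a minimal, stdlib-only sketch of the two seek strategies. It is not Hadoop or Tez code and it is not the attached patch: the class SeekCostSketch and the methods boundedSeek and rawSeek are hypothetical names, and the statCalls counter merely stands in for the getFileStatus()/File.exists() lookup visible in the stack trace. Each call models one newly opened input, since the real stream would typically look up the length once per stream.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

public class SeekCostSketch {
    /** Counts simulated metadata lookups so the two strategies can be compared. */
    static int statCalls = 0;

    /** Models a bounded seek: looks up the file length before every seek. */
    static void boundedSeek(RandomAccessFile in, File file, long pos) throws IOException {
        statCalls++; // stands in for the getFileStatus()/File.exists() native call
        if (pos > file.length()) {
            throw new EOFException("Cannot seek after EOF");
        }
        in.seek(pos);
    }

    /** Models a raw seek: only rejects negative offsets, with no metadata lookup. */
    static void rawSeek(RandomAccessFile in, long pos) throws IOException {
        if (pos < 0) {
            throw new EOFException("Cannot seek to a negative offset");
        }
        in.seek(pos);
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file to seek within.
        File file = Files.createTempFile("seek-sketch", ".bin").toFile();
        file.deleteOnExit();
        Files.write(file.toPath(), new byte[1024]);

        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            for (int i = 0; i < 100; i++) {
                boundedSeek(in, file, i);
            }
            System.out.println("metadata lookups with bounded seek: " + statCalls);

            statCalls = 0;
            for (int i = 0; i < 100; i++) {
                rawSeek(in, i);
            }
            System.out.println("metadata lookups with raw seek: " + statCalls);
        }
    }
}
```

In Hadoop itself, the unchecksummed file system can be obtained with FileSystem.getLocal(conf).getRaw(); whether LocalDiskFetchedInput should obtain it that way is an assumption here, not something this message confirms.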



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
