[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-05-06 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.4.patch

fixed UT 

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-2366.1.patch, TEZ-2366.2.patch, TEZ-2366.3.patch, 
> TEZ-2366.4.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-05-05 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.3.patch

addressed comments by [~sseth]. 

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-2366.1.patch, TEZ-2366.2.patch, TEZ-2366.3.patch, 
> TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-05-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.2.patch

patch 2
* added check for shufflemetadata being null.

[~hitesh]
the localcontainerlauncher sets the shuffle port to 0
{code}
AuxiliaryServiceHelper.setServiceDataIntoEnv(
ShuffleUtils.SHUFFLE_HANDLER_SERVICE_ID, 
ByteBuffer.allocate(4).putInt(0), localEnv);
{code}

the UT failure does not seem related, ran locally. 

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-2366.1.patch, TEZ-2366.2.patch, TEZ-2366.test.txt, 
> TEZ-2366.wip.1.patch
>
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-05-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.1.patch

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-2366.1.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-04-27 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.wip.1.patch

[~sseth] attaching a patch which checks the port along with the host. one quick 
question though. the mapreduce.shuffle.port is not exposed by yarn. is it fine 
to rely on that conf and its default value? if the patch looks ok. i can add 
the tests.

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>Priority: Critical
> Attachments: TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-04-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2366:

Attachment: TEZ-2366.test.txt

This is what I believe causes this.
Pig is running the MiniCluster with multiple instances of the NodeManager - 
which is good. The moment there's multiple instances however, the hostname on 
all of them matches, and each task attempts doing a local fetch, even though 
the data may have been generated on a different NodeManager (implying a 
different local-dir).

Tez never runs into this, because all tests run with a single NodeManager 
instance.

Two possible fixes - disable local-fetch for MiniCluster, for which I'm 
uploading a temporary patch.
The other is to potentially make use of port information to figure out whether 
to do a local fetch or not.

[~daijy] - please try out this patch.
Alternately, disable local fetch in the pig config after setting up the cluster 
and calling getConfg on the cluster instance.

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>Priority: Critical
> Attachments: TEZ-2366.test.txt
>
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-04-24 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2366:

Priority: Critical  (was: Blocker)

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>Priority: Critical
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-04-24 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2366:

Priority: Blocker  (was: Major)

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>Priority: Blocker
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-04-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated TEZ-2366:

Summary: Pig tez MiniTezCluster unit tests fail intermittently after 
TEZ-2333  (was: Pig tez local mode unit tests fail intermittently after 
TEZ-2333)

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> 
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Daniel Dai
>
> There are around 20 unit tests (out of around 2000) fail intermittently after 
> TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
> of the configured local directories
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
> at 
> org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
> test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
> "true" 
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
>  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does 
> not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)