[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-01-08 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270711#comment-14270711
 ] 

Tsuyoshi OZAWA commented on TEZ-1421:
-

Sure, wait a moment.

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi OZAWA
>Priority: Blocker
>
> I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
> Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
> correctly.
> {quote}
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
>     at org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
>     at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
>     at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
>     at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
>     at org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
> {quote}
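
One way a trace of this shape can arise, consistent with the reporter's guess about setCombinerClass, is a reflection helper being handed a null class. The following is a minimal sketch under that assumption, not the Hadoop or Tez code: the class CombinerInstantiationSketch, its constructor cache, and the newCombiner guard are all hypothetical names used only for illustration.

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a reflection helper; not the real ReflectionUtils.
final class CombinerInstantiationSketch {
  private static final Map<Class<?>, Constructor<?>> CACHE = new ConcurrentHashMap<>();

  // Instantiates cls via its no-arg constructor, wrapping any failure.
  static <T> T newInstance(Class<T> cls) {
    try {
      // ConcurrentHashMap.get(null) throws NullPointerException,
      // which the catch block below wraps in a RuntimeException.
      Constructor<?> ctor = CACHE.get(cls);
      if (ctor == null) {
        ctor = cls.getDeclaredConstructor();
        ctor.setAccessible(true);
        CACHE.put(cls, ctor);
      }
      return cls.cast(ctor.newInstance());
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  // Assumed guard: fail early with a message that names the likely cause.
  static <T> T newCombiner(Class<T> combinerClass) {
    if (combinerClass == null) {
      throw new IllegalStateException(
          "No combiner class configured; was setCombinerClass applied?");
    }
    return newInstance(combinerClass);
  }

  // Placeholder combiner type for the demonstration.
  static final class NoOpCombiner { }

  public static void main(String[] args) {
    System.out.println(newCombiner(NoOpCombiner.class).getClass().getSimpleName());
    try {
      newInstance(null);
    } catch (RuntimeException e) {
      System.out.println("wrapped cause: " + e.getCause());
    }
  }
}
```

A guard like newCombiner does not fix a lost configuration value, but it turns an opaque NPE into a message that points at the combiner setting.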



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1931) Publish tez version info to Timeline

2015-01-08 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270637#comment-14270637
 ] 

Jonathan Eagles commented on TEZ-1931:
--

This will be good to go into 0.6.0, [~hitesh]

> Publish tez version info to Timeline
> 
>
> Key: TEZ-1931
> URL: https://issues.apache.org/jira/browse/TEZ-1931
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Critical
>
> We are not publishing any version info to Timeline. This will be useful to 
> compare different dags/apps over time and also to catch issues if needed. 





[jira] [Updated] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-08 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1923:
--
Fix Version/s: 0.7.0

> FetcherOrderedGrouped gets into infinite loop due to memory pressure
> 
>
> Key: TEZ-1923
> URL: https://issues.apache.org/jira/browse/TEZ-1923
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 0.7.0
>
> Attachments: TEZ-1923.1.patch, TEZ-1923.2.patch, TEZ-1923.3.patch, 
> TEZ-1923.4.patch
>
>
> - Ran a comparatively large job (temp table creation) at 10 TB scale.
> - Turned on intermediate mem-to-mem 
> (tez.runtime.shuffle.memory-to-memory.enable=true and 
> tez.runtime.shuffle.memory-to-memory.segments=4)
> - Some reducers get lots of data and quickly get into an infinite loop
> {code}
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
> 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> {code}
> Additional debug/patch statements revealed that InMemoryMerge is not invoked 
> appropriately and does not release memory back for fetchers to proceed. 
> Example debug/patch messages are given below:
> {code}
> syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
> [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
> Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
> mergeThreshold=708669632  <<=== InMemoryMerge would be started in this case 
> as commitMemory >= mergeThreshold
> syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
> [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
> Patch..usedMemory=1273349784, memoryLimit=1073741824, commitMemory=347296632, 
> mergeThreshold=708669632 <<=== InMemoryMerge would *NOT* be started in this 
> case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
> memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
> released. InMemoryMerge will not kick in and not release memory.
> syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:53,163 INFO 
> [fetcher [Map_1] #1] orderedgrouped.MergeManager: 
> Patch..usedMemory=1191994052, memoryLimit=1073741824, commitMemory=523155206, 
> mergeThreshold=708669632 <<=== InMemoryMerge would *NOT* be started in this 
> case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
> memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
> released.  InMemoryMerge will not kick in and not release memory.
> {code}
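
The two conditions annotated in these log lines can be checked side by side. A minimal sketch, assuming a simplified model in which a merge starts only when commitMemory >= mergeThreshold and fetchers wait whenever usedMemory > memoryLimit; the class and method names are hypothetical, and the sketch ignores whether a merge is already in progress.

```java
// Simplified model of the two checks; names are illustrative, not Tez APIs.
final class MergeTriggerSketch {
  // A merge is started only once committed memory reaches the threshold.
  static boolean mergeWouldStart(long commitMemory, long mergeThreshold) {
    return commitMemory >= mergeThreshold;
  }

  // Fetchers are told to wait while used memory is above the limit.
  static boolean fetchersMustWait(long usedMemory, long memoryLimit) {
    return usedMemory > memoryLimit;
  }

  // Stuck: fetchers wait for memory, but nothing will start a merge to free it.
  static boolean fetchersStuck(long usedMemory, long memoryLimit,
                               long commitMemory, long mergeThreshold) {
    return fetchersMustWait(usedMemory, memoryLimit)
        && !mergeWouldStart(commitMemory, mergeThreshold);
  }

  public static void main(String[] args) {
    long limit = 1073741824L;
    long threshold = 708669632L;
    // The three samples from the debug output above.
    System.out.println(fetchersStuck(1551867234L, limit, 883028388L, threshold));
    System.out.println(fetchersStuck(1273349784L, limit, 347296632L, threshold));
    System.out.println(fetchersStuck(1191994052L, limit, 523155206L, threshold));
  }
}
```

Under this model the first sample is not stuck because the threshold is reached, while the second and third are over the memory limit with no merge triggered.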
> In MergeManager, in memory merging is invoked under the following condition
> {code}
> if (!inMemoryMerg

[jira] [Updated] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-08 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1923:
--
Attachment: TEZ-1923.4.patch

Right; Incorporated it in the latest patch. Will check in to master asap. 

> FetcherOrderedGrouped gets into infinite loop due to memory pressure
> 
>
> Key: TEZ-1923
> URL: https://issues.apache.org/jira/browse/TEZ-1923
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1923.1.patch, TEZ-1923.2.patch, TEZ-1923.3.patch, 
> TEZ-1923.4.patch
>
>

[jira] [Commented] (TEZ-1932) Add Prakash Ramachandran to team list

2015-01-08 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270516#comment-14270516
 ] 

Rajesh Balamohan commented on TEZ-1932:
---

+1

> Add Prakash Ramachandran to team list
> -
>
> Key: TEZ-1932
> URL: https://issues.apache.org/jira/browse/TEZ-1932
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
>Priority: Minor
> Attachments: TEZ-1932.1.patch
>
>






[jira] [Updated] (TEZ-1932) Add Prakash Ramachandran to team list

2015-01-08 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1932:
--
Attachment: TEZ-1932.1.patch

[~rajesh.balamohan] can you review?

> Add Prakash Ramachandran to team list
> -
>
> Key: TEZ-1932
> URL: https://issues.apache.org/jira/browse/TEZ-1932
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
>Priority: Minor
> Attachments: TEZ-1932.1.patch
>
>






[jira] [Created] (TEZ-1932) Add Prakash Ramachandran to team list

2015-01-08 Thread Prakash Ramachandran (JIRA)
Prakash Ramachandran created TEZ-1932:
-

 Summary: Add Prakash Ramachandran to team list
 Key: TEZ-1932
 URL: https://issues.apache.org/jira/browse/TEZ-1932
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
Priority: Minor








[jira] [Commented] (TEZ-1931) Publish tez version info to Timeline

2015-01-08 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270392#comment-14270392
 ] 

Hitesh Shah commented on TEZ-1931:
--

[~jeagles] I should have a patch soon. Are you ok with putting this into 0.6.0?

> Publish tez version info to Timeline
> 
>
> Key: TEZ-1931
> URL: https://issues.apache.org/jira/browse/TEZ-1931
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Critical
>
> We are not publishing any version info to Timeline. This will be useful to 
> compare different dags/apps over time and also to catch issues if needed. 





[jira] [Created] (TEZ-1931) Publish tez version info to Timeline

2015-01-08 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1931:


 Summary: Publish tez version info to Timeline
 Key: TEZ-1931
 URL: https://issues.apache.org/jira/browse/TEZ-1931
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Critical


We are not publishing any version info to Timeline. This will be useful to 
compare different dags/apps over time and also to catch issues if needed. 





[jira] [Commented] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270281#comment-14270281
 ] 

Siddharth Seth commented on TEZ-1923:
-

+1. Looks good. Thanks [~rajesh.balamohan]
Minor
{code}+  LOG.info("Starting inMemoryMerger's merge since commitMemory=" +
+  commitMemory + " > mergeThreshold=" + mergeThreshold +
+  ". Current usedMemory=" + usedMemory);
{code}
This log line can be misleading, since startMemToDiskMerge may not start the 
merge if another one is already running. The log statement should be placed 
after the condition inside startMemToDiskMerge.
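
A minimal sketch of that placement, assuming startMemToDiskMerge guards on an "already in progress" flag. The body of the real method is not shown in this thread, so the class, the flag, and the parameters below are hypothetical; log lines are collected in a list rather than sent to a logger so the behaviour is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a merge starter; not the Tez MergeManager.
final class MergeStartLoggingSketch {
  private boolean mergeInProgress;
  final List<String> log = new ArrayList<>();

  // Returns true only if this call actually started a merge.
  synchronized boolean startMemToDiskMerge(long commitMemory, long mergeThreshold,
                                           long usedMemory) {
    if (mergeInProgress) {
      return false; // nothing started, so nothing is logged
    }
    mergeInProgress = true;
    // Logged after the guard, so the message is emitted only for a real start.
    log.add("Starting inMemoryMerger's merge since commitMemory=" + commitMemory
        + " > mergeThreshold=" + mergeThreshold
        + ". Current usedMemory=" + usedMemory);
    return true;
  }

  synchronized void mergeFinished() {
    mergeInProgress = false;
  }

  public static void main(String[] args) {
    MergeStartLoggingSketch m = new MergeStartLoggingSketch();
    m.startMemToDiskMerge(883028388L, 708669632L, 1551867234L);
    m.startMemToDiskMerge(883028388L, 708669632L, 1551867234L);
    System.out.println(m.log.size() + " log line(s)");
  }
}
```

With the log statement at the call site instead, the second call would also have logged "Starting..." although no merge was started.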

> FetcherOrderedGrouped gets into infinite loop due to memory pressure
> 
>
> Key: TEZ-1923
> URL: https://issues.apache.org/jira/browse/TEZ-1923
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1923.1.patch, TEZ-1923.2.patch, TEZ-1923.3.patch
>
>

[jira] [Updated] (TEZ-1274) Remove Key/Value type checks in IFile

2015-01-08 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1274:
--
Fix Version/s: 0.7.0

> Remove Key/Value type checks in IFile
> -
>
> Key: TEZ-1274
> URL: https://issues.apache.org/jira/browse/TEZ-1274
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Rajesh Balamohan
> Fix For: 0.7.0
>
> Attachments: TEZ-1274.1.patch
>
>
> We check key and value types for each record - this should be removed from 
> the tight loop. Maybe an assertion.
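
To illustrate the trade-off, here is a minimal sketch contrasting a per-record explicit check with an assertion. It assumes a writer that knows its key and value classes up front; TypeCheckSketch and its methods are hypothetical and are not the IFile API.

```java
import java.io.IOException;

// Hypothetical record writer used only to contrast the two styles of check.
final class TypeCheckSketch {
  private final Class<?> keyClass;
  private final Class<?> valueClass;
  private int records;

  TypeCheckSketch(Class<?> keyClass, Class<?> valueClass) {
    this.keyClass = keyClass;
    this.valueClass = valueClass;
  }

  // Explicit check: evaluated for every record, even in production runs.
  void appendChecked(Object key, Object value) throws IOException {
    if (key.getClass() != keyClass) {
      throw new IOException("wrong key class: " + key.getClass().getName());
    }
    if (value.getClass() != valueClass) {
      throw new IOException("wrong value class: " + value.getClass().getName());
    }
    records++;
  }

  // Assertion: only evaluated when the JVM runs with -ea, otherwise skipped.
  void appendAsserted(Object key, Object value) {
    assert key.getClass() == keyClass : "wrong key class";
    assert value.getClass() == valueClass : "wrong value class";
    records++;
  }

  int records() {
    return records;
  }

  public static void main(String[] args) throws IOException {
    TypeCheckSketch w = new TypeCheckSketch(String.class, Integer.class);
    w.appendChecked("a", 1);
    w.appendAsserted("b", 2);
    System.out.println(w.records() + " records appended");
  }
}
```

The assertion variant keeps the check available in test runs while removing its cost from the tight loop when assertions are disabled.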





[jira] [Updated] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-08 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1923:
--
Attachment: TEZ-1923.3.patch

Addressing review comments.  

> FetcherOrderedGrouped gets into infinite loop due to memory pressure
> 
>
> Key: TEZ-1923
> URL: https://issues.apache.org/jira/browse/TEZ-1923
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1923.1.patch, TEZ-1923.2.patch, TEZ-1923.3.patch
>
>
> - Ran a comparatively large job (temp table creation) at 10 TB scale.
> - Turned on intermediate mem-to-mem 
> (tez.runtime.shuffle.memory-to-memory.enable=true and 
> tez.runtime.shuffle.memory-to-memory.segments=4)
> - Some reducers get lots of data and quickly get into an infinite loop
> {code}
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
> 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> {code}
> Additional debug/patch statements revealed that InMemoryMerge is not invoked 
> appropriately and does not release memory back for fetchers to proceed. 
> Example debug/patch messages are given below:
> {code}
> syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
> [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
> Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
> mergeThreshold=708669632  <<=== InMemoryMerge would be started in this case 
> as commitMemory >= mergeThreshold
> syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
> [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
> Patch..usedMemory=1273349784, memoryLimit=1073741824, commitMemory=347296632, 
> mergeThreshold=708669632 <<=== InMemoryMerge would *NOT* be started in this 
> case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
> memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
> released. InMemoryMerge will not kick in and not release memory.
> syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:53,163 INFO 
> [fetcher [Map_1] #1] orderedgrouped.MergeManager: 
> Patch..usedMemory=1191994052, memoryLimit=1073741824, commitMemory=523155206, 
> mergeThreshold=708669632 <<=== InMemoryMerge would *NOT* be started in this 
> case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
> memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
> released.  InMemoryMerge will not kick in and not release memory.
> {code}
> In MergeManager, in memory merging is invoked under the following condition
> {code}
> if (!inMemoryMerger.isInProgre

[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-01-08 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270181#comment-14270181
 ] 

Jonathan Eagles commented on TEZ-1421:
--

[~ozawa], can you verify that this still exists? I have tried to reproduce it 
using the setup you described but am unable to hit this NPE. If you are still 
able to reproduce it, please specify the steps needed for setup.

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi OZAWA
>Priority: Blocker
>





[jira] [Updated] (TEZ-1904) Fix findbugs warnings in tez-runtime-library

2015-01-08 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1904:

Attachment: TEZ-1904.3.txt

Updated to fix some more inconsistent_sync warnings.

> Fix findbugs warnings in tez-runtime-library
> 
>
> Key: TEZ-1904
> URL: https://issues.apache.org/jira/browse/TEZ-1904
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Siddharth Seth
> Attachments: TEZ-1904.1.txt, TEZ-1904.2.txt, TEZ-1904.3.txt
>
>
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html





[jira] [Commented] (TEZ-1929) AM intermittently sending kill signal to running task in heartbeat

2015-01-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269943#comment-14269943
 ] 

Siddharth Seth commented on TEZ-1929:
-

[~rajesh.balamohan] - do you have the AM logs as well? This could be a result 
of pre-emption - either because the wrong task is running, or because YARN 
decided the application is over its resource limit.

> AM intermittently sending kill signal to running task in heartbeat
> --
>
> Key: TEZ-1929
> URL: https://issues.apache.org/jira/browse/TEZ-1929
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-08 at 2.09.11 PM.png, Screen Shot 
> 2015-01-08 at 2.28.04 PM.png, tasklog.txt
>
>
> Observed this behavior 3 or 4 times
> - Ran a hive query with tez (query_17 at 10 TB scale)
> - Occasionally, Map_7 task will get into failed state in the middle of 
> fetching data from other sources (only one task is available in Map_7).  
> {code}
> 2015-01-08 00:19:10,289 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: 
> Completed fetch for attempt: InputAttemptIdentifier 
> [inputIdentifier=InputIdentifier [inputIndex=0], attemptNumber=0, 
> pathComponent=attempt_142126204_0233_1_06_00_0_10003] to MEMORY, 
> CompressedSize=6757, DecompressedSize=16490,EndTime=1420705150289, 
> TimeTaken=5, Rate=1.29 MB/s
> 2015-01-08 00:19:10,290 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: All 
> inputs fetched for input vertex : Map 6
> 2015-01-08 00:19:10,290 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: copy(0 
> of 1. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.01 MB/s)
> 2015-01-08 00:19:10,290 INFO [ShuffleRunner [Map_6]] impl.ShuffleManager: 
> Shutting down FetchScheduler, Was Interrupted: false
> 2015-01-08 00:19:10,290 INFO [ShuffleRunner [Map_6]] impl.ShuffleManager: 
> Scheduler thread completed
> 2015-01-08 00:19:41,986 INFO [TaskHeartbeatThread] task.TaskReporter: 
> Received should die response from AM
> 2015-01-08 00:19:41,986 INFO [TaskHeartbeatThread] task.TaskReporter: Asked 
> to die via task heartbeat
> 2015-01-08 00:19:41,987 INFO [main] task.TezTaskRunner: Interrupted while 
> waiting for task to complete. Interrupting task
> 2015-01-08 00:19:41,987 INFO [main] task.TezTaskRunner: Shutdown requested... 
> returning
> 2015-01-08 00:19:41,987 INFO [main] task.TezChild: Got a shouldDie 
> notification via hearbeats. Shutting down
> 2015-01-08 00:19:41,990 ERROR [TezChild] tez.TezProcessor: java.lang.InterruptedException
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>   at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.awaitCondition(InputReadyTracker.java:120)
>   at org.apache.tez.runtime.InputReadyTracker.waitForAnyInputReady(InputReadyTracker.java:83)
>   at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAnyInputReady(TezProcessorContextImpl.java:106)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:153)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:328)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> {code}
> From the initial look, it appears that TaskAttemptListenerImpTezDag.heartbeat 
> is unable to identify the containerId from registeredContainers.  Need to 
> verify this.
> I will attach the sample task log and the tez-ui details.





[jira] [Created] (TEZ-1930) Merger (MemToDiskMerge) can exceed memory limits in a corner case

2015-01-08 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1930:
---

 Summary: Merger (MemToDiskMerge) can exceed memory limits in a 
corner case
 Key: TEZ-1930
 URL: https://issues.apache.org/jira/browse/TEZ-1930
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth


Merger.reserve allows one segment to go over the allocated memory. If the 
segment size is large, this can be problematic.
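
To make the corner case concrete, here is a minimal sketch of a reservation policy that admits a segment whenever usage is still below the limit, next to a stricter one. It is an assumption about the general shape of the problem, not the actual Merger.reserve code; SegmentBudget, reserveLenient and reserveStrict are hypothetical names.

```java
// Hypothetical illustration only; none of these names are Tez classes.
final class SegmentBudget {
  private final long memoryLimit;
  private long usedMemory; // guarded by "this"

  SegmentBudget(long memoryLimit) {
    this.memoryLimit = memoryLimit;
  }

  /** Lenient policy: admit while usage is below the limit, so the last
   *  admitted segment can push usage past the limit by its full size. */
  synchronized boolean reserveLenient(long segmentSize) {
    if (usedMemory >= memoryLimit) {
      return false;
    }
    usedMemory += segmentSize;
    return true;
  }

  /** Strict policy: admit only if the whole segment fits under the limit. */
  synchronized boolean reserveStrict(long segmentSize) {
    if (usedMemory + segmentSize > memoryLimit) {
      return false;
    }
    usedMemory += segmentSize;
    return true;
  }

  synchronized long used() {
    return usedMemory;
  }
}

final class SegmentBudgetDemo {
  public static void main(String[] args) {
    SegmentBudget lenient = new SegmentBudget(100);
    lenient.reserveLenient(90);
    lenient.reserveLenient(500); // admitted because 90 < 100
    System.out.println("lenient used=" + lenient.used()); // prints lenient used=590

    SegmentBudget strict = new SegmentBudget(100);
    strict.reserveStrict(90);
    strict.reserveStrict(500); // rejected because 590 > 100
    System.out.println("strict used=" + strict.used()); // prints strict used=90
  }
}
```

With a small segment the overshoot under the lenient policy is minor; with a large one, usage can end up several times the limit.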





[jira] [Commented] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269939#comment-14269939
 ] 

Siddharth Seth commented on TEZ-1923:
-

{code}
+  if (usedMemory > memoryLimit) {
+    LOG.info("Starting inMemoryMerger's merge since usedMemory=" +
+        memoryLimit + " > memoryLimit=" + memoryLimit +
+        ". commitMemory=" + commitMemory + ", mergeThreshold=" + mergeThreshold);
+    startMemToDiskMerge();
+  }
{code}
This will, at best, attempt to start the memToDiskMerger - there's no guarantee 
that it'll actually run, since one may already be in progress. It ends up not 
waiting for the MemToMemMerger to complete - which would free up some memory 
and potentially trigger another merge based on thresholds. The usedMemory at 
this point will be determined by a race between the current thread and the 
MemToMemMerger thread (whether the unconditional reserve has been done yet or 
not). Meanwhile, Fetchers block in any case, since memory isn't available. I 
think it's better to leave this section of the patch out, to be fixed in the 
MemToMem merger jiras.
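
As an illustration of why a start request gives no guarantee that a merge actually runs, here is a minimal sketch of a starter that ignores requests while a merge is already in progress. MergeStarterSketch and its methods are hypothetical names, and the real merge thread logic is assumed rather than taken from the patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the Tez merge thread implementation.
final class MergeStarterSketch {
  private final AtomicBoolean inProgress = new AtomicBoolean(false);
  private final AtomicInteger mergesStarted = new AtomicInteger();

  /** Returns false, doing nothing, when a merge is already running. */
  boolean startMerge() {
    if (!inProgress.compareAndSet(false, true)) {
      return false;
    }
    mergesStarted.incrementAndGet();
    return true;
  }

  /** Called when the running merge completes. */
  void mergeFinished() {
    inProgress.set(false);
  }

  int mergesStarted() {
    return mergesStarted.get();
  }
}
```

A caller that needs memory to be freed would have to wait for completion rather than rely on the return of the start call.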

{code}
+  if ((usedMemory + mergeOutputSize) > memoryLimit) {
+    LOG.info("Not enough memory to carry out mem-to-mem merging. usedMemory=" + usedMemory +
+        " > memoryLimit=" + memoryLimit);
+    return;
+  }
{code}
usedMemory may not be visible correctly - since it isn't inside the main 
MergeManager lock. This could also be part of the MemToMemMerger fixes.
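
A minimal sketch of the visibility point, assuming the counter is guarded by a single lock: the check and the reservation happen under the same monitor that writers use, so the value read is current. MemToMemGate is a hypothetical class; the actual MergeManager locking is not shown in this thread.

```java
// Hypothetical illustration only; the real MergeManager fields and locks may differ.
final class MemToMemGate {
  private final long memoryLimit;
  private long usedMemory; // guarded by "this"

  MemToMemGate(long memoryLimit) {
    this.memoryLimit = memoryLimit;
  }

  synchronized void reserve(long bytes) {
    usedMemory += bytes;
  }

  synchronized void release(long bytes) {
    usedMemory -= bytes;
  }

  /** Checks and reserves atomically, under the same lock as reserve/release. */
  synchronized boolean tryReserveForMerge(long mergeOutputSize) {
    if (usedMemory + mergeOutputSize > memoryLimit) {
      return false;
    }
    usedMemory += mergeOutputSize;
    return true;
  }

  synchronized long used() {
    return usedMemory;
  }
}
```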

{code}merger.waitForShuffleToMergeMemory();{code}
Would this be a problem in terms of connection timeouts, since this wait 
happens while the connection is established? This could be in the run() method 
instead, similar to merger.waitForInMemoryMerge().

> FetcherOrderedGrouped gets into infinite loop due to memory pressure
> 
>
> Key: TEZ-1923
> URL: https://issues.apache.org/jira/browse/TEZ-1923
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1923.1.patch, TEZ-1923.2.patch
>
>
> - Ran a comparatively large job (temp table creation) at 10 TB scale.
> - Turned on intermediate mem-to-mem 
> (tez.runtime.shuffle.memory-to-memory.enable=true and 
> tez.runtime.shuffle.memory-to-memory.segments=4)
> - Some reducers get lots of data and quickly get into an infinite loop
> {code}
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
> 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
> 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
> orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
> 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
> url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
>  sent hash and receievd reply 0 ms
> 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
> orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
> Status.WAIT ...
> {code}
> Additional debug/patch statements revealed that InMemoryMerge is not invoked 
> appropriately and not 

[jira] [Commented] (TEZ-1274) Remove Key/Value type checks in IFile

2015-01-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269872#comment-14269872
 ] 

Siddharth Seth commented on TEZ-1274:
-

+1. Looks good. Minor: please remove the unused constants (WRONG_KEY_CLASS 
etc.) before commit.

> Remove Key/Value type checks in IFile
> -
>
> Key: TEZ-1274
> URL: https://issues.apache.org/jira/browse/TEZ-1274
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1274.1.patch
>
>
> We check key and value types for each record - this should be removed from 
> the tight loop. Maybe an assertion.





[jira] [Updated] (TEZ-1904) Fix findbugs warnings in tez-runtime-library

2015-01-08 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1904:

Attachment: TEZ-1904.2.txt

Thanks for taking a look. Updated patch with comments addressed.

> Fix findbugs warnings in tez-runtime-library
> 
>
> Key: TEZ-1904
> URL: https://issues.apache.org/jira/browse/TEZ-1904
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Siddharth Seth
> Attachments: TEZ-1904.1.txt, TEZ-1904.2.txt
>
>
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html





[jira] [Commented] (TEZ-1904) Fix findbugs warnings in tez-runtime-library

2015-01-08 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269485#comment-14269485
 ] 

Rajesh Balamohan commented on TEZ-1904:
---

Minor comments:
- In PipelinedSorter, comparator isn't used in SpanMerger's constructor.  
Should we remove it?
- In SecureShuffleUtils, toHex() is not used anywhere.  Should we remove it, if 
relevant?
- IFileInputStream.getChecksum() is not used anywhere. Should we remove it, if 
relevant?


> Fix findbugs warnings in tez-runtime-library
> 
>
> Key: TEZ-1904
> URL: https://issues.apache.org/jira/browse/TEZ-1904
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Siddharth Seth
> Attachments: TEZ-1904.1.txt
>
>
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html





[jira] [Updated] (TEZ-1274) Remove Key/Value type checks in IFile

2015-01-08 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1274:
--
Attachment: TEZ-1274.1.patch

[~sseth] Please review when you find time.

Removed the unwanted checks.  It is left to the caller to ensure that proper 
type checks and length checks are done (since it's all in Tez code, removing 
these checks should be fine).  We also don't need the length checks: if 
negative key/value lengths are passed, the output stream would throw 
IndexOutOfBoundsException anyway.
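
For reference, the "maybe an assertion" option from the issue description could look roughly like the sketch below: the type checks become Java assert statements, which are skipped unless the JVM runs with -ea. RecordAppender and its one-byte length prefix are hypothetical and are not the IFile writer or its format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the IFile writer or its on-disk format.
final class RecordAppender {
  private final Class<?> keyClass;
  private final Class<?> valueClass;
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  RecordAppender(Class<?> keyClass, Class<?> valueClass) {
    this.keyClass = keyClass;
    this.valueClass = valueClass;
  }

  void append(Object key, Object value) {
    // Evaluated only when assertions are enabled (-ea); skipped otherwise.
    assert key.getClass() == keyClass : "wrong key class: " + key.getClass();
    assert value.getClass() == valueClass : "wrong value class: " + value.getClass();
    writeField(key);
    writeField(value);
  }

  private void writeField(Object field) {
    byte[] bytes = field.toString().getBytes(StandardCharsets.UTF_8);
    out.write(bytes.length); // one-byte length prefix, for illustration only
    out.write(bytes, 0, bytes.length);
  }

  int bytesWritten() {
    return out.size();
  }
}
```

This keeps the checks available for test runs while leaving the per-record path free of them in production.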

> Remove Key/Value type checks in IFile
> -
>
> Key: TEZ-1274
> URL: https://issues.apache.org/jira/browse/TEZ-1274
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1274.1.patch
>
>
> We check key and value types for each record - this should be removed from 
> the tight loop. Maybe an assertion.





[jira] [Updated] (TEZ-1929) AM intermittently sending kill signal to running task in heartbeat

2015-01-08 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1929:
--
Attachment: tasklog.txt
Screen Shot 2015-01-08 at 2.09.11 PM.png
Screen Shot 2015-01-08 at 2.28.04 PM.png

> AM intermittently sending kill signal to running task in heartbeat
> --
>
> Key: TEZ-1929
> URL: https://issues.apache.org/jira/browse/TEZ-1929
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-08 at 2.09.11 PM.png, Screen Shot 
> 2015-01-08 at 2.28.04 PM.png, tasklog.txt
>
>
> Observed this behavior 3 or 4 times
> - Ran a hive query with tez (query_17 at 10 TB scale)
> - Occasionally, Map_7 task will get into failed state in the middle of 
> fetching data from other sources (only one task is available in Map_7).  
> {code}
> 2015-01-08 00:19:10,289 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: 
> Completed fetch for attempt: InputAttemptIdentifier 
> [inputIdentifier=InputIdentifier [inputIndex=0], attemptNumber=0, 
> pathComponent=attempt_142126204_0233_1_06_00_0_10003] to MEMORY, 
> CompressedSize=6757, DecompressedSize=16490,EndTime=1420705150289, 
> TimeTaken=5, Rate=1.29 MB/s
> 2015-01-08 00:19:10,290 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: All 
> inputs fetched for input vertex : Map 6
> 2015-01-08 00:19:10,290 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: copy(0 
> of 1. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.01 MB/s)
> 2015-01-08 00:19:10,290 INFO [ShuffleRunner [Map_6]] impl.ShuffleManager: 
> Shutting down FetchScheduler, Was Interrupted: false
> 2015-01-08 00:19:10,290 INFO [ShuffleRunner [Map_6]] impl.ShuffleManager: 
> Scheduler thread completed
> 2015-01-08 00:19:41,986 INFO [TaskHeartbeatThread] task.TaskReporter: 
> Received should die response from AM
> 2015-01-08 00:19:41,986 INFO [TaskHeartbeatThread] task.TaskReporter: Asked 
> to die via task heartbeat
> 2015-01-08 00:19:41,987 INFO [main] task.TezTaskRunner: Interrupted while 
> waiting for task to complete. Interrupting task
> 2015-01-08 00:19:41,987 INFO [main] task.TezTaskRunner: Shutdown requested... 
> returning
> 2015-01-08 00:19:41,987 INFO [main] task.TezChild: Got a shouldDie 
> notification via hearbeats. Shutting down
> 2015-01-08 00:19:41,990 ERROR [TezChild] tez.TezProcessor: 
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>   at 
> org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.awaitCondition(InputReadyTracker.java:120)
>   at 
> org.apache.tez.runtime.InputReadyTracker.waitForAnyInputReady(InputReadyTracker.java:83)
>   at 
> org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAnyInputReady(TezProcessorContextImpl.java:106)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:328)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> {code}
> From the initial look, it appears that TaskAttemptListenerImpTezDag.heartbeat 
> is unable to identify the containerId from registeredContainers.  Need to 
> verify this.
> I will attach the sample task log and the tez-ui details.





[jira] [Created] (TEZ-1929) AM intermittently sending kill signal to running task in heartbeat

2015-01-08 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1929:
-

 Summary: AM intermittently sending kill signal to running task in 
heartbeat
 Key: TEZ-1929
 URL: https://issues.apache.org/jira/browse/TEZ-1929
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


Observed this behavior 3 or 4 times

- Ran a hive query with tez (query_17 at 10 TB scale)
- Occasionally, Map_7 task will get into failed state in the middle of fetching 
data from other sources (only one task is available in Map_7).  

{code}
2015-01-08 00:19:10,289 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: 
Completed fetch for attempt: InputAttemptIdentifier 
[inputIdentifier=InputIdentifier [inputIndex=0], attemptNumber=0, 
pathComponent=attempt_142126204_0233_1_06_00_0_10003] to MEMORY, 
CompressedSize=6757, DecompressedSize=16490,EndTime=1420705150289, TimeTaken=5, 
Rate=1.29 MB/s
2015-01-08 00:19:10,290 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: All 
inputs fetched for input vertex : Map 6
2015-01-08 00:19:10,290 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: copy(0 
of 1. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.01 MB/s)
2015-01-08 00:19:10,290 INFO [ShuffleRunner [Map_6]] impl.ShuffleManager: 
Shutting down FetchScheduler, Was Interrupted: false
2015-01-08 00:19:10,290 INFO [ShuffleRunner [Map_6]] impl.ShuffleManager: 
Scheduler thread completed
2015-01-08 00:19:41,986 INFO [TaskHeartbeatThread] task.TaskReporter: Received 
should die response from AM
2015-01-08 00:19:41,986 INFO [TaskHeartbeatThread] task.TaskReporter: Asked to 
die via task heartbeat
2015-01-08 00:19:41,987 INFO [main] task.TezTaskRunner: Interrupted while 
waiting for task to complete. Interrupting task
2015-01-08 00:19:41,987 INFO [main] task.TezTaskRunner: Shutdown requested... 
returning
2015-01-08 00:19:41,987 INFO [main] task.TezChild: Got a shouldDie notification 
via hearbeats. Shutting down
2015-01-08 00:19:41,990 ERROR [TezChild] tez.TezProcessor: 
java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
at 
org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.awaitCondition(InputReadyTracker.java:120)
at 
org.apache.tez.runtime.InputReadyTracker.waitForAnyInputReady(InputReadyTracker.java:83)
at 
org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAnyInputReady(TezProcessorContextImpl.java:106)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:153)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:328)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
{code}

From the initial look, it appears that TaskAttemptListenerImpTezDag.heartbeat 
is unable to identify the containerId from registeredContainers.  Need to 
verify this.

I will attach the sample task log and the tez-ui details.
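
The suspicion above about registeredContainers is still unverified. The sketch below only illustrates that kind of failure mode under an assumption: a heartbeat handler that answers "should die" whenever the container id is missing from its registry. HeartbeatRegistrySketch and its methods are hypothetical and do not reproduce TaskAttemptListenerImpTezDag.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the Tez AM heartbeat implementation.
final class HeartbeatRegistrySketch {
  private final Set<String> registeredContainers = ConcurrentHashMap.newKeySet();

  void register(String containerId) {
    registeredContainers.add(containerId);
  }

  void unregister(String containerId) {
    registeredContainers.remove(containerId);
  }

  /** Returns true when the caller is told to shut down ("should die"). */
  boolean heartbeat(String containerId) {
    return !registeredContainers.contains(containerId);
  }
}
```

Under this assumption, a container that is dropped from the registry while its task is still running would receive a kill response on its next heartbeat.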





[jira] [Resolved] (TEZ-1573) Exception from InputInitializer and VertexManagerPlugin is not propogated to client

2015-01-08 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang resolved TEZ-1573.
-
Resolution: Won't Fix

Fixed in other jiras.

> Exception from InputInitializer and VertexManagerPlugin is not propogated to 
> client
> ---
>
> Key: TEZ-1573
> URL: https://issues.apache.org/jira/browse/TEZ-1573
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>






[jira] [Commented] (TEZ-1573) Exception from InputInitializer and VertexManagerPlugin is not propogated to client

2015-01-08 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268966#comment-14268966
 ] 

Jeff Zhang commented on TEZ-1573:
-

It has been resolved in TEZ-1703 & TEZ-1267. 



> Exception from InputInitializer and VertexManagerPlugin is not propogated to 
> client
> ---
>
> Key: TEZ-1573
> URL: https://issues.apache.org/jira/browse/TEZ-1573
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>



