[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3655:

Attachment: YARN-3655.004.patch

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, all the other applications don't 
> have any chance to assign a new container on this node, unless the 
> application which reserves the node assigns a new container on this node or 
> releases the reserved container on this node.
> The problem is that if an application tries to call assignReservedContainer and
> fails to get a new container due to the maxAMShare limitation, it will block
> all other applications from using the nodes it has reserved. If all other
> running applications can't release their AM containers because they are
> blocked by these reserved containers, a livelock situation can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential
> livelock:
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM
> container on the node due to the maxAMShare limitation and the node is
> reserved by the application.
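
For illustration, one possible shape of that unreserve-on-rejection check is sketched below. This is a sketch only, not the actual YARN-3655 patch; helper names such as node.getReservedContainer() and unreserve(...) are assumed here.

{code}
// Sketch (assumed helper names) of extending the maxAMShare rejection path in
// FSAppAttempt#assignContainer so a blocked AM gives up its own reservation.
if (ask.isEmpty() || !getQueue().canRunAppAM(
    ask.get(0).getCapability())) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Skipping allocation because maxAMShare limit would " +
        "be exceeded");
  }
  // If this application holds the reservation on this node, drop it so that
  // other applications get a chance to allocate containers on the node.
  RMContainer reservedContainer = node.getReservedContainer();
  if (reservedContainer != null &&
      reservedContainer.getApplicationAttemptId().equals(
          getApplicationAttemptId())) {
    unreserve(reservedContainer.getReservedPriority(), node);
  }
  return Resources.none();
}
{code}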



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3655:

Attachment: (was: YARN-3655.004.patch)

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, all the other applications don't 
> have any chance to assign a new container on this node, unless the 
> application which reserves the node assigns a new container on this node or 
> releases the reserved container on this node.
> The problem is that if an application tries to call assignReservedContainer and
> fails to get a new container due to the maxAMShare limitation, it will block
> all other applications from using the nodes it has reserved. If all other
> running applications can't release their AM containers because they are
> blocked by these reserved containers, a livelock situation can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential
> livelock:
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM
> container on the node due to the maxAMShare limitation and the node is
> reserved by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3778) Fix Yarn resourcemanger CLI usage

2015-06-05 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created YARN-3778:
--

 Summary: Fix Yarn resourcemanger CLI usage
 Key: YARN-3778
 URL: https://issues.apache.org/jira/browse/YARN-3778
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula


The usage message printed by the code does not match the one documented.
1. "java ResourceManager " should be "yarn resourcemanager ".

{code}
private static void printUsage(PrintStream out) {
  out.println("Usage: java ResourceManager [-format-state-store]");
  out.println(""
      + "[-remove-application-from-state-store ]" + "\n");
}
{code}
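
A minimal sketch of the proposed fix; the placeholder argument name is assumed here, since the actual token was stripped from the snippet above.

{code}
private static void printUsage(PrintStream out) {
  // Match the documented command name instead of "java ResourceManager".
  out.println("Usage: yarn resourcemanager [-format-state-store]");
  out.println("                            "
      + "[-remove-application-from-state-store <ApplicationId>]" + "\n");
}
{code}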



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-05 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated YARN-3706:
---
Attachment: YARN-3726-YARN-2928.006.patch

Added YARN-3726-YARN-2928.006.patch.
Thanks for the comments, [~sjlee0].

Reason I had the relatively simple BaseTable.setTableName(...) is that it 
allows me to not have to "leak" the name of the configuration value for the 
table to be a public attribute. Do you think it is better to just have a public 
member on EntityTable and set the value directly, or to keep that private?

Wrt. EntityTable.java
"l.92 should be static": not sure what you mean by this.

The reason I chose this to be static is to be able to have default behavior in 
the base class that it extends. I'd love for Java to let me define default 
behavior for static methods and/or define method signatures (like in an 
interface) for static methods, but I can't seem to do that.
What I'm really after is a getInstance(). Would you prefer if getInstance 
simply news up a table instance each time, or are you generally against the 
pattern of being able to call getInstance()?
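
For concreteness, a minimal sketch of the two options being discussed; the class name is hypothetical, the real classes are BaseTable/EntityTable.

{code}
// Hypothetical sketch only, not the patch.
public final class EntityTableSketch {
  private static final EntityTableSketch INSTANCE = new EntityTableSketch();

  private EntityTableSketch() { }

  // Option 1: singleton-style accessor returning a shared instance.
  public static EntityTableSketch getInstance() {
    return INSTANCE;
  }

  // Option 2: simply new up a fresh instance on every call (no shared state,
  // easier to swap out in unit tests).
  public static EntityTableSketch newInstance() {
    return new EntityTableSketch();
  }
}
{code}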

ColumnImpl.java does not implement the Column interface because a) it doesn't 
implement store(byte[], TypedBufferedMutator, Long, Object), and b) it isn't 
meant to be instantiated directly as a Column; it is just intended as the 
backing functionality for actual Column implementations. ColumnImpl is 
probably a poor class name choice. BaseColumn is also not the right name, 
because Columns wouldn't extend BaseColumn. Should I rename this to 
BackingColumn?

TimelineEntitySchemaConstants
I agree with the comment about USERNAME_SPLITS being public. I've left a comment 
to remove this completely and have it read from the configuration. I think it 
would be better to provide a default property in a config file for this. This was 
in place in the code currently checked in, and I did not tackle it in this patch. 
Would it be OK if I file a separate jira for this?

Wrt. TimelineWriterUtils.java re-use of encoded separators.
Hmm, good point. We use different separators for different situations 
(sometimes only encoding space, sometimes space and "!", sometimes space and 
"?").
Let me ponder whether I can remove the encode method and use a join method 
instead. Columns are composed of encoded parts (only some parts need to be 
encoded). I'll see if I can make that happen without totally bastardizing the 
structure.
I could also do a check first to see if (token.contains(separator)), but that 
might end up being more expensive than encoding the token.
In the old code we always replaced spaces and separators with underscores; now 
we do it only for those parts where it is really needed. I've got to think about 
that a bit more.
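
A rough sketch of the join-with-selective-encoding idea; the method name, separator values, and encoding scheme below are assumed, not the actual TimelineWriterUtils API.

{code}
// Hypothetical sketch: join parts with a separator, paying the encoding cost
// only for parts that would otherwise be ambiguous.
public static String joinEncoded(String separator, String encodedSeparator,
    String... parts) {
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < parts.length; i++) {
    if (i > 0) {
      sb.append(separator);
    }
    String part = parts[i];
    if (part.contains(separator) || part.contains(" ")) {
      part = part.replace(" ", "%20").replace(separator, encodedSeparator);
    }
    sb.append(part);
  }
  return sb.toString();
}
{code}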

Uploaded a new patch with the other things fixed.

Unit tests completed and verified entity read back.
I still think it is useful to add a read method that reads an entire entity back 
and to have an assertsEqual(Entity a, Entity b) method somewhere, plus perhaps 
some additional read/write tests for edge cases (for example, entities with 
individual fields).
I can add them in this patch, or in a separate one.

> Generalize native HBase writer for additional tables
> 
>
> Key: YARN-3706
> URL: https://issues.apache.org/jira/browse/YARN-3706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Joep Rottinghuis
>Priority: Minor
> Attachments: YARN-3706-YARN-2928.001.patch, 
> YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
> YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch, 
> YARN-3726-YARN-2928.006.patch
>
>
> When reviewing YARN-3411 we noticed that we could change the class hierarchy 
> a little in order to accommodate additional tables easily.
> In order to get ready for benchmark testing we left the original layout in 
> place, as performance would not be impacted by the code hierarchy.
> Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-05 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575519#comment-14575519
 ] 

MENG DING commented on YARN-1197:
-

Just an update, I am currently working on:

YARN-1449, API in NM side to support change container resource
YARN-1643, ContainerMonitor changes in NM
YARN-1510, NMClient

I will append patches and drive discussions in each ticket.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes resource allocated to a 
> container is fixed during the lifetime of it. When users want to change a 
> resource 
> of an allocated container the only way is releasing it and allocating a new 
> container with expected size.
> Allowing run-time changing resources of an allocated container will give us 
> better control of resource usage in application side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation

2015-06-05 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575490#comment-14575490
 ] 

James Taylor commented on YARN-2928:


Happy to help, [~gtCarrera9]. Thanks for the information.

bq. If I understand this correctly, in this case, Phoenix will inherit 
pre-split settings from HBase? Will this alter the existing HBase table, 
including its schema and/or data inside? In general, if one runs CREATE TABLE 
IF NOT EXISTS or simply CREATE TABLE commands over a pre-split existing HBase 
table, will Phoenix simply accept the existing table as-is?
If you create a table in Phoenix and the table already exists in HBase, Phoenix 
will accept the existing table as-is, adding any metadata it needs (i.e. its 
coprocessors). If the table has existing data, then Phoenix will add an empty 
KeyValue to each row in the first column family referenced in the create table 
statement (or the default column family if there are no column families 
referenced). Phoenix needs this empty key value for a variety of reasons. The 
onus is on the user to ensure that the types in the create table statement 
match the way the data was actually serialized.

For your configuration/metric key-value pair, how are they named? Do you know 
the possible set of key values in advance? Or are they known more-or-less 
on-the-fly? One way you could model this with views is to just dynamically add 
the column to the view when you need to. Adding a column to a view is a very 
lightweight operation, corresponding to a few Puts to the SYSTEM.CATALOG 
table. Then you'd have a way of looping through all metrics for a given view 
using the metadata APIs. Think of a view as a set of explicitly named dynamic 
columns. You'd still need to generate the SQL statement, though.
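
For reference, a hedged sketch of that pattern through plain JDBC; the view and column names are made up, not the actual timeline service schema, and the exact DDL accepted may differ by Phoenix version.

{code}
import java.sql.*;

public class PhoenixViewSketch {
  public static void main(String[] args) throws SQLException {
    try (Connection conn =
        DriverManager.getConnection("jdbc:phoenix:localhost")) {
      // Dynamically add a metric column to an existing view.
      try (Statement stmt = conn.createStatement()) {
        stmt.execute("ALTER VIEW METRICS_VIEW ADD IF NOT EXISTS NEW_METRIC BIGINT");
      }
      // Loop over all columns of the view via the standard JDBC metadata API.
      DatabaseMetaData md = conn.getMetaData();
      try (ResultSet rs = md.getColumns(null, null, "METRICS_VIEW", null)) {
        while (rs.next()) {
          System.out.println(rs.getString("COLUMN_NAME"));
        }
      }
    }
  }
}
{code}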

bq. One potential solution is to use HBase coprocessors to aggregate 
application data from the HBase storage, and then store them in a Phoenix 
aggregation table.
I'm not following. Are you thinking to have a secondary table that's a rollup 
aggregation of more raw data? Is that required, or is it more of a convenience 
for the user? If the raw data is Phoenix-queryable, then I think you have a lot 
of options. Can you point me to some more info on your design?

The stable APIs for Phoenix are the ones we expose through our public APIs: 
JDBC and our various integration modules (i.e. MapReduce, Pig, etc.). I'd say 
that our serialization format produced by PDataType is stable (it needs to be 
for us to meet our b/w compat guarantees) and the PDataType APIs are more 
stable than others. Also, we're looking to integrate with Apache Calcite, so we 
may have some other APIs that could be hooked into as well down the road.


> YARN Timeline Service: Next generation
> --
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2015-06-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated YARN-574:

Attachment: YARN-574.2.patch

Fixed syntax error.

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-574.1.patch, YARN-574.2.patch
>
>
> At present, private resources will be downloaded in parallel only if multiple 
> containers request the same resource; otherwise downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads, however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for a single container 
> requesting resources) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers [private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]
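
For example (illustrative numbers only, not from the issue): with 4 PublicLocalizer threads, 10 containers, and a maximum of 4 downloads per ContainerLocalizer, total parallelism would grow from 4 + 10 = 14 to 4 + 10 * 4 = 44.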



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3777) Move all reservation-related tests from TestFairScheduler to TestFairSchedulerReservations.

2015-06-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3777:

Component/s: (was: fairscheduler)

> Move all reservation-related tests from TestFairScheduler to 
> TestFairSchedulerReservations.
> ---
>
> Key: YARN-3777
> URL: https://issues.apache.org/jira/browse/YARN-3777
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>
> As discussed at YARN-3655, move all reservation-related tests from 
> TestFairScheduler to TestFairSchedulerReservations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-05 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575472#comment-14575472
 ] 

zhihai xu commented on YARN-3655:
-

[~kasha], thanks for the thorough review. I uploaded a new patch 
YARN-3655.004.patch which addresses your first comment, 
and I created two follow-up JIRAs, YARN-3776 and YARN-3777, which address your 
second and third comments. Please review them. Many thanks.

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, all the other applications don't 
> have any chance to assign a new container on this node, unless the 
> application which reserves the node assigns a new container on this node or 
> releases the reserved container on this node.
> The problem is that if an application tries to call assignReservedContainer and
> fails to get a new container due to the maxAMShare limitation, it will block
> all other applications from using the nodes it has reserved. If all other
> running applications can't release their AM containers because they are
> blocked by these reserved containers, a livelock situation can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential
> livelock:
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM
> container on the node due to the maxAMShare limitation and the node is
> reserved by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3777) Move all reservation-related tests from TestFairScheduler to TestFairSchedulerReservations.

2015-06-05 Thread zhihai xu (JIRA)
zhihai xu created YARN-3777:
---

 Summary: Move all reservation-related tests from TestFairScheduler 
to TestFairSchedulerReservations.
 Key: YARN-3777
 URL: https://issues.apache.org/jira/browse/YARN-3777
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler, test
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


As discussed at YARN-3655, move all reservation-related tests from 
TestFairScheduler to TestFairSchedulerReservations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3748) Cleanup Findbugs volatile warnings

2015-06-05 Thread Gabor Liptak (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Liptak updated YARN-3748:
---
Attachment: YARN-3748.5.patch

> Cleanup Findbugs volatile warnings
> --
>
> Key: YARN-3748
> URL: https://issues.apache.org/jira/browse/YARN-3748
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gabor Liptak
>Priority: Minor
> Attachments: YARN-3748.1.patch, YARN-3748.2.patch, YARN-3748.3.patch, 
> YARN-3748.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3776) FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container

2015-06-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3776:

Description: FairScheduler code refactoring, as discussed at YARN-3655, 
separate out the code paths for assigning a reserved container and a 
non-reserved container.  (was: FairScheduler code refactoring  to separate out 
the code paths for assigning a reserved container and a non-reserved container.)

> FairScheduler code refactoring to separate out the code paths for assigning a 
> reserved container and a non-reserved container
> -
>
> Key: YARN-3776
> URL: https://issues.apache.org/jira/browse/YARN-3776
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> FairScheduler code refactoring, as discussed at YARN-3655, separate out the 
> code paths for assigning a reserved container and a non-reserved container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3776) FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container

2015-06-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3776:

Description: FairScheduler code refactoring  to separate out the code paths 
for assigning a reserved container and a non-reserved container.  (was: 
FairScheduler code refactoring  toSeparate out the code paths for assigning a 
reserved container and a non-reserved container.)

> FairScheduler code refactoring to separate out the code paths for assigning a 
> reserved container and a non-reserved container
> -
>
> Key: YARN-3776
> URL: https://issues.apache.org/jira/browse/YARN-3776
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> FairScheduler code refactoring  to separate out the code paths for assigning 
> a reserved container and a non-reserved container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3776) FairScheduler code refactoring to separate out the code paths for assigning a reserved container and a non-reserved container

2015-06-05 Thread zhihai xu (JIRA)
zhihai xu created YARN-3776:
---

 Summary: FairScheduler code refactoring to separate out the code 
paths for assigning a reserved container and a non-reserved container
 Key: YARN-3776
 URL: https://issues.apache.org/jira/browse/YARN-3776
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu


FairScheduler code refactoring to separate out the code paths for assigning a 
reserved container and a non-reserved container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575435#comment-14575435
 ] 

Sangjin Lee commented on YARN-3706:
---

OK, I finally got around to making one pass. These are high level comments (the 
stack overflow issue notwithstanding). I generally agree with the approach 
taken here. This will make future implementation work on this a lot safer and 
easier with less duplication.

(BaseTable.java)
- l.102: how about requiring subclasses to provide the default table name along 
with the conf name and then provide the default implementation for 
getTableName()? For example,

{code}
private final String defaultTableName;

protected BaseTable(String tableNameConfName, String defaultTableName) {
  this.tableNameConfName = tableNameConfName;
  this.defaultTableName = defaultTableName;
}

...

public TableName getTableName(Configuration hbaseConf) {
  return TableName.valueOf(hbaseConf.get(tableNameConfName, defaultTableName));
}
{code}
- l.55: I'm not sure if I understand the rationale of the setTableName() 
method; it sounds more like a static helper method, but then it's really a 
trivial helper method; should it even be here?

(BufferedMutatorDelegator.java)
- nit: I would remove all the trivial method comments

(EntityTable.java)
- l.92: should be static
- l.111: just curious, is there a strong reason it has to be a singleton? I 
generally shun singletons (which also cause a bit of a challenge with unit 
tests).

(ColumnImpl.java)
- It doesn't implement Column? Shouldn't it?
- l.57: it should have TypedBufferedMutator as opposed to 
TypedBufferedMutator, right?

(TimelineEntitySchemaConstants.java)
- l.67: nit: username_splits -> USERNAME_SPLITS
- findbugs will flag any public constants or methods that return the raw 
byte[]... See if you can live without them (or make them non-public)

(TimelineWriterUtils.java)
- l.72: do you think it'd be possible to do the separator encoding once and 
keep reusing it? It's probably not terribly expensive, but if it is in a 
critical path, its cost may add up


> Generalize native HBase writer for additional tables
> 
>
> Key: YARN-3706
> URL: https://issues.apache.org/jira/browse/YARN-3706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Joep Rottinghuis
>Priority: Minor
> Attachments: YARN-3706-YARN-2928.001.patch, 
> YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
> YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch
>
>
> When reviewing YARN-3411 we noticed that we could change the class hierarchy 
> a little in order to accommodate additional tables easily.
> In order to get ready for benchmark testing we left the original layout in 
> place, as performance would not be impacted by the code hierarchy.
> Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3655:

Attachment: YARN-3655.004.patch

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, all the other applications don't 
> have any chance to assign a new container on this node, unless the 
> application which reserves the node assigns a new container on this node or 
> releases the reserved container on this node.
> The problem is that if an application tries to call assignReservedContainer and
> fails to get a new container due to the maxAMShare limitation, it will block
> all other applications from using the nodes it has reserved. If all other
> running applications can't release their AM containers because they are
> blocked by these reserved containers, a livelock situation can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential
> livelock:
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM
> container on the node due to the maxAMShare limitation and the node is
> reserved by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3655:

Attachment: (was: YARN-3655.004.patch)

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch, YARN-3655.003.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, all the other applications don't 
> have any chance to assign a new container on this node, unless the 
> application which reserves the node assigns a new container on this node or 
> releases the reserved container on this node.
> The problem is that if an application tries to call assignReservedContainer and
> fails to get a new container due to the maxAMShare limitation, it will block
> all other applications from using the nodes it has reserved. If all other
> running applications can't release their AM containers because they are
> blocked by these reserved containers, a livelock situation can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential
> livelock:
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM
> container on the node due to the maxAMShare limitation and the node is
> reserved by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-05 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3655:

Attachment: YARN-3655.004.patch

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch, YARN-3655.003.patch, YARN-3655.004.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, all the other applications don't 
> have any chance to assign a new container on this node, unless the 
> application which reserves the node assigns a new container on this node or 
> releases the reserved container on this node.
> The problem is that if an application tries to call assignReservedContainer and
> fails to get a new container due to the maxAMShare limitation, it will block
> all other applications from using the nodes it has reserved. If all other
> running applications can't release their AM containers because they are
> blocked by these reserved containers, a livelock situation can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential
> livelock:
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM
> container on the node due to the maxAMShare limitation and the node is
> reserved by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2716:
---
Attachment: yarn-2716-3.patch

New patch just moves the class members (fields and methods) around to put all 
curator-related methods together. 

> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Karthik Kambatla
> Attachments: yarn-2716-1.patch, yarn-2716-2.patch, yarn-2716-3.patch, 
> yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575417#comment-14575417
 ] 

Jian He commented on YARN-2716:
---

bq. creates and deletes the fencing node as well.
I see, thanks. 

> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Karthik Kambatla
> Attachments: yarn-2716-1.patch, yarn-2716-2.patch, 
> yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3775) Job does not exit after all node become unhealthy

2015-06-05 Thread Chengshun Xia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengshun Xia updated YARN-3775:

Attachment: logs.tar.gz

Attached: logs of the ResourceManager and NodeManager, /etc/hadoop, and the 
command output of the TeraSort run.

> Job does not exit after all node become unhealthy
> -
>
> Key: YARN-3775
> URL: https://issues.apache.org/jira/browse/YARN-3775
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
> Environment: Environment:
> Version : 2.7.0
> OS: RHEL7 
> NameNodes:  xiachsh11 xiachsh12 (HA enabled)
> DataNodes:  5 xiachsh13-17
> ResourceManage:  xiachsh11
> NodeManage: 5 xiachsh13-17 
> all nodes are openstack provisioned:  
> MEM: 1.5G 
> Disk: 16G 
>Reporter: Chengshun Xia
> Attachments: logs.tar.gz
>
>
> Running Terasort with data size 10G, all the containers exited since the disk 
> space threshold of 0.90 was reached; at this point, the job does not exit with an error: 
> 15/06/05 13:13:28 INFO mapreduce.Job:  map 9% reduce 0%
> 15/06/05 13:13:52 INFO mapreduce.Job:  map 10% reduce 0%
> 15/06/05 13:14:30 INFO mapreduce.Job:  map 11% reduce 0%
> 15/06/05 13:15:11 INFO mapreduce.Job:  map 12% reduce 0%
> 15/06/05 13:15:43 INFO mapreduce.Job:  map 13% reduce 0%
> 15/06/05 13:16:38 INFO mapreduce.Job:  map 14% reduce 0%
> 15/06/05 13:16:41 INFO mapreduce.Job:  map 15% reduce 0%
> 15/06/05 13:16:53 INFO mapreduce.Job:  map 16% reduce 0%
> 15/06/05 13:17:24 INFO mapreduce.Job:  map 17% reduce 0%
> 15/06/05 13:17:53 INFO mapreduce.Job:  map 18% reduce 0%
> 15/06/05 13:18:36 INFO mapreduce.Job:  map 19% reduce 0%
> 15/06/05 13:19:03 INFO mapreduce.Job:  map 20% reduce 0%
> 15/06/05 13:19:09 INFO mapreduce.Job:  map 15% reduce 0%
> 15/06/05 13:19:32 INFO mapreduce.Job:  map 16% reduce 0%
> 15/06/05 13:20:00 INFO mapreduce.Job:  map 17% reduce 0%
> 15/06/05 13:20:36 INFO mapreduce.Job:  map 18% reduce 0%
> 15/06/05 13:20:57 INFO mapreduce.Job:  map 19% reduce 0%
> 15/06/05 13:21:22 INFO mapreduce.Job:  map 18% reduce 0%
> 15/06/05 13:21:24 INFO mapreduce.Job:  map 14% reduce 0%
> 15/06/05 13:21:25 INFO mapreduce.Job:  map 9% reduce 0%
> 15/06/05 13:21:28 INFO mapreduce.Job:  map 10% reduce 0%
> 15/06/05 13:22:22 INFO mapreduce.Job:  map 11% reduce 0%
> 15/06/05 13:23:06 INFO mapreduce.Job:  map 12% reduce 0%
> 15/06/05 13:23:41 INFO mapreduce.Job:  map 9% reduce 0%
> 15/06/05 13:23:42 INFO mapreduce.Job:  map 5% reduce 0%
> 15/06/05 13:24:38 INFO mapreduce.Job:  map 6% reduce 0%
> 15/06/05 13:25:16 INFO mapreduce.Job:  map 7% reduce 0%
> 15/06/05 13:25:53 INFO mapreduce.Job:  map 8% reduce 0%
> 15/06/05 13:26:35 INFO mapreduce.Job:  map 9% reduce 0%
> the last response time is  15/06/05 13:26:35
> and current time :
> [root@xiachsh11 logs]# date
> Fri Jun  5 19:19:59 EDT 2015
> [root@xiachsh11 logs]#
> [root@xiachsh11 logs]# yarn node -list
> 15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at 
> xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
> Total Nodes:0
>  Node-Id Node-State Node-Http-Address   
> Number-of-Running-Containers
> [root@xiachsh11 logs]#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575406#comment-14575406
 ] 

Karthik Kambatla commented on YARN-2716:


bq. removeApplicationStateInternal can also use the 
curatorFramework.delete().deletingChildrenIfNeeded() instead of adding all 
children manually ?
safeDelete adds the nodes to a transaction that creates and deletes the fencing 
node as well. Curator transactions don't support {{deletingChildrenIfNeeded}} 
yet. 
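
For readers following along, a hedged sketch of the shape being described; the paths, field names, and surrounding ZKRMStateStore plumbing are assumed here, not the actual patch.

{code}
// Uses org.apache.curator.framework.CuratorFramework and
// org.apache.curator.framework.api.transaction.CuratorTransactionFinal.
// The fencing node is created and deleted in the same transaction as the
// deletes, so an RM that has lost fencing fails the whole operation atomically.
private void safeDeleteWithChildren(CuratorFramework curatorFramework,
    String fencingNodePath, String appNodePath) throws Exception {
  CuratorTransactionFinal tx = curatorFramework.inTransaction()
      .create().forPath(fencingNodePath, new byte[0]).and();
  // Curator transactions have no deletingChildrenIfNeeded(), so the children
  // are added to the transaction explicitly.
  for (String child : curatorFramework.getChildren().forPath(appNodePath)) {
    tx = tx.delete().forPath(appNodePath + "/" + child).and();
  }
  tx.delete().forPath(appNodePath).and()
    .delete().forPath(fencingNodePath).and()
    .commit();
}
{code}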

> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Karthik Kambatla
> Attachments: yarn-2716-1.patch, yarn-2716-2.patch, 
> yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575395#comment-14575395
 ] 

Jian He commented on YARN-2716:
---

bq. safeDelete checks if the znode exists before attempting to delete it. So, 
shouldn't throw NoNodeException.
Ah, right, sorry, I overlooked the implementation of the method.

My only comment is:
- removeApplicationStateInternal can also use the 
{{curatorFramework.delete().deletingChildrenIfNeeded()}} instead of adding all 
children manually ?

> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Karthik Kambatla
> Attachments: yarn-2716-1.patch, yarn-2716-2.patch, 
> yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3775) Job does not exit after all node become unhealthy

2015-06-05 Thread Chengshun Xia (JIRA)
Chengshun Xia created YARN-3775:
---

 Summary: Job does not exit after all node become unhealthy
 Key: YARN-3775
 URL: https://issues.apache.org/jira/browse/YARN-3775
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1
 Environment: Environment:
Version : 2.7.0
OS: RHEL7 
NameNodes:  xiachsh11 xiachsh12 (HA enabled)
DataNodes:  5 xiachsh13-17
ResourceManage:  xiachsh11
NodeManage: 5 xiachsh13-17 
all nodes are openstack provisioned:  
MEM: 1.5G 
Disk: 16G 

Reporter: Chengshun Xia


Running Terasort with data size 10G, all the containers exited since the disk 
space threshold of 0.90 was reached; at this point, the job does not exit with an error: 
15/06/05 13:13:28 INFO mapreduce.Job:  map 9% reduce 0%
15/06/05 13:13:52 INFO mapreduce.Job:  map 10% reduce 0%
15/06/05 13:14:30 INFO mapreduce.Job:  map 11% reduce 0%
15/06/05 13:15:11 INFO mapreduce.Job:  map 12% reduce 0%
15/06/05 13:15:43 INFO mapreduce.Job:  map 13% reduce 0%
15/06/05 13:16:38 INFO mapreduce.Job:  map 14% reduce 0%
15/06/05 13:16:41 INFO mapreduce.Job:  map 15% reduce 0%
15/06/05 13:16:53 INFO mapreduce.Job:  map 16% reduce 0%
15/06/05 13:17:24 INFO mapreduce.Job:  map 17% reduce 0%
15/06/05 13:17:53 INFO mapreduce.Job:  map 18% reduce 0%
15/06/05 13:18:36 INFO mapreduce.Job:  map 19% reduce 0%
15/06/05 13:19:03 INFO mapreduce.Job:  map 20% reduce 0%
15/06/05 13:19:09 INFO mapreduce.Job:  map 15% reduce 0%
15/06/05 13:19:32 INFO mapreduce.Job:  map 16% reduce 0%
15/06/05 13:20:00 INFO mapreduce.Job:  map 17% reduce 0%
15/06/05 13:20:36 INFO mapreduce.Job:  map 18% reduce 0%
15/06/05 13:20:57 INFO mapreduce.Job:  map 19% reduce 0%
15/06/05 13:21:22 INFO mapreduce.Job:  map 18% reduce 0%
15/06/05 13:21:24 INFO mapreduce.Job:  map 14% reduce 0%
15/06/05 13:21:25 INFO mapreduce.Job:  map 9% reduce 0%
15/06/05 13:21:28 INFO mapreduce.Job:  map 10% reduce 0%
15/06/05 13:22:22 INFO mapreduce.Job:  map 11% reduce 0%
15/06/05 13:23:06 INFO mapreduce.Job:  map 12% reduce 0%
15/06/05 13:23:41 INFO mapreduce.Job:  map 9% reduce 0%
15/06/05 13:23:42 INFO mapreduce.Job:  map 5% reduce 0%
15/06/05 13:24:38 INFO mapreduce.Job:  map 6% reduce 0%
15/06/05 13:25:16 INFO mapreduce.Job:  map 7% reduce 0%
15/06/05 13:25:53 INFO mapreduce.Job:  map 8% reduce 0%
15/06/05 13:26:35 INFO mapreduce.Job:  map 9% reduce 0%



the last response time is  15/06/05 13:26:35
and current time :
[root@xiachsh11 logs]# date
Fri Jun  5 19:19:59 EDT 2015
[root@xiachsh11 logs]#

[root@xiachsh11 logs]# yarn node -list
15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at 
xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
Total Nodes:0
 Node-Id Node-State Node-Http-Address   
Number-of-Running-Containers
[root@xiachsh11 logs]#







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation

2015-06-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575358#comment-14575358
 ] 

Li Lu commented on YARN-2928:
-

Hi [~jamestaylor]

Thank you very much for your suggestions and PHOENIX-2028! I wrote the 
experimental Phoenix writer code and currently have some follow up questions 
w.r.t your comments. 

bq. The easiest is probably to create the HBase table the same way (through 
code or using the HBase shell) with the KeyPrefixRegionSplitPolicy specified at 
create time. Then, in Phoenix you can issue a CREATE TABLE statement against 
the existing HBase table and it'll just map to it. Then you'll have your split 
policy for your benchmark in both write paths.

If I understand this correctly, in this case, Phoenix will inherit pre-split 
settings from HBase? Will this alter the existing HBase table, including its 
schema and/or data inside? In general, if one runs CREATE TABLE IF NOT EXISTS 
or simply CREATE TABLE commands over a pre-split existing HBase table, will 
Phoenix simply accept the existing table as-is? 

bq. An alternative to dynamic columns is to define views over your Phoenix 
table (http://phoenix.apache.org/views.html).

I once looked at views but I'm not sure if that fits our write path use case 
well. Let me briefly talk about our use case in YARN first. In general, we 
would like to dynamically store the configuration and metrics for each YARN 
timeline entity in a Phoenix database, such that our timeline reader apps or 
users can use SQL to query historical data. Phoenix view may make a perfect 
solution for the reader use cases. However, we are hitting problems on the 
writer side. We store each configuration/metric key-value pair in a dynamic 
column. This causes us two main problems. First, we need to use a dynamically 
generated SQL statement to write to the Phoenix table, which is cumbersome and 
error-prone. Second, when performing aggregations, we need to aggregate over all 
available metrics for an application (or a user, or a flow), but we cannot simply 
iterate over those dynamic columns because there is no such API. I'm not sure how 
to resolve these two problems via Phoenix views, or via existing Phoenix APIs. 
Actually, I suspect that if it were possible to fall back to the HBase-style APIs, 
our write path would be much simpler. 

bq. If you do end up going with a direct HBase write path, I'd encourage you to 
use the Phoenix serialization format (through PDataType and derived classes) to 
ensure you can do adhoc querying on the data.

We're currently looking into this method in the aggregation part. We're doing 
our best to support SQL on the aggregated data by using Phoenix. One potential 
solution is to use HBase coprocessors to aggregate application data from the 
HBase storage, and then store them in a Phoenix aggregation table. However, if 
we want to keep aggregating on the Phoenix table, can we also write an HBase 
coprocessor that reads the Phoenix PDataTypes and aggregates them into other 
Phoenix tables? If that's possible, are there any stable (or "safe") APIs for 
PDataTypes?

A slightly more general question: is SQL the _only_ API for 
Phoenix, or are there more? I ask because, from a YARN timeline 
service perspective, Phoenix is a nice tool through which we can easily add SQL 
support for our end users, but we may not necessarily want to use SQL to program 
it all the time. 

Thank you very much for your comments and help from the Phoenix side. Our 
current Phoenix writer is more of an experimental version, but we really hope 
to have something for our aggregators and readers in the near future. 


> YARN Timeline Service: Next generation
> --
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575163#comment-14575163
 ] 

Hadoop QA commented on YARN-2716:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 18s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 35s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  5s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 57s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 31s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |  50m 47s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 58s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738007/yarn-2716-2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7588585 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8200/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8200/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8200/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8200/console |


This message was automatically generated.

> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Karthik Kambatla
> Attachments: yarn-2716-1.patch, yarn-2716-2.patch, 
> yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-38) Add an option to drain the ResourceManager of all apps for upgrades

2015-06-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-38.
--
Resolution: Won't Fix

Now that we have work-preserving RM/NM restarts, rolling restarts/upgrades 
shouldn't require draining apps from the RM.

YARN-914 addresses draining jobs when a node is decommissioned.

> Add an option to drain the ResourceManager of all apps for upgrades
> ---
>
> Key: YARN-38
> URL: https://issues.apache.org/jira/browse/YARN-38
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> MAPREDUCE-4575 for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575102#comment-14575102
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7977 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7977/])
YARN-1462. AHS API and other AHS changes to handle tags for completed MR jobs. 
Contributed by Xuan Gong (xgong: rev 3e000a919fede85230fcfb06309a1f1d5e0c479c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt


> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-05 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575094#comment-14575094
 ] 

Xuan Gong commented on YARN-1462:
-

Committed into trunk/branch-2. Thanks, zhijie for review, and Sergey for 
verification.

> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation

2015-06-05 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575059#comment-14575059
 ] 

Vrushali C commented on YARN-2928:
--


Hi [~jamestaylor]

Thank you for taking the time to look through the write-up and for filing 
PHOENIX-2028.

In the context of pre-splits, yes, we wanted both writers to write to tables 
that were pre-split with the same strategy. However, I believe the folks working 
on the Phoenix writer mentioned that the only way to achieve that in Phoenix was 
to use the SPLIT ON clause, which would have required rewriting the HBase 
pre-splitting strategy for that approach. Perhaps [~gtCarrera9] might be able 
to speak to this better. 
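
As an illustration only (the table name, columns, and split points below are hypothetical, not the actual timeline service schema), this is roughly what the SPLIT ON approach looks like over JDBC; the split points have to be enumerated up front, which is why matching an HBase-side split policy means precomputing equivalent boundaries:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PhoenixPresplitSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical quorum; assumes the Phoenix client jar is on the classpath.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         Statement stmt = conn.createStatement()) {
      // Explicit split points at create time, instead of an HBase split policy.
      stmt.execute("CREATE TABLE IF NOT EXISTS entity ("
          + " row_key VARCHAR PRIMARY KEY,"
          + " info.payload VARCHAR)"
          + " SPLIT ON ('cluster1!', 'cluster2!', 'cluster3!')");
    }
  }
}
{code}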

bq. I'd encourage you to use the Phoenix serialization format (through 
PDataType and derived classes) to ensure you can do adhoc querying on the data
Okay, thanks, I will check that out. We are working on a whole set of 
enhancements for the base writer as well, and I will look at this. 

bq. The most important aspect is how your row key is written and the separators 
you use if you're storing multiple values in the row key.
You’ve hit the nail on the head. We do have multiple values with different 
datatypes in the row key as well as in column names (with and without prefixes), 
so we have different datatypes and a bunch of separators. [~jrottinghuis] has 
been addressing these points in YARN-3706, e.g. dealing with storing and 
parsing byte representations of separators. 

The timeline service schema has more tables, and we are considering storing 
aggregated values in these Phoenix-based tables (the current thinking is to have 
them populated via coprocessors watching the basic entity table). Thanks for 
suggesting views over Phoenix tables; I will look up more details on 
that. 

Thanks once again,
Vrushali

> YARN Timeline Service: Next generation
> --
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-05 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575025#comment-14575025
 ] 

Joep Rottinghuis commented on YARN-3706:


Oops, I introduced an infinite loop:
{noformat}
java.lang.StackOverflowError
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineWriterUtils.joinEncoded(TimelineWriterUtils.java:185)
{noformat}
I'll fix that; it makes the writer a bit slow...
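
For illustration, here is a hedged sketch (not the actual TimelineWriterUtils code) of how a varargs join helper can accidentally delegate to itself and blow the stack, together with the iterative shape of the fix:
{code}
// Hypothetical sketch: the broken overload calls itself, so every invocation
// recurses until the JVM throws StackOverflowError.
public class JoinSketch {

  // Broken: joinEncodedBroken(sep, items) re-enters this same varargs method.
  public static String joinEncodedBroken(String sep, String... items) {
    return joinEncodedBroken(sep, items);
  }

  // Fixed: build the joined string iteratively instead of delegating.
  public static String joinEncoded(String sep, String... items) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < items.length; i++) {
      if (i > 0) {
        sb.append(sep);
      }
      sb.append(items[i]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints cluster!user!flow
    System.out.println(joinEncoded("!", "cluster", "user", "flow"));
  }
}
{code}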


> Generalize native HBase writer for additional tables
> 
>
> Key: YARN-3706
> URL: https://issues.apache.org/jira/browse/YARN-3706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Joep Rottinghuis
>Priority: Minor
> Attachments: YARN-3706-YARN-2928.001.patch, 
> YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
> YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch
>
>
> When reviewing YARN-3411 we noticed that we could change the class hierarchy 
> a little in order to accommodate additional tables easily.
> In order to get ready for benchmark testing we left the original layout in 
> place, as performance would not be impacted by the code hierarchy.
> Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574996#comment-14574996
 ] 

Sergey Shelukhin commented on YARN-1462:


Patch looks like it won't break Tez.

> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2015-06-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574993#comment-14574993
 ] 

Karthik Kambatla commented on YARN-3774:


Fair point, Sean. We should hold off until Hadoop 3, primarily to handle 
potential compat issues. 

> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
> somewhat involved, and could be improved using CuratorOp introduced in 
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
> and make this change. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation

2015-06-05 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574991#comment-14574991
 ] 

James Taylor commented on YARN-2928:


Nice write-up, [~vrushalic]. For your benchmarks, if you're pre-splitting for 
the HBase direct write path but not for the Phoenix write path, you're not 
really comparing apples to apples. There are a number of ways you can install 
your KeyPrefixRegionSplitPolicy in Phoenix. The easiest is probably to create 
the HBase table the same way (through code or using the HBase shell) with the 
KeyPrefixRegionSplitPolicy specified at create time. Then, in Phoenix, you can 
issue a CREATE TABLE statement against the existing HBase table and it'll just 
map to it. Then you'll have your split policy in both write paths for your 
benchmark.

An alternative to dynamic columns is to define views over your Phoenix table 
(http://phoenix.apache.org/views.html). In each view, you could specify the set 
of columns it contains. Then you can use the regular JDBC metadata APIs to get 
the set of columns that define your view: 
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getColumns%28java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String%29

Another interesting angle with views (not sure whether this is relevant for your 
use case) is that they can be multi-tenant, where the definition of the "tenant" 
is up to you (maybe it would map to a User?). In that case, each tenant can 
define its own derived view and add columns specific to its usage. You can even 
create secondary indexes over a view. This is the way Phoenix surfaces NoSQL in 
the SQL world. More here: 
http://phoenix.apache.org/multi-tenancy.html

There is room for improvement in the Phoenix write path, though. I've filed 
PHOENIX-2028 and plan to work on that shortly.

If you do end up going with a direct HBase write path, I'd encourage you to use 
the Phoenix serialization format (through PDataType and derived classes) to 
ensure you can do ad hoc querying on the data. The most important aspect is how 
your row key is written and the separators you use if you're storing multiple 
values in the row key.
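
To make the view-plus-metadata suggestion concrete, here is a minimal sketch over plain JDBC. The connection URL, table, and column names are made up for illustration, and it assumes the Phoenix client jar is on the classpath:
{code}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixViewSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
      try (Statement stmt = conn.createStatement()) {
        // Mapping a Phoenix table over an HBase table of the same name; the
        // HBase-side split policy set at create time stays in place.
        stmt.execute("CREATE TABLE IF NOT EXISTS entity ("
            + " row_key VARCHAR PRIMARY KEY,"
            + " info.flow_name VARCHAR,"
            + " info.created_time UNSIGNED_LONG)");
        // A view exposing only the rows one particular reader cares about.
        stmt.execute("CREATE VIEW IF NOT EXISTS flow1_entity AS"
            + " SELECT * FROM entity WHERE flow_name = 'flow1'");
      }
      // Standard JDBC metadata call to list the columns that define the view.
      DatabaseMetaData md = conn.getMetaData();
      try (ResultSet rs = md.getColumns(null, null, "FLOW1_ENTITY", null)) {
        while (rs.next()) {
          System.out.println(rs.getString("COLUMN_NAME"));
        }
      }
    }
  }
}
{code}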

> YARN Timeline Service: Next generation
> --
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2015-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574988#comment-14574988
 ] 

Hadoop QA commented on YARN-574:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 17s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 18s | The applied patch generated  2 
new checkstyle issues (total was 213, now 214). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 44s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m  4s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  47m 49s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737994/YARN-574.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7588585 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8199/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8199/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8199/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8199/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8199/console |


This message was automatically generated.

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-574.1.patch
>
>
> At present private resources will be downloaded in parallel only if multiple 
> containers request the same resource. However otherwise it will be serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2716:
---
Attachment: yarn-2716-2.patch

New patch to address review comments.

[~jianhe] - once you are comfortable with the patch, I would like to move 
methods in {{ZKRMStateStore}} around to put all Curator-related methods 
together towards the end for better readability. 

> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Karthik Kambatla
> Attachments: yarn-2716-1.patch, yarn-2716-2.patch, 
> yarn-2716-prelim.patch, yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574979#comment-14574979
 ] 

Karthik Kambatla commented on YARN-2716:


Thanks for the thorough review, Jian. Sorry for missing some simple things in 
the patch. 

bq. will the safeDelete throw a NoNodeException if deleting a non-existing 
znode?
safeDelete checks whether the znode exists before attempting to delete it, so it 
shouldn't throw a NoNodeException.

bq. why, in the HA case, zkRetryInterval is calculated as below
When HA is not enabled, we should give the store as much time as possible to 
connect to ZK. When HA is enabled, the other RM may have a better chance of 
connecting to ZK, so we should give up retrying within the session timeout. 
YARN-2054 has all the details.

Posting a patch shortly to address all the review feedback.
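
Purely as an illustration of the existence-checked delete described above, here is a hedged sketch using Curator (not the actual ZKRMStateStore code; the connection string and path are made up):
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

public class SafeDeleteSketch {

  // Delete the path only if it exists, so a missing znode does not surface as
  // a NoNodeException. A concurrent delete could still race between the check
  // and the delete, which is why production code guards the delete call too.
  static void safeDelete(CuratorFramework curator, String path) throws Exception {
    if (curator.checkExists().forPath(path) != null) {
      curator.delete().forPath(path);
    }
  }

  public static void main(String[] args) throws Exception {
    CuratorFramework curator = CuratorFrameworkFactory.newClient(
        "localhost:2181", new RetryNTimes(3, 1000));
    curator.start();
    safeDelete(curator, "/rmstore/example");
    curator.close();
  }
}
{code}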

> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Karthik Kambatla
> Attachments: yarn-2716-1.patch, yarn-2716-prelim.patch, 
> yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2015-06-05 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574957#comment-14574957
 ] 

Sean Busbey commented on YARN-3774:
---

Please make sure to document the impact of moving to Curator 3. The last time 
we updated the Curator version (2.6.0 -> 2.7.1), they broke compatibility.

> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
> somewhat involved, and could be improved using CuratorOp introduced in 
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
> and make this change. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher

2015-06-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574935#comment-14574935
 ] 

Jian He commented on YARN-3508:
---

I think it makes sense to move preemption events from the main dispatcher to the 
scheduler dispatcher. Otherwise, any non-scheduler events on the main dispatcher 
will be left waiting for the preemption events to grab the scheduler lock, which 
is unnecessary. 

> Preemption processing occuring on the main RM dispatcher
> 
>
> Key: YARN-3508
> URL: https://issues.apache.org/jira/browse/YARN-3508
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-3508.002.patch, YARN-3508.01.patch
>
>
> We recently saw the RM for a large cluster lag far behind on the 
> AsyncDispacher event queue.  The AsyncDispatcher thread was consistently 
> blocked on the highly-contended CapacityScheduler lock trying to dispatch 
> preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
> processing should occur on the scheduler event dispatcher thread or a 
> separate thread to avoid delaying the processing of other events in the 
> primary dispatcher queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-05 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574928#comment-14574928
 ] 

Ashwin Shankar commented on YARN-3453:
--

Hey folks, looking into the patch; I will get back with comments.

> Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
> even in DRF mode causing thrashing
> 
>
> Key: YARN-3453
> URL: https://issues.apache.org/jira/browse/YARN-3453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Ashwin Shankar
>Assignee: Arun Suresh
> Attachments: YARN-3453.1.patch, YARN-3453.2.patch
>
>
> There are two places in preemption code flow where DefaultResourceCalculator 
> is used, even in DRF mode.
> Which basically results in more resources getting preempted than needed, and 
> those extra preempted containers aren’t even getting to the “starved” queue 
> since scheduling logic is based on DRF's Calculator.
> Following are the two places :
> 1. {code:title=FSLeafQueue.java|borderStyle=solid}
> private boolean isStarved(Resource share)
> {code}
> A queue shouldn’t be marked as “starved” if the dominant resource usage
> is >=  fair/minshare.
> 2. {code:title=FairScheduler.java|borderStyle=solid}
> protected Resource resToPreempt(FSLeafQueue sched, long curTime)
> {code}
> --
> One more thing that I believe needs to change in DRF mode is : during a 
> preemption round,if preempting a few containers results in satisfying needs 
> of a resource type, then we should exit that preemption round, since the 
> containers that we just preempted should bring the dominant resource usage to 
> min/fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3259) FairScheduler: Trigger fairShare updates on node events

2015-06-05 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574918#comment-14574918
 ] 

Anubhav Dhoot commented on YARN-3259:
-

Thanks [~kasha] for review and commit!

> FairScheduler: Trigger fairShare updates on node events
> ---
>
> Key: YARN-3259
> URL: https://issues.apache.org/jira/browse/YARN-3259
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3259.001.patch, YARN-3259.002.patch, 
> YARN-3259.003.patch
>
>
> Instead of waiting for update interval unconditionally, we can trigger early 
> updates on important events - for eg node join and leave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1502) Protocol changes in RM side to support change container resource

2015-06-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-1502.
--
Resolution: Duplicate

This one is actually a duplicate of YARN-1646; the "protocol changes" are already 
done.

> Protocol changes in RM side to support change container resource
> 
>
> Key: YARN-1502
> URL: https://issues.apache.org/jira/browse/YARN-1502
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Wangda Tan (No longer used)
> Attachments: yarn-1502.1.patch, yarn-1502.2.patch
>
>
> This JIRA is to track protocol (including ApplicationMasterProtocol and 
> ApplicationClientProtocol) changes to support change container resource in RM 
> side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2015-06-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated YARN-574:

Release Note: YARN-574. Allow parallel download of resources in 
PrivateLocalizer. Contributed by Zheng Shao.  (was: YARN-543. Allow parallel 
download of resources in PrivateLocalizer. Contributed by Zheng Shao.)

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-574.1.patch
>
>
> At present private resources will be downloaded in parallel only if multiple 
> containers request the same resource. However otherwise it will be serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2015-06-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated YARN-467:

Attachment: (was: YARN-574.1.patch)

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Fix For: 2.1.0-beta
>
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, 
> yarn-467-20130402.2.patch, yarn-467-20130402.patch, yarn-467-testCode.tar
>
>
> If we have multiple jobs which uses distributed cache with small size of 
> files, the directory limit reaches before reaching the cache size and fails 
> to create any directories in file cache (PUBLIC). The jobs start failing with 
> the below exception.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> we need to have a mechanism where in we can create directory hierarchy and 
> limit number of files per directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2015-06-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated YARN-574:

Attachment: YARN-574.1.patch

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-574.1.patch
>
>
> At present private resources will be downloaded in parallel only if multiple 
> containers request the same resource. However otherwise it will be serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2015-06-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated YARN-467:

Attachment: YARN-574.1.patch

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Fix For: 2.1.0-beta
>
> Attachments: YARN-574.1.patch, yarn-467-20130322.1.patch, 
> yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, 
> yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, 
> yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, 
> yarn-467-20130402.2.patch, yarn-467-20130402.patch, yarn-467-testCode.tar
>
>
> If we have multiple jobs which uses distributed cache with small size of 
> files, the directory limit reaches before reaching the cache size and fails 
> to create any directories in file cache (PUBLIC). The jobs start failing with 
> the below exception.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> we need to have a mechanism where in we can create directory hierarchy and 
> limit number of files per directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2015-06-05 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-3774:
--

 Summary: ZKRMStateStore should use Curator 3.0 and avail CuratorOp
 Key: YARN-3774
 URL: https://issues.apache.org/jira/browse/YARN-3774
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker


YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
somewhat involved, and could be improved using CuratorOp introduced in Curator 
3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version and make 
this change. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3259) FairScheduler: Trigger fairShare updates on node events

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574817#comment-14574817
 ] 

Hudson commented on YARN-3259:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7976 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7976/])
YARN-3259. FairScheduler: Trigger fairShare updates on node events. (Anubhav 
Dhoot via kasha) (kasha: rev 75885852cc19dd6de12e62498b112d5d70ce87f4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingUpdate.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java


> FairScheduler: Trigger fairShare updates on node events
> ---
>
> Key: YARN-3259
> URL: https://issues.apache.org/jira/browse/YARN-3259
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-3259.001.patch, YARN-3259.002.patch, 
> YARN-3259.003.patch
>
>
> Instead of waiting for update interval unconditionally, we can trigger early 
> updates on important events - for eg node join and leave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3259) FairScheduler: Trigger fairShare updates on node events

2015-06-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3259:
---
Summary: FairScheduler: Trigger fairShare updates on node events  (was: 
FairScheduler: Update to fairShare could be triggered early on node events 
instead of waiting for update interval )

> FairScheduler: Trigger fairShare updates on node events
> ---
>
> Key: YARN-3259
> URL: https://issues.apache.org/jira/browse/YARN-3259
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3259.001.patch, YARN-3259.002.patch, 
> YARN-3259.003.patch
>
>
> Instead of waiting for update interval unconditionally, we can trigger early 
> updates on important events - for eg node join and leave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher

2015-06-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574753#comment-14574753
 ] 

Jason Lowe commented on YARN-3508:
--

Yes, it's not a cure-all to move the preemption processing to the scheduler 
event queue when the scheduler is the bottleneck, but we do have separate event 
queues for a reason.  If it didn't matter who was the bottleneck then we'd just 
have one event queue for everything, correct?  The scheduler event queue is 
primarily blocked by the big scheduler lock, and IMHO we should dispatch events 
that need that lock to that queue.  Doing otherwise starts to couple the two 
event dispatchers together and we might as well just have the one event queue 
to rule them all.
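
As a toy illustration of that point (plain Java, not the actual RM dispatcher classes): with separate queues, a handler that holds the big lock only delays the queue it was dispatched to, while the other queue keeps draining.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DispatcherSketch {
  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<Runnable> mainQueue = new LinkedBlockingQueue<>();
    BlockingQueue<Runnable> schedulerQueue = new LinkedBlockingQueue<>();
    Object schedulerLock = new Object();

    startDispatcher("main", mainQueue);
    startDispatcher("scheduler", schedulerQueue);

    // Preemption work needs the scheduler lock, so it goes on the scheduler
    // queue; unrelated bookkeeping stays on the main queue and is not delayed.
    schedulerQueue.add(() -> {
      synchronized (schedulerLock) {
        sleepQuietly(1000); // simulate a slow pass over scheduler state
        System.out.println("preemption event handled");
      }
    });
    mainQueue.add(() -> System.out.println("non-scheduler event handled promptly"));

    Thread.sleep(1500); // let both dispatchers drain before the JVM exits
  }

  private static void startDispatcher(String name, BlockingQueue<Runnable> queue) {
    Thread t = new Thread(() -> {
      try {
        while (true) {
          queue.take().run();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, name + "-dispatcher");
    t.setDaemon(true);
    t.start();
  }

  private static void sleepQuietly(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}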

> Preemption processing occuring on the main RM dispatcher
> 
>
> Key: YARN-3508
> URL: https://issues.apache.org/jira/browse/YARN-3508
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-3508.002.patch, YARN-3508.01.patch
>
>
> We recently saw the RM for a large cluster lag far behind on the 
> AsyncDispacher event queue.  The AsyncDispatcher thread was consistently 
> blocked on the highly-contended CapacityScheduler lock trying to dispatch 
> preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
> processing should occur on the scheduler event dispatcher thread or a 
> separate thread to avoid delaying the processing of other events in the 
> primary dispatcher queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-06-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574746#comment-14574746
 ] 

Karthik Kambatla commented on YARN-3655:


Thanks for the clarifications, Zhihai. The latest patch looks mostly good; nice 
test. A few nitpicks before we get this in: 
# In hasContainerForNode, the patch has some spurious changes. Also, it would be 
nice to add a comment for the newly added check.
# File a follow-up JIRA to separate out the code paths for assigning a reserved 
container and a non-reserved container. 
# File a follow-up JIRA to move all reservation-related tests from 
TestFairScheduler to TestFairSchedulerReservations.

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch, YARN-3655.003.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, all the other applications don't 
> have any chance to assign a new container on this node, unless the 
> application which reserves the node assigns a new container on this node or 
> releases the reserved container on this node.
> The problem is if an application tries to call assignReservedContainer and 
> fail to get a new container due to maxAMShare limitation, it will block all 
> other applications to use the nodes it reserves. If all other running 
> applications can't release their AM containers due to being blocked by these 
> reserved containers. A livelock situation can happen.
> The following is the code at FSAppAttempt#assignContainer which can cause 
> this potential livelock.
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>   ask.get(0).getCapability())) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Skipping allocation because maxAMShare limit would " +
>   "be exceeded");
> }
> return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM 
> container on the node due to Max AM share limitation and the node is reserved 
> by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval

2015-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574665#comment-14574665
 ] 

Hadoop QA commented on YARN-3259:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 59s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 46s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m  2s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  87m 54s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737948/YARN-3259.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 790a861 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8198/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8198/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8198/console |


This message was automatically generated.

> FairScheduler: Update to fairShare could be triggered early on node events 
> instead of waiting for update interval 
> --
>
> Key: YARN-3259
> URL: https://issues.apache.org/jira/browse/YARN-3259
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3259.001.patch, YARN-3259.002.patch, 
> YARN-3259.003.patch
>
>
> Instead of waiting for update interval unconditionally, we can trigger early 
> updates on important events - for eg node join and leave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574658#comment-14574658
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #217 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/217/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent task 
> 5. Switch RM
> Actual
> =
> For 12 Jobs AM gets allocated and all 12 starts running
> No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
> Expected
> ===
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574655#comment-14574655
 ] 

Hudson commented on YARN-2392:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #217 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/217/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # when an app fails the failure count is shown, but not what the global + 
> local limits are. If the two are different, they should both be printed. 
> # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574648#comment-14574648
 ] 

Hudson commented on YARN-3764:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #217 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/217/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java
* hadoop-yarn-project/CHANGES.txt


> CapacityScheduler should forbid moving LeafQueue from one parent to another
> ---
>
> Key: YARN-3764
> URL: https://issues.apache.org/jira/browse/YARN-3764
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: YARN-3764.1.patch
>
>
> Currently CapacityScheduler doesn't handle the case well, for example:
> A queue structure:
> {code}
> root
>   |
>   a (100)
> /   \
>x y
>   (50)   (50)
> {code}
> And reinitialize using following structure:
> {code}
>  root
>  /   \ 
> (50)a x (50)
> |
> y
>(100)
> {code}
> The actual queue structure after reinitialize is:
> {code}
>  root
> /\
>a (50) x (50)
>   /  \
>  xy
> (50)  (100)
> {code}
> We should forbid admin doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574652#comment-14574652
 ] 

Hudson commented on YARN-3766:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #217 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/217/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java


> ATS Web UI breaks because of YARN-3467
> --
>
> Key: YARN-3766
> URL: https://issues.apache.org/jira/browse/YARN-3766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>.append(", 'mRender': renderHadoopDate }")
>.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>  if (isFairSchedulerPage) {
> -  sb.append("[11]");
> +  sb.append("[13]");
>  } else if (isResourceManager) {
> -  sb.append("[10]");
> +  sb.append("[12]");
>  } else {
>sb.append("[9]");
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3724) Native compilation on Solaris fails on Yarn due to use of FTS

2015-06-05 Thread Alan Burlison (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Burlison reassigned YARN-3724:
---

Assignee: Alan Burlison

> Native compilation on Solaris fails on Yarn due to use of FTS
> -
>
> Key: YARN-3724
> URL: https://issues.apache.org/jira/browse/YARN-3724
> Project: Hadoop YARN
>  Issue Type: Sub-task
> Environment: Solaris 11.2
>Reporter: Malcolm Kavalsky
>Assignee: Alan Burlison
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Compiling the Yarn Node Manager results in "fts" not found. On Solaris we 
> have an alternative ftw with similar functionality.
> This is isolated to a single file container-executor.c
> Note that this will just fix the compilation error. A more serious issue is 
> that Solaris does not support cgroups as Linux does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574620#comment-14574620
 ] 

Hudson commented on YARN-3764:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2165/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java


> CapacityScheduler should forbid moving LeafQueue from one parent to another
> ---
>
> Key: YARN-3764
> URL: https://issues.apache.org/jira/browse/YARN-3764
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: YARN-3764.1.patch
>
>
> Currently CapacityScheduler doesn't handle the case well, for example:
> A queue structure:
> {code}
> root
>   |
>   a (100)
> /   \
>x y
>   (50)   (50)
> {code}
> And reinitialize using following structure:
> {code}
>  root
>  /   \ 
> (50)a x (50)
> |
> y
>(100)
> {code}
> The actual queue structure after reinitialize is:
> {code}
>  root
> /\
>a (50) x (50)
>   /  \
>  xy
> (50)  (100)
> {code}
> We should forbid admin doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3773) hadoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable

2015-06-05 Thread Alan Burlison (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Burlison updated YARN-3773:

Summary: hadoop-yarn-server-nodemanager's use of Linux /sbin/tc is 
non-portable  (was: adoop-yarn-server-nodemanager's use of Linux /sbin/tc is 
non-portable)

> hadoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable
> --
>
> Key: YARN-3773
> URL: https://issues.apache.org/jira/browse/YARN-3773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
> Environment: BSD OSX Solaris Windows Linux
>Reporter: Alan Burlison
>
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
>  makes use of the Linux-only executable /sbin/tc 
> (http://lartc.org/manpages/tc.txt)  but there is no corresponding 
> functionality for non-Linux platforms. The code in question also seems to try 
> to execute tc even on platforms where it will never exist.
> Other platforms provide similar functionality, e.g. Solaris has an extensive 
> range of network management features 
> (http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-095-s11-app-traffic-525038.html).
>  Work is needed to abstract the network management features of Yarn so that 
> the same facilities for network management can be provided on all platforms 
> that provide the requisite functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3773) adoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable

2015-06-05 Thread Alan Burlison (JIRA)
Alan Burlison created YARN-3773:
---

 Summary: adoop-yarn-server-nodemanager's use of Linux /sbin/tc is 
non-portable
 Key: YARN-3773
 URL: https://issues.apache.org/jira/browse/YARN-3773
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: BSD OSX Solaris Windows Linux
Reporter: Alan Burlison


hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
 makes use of the Linux-only executable /sbin/tc 
(http://lartc.org/manpages/tc.txt)  but there is no corresponding functionality 
for non-Linux platforms. The code in question also seems to try to execute tc 
even on platforms where it will never exist.

Other platforms provide similar functionality, e.g. Solaris has an extensive 
range of network management features 
(http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-095-s11-app-traffic-525038.html).
 Work is needed to abstract the network management features of Yarn so that the 
same facilities for network management can be provided on all platforms that 
provide the requisite functionality.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574624#comment-14574624
 ] 

Hudson commented on YARN-3766:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2165/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java


> ATS Web UI breaks because of YARN-3467
> --
>
> Key: YARN-3766
> URL: https://issues.apache.org/jira/browse/YARN-3766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>.append(", 'mRender': renderHadoopDate }")
>.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>  if (isFairSchedulerPage) {
> -  sb.append("[11]");
> +  sb.append("[13]");
>  } else if (isResourceManager) {
> -  sb.append("[10]");
> +  sb.append("[12]");
>  } else {
>sb.append("[9]");
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574627#comment-14574627
 ] 

Hudson commented on YARN-2392:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2165/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # when an app fails the failure count is shown, but not what the global + 
> local limits are. If the two are different, they should both be printed. 
> # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574630#comment-14574630
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2165/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =
> For 12 jobs the AM gets allocated and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
> Expected
> ===
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574600#comment-14574600
 ] 

Hudson commented on YARN-3766:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java


> ATS Web UI breaks because of YARN-3467
> --
>
> Key: YARN-3766
> URL: https://issues.apache.org/jira/browse/YARN-3766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>.append(", 'mRender': renderHadoopDate }")
>.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>  if (isFairSchedulerPage) {
> -  sb.append("[11]");
> +  sb.append("[13]");
>  } else if (isResourceManager) {
> -  sb.append("[10]");
> +  sb.append("[12]");
>  } else {
>sb.append("[9]");
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574596#comment-14574596
 ] 

Hudson commented on YARN-3764:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> CapacityScheduler should forbid moving LeafQueue from one parent to another
> ---
>
> Key: YARN-3764
> URL: https://issues.apache.org/jira/browse/YARN-3764
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: YARN-3764.1.patch
>
>
> Currently CapacityScheduler doesn't handle the case well, for example:
> A queue structure:
> {code}
> root
>   |
>   a (100)
> /   \
>x y
>   (50)   (50)
> {code}
> And reinitialize using following structure:
> {code}
>  root
>  /   \ 
> (50)a x (50)
> |
> y
>(100)
> {code}
> The actual queue structure after reinitialize is:
> {code}
>  root
> /\
>a (50) x (50)
>   /  \
>  xy
> (50)  (100)
> {code}
> We should forbid admins from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574604#comment-14574604
 ] 

Hudson commented on YARN-2392:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # when an app fails the failure count is shown, but not what the global + 
> local limits are. If the two are different, they should both be printed. 
> # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574602#comment-14574602
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yar

[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574607#comment-14574607
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #208 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/208/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =
> For 12 jobs the AM gets allocated and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
> Expected
> ===
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574574#comment-14574574
 ] 

Hudson commented on YARN-3733:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =
> For 12 jobs the AM gets allocated and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
> Expected
> ===
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574567#comment-14574567
 ] 

Hudson commented on YARN-3766:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java


> ATS Web UI breaks because of YARN-3467
> --
>
> Key: YARN-3766
> URL: https://issues.apache.org/jira/browse/YARN-3766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>.append(", 'mRender': renderHadoopDate }")
>.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>  if (isFairSchedulerPage) {
> -  sb.append("[11]");
> +  sb.append("[13]");
>  } else if (isResourceManager) {
> -  sb.append("[10]");
> +  sb.append("[12]");
>  } else {
>sb.append("[9]");
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574569#comment-14574569
 ] 

Hudson commented on YARN-41:


SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn

[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574571#comment-14574571
 ] 

Hudson commented on YARN-2392:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt


> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # when an app fails the failure count is shown, but not what the global + 
> local limits are. If the two are different, they should both be printed. 
> # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574563#comment-14574563
 ] 

Hudson commented on YARN-3764:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2147 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2147/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt


> CapacityScheduler should forbid moving LeafQueue from one parent to another
> ---
>
> Key: YARN-3764
> URL: https://issues.apache.org/jira/browse/YARN-3764
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: YARN-3764.1.patch
>
>
> Currently CapacityScheduler doesn't handle the case well, for example:
> A queue structure:
> {code}
> root
>   |
>   a (100)
> /   \
>x y
>   (50)   (50)
> {code}
> And reinitialize using following structure:
> {code}
>  root
>  /   \ 
> (50)a x (50)
> |
> y
>(100)
> {code}
> The actual queue structure after reinitialize is:
> {code}
>  root
> /\
>a (50) x (50)
>   /  \
>  xy
> (50)  (100)
> {code}
> We should forbid admins from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3772) Let AMs to change the name of the application upon RM registration

2015-06-05 Thread JIRA
Zoltán Zvara created YARN-3772:
--

 Summary: Let AMs to change the name of the application upon RM 
registration
 Key: YARN-3772
 URL: https://issues.apache.org/jira/browse/YARN-3772
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Zoltán Zvara


Many applications like to set their name in their own way through their own 
internal API, but also want to display that name on YARN. Therefore it is not 
always possible to know the name of the application at submission time. YARN 
should let AMs change their name, at least the first time they 
register with the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval

2015-06-05 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3259:

Attachment: YARN-3259.003.patch

Addressed feedback

> FairScheduler: Update to fairShare could be triggered early on node events 
> instead of waiting for update interval 
> --
>
> Key: YARN-3259
> URL: https://issues.apache.org/jira/browse/YARN-3259
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3259.001.patch, YARN-3259.002.patch, 
> YARN-3259.003.patch
>
>
> Instead of waiting for the update interval unconditionally, we can trigger early 
> updates on important events, e.g. node join and leave.
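
As a rough illustration of the idea only (the class below is hypothetical, not part of the patch): the periodic update thread can wait on a signal that the node-added/node-removed handlers set, instead of always sleeping the full interval.
{code}
// Hypothetical sketch: node join/leave handlers call request(); the update
// thread calls awaitOrTimeout() and wakes up early when signalled.
class EarlyUpdateSignal {
  private final Object monitor = new Object();
  private boolean requested = false;

  void request() {                       // called on node added/removed
    synchronized (monitor) {
      requested = true;
      monitor.notifyAll();
    }
  }

  void awaitOrTimeout(long intervalMs) throws InterruptedException {
    synchronized (monitor) {
      if (!requested) {
        monitor.wait(intervalMs);        // returns early if request() fires
      }
      requested = false;
    }
  }
}
{code}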



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3770) SerializedException should also handle java.lang.Error

2015-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574439#comment-14574439
 ] 

Hadoop QA commented on YARN-3770:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  1s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 22s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 52s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 55s | Tests passed in 
hadoop-yarn-common. |
| | |  39m 58s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737919/YARN-3770.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 790a861 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8197/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8197/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8197/console |


This message was automatically generated.

> SerializedException should also handle java.lang.Error 
> ---
>
> Key: YARN-3770
> URL: https://issues.apache.org/jira/browse/YARN-3770
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3770.patch
>
>
> In the SerializedExceptionPBImpl deserialize() method:
> {code}
> Class classType = null;
> if (YarnException.class.isAssignableFrom(realClass)) {
>   classType = YarnException.class;
> } else if (IOException.class.isAssignableFrom(realClass)) {
>   classType = IOException.class;
> } else if (RuntimeException.class.isAssignableFrom(realClass)) {
>   classType = RuntimeException.class;
> } else {
>   classType = Exception.class;
> }
> return instantiateException(realClass.asSubclass(classType), getMessage(),
>   cause == null ? null : cause.deSerialize());
>   }
> {code}
> If realClass is a subclass of java.lang.Error, deSerialize() throws 
> ClassCastException.
> In the last else statement, classType should be set to Throwable.class 
> instead of Exception.class.
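
A minimal sketch of the change described above, differing from the quoted snippet only in the final branch:
{code}
// Sketch of the fix: fall back to Throwable so that java.lang.Error
// subclasses no longer fail in realClass.asSubclass().
Class classType = null;
if (YarnException.class.isAssignableFrom(realClass)) {
  classType = YarnException.class;
} else if (IOException.class.isAssignableFrom(realClass)) {
  classType = IOException.class;
} else if (RuntimeException.class.isAssignableFrom(realClass)) {
  classType = RuntimeException.class;
} else {
  classType = Throwable.class;  // was Exception.class
}
return instantiateException(realClass.asSubclass(classType), getMessage(),
    cause == null ? null : cause.deSerialize());
{code}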



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-05 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574421#comment-14574421
 ] 

Lavkesh Lahngir commented on YARN-3745:
---

[~zxu] Sorry my bad. It *must* throw ClassNotFoundException because there was 
no call to pb.init(cause); 

> SerializedException should also try to instantiate internal exception with 
> the default constructor
> --
>
> Key: YARN-3745
> URL: https://issues.apache.org/jira/browse/YARN-3745
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.patch
>
>
> While deserialising a SerializedException, it tries to create the internal 
> exception in instantiateException() with cn = 
> cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, it throws 
> NoSuchMethodException, 
> for example for the ClosedChannelException class.
> We should also try to instantiate the exception with the default constructor so that 
> the inner exception can be propagated.
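
A minimal sketch of the fallback described above; the helper name and signature are illustrative, not the actual patch.
{code}
// Illustrative fallback: prefer the (String) constructor, but if the class
// (e.g. ClosedChannelException) does not declare one, use the no-arg
// constructor and attach the cause so the inner exception still propagates.
private static <T extends Throwable> T newInstanceWithFallback(
    Class<? extends T> cls, String message, Throwable cause) throws Exception {
  T instance;
  try {
    instance = cls.getConstructor(String.class).newInstance(message);
  } catch (NoSuchMethodException e) {
    instance = cls.getConstructor().newInstance();
  }
  if (cause != null) {
    instance.initCause(cause);
  }
  return instance;
}
{code}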



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574402#comment-14574402
 ] 

Hadoop QA commented on YARN-3745:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 30s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 10s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| | |  50m 48s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737915/YARN-3745.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 790a861 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8196/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8196/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8196/console |


This message was automatically generated.

> SerializedException should also try to instantiate internal exception with 
> the default constructor
> --
>
> Key: YARN-3745
> URL: https://issues.apache.org/jira/browse/YARN-3745
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.patch
>
>
> While deserialising a SerializedException, it tries to create the internal 
> exception in instantiateException() with cn = 
> cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, it throws 
> NoSuchMethodException, 
> for example for the ClosedChannelException class.
> We should also try to instantiate the exception with the default constructor so that 
> the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-06-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3771:

Attachment: 0001-YARN-3771.patch

Attached the patch. Please review

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3771.patch
>
>
> I was going through some FindBugs rules. One issue reported there is that 
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH =
> do not honor the final qualifier: the string array contents can be 
> reassigned!
> Simple test
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>   public static void main(String[] args) {
> System.out.println(12 < 10);
> String[] t1={"u"};
> //t = t1; // this will show a compilation error
> t[0] = t1[0]; // But this works
>   }
> }
> {code}
> One option is to use Collections.unmodifiableList.
> Any thoughts?
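
A small self-contained illustration of the Collections.unmodifiableList option; the field name and entries below are made up for the example.
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnmodifiableClasspathExample {
  // 'final' only prevents reassigning the reference; array elements stay
  // writable. Wrapping the values in an unmodifiable List makes the contents
  // read-only as well.
  public static final List<String> DEFAULT_CLASSPATH_ENTRIES =
      Collections.unmodifiableList(Arrays.asList(".", "conf", "lib/*"));

  public static void main(String[] args) {
    // Throws UnsupportedOperationException instead of silently mutating:
    DEFAULT_CLASSPATH_ENTRIES.set(0, "hacked");
  }
}
{code}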



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler

2015-06-05 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574393#comment-14574393
 ] 

Rohith commented on YARN-3758:
--

All this confusion should probably be resolved by YARN-2986. This issue can 
be raised there to check whether it will be handled.

> The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
> working as expected in FairScheduler
> 
>
> Key: YARN-3758
> URL: https://issues.apache.org/jira/browse/YARN-3758
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: skrho
>
> Hello there~~
> I have 2 clusters.
> The first cluster has 5 nodes, 1 default application queue, the Capacity Scheduler, and 8 GB 
> physical memory per node.
> The second cluster has 10 nodes, 2 application queues, the Fair Scheduler, and 230 GB 
> physical memory per node.
> Whenever a mapreduce job is running, I want the ResourceManager to set the 
> minimum container memory to 256 MB.
> So I changed the configuration in yarn-site.xml & mapred-site.xml:
> yarn.scheduler.minimum-allocation-mb : 256
> mapreduce.map.java.opts : -Xms256m 
> mapreduce.reduce.java.opts : -Xms256m 
> mapreduce.map.memory.mb : 256 
> mapreduce.reduce.memory.mb : 256 
> In the first cluster, whenever a mapreduce job is running, I can see 256 MB of used memory 
> in the web console ( http://installedIP:8088/cluster/nodes ).
> But in the second cluster, whenever a mapreduce job is running, I can see 1024 MB of used 
> memory in the web console ( http://installedIP:8088/cluster/nodes ).
> I know the default memory value is 1024 MB, so if the memory setting is not changed, 
> the default value applies.
> I have been testing for two weeks, but I don't know why the minimum memory 
> setting is not working in the second cluster.
> Why does this difference happen? 
> Did I configure something wrong?
> Or is there a bug?
> Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574386#comment-14574386
 ] 

Hudson commented on YARN-41:


SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/949/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apa

[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574391#comment-14574391
 ] 

Hudson commented on YARN-3733:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/949/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =
> For 12 jobs the AM gets allocated and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
> Expected
> ===
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574381#comment-14574381
 ] 

Hudson commented on YARN-3764:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/949/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> CapacityScheduler should forbid moving LeafQueue from one parent to another
> ---
>
> Key: YARN-3764
> URL: https://issues.apache.org/jira/browse/YARN-3764
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: YARN-3764.1.patch
>
>
> Currently CapacityScheduler doesn't handle the case well, for example:
> A queue structure:
> {code}
> root
>   |
>   a (100)
> /   \
>x y
>   (50)   (50)
> {code}
> And reinitialize using following structure:
> {code}
>  root
>  /   \ 
> (50)a x (50)
> |
> y
>(100)
> {code}
> The actual queue structure after reinitialize is:
> {code}
>  root
> /\
>a (50) x (50)
>   /  \
>  xy
> (50)  (100)
> {code}
> We should forbid admins from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574388#comment-14574388
 ] 

Hudson commented on YARN-2392:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/949/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # when an app fails the failure count is shown, but not what the global + 
> local limits are. If the two are different, they should both be printed. 
> # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574385#comment-14574385
 ] 

Hudson commented on YARN-3766:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #949 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/949/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java


> ATS Web UI breaks because of YARN-3467
> --
>
> Key: YARN-3766
> URL: https://issues.apache.org/jira/browse/YARN-3766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>.append(", 'mRender': renderHadoopDate }")
>.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>  if (isFairSchedulerPage) {
> -  sb.append("[11]");
> +  sb.append("[13]");
>  } else if (isResourceManager) {
> -  sb.append("[10]");
> +  sb.append("[12]");
>  } else {
>sb.append("[9]");
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-06-05 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel moved HDFS-8526 to YARN-3771:
---

Key: YARN-3771  (was: HDFS-8526)
Project: Hadoop YARN  (was: Hadoop HDFS)

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>
> I was going through some FindBugs rules. One issue reported is that
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH =
> do not honor the final qualifier: the string array contents can be 
> reassigned!
> Simple test
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>   public static void main(String[] args) {
>     String[] t1 = { "u" };
>     // t = t1;      // this shows a compilation error
>     t[1] = t1[0];   // but this works: the array contents can be changed
>   }
> }
> }
> {code}
> One option is to use Collections.unmodifiableList (see the sketch below).
> Any thoughts?
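
For illustration, a minimal sketch of the Collections.unmodifiableList option mentioned above. The constant names here are made up for the example and are not the actual YarnConfiguration fields:

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClasspathDefaults {
  // Keep the backing array private so callers only ever see the read-only view.
  private static final String[] DEFAULT_ENTRIES = {
      "$HADOOP_CONF_DIR", "$HADOOP_COMMON_HOME/share/hadoop/common/*"
  };

  // Any mutation attempt on this view throws UnsupportedOperationException.
  public static final List<String> DEFAULT_CLASSPATH =
      Collections.unmodifiableList(Arrays.asList(DEFAULT_ENTRIES));

  public static void main(String[] args) {
    System.out.println(DEFAULT_CLASSPATH);
    // DEFAULT_CLASSPATH.set(0, "x"); // would throw UnsupportedOperationException
  }
}
{code}

The trade-off is an API change from String[] to List<String>, which is presumably why the issue asks for thoughts.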



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3770) SerializedException should also handle java.lang.Error

2015-06-05 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-3770:
--
Attachment: YARN-3770.patch

> SerializedException should also handle java.lang.Error 
> ---
>
> Key: YARN-3770
> URL: https://issues.apache.org/jira/browse/YARN-3770
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3770.patch
>
>
> In the SerializedExceptionPBImpl deserialize() method:
> {code}
> Class classType = null;
> if (YarnException.class.isAssignableFrom(realClass)) {
>   classType = YarnException.class;
> } else if (IOException.class.isAssignableFrom(realClass)) {
>   classType = IOException.class;
> } else if (RuntimeException.class.isAssignableFrom(realClass)) {
>   classType = RuntimeException.class;
> } else {
>   classType = Exception.class;
> }
> return instantiateException(realClass.asSubclass(classType), getMessage(),
>   cause == null ? null : cause.deSerialize());
>   }
> {code}
> If realClass is a subclass of java.lang.Error, deSerialize() throws 
> ClassCastException.
> In the last else branch, classType should be Throwable.class 
> instead of Exception.class.
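
For illustration, a simplified, self-contained sketch of the type-selection fallback (not the actual SerializedExceptionPBImpl; the YarnException branch is omitted so the snippet compiles on its own):

{code}
import java.io.IOException;

// Returning Throwable.class in the final branch means
// realClass.asSubclass(classType) also succeeds for java.lang.Error
// subclasses instead of throwing ClassCastException.
final class DeserializeSketch {
  static Class<? extends Throwable> pickClassType(Class<?> realClass) {
    if (IOException.class.isAssignableFrom(realClass)) {
      return IOException.class;
    } else if (RuntimeException.class.isAssignableFrom(realClass)) {
      return RuntimeException.class;
    } else {
      return Throwable.class; // was Exception.class, which breaks for Errors
    }
  }

  public static void main(String[] args) {
    // An Error subclass can now be narrowed without a ClassCastException.
    Class<? extends Throwable> t =
        OutOfMemoryError.class.asSubclass(pickClassType(OutOfMemoryError.class));
    System.out.println(t); // class java.lang.OutOfMemoryError
  }
}
{code}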



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574350#comment-14574350
 ] 

Hudson commented on YARN-2392:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # when an app fails the failure count is shown, but not what the global + 
> local limits are. If the two are different, they should both be printed. 
> # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574348#comment-14574348
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/s

[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574353#comment-14574353
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks 
> 5. Switch RM
> Actual
> =
> For 12 jobs the AM gets allocated and all 12 start running
> No other Yarn child is initiated, *all 12 jobs stay in the Running state forever*
> Expected
> ===
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)
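
For illustration, a hedged sketch of the general fallback idea (not the actual DominantResourceCalculator patch): when the cluster resource is empty, dominant shares would divide by zero, so a comparator can fall back to comparing raw resource values instead.

{code}
// Minimal sketch with illustrative names; real YARN resources carry more fields.
final class DominantCompareSketch {
  static int compare(long clusterMem, long clusterVcores,
      long lhsMem, long lhsVcores, long rhsMem, long rhsVcores) {
    if (clusterMem == 0 || clusterVcores == 0) {
      // No usable cluster capacity (e.g. right after an RM switch):
      // compare raw values rather than dominant shares.
      int byMem = Long.compare(lhsMem, rhsMem);
      return byMem != 0 ? byMem : Long.compare(lhsVcores, rhsVcores);
    }
    // Normal dominant-share comparison against the cluster resource.
    double lhsShare = Math.max((double) lhsMem / clusterMem,
        (double) lhsVcores / clusterVcores);
    double rhsShare = Math.max((double) rhsMem / clusterMem,
        (double) rhsVcores / clusterVcores);
    return Double.compare(lhsShare, rhsShare);
  }

  public static void main(String[] args) {
    // Empty cluster: falls back to raw comparison.
    System.out.println(compare(0, 0, 1024, 1, 2048, 1));     // negative
    // Normal case: dominant share of (2048 MB, 1 core) vs (1024 MB, 3 cores).
    System.out.println(compare(8192, 8, 2048, 1, 1024, 3));  // negative
  }
}
{code}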



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574343#comment-14574343
 ] 

Hudson commented on YARN-3764:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java


> CapacityScheduler should forbid moving LeafQueue from one parent to another
> ---
>
> Key: YARN-3764
> URL: https://issues.apache.org/jira/browse/YARN-3764
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: YARN-3764.1.patch
>
>
> Currently CapacityScheduler doesn't handle the case well, for example:
> A queue structure:
> {code}
>         root
>          |
>       a (100)
>       /     \
>   x (50)   y (50)
> {code}
> And reinitialize using the following structure:
> {code}
>         root
>        /    \
>   a (50)    x (50)
>     |
>   y (100)
> {code}
> The actual queue structure after reinitialize is:
> {code}
>          root
>         /    \
>    a (50)    x (50)
>     /  \
> x (50)  y (100)
> {code}
> We should forbid admins from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574347#comment-14574347
 ] 

Hudson commented on YARN-3766:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #219 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/219/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java


> ATS Web UI breaks because of YARN-3467
> --
>
> Key: YARN-3766
> URL: https://issues.apache.org/jira/browse/YARN-3766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>.append(", 'mRender': renderHadoopDate }")
>.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>  if (isFairSchedulerPage) {
> -  sb.append("[11]");
> +  sb.append("[13]");
>  } else if (isResourceManager) {
> -  sb.append("[10]");
> +  sb.append("[12]");
>  } else {
>sb.append("[9]");
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-05 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574332#comment-14574332
 ] 

Lavkesh Lahngir commented on YARN-3745:
---

The deSerialize() method throws ClassNotFoundException, which is wrapped in a 
YarnRuntimeException if there are class-loading issues (other tests have the 
same behaviour).
No other exception should be thrown for the test to pass.

> SerializedException should also try to instantiate internal exception with 
> the default constructor
> --
>
> Key: YARN-3745
> URL: https://issues.apache.org/jira/browse/YARN-3745
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.patch
>
>
> While deserialising a SerializedException, it tries to create the internal 
> exception in instantiateException() with cn = 
> cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, this throws 
> NoSuchMethodException (for example, the ClosedChannelException class).
> We should also try to instantiate the exception with the default constructor 
> so that the inner exception can be propagated (see the sketch below).
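
For illustration, a minimal sketch of the fallback idea with made-up names (not the attached patch): try the String constructor first and fall back to the no-arg constructor when it is missing.

{code}
import java.lang.reflect.Constructor;
import java.nio.channels.ClosedChannelException;

final class InstantiateSketch {
  static <T extends Throwable> T instantiate(Class<T> cls, String message,
      Throwable cause) throws Exception {
    T instance;
    try {
      // Preferred path: a (String) constructor that carries the message.
      Constructor<T> cn = cls.getConstructor(String.class);
      instance = cn.newInstance(message);
    } catch (NoSuchMethodException e) {
      // e.g. ClosedChannelException only has a no-arg constructor; the message
      // is lost here, which the real patch may handle differently.
      Constructor<T> cn = cls.getConstructor();
      instance = cn.newInstance();
    }
    if (cause != null) {
      instance.initCause(cause);
    }
    return instance;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(instantiate(ClosedChannelException.class, "closed", null));
  }
}
{code}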



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-05 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-3745:
--
Attachment: YARN-3745.2.patch

> SerializedException should also try to instantiate internal exception with 
> the default constructor
> --
>
> Key: YARN-3745
> URL: https://issues.apache.org/jira/browse/YARN-3745
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3745.1.patch, YARN-3745.2.patch, YARN-3745.patch
>
>
> While deserialising a SerializedException, it tries to create the internal 
> exception in instantiateException() with cn = 
> cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, this throws 
> NoSuchMethodException (for example, the ClosedChannelException class).
> We should also try to instantiate the exception with the default constructor 
> so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-05 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574308#comment-14574308
 ] 

Lavkesh Lahngir commented on YARN-3745:
---

sorry, typo: we don't need to declare it to be thrown. 

> SerializedException should also try to instantiate internal exception with 
> the default constructor
> --
>
> Key: YARN-3745
> URL: https://issues.apache.org/jira/browse/YARN-3745
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3745.1.patch, YARN-3745.patch
>
>
> While deserialising a SerializedException, it tries to create the internal 
> exception in instantiateException() with cn = 
> cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, this throws 
> NoSuchMethodException (for example, the ClosedChannelException class).
> We should also try to instantiate the exception with the default constructor 
> so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-05 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574306#comment-14574306
 ] 

Lavkesh Lahngir commented on YARN-3745:
---

Uh. Yes you are right cls.getConstructor() throws SecurityException, but we 
don't need to declared it to be thrown.
We need to only capture NoSuchMethodException.

> SerializedException should also try to instantiate internal exception with 
> the default constructor
> --
>
> Key: YARN-3745
> URL: https://issues.apache.org/jira/browse/YARN-3745
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3745.1.patch, YARN-3745.patch
>
>
> While deserialising a SerializedException, it tries to create the internal 
> exception in instantiateException() with cn = 
> cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, this throws 
> NoSuchMethodException (for example, the ClosedChannelException class).
> We should also try to instantiate the exception with the default constructor 
> so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched

2015-06-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574305#comment-14574305
 ] 

Bibin A Chundatt commented on YARN-3754:


[~rohithsharma] and [~sunilg], I have tried with a build containing YARN-3585 and 
YARN-3641; I am not able to reproduce the org.iq80.leveldb.DBException: Closed 
exception.



> Race condition when the NodeManager is shutting down and container is launched
> --
>
> Key: YARN-3754
> URL: https://issues.apache.org/jira/browse/YARN-3754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Sunil G
>Priority: Critical
> Attachments: NM.log
>
>
> A container is launched and returned to ContainerImpl after the NodeManager 
> has closed the DB connection, which results in 
> {{org.iq80.leveldb.DBException: Closed}}. 
> *Attaching the exception trace*
> {code}
> 2015-05-30 02:11:49,122 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Unable to update state store diagnostics for 
> container_e310_1432817693365_3338_01_02
> java.io.IOException: org.iq80.leveldb.DBException: Closed
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.iq80.leveldb.DBException: Closed
> at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
> at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
> ... 15 more
> {code}
> We can add a check for whether the DB is closed while we move the container 
> from the ACQUIRED state.
> As per the discussion in YARN-3585, I have added the same (see the sketch below).
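
For illustration, a hedged sketch of that kind of guard with illustrative names (not the actual NMLeveldbStateStoreService): skip the state-store write once the store has been closed instead of letting the DBException escape on the container-launch path.

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

class StateStoreSketch {
  private final AtomicBoolean closed = new AtomicBoolean(false);

  void close() {
    closed.set(true);
    // ... close the underlying LevelDB handle here ...
  }

  void storeContainerDiagnostics(String containerId, String diagnostics)
      throws IOException {
    if (closed.get()) {
      // Store is shutting down; skip the update rather than failing the
      // container launch with org.iq80.leveldb.DBException: Closed.
      return;
    }
    // ... db.put(key(containerId), bytes(diagnostics)) in the real service ...
  }
}
{code}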



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574263#comment-14574263
 ] 

Hadoop QA commented on YARN-2674:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 54s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 51s | The applied patch generated  1 
new checkstyle issues (total was 47, now 47). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 42s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   8m  8s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests |   6m  2s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |   1m 51s | Tests passed in 
hadoop-yarn-server-tests. |
| | |  56m 48s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737886/YARN-2674.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b2540f4 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8195/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8195/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8195/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8195/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8195/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8195/console |


This message was automatically generated.

> Distributed shell AM may re-launch containers if RM work preserving restart 
> happens
> ---
>
> Key: YARN-2674
> URL: https://issues.apache.org/jira/browse/YARN-2674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Chun Chen
>Assignee: Chun Chen
> Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch, 
> YARN-2674.4.patch, YARN-2674.5.patch
>
>
> Currently, if an RM work-preserving restart happens while distributed shell is 
> running, the distributed shell AM may re-launch all the containers, including 
> new/running/complete ones. We must make sure it won't re-launch the 
> running/complete containers.
> We need to remove allocated containers from 
> AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3770) SerializedException should also handle java.lang.Error

2015-06-05 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-3770:
--
Description: 
In the SerializedExceptionPBImpl deserialize() method:
{code}
Class classType = null;
if (YarnException.class.isAssignableFrom(realClass)) {
  classType = YarnException.class;
} else if (IOException.class.isAssignableFrom(realClass)) {
  classType = IOException.class;
} else if (RuntimeException.class.isAssignableFrom(realClass)) {
  classType = RuntimeException.class;
} else {
  classType = Exception.class;
}
return instantiateException(realClass.asSubclass(classType), getMessage(),
  cause == null ? null : cause.deSerialize());
  }
{code}
If realClass is a subclass of java.lang.Error, deSerialize() throws 
ClassCastException.
In the last else branch, classType should be Throwable.class instead 
of Exception.class.

  was:
In SerializedExceptionPBImpl
{code}
Class classType = null;
if (YarnException.class.isAssignableFrom(realClass)) {
  classType = YarnException.class;
} else if (IOException.class.isAssignableFrom(realClass)) {
  classType = IOException.class;
} else if (RuntimeException.class.isAssignableFrom(realClass)) {
  classType = RuntimeException.class;
} else {
  classType = Exception.class;
}
return instantiateException(realClass.asSubclass(classType), getMessage(),
  cause == null ? null : cause.deSerialize());
  }
{code}
If realClass is a subclass of java.lang.Error, deSerialize() throws 
ClassCastException.
In the last else branch, classType should be Throwable.class instead 
of Exception.class.


> SerializedException should also handle java.lang.Error 
> ---
>
> Key: YARN-3770
> URL: https://issues.apache.org/jira/browse/YARN-3770
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> In the SerializedExceptionPBImpl deserialize() method:
> {code}
> Class classType = null;
> if (YarnException.class.isAssignableFrom(realClass)) {
>   classType = YarnException.class;
> } else if (IOException.class.isAssignableFrom(realClass)) {
>   classType = IOException.class;
> } else if (RuntimeException.class.isAssignableFrom(realClass)) {
>   classType = RuntimeException.class;
> } else {
>   classType = Exception.class;
> }
> return instantiateException(realClass.asSubclass(classType), getMessage(),
>   cause == null ? null : cause.deSerialize());
>   }
> {code}
> If realClass is a subclass of java.lang.Error, deSerialize() throws 
> ClassCastException.
> In the last else branch, classType should be Throwable.class 
> instead of Exception.class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-05 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574228#comment-14574228
 ] 

Rohith commented on YARN-3017:
--

bq. Could you give a little more detail about the possibility to break the 
rolling upgrade?
I was wondering whether it causes any issue while parsing the containerId after 
an upgrade. Say the current container-id format is 
container_1430441527236_0001_01_01, which is running on NM-1; after the 
upgrade the container-id format changes to container_1430441527236_0001_01_01, 
but the NM still reports running containers as container_1430441527236_0001_01_01. 

> ContainerID in ResourceManager Log Has Slightly Different Format From 
> AppAttemptID
> --
>
> Key: YARN-3017
> URL: https://issues.apache.org/jira/browse/YARN-3017
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: MUFEED USMAN
>Priority: Minor
>  Labels: PatchAvailable
> Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
> masterappattempt_1412150883650_0001_02
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_02".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
> up
> container Container: [ContainerId: container_1412150883650_0001_02_01,
> NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource:  vCores:1,
> disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
> ...
> ...
> Curious to know whether this is kept like that for a reason. If not, then when
> using filtering tools to, say, grep events surrounding a specific attempt by the
> numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

