[jira] [Commented] (HADOOP-11890) Uber-JIRA: Hadoop should support IPv6

2022-06-17 Thread Sangjin Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-11890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555707#comment-17555707
 ] 

Sangjin Lee commented on HADOOP-11890:
--

Out of curiosity, where do we stand today? It seems like the last activity was 
about a year ago.

> Uber-JIRA: Hadoop should support IPv6
> -
>
> Key: HADOOP-11890
> URL: https://issues.apache.org/jira/browse/HADOOP-11890
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: net
>Reporter: Nate Edel
>Assignee: Nate Edel
>Priority: Major
>  Labels: ipv6
> Attachments: hadoop_2.7.3_ipv6_commits.txt
>
>
> Hadoop currently treats IPv6 as unsupported.  Track related smaller issues to 
> support IPv6.
> (Current case here is mainly HBase on HDFS, so any suggestions about other 
> test cases/workload are really appreciated.)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17728) Fix issue of the StatisticsDataReferenceCleaner cleanUp

2021-09-18 Thread Sangjin Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417206#comment-17417206
 ] 

Sangjin Lee commented on HADOOP-17728:
--

Rather than discussing what to change, I would respectfully ask the first 
question that needs to be answered.

*What is a real problem/issue that needs to be fixed?* What is the "anomaly" 
that needs to be fixed? IMO, I haven't heard a real problem yet.

So that we are on the same page, let me state what I believe is *NOT* a 
problem. The cleaner thread being in the blocked state is *not* a problem.
 * the cleaner thread will be blocked until a GC reference is enqueued, and 
that is by design; that's how the reference queue works
 * the cleaner thread will wake up whenever the data references get garbage 
collected; enqueueing is done by the JVM, so the fact that there is no Hadoop 
code that enqueues on the queue is not a problem
 * it is *NOT* true that the cleaner thread does not respond to interruption; 
please see the code for StatisticsDataReferenceCleaner.run()

I'd like to know if there is any *real-world* problem associated with the 
cleaner thread and its operations. For example, does it cause unexpected 
exceptions? Does it cause unexpected CPU spikes? Does it cause unexpected 
memory increase? Does it prevent the JVM from exiting when it needs to exit? 
Does it cause a deadlock? Let's understand first if there is a real-world 
problem. And let's show evidence for that real-world problem. I hope it makes 
sense. Thanks.
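
The points above can be illustrated with a minimal sketch. This is not the Hadoop StatisticsDataReferenceCleaner source: the class name CleanerSketch, the DATA object, and the explicit ref.enqueue() call (a stand-in for what the JVM does after a garbage collection) are assumptions made for illustration. It shows a thread that sits blocked in ReferenceQueue.remove() while there is nothing to clean and wakes up as soon as a reference is enqueued.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the Hadoop StatisticsDataReferenceCleaner source. */
final class CleanerSketch {
  // Kept strongly reachable so the GC never enqueues the reference on its own here.
  private static final Object DATA = new Object();

  /** Returns true if the idle cleaner woke up once a reference was enqueued. */
  static boolean wakesUpOnEnqueue() {
    ReferenceQueue<Object> queue = new ReferenceQueue<>();
    WeakReference<Object> ref = new WeakReference<>(DATA, queue);
    CountDownLatch cleaned = new CountDownLatch(1);

    Thread cleaner = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          queue.remove();       // blocks until a reference is enqueued
          cleaned.countDown();  // stand-in for the real cleanup work
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();  // restore the flag so the loop exits
        }
      }
    }, "cleaner-sketch");
    cleaner.setDaemon(true);
    cleaner.start();

    try {
      // Nothing has been enqueued yet, so the cleaner simply waits.
      boolean idle = !cleaned.await(200, TimeUnit.MILLISECONDS);
      // Stand-in for the JVM enqueueing the reference after a collection.
      ref.enqueue();
      boolean woke = cleaned.await(5, TimeUnit.SECONDS);
      cleaner.interrupt();  // stop the sketch thread
      return idle && woke;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("woke up on enqueue: " + wakesUpOnEnqueue());
  }
}
```

In this sketch the idle period is the normal state of the thread, not a fault: the loop has nothing to do until something lands on the queue.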

> Fix issue of the StatisticsDataReferenceCleaner cleanUp
> ---
>
> Key: HADOOP-17728
> URL: https://issues.apache.org/jira/browse/HADOOP-17728
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.2.1
>Reporter: yikf
>Assignee: yikf
>Priority: Minor
>  Labels: pull-request-available, reverted
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The cleaner thread will be blocked when we remove a reference from the 
> ReferenceQueue unless `queue.enqueue` is called.
> 
> As shown below, we now call ReferenceQueue.remove() during cleanUp. The call 
> chain is as follows:
> *StatisticsDataReferenceCleaner#queue.remove()  ->  ReferenceQueue.remove(0)  
> ->  lock.wait(0)*
> However, lock.notifyAll is called only from queue.enqueue, so the cleaner 
> thread will be blocked.
>  
> ThreadDump:
> {code:java}
> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x7f7afc088800 nid=0x2119 in Object.wait() [0x7f7b0023]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xc00c2f58> (a java.lang.ref.Reference$Lock)
>         at java.lang.Object.wait(Object.java:502)
>         at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
>         - locked <0xc00c2f58> (a java.lang.ref.Reference$Lock)
>         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (HADOOP-17728) Fix issue of the StatisticsDataReferenceCleaner cleanUp

2021-09-17 Thread Sangjin Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved HADOOP-17728.
--
Resolution: Invalid







[jira] [Commented] (HADOOP-17728) Fix issue of the StatisticsDataReferenceCleaner cleanUp

2021-09-17 Thread Sangjin Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416774#comment-17416774
 ] 

Sangjin Lee commented on HADOOP-17728:
--

It's still not clear to me what *real* problem is being pointed out. The fact 
that you see the stack trace with that thread in the waiting state is *not* a 
problem. In fact, I would expect to see that in almost all cases as this 
background thread gets busy only when the statistics data gets garbage 
collected. That is perfectly normal. So, if we're saying the presence of this 
stack trace is a problem, I can say it definitely is not.

On an earlier claim that this is not interruptible, it is most definitely 
interruptible. ReferenceQueue.remove() is interruptible and throws 
InterruptedException on interruption. The outer loop for 
StatisticsDataReferenceCleaner.run() clearly checks if the thread was 
interrupted and if so exits from the while loop. Please check it out.

Lastly, this is a background thread whose sole job is to clean up references 
upon garbage collection. This has no other interaction with any other thread or 
operations that may be going on. I'm not sure why that is being discussed as a 
problem.

Unless there is a clear demonstration of a real-life issue (not the stack 
trace), I am inclined to close this as "not an issue".
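
A minimal sketch of the interruptibility point, assuming a loop shaped like the one described above. The class and method names are hypothetical, and this is not the Hadoop source. The thread parks in ReferenceQueue.remove(), which is the state a thread dump reports as waiting, and leaves the loop once it is interrupted.

```java
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of an interruptible loop around ReferenceQueue.remove(). */
final class InterruptSketch {
  /** Returns true if a thread waiting in remove() exits after being interrupted. */
  static boolean exitsOnInterrupt() {
    ReferenceQueue<Object> queue = new ReferenceQueue<>();
    Thread cleaner = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          queue.remove();  // nothing is ever enqueued here, so this waits
        } catch (InterruptedException e) {
          // remove() threw because of the interrupt; restore the flag so that
          // the loop condition sees it and the thread exits in an orderly way.
          Thread.currentThread().interrupt();
        }
      }
    }, "interrupt-sketch");
    cleaner.setDaemon(true);
    cleaner.start();

    try {
      // Wait until the thread is in the state a thread dump would show.
      long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
      while (cleaner.getState() != Thread.State.WAITING
          && System.nanoTime() < deadline) {
        Thread.sleep(10);
      }
      boolean wasWaiting = cleaner.getState() == Thread.State.WAITING;

      cleaner.interrupt();
      cleaner.join(5000);
      return wasWaiting && !cleaner.isAlive();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("exited on interrupt: " + exitsOnInterrupt());
  }
}
```

Seeing the thread in the WAITING state and seeing it exit on interrupt are both expected in this sketch; neither indicates a hang.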







[jira] [Comment Edited] (HADOOP-17728) Fix issue of the StatisticsDataReferenceCleaner cleanUp

2021-06-13 Thread Sangjin Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362577#comment-17362577
 ] 

Sangjin Lee edited comment on HADOOP-17728 at 6/13/21, 9:09 PM:


This caught my attention (sorry I'm not very active on the Hadoop codebase 
these days).

I'm not quite sure what the problem statement here is. Is there a real problem 
that can be demonstrated with a reproducible test case? The reference queue 
gets enqueued not by user's explicit code but by the JVM via weak references in 
this case. The GC will enqueue the reference that's being garbage collected 
into the reference queue. That's why there is no code in the Hadoop codebase 
that enqueues objects explicitly to this queue.

The cleaner thread is essentially a daemon thread that needs to run for the 
duration of the runtime to handle this task. If there is no work to be done (no 
relevant threads to garbage collect), it will sit idle on the queue, which 
is correct behavior. If the program needs to exit and there is an interrupt, 
the cleaner thread *does* respond to the interrupt and does an orderly exit 
(see the while loop condition). So I'm still wondering what the real-world 
problem we're observing is.

It might be helpful to jog your memory on HADOOP-12107 and HADOOP-12958 for 
past analyses that went into this implementation.
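
To illustrate that no application code has to call enqueue, here is a small sketch in which the only thing that can put the reference on the queue is the JVM itself. The class and method names are hypothetical. It also assumes a JVM that acts on System.gc(), which is only a hint, so the loop retries and may still return false on some configurations.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

/** Hypothetical sketch: the JVM, not user code, enqueues a cleared weak reference. */
final class GcEnqueueSketch {
  /** Creates a referent that becomes unreachable as soon as this method returns. */
  private static WeakReference<byte[]> track(ReferenceQueue<byte[]> queue) {
    return new WeakReference<>(new byte[1024], queue);
  }

  /** Returns true if the GC enqueued our reference within the retry budget. */
  static boolean gcEnqueuesReference() {
    ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
    WeakReference<byte[]> ref = track(queue);
    try {
      for (int attempt = 0; attempt < 50; attempt++) {
        System.gc();  // only a hint; the JVM is free to ignore it
        // Wait up to 100 ms for the JVM to enqueue the cleared reference.
        Reference<? extends byte[]> enqueued = queue.remove(100);
        if (enqueued != null) {
          return enqueued == ref;  // note: enqueue() was never called by this code
        }
      }
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("enqueued by the JVM: " + gcEnqueuesReference());
  }
}
```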








[jira] [Comment Edited] (HADOOP-17728) Fix issue of the StatisticsDataReferenceCleaner cleanUp

2021-06-13 Thread Sangjin Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362577#comment-17362577
 ] 

Sangjin Lee edited comment on HADOOP-17728 at 6/13/21, 9:06 PM:


This caught my attention (sorry I'm not very active on the Hadoop codebase 
these days).

I'm not quite sure what the problem statement here is. Is there a real problem 
that can be demonstrated with a reproducible test case? The reference queue 
gets enqueued not by user's explicit code but by the JVM via weak references in 
this case. The GC will enqueue the reference that's being garbage collected 
into the reference queue. That's why there is no code in the Hadoop codebase 
that enqueues objects explicitly to this queue.

The cleaner thread is essentially a daemon thread that needs to run for the 
duration of the runtime to handle this task. If there is no work to be done (no 
relevant threads to garbage collect), it will sit idle on the queue, which 
is fine. If the program needs to exit and there is an interrupt, the cleaner 
thread *does* respond to the interrupt and does an orderly exit (see the while 
loop condition). So I'm still wondering what real-world problem we're 
observing.

It might be helpful to jog your memory on HADOOP-12107 and HADOOP-12958 for 
past analyses that went into this implementation.








[jira] [Commented] (HADOOP-17728) Fix issue of the StatisticsDataReferenceCleaner cleanUp

2021-06-13 Thread Sangjin Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362577#comment-17362577
 ] 

Sangjin Lee commented on HADOOP-17728:
--

This caught my attention (sorry I'm not very active in the Hadoop codebase 
lately).

I'm not quite sure what the problem statement here is. Is there a real problem 
that can be demonstrated with a reproducible test case? The reference queue 
gets enqueued not by user's explicit code but by the JVM via weak references in 
this case. The GC will enqueue the reference that's being garbage collected 
into the reference queue. That's why there is no code in the Hadoop codebase 
that enqueues objects explicitly to this queue.

The cleaner thread is essentially a daemon thread that needs to run for the 
duration of the runtime to handle this task. If there is no work to be done (no 
relevant threads to garbage collect), it will sit idle on the queue, which 
is fine. If the program needs to exit and there is an interrupt, the cleaner 
thread *does* respond to the interrupt and does an orderly exit (see the while 
loop condition). So I'm still wondering what real-world problems we're 
observing.

It might be helpful to jog your memory on HADOOP-12107 and HADOOP-12958 for 
past analyses that went into this.







[jira] [Comment Edited] (HADOOP-17728) Fix issue of the StatisticsDataReferenceCleaner cleanUp

2021-06-13 Thread Sangjin Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362577#comment-17362577
 ] 

Sangjin Lee edited comment on HADOOP-17728 at 6/13/21, 5:58 PM:


This caught my attention (sorry I'm not very active on the Hadoop codebase 
these days).

I'm not quite sure what the problem statement here is. Is there a real problem 
that can be demonstrated with a reproducible test case? The reference queue 
gets enqueued not by user's explicit code but by the JVM via weak references in 
this case. The GC will enqueue the reference that's being garbage collected 
into the reference queue. That's why there is no code in the Hadoop codebase 
that enqueues objects explicitly to this queue.

The cleaner thread is essentially a daemon thread that needs to run for the 
duration of the runtime to handle this task. If there is no work to be done (no 
relevant threads to garbage collect), it will sit idle on the queue, which 
is fine. If the program needs to exit and there is an interrupt, the cleaner 
thread *does* respond to the interrupt and does an orderly exit (see the while 
loop condition). So I'm still wondering what real-world problems we're 
observing.

It might be helpful to jog your memory on HADOOP-12107 and HADOOP-12958 for 
past analyses that went into this.








[jira] [Commented] (HADOOP-17462) Hadoop Client getRpcResponse May Return Wrong Result

2021-01-08 Thread Sangjin Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261562#comment-17261562
 ] 

Sangjin Lee commented on HADOOP-17462:
--

I might be coming in cold here, not having looked at this code in a long while, 
but if the {{done}} variable is read or written always with synchronization, 
that should make it safe without making it volatile. A cursory look seems to 
indicate that {{done}} is accessed only with the instance lock held. So making 
it volatile is not going to change things.

Perhaps there is a deadlock or other situation that's causing the pile-up?
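
As a rough illustration of the reasoning above, here is a sketch of a flag that is read and written only while holding the instance lock. CallSketch and its methods are hypothetical stand-ins and not the Hadoop Client.Call code. Because every access to done happens inside a synchronized method on the same object, the waiting thread observes the write without the field being volatile.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a call object; not the Hadoop Client.Call class. */
final class CallSketch {
  private boolean done;     // deliberately not volatile; guarded by "this"
  private String response;  // guarded by "this"

  /** Called by the receiving thread once the response has arrived. */
  synchronized void complete(String value) {
    response = value;
    done = true;
    notifyAll();  // wake up any thread waiting in awaitResponse()
  }

  /** Waits up to timeoutMillis; returns the response, or null on timeout. */
  synchronized String awaitResponse(long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!done) {  // read under the same lock that complete() writes under
      long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remaining <= 0) {
        return null;
      }
      wait(remaining);  // releases the lock while waiting, reacquires on wake-up
    }
    return response;
  }

  /** Completes a call from a second thread and returns what the waiter observed. */
  static String roundTrip() {
    CallSketch call = new CallSketch();
    Thread receiver = new Thread(() -> call.complete("pong"), "receiver-sketch");
    receiver.start();
    try {
      return call.awaitResponse(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("response: " + roundTrip());
  }
}
```

The visibility guarantee in this sketch comes from releasing and acquiring the same monitor, so adding volatile to done would not change its behavior.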

> Hadoop Client getRpcResponse May Return Wrong Result
> 
>
> Key: HADOOP-17462
> URL: https://issues.apache.org/jira/browse/HADOOP-17462
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java|Title=Client.java}
>   /** @return the rpc response or, in case of timeout, null. */
>   private Writable getRpcResponse(final Call call, final Connection connection,
>       final long timeout, final TimeUnit unit) throws IOException {
>     synchronized (call) {
>       while (!call.done) {
>         try {
>           AsyncGet.Util.wait(call, timeout, unit);
>           if (timeout >= 0 && !call.done) {
>             return null;
>           }
>         } catch (InterruptedException ie) {
>           Thread.currentThread().interrupt();
>           throw new InterruptedIOException("Call interrupted");
>         }
>       }
>       ...
>   static class Call {
>     final int id;       // call id
>     final int retry;    // retry count
>     ...
>     boolean done;       // true when call is done
>     ...
>   }
> {code}
> The {{done}} variable is not marked as {{volatile}}, so the thread that is 
> checking its status is free to cache the value and never reload it, even 
> though it is expected to be changed by a different thread.  The while loop may 
> be stuck waiting for the change while always looking at a cached value.  If 
> that happens, a timeout will occur and the method will return 'null'.
> In previous versions of Hadoop, there was no timeout at this level, so it 
> would cause an endless loop.  This is a really tough error to track down if it 
> happens.






[jira] [Updated] (HADOOP-14366) maven upgrade broke start-build-env.sh

2017-05-01 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-14366:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   Status: Resolved  (was: Patch Available)

Committed the patch to trunk. Thanks [~ajisakaa] for your contribution and 
[~aw] for the review!

> maven upgrade broke start-build-env.sh
> --
>
> Key: HADOOP-14366
> URL: https://issues.apache.org/jira/browse/HADOOP-14366
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha3
>Reporter: Allen Wittenauer
>Assignee: Akira Ajisaka
>Priority: Blocker
> Fix For: 3.0.0-alpha3
>
> Attachments: HADOOP-14366.01.patch, HADOOP-14366.02.patch
>
>
> The switch to maven 3.3 didn't put mvn in the command line path, breaking all 
> sorts of stuff.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Commented] (HADOOP-14366) maven upgrade broke start-build-env.sh

2017-05-01 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991594#comment-15991594
 ] 

Sangjin Lee commented on HADOOP-14366:
--

LGTM. I'll merge it shortly.

> maven upgrade broke start-build-env.sh
> --
>
> Key: HADOOP-14366
> URL: https://issues.apache.org/jira/browse/HADOOP-14366
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha3
>Reporter: Allen Wittenauer
>Assignee: Akira Ajisaka
>Priority: Blocker
> Attachments: HADOOP-14366.01.patch, HADOOP-14366.02.patch
>
>
> The switch to maven 3.3 didn't put mvn in the command line path, breaking all 
> sorts of stuff.






[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2017-04-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977587#comment-15977587
 ] 

Sangjin Lee commented on HADOOP-11656:
--

{quote}
Will the shaded client jar be published to maven central? If so, how will 
serious security bugs in shaded dependencies be handled? Seems like this would 
require publishing new Hadoop bug fix release for all branches impacted (e.g. 
3.0.X, 3.1.Y, etc) ASAP.
{quote}

This is an issue IMO. I didn't realize the shaded classes were bundled into the 
hadoop jars themselves and published to maven central. I wonder if it is 
feasible to keep shaded dependencies as individual jars but provide them only in 
the dist tarball. That way, we don't republish shaded dependencies to maven 
central. [~busbey], what do you think?

> Classpath isolation for downstream clients
> --
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Blocker
>  Labels: classloading, classpath, dependencies, scripts, shell
> Attachments: HADOOP-11656_proposal.md
>
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2017-04-18 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973641#comment-15973641
 ] 

Sangjin Lee commented on HADOOP-11656:
--

bq. I may be able to work around it by patching out the bundled deps for Fedora 
builds...

You can only do so much of that from your side. Maybe swapping 1.2.3 of something 
with 1.2.4 would work, but the Hadoop community cannot guarantee that things 
will work if the version jump is sufficiently large.

{quote}
 If I'm understanding the situation correctly, perhaps there's a better way to 
resolve dependency convergence issues? I've found that often, just following 
semantic versioning, and keeping dependencies reasonably up-to-date resolves 
most issues.
{quote}

As [~steve_l] noted, it's not as simple as adopting semantic versioning. 
I'd be the first to acknowledge that the current set of dependency versions is 
hopelessly outdated. But we have been very conservative for fear of causing 
more issues by upgrading for downstream users and frameworks. The 3.0 timeframe 
gives us a window to make these changes that will insulate downstream from 
Hadoop and vice versa.

bq. But right now we don't have any way to stop changes in Hadoop's 
dependencies from breaking things downstream.

I would disagree with this statement. Shading is only *one* mechanism of 
isolating classpaths. The other commonly used mechanism is to have an isolating 
classloader, like a servlet webapp classloader. Hadoop has had this for many 
years, and folks have been using it with success. And it doesn't involve 
rewriting classes at build time.

We've been working on making it stricter for 3.0 so that we can finally 
separate the Hadoop classpath from the user classpath, thereby freeing Hadoop 
to evolve its dependencies without worrying about users' dependencies. See 
HADOOP-13070 and HADOOP-13398. IMO, we should keep the isolating classloader 
feature for 3.0 and get that done for the container runtime at least.
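
To make the isolating-classloader idea concrete, here is a minimal sketch of a child-first classloader in Java. It is not Hadoop's {{ApplicationClassLoader}}; the class name {{ChildFirstClassLoader}} and the {{parentFirstPrefixes}} parameter are made up for illustration. It only shows the general technique: look in the user classpath first, and delegate to the parent for a configured set of packages or when the class is not found locally.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch of a child-first loader; not Hadoop's implementation. */
class ChildFirstClassLoader extends URLClassLoader {
    // Assumption: package prefixes that must always come from the parent (e.g. "java.").
    private final String[] parentFirstPrefixes;

    ChildFirstClassLoader(URL[] userClasspath, ClassLoader parent,
                          String... parentFirstPrefixes) {
        super(userClasspath, parent);
        this.parentFirstPrefixes = parentFirstPrefixes;
    }

    private boolean isParentFirst(String name) {
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Try the user classpath before the parent.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not on the user classpath; fall through to the parent.
                }
            }
            if (c == null) {
                // Standard parent-first delegation as the fallback.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With a loader along these lines, a user jar that bundles its own copy of a library is picked before the copy on the parent classpath, and no classes are rewritten at build time.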

> Classpath isolation for downstream clients
> --
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Blocker
>  Labels: classloading, classpath, dependencies, scripts, shell
> Attachments: HADOOP-11656_proposal.md
>
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.






[jira] [Commented] (HADOOP-13070) classloading isolation improvements for stricter dependencies

2017-04-07 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961283#comment-15961283
 ] 

Sangjin Lee commented on HADOOP-13070:
--

The main subtask (HADOOP-13398) has been in the review queue for a while, but 
hasn't gotten feedback yet. The work is active but needs more feedback. I'm 
fine with dropping the priority to major.

> classloading isolation improvements for stricter dependencies
> -
>
> Key: HADOOP-13070
> URL: https://issues.apache.org/jira/browse/HADOOP-13070
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: classloading-improvements-ideas.pdf, 
> classloading-improvements-ideas.v.2.pdf, 
> classloading-improvements-ideas-v.3.pdf, HADOOP-13070.poc.01.patch, lib.jar, 
> TestDriver.java, Test.java
>
>
> Related to HADOOP-11656, we would like to make a number of improvements in 
> terms of classloading isolation so that user-code can run safely without 
> worrying about dependency collisions with the Hadoop dependencies.
> By the same token, it should raise the quality of the user code and its 
> specified classpath so that users get clear signals if they specify incorrect 
> classpaths.
> This will contain a proposal that will include several improvements some of 
> which may not be backward compatible. As such, it should be targeted to the 
> next major revision of Hadoop.






[jira] [Commented] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2017-03-31 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951309#comment-15951309
 ] 

Sangjin Lee commented on HADOOP-13398:
--

Ping. Any feedback on this patch is greatly appreciated. Thanks!

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch, HADOOP-13398-HADOOP-13070.03.patch, 
> HADOOP-13398-HADOOP-13070.04.patch, hadoop-13398-notes.pdf
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).
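
As a rough illustration of the goal described above, the following Java sketch shows a loader that refuses to fall back to the parent classpath for anything outside an allow-list, and applies the same rule to {{getResources()}} (which is what {{ServiceLoader}} uses to find provider files). This is not the patch attached to this issue; the class name {{StrictIsolatingClassLoader}} and the {{exposedPrefixes}} allow-list are assumptions made for the example, and the real change may decide differently which requests are allowed through.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Enumeration;

/** Hypothetical sketch; not the HADOOP-13398 patch. */
class StrictIsolatingClassLoader extends URLClassLoader {
    // Assumption: prefixes the parent may serve (e.g. "java.", public API packages).
    private final String[] exposedPrefixes;

    StrictIsolatingClassLoader(URL[] userClasspath, ClassLoader parent,
                               String... exposedPrefixes) {
        super(userClasspath, parent);
        this.exposedPrefixes = exposedPrefixes;
    }

    private boolean isExposed(String dottedName) {
        for (String prefix : exposedPrefixes) {
            if (dottedName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        if (isExposed(name)) {
            // Allow-listed classes use normal parent-first delegation.
            return super.loadClass(name, resolve);
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                // User classpath only: no fallback to the parent, so a class
                // that exists only among the parent's dependencies is not visible.
                c = findClass(name);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    public Enumeration<URL> getResources(String name) throws IOException {
        // Resource names use '/', so normalize before checking the allow-list.
        if (isExposed(name.replace('/', '.'))) {
            return super.getResources(name);
        }
        // Only resources from the user classpath, e.g. META-INF/services files.
        return findResources(name);
    }
}
```

A complete version would also need to treat {{getResource()}} and {{getResourceAsStream()}} consistently; they are left out here to keep the sketch short.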






[jira] [Commented] (HADOOP-13866) Upgrade netty-all to 4.1.1.Final

2017-03-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894862#comment-15894862
 ] 

Sangjin Lee commented on HADOOP-13866:
--

Sounds like the 4.1 bump happened within 2.8, in which case I would agree that 
we want to stay with 4.0.x unless there is a compelling reason. 

The shading is great for 3.0, but I'm less enthusiastic about doing that for 
2.x and only for a single dependency. It would cause unnecessary scrambling 
on the users' part if they had been relying on it without realizing it.

> Upgrade netty-all to 4.1.1.Final
> 
>
> Key: HADOOP-13866
> URL: https://issues.apache.org/jira/browse/HADOOP-13866
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Blocker
> Attachments: HADOOP-13866.v1.patch, HADOOP-13866.v2.patch, 
> HADOOP-13866.v3.patch, HADOOP-13866.v4.patch, HADOOP-13866.v6.patch, 
> HADOOP-13866.v7.patch, HADOOP-13866.v8.patch, HADOOP-13866.v8.patch, 
> HADOOP-13866.v8.patch, HADOOP-13866.v9.patch
>
>
> netty-all 4.1.1.Final is a stable release which we should upgrade to.
> See bottom of HADOOP-12927 for related discussion.
> This issue was discovered because HBase 2.0 uses 4.1.1.Final of netty.
> When launching a MapReduce job from HBase, 
> /grid/0/hadoop/yarn/local/usercache/hbase/appcache/application_1479850535804_0008/container_e01_1479850535804_0008_01_05/mr-framework/hadoop/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar
>  (from hdfs) is ahead of the 4.1.1.Final jar (from hbase) on the classpath, 
> resulting in the following exception:
> {code}
> 2016-12-01 20:17:26,678 WARN [Default-IPC-NioEventLoopGroup-1-1] 
> io.netty.util.concurrent.DefaultPromise: An exception was thrown by 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete()
> java.lang.NoSuchMethodError: 
> io.netty.buffer.ByteBuf.retainedDuplicate()Lio/netty/buffer/ByteBuf;
> at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:272)
> at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:262)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
> at 
> io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
> {code}
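
A {{NoSuchMethodError}} like the one above usually means an older copy of the class won on the classpath. One quick way to confirm which jar a class actually came from is to ask its {{ProtectionDomain}} for the code source. The following is a small diagnostic sketch; the {{WhichJar}} class and its {{locationOf}} method are names made up for illustration.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper; prints where a class was loaded from. */
class WhichJar {
    static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader typically have no code source.
        return source == null
                ? "(bootstrap or unknown)"
                : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a class name such as io.netty.buffer.ByteBuf as the first argument;
        // without arguments the helper reports on itself.
        String name = args.length > 0 ? args[0] : WhichJar.class.getName();
        System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
    }
}
```

Run inside the failing container's classpath with {{io.netty.buffer.ByteBuf}} as the argument, this should show whether the 4.0.x or the 4.1.x jar supplied the class.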






[jira] [Updated] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2017-02-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13398:
-
Attachment: hadoop-13398-notes.pdf

Notes attached.

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch, HADOOP-13398-HADOOP-13070.03.patch, 
> HADOOP-13398-HADOOP-13070.04.patch, hadoop-13398-notes.pdf
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Commented] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2017-02-13 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864495#comment-15864495
 ] 

Sangjin Lee commented on HADOOP-13398:
--

Thanks Sean. I'll add a doc that summarizes the changes soon. Until then, you 
might want to look at [this 
one|https://issues.apache.org/jira/secure/attachment/12803266/classloading-improvements-ideas-v.3.pdf]
 attached to the parent JIRA (HADOOP-13070). That needs a little update as well 
but is still pretty accurate.

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch, HADOOP-13398-HADOOP-13070.03.patch, 
> HADOOP-13398-HADOOP-13070.04.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Commented] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2017-02-10 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861672#comment-15861672
 ] 

Sangjin Lee commented on HADOOP-13398:
--

Can I get some feedback or review on this one? Perhaps I could write a short 
document that explains the changes? I know this is not a common (or popular) 
area for review, but it'd go a long way if I can get some interest in this. 
Thanks!

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch, HADOOP-13398-HADOOP-13070.03.patch, 
> HADOOP-13398-HADOOP-13070.04.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Resolved] (HADOOP-14006) dkopp;[;;[']]]]][[pplkkkkkkkkkjjjjjjhhhhgggffdyhfiot9ify6u;;d[][dlf

2017-01-20 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved HADOOP-14006.
--
Resolution: Invalid

> dkopp;[;;['][[pplkjjgggffdyhfiot9ify6u;;d[][dlf
> ---
>
> Key: HADOOP-14006
> URL: https://issues.apache.org/jira/browse/HADOOP-14006
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kurt Ostfeld
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13961) mvn install fails on trunk

2017-01-10 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13961:
-
Attachment: HADOOP-13961.002.patch

Would something like this work? There is literally no difference between 
hadoop-kms.jar and hadoop-kms-classes.jar, and the former can be used as a 
dependency just fine.

> mvn install fails on trunk
> --
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: John Zhuge
>Priority: Blocker
> Attachments: HADOOP-13961.001.patch, HADOOP-13961.002.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 






[jira] [Commented] (HADOOP-13961) mvn install fails on trunk

2017-01-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813597#comment-15813597
 ] 

Sangjin Lee commented on HADOOP-13961:
--

Thanks for the patch [~jzhuge]!

Quick question: now that we've switched from war to jar for kms, do we need a 
separate classes.jar any more? Can hadoop-hdfs simply depend on the main kms 
jar? Is the main jar redundant with the classes.jar or are they different so 
that hadoop-hdfs should depend on classes.jar still?

> mvn install fails on trunk
> --
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: John Zhuge
>Priority: Blocker
> Attachments: HADOOP-13961.001.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 






[jira] [Commented] (HADOOP-13961) mvn install fails on trunk

2017-01-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812263#comment-15812263
 ] 

Sangjin Lee commented on HADOOP-13961:
--

I don't know about the hadoop-common issue [~jzhuge] is seeing, but I can 
reproduce the original problem pretty easily. With a clean trunk and an empty 
maven local cache the build fails:
{panel}
[ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
dependencies for project 
org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: The following 
artifacts could not be resolved: 
org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-SNAPSHOT, 
org.apache.hadoop:hadoop-kms:jar:tests:3.0.0-alpha2-SNAPSHOT: Could not find 
artifact org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-SNAPSHOT in 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
{panel}
And the problem does go away if I revert HADOOP-13597.

IMO the cause seems pretty clear. With HADOOP-13597, it appears that we stopped 
installing hadoop-kms-classes.jar and hadoop-kms-tests.jar 
(https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-kms/3.0.0-alpha2-SNAPSHOT/).
 But there are projects that depend on those classifiers (hadoop-hdfs being 
one).
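
The coordinates in the error message follow the groupId:artifactId:extension:classifier:version layout, so {{jar:classes}} denotes a separate artifact with the {{classes}} classifier rather than the main jar. The sketch below is a hypothetical helper (not part of Maven or Hadoop; {{ArtifactCoordinate}} is an invented name) that splits such a string and derives the file name that would have to exist in the repository, assuming the usual artifactId-version-classifier.extension naming.

```java
/** Hypothetical helper for illustration; not part of Maven or Hadoop. */
final class ArtifactCoordinate {
    final String groupId;
    final String artifactId;
    final String extension;
    final String classifier; // empty string means the main artifact
    final String version;

    private ArtifactCoordinate(String groupId, String artifactId, String extension,
                               String classifier, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.extension = extension;
        this.classifier = classifier;
        this.version = version;
    }

    /** Parses groupId:artifactId[:extension[:classifier]]:version. */
    static ArtifactCoordinate parse(String coords) {
        String[] p = coords.split(":");
        switch (p.length) {
            case 3:
                return new ArtifactCoordinate(p[0], p[1], "jar", "", p[2]);
            case 4:
                return new ArtifactCoordinate(p[0], p[1], p[2], "", p[3]);
            case 5:
                return new ArtifactCoordinate(p[0], p[1], p[2], p[3], p[4]);
            default:
                throw new IllegalArgumentException("Unexpected coordinates: " + coords);
        }
    }

    /** File name the repository would need to contain for this artifact. */
    String fileName() {
        String suffix = classifier.isEmpty() ? "" : "-" + classifier;
        return artifactId + "-" + version + suffix + "." + extension;
    }
}
```

For the coordinates in the error above, {{fileName()}} yields {{hadoop-kms-3.0.0-alpha2-SNAPSHOT-classes.jar}}, which is distinct from the main {{hadoop-kms-3.0.0-alpha2-SNAPSHOT.jar}}; if the build no longer installs the former, any module that declares the classifier cannot resolve it.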

> mvn install fails on trunk
> --
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Priority: Blocker
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 






[jira] [Commented] (HADOOP-13961) mvn install fails on trunk

2017-01-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810715#comment-15810715
 ] 

Sangjin Lee commented on HADOOP-13961:
--

Could it be related to HADOOP-13597?

> mvn install fails on trunk
> --
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Priority: Blocker
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 






[jira] [Commented] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2017-01-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801910#comment-15801910
 ] 

Sangjin Lee commented on HADOOP-13398:
--

Gentle ping to request feedback and comments. Thanks!

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch, HADOOP-13398-HADOOP-13070.03.patch, 
> HADOOP-13398-HADOOP-13070.04.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Commented] (HADOOP-13947) Add a pre-commit step to check LICENSE/NOTICE issue

2017-01-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795745#comment-15795745
 ] 

Sangjin Lee commented on HADOOP-13947:
--

This would be a great idea. Thanks Xiao!

> Add a pre-commit step to check LICENSE/NOTICE issue
> ---
>
> Key: HADOOP-13947
> URL: https://issues.apache.org/jira/browse/HADOOP-13947
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>







[jira] [Updated] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-12-16 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13398:
-
Attachment: HADOOP-13398-HADOOP-13070.04.patch

v.4 patch posted.

This implements support for {{ClassLoader.getResources()}} per the proposal 
mentioned above and on HADOOP-13070.

The patch is now pretty much ready. The only thing I haven't done is to 
merge the new unit test into {{TestApplicationClassLoader}} and reconcile them.

Again, your reviews and feedback would be greatly appreciated. Thanks!

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch, HADOOP-13398-HADOOP-13070.03.patch, 
> HADOOP-13398-HADOOP-13070.04.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Updated] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-12-15 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13398:
-
Attachment: HADOOP-13398-HADOOP-13070.03.patch

v.3 patch posted. Addressed checkstyle issues.

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch, HADOOP-13398-HADOOP-13070.03.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Commented] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-12-14 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750122#comment-15750122
 ] 

Sangjin Lee commented on HADOOP-13398:
--

The unit test failure is (obviously) unrelated. The new unit test passes.

I'll address the checkstyle issues soon.

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Updated] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-12-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13398:
-
Attachment: HADOOP-13398-HADOOP-13070.02.patch

v.2 patch posted.

I have completed a set of unit tests that actually test user classes and 
parent classes loading various classes and resources.

This is still work in progress. Still to be done:
- address {{ClassLoader.getResources()}} per 
[comment|https://issues.apache.org/jira/browse/HADOOP-13070?focusedCommentId=15742855=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15742855]
 in HADOOP-13070
- combine and reconcile this new unit test with the existing ones

Your reviews are greatly appreciated. I'd like to get feedback on the unit 
tests too. Thanks!

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch, 
> HADOOP-13398-HADOOP-13070.02.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Comment Edited] (HADOOP-13070) classloading isolation improvements for stricter dependencies

2016-12-12 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742855#comment-15742855
 ] 

Sangjin Lee edited comment on HADOOP-13070 at 12/12/16 8:14 PM:


I’ve been testing the latest patch (HADOOP-13398) and there seems to be one 
interesting issue with the stricter classpath isolation, and that has to do 
with the {{ServiceLoader}}.

(1) {{ServiceLoader}}
The {{ServiceLoader}} essentially uses the following pattern to load service 
classes dynamically:
{code}
Enumeration<URL> defs =
    classloader.getResources("META-INF/services/org.foo.ServiceInterface");
while (defs.hasMoreElements()) {
  URL def = defs.nextElement();
  // parse() stands for reading the provider class names listed in the service file
  Iterator<String> names = parse(def);
  while (names.hasNext()) {
    Class<?> c = Class.forName(names.next(), false, classloader);
    ServiceInterface si = (ServiceInterface) c.newInstance();
  }
}
{code}
First off, {{ClassLoader.getResources()}} will return all service files, 
*regardless of* where the service file is, either in the user classpath or the 
parent classpath (bit more discussion on {{ClassLoader.getResources()}} below).

Since all service files have been located and the calling class of 
{{Class.forName()}} is {{ServiceLoader}} which is a system class, all service 
classes will be successfully loaded, *regardless of* where the service class 
is, either in the user classpath or the parent classpath.

Technically this would represent an opportunity to circumvent the isolation and 
load stuff from the parent classpath. That said, we could still regard this as 
a variation of a “system facility providing a way to load things from both 
classpaths” case mentioned in the proposal (section 2-1).

I thought about plugging this possibility, but there doesn’t seem to be an 
unambiguous way to do this.

One approach I considered is to walk up the call stack to identify who’s 
calling {{ServiceLoader.load()}}. Suppose we use the calling class to enforce 
stricter loading. If a user class is the calling class, it would still find 
service files from both the user classpath and the parent classpath. However, as it 
iterates over the classes, it will fail to load a non-system parent class. This 
causes a *hard* failure in the iteration over the {{ServiceLoader}}.

On the other hand, we could try to determine somehow whether a certain service file is 
a “non-system parent service file” and not return that service file resource 
from {{ClassLoader.getResources()}} to begin with. However, the notion of a 
“non-system parent service file” is not well defined, and I don’t think there 
is a way to define this clearly.

I think the best way forward is to allow {{ServiceLoader}} to load services 
from both the user and the parent classpath. I’d love to hear your thoughts on 
this.
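
The "hard failure" mentioned above can be reproduced in isolation. The following is only a sketch under stated assumptions: it does not use {{ApplicationClassLoader}}, and the provider name org.example.HiddenProvider is made up. It approximates the situation in which a service file is visible through {{ClassLoader.getResources()}} but the class it names cannot be loaded, which is what a stricter loader would produce for a non-system parent class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

class ServiceLoaderHardFailure {

  /** Returns true if iterating over the ServiceLoader aborts with an error. */
  static boolean iterationFails() {
    try {
      // A service file that is visible to the loader but names a provider
      // class the loader cannot load (org.example.HiddenProvider is made up).
      Path root = Files.createTempDirectory("svc");
      Path services = Files.createDirectories(root.resolve("META-INF/services"));
      Files.write(services.resolve("java.lang.Runnable"),
          "org.example.HiddenProvider\n".getBytes(StandardCharsets.UTF_8));

      try (URLClassLoader loader = new URLClassLoader(new URL[] {root.toUri().toURL()})) {
        Iterator<Runnable> it = ServiceLoader.load(Runnable.class, loader).iterator();
        // The provider class is resolved lazily, inside hasNext()/next().
        while (it.hasNext()) {
          it.next();
        }
        return false;
      } catch (ServiceConfigurationError e) {
        // ServiceLoader reports an unloadable provider as an Error, not as a skipped entry.
        return true;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("iteration failed: " + iterationFails());
  }
}
```

Because the error surfaces from the iterator itself, a caller that does not catch {{ServiceConfigurationError}} loses the remaining providers as well, which is why filtering at class-loading time alone looks unattractive.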

(2) {{ClassLoader.getResources()}}
Currently {{ApplicationClassLoader}} does not override this. The javadoc for 
{{ClassLoader.getResources()}} states:
{noformat}
…
The search order is described in the documentation for getResource(String).
{noformat}
Since we do not override this today, we return the resources from the parent 
first and then from the child, which is not quite the same as what the javadoc 
indicates. So it seems to me that at minimum we want to change the order of 
resources so that it returns the child resources first.

The next question is whether it should return a (non-system) parent resource if 
a user class calls this method. We could tighten this to filter out non-system 
parent resources. I am leaning towards making that change.
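
As an illustration of the ordering change only (not of the filtering question), a child-first {{getResources()}} could look roughly like the sketch below. The class name ChildFirstResourceLoader and the marker.txt resource are hypothetical, and this is not the code in the attached patch.

```java
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical loader for illustration; not Hadoop's ApplicationClassLoader. */
class ChildFirstResourceLoader extends URLClassLoader {

  ChildFirstResourceLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  public Enumeration<URL> getResources(String name) throws IOException {
    List<URL> ordered = new ArrayList<>();
    // Resources on this loader's own URLs (the "child") come first...
    ordered.addAll(Collections.list(findResources(name)));
    // ...followed by whatever the parent chain can see.
    ClassLoader parent = getParent();
    if (parent != null) {
      ordered.addAll(Collections.list(parent.getResources(name)));
    }
    return Collections.enumeration(ordered);
  }

  /** Puts "marker.txt" in both a parent and a child directory and returns the lookup order. */
  static List<String> demoOrder() {
    try {
      Path parentDir = Files.createTempDirectory("parent");
      Path childDir = Files.createTempDirectory("child");
      Files.write(parentDir.resolve("marker.txt"), "parent".getBytes(StandardCharsets.UTF_8));
      Files.write(childDir.resolve("marker.txt"), "child".getBytes(StandardCharsets.UTF_8));

      try (URLClassLoader parent =
               new URLClassLoader(new URL[] {parentDir.toUri().toURL()}, null);
           ChildFirstResourceLoader child =
               new ChildFirstResourceLoader(new URL[] {childDir.toUri().toURL()}, parent)) {
        List<String> contents = new ArrayList<>();
        for (URL url : Collections.list(child.getResources("marker.txt"))) {
          byte[] bytes = Files.readAllBytes(Paths.get(url.toURI()));
          contents.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return contents;
      }
    } catch (IOException | URISyntaxException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demoOrder());
  }
}
```

Without the override, the inherited {{ClassLoader.getResources()}} would list the parent's copy before the child's.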

Thoughts? Feedback? Concerns?
cc [~busbey]




[jira] [Commented] (HADOOP-13070) classloading isolation improvements for stricter dependencies

2016-12-12 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742855#comment-15742855
 ] 

Sangjin Lee commented on HADOOP-13070:
--

I’ve been testing the latest patch (HADOOP-13398) and there seems to be one 
interesting issue with the stricter classpath isolation, and that has to do 
with the {{ServiceLoader}}.

(1) {{ServiceLoader}}
The {{ServiceLoader}} essentially uses the following pattern to load service 
classes dynamically:
{code}
Enumeration<URL> defs =
    classloader.getResources("META-INF/services/org.foo.ServiceInterface");
while (defs.hasMoreElements()) {
  URL def = defs.nextElement();
  // parse() stands for reading the provider class names listed in the service file
  Iterator<String> names = parse(def);
  while (names.hasNext()) {
    Class<?> c = Class.forName(names.next(), false, classloader);
    ServiceInterface si = (ServiceInterface) c.newInstance();
  }
}
{code}
First off, {{ClassLoader.getResources()}} will return all service files, 
*regardless of* where the service file is, either in the user classpath or the 
parent classpath (bit more discussion on {{ClassLoader.getResources()}} below).

Since all service files have been located and the calling class of 
{{Class.forName()}} is {{ServiceLoader}} which is a system class, all service 
classes will be successfully loaded, *regardless of* where the service class 
is, either in the user classpath or the parent classpath.

Technically this would represent an opportunity to circumvent the isolation and 
load stuff from the parent classpath. That said, we could still regard this as 
a variation of a “system facility providing a way to load things from both 
classpaths” case mentioned in the proposal (section 2-1).

I thought about plugging this possibility, but there doesn’t seem to be an 
unambiguous way to do this.

One approach I considered is to walk up the call stack to identify who’s 
calling {{ServiceLoader.load()}}. Suppose we use the calling class to enforce 
stricter loading. If a user class is the calling class, it would still find 
service files from both the user classpath and the parent classpath. However, as it 
iterates over the classes, it will fail to load a non-system parent class. This 
causes a *hard* failure in the iteration over the {{ServiceLoader}}.

On the other hand, we could try to determine somehow whether a certain service file is 
a “non-system parent service file” and not return that service file resource 
from {{ClassLoader.getResources()}} to begin with. However, the notion of a 
“non-system parent service file” is not well defined, and I don’t think there 
is a way to define this clearly.

I think the best way forward is to allow {{ServiceLoader}} to load services 
from both the user and the parent classpath. I’d love to hear your thoughts on 
this.

(2) {{ClassLoader.getResources()}}
Currently {{ApplicationClassLoader}} does not override this. The javadoc for 
{{ClassLoader.getResources()}} states:
{noformat}
…
The search order is described in the documentation for getResource(String).
{noformat}
Since we do not override this today, we return the resources from the parent 
first and then from the child, which is not quite the same as what the javadoc 
indicates. So it seems to me that at minimum we want to change the order of 
resources so that it returns the child resources first.

The next question is whether it should return a (non-system) parent resource if 
a user class calls this method. We could tighten this to filter out non-system 
parent resources. I am leaning towards making that change.

Thoughts? Feedback? Concerns?
cc [~busbey]


> classloading isolation improvements for stricter dependencies
> -
>
> Key: HADOOP-13070
> URL: https://issues.apache.org/jira/browse/HADOOP-13070
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13070.poc.01.patch, Test.java, TestDriver.java, 
> classloading-improvements-ideas-v.3.pdf, classloading-improvements-ideas.pdf, 
> classloading-improvements-ideas.v.2.pdf, lib.jar
>
>
> Related to HADOOP-11656, we would like to make a number of improvements in 
> terms of classloading isolation so that user-code can run safely without 
> worrying about dependency collisions with the Hadoop dependencies.
> By the same token, it should raise the quality of the user code and its 
> specified classpath so that users get clear signals if they specify incorrect 
> classpaths.
> This will contain a proposal that will include several improvements some of 
> which may not be backward compatible. As such, it should be targeted to the 
> next major revision of Hadoop.




[jira] [Updated] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-12-05 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13398:
-
Description: 
Today, a user class is able to trigger loading a class from Hadoop's 
dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
creates an implicit dependence from users' code on Hadoop's dependencies, and 
as a result dependency conflicts.

We should modify {{ApplicationClassLoader}} to prevent a user class from 
loading a class from the parent classpath.

This should also cover resource loading (including 
{{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).

  was:
Today, a user class is able to trigger loading a class from Hadoop's 
dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
creates an implicit dependence from users' code on Hadoop's dependencies, and 
as a result dependency conflicts.

We should modify {{ApplicationClassLoader}} to prevent a user class from 
loading a class from the parent classpath.

This should also cover resource loading (and as a corollary {{ServiceLoader}}).
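
For readers unfamiliar with the idea, the kind of restriction described above can be sketched as follows. This is a deliberately simplified, hypothetical loader (IsolatingClassLoader is an assumed name): it decides purely on the class name and a list of "system" prefixes, whereas the actual proposal also takes the calling class into account. It is not the attached patch.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified loader; not the patch attached to this issue. */
class IsolatingClassLoader extends URLClassLoader {
  private final List<String> systemPrefixes;

  IsolatingClassLoader(URL[] userClasspath, ClassLoader parent, List<String> systemPrefixes) {
    super(userClasspath, parent);
    this.systemPrefixes = systemPrefixes;
  }

  private boolean isSystemClass(String name) {
    for (String prefix : systemPrefixes) {
      if (name.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        if (isSystemClass(name)) {
          // System classes (e.g. the JDK) always come from the parent.
          c = getParent().loadClass(name);
        } else {
          // Everything else must be on the user classpath; no fallback to the parent.
          c = findClass(name);
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }

  /** Returns {system class loaded, non-system parent class blocked}. */
  static boolean[] demo() {
    try (IsolatingClassLoader loader = new IsolatingClassLoader(
        new URL[0], ClassLoader.getSystemClassLoader(), Arrays.asList("java."))) {
      boolean systemOk = loader.loadClass("java.util.ArrayList") == java.util.ArrayList.class;
      boolean blocked;
      try {
        // Not under "java." and not on the (empty) user classpath.
        loader.loadClass("javax.naming.Context");
        blocked = false;
      } catch (ClassNotFoundException e) {
        blocked = true;
      }
      return new boolean[] {systemOk, blocked};
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demo()));
  }
}
```

With this shape, a user jar that relies on a library only present on the parent classpath fails fast with {{ClassNotFoundException}} instead of silently picking up the parent's version.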


> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (including 
> {{ClassLoader.getResources()}} and as a corollary {{ServiceLoader}}).






[jira] [Commented] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-11-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688487#comment-15688487
 ] 

Sangjin Lee commented on HADOOP-13398:
--

I would greatly appreciate reviews. You can test this per 
[comment|https://issues.apache.org/jira/browse/HADOOP-13070?focusedCommentId=15330886=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15330886]
 on HADOOP-13070.

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (and as a corollary 
> {{ServiceLoader}}).






[jira] [Updated] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-11-22 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13398:
-
Attachment: HADOOP-13398-HADOOP-13070.01.patch

v.1 patch posted for review. This is still POC quality. It handles 
classloading and resource loading, but unit tests are not written yet. I also 
didn't test {{ServiceLoader}}.

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (and as a corollary 
> {{ServiceLoader}}).






[jira] [Updated] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-11-22 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13398:
-
Status: Patch Available  (was: In Progress)

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13398-HADOOP-13070.01.patch
>
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (and as a corollary 
> {{ServiceLoader}}).






[jira] [Updated] (HADOOP-13070) classloading isolation improvements for stricter dependencies

2016-11-22 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13070:
-
Summary: classloading isolation improvements for stricter dependencies  
(was: classloading isolation improvements for cleaner and stricter dependencies)

> classloading isolation improvements for stricter dependencies
> -
>
> Key: HADOOP-13070
> URL: https://issues.apache.org/jira/browse/HADOOP-13070
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: HADOOP-13070.poc.01.patch, Test.java, TestDriver.java, 
> classloading-improvements-ideas-v.3.pdf, classloading-improvements-ideas.pdf, 
> classloading-improvements-ideas.v.2.pdf, lib.jar
>
>
> Related to HADOOP-11656, we would like to make a number of improvements in 
> terms of classloading isolation so that user-code can run safely without 
> worrying about dependency collisions with the Hadoop dependencies.
> By the same token, it should raise the quality of the user code and its 
> specified classpath so that users get clear signals if they specify incorrect 
> classpaths.
> This will contain a proposal that will include several improvements some of 
> which may not be backward compatible. As such, it should be targeted to the 
> next major revision of Hadoop.






[jira] [Work started] (HADOOP-13398) prevent user classes from loading classes in the parent classpath with ApplicationClassLoader

2016-11-22 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-13398 started by Sangjin Lee.

> prevent user classes from loading classes in the parent classpath with 
> ApplicationClassLoader
> -
>
> Key: HADOOP-13398
> URL: https://issues.apache.org/jira/browse/HADOOP-13398
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>
> Today, a user class is able to trigger loading a class from Hadoop's 
> dependencies, with or without the use of {{ApplicationClassLoader}}, and it 
> creates an implicit dependence from users' code on Hadoop's dependencies, and 
> as a result dependency conflicts.
> We should modify {{ApplicationClassLoader}} to prevent a user class from 
> loading a class from the parent classpath.
> This should also cover resource loading (and as a corollary 
> {{ServiceLoader}}).






[jira] [Commented] (HADOOP-11804) POC Hadoop Client w/o transitive dependencies

2016-11-18 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677212#comment-15677212
 ] 

Sangjin Lee commented on HADOOP-11804:
--

I'll take a look at it soon. It would be great to have a reproducible case 
with which we can easily replicate the problem.

If it is related to the ServiceLoader, you want to look at the service 
files from which the ServiceLoader loads classes via reflection (e.g. 
META-INF/services/org.apache.hadoop.fs.FileSystem). The class names in these 
files need to match the actual class names in the jar. If the actual class was 
shaded, the content of these files needs to be shaded too.
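
One way to check this mechanically is sketched below. The helper name ServiceFileCheck and the provider name org.example.UnshadedName are made up for illustration; the sketch only assumes the standard META-INF/services file format (one class name per line, {{#}} starting a comment) and reports every listed name that the given class loader cannot load.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ServiceFileCheck {

  /** Returns provider names in META-INF/services/<service> that the loader cannot load. */
  static List<String> unresolvable(ClassLoader loader, String service) throws IOException {
    List<String> missing = new ArrayList<>();
    for (URL url : Collections.list(loader.getResources("META-INF/services/" + service))) {
      try (BufferedReader reader = new BufferedReader(
          new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
          int hash = line.indexOf('#');  // everything after '#' is a comment
          String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
          if (name.isEmpty()) {
            continue;
          }
          try {
            Class.forName(name, false, loader);
          } catch (ClassNotFoundException | LinkageError e) {
            missing.add(name);  // e.g. a name that was not rewritten when the class was shaded
          }
        }
      }
    }
    return missing;
  }

  /** Builds a throwaway service file with one loadable and one unloadable provider name. */
  static List<String> demo() {
    try {
      Path root = Files.createTempDirectory("shaded");
      Path services = Files.createDirectories(root.resolve("META-INF/services"));
      Files.write(services.resolve("java.lang.Runnable"),
          Arrays.asList("# providers", "java.lang.Thread", "org.example.UnshadedName"),
          StandardCharsets.UTF_8);
      try (URLClassLoader loader =
               new URLClassLoader(new URL[] {root.toUri().toURL()}, null)) {
        return unresolvable(loader, "java.lang.Runnable");
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("unresolvable providers: " + demo());
  }
}
```

Pointing {{unresolvable()}} at a class loader over the shaded jar, with org.apache.hadoop.fs.FileSystem as the service name, should return an empty list if the service files were relocated together with the classes.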

> POC Hadoop Client w/o transitive dependencies
> -
>
> Key: HADOOP-11804
> URL: https://issues.apache.org/jira/browse/HADOOP-11804
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Attachments: HADOOP-11804.1.patch, HADOOP-11804.2.patch, 
> HADOOP-11804.3.patch, HADOOP-11804.4.patch, HADOOP-11804.5.patch, 
> HADOOP-11804.6.patch, HADOOP-11804.7.patch, HADOOP-11804.8.patch, 
> HADOOP-11804.9.patch
>
>
> make a hadoop-client-api and hadoop-client-runtime that e.g. HBase can use to 
> talk with a Hadoop cluster without seeing any of the implementation 
> dependencies.
> see proposal on parent for details.






[jira] [Commented] (HADOOP-11804) POC Hadoop Client w/o transitive dependencies

2016-11-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649098#comment-15649098
 ] 

Sangjin Lee commented on HADOOP-11804:
--

Oh got you. I read it too quickly.

> POC Hadoop Client w/o transitive dependencies
> -
>
> Key: HADOOP-11804
> URL: https://issues.apache.org/jira/browse/HADOOP-11804
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Attachments: HADOOP-11804.1.patch, HADOOP-11804.2.patch, 
> HADOOP-11804.3.patch, HADOOP-11804.4.patch, HADOOP-11804.5.patch, 
> HADOOP-11804.6.patch, HADOOP-11804.7.patch
>
>
> make a hadoop-client-api and hadoop-client-runtime that e.g. HBase can use to 
> talk with a Hadoop cluster without seeing any of the implementation 
> dependencies.
> see proposal on parent for details.






[jira] [Commented] (HADOOP-11804) POC Hadoop Client w/o transitive dependencies

2016-11-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649050#comment-15649050
 ] 

Sangjin Lee commented on HADOOP-11804:
--

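The javac compiler inlines compile-time constants ({{final}} fields of primitive 
or {{String}} type initialized with constant expressions) into the classes that 
use them.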
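
A small, self-contained way to observe this is sketched below (class and field names are made up). Reading a compile-time constant does not touch the declaring class at runtime, so its static initializer does not run; reading a {{static final}} field whose initializer is not a constant expression does.

```java
class ConstantInlining {
  static boolean holderInitialized = false;
  private static boolean[] cached;

  static class Holder {
    // Compile-time constant: javac copies the value into each use site.
    static final String INLINED = "hadoop";
    // Not a constant expression, so uses are compiled as real field reads.
    static final String NOT_INLINED = String.valueOf("hadoop");

    static {
      holderInitialized = true;
    }
  }

  /**
   * Returns {Holder initialized after reading INLINED,
   *          Holder initialized after reading NOT_INLINED}.
   * The observation is made once and cached, since a class is initialized only once.
   */
  static synchronized boolean[] demo() {
    if (cached == null) {
      String a = Holder.INLINED;      // no field access at runtime
      boolean afterInlined = holderInitialized;
      String b = Holder.NOT_INLINED;  // field read triggers Holder's static initializer
      boolean afterFieldRead = holderInitialized;
      cached = new boolean[] {afterInlined, afterFieldRead};
    }
    return cached.clone();
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println("initialized after INLINED read: " + r[0]);
    System.out.println("initialized after NOT_INLINED read: " + r[1]);
  }
}
```

The practical consequence for packaging is that a class compiled against such a constant keeps the old value until it is recompiled, even if the jar that declares the constant is replaced.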

> POC Hadoop Client w/o transitive dependencies
> -
>
> Key: HADOOP-11804
> URL: https://issues.apache.org/jira/browse/HADOOP-11804
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Attachments: HADOOP-11804.1.patch, HADOOP-11804.2.patch, 
> HADOOP-11804.3.patch, HADOOP-11804.4.patch, HADOOP-11804.5.patch, 
> HADOOP-11804.6.patch, HADOOP-11804.7.patch
>
>
> make a hadoop-client-api and hadoop-client-runtime that e.g. HBase can use to 
> talk with a Hadoop cluster without seeing any of the implementation 
> dependencies.
> see proposal on parent for details.






[jira] [Commented] (HADOOP-11804) POC Hadoop Client w/o transitive dependencies

2016-11-01 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626288#comment-15626288
 ] 

Sangjin Lee commented on HADOOP-11804:
--

Thanks for the work [~busbey]! I just did a quick test with the latest patch.

One high-level concern is maintaining dependencies in the poms. If 
a developer adds a new dependency to a module, how would that propagate to 
these client poms? Would he/she need to add it to these client poms for the 
most part? It wasn't entirely clear to me what the cost of maintenance is. If 
that is the only way to keep it clean, that's OK. But it would be great if that 
cost is kept to a minimum.

1.
The patch indeed does not apply for me via plain {{git apply}}: it breaks with 
{{hadoop-client/pom.xml}} and {{hadoop-maven-plugins/pom.xml}}. I did {{git 
apply --reject HADOOP-11804.1.patch}}.

2.
Once I fixed the git apply issues, I did {{mvn clean install package -Pdist 
-DskipTests -Dmaven.javadoc.skip}} and it fails right away:
{noformat}
[ERROR]   The project org.apache.hadoop:hadoop-client-minicluster:3.0.0-alpha2-SNAPSHOT (/Users/sjlee/git/hadoop-trunk/hadoop-client-modules/hadoop-client-minicluster/pom.xml) has 1 error
[ERROR] 'dependencies.dependency.version' for org.mortbay.jetty:jetty:jar is missing. @ line 266, column 17
{noformat}

I got past it by providing a version for this (chose 6.1.26).

3.
The build still fails with a couple of duplicate classes issues. One is what 
Andrew reported above. Another is duplicate jetty classes.
{noformat}
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
Duplicate classes found:

  Found in:
    org.apache.hadoop:hadoop-client-runtime:jar:3.0.0-alpha2-SNAPSHOT:compile
    org.apache.hadoop:hadoop-client-minicluster:jar:3.0.0-alpha2-SNAPSHOT:compile
  Duplicate classes:
    org/apache/hadoop/shaded/org/eclipse/jetty/io/ssl/SslConnection$2.class
    org/apache/hadoop/shaded/org/eclipse/jetty/server/RequestLog.class
    org/apache/hadoop/shaded/org/eclipse/jetty/server/ResourceCache$1.class
    org/apache/hadoop/shaded/org/eclipse/jetty/util/log/AbstractLogger.class
    org/apache/hadoop/shaded/org/eclipse/jetty/util/annotation/Name.class
    org/apache/hadoop/shaded/org/eclipse/jetty/util/component/LifeCycle.class
    org/apache/hadoop/shaded/org/eclipse/jetty/server/HttpChannel$Commit100Callback.class
    org/apache/hadoop/shaded/org/eclipse/jetty/util/ssl/SslContextFactory$1.class
    ...
{noformat}

4.
Was there a significant difficulty in handling the timeline service v.2? Is it 
just the number of new dependencies we’re pulling in or the fact that there is 
an HBase dependency?

5.
Regarding the logging libraries, I agree we probably want to exclude them. 
Things like log4j properties and the way slf4j works can cause issues down the 
road if shaded.


> POC Hadoop Client w/o transitive dependencies
> -
>
> Key: HADOOP-11804
> URL: https://issues.apache.org/jira/browse/HADOOP-11804
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Attachments: HADOOP-11804.1.patch, HADOOP-11804.2.patch, 
> HADOOP-11804.3.patch, HADOOP-11804.4.patch
>
>
> make a hadoop-client-api and hadoop-client-runtime that e.g. HBase can use to 
> talk with a Hadoop cluster without seeing any of the implementation 
> dependencies.
> see proposal on parent for details.






[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-10-31 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623648#comment-15623648
 ] 

Sangjin Lee commented on HADOOP-13410:
--

Opened HADOOP-13776.

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Fix For: 3.0.0-alpha1
>
> Attachments: HADOOP-13410.001.patch, HADOOP-13410.002.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.
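
The effect described in the quoted issue can be reproduced outside Hadoop. The sketch below is an illustration under stated assumptions, not RunJar itself: a temporary directory stands in for the unpacked work directory, a freshly written jar stands in for the job jar, and job.properties is a made-up entry present in both.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class DuplicateClasspathEntries {

  /** Counts the copies of "job.properties" visible to a loader over {unpacked dir, jar}. */
  static int copiesVisible() {
    try {
      byte[] content = "key=value\n".getBytes(StandardCharsets.UTF_8);

      // Stand-in for the unpacked work directory.
      Path workDir = Files.createTempDirectory("unjar");
      Files.write(workDir.resolve("job.properties"), content);

      // Stand-in for the original job jar, containing the same entry.
      Path jar = Files.createTempFile("job", ".jar");
      try (OutputStream out = Files.newOutputStream(jar);
           JarOutputStream jos = new JarOutputStream(out)) {
        jos.putNextEntry(new JarEntry("job.properties"));
        jos.write(content);
        jos.closeEntry();
      }

      // Both forms on the classpath, as in the snippet quoted above.
      URL[] classPath = {workDir.toUri().toURL(), jar.toUri().toURL()};
      try (URLClassLoader loader = new URLClassLoader(classPath, null)) {
        return Collections.list(loader.getResources("job.properties")).size();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("copies visible: " + copiesVisible());
  }
}
```

Code that expects exactly one match from {{ClassLoader.getResources()}} would see both copies in this setup, which is the kind of stricter code the description refers to.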






[jira] [Updated] (HADOOP-13776) remove redundant classpath entries in RunJar

2016-10-31 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13776:
-
Status: Patch Available  (was: Open)

> remove redundant classpath entries in RunJar
> 
>
> Key: HADOOP-13776
> URL: https://issues.apache.org/jira/browse/HADOOP-13776
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: HADOOP-13776.01.patch
>
>
> Today when you run a "hadoop jar" command, the content of the jar gets added 
> to the classpath twice, once in the jar form, and again in an unpacked form.
> We should include the content of the jar to the classpath only once. We 
> should keep the jar in the classpath (to support {{setJarByClass}} and other 
> useful use cases) but remove the root of the unpacked directory.






[jira] [Updated] (HADOOP-13776) remove redundant classpath entries in RunJar

2016-10-31 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13776:
-
Attachment: HADOOP-13776.01.patch

> remove redundant classpath entries in RunJar
> 
>
> Key: HADOOP-13776
> URL: https://issues.apache.org/jira/browse/HADOOP-13776
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: HADOOP-13776.01.patch
>
>
> Today when you run a "hadoop jar" command, the content of the jar gets added 
> to the classpath twice, once in the jar form, and again in an unpacked form.
> We should include the content of the jar to the classpath only once. We 
> should keep the jar in the classpath (to support {{setJarByClass}} and other 
> useful use cases) but remove the root of the unpacked directory.






[jira] [Created] (HADOOP-13776) remove redundant classpath entries in RunJar

2016-10-31 Thread Sangjin Lee (JIRA)
Sangjin Lee created HADOOP-13776:


 Summary: remove redundant classpath entries in RunJar
 Key: HADOOP-13776
 URL: https://issues.apache.org/jira/browse/HADOOP-13776
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Reporter: Sangjin Lee
Assignee: Sangjin Lee


Today when you run a "hadoop jar" command, the content of the jar gets added to 
the classpath twice, once in the jar form, and again in an unpacked form.

We should include the content of the jar to the classpath only once. We should 
keep the jar in the classpath (to support {{setJarByClass}} and other useful 
use cases) but remove the root of the unpacked directory.
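
As a rough illustration of the proposal, the classpath list could be assembled as in the sketch below. This is an assumption about the shape of the change, not the contents of HADOOP-13776.01.patch; the class and method names are hypothetical. The jar stays on the list, the `classes/` directory and the `lib` entries stay, and the root of the unpacked directory is left out.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RunJarClasspathSketch {

  /**
   * Builds a classpath that contains the jar once and omits the root of the
   * unpacked work directory. Hypothetical helper, for illustration only.
   */
  public static List<URL> buildClassPath(File jar, File workDir) {
    try {
      List<URL> classPath = new ArrayList<>();
      // Keep the jar itself so that jar-based lookups keep working.
      classPath.add(jar.toURI().toURL());
      // Deliberately no entry for workDir + "/": that would duplicate the jar.
      classPath.add(new File(workDir, "classes/").toURI().toURL());
      File[] libs = new File(workDir, "lib").listFiles();
      if (libs != null) {
        for (File lib : libs) {
          classPath.add(lib.toURI().toURL());
        }
      }
      return classPath;
    } catch (MalformedURLException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    File workDir = new File(System.getProperty("java.io.tmpdir"), "unpacked");
    File jar = new File(workDir.getParentFile(), "job.jar");
    for (URL url : buildClassPath(jar, workDir)) {
      System.out.println(url);
    }
  }
}
```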






[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-10-31 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623442#comment-15623442
 ] 

Sangjin Lee commented on HADOOP-13410:
--

Sorry, it is a bit confusing.

The original commit for this JIRA did make alpha 1 (although the fix version 
was not marked as such because I mistakenly thought it was committed too late 
for alpha 1).

Then [~bibinchundatt] found that it breaks MR job submission, and it was 
reverted in HADOOP-13620, which is in alpha 2 (the issue was discovered after 
alpha 1 was released).

I then reopened this JIRA to fix the original issue correctly. That's what the 
2nd patch is about.

[~andrew.wang], I take it that your suggestion is not to reuse this JIRA but to 
open a new one to fix this issue correctly?

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Fix For: 3.0.0-alpha1
>
> Attachments: HADOOP-13410.001.patch, HADOOP-13410.002.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Updated] (HADOOP-9424) The "hadoop jar" invocation should include the passed jar on the classpath as a whole

2016-10-31 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-9424:

Labels:   (was: BB2015-05-TBR)

> The "hadoop jar" invocation should include the passed jar on the classpath as 
> a whole
> -
>
> Key: HADOOP-9424
> URL: https://issues.apache.org/jira/browse/HADOOP-9424
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Affects Versions: 2.0.3-alpha
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HADOOP-9424.patch
>
>
> When you have a case such as this:
> {{X.jar -> Classes = Main, Foo}}
> {{Y.jar -> Classes = Bar}}
> With implementation details such as:
> * Main references Bar and invokes a public, static method on it.
> * Bar does a class lookup to find Foo (Class.forName("Foo")).
> Then when you run {{HADOOP_CLASSPATH=Y.jar hadoop jar X.jar Main}}, Bar's 
> method fails with a ClassNotFoundException because of the way RunJar runs.
> RunJar extracts the passed jar and includes its contents on the ClassLoader 
> of its current thread, but the {{Class.forName(…)}} call from another class 
> does not check that class loader and hence cannot find the class, as it's not 
> on any classpath it is aware of.
> The script of "hadoop jar" should ideally include the passed jar argument to 
> the CLASSPATH before RunJar is invoked, for this above case to pass.
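
For context on the lookup behavior described above: the one-argument `Class.forName(name)` resolves against the class loader of the calling class, not against the thread context class loader that RunJar sets. The sketch below shows the alternative lookup a library class could use. It is not the fix proposed in this issue (which is to change the "hadoop jar" script), and the class and method names are hypothetical.

```java
public class ContextClassLoaderLookup {

  /** Resolves a class through the thread context class loader when one is set. */
  public static Class<?> lookup(String name) throws ClassNotFoundException {
    ClassLoader ctx = Thread.currentThread().getContextClassLoader();
    if (ctx != null) {
      // Three-argument form: the caller chooses which loader is consulted.
      return Class.forName(name, true, ctx);
    }
    // One-argument form: uses the loader that defined this class.
    return Class.forName(name);
  }

  /** Convenience wrapper that reports visibility instead of throwing. */
  public static boolean isVisible(String name) {
    try {
      lookup(name);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String name = args.length > 0 ? args[0] : "java.lang.String";
    System.out.println(name + " visible: " + isVisible(name));
  }
}
```

In the X.jar/Y.jar scenario, a lookup written this way inside Bar would consult the loader that RunJar installed on the current thread, which is the one that can see Foo.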






[jira] [Reopened] (HADOOP-9424) The "hadoop jar" invocation should include the passed jar on the classpath as a whole

2016-10-31 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reopened HADOOP-9424:
-

I would keep this issue open as it still is a real issue. Let's find a way to 
address this in the best way possible. It is not as urgent as other issues, so 
we can take some time to think it through.

> The "hadoop jar" invocation should include the passed jar on the classpath as 
> a whole
> -
>
> Key: HADOOP-9424
> URL: https://issues.apache.org/jira/browse/HADOOP-9424
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Affects Versions: 2.0.3-alpha
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HADOOP-9424.patch
>
>
> When you have a case such as this:
> {{X.jar -> Classes = Main, Foo}}
> {{Y.jar -> Classes = Bar}}
> With implementation details such as:
> * Main references Bar and invokes a public, static method on it.
> * Bar does a class lookup to find Foo (Class.forName("Foo")).
> Then when you run {{HADOOP_CLASSPATH=Y.jar hadoop jar X.jar Main}}, Bar's 
> method fails with a ClassNotFoundException because of the way RunJar runs.
> RunJar extracts the passed jar and includes its contents on the ClassLoader 
> of its current thread, but the {{Class.forName(…)}} call from another class 
> does not check that class loader and hence cannot find the class, as it's not 
> on any classpath it is aware of.
> The script of "hadoop jar" should ideally include the passed jar argument to 
> the CLASSPATH before RunJar is invoked, for this above case to pass.






[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-10-31 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622683#comment-15622683
 ] 

Sangjin Lee commented on HADOOP-13410:
--

The unit test failure is unrelated. I would greatly appreciate feedback. Thanks!

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Attachments: HADOOP-13410.001.patch, HADOOP-13410.002.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Updated] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-10-28 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13410:
-
Attachment: HADOOP-13410.002.patch

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Attachments: HADOOP-13410.001.patch, HADOOP-13410.002.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Updated] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-10-28 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13410:
-
Attachment: (was: HADOOP-13410.002.patch)

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Attachments: HADOOP-13410.001.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Updated] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-10-28 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13410:
-
Hadoop Flags:   (was: Reviewed)
  Status: Patch Available  (was: Reopened)

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Attachments: HADOOP-13410.001.patch, HADOOP-13410.002.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Updated] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-10-28 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13410:
-
Attachment: HADOOP-13410.002.patch

Posted patch v.2.

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Attachments: HADOOP-13410.001.patch, HADOOP-13410.002.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-10-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616572#comment-15616572
 ] 

Sangjin Lee commented on HADOOP-13410:
--

Thanks for the input [~matt.mot...@cerner.com]. I am now thinking that we need 
to keep the jar over the unpacked root ("unpacked/"), which is the opposite of 
the previous attempt to solve this (we would still need to retain 
"unpacked/classes" and "unpacked/libs").

That should address your issue too, correct?

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Attachments: HADOOP-13410.001.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Commented] (HADOOP-13400) update the ApplicationClassLoader implementation in line with latest Java ClassLoader implementation

2016-10-18 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587046#comment-15587046
 ] 

Sangjin Lee commented on HADOOP-13400:
--

Sorry it took a while. +1. I'll commit it to the feature branch shortly.

> update the ApplicationClassLoader implementation in line with latest Java 
> ClassLoader implementation
> 
>
> Key: HADOOP-13400
> URL: https://issues.apache.org/jira/browse/HADOOP-13400
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Vrushali C
> Attachments: HADOOP-13400-HADOOP-13070.01.patch, 
> HADOOP-13400-HADOOP-13070.02.patch
>
>
> The current {{ApplicationClassLoader}} implementation is aged, and does not 
> reflect the latest java {{ClassLoader}} implementation. One example is the 
> use of the fine-grained classloading lock.
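
For readers unfamiliar with the fine-grained classloading lock mentioned above, the sketch below shows the Java 7+ pattern: a loader registers itself as parallel capable and then synchronizes on a per-class-name lock instead of on the loader instance. This is an illustration only, not the `ApplicationClassLoader` change in the attached patches; the class name and the `lockFor` helper are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ParallelCapableLoader extends URLClassLoader {

  static {
    // Opt in to per-class-name locking. Must run before any instance exists.
    ClassLoader.registerAsParallelCapable();
  }

  public ParallelCapableLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    // Lock on an object dedicated to this class name, not on the whole loader,
    // so that different classes can be loaded concurrently.
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        c = super.loadClass(name, false);
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }

  /** Exposes the lock object so its granularity can be inspected. */
  public Object lockFor(String name) {
    return getClassLoadingLock(name);
  }

  public static void main(String[] args) {
    ParallelCapableLoader loader = new ParallelCapableLoader(new URL[0], null);
    System.out.println("lock is the loader itself: "
        + (loader.lockFor("a.B") == loader));
    System.out.println("same lock for two names: "
        + (loader.lockFor("a.B") == loader.lockFor("a.C")));
  }
}
```

Without the registration call, `getClassLoadingLock` returns the loader itself, which serializes all class loading through one monitor.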






[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2016-10-17 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583814#comment-15583814
 ] 

Sangjin Lee commented on HADOOP-11656:
--

Thanks!

> Classpath isolation for downstream clients
> --
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>  Labels: classloading, classpath, dependencies, scripts, shell
> Attachments: HADOOP-11656_proposal.md
>
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.






[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2016-10-17 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583768#comment-15583768
 ] 

Sangjin Lee commented on HADOOP-11656:
--

Thanks [~andrew.wang]. When would be the beta 1 timeframe?

> Classpath isolation for downstream clients
> --
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>  Labels: classloading, classpath, dependencies, scripts, shell
> Attachments: HADOOP-11656_proposal.md
>
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.






[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2016-10-17 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583719#comment-15583719
 ] 

Sangjin Lee commented on HADOOP-11656:
--

Sorry for the delay. I'm going to try to give some love to HADOOP-13070 (have 
been focusing on the continuing timeline service work). I'm not too sure if it 
can be completed by this month, but I hope it can be done by next month. How 
does that sound from the 3.0.0 release perspective?

> Classpath isolation for downstream clients
> --
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>  Labels: classloading, classpath, dependencies, scripts, shell
> Attachments: HADOOP-11656_proposal.md
>
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.






[jira] [Commented] (HADOOP-12090) minikdc-related unit tests fail consistently on some platforms

2016-10-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567529#comment-15567529
 ] 

Sangjin Lee commented on HADOOP-12090:
--

The patch adds references to {{SocketAcceptor}} and {{SocketSessionConfig}} 
which are classes in Mina. Since these are new direct references, I added the 
explicit dependency.

> minikdc-related unit tests fail consistently on some platforms
> --
>
> Key: HADOOP-12090
> URL: https://issues.apache.org/jira/browse/HADOOP-12090
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms, test
>Affects Versions: 2.7.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: HADOOP-12090.001.patch, HADOOP-12090.002.patch
>
>
> On some platforms all unit tests that use minikdc fail consistently. Those 
> tests include TestKMS, TestSaslDataTransfer, 
> TestTimelineAuthenticationFilter, etc.
> Typical failures on the unit tests:
> {noformat}
> java.lang.AssertionError: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Cannot get a KDC reply)
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.apache.hadoop.crypto.key.kms.server.TestKMS$8$4.run(TestKMS.java:1154)
>   at org.apache.hadoop.crypto.key.kms.server.TestKMS$8$4.run(TestKMS.java:1145)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1645)
>   at org.apache.hadoop.crypto.key.kms.server.TestKMS.doAs(TestKMS.java:261)
>   at org.apache.hadoop.crypto.key.kms.server.TestKMS.access$100(TestKMS.java:76)
> {noformat}
> The error behind this failure, logged by the KDC server in minikdc, is a 
> NullPointerException:
> {noformat}
> org.apache.mina.filter.codec.ProtocolDecoderException: java.lang.NullPointerException: message (Hexdump: ...)
>   at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:234)
>   at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:434)
>   at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:48)
>   at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:802)
>   at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:120)
>   at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:434)
>   at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:426)
>   at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:604)
>   at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:564)
>   at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:553)
>   at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$400(AbstractPollingIoProcessor.java:57)
>   at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:892)
>   at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:65)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException: message
>   at org.apache.mina.filter.codec.AbstractProtocolDecoderOutput.write(AbstractProtocolDecoderOutput.java:44)
>   at org.apache.directory.server.kerberos.protocol.codec.MinaKerberosDecoder.decode(MinaKerberosDecoder.java:65)
>   at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:224)
>   ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-12090) minikdc-related unit tests fail consistently on some platforms

2016-10-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567076#comment-15567076
 ] 

Sangjin Lee edited comment on HADOOP-12090 at 10/12/16 4:27 AM:


Just to be clear, this issue arises because Mina (the networking stack on 
which ApacheDS depends) sets the send and receive buffer size to 1 KB (see 
DIRSERVER-2074 for more detail). If we move away from that behavior, by using 
different libraries or otherwise, the problem may go away.


was (Author: sjlee0):
Just to be clear, this issue is caused because Mina (the networking stack on 
which ApacheDS depends) does set the send and receive buffer size to 1 KB (see 
DIRSERVER-2074 more detail). If we move away from that behavior either by using 
different libraries, the problem might go away.





[jira] [Commented] (HADOOP-12090) minikdc-related unit tests fail consistently on some platforms

2016-10-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567076#comment-15567076
 ] 

Sangjin Lee commented on HADOOP-12090:
--

Just to be clear, this issue arises because Mina (the networking stack on 
which ApacheDS depends) sets the send and receive buffer size to 1 KB (see 
DIRSERVER-2074 for more detail). If we move away from that behavior by using 
different libraries, the problem might go away.







[jira] [Commented] (HADOOP-13691) remove build user and date from various hadoop web UI

2016-10-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557330#comment-15557330
 ] 

Sangjin Lee commented on HADOOP-13691:
--

I updated the title and the description to say explicitly "web UI", which was 
what I intended (thus the mention of the namenode UI and the resource manager 
UI). I hope this is clear.

> remove build user and date from various hadoop web UI
> -
>
> Key: HADOOP-13691
> URL: https://issues.apache.org/jira/browse/HADOOP-13691
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Priority: Minor
>
> Currently in the namenode UI as well as the resource manager UI, we display 
> the date of the build as well as the user id of the person who built it. 
> Although other bits of information are useful (e.g. git commit id, branch, 
> etc.), the value of the build date and user is suspect. We should consider 
> removing them from the visible web UI.






[jira] [Updated] (HADOOP-13691) remove build user and date from various hadoop web UI

2016-10-08 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13691:
-
Description: Currently in the namenode UI as well as the resource manager 
UI, we display the date of the build as well as the user id of the person who 
built it. Although other bits of information are useful (e.g. git commit id, 
branch, etc.), the value of the build date and user is suspect. We should 
consider removing them from the visible web UI.  (was: Currently in the 
namenode UI as well as the resource manager UI, we display the date of the 
build as well as the user id of the person who built it. Although other bits of 
information are useful (e.g. git commit id, branch, etc.), the value of the 
build date and user is suspect. We should consider removing them from the 
visible UI.)

> remove build user and date from various hadoop web UI
> -
>
> Key: HADOOP-13691
> URL: https://issues.apache.org/jira/browse/HADOOP-13691
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Priority: Minor
>
> Currently in the namenode UI as well as the resource manager UI, we display 
> the date of the build as well as the user id of the person who built it. 
> Although other bits of information are useful (e.g. git commit id, branch, 
> etc.), the value of the build date and user is suspect. We should consider 
> removing them from the visible web UI.






[jira] [Updated] (HADOOP-13691) remove build user and date from various hadoop web UI

2016-10-08 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13691:
-
Summary: remove build user and date from various hadoop web UI  (was: 
remove build user and date from various hadoop UI)

> remove build user and date from various hadoop web UI
> -
>
> Key: HADOOP-13691
> URL: https://issues.apache.org/jira/browse/HADOOP-13691
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Priority: Minor
>
> Currently in the namenode UI as well as the resource manager UI, we display 
> the date of the build as well as the user id of the person who built it. 
> Although other bits of information are useful (e.g. git commit id, branch, 
> etc.), the value of the build date and user is suspect. We should consider 
> removing them from the visible UI.






[jira] [Commented] (HADOOP-13691) remove build user and date from various hadoop UI

2016-10-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557313#comment-15557313
 ] 

Sangjin Lee commented on HADOOP-13691:
--

It's good to hear the clarification. If it wasn't clear from the title and the 
description, I was never arguing to remove it from hadoop version or 
*-version.properties. The scope was narrowly on the web UI.
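
To make the narrow scope concrete, here is a minimal Java sketch, not taken from any patch on this issue. It assumes the build information is available as a java.util.Properties object with keys named "user" and "date"; those key names and the WebUiBuildInfo class are hypothetical and would need to be checked against the real *-version.properties file. The properties themselves stay untouched, so 'hadoop version' could keep printing everything; only the view handed to the web UI drops the two fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

public class WebUiBuildInfo {
    // Assumed key names (hypothetical); verify against the actual properties file.
    private static final Set<String> HIDDEN_IN_WEB_UI =
        new HashSet<>(Arrays.asList("user", "date"));

    /** Returns a sorted copy of the build info without the fields hidden from the web UI. */
    public static Map<String, String> forWebUi(Properties buildInfo) {
        Map<String, String> visible = new TreeMap<>();
        for (String key : buildInfo.stringPropertyNames()) {
            if (!HIDDEN_IN_WEB_UI.contains(key)) {
                visible.put(key, buildInfo.getProperty(key));
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Properties buildInfo = new Properties();
        buildInfo.setProperty("revision", "abc123");   // made-up sample values
        buildInfo.setProperty("branch", "trunk");
        buildInfo.setProperty("user", "builder");
        buildInfo.setProperty("date", "2016-10-08");
        System.out.println(forWebUi(buildInfo));
    }
}
```

The source properties are left as they are, which matches the point above: nothing is removed from the version file or the command line output.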

> remove build user and date from various hadoop UI
> -
>
> Key: HADOOP-13691
> URL: https://issues.apache.org/jira/browse/HADOOP-13691
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Priority: Minor
>
> Currently in the namenode UI as well as the resource manager UI, we display 
> the date of the build as well as the user id of the person who built it. 
> Although other bits of information are useful (e.g. git commit id, branch, 
> etc.), the value of the build date and user is suspect. We should consider 
> removing them from the visible UI.






[jira] [Commented] (HADOOP-13691) remove build user and date from various hadoop UI

2016-10-07 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556659#comment-15556659
 ] 

Sangjin Lee commented on HADOOP-13691:
--

bq. It's very useful to be able to use 'hadoop version' to determine the date 
and user who built it when dealing with custom builds. From an ASF perspective, 
as soon as we start doing releases correctly again, it'll be a quick way to 
determine who the RE was for a given release.

I understand user and date do give us more information. What I'm curious about 
is whether this information is verifiable. For example, git commit ids are 
pretty concrete and it would be easy to verify that the binary matches the said 
git commit id. However, I don't think user and date give you really verifiable 
information. The user is a local system user id which may have no verifiable 
relation to the actual release manager, right?

> remove build user and date from various hadoop UI
> -
>
> Key: HADOOP-13691
> URL: https://issues.apache.org/jira/browse/HADOOP-13691
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Priority: Minor
>
> Currently in the namenode UI as well as the resource manager UI, we display 
> the date of the build as well as the user id of the person who built it. 
> Although other bits of information are useful (e.g. git commit id, branch, 
> etc.), the value of the build date and user is suspect. We should consider 
> removing them from the visible UI.






[jira] [Commented] (HADOOP-12090) minikdc-related unit tests fail consistently on some platforms

2016-10-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553524#comment-15553524
 ] 

Sangjin Lee commented on HADOOP-12090:
--

bq. Sangjin Lee On which platforms do you see this problem? We are seeing the 
issue 10% of the time with a source tree based on 2.6.

I think I saw this on CentOS 6.

bq. Since the patch 002 increased the socket buffer size for minikdc, does it 
mean a certain socket buffer size can reliably reproduce the problem on some 
platforms? What is that size? 1140 bytes?

The default would reproduce this (at least in my problem environment). The 
default is 1 KB if I'm not mistaken. To some extent, this is OS sensitive 
because it has something to do with the TCP window management. For example, I 
was not able to reproduce this on a Mac.
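
To illustrate the OS sensitivity, here is a minimal Java sketch. It is not part of the patch and does not use Mina; BufferSizeProbe is a made-up name, and it only relies on the standard java.net.ServerSocket calls. It requests a 1 KB receive buffer and reports what the platform says it actually granted. The requested size is only a hint, so the reported value may differ from one OS to another.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class BufferSizeProbe {
    /** Requests a receive buffer of the given size and returns the size the OS reports. */
    public static int grantedReceiveBuffer(int requestedBytes) {
        // The socket is never bound, so no port is opened.
        try (ServerSocket server = new ServerSocket()) {
            server.setReceiveBufferSize(requestedBytes); // a hint, not a guarantee
            return server.getReceiveBufferSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int requested = 1024; // the 1 KB default discussed above
        System.out.println("requested=" + requested
            + " granted=" + grantedReceiveBuffer(requested));
    }
}
```

Running this on the failing platform and on one where the tests pass would show whether the two treat the 1 KB request differently.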





[jira] [Created] (HADOOP-13691) remove build person and date from various hadoop UI

2016-10-06 Thread Sangjin Lee (JIRA)
Sangjin Lee created HADOOP-13691:


 Summary: remove build person and date from various hadoop UI
 Key: HADOOP-13691
 URL: https://issues.apache.org/jira/browse/HADOOP-13691
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Reporter: Sangjin Lee
Priority: Minor


Currently in the namenode UI as well as the resource manager UI, we display the 
date of the build as well as the user id of the person who built it. Although 
other bits of information are useful (e.g. git commit id, branch, etc.), the 
value of the build date and user is suspect. We should consider removing them 
from the visible UI.






[jira] [Updated] (HADOOP-13691) remove build user and date from various hadoop UI

2016-10-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13691:
-
Summary: remove build user and date from various hadoop UI  (was: remove 
build person and date from various hadoop UI)

> remove build user and date from various hadoop UI
> -
>
> Key: HADOOP-13691
> URL: https://issues.apache.org/jira/browse/HADOOP-13691
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Sangjin Lee
>Priority: Minor
>
> Currently in the namenode UI as well as the resource manager UI, we display 
> the date of the build as well as the user id of the person who built it. 
> Although other bits of information are useful (e.g. git commit id, branch, 
> etc.), the value of the build date and user is suspect. We should consider 
> removing them from the visible UI.






[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2016-09-26 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523425#comment-15523425
 ] 

Sangjin Lee commented on HADOOP-11656:
--

Speaking of HADOOP-13070, I've been somewhat off track due to other things that 
were flaring up. I'd like to circle back to it sooner rather than later, however. Any 
feedback on the main proposal there would be welcome (or any help for that 
matter)!

> Classpath isolation for downstream clients
> --
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>  Labels: classloading, classpath, dependencies, scripts, shell
> Attachments: HADOOP-11656_proposal.md
>
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.






[jira] [Updated] (HADOOP-11683) Need a plugin API to translate long principal names to local OS user names arbitrarily

2016-09-16 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-11683:
-
Target Version/s: 2.6.6  (was: 2.6.5)

Moving this issue to 2.6.6. Please move back if you feel otherwise.

> Need a plugin API to translate long principal names to local OS user names 
> arbitrarily
> --
>
> Key: HADOOP-11683
> URL: https://issues.apache.org/jira/browse/HADOOP-11683
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Sunny Cheung
>Assignee: roger mak
> Attachments: HADOOP-11683.001.patch, HADOOP-11683.002.patch, 
> HADOOP-11683.003.patch
>
>
> We need a plugin API to translate long principal names (e.g. 
> john@example.com) to local OS user names (e.g. user123456) arbitrarily.
> For some organizations the name translation is straightforward (e.g. 
> john@example.com to john_doe), and the hadoop.security.auth_to_local 
> configurable mapping is sufficient to resolve this (see HADOOP-6526). 
> However, in some other cases the name translation is arbitrary and cannot be 
> generalized by a set of translation rules easily.
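
For illustration only, a minimal Java sketch of what such a plugin contract could look like. The interface, class, and method names here are hypothetical and are not the API from the attached patches. The point is that a lookup-backed implementation can express a mapping no rewrite rule would capture.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class PrincipalMappingSketch {
    /** Hypothetical plugin contract: principal name in, local OS user name out. */
    public interface PrincipalToLocalUser {
        Optional<String> toLocalUser(String principal);
    }

    /** Table-backed plugin for mappings that follow no general pattern. */
    public static class TableMapper implements PrincipalToLocalUser {
        private final Map<String, String> table = new HashMap<>();

        public TableMapper add(String principal, String localUser) {
            table.put(principal, localUser);
            return this;
        }

        @Override
        public Optional<String> toLocalUser(String principal) {
            // Empty result lets a caller fall back to rule-based translation.
            return Optional.ofNullable(table.get(principal));
        }
    }

    public static void main(String[] args) {
        PrincipalToLocalUser mapper =
            new TableMapper().add("john@example.com", "user123456");
        System.out.println(mapper.toLocalUser("john@example.com").orElse("<unmapped>"));
        System.out.println(mapper.toLocalUser("jane@example.com").orElse("<unmapped>"));
    }
}
```

Returning an empty Optional for unknown principals is a design choice of this sketch, so that an existing rule-based mapping could still serve as the fallback.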






[jira] [Updated] (HADOOP-13206) Delegation token cannot be fetched and used by different versions of client

2016-09-16 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13206:
-
Target Version/s: 2.6.6  (was: 2.6.5)

Moving this issue to 2.6.6. Please move back if you feel otherwise.

> Delegation token cannot be fetched and used by different versions of client
> ---
>
> Key: HADOOP-13206
> URL: https://issues.apache.org/jira/browse/HADOOP-13206
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0, 2.6.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HADOOP-13206.00.patch, HADOOP-13206.01.patch, 
> HADOOP-13206.02.patch
>
>
> We have observed that an HDFS delegation token fetched by a 2.3.0 client 
> cannot be used by a 2.6.1 client, and vice versa. Through some debugging I 
> found that it's a mismatch between the token's {{service}} and the 
> {{service}} of the filesystem (e.g. {{webhdfs://host.something.com:50070/}}). 
> One would be in numerical IP address and one would be in non-numerical 
> hostname format.
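
A minimal sketch of the mismatch described above, assuming the check boils down to a plain string comparison of a host:port service value. The class and method names and the 192.0.2.10 address are hypothetical, and this is not the actual Hadoop code path. It only shows why the same endpoint written in two notations would fail to match.

```java
public class TokenServiceMismatch {
    /** Builds a "host:port" service string from whichever host form the caller has. */
    public static String service(String host, int port) {
        return host + ":" + port;
    }

    /** Plain string match: one endpoint in two notations compares as different. */
    public static boolean matches(String tokenService, String fsService) {
        return tokenService.equals(fsService);
    }

    public static void main(String[] args) {
        // Assume both values refer to the same server (hypothetical address).
        String fromOneClient = service("192.0.2.10", 50070);
        String fromOtherClient = service("host.something.com", 50070);
        System.out.println(fromOneClient + " vs " + fromOtherClient
            + " -> match=" + matches(fromOneClient, fromOtherClient));
    }
}
```

Under that assumption, a token is only found when both sides use the same notation for the host.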






[jira] [Updated] (HADOOP-13620) Mapreduce job failure on submission

2016-09-16 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13620:
-
Affects Version/s: 3.0.0-alpha1
Fix Version/s: 3.0.0-alpha2

Moved the JIRA to HADOOP. This has been fixed by reverting HADOOP-13410.

> Mapreduce job failure on submission
> ---
>
> Key: HADOOP-13620
> URL: https://issues.apache.org/jira/browse/HADOOP-13620
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Sangjin Lee
>Priority: Blocker
> Fix For: 3.0.0-alpha2
>
>
> Configure Hibench
> Try running Enhanced TestDFSIO
> {noformat}
> 2016-09-15 18:20:24,849 INFO mapreduce.Job: Task Id : attempt_1473943118844_0001_m_01_0, Status : FAILED
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1806)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     ... 9 more
> Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.dfsioe.TestDFSIOEnh$WriteMapperEnh not found
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2330)
>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1108)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.dfsioe.TestDFSIOEnh$WriteMapperEnh not found
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2298)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2322)
>     ... 16 more
> Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.dfsioe.TestDFSIOEnh$WriteMapperEnh not found
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2202)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2296)
>     ... 17 more
> {noformat}
> *mapreduce.JobResourceUploader: No job jar file set.  User classes may not be 
> found. See Job or Job#setJar(String).*
> Job jar is not getting set and not uploaded to staging dir






[jira] [Moved] (HADOOP-13620) Mapreduce job failure on submission

2016-09-16 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee moved MAPREDUCE-6779 to HADOOP-13620:
-

Key: HADOOP-13620  (was: MAPREDUCE-6779)
Project: Hadoop Common  (was: Hadoop Map/Reduce)

> Mapreduce job failure on submission
> ---
>
> Key: HADOOP-13620
> URL: https://issues.apache.org/jira/browse/HADOOP-13620
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Sangjin Lee
>Priority: Blocker
>
> Configure Hibench
> Try running Enhanced TestDFSIO
> {noformat}
> 2016-09-15 18:20:24,849 INFO mapreduce.Job: Task Id : attempt_1473943118844_0001_m_01_0, Status : FAILED
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1806)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     ... 9 more
> Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.dfsioe.TestDFSIOEnh$WriteMapperEnh not found
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2330)
>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1108)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.dfsioe.TestDFSIOEnh$WriteMapperEnh not found
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2298)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2322)
>     ... 16 more
> Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.dfsioe.TestDFSIOEnh$WriteMapperEnh not found
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2202)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2296)
>     ... 17 more
> {noformat}
> *mapreduce.JobResourceUploader: No job jar file set.  User classes may not be 
> found. See Job or Job#setJar(String).*
> Job jar is not getting set and not uploaded to staging dir






[jira] [Updated] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-09-16 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13410:
-
Fix Version/s: (was: 3.0.0-alpha2)

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Attachments: HADOOP-13410.001.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.
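
The duplication can be observed with a small stand-alone sketch. It does not use RunJar; it assumes a jar with a single entry and an unpacked copy of that entry, and counts how many classpath entries expose the resource. The class name and the entry name app.properties are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collections;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical demo class; it only mirrors the shape of the classpath built above.
final class DuplicateClasspathSketch {
  /** Counts how many classpath entries expose the named resource. */
  static int countCopies(URL[] classPath, String resource) throws IOException {
    // A null parent keeps the application classpath out of the count.
    try (URLClassLoader loader = new URLClassLoader(classPath, null)) {
      return Collections.list(loader.getResources(resource)).size();
    }
  }

  /** Creates a one-entry jar plus an unpacked copy, then counts visible copies. */
  static int demo(boolean includeJar) {
    try {
      File workDir = Files.createTempDirectory("unjar").toFile();
      File jar = new File(workDir.getParentFile(), workDir.getName() + ".jar");
      byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
      try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar.toPath()))) {
        out.putNextEntry(new JarEntry("app.properties"));
        out.write(data);
        out.closeEntry();
      }
      // The unpacked copy, as it would exist under the work directory.
      Files.write(new File(workDir, "app.properties").toPath(), data);
      URL dirUrl = new File(workDir + "/").toURI().toURL();
      URL[] classPath = includeJar
          ? new URL[] {dirUrl, jar.toURI().toURL()}
          : new URL[] {dirUrl};
      return countCopies(classPath, "app.properties");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("with jar: " + demo(true) + ", without jar: " + demo(false));
  }
}
```

With both the unpacked directory and the jar on the classpath, the resource is found twice; with only the directory it is found once. Code that enumerates resources and expects exactly one hit would see the difference.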






[jira] [Reopened] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-09-15 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reopened HADOOP-13410:
--

The commit has been reverted.

It turns out that MR {{JobConf.setJarByClass()}} (by using 
{{ClassUtil.findContainingJar()}} ) looks for the jar that contains the class. 
If the class is not found in a jar, the job jar is not set.

[~yuanbo], I think we may want to remove the "/" entry rather than the jar. It 
would be great if we can add a small unit test to confirm the issue and the fix.
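
A rough sketch of why the jar has to stay on the classpath for this lookup. It assumes the lookup resolves the class file to a URL and only accepts the "jar" protocol, which is how the behavior is described above; the class and method names below are hypothetical and are not the actual {{ClassUtil}} code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical stand-in for a "find the jar that contains this class" lookup.
final class ContainingJarSketch {
  /** Returns the jar path if the class resource URL points inside a jar, else null. */
  static String jarPathOf(URL classResource) {
    if (classResource == null || !"jar".equals(classResource.getProtocol())) {
      return null; // e.g. a file: URL served from an unpacked directory
    }
    String path = classResource.getPath(); // "file:/tmp/app.jar!/com/example/A.class"
    if (path.startsWith("file:")) {
      path = path.substring("file:".length());
    }
    int bang = path.indexOf('!');
    return bang < 0 ? path : path.substring(0, bang);
  }

  /** Convenience wrapper that parses a URL string first. */
  static String jarPathOfSpec(String spec) {
    try {
      return jarPathOf(URI.create(spec).toURL());
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(spec, e);
    }
  }

  /** Looks the class file up through the class's own loader. */
  static String findContainingJar(Class<?> clazz) {
    ClassLoader loader = clazz.getClassLoader();
    if (loader == null) {
      return null; // bootstrap classes have no loader to ask
    }
    String name = clazz.getName().replace('.', '/') + ".class";
    return jarPathOf(loader.getResource(name));
  }
}
```

If the unpacked directory is the only classpath entry that provides the class, the resource URL uses the file protocol, the lookup returns null, and no job jar is set. That matches the "No job jar file set" warning reported in HADOOP-13620.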

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13410.001.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-09-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493948#comment-15493948
 ] 

Sangjin Lee commented on HADOOP-13410:
--

Yes, in light of that, we should revert this and rework it. Could you revert it 
(it's only on trunk)? Let me know if you need my help. Thanks for reporting the 
issue!

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13410.001.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Updated] (HADOOP-13434) Add quoting to Shell class

2016-09-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13434:
-
Fix Version/s: 2.6.5

Cherry-picked it to 2.6.5 (trivial).

> Add quoting to Shell class
> --
>
> Key: HADOOP-13434
> URL: https://issues.apache.org/jira/browse/HADOOP-13434
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>
> Attachments: HADOOP-13434-branch-2.7.01.patch, HADOOP-13434.patch, 
> HADOOP-13434.patch, HADOOP-13434.patch
>
>
> The Shell class makes assumptions that the parameters won't have spaces or 
> other special characters, even when it invokes bash.
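
One common way to make an argument safe for bash is to wrap it in single quotes and escape any embedded single quote. The helper below is a minimal sketch of that technique under a hypothetical class name; it is not presented as the exact code of the committed patch.

```java
// Hypothetical helper illustrating single-quote escaping for bash arguments.
final class BashQuoteSketch {
  /**
   * Wraps the argument in single quotes. An embedded single quote is written
   * as '\'' : close the quoted run, emit an escaped quote, reopen the run.
   */
  static String bashQuote(String arg) {
    return "'" + arg.replace("'", "'\\''") + "'";
  }

  public static void main(String[] args) {
    System.out.println(bashQuote("my file; rm -rf /tmp/x"));
    System.out.println(bashQuote("it's"));
  }
}
```

Inside single quotes bash performs no expansion, so spaces, semicolons, and dollar signs in the argument reach the command unchanged.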






[jira] [Updated] (HADOOP-11361) Fix a race condition in MetricsSourceAdapter.updateJmxCache

2016-09-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-11361:
-
Target Version/s: 2.7.3, 2.8.0, 2.6.5  (was: 2.8.0, 2.7.3, 2.6.5)
   Fix Version/s: 2.6.5

Cherry-picked it to 2.6.5. Also picked HADOOP-11301, HADOOP-12348, and 
HADOOP-12482 before this.

> Fix a race condition in MetricsSourceAdapter.updateJmxCache
> ---
>
> Key: HADOOP-11361
> URL: https://issues.apache.org/jira/browse/HADOOP-11361
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.5.1, 2.6.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>  Labels: supportability
> Fix For: 2.8.0, 2.9.0, 2.6.5, 2.7.4, 3.0.0-alpha1
>
> Attachments: HADOOP-111361-003.patch, HADOOP-11361-002.patch, 
> HADOOP-11361-004.patch, HADOOP-11361-005.patch, HADOOP-11361-005.patch, 
> HADOOP-11361-006.patch, HADOOP-11361-007.patch, HADOOP-11361-009.patch, 
> HADOOP-11361.008.patch, HADOOP-11361.patch, HDFS-7487.patch
>
>
> {noformat}
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:247)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:177)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getAttribute(MetricsSourceAdapter.java:102)
>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
> {noformat}






[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2016-09-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-12482:
-
Fix Version/s: 2.6.5

Cherry-picked it to 2.6.5 (trivial).

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Fix For: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch, HADOOP-12482.004.patch, HADOOP-12482.005.patch, 
> HADOOP-12482.006.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
>     boolean getAllMetrics = false;
>     synchronized (this) {
>       if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
>         // temporarily advance the expiry while updating the cache
>         jmxCacheTS = Time.now() + jmxCacheTTL;
>         if (lastRecs == null) {
>           getAllMetrics = true;
>         }
>       } else {
>         return;
>       }
>       if (getAllMetrics) {
>         MetricsCollectorImpl builder = new MetricsCollectorImpl();
>         getMetrics(builder, true);
>       }
>       updateAttrCache();
>       if (getAllMetrics) {
>         updateInfoCache();
>       }
>       jmxCacheTS = Time.now();
>       lastRecs = null; // in case regular interval update is not running
>     }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # The metrics source is updated with a new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior is that updateJmxCache() is guaranteed to call 
> updateInfoCache() once after jmxCacheTTL has passed, if lastRecs was set to 
> null by updateJmxCache() itself.
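
The sequence above can be replayed on a single thread with a simplified model. This is only a sketch: the class is hypothetical, TTL expiry is assumed to have already happened on every call, and the boolean flag is one possible way to record that updateJmxCache() itself cleared lastRecs.

```java
// Hypothetical single-threaded replay of the interleaving; not the Hadoop class.
final class JmxCacheRaceSketch {
  private final boolean trackCleared;     // false: check lastRecs == null; true: check the flag
  private Object lastRecs;                // null until getMetrics() stores a snapshot
  private boolean lastRecsCleared = true; // assumed flag: set when updateJmxCache() clears lastRecs
  private int sourceVersion = 1;          // bumps when a metric is added to the source
  private int infoCacheVersion = 0;       // version visible through getMBeanInfo()

  JmxCacheRaceSketch(boolean trackCleared) {
    this.trackCleared = trackCleared;
  }

  /** Called by the publishing thread (timer event or publishMetricsNow). */
  synchronized void getMetrics() {
    lastRecs = new Object();
  }

  synchronized void addMetric() {
    sourceVersion++;
  }

  /** Models updateJmxCache() once jmxCacheTTL has expired. */
  synchronized void updateJmxCache() {
    boolean getAllMetrics = trackCleared ? lastRecsCleared : (lastRecs == null);
    lastRecsCleared = false; // consume the flag for this refresh
    if (getAllMetrics) {
      getMetrics();
      infoCacheVersion = sourceVersion; // stands in for updateInfoCache()
    }
    lastRecs = null;
    lastRecsCleared = true; // remember that this method did the clearing
  }

  synchronized boolean infoIsStale() {
    return infoCacheVersion != sourceVersion;
  }

  /** Replays the five steps from the description. */
  static boolean staleAfterSequence(boolean trackCleared) {
    JmxCacheRaceSketch adapter = new JmxCacheRaceSketch(trackCleared);
    adapter.updateJmxCache(); // step 1: lastRecs ends up null
    adapter.addMetric();      // step 2: source gains a new field
    adapter.getMetrics();     // step 3: another thread sets lastRecs
    adapter.updateJmxCache(); // steps 4 and 5: TTL passed, cache updated again
    return adapter.infoIsStale();
  }

  public static void main(String[] args) {
    System.out.println("null check stale: " + staleAfterSequence(false));
    System.out.println("flag check stale: " + staleAfterSequence(true));
  }
}
```

In this model the null check leaves the info cache stale after the replay, while the flag-based check refreshes it, because the flag is unaffected by what the publishing thread stores in lastRecs.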






[jira] [Updated] (HADOOP-12348) MetricsSystemImpl creates MetricsSourceAdapter with wrong time unit parameter.

2016-09-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-12348:
-
Fix Version/s: 2.6.5

Cherry-picked to 2.6.5 (trivial).

> MetricsSystemImpl creates MetricsSourceAdapter with wrong time unit parameter.
> --
>
> Key: HADOOP-12348
> URL: https://issues.apache.org/jira/browse/HADOOP-12348
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: metrics
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>
> Attachments: HADOOP-12348.000.patch, HADOOP-12348.001.patch, 
> HADOOP-12348.branch-2.patch
>
>
> MetricsSystemImpl creates MetricsSourceAdapter with the wrong time unit 
> parameter. MetricsSourceAdapter expects jmxCacheTTL in milliseconds, but 
> MetricsSystemImpl passes a value in seconds to the MetricsSourceAdapter 
> constructor.
> {code}
> jmxCacheTS = Time.now() + jmxCacheTTL;
>   /**
>    * Current system time.  Do not use this to calculate a duration or interval
>    * to sleep, because it will be broken by settimeofday.  Instead, use
>    * monotonicNow.
>    * @return current time in msec.
>    */
>   public static long now() {
>     return System.currentTimeMillis();
>   }
> {code}
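
The effect of the unit mismatch can be sketched with a TTL check of the form now - cacheTs >= ttl, where every operand is assumed to be in milliseconds. The class and method names are hypothetical, and the 10-second period is an illustrative value.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; only the arithmetic is the point here.
final class JmxCacheTtlSketch {
  /** Expiry check in which every operand is expected to be in milliseconds. */
  static boolean expired(long nowMs, long cacheTsMs, long ttlMs) {
    return nowMs - cacheTsMs >= ttlMs;
  }

  /** Converts a period given in seconds to the milliseconds the check expects. */
  static long ttlMillisFromSeconds(long periodSeconds) {
    return TimeUnit.SECONDS.toMillis(periodSeconds);
  }

  public static void main(String[] args) {
    long periodSeconds = 10; // illustrative value
    // Passing seconds where milliseconds are expected: expires after 10 ms.
    System.out.println(expired(5_000, 0, periodSeconds));
    // After conversion the cache is still valid 5 s after the last update.
    System.out.println(expired(5_000, 0, ttlMillisFromSeconds(periodSeconds)));
  }
}
```

With the raw seconds value the cache is treated as expired 1000 times sooner than intended, so it is refreshed far more often than the configured period suggests.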






[jira] [Updated] (HADOOP-11301) [optionally] update jmx cache to drop old metrics

2016-09-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-11301:
-
Fix Version/s: 2.6.5

Cherry-picked to 2.6.5 (trivial).

> [optionally] update jmx cache to drop old metrics
> -
>
> Key: HADOOP-11301
> URL: https://issues.apache.org/jira/browse/HADOOP-11301
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
> Fix For: 2.7.0, 2.6.5
>
> Attachments: HADOOP-11301.v01.patch, HADOOP-11301.v02.patch, 
> HADOOP-11301.v03.patch, HADOOP-11301.v04.patch
>
>
> MetricsSourceAdapter::updateJmxCache() skips updating the info cache if no 
> new metric is added since last time:
> {code}
>   int oldCacheSize = attrCache.size();
>   int newCacheSize = updateAttrCache();
>   if (oldCacheSize < newCacheSize) {
>     updateInfoCache();
>   }
> {code}
> This behavior is not desirable in some applications. For example nntop 
> (HDFS-6982) reports the top users via jmx. The list is updated after each 
> report. The previously reported top users hence should be removed from the 
> cache upon each report request.
> In our production run of nntop we made a change to ignore the size check and 
> always perform updateInfoCache. I am planning to submit a patch including 
> this change. The feature can be enabled by a configuration parameter.
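
A minimal model of why the size check is not enough when entries are replaced rather than added. The class, its fields, and the alwaysRefresh switch are hypothetical stand-ins for the configuration parameter mentioned above, and the user names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the attribute cache and info cache; not the Hadoop class.
final class InfoCacheRefreshSketch {
  private final boolean alwaysRefresh; // assumed configuration switch
  private List<String> attrCache = new ArrayList<>();
  private List<String> infoCache = new ArrayList<>();

  InfoCacheRefreshSketch(boolean alwaysRefresh) {
    this.alwaysRefresh = alwaysRefresh;
  }

  /** One cache update with the metric names currently reported by the source. */
  void update(List<String> currentMetricNames) {
    int oldCacheSize = attrCache.size();
    attrCache = new ArrayList<>(currentMetricNames); // stands in for updateAttrCache()
    if (alwaysRefresh || oldCacheSize < attrCache.size()) {
      infoCache = new ArrayList<>(attrCache);        // stands in for updateInfoCache()
    }
  }

  List<String> info() {
    return infoCache;
  }
}
```

If the top users change from two names to two different names, the size stays the same, so the size check keeps the old names in the info cache. With the switch enabled, every update replaces them.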






[jira] [Updated] (HADOOP-13192) org.apache.hadoop.util.LineReader cannot handle multibyte delimiters correctly

2016-09-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13192:
-
Fix Version/s: 2.6.5

Cherry-picked it to 2.6.5 (trivial).

> org.apache.hadoop.util.LineReader cannot handle multibyte delimiters correctly
> --
>
> Key: HADOOP-13192
> URL: https://issues.apache.org/jira/browse/HADOOP-13192
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Affects Versions: 2.6.2
>Reporter: binde
>Assignee: binde
>Priority: Critical
> Fix For: 2.7.3, 2.6.5, 3.0.0-alpha1
>
> Attachments: 
> 0001-HADOOP-13192-org.apache.hadoop.util.LineReader-match.patch, 
> 0002-fix-bug-hadoop-1392-add-test-case-for-LineReader.patch, 
> HADOOP-13192.final.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> org.apache.hadoop.util.LineReader.readCustomLine() has a bug:
> when the line is aaaabccc and recordDelimiter is aaab, the result should be a,ccc.
> The code at line 310 is:
>   for (; bufferPosn < bufferLength; ++bufferPosn) {
>     if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
>       delPosn++;
>       if (delPosn >= recordDelimiterBytes.length) {
>         bufferPosn++;
>         break;
>       }
>     } else if (delPosn != 0) {
>       bufferPosn--;
>       delPosn = 0;
>     }
>   }
> It should be:
>   for (; bufferPosn < bufferLength; ++bufferPosn) {
>     if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
>       delPosn++;
>       if (delPosn >= recordDelimiterBytes.length) {
>         bufferPosn++;
>         break;
>       }
>     } else if (delPosn != 0) {
>       // - change here - start
>       bufferPosn -= delPosn;
>       // - change here - end
>       delPosn = 0;
>     }
>   }
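
The difference between the two rewind rules can be checked with a stand-alone scan over a single buffer. This is a sketch under simplifying assumptions: the whole line is in memory, the delimiter is non-empty, and the class and method names are hypothetical, not LineReader itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical single-buffer version of the delimiter scan.
final class DelimiterScanSketch {
  /**
   * Returns the index where the delimiter starts, or -1 if it is not found.
   * rewindFully selects "bufferPosn -= delPosn" (true) or "bufferPosn--" (false).
   */
  static int indexOfDelimiter(String line, String delimiter, boolean rewindFully) {
    byte[] buffer = line.getBytes(StandardCharsets.UTF_8);
    byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
    int delPosn = 0;
    for (int posn = 0; posn < buffer.length; ++posn) {
      if (buffer[posn] == delim[delPosn]) {
        delPosn++;
        if (delPosn >= delim.length) {
          return posn + 1 - delim.length;
        }
      } else if (delPosn != 0) {
        // A full rewind returns to the start of the failed partial match, so the
        // loop's ++posn resumes exactly one byte after that start.
        posn -= rewindFully ? delPosn : 1;
        delPosn = 0;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(indexOfDelimiter("aaaabccc", "aaab", false));
    System.out.println(indexOfDelimiter("aaaabccc", "aaab", true));
  }
}
```

For the line aaaabccc and the delimiter aaab, the one-byte rewind skips past the position where the real match begins and never finds the delimiter, while the full rewind finds it at index 1, which splits the line into a and ccc.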






[jira] [Updated] (HADOOP-13052) ChecksumFileSystem mishandles crc file permissions

2016-09-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13052:
-
Fix Version/s: 2.6.5

Cherry-picked it to 2.6.5 (trivial).

> ChecksumFileSystem mishandles crc file permissions
> --
>
> Key: HADOOP-13052
> URL: https://issues.apache.org/jira/browse/HADOOP-13052
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.7.3, 2.6.5, 3.0.0-alpha1
>
> Attachments: HADOOP-13052.patch
>
>
> ChecksumFileSystem does not override permission related calls to apply those 
> operations to the hidden crc files.  Clients may be unable to read the crcs 
> if the file is created with strict permissions and then relaxed.
> The checksum fs is designed to work with or w/o crcs present, so it silently 
> ignores FNF exceptions.  The java file stream apis unfortunately may only 
> throw FNF, so permission denied becomes FNF resulting in this bug going 
> silently unnoticed.
> (Problem discovered via public localizer.  Files are downloaded as 
> user-readonly and then relaxed to all-read.  The crc remains user-readonly)
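
The idea of the fix, applying a permission change to the hidden crc sibling as well, can be sketched with java.nio on a local POSIX file system. The class is hypothetical and is not the Hadoop implementation; it assumes the checksum file is named ".<name>.crc" and lives in the same directory as the data file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Hypothetical local-file-system sketch; assumes a POSIX file system.
final class CrcSiblingSketch {
  /** Hidden checksum sibling: ".<name>.crc" in the same directory as the file. */
  static Path checksumFile(Path file) {
    String crcName = "." + file.getFileName() + ".crc";
    Path parent = file.getParent();
    return parent == null ? Paths.get(crcName) : parent.resolve(crcName);
  }

  /** Applies the permissions to the file and, if present, to its crc sibling. */
  static void setPermissions(Path file, Set<PosixFilePermission> perms) throws IOException {
    Files.setPosixFilePermissions(file, perms);
    Path crc = checksumFile(file);
    if (Files.exists(crc)) {
      // Without this step the crc would keep its original, stricter permissions.
      Files.setPosixFilePermissions(crc, perms);
    }
  }
}
```

In the localizer scenario above, relaxing only the data file leaves the crc user-readonly; touching both keeps the pair readable by the same set of users.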






[jira] [Updated] (HADOOP-12810) FileSystem#listLocatedStatus causes unnecessary RPC calls

2016-09-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-12810:
-
Fix Version/s: 2.6.5

Cherry-picked it to 2.6.5 (trivial). I'll also get MAPREDUCE-6637.

> FileSystem#listLocatedStatus causes unnecessary RPC calls
> -
>
> Key: HADOOP-12810
> URL: https://issues.apache.org/jira/browse/HADOOP-12810
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs, fs/s3
>Affects Versions: 2.7.2
>Reporter: Ryan Blue
>Assignee: Ryan Blue
> Fix For: 2.7.3, 2.6.5, 3.0.0-alpha1
>
> Attachments: HADOOP-12810.1.patch
>
>
> {{FileSystem#listLocatedStatus}} lists the files in a directory and then 
> calls {{getFileBlockLocations(stat.getPath(), ...)}} for each instead of 
> {{getFileBlockLocations(stat, ...)}}. That function with the path arg just 
> calls {{getFileStatus}} to get another file status from the path and calls 
> the file status version, so this ends up calling {{getFileStatus}} 
> unnecessarily.
> This is particularly bad for S3, where {{getFileStatus}} is expensive. 
> Avoiding the extra call improved input split calculation time for a data set 
> in S3 by ~20x: from 10 minutes to 25 seconds.
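
The extra calls can be counted with a small model of the two overloads. The class, the nested Status type, and the method bodies are hypothetical; only the call pattern (the path overload looks the status up again, the status overload reuses it) follows the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two getFileBlockLocations overloads.
final class ListLocatedStatusSketch {
  static final class Status {
    final String path;

    Status(String path) {
      this.path = path;
    }
  }

  private int getFileStatusCalls; // each call stands for one remote lookup

  Status getFileStatus(String path) {
    getFileStatusCalls++;
    return new Status(path);
  }

  /** Status overload: works from the status the listing already returned. */
  String[] getFileBlockLocations(Status stat) {
    return new String[] {"location-of-" + stat.path};
  }

  /** Path overload: fetches the status again before delegating. */
  String[] getFileBlockLocations(String path) {
    return getFileBlockLocations(getFileStatus(path));
  }

  /** Returns the number of getFileStatus calls made while locating all files. */
  static int extraCalls(int fileCount, boolean reuseStatus) {
    ListLocatedStatusSketch fs = new ListLocatedStatusSketch();
    List<Status> listing = new ArrayList<>();
    for (int i = 0; i < fileCount; i++) {
      listing.add(new Status("file-" + i)); // the listing already holds each status
    }
    for (Status stat : listing) {
      if (reuseStatus) {
        fs.getFileBlockLocations(stat);
      } else {
        fs.getFileBlockLocations(stat.path);
      }
    }
    return fs.getFileStatusCalls;
  }
}
```

Passing the path costs one additional status lookup per listed file; passing the status costs none. On a store where each lookup is a remote request, that difference grows linearly with the size of the directory.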






[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-08-16 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423366#comment-15423366
 ] 

Sangjin Lee commented on HADOOP-13410:
--

This should not have any bearing (one way or the other) on HADOOP-12728. If I 
understood correctly, HADOOP-12728 seems to be an issue of the order between 
the jar in the argument and what's in the underlying CLASSPATH or 
HADOOP_CLASSPATH. This JIRA concerns removing the redundant entries for the 
jar, and does not affect the above ordering problem.

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13410.001.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Updated] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-13410:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
   Status: Resolved  (was: Patch Available)

Committed it to trunk. Thanks [~yuanbo] for your contribution!

> RunJar adds the content of the jar twice to the classpath
> -
>
> Key: HADOOP-13410
> URL: https://issues.apache.org/jira/browse/HADOOP-13410
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Yuanbo Liu
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13410.001.patch
>
>
> Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
> location and gets added to the classloader.
> However, the original jar itself is still added to the classpath.
> {code}
>   List<URL> classPath = new ArrayList<>();
>   classPath.add(new File(workDir + "/").toURI().toURL());
>   classPath.add(file.toURI().toURL());
>   classPath.add(new File(workDir, "classes/").toURI().toURL());
>   File[] libs = new File(workDir, "lib").listFiles();
>   if (libs != null) {
>     for (File lib : libs) {
>       classPath.add(lib.toURI().toURL());
>     }
>   }
> {code}
> As a result, the contents of the jar are present in the classpath *twice* and 
> are completely redundant. Although this does not necessarily cause 
> correctness issues, some stricter code written to require a single presence 
> of files may fail.
> I cannot think of a good reason why the jar should be added to the classpath 
> if the unjarred content was added to it. I think we should remove the jar 
> from the classpath.






[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-08-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418128#comment-15418128
 ] 

Sangjin Lee commented on HADOOP-13410:
--

I'm +1 with the change. Do let me know if there is feedback or objections to 
the change. I am thinking of committing this only to 3.0.0 (unless anyone needs 
this on 2.x).







[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-08-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414547#comment-15414547
 ] 

Sangjin Lee commented on HADOOP-13410:
--

There is more than "/". We also add "/classes/" and "/lib/*" to the classpath. 
Since we unpack the jar to add these different elements to the classpath, it 
would make more sense to remove the jar itself from the classpath, IMO.







[jira] [Commented] (HADOOP-9424) The "hadoop jar" invocation should include the passed jar on the classpath as a whole

2016-08-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414532#comment-15414532
 ] 

Sangjin Lee commented on HADOOP-9424:
-

Coming over from HADOOP-13410, I would rather take the opposite approach. 
What's included in HADOOP_CLASSPATH should be considered more like "user 
classes". I would rather put the content of HADOOP_CLASSPATH in the user 
classloader that loads the jar and remove it from the CLASSPATH. That way, the 
jar in the argument and the content of the HADOOP_CLASSPATH would be visible to 
each other. Note that the isolated classloading (HADOOP_CLIENT_CLASSLOADER) 
already takes that approach.

> The "hadoop jar" invocation should include the passed jar on the classpath as 
> a whole
> -
>
> Key: HADOOP-9424
> URL: https://issues.apache.org/jira/browse/HADOOP-9424
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Affects Versions: 2.0.3-alpha
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-9424.patch
>
>
> When you have a case such as this:
> {{X.jar -> Classes = Main, Foo}}
> {{Y.jar -> Classes = Bar}}
> With implementation details such as:
> * Main references Bar and invokes a public, static method on it.
> * Bar does a class lookup to find Foo (Class.forName("Foo")).
> Then when you do a {{HADOOP_CLASSPATH=Y.jar hadoop jar X.jar Main}}, 
> Bar's method fails with a ClassNotFoundException because of the way RunJar 
> runs.
> RunJar extracts the passed jar and includes its contents on the ClassLoader 
> of its current thread, but the {{Class.forName(…)}} call from another class 
> does not check that class loader and hence cannot find the class, as it is 
> not on any classpath it is aware of.
> The "hadoop jar" script should ideally add the passed jar argument to 
> the CLASSPATH before RunJar is invoked, for the above case to pass.
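
The lookup behavior described above can be illustrated with a minimal sketch. 
The single-argument Class.forName(name) resolves through the classloader of 
the calling class, so code loaded from the system classpath never consults the 
loader installed on the current thread. The helper below (ContextClassLookup 
is a hypothetical name, not part of Hadoop) shows the three-argument form that 
does consult the thread context classloader. It is only an illustration of the 
mechanism, not what either JIRA proposes.

```java
final class ContextClassLookup {

  /** Resolves a class through the thread context classloader, if one is set. */
  static Class<?> forName(String name) throws ClassNotFoundException {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      // No context loader on this thread: fall back to the loader that
      // defined this helper class.
      loader = ContextClassLookup.class.getClassLoader();
    }
    // The three-argument overload searches the given loader instead of the
    // caller's defining loader.
    return Class.forName(name, true, loader);
  }

  public static void main(String[] args) throws ClassNotFoundException {
    System.out.println(forName("java.util.ArrayList").getName());
  }
}
```

A library class such as Bar in the scenario above would only benefit from this 
if it were written to call such a helper, which is why the discussion centers 
on where the jars are placed rather than on changing user code.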






[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-08-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414527#comment-15414527
 ] 

Sangjin Lee commented on HADOOP-13410:
--

[~qwertymaniac], you're right. HADOOP-9424 is somewhat similar but can be 
considered separately. I'll comment there.







[jira] [Commented] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-08-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413770#comment-15413770
 ] 

Sangjin Lee commented on HADOOP-13410:
--

Thanks [~yuanbo]! The patch does what the JIRA calls for, and I tested it 
locally.

That said, I'd like to find out whether there is any reason the jar itself 
needs to remain in the classpath after the unjarred content is added to it. 
I'll ask the community.







[jira] [Updated] (HADOOP-12747) support wildcard in libjars argument

2016-08-08 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-12747:
-
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 3.0.0-alpha2
                   2.9.0
     Release Note: It is now possible to specify multiple jar files for the 
libjars argument using a wildcard. For example, you can specify "-libjars 
'libs/*'" as a shorthand for all jars in the libs directory.
           Status: Resolved  (was: Patch Available)

Committed. Thanks [~cnauroth], [~jira.shegalov], and [~vicaya] for your reviews 
and comments.

> support wildcard in libjars argument
> 
>
> Key: HADOOP-12747
> URL: https://issues.apache.org/jira/browse/HADOOP-12747
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: util
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch, 
> HADOOP-12747.03.patch, HADOOP-12747.04.patch, HADOOP-12747.05.patch, 
> HADOOP-12747.06.patch, HADOOP-12747.07.patch
>
>
> There is a problem when a user job adds too many dependency jars in their 
> command line. The HADOOP_CLASSPATH part can be addressed, including using 
> wildcards (\*). But the same cannot be done with the -libjars argument. Today 
> it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this 
> situation. The idea is to handle it the same way the JVM does it: \* expands 
> to the list of jars in that directory. It does not traverse into any child 
> directory.
> Also, it probably would be a good idea to do it only for libjars (i.e. don't 
> do it for -files and -archives).
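
The expansion rule described above (a trailing * stands for the jars directly 
inside that directory, with no descent into child directories) can be sketched 
as follows. This is a minimal illustration under stated assumptions, not the 
code from the attached patches: the class name LibJarsWildcard is 
hypothetical, only a "/*" suffix is recognized, and the ".jar" match is 
case-insensitive.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LibJarsWildcard {

  /** Expands "dir/*" to the jars directly inside dir; other entries pass through. */
  static List<String> expand(String entry) {
    List<String> result = new ArrayList<>();
    if (!entry.endsWith("/*")) {
      result.add(entry);            // not a wildcard: keep the entry as given
      return result;
    }
    File dir = new File(entry.substring(0, entry.length() - 2));
    File[] children = dir.listFiles();
    if (children == null) {
      return result;                // missing or unreadable directory: nothing to add
    }
    Arrays.sort(children);          // listFiles() makes no ordering guarantee
    for (File child : children) {
      boolean isJar = child.getName().toLowerCase(Locale.ROOT).endsWith(".jar");
      // Only regular files are accepted, so child directories are never traversed.
      if (child.isFile() && isJar) {
        result.add(child.getPath());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    for (String arg : args) {
      System.out.println(expand(arg));
    }
  }
}
```

Quoting the argument on the command line, as in the release note's 
'libs/*', keeps the shell from expanding the wildcard before the tool sees it.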






[jira] [Commented] (HADOOP-12747) support wildcard in libjars argument

2016-08-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412554#comment-15412554
 ] 

Sangjin Lee commented on HADOOP-12747:
--

Thanks [~vicaya]! Unless there are objections, I'll commit it by EOD today.







[jira] [Created] (HADOOP-13410) RunJar adds the content of the jar twice to the classpath

2016-07-22 Thread Sangjin Lee (JIRA)
Sangjin Lee created HADOOP-13410:


 Summary: RunJar adds the content of the jar twice to the classpath
 Key: HADOOP-13410
 URL: https://issues.apache.org/jira/browse/HADOOP-13410
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Reporter: Sangjin Lee


Today when you run a "hadoop jar" command, the jar is unzipped to a temporary 
location and gets added to the classloader.

However, the original jar itself is still added to the classpath.
{code}
  List<URL> classPath = new ArrayList<>();
  classPath.add(new File(workDir + "/").toURI().toURL());
  classPath.add(file.toURI().toURL());
  classPath.add(new File(workDir, "classes/").toURI().toURL());
  File[] libs = new File(workDir, "lib").listFiles();
  if (libs != null) {
    for (File lib : libs) {
      classPath.add(lib.toURI().toURL());
    }
  }
{code}

As a result, the contents of the jar are present in the classpath *twice* and 
are completely redundant. Although this does not necessarily cause correctness 
issues, some stricter code written to require a single presence of files may 
fail.

I cannot think of a good reason why the jar should be added to the classpath if 
the unjarred content was added to it. I think we should remove the jar from the 
classpath.
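
To make the "present twice" point concrete, here is a small self-contained 
sketch. It is not RunJar code: the class name DuplicateClasspathDemo and the 
resource name marker.txt are made up for illustration. It builds a one-entry 
jar plus a directory holding the same file, then counts how many classpath 
entries can supply the resource, first with the jar on the classpath and then 
without it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

final class DuplicateClasspathDemo {

  /** Counts how many classpath entries can supply the named resource. */
  static int countCopies(List<URL> classPath, String resource) throws IOException {
    // A null parent keeps the JVM's own classpath out of the count.
    try (URLClassLoader loader = new URLClassLoader(classPath.toArray(new URL[0]), null)) {
      return Collections.list(loader.getResources(resource)).size();
    }
  }

  /** Returns {copies with the jar on the classpath, copies without it}. */
  static int[] run() throws IOException {
    Path workDir = Files.createTempDirectory("unjar");   // stands in for the unpacked jar
    Path jar = Files.createTempFile("app", ".jar");      // stands in for the original jar
    byte[] body = "hello".getBytes(StandardCharsets.UTF_8);

    try (OutputStream out = Files.newOutputStream(jar);
         JarOutputStream jos = new JarOutputStream(out)) {
      jos.putNextEntry(new JarEntry("marker.txt"));
      jos.write(body);
      jos.closeEntry();
    }
    Files.write(workDir.resolve("marker.txt"), body);

    List<URL> withJar = new ArrayList<>();
    withJar.add(workDir.toUri().toURL());
    withJar.add(jar.toUri().toURL());

    List<URL> withoutJar = new ArrayList<>();
    withoutJar.add(workDir.toUri().toURL());

    return new int[] {
        countCopies(withJar, "marker.txt"),
        countCopies(withoutJar, "marker.txt")
    };
  }

  public static void main(String[] args) throws IOException {
    int[] counts = run();
    System.out.println("with jar: " + counts[0] + ", without jar: " + counts[1]);
  }
}
```

The first count is 2 and the second is 1. Code that enumerates resources with 
getResources() and expects exactly one match is the kind of stricter code that 
could trip over the duplicate entry.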






[jira] [Commented] (HADOOP-13401) usability improvements of ApplicationClassLoader

2016-07-21 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388438#comment-15388438
 ] 

Sangjin Lee commented on HADOOP-13401:
--

I think I'll include that work in HADOOP-13398. It needs to be addressed as a 
whole.

> usability improvements of ApplicationClassLoader
> 
>
> Key: HADOOP-13401
> URL: https://issues.apache.org/jira/browse/HADOOP-13401
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: util
>Reporter: Sangjin Lee
>
> Miscellaneous usability improvements for {{ApplicationClassLoader}}:
> - Improve the system class override mechanism: today the override is a 
> wholesale replacement of the default; enable modifying the default
> - Improve handling of addition and subtraction of system classes: today it is 
> sensitive to order
> - other miscellaneous improvements that make using {{ApplicationClassLoader}} 
> easier






[jira] [Created] (HADOOP-13401) usability improvements of ApplicationClassLoader

2016-07-21 Thread Sangjin Lee (JIRA)
Sangjin Lee created HADOOP-13401:


 Summary: usability improvements of ApplicationClassLoader
 Key: HADOOP-13401
 URL: https://issues.apache.org/jira/browse/HADOOP-13401
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: util
Reporter: Sangjin Lee


Miscellaneous usability improvements for {{ApplicationClassLoader}}:
- Improve the system class override mechanism: today the override is a 
wholesale replacement of the default; enable modifying the default
- Improve handling of addition and subtraction of system classes: today it is 
sensitive to order
- other miscellaneous improvements that make using {{ApplicationClassLoader}} 
easier







[jira] [Created] (HADOOP-13400) update the ApplicationClassLoader implementation in line with latest Java ClassLoader implementation

2016-07-21 Thread Sangjin Lee (JIRA)
Sangjin Lee created HADOOP-13400:


 Summary: update the ApplicationClassLoader implementation in line 
with latest Java ClassLoader implementation
 Key: HADOOP-13400
 URL: https://issues.apache.org/jira/browse/HADOOP-13400
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: util
Reporter: Sangjin Lee


The current {{ApplicationClassLoader}} implementation is dated and does not 
reflect the latest Java {{ClassLoader}} implementation. One example is the use 
of the fine-grained classloading lock.
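
For readers unfamiliar with the fine-grained lock, the sketch below shows the 
general JDK pattern (available since Java 7): a loader registers itself as 
parallel capable and then synchronizes on a per-class-name lock instead of on 
the loader instance. The class ParallelCapableLoader is a hypothetical example 
and does not reproduce {{ApplicationClassLoader}} or any proposed patch.

```java
import java.net.URL;
import java.net.URLClassLoader;

class ParallelCapableLoader extends URLClassLoader {

  static {
    // Must run before any instance is created; it switches the loader from a
    // single loader-wide lock to one lock per class name.
    ClassLoader.registerAsParallelCapable();
  }

  ParallelCapableLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    // Threads loading different class names no longer block each other.
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        c = super.loadClass(name, false);   // standard parent-first delegation
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }

  /** True when the lock for a class name is not the loader instance itself. */
  boolean usesPerNameLocks() {
    return getClassLoadingLock("example.Placeholder") != this;
  }
}
```

An older-style loader would instead declare loadClass as synchronized, which 
serializes all class loading through that loader.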






[jira] [Created] (HADOOP-13399) deprecate the Configuration classloader

2016-07-21 Thread Sangjin Lee (JIRA)
Sangjin Lee created HADOOP-13399:


 Summary: deprecate the Configuration classloader
 Key: HADOOP-13399
 URL: https://issues.apache.org/jira/browse/HADOOP-13399
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: util
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical


Today, anyone can simply call {{Configuration.setClassLoader()}} to set the 
configuration classloader to any arbitrary classloader. This classloader is 
then used to get a class or a resource through {{Configuration}} 
({{getClass()}} and {{getResource()}}).

In essence, the {{Configuration}} classloader is a globally shared classloader 
without a contract. This is one step worse than TCCL in that regard.

I propose to remove/deprecate {{setClassLoader()}} and {{getClassLoader()}} and 
simply use TCCL (and then the classloader that loaded the {{Configuration}} 
class) to load classes and resources.
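
One possible reading of the proposed lookup order is sketched below: try the 
TCCL first, and if it is absent or cannot find the class, fall back to the 
loader that loaded the owning class. The helper TcclFirstLoading and its 
method are hypothetical names used only for illustration. Whether the fallback 
should also apply when the TCCL exists but fails to find the class is an 
assumption made here, not something the proposal spells out.

```java
final class TcclFirstLoading {

  /**
   * Loads a class via the thread context classloader, falling back to the
   * loader of the given owner class (for example, a configuration class).
   */
  static Class<?> loadClass(String name, Class<?> owner) throws ClassNotFoundException {
    ClassLoader tccl = Thread.currentThread().getContextClassLoader();
    if (tccl != null) {
      try {
        return Class.forName(name, true, tccl);
      } catch (ClassNotFoundException e) {
        // Not visible to the context loader; fall through to the owner's loader.
      }
    }
    return Class.forName(name, true, owner.getClassLoader());
  }

  public static void main(String[] args) throws ClassNotFoundException {
    System.out.println(loadClass("java.lang.String", TcclFirstLoading.class).getName());
  }
}
```

With this shape there is no mutable, globally shared loader to set: callers 
that need a different loader set the TCCL on their own thread.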






