[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-09 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228610#comment-17228610
 ] 

Steve Loughran commented on HADOOP-17306:
-

How does production code set the timestamp? That is what the tests should do. 
In which case the granularity does prevent problems.

FWIW, I've had problems with localisation related to
* timestamps being brittle
* the NM localizer assuming that world exec/read permissions are enough to 
promote any unencrypted reference into the cache, which is then downloaded using the 
credentials of the NM. HADOOP-16233 shows the problem there: if your store 
fakes directories then the permission probes are worthless.

Including the checksum in the local resource would address the timestamp issue 
for stores which support it, but we'd need to be confident that the marshalling 
works for all subclasses, which is probably a bit dubious unless there's some 
code which already does a lot of marshalling of them. Does DistCp?



> RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09
> 
>
> Key: HADOOP-17306
> URL: https://issues.apache.org/jira/browse/HADOOP-17306
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> RawLocalFileSystem's FileStatus uses the {{File.lastModified()}} API from the JDK.
> This API loses milliseconds due to a JDK bug.
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8177809]
> The bug is fixed in JDK 10 b09 onwards but still exists in JDK 8, which is still 
> being used in many production deployments.
> Apparently, {{Files.getLastModifiedTime()}} from Java's NIO package returns the 
> correct time.
> Use {{Files.getLastModifiedTime()}} instead of {{File.lastModified()}} as a 
> workaround. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-09 Thread Vinayakumar B (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228567#comment-17228567
 ] 

Vinayakumar B commented on HADOOP-17306:


Hi [~Jim_Brennan], Thanks for pointing to test failures.

AFAIK, the test failures are due to the test code explicitly setting the 
timestamp of the {{LocalResource}} for the script file used by the tests to the 
value returned by {{File.lastModified()}}. As mentioned in this Jira's title, 
{{File.lastModified()}} is broken and loses precision. When I replaced the 
{{File.lastModified()}} calls with
 {{Files.getLastModifiedTime(file.toPath()).toMillis()}}, all tests passed.
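
To make the difference concrete, here is a minimal sketch that reads the modification time of a temporary file through both calls. The class name {{ModTimeProbe}} and the chosen timestamp are illustrative assumptions, not code from the patch. On a JDK where JDK-8177809 is fixed, or on a filesystem that only stores whole seconds, both calls may well print the same value.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class ModTimeProbe {

  /** Legacy API; on JDK builds affected by JDK-8177809 the result may be truncated to whole seconds. */
  static long legacyMillis(File file) {
    return file.lastModified();
  }

  /** NIO API, the replacement discussed in this issue. */
  static long nioMillis(File file) throws IOException {
    return Files.getLastModifiedTime(file.toPath()).toMillis();
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("modtime-probe", ".sh");
    try {
      // Arbitrary timestamp with a non-zero millisecond part (hypothetical value).
      Files.setLastModifiedTime(tmp, FileTime.fromMillis(1603531768936L));
      File f = tmp.toFile();
      System.out.println("File.lastModified():         " + legacyMillis(f));
      System.out.println("Files.getLastModifiedTime(): " + nioMillis(f));
    } finally {
      Files.delete(tmp);
    }
  }
}
```

On an affected JDK 8 build the first value would be expected to end in 000 while the second keeps its milliseconds, which is the mismatch the tests ran into.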

The AM sets the timestamp using the value returned by 
{{FileStatus#getModificationTime()}}, in which case it will be consistent. So I 
don't think there is any problem with the production code as long as 
{{FileStatus#getModificationTime()}} is used.

 

As Steve mentioned, relying on modificationTime and length may not be a good 
way to detect changes. There could be possibilities of corruption.




[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-07 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1722#comment-1722
 ] 

Steve Loughran commented on HADOOP-17306:
-

bq. Note, it looks like pre-commit build only tested hadoop-common, so I can 
see why the YARN failures were missed. Not sure why we don't run tests for all 
projects when we make changes in common.

* patches would take forever
* things like HDFS tests fail so consistently we'd end up ignoring them

Generally, when a patch to hadoop-common goes near a module you know about, 
adding a change to the PR like a newline in a comment in a test file is 
sufficient to get that module's tests included; you just omit that change when 
you do the final merge.

The problem here looks like the timestamp checking of the YARN localizer: there's 
clearly a mismatch between the timestamp recorded when the file was uploaded 
and the actual timestamp of the file.

FWIW, object stores have been brittle for this in the past, especially S3 + 
S3Guard (under heavy test load things can be off by a second). I've never 
worried about this (intermittent) failure, as going near the localizer was 
something I was worried about. Maybe: filesystems can declare their 
granularity, or have their own "has equal timestamp" probe. Or the YARN localizer 
only worries about second granularity for its comparisons? I'd worry about the 
security risks on HDFS though.
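
As a rough illustration of the "second granularity" idea, the following sketch compares two epoch-millisecond timestamps after bucketing them to a given granularity. The class and method names are hypothetical; nothing like this exists in the YARN localizer as far as this thread shows.

```java
public class TimestampCompare {

  /**
   * Returns true if both epoch-millisecond timestamps fall into the same
   * bucket of the given granularity (for example 1000 for whole seconds).
   */
  static boolean sameAtGranularity(long aMillis, long bMillis, long granularityMillis) {
    if (granularityMillis <= 0) {
      throw new IllegalArgumentException("granularity must be positive");
    }
    return Math.floorDiv(aMillis, granularityMillis)
        == Math.floorDiv(bMillis, granularityMillis);
  }

  public static void main(String[] args) {
    long recorded = 1603531768000L; // value truncated to the second (hypothetical)
    long actual = 1603531768936L;   // same second, milliseconds preserved (hypothetical)
    System.out.println(sameAtGranularity(recorded, actual, 1L));    // false
    System.out.println(sameAtGranularity(recorded, actual, 1000L)); // true
  }
}
```

Note that bucketing does not help when two timestamps straddle a second boundary, and any coarser comparison widens the window in which a modified file would still be accepted, which is where the security concern comes in.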

Ideally: checksums should be used for any FS which supports checksums. Of 
course, S3A turns off mapping of etag -> checksum as then distcp from hdfs to 
s3 fails unless you explicitly {{-skipCrcCheck}} the operation...
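
A minimal sketch of content-based change detection, under stated assumptions: it does not use Hadoop's filesystem checksum support, but substitutes a plain SHA-256 digest over a local file via the JDK, and the names {{ContentDigest}}, {{sha256}} and {{unchanged}} are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentDigest {

  /** Streams the file through SHA-256 and returns the digest bytes. */
  static byte[] sha256(Path path) throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    try (InputStream in = Files.newInputStream(path)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        digest.update(buf, 0, n);
      }
    }
    return digest.digest();
  }

  /** True if the file's current content still matches the recorded digest. */
  static boolean unchanged(byte[] recorded, Path path)
      throws IOException, NoSuchAlgorithmException {
    return MessageDigest.isEqual(recorded, sha256(path));
  }

  public static void main(String[] args) throws Exception {
    Path tmp = Files.createTempFile("resource", ".sh");
    try {
      Files.write(tmp, "echo hello\n".getBytes(StandardCharsets.UTF_8));
      byte[] recorded = sha256(tmp);
      System.out.println(unchanged(recorded, tmp)); // true
      // Same length, different content: a length check alone would miss this.
      Files.write(tmp, "echo hellO\n".getBytes(StandardCharsets.UTF_8));
      System.out.println(unchanged(recorded, tmp)); // false
    } finally {
      Files.delete(tmp);
    }
  }
}
```

The second write keeps the file length identical, so only the digest comparison notices the change, independent of timestamp granularity.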




[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-05 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227099#comment-17227099
 ] 

Akira Ajisaka commented on HADOOP-17306:


Thank you [~Jim_Brennan] for reverting this.

> RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09
> 
>
> Key: HADOOP-17306
> URL: https://issues.apache.org/jira/browse/HADOOP-17306
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> RawLocalFileSystem's FileStatus uses the {{File.lastModified()}} API from the JDK.
> This API loses milliseconds due to a JDK bug.
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8177809]
> The bug is fixed in JDK 10 b09 onwards but still exists in JDK 8, which is still 
> being used in many production deployments.
> Apparently, {{Files.getLastModifiedTime()}} from Java's NIO package returns the 
> correct time.
> Use {{Files.getLastModifiedTime()}} instead of {{File.lastModified()}} as a 
> workaround. 






[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-05 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226902#comment-17226902
 ] 

Jim Brennan commented on HADOOP-17306:
--

Note, it looks like pre-commit build only tested hadoop-common, so I can see 
why the YARN failures were missed.  Not sure why we don't run tests for all 
projects when we make changes in common.





[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-05 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226795#comment-17226795
 ] 

Jim Brennan commented on HADOOP-17306:
--

Unit test failures in [YARN-10479] were caused by this.





[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-05 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226752#comment-17226752
 ] 

Jim Brennan commented on HADOOP-17306:
--

Here is an example of the type of failure I am seeing in the unit tests:
{noformat}
java.io.IOException: Resource file:/home/jenkins/jenkins-home/workspace/hadoop-qbt-trunk-java8-linux-x86_64/sourcedir/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/TestContainerManager-tmpDir/scriptFile.sh changed on src filesystem - expected: "2020-10-24T09:29:28.000+", was: "2020-10-24T09:29:28.936+", current time: "2020-10-24T09:29:29.586+"
at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:278)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}
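
The "expected" and "was" values in that message differ only in their millisecond part. A small sketch with {{java.time}} makes this explicit; the zone offset is left out because it is cut off in the log, and the class and method names are illustrative only.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class TruncationCheck {

  /** Milliseconds from the expected timestamp to the actual one. */
  static long gapMillis(String expected, String actual) {
    return Duration.between(LocalDateTime.parse(expected), LocalDateTime.parse(actual))
        .toMillis();
  }

  /** True if both timestamps are identical once truncated to whole seconds. */
  static boolean equalToTheSecond(String expected, String actual) {
    return LocalDateTime.parse(expected).truncatedTo(ChronoUnit.SECONDS)
        .equals(LocalDateTime.parse(actual).truncatedTo(ChronoUnit.SECONDS));
  }

  public static void main(String[] args) {
    String expected = "2020-10-24T09:29:28.000"; // from the log, offset omitted
    String actual = "2020-10-24T09:29:28.936";   // from the log, offset omitted
    System.out.println(gapMillis(expected, actual));        // 936
    System.out.println(equalToTheSecond(expected, actual)); // true
  }
}
```

That pattern is what one would expect if one side of the comparison was read through an API that drops milliseconds and the other side was not.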




[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-05 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226747#comment-17226747
 ] 

Jim Brennan commented on HADOOP-17306:
--

[~aajisaka], [~ayushsaxena] any comment?  My inclination is to revert this 
change.





[jira] [Commented] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09

2020-11-04 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226305#comment-17226305
 ] 

Jim Brennan commented on HADOOP-17306:
--

This change is causing a large number of YARN unit tests to fail.
 We should consider reverting it until we can address the issues.

I am concerned that this might be a problem not just for tests, but also for 
production code.

This was the last build on trunk before this change went in:
 
[https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/303/#showFailuresLink]

This was the first build with this change:
 
[https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/304/#showFailuresLink]

I believe many of the new nodemanager test failures are due to this change. 
Many of them are failing because the timestamps of localized resources do not 
match what they were set to.
 Example failure:
{noformat}
java.lang.AssertionError: ProcessStartFile doesn't exist!
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager.prepareInitialContainer(TestContainerManager.java:1040)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager.testContainerUpgradeLocalizationFailure(TestContainerManager.java:819)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{noformat}
 
cc: [~vinayakumarb], [~hexiaoqiao], [~epayne]

