[jira] [Commented] (HADOOP-16644) Intermittent failure of ITestS3ATerasortOnS3A: timestamp differences

2019-10-08 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947209#comment-16947209
 ] 

Siddharth Seth commented on HADOOP-16644:
-

Looks like a PUT request gives back the modification time, while a multipart 
upload does not. Given that a multipart upload is likely a long operation 
anyway, a HEAD request following the MultiPartComplete call likely doesn't add 
a large percentage to the operation time (and is only needed if S3Guard is 
enabled). For a direct PUT we have the data anyway. It will definitely make me 
happy to avoid writing to DDB during a getFileStatus operation.
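The trade-off can be sketched as a small model (illustrative Python; `FakeS3` and `record_in_metastore` are hypothetical names, not the actual S3A code path):

```python
# Illustrative model of the proposal: after CompleteMultipartUpload, one
# extra HEAD fetches the Last-Modified that S3 will actually serve before
# the metadata store (S3Guard) is updated; a direct PUT reuses the data
# the client already has. S3 reports Last-Modified at whole-second
# granularity, which is the source of the mismatch in this report.

class FakeS3:
    def __init__(self):
        self.objects = {}

    def put(self, key, client_mtime_ms):
        # S3 truncates to seconds: 1570531828143 is served as 1570531828000.
        self.objects[key] = {"last_modified_ms": client_mtime_ms // 1000 * 1000}

    def head(self, key):
        return self.objects[key]


def record_in_metastore(store, s3, key, client_mtime_ms, multipart):
    """Decide which timestamp the metadata store should record."""
    if multipart:
        # HEAD after the (already long) multipart upload: small relative
        # cost, and it records the timestamp S3 will actually report.
        mtime = s3.head(key)["last_modified_ms"]
    else:
        # Direct PUT: the client-side data is available without a HEAD.
        mtime = client_mtime_ms
    store[key] = mtime
    return mtime
```

With the HEAD, the metastore and a later getFileStatus agree; without it, the sub-second remainder (143 ms in this report) produces exactly the "changed on src filesystem" failure.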

Using S3 for resource localization - that has at least one issue which I'm 
aware of. I need to test this, and then file a YARN jira. Essentially, I 
suspect the localizer does not use the JobClient config, so any credentials 
there will not be available to YARN for localization (e.g. the client sets up 
access_key and secret_key in its config).
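The suspected gap can be modelled in a few lines (illustrative Python; the dict-based configs and `can_localize` are hypothetical stand-ins, not YARN code):

```python
# Model of the suspicion above: credentials set in the job client's
# Configuration never reach the NodeManager localizer, which builds its
# own Configuration from cluster defaults only.

cluster_defaults = {"fs.s3a.endpoint": "s3.amazonaws.com"}

job_client_conf = dict(cluster_defaults)
job_client_conf["fs.s3a.access.key"] = "AKIAEXAMPLE"    # set by the client
job_client_conf["fs.s3a.secret.key"] = "example-secret"

# The localizer starts from the cluster config, not the JobClient config.
localizer_conf = dict(cluster_defaults)

def can_localize(conf):
    """Localization can authenticate only if credentials are visible to it."""
    return "fs.s3a.access.key" in conf and "fs.s3a.secret.key" in conf
```

Under this model the client can read the resource but the localizer cannot, which is the failure mode a YARN jira would need to describe.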

> Intermittent failure of ITestS3ATerasortOnS3A: timestamp differences
> 
>
> Key: HADOOP-16644
> URL: https://issues.apache.org/jira/browse/HADOOP-16644
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
> Environment: -Dparallel-tests -DtestsThreadCount=8 
> -Dfailsafe.runOrder=balanced -Ds3guard -Ddynamo -Dscale
> h2. Hypothesis:
> the timestamp of the source file is being picked up from S3Guard, but when 
> the NM does a getFileStatus call, a HEAD check is made -and this (due to the 
> overloaded test system) is out of sync with the listing. S3Guard is updated, 
> the corrected date returned and the localisation fails.
>Reporter: Steve Loughran
>Priority: Major
>
> Terasort of directory committer failing in resource localisation - the 
> _partition.lst file has a different timestamp from that expected
> Happens under loaded integration tests (threads = 8; not standalone); 
> non-auth s3guard
> {code}
> 2019-10-08 11:50:29,774 [IPC Server handler 4 on 55983] WARN  
> localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:processHeartbeat(1150)) - { 
> s3a://hwdev-steve-ireland-new/terasort-directory/sortout/_partition.lst, 
> 1570531828143, FILE, null } failed: Resource 
> s3a://hwdev-steve-ireland-new/terasort-directory/sortout/_partition.lst 
> changed on src filesystem (expected 1570531828143, was 1570531828000
> java.io.IOException: Resource 
> s3a://hwdev-steve-ireland-new/terasort-directory/sortout/_partition.lst 
> changed on src filesystem (expected 1570531828143, was 1570531828000
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16626) S3A ITestRestrictedReadAccess fails

2019-10-03 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944139#comment-16944139
 ] 

Siddharth Seth commented on HADOOP-16626:
-

bq. When you call Configuration.addResource() it reloads all configs, so all 
settings you've previously cleared get set again.
Interesting. Any properties which have explicitly been set using 
config.set(...) are retained after an addResource() call. However, properties 
which have been unset explicitly via conf.unset() are lost after an 
addResource(). This is probably a bug in 'Configuration'.
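A minimal model of this behaviour (plain Python, not Hadoop's actual Configuration internals) shows why: explicit set() calls live in an overlay that survives a reload, while unset() only deletes the key, so the next addResource() reload resurrects it.

```python
# Toy Configuration: resources are reloaded wholesale on addResource(),
# the set() overlay is reapplied on top, but unset() leaves no record,
# so a reload brings the deleted property back.

class Config:
    def __init__(self):
        self.resources = []   # list of dicts, e.g. parsed XML files
        self.overlay = {}     # explicit set() calls win over resources
        self.props = {}

    def _reload(self):
        self.props = {}
        for r in self.resources:
            self.props.update(r)
        self.props.update(self.overlay)   # sets survive the reload

    def add_resource(self, resource):
        self.resources.append(resource)
        self._reload()                    # earlier unset() effects are lost

    def set(self, key, value):
        self.overlay[key] = value
        self.props[key] = value

    def unset(self, key):
        self.overlay.pop(key, None)       # the deletion itself is not recorded
        self.props.pop(key, None)

    def get(self, key):
        return self.props.get(key)
```

Running set("a", "x") and unset("b") against a resource defining both, then adding a second resource, leaves "a" at "x" but restores "b" to its resource value.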

To check my understanding: with this specific call in createConfiguration()
{code}
removeBucketOverrides(bucketName, conf,
S3_METADATA_STORE_IMPL,
METADATASTORE_AUTHORITATIVE);
{code}
all the unsets it does are lost, and somehow your config files have 
bucket-level overrides set up, which come back into effect as a result?
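For context, per-bucket resolution works roughly like this (hedged Python sketch; the real logic is in S3AUtils, and these two helpers are illustrative):

```python
# Sketch of S3A per-bucket overrides: fs.s3a.bucket.<name>.<opt> takes
# precedence over the base fs.s3a.<opt>, which is why the test must strip
# bucket overrides before forcing a metadata store implementation - and
# why losing those unsets on a reload silently re-enables the override.

def effective(conf, bucket, opt):
    """Resolve an option: bucket-specific variant wins over the base key."""
    return conf.get("fs.s3a.bucket.%s.%s" % (bucket, opt),
                    conf.get("fs.s3a.%s" % opt))

def remove_bucket_overrides(conf, bucket, *opts):
    """Model of removeBucketOverrides: delete the per-bucket variants."""
    for opt in opts:
        conf.pop("fs.s3a.bucket.%s.%s" % (bucket, opt), None)
```

With an override present, the bucket-specific value wins; after removal, the base value applies again.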

> S3A ITestRestrictedReadAccess fails
> ---
>
> Key: HADOOP-16626
> URL: https://issues.apache.org/jira/browse/HADOOP-16626
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Steve Loughran
>Priority: Major
>
> Just tried running the S3A test suite. Consistently seeing the following.
> Command used 
> {code}
> mvn -T 1C  verify -Dparallel-tests -DtestsThreadCount=12 -Ds3guard -Dauth 
> -Ddynamo -Dtest=moo -Dit.test=ITestRestrictedReadAccess
> {code}
> cc [~ste...@apache.org]
> {code}
> ---
> Test set: org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess
> ---
> Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 5.335 s <<< 
> FAILURE! - in org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess
> testNoReadAccess[raw](org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess)
>   Time elapsed: 2.841 s  <<< ERROR!
> java.nio.file.AccessDeniedException: 
> test/testNoReadAccess-raw/noReadDir/emptyDir/: getFileStatus on 
> test/testNoReadAccess-raw/noReadDir/emptyDir/: 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
> FE8B4D6F25648BCD; S3 Extended Request ID: 
> hgUHzFskU9CcEUT3DxgAkYcWLl6vFoa1k7qXX29cx1u3lpl7RVsWr5rp27/B8s5yjmWvvi6hVgk=),
>  S3 Extended Request ID: 
> hgUHzFskU9CcEUT3DxgAkYcWLl6vFoa1k7qXX29cx1u3lpl7RVsWr5rp27/B8s5yjmWvvi6hVgk=:403
>  Forbidden
> at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:244)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2777)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2705)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2589)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:2377)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$10(S3AFileSystem.java:2356)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:110)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2356)
> at 
> org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess.checkBasicFileOperations(ITestRestrictedReadAccess.java:360)
> at 
> org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess.testNoReadAccess(ITestRestrictedReadAccess.java:282)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 

[jira] [Resolved] (HADOOP-16599) Allow a SignerInitializer to be specified along with a Custom Signer

2019-10-02 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HADOOP-16599.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Allow a SignerInitializer to be specified along with a Custom Signer
> 
>
> Key: HADOOP-16599
> URL: https://issues.apache.org/jira/browse/HADOOP-16599
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Fix For: 3.3.0
>
>
> HADOOP-16445 added support for custom signers. This is a follow up to allow 
> for an Initializer to be specified along with the Custom Signer, for any 
> initialization etc that is required by the custom signer specified.






[jira] [Created] (HADOOP-16626) S3A ITestRestrictedReadAccess fails

2019-10-02 Thread Siddharth Seth (Jira)
Siddharth Seth created HADOOP-16626:
---

 Summary: S3A ITestRestrictedReadAccess fails
 Key: HADOOP-16626
 URL: https://issues.apache.org/jira/browse/HADOOP-16626
 Project: Hadoop Common
  Issue Type: Test
  Components: fs/s3
Reporter: Siddharth Seth


Just tried running the S3A test suite. Consistently seeing the following.
Command used 
{code}
mvn -T 1C  verify -Dparallel-tests -DtestsThreadCount=12 -Ds3guard -Dauth 
-Ddynamo -Dtest=moo -Dit.test=ITestRestrictedReadAccess
{code}

cc [~ste...@apache.org]

{code}
---
Test set: org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess
---
Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 5.335 s <<< 
FAILURE! - in org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess
testNoReadAccess[raw](org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess)  
Time elapsed: 2.841 s  <<< ERROR!
java.nio.file.AccessDeniedException: 
test/testNoReadAccess-raw/noReadDir/emptyDir/: getFileStatus on 
test/testNoReadAccess-raw/noReadDir/emptyDir/: 
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: FE8B4D6F25648BCD; 
S3 Extended Request ID: 
hgUHzFskU9CcEUT3DxgAkYcWLl6vFoa1k7qXX29cx1u3lpl7RVsWr5rp27/B8s5yjmWvvi6hVgk=), 
S3 Extended Request ID: 
hgUHzFskU9CcEUT3DxgAkYcWLl6vFoa1k7qXX29cx1u3lpl7RVsWr5rp27/B8s5yjmWvvi6hVgk=:403
 Forbidden
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:244)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2777)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2705)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2589)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:2377)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$10(S3AFileSystem.java:2356)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:110)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2356)
at 
org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess.checkBasicFileOperations(ITestRestrictedReadAccess.java:360)
at 
org.apache.hadoop.fs.s3a.auth.ITestRestrictedReadAccess.testNoReadAccess(ITestRestrictedReadAccess.java:282)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden 
(Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
FE8B4D6F25648BCD; S3 Extended Request ID: 
hgUHzFskU9CcEUT3DxgAkYcWLl6vFoa1k7qXX29cx1u3lpl7RVsWr5rp27/B8s5yjmWvvi6hVgk=), 
S3 Extended Request ID: 
hgUHzFskU9CcEUT3DxgAkYcWLl6vFoa1k7qXX29cx1u3lpl7RVsWr5rp27/B8s5yjmWvvi6hVgk=
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at 

[jira] [Updated] (HADOOP-16599) Allow a SignerInitializer to be specified along with a Custom Signer

2019-09-24 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16599:

Environment: (was: A)

> Allow a SignerInitializer to be specified along with a Custom Signer
> 
>
> Key: HADOOP-16599
> URL: https://issues.apache.org/jira/browse/HADOOP-16599
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
>
> HADOOP-16445 added support for custom signers. This is a follow up to allow 
> for an Initializer to be specified along with the Custom Signer, for any 
> initialization etc that is required by the custom signer specified.






[jira] [Created] (HADOOP-16599) Allow a SignerInitializer to be specified along with a Custom Signer

2019-09-24 Thread Siddharth Seth (Jira)
Siddharth Seth created HADOOP-16599:
---

 Summary: Allow a SignerInitializer to be specified along with a 
Custom Signer
 Key: HADOOP-16599
 URL: https://issues.apache.org/jira/browse/HADOOP-16599
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
 Environment: A
Reporter: Siddharth Seth
Assignee: Siddharth Seth


HADOOP-16445 added support for custom signers. This is a follow up to allow for 
an Initializer to be specified along with the Custom Signer, for any 
initialization etc that is required by the custom signer specified.






[jira] [Commented] (HADOOP-16586) ITestS3GuardFsck, others fails when run using a local metastore

2019-09-24 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936511#comment-16936511
 ] 

Siddharth Seth commented on HADOOP-16586:
-

Updated traces in the description.

> ITestS3GuardFsck, others fails when run using a local metastore
> ---
>
> Key: HADOOP-16586
> URL: https://issues.apache.org/jira/browse/HADOOP-16586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Reporter: Siddharth Seth
>Priority: Major
>
> Most of these tests fail if running against a local metastore with a 
> ClassCastException.
> Not sure if these tests are intended to work with dynamo only. The fix 
> (either ignore in case of other metastores or fix the test) would depend on 
> the original intent.
> {code}
> ---
> Test set: org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
> ---
> Tests run: 12, Failures: 0, Errors: 11, Skipped: 1, Time elapsed: 34.653 s 
> <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
> testIDetectParentTombstoned(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)
>   Time elapsed: 3.237 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectParentTombstoned(ITestS3GuardFsck.java:190)
> testIDetectDirInS3FileInMs(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) 
>  Time elapsed: 1.827 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectDirInS3FileInMs(ITestS3GuardFsck.java:214)
> testIDetectLengthMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
> Time elapsed: 2.819 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectLengthMismatch(ITestS3GuardFsck.java:311)
> testIEtagMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time 
> elapsed: 2.832 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIEtagMismatch(ITestS3GuardFsck.java:373)
> testIDetectFileInS3DirInMs(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) 
>  Time elapsed: 2.752 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectFileInS3DirInMs(ITestS3GuardFsck.java:238)
> testIDetectModTimeMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) 
>  Time elapsed: 4.103 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectModTimeMismatch(ITestS3GuardFsck.java:346)
> testIDetectNoMetadataEntry(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) 
>  Time elapsed: 3.017 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectNoMetadataEntry(ITestS3GuardFsck.java:113)
> testIDetectNoParentEntry(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
> Time elapsed: 2.821 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectNoParentEntry(ITestS3GuardFsck.java:136)
> testINoEtag(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time elapsed: 
> 4.493 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testINoEtag(ITestS3GuardFsck.java:403)
> testIDetectParentIsAFile(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
> Time elapsed: 2.782 s  <<< ERROR!
> java.lang.ClassCastException: 
> 

[jira] [Updated] (HADOOP-16586) ITestS3GuardFsck, others fails when run using a local metastore

2019-09-24 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16586:

Description: 
Most of these tests fail if running against a local metastore with a 
ClassCastException.

Not sure if these tests are intended to work with dynamo only. The fix (either 
ignore in case of other metastores or fix the test) would depend on the 
original intent.
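One of the two fixes mooted above - skipping instead of failing when the bound metastore is not DynamoDB - would look roughly like this (illustrative Python model; the real JUnit tests would use Assume rather than this hypothetical helper):

```python
# Sketch of the "ignore for other metastores" option: check the bound
# metastore type up front and skip, instead of letting an unguarded cast
# raise the ClassCastException seen in the traces below.

def run_fsck_test(metastore, body):
    """Run a fsck test body only against the DynamoDB metastore."""
    if type(metastore).__name__ != "DynamoDBMetadataStore":
        return "skipped"
    return body(metastore)

class LocalMetadataStore:
    """Stand-in for org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore."""
    pass
```

Whether this or making the tests metastore-agnostic is right depends, as noted, on the original intent.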

{code}
---
Test set: org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
---
Tests run: 12, Failures: 0, Errors: 11, Skipped: 1, Time elapsed: 34.653 s <<< 
FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
testIDetectParentTombstoned(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 3.237 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectParentTombstoned(ITestS3GuardFsck.java:190)

testIDetectDirInS3FileInMs(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 1.827 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectDirInS3FileInMs(ITestS3GuardFsck.java:214)

testIDetectLengthMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 2.819 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectLengthMismatch(ITestS3GuardFsck.java:311)

testIEtagMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time 
elapsed: 2.832 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIEtagMismatch(ITestS3GuardFsck.java:373)

testIDetectFileInS3DirInMs(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 2.752 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectFileInS3DirInMs(ITestS3GuardFsck.java:238)

testIDetectModTimeMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 4.103 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectModTimeMismatch(ITestS3GuardFsck.java:346)

testIDetectNoMetadataEntry(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 3.017 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectNoMetadataEntry(ITestS3GuardFsck.java:113)

testIDetectNoParentEntry(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 2.821 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectNoParentEntry(ITestS3GuardFsck.java:136)

testINoEtag(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time elapsed: 
4.493 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testINoEtag(ITestS3GuardFsck.java:403)

testIDetectParentIsAFile(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 2.782 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectParentIsAFile(ITestS3GuardFsck.java:163)

testTombstonedInMsNotDeletedInS3(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)
  Time elapsed: 3.008 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 

[jira] [Updated] (HADOOP-16586) ITestS3GuardFsck, others fails when run using a local metastore

2019-09-24 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16586:

Summary: ITestS3GuardFsck, others fails when run using a local metastore  
(was: ITestS3GuardFsck, others fails when nun using a local metastore)

> ITestS3GuardFsck, others fails when run using a local metastore
> ---
>
> Key: HADOOP-16586
> URL: https://issues.apache.org/jira/browse/HADOOP-16586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Reporter: Siddharth Seth
>Priority: Major
>
> Most of these tests fail if running against a local metastore with a 
> ClassCastException.
> Not sure if these tests are intended to work with dynamo only. The fix 
> (either ignore in case of other metastores or fix the test) would depend on 
> the original intent.
> {code}
> ---
> Test set: org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
> ---
> Tests run: 12, Failures: 0, Errors: 11, Skipped: 1, Time elapsed: 34.653 s 
> <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
> testIDetectParentTombstoned(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)
>   Time elapsed: 3.237 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectParentTombstoned(ITestS3GuardFsck.java:190)
> testIDetectDirInS3FileInMs(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) 
>  Time elapsed: 1.827 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectDirInS3FileInMs(ITestS3GuardFsck.java:214)
> testIDetectLengthMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
> Time elapsed: 2.819 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectLengthMismatch(ITestS3GuardFsck.java:311)
> testIEtagMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time 
> elapsed: 2.832 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIEtagMismatch(ITestS3GuardFsck.java:373)
> testIDetectFileInS3DirInMs(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) 
>  Time elapsed: 2.752 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectFileInS3DirInMs(ITestS3GuardFsck.java:238)
> testIDetectModTimeMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) 
>  Time elapsed: 4.103 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectModTimeMismatch(ITestS3GuardFsck.java:346)
> testIDetectNoMetadataEntry(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) 
>  Time elapsed: 3.017 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectNoMetadataEntry(ITestS3GuardFsck.java:113)
> testIDetectNoParentEntry(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
> Time elapsed: 2.821 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectNoParentEntry(ITestS3GuardFsck.java:136)
> testINoEtag(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time elapsed: 
> 4.493 s  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
>   at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testINoEtag(ITestS3GuardFsck.java:403)
> testIDetectParentIsAFile(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
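Where such dynamo-only checks are genuinely needed, the cast can be guarded so the test skips cleanly on other metastores instead of erroring. A minimal plain-Java sketch of the pattern (illustrative stand-in types, not the actual Hadoop classes):

```java
// Illustrative stand-ins; the real classes live in org.apache.hadoop.fs.s3a.s3guard.
interface MetadataStore {}
class LocalMetadataStore implements MetadataStore {}
class DynamoDBMetadataStore implements MetadataStore {}

public class GuardedCastDemo {
    // Returns false (test should skip) when the store is not DynamoDB-backed,
    // instead of letting a blind cast throw ClassCastException.
    static boolean runDynamoOnlyCheck(MetadataStore ms) {
        if (!(ms instanceof DynamoDBMetadataStore)) {
            return false;
        }
        DynamoDBMetadataStore ddb = (DynamoDBMetadataStore) ms;  // safe cast
        return true;
    }

    public static void main(String[] args) {
        System.out.println(runDynamoOnlyCheck(new LocalMetadataStore()));    // false
        System.out.println(runDynamoOnlyCheck(new DynamoDBMetadataStore())); // true
    }
}
```

In a JUnit-based ITest the skip would normally be expressed with Assume.assumeTrue(...) in setup rather than a boolean return.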

[jira] [Updated] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-09-21 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16445:

Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Allow separate custom signing algorithms for S3 and DDB
> ---
>
> Key: HADOOP-16445
> URL: https://issues.apache.org/jira/browse/HADOOP-16445
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16445.01.patch, HADOOP-16445.02.patch
>
>
> fs.s3a.signing-algorithm allows overriding the signer. This applies to both 
> the S3 and DDB clients. Need to be able to specify separate signing algorithm 
> overrides for S3 and DDB.
>  
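For reference, the resolution added per-client signer overrides alongside the shared fs.s3a.signing-algorithm fallback. A sketch of the resulting configuration (property names assumed from the HADOOP-16445 patch; verify against the 3.3.0 documentation):

```xml
<!-- Per-client signing overrides (names assumed from the HADOOP-16445 patch).
     fs.s3a.signing-algorithm remains the fallback applied to both clients. -->
<property>
  <name>fs.s3a.s3.signing-algorithm</name>
  <value>S3SignerType</value>
</property>
<property>
  <name>fs.s3a.ddb.signing-algorithm</name>
  <value>AWS4SignerType</value>
</property>
```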



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16591) S3A ITest*MRjob failures

2019-09-20 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16591:

Status: Patch Available  (was: Open)

> S3A ITest*MRjob failures
> 
>
> Key: HADOOP-16591
> URL: https://issues.apache.org/jira/browse/HADOOP-16591
> Project: Hadoop Common
>  Issue Type: Test
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
>
> ITest*MRJob fail with a FileNotFoundException
> {code}
> [ERROR]   
> ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
>  » FileNotFound
> [ERROR]   
> ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
>  » FileNotFound
> [ERROR]   
> ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
>  » FileNotFound
> [ERROR]   
> ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
>  » FileNotFound
> {code}
> Details here: 
> https://issues.apache.org/jira/browse/HADOOP-16207?focusedCommentId=16933718=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16933718
> Creating a separate jira since HADOOP-16207 already has a patch which is 
> trying to parallelize the test runs.






[jira] [Comment Edited] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-20 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934667#comment-16934667
 ] 

Siddharth Seth edited comment on HADOOP-16207 at 9/20/19 7:13 PM:
--

-Attached a simple patch which fixes just the test failures. Doesn't do 
anything with parallelism, changing dir names to be different across tests etc. 
Can submit this in a separate jira, if this one is being used for parallelizing 
the tests.-
Switched to https://issues.apache.org/jira/browse/HADOOP-16591


was (Author: sseth):
Attached a simple patch which fixes just the test failures. Doesn't do anything 
with parallelism, changing dir names to be different across tests etc. Can 
submit this in a separate jira, if this one is being used for parallelizing the 
tests.

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath) /* here */
> {code}
> The file is there but there's a tombstone. Possibilities
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's 
> tombstones. 
> Remember: we do have to delete that dest dir before the committer runs unless 
> overwrite==true, so at the start of the run there will be a tombstone. It 
> should be overwritten by a success.






[jira] [Updated] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-20 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16207:

Attachment: (was: HADOOP-16207.fixtestsonly.txt)

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath) /* here */
> {code}
> The file is there but there's a tombstone. Possibilities
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's 
> tombstones. 
> Remember: we do have to delete that dest dir before the committer runs unless 
> overwrite==true, so at the start of the run there will be a tombstone. It 
> should be overwritten by a success.






[jira] [Created] (HADOOP-16591) S3A ITest*MRjob failures

2019-09-20 Thread Siddharth Seth (Jira)
Siddharth Seth created HADOOP-16591:
---

 Summary: S3A ITest*MRjob failures
 Key: HADOOP-16591
 URL: https://issues.apache.org/jira/browse/HADOOP-16591
 Project: Hadoop Common
  Issue Type: Test
  Components: fs/s3
Reporter: Siddharth Seth
Assignee: Siddharth Seth


ITest*MRJob fail with a FileNotFoundException
{code}
[ERROR]   
ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
{code}
Details here: 
https://issues.apache.org/jira/browse/HADOOP-16207?focusedCommentId=16933718=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16933718

Creating a separate jira since HADOOP-16207 already has a patch which is trying 
to parallelize the test runs.






[jira] [Updated] (HADOOP-16583) Minor fixes to S3 testing instructions

2019-09-20 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16583:

Resolution: Invalid
Status: Resolved  (was: Patch Available)

The specific changes are not required; the existing instructions work as is.

> Minor fixes to S3 testing instructions
> --
>
> Key: HADOOP-16583
> URL: https://issues.apache.org/jira/browse/HADOOP-16583
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Minor
>
> testing.md has some instructions which don't work any longer, and needs an 
> update.
> Specifically - how to enable s3guard and switch between dynamodb and localdb 
> as the store.






[jira] [Commented] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-20 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934667#comment-16934667
 ] 

Siddharth Seth commented on HADOOP-16207:
-

Attached a simple patch which fixes just the test failures. Doesn't do anything 
with parallelism, changing dir names to be different across tests etc. Can 
submit this in a separate jira, if this one is being used for parallelizing the 
tests.

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-16207.fixtestsonly.txt
>
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath) /* here */
> {code}
> The file is there but there's a tombstone. Possibilities
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's 
> tombstones. 
> Remember: we do have to delete that dest dir before the committer runs unless 
> overwrite==true, so at the start of the run there will be a tombstone. It 
> should be overwritten by a success.






[jira] [Updated] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-20 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16207:

Attachment: HADOOP-16207.fixtestsonly.txt

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-16207.fixtestsonly.txt
>
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath) /* here */
> {code}
> The file is there but there's a tombstone. Possibilities
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's 
> tombstones. 
> Remember: we do have to delete that dest dir before the committer runs unless 
> overwrite==true, so at the start of the run there will be a tombstone. It 
> should be overwritten by a success.






[jira] [Comment Edited] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-19 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934024#comment-16934024
 ] 

Siddharth Seth edited comment on HADOOP-16207 at 9/20/19 4:00 AM:
--

Also, to run the tests in parallel - the jobs need to start using a different 
directory name. Currently, all of them use testMRJob (The method name in the 
common class that all tests inherit from).
The issue with the local dir conflict is an MR configuration, afaik (likely the 
MR tmp dir config property). YARN clusters should already be able to run in 
parallel (different ports, random dir names, etc.).
I'd also be careful trying to run too many of these in parallel, given the 
amount of memory they consume. Maybe a different parallelism flag for any tests 
running on a cluster?
How about simplifying the code and letting the tests reside in the same class, 
which makes the code easier to read and allows sharing a cluster more easily. 
Haven't seen the WIP patch - but sharing a cluster across different tests, 
which may or may not trigger at the same time seems like it may cause problems.
The tests also use a 1 s sleep for the InconsistentFS to get into a consistent 
state, which can lead to flaky tests. A longer sleep would reduce the flakiness, 
unless InconsistentFS can be given an actual waitForConsistency method that is 
not time based.
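A bounded poll along those lines might look like this (generic Java sketch; a real waitForConsistency in InconsistentFS would wrap something similar around its own consistency check):

```java
import java.util.function.BooleanSupplier;

public class AwaitDemo {
    // Poll a condition until it holds or the timeout expires, instead of a
    // fixed sleep: returns quickly when the store is already consistent, and
    // is far less flaky than hoping a fixed 1s is always enough.
    static boolean await(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long readyAt = System.currentTimeMillis() + 200;
        // Condition becomes true after ~200ms, well inside the 2s budget.
        System.out.println(await(() -> System.currentTimeMillis() >= readyAt, 2000, 20));
    }
}
```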


was (Author: sseth):
Also, to run the tests in parallel - the jobs need to start using a different 
directory name. Currently, all of them use testMRJob (The method name in the 
common class that all tests inherit from).
The issue with the local dir conflict is an MR configuration, afaik (likely the 
MR tmp dir config property). YARN clusters should already be able to run in 
parallel (different ports, random dir names, etc.).
I'd also be careful trying to run too many of these in parallel, given the 
amount of memory they consume. Maybe a different parallelism flag for any tests 
running on a cluster?

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath) /* here */
> {code}
> The file is there but there's a tombstone. Possibilities
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's 
> tombstones. 
> Remember: we do have to delete that dest dir before the committer runs unless 
> overwrite==true, so at the start of the run there will be a tombstone. It 
> should be overwritten by a success.






[jira] [Comment Edited] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-19 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934024#comment-16934024
 ] 

Siddharth Seth edited comment on HADOOP-16207 at 9/20/19 3:58 AM:
--

Also, to run the tests in parallel - the jobs need to start using a different 
directory name. Currently, all of them use testMRJob (The method name in the 
common class that all tests inherit from).
The issue with the local dir conflict is an MR configuration, afaik (likely the 
MR tmp dir config property). YARN clusters should already be able to run in 
parallel (different ports, random dir names, etc.).
I'd also be careful trying to run too many of these in parallel, given the 
amount of memory they consume. Maybe a different parallelism flag for any tests 
running on a cluster?


was (Author: sseth):
Also, to run the tests in parallel - the jobs need to start using a different 
directory name. Currently, all of them use testMRJob (The method name in the 
common class that all tests inherit from).
The issue with the local dir conflict is an MR configuration, afaik (likely the 
MR tmp dir config property). YARN clusters should already be able to run in 
parallel (different ports, random dir names, etc.).

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath) /* here */
> {code}
> The file is there but there's a tombstone. Possibilities
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's 
> tombstones. 
> Remember: we do have to delete that dest dir before the committer runs unless 
> overwrite==true, so at the start of the run there will be a tombstone. It 
> should be overwritten by a success.






[jira] [Commented] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-19 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934024#comment-16934024
 ] 

Siddharth Seth commented on HADOOP-16207:
-

Also, to run the tests in parallel - the jobs need to start using a different 
directory name. Currently, all of them use testMRJob (The method name in the 
common class that all tests inherit from).
The issue with the local dir conflict is an MR configuration, afaik (likely the 
MR tmp dir config property). YARN clusters should already be able to run in 
parallel (different ports, random dir names, etc.).

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath) /* here */
> {code}
> The file is there but there's a tombstone. Possibilities
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's 
> tombstones. 
> Remember: we do have to delete that dest dir before the committer runs unless 
> overwrite==true, so at the start of the run there will be a tombstone. It 
> should be overwritten by a success.






[jira] [Comment Edited] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-19 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933718#comment-16933718
 ] 

Siddharth Seth edited comment on HADOOP-16207 at 9/19/19 8:02 PM:
--

Seeing several MR job failures when running tests on HADOOP-16445.

{code}
[ERROR]   
ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
{code}
always fail when run with -Ds3guard -Ddynamo -Dauth (These fail when starting 
with a clean DDB table as well)

The test setup seems broken to me.
* Cluster set up happens with createCluster(new JobConf())
* After this, AbstractITCommitMRJob creates the MRJob with 
Job.getInstance(getClusterBinding().getConf() ... -> This will end up using the 
previously created JobConf
* JobConf will only read core-site.xml ... so the command line parameters 
-Ds3guard, -Ddynamo -Dauth don't make a difference.

Adding fs.s3a.metadatastore.authoritative=true, 
fs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
 in auth-keys.xml or core-site.xml fixed all the test failures for me. (With 
the additions, the JobConf used by the cluster has these configs, and the tests 
do what they're supposed to).

That isn't the correct fix though. Making sure the test configuration is used 
to create the JobConf for the cluster and jobs would allow the test properties 
to work.

That said, I did see 3 empty (and marked as deleted) files - part_, 
part_0001, _SUCCESS in the s3guard table. I suspect this is a result of the 
committer trying to access a file on the client, getting a cached FileSystem 
instance (same UGI), and the getFileStatus (maybe) creates these S3Guard DDB 
entries?
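The fix described above — seeding the cluster's JobConf from the test configuration rather than a fresh new JobConf() — can be illustrated with a plain-Java analogy (maps standing in for Hadoop Configuration objects; not the actual Hadoop APIs):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfPropagationDemo {
    static Map<String, String> newDefaultConf() {
        Map<String, String> conf = new HashMap<>();
        // What core-site.xml alone would give you.
        conf.put("fs.s3a.metadatastore.impl", "NullMetadataStore");
        return conf;
    }

    // Broken pattern: ignores the test conf, like createCluster(new JobConf()),
    // so -Ds3guard / -Ddynamo / -Dauth overrides never reach the submitted job.
    static Map<String, String> clusterConfBroken(Map<String, String> testConf) {
        return newDefaultConf();
    }

    // Fixed pattern: seed the cluster conf from the test conf; overrides win.
    static Map<String, String> clusterConfFixed(Map<String, String> testConf) {
        Map<String, String> conf = newDefaultConf();
        conf.putAll(testConf);
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> testConf = new HashMap<>();
        testConf.put("fs.s3a.metadatastore.impl", "DynamoDBMetadataStore");
        System.out.println(clusterConfBroken(testConf)
            .get("fs.s3a.metadatastore.impl"));  // NullMetadataStore
        System.out.println(clusterConfFixed(testConf)
            .get("fs.s3a.metadatastore.impl"));  // DynamoDBMetadataStore
    }
}
```

In the actual test base class this would correspond to something like createCluster(new JobConf(getConfiguration())) — exact method names per the AbstractITCommitMRJob sources, so treat them as assumptions.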


was (Author: sseth):
Seeing several MR job failures when running tests on HADOOP-16445.

{code}
[ERROR]   
ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
{code}
always fail when run with -Ds3guard -Ddynamo -Dauth (These fail when starting 
with a clean DDB table as well)

The test setup seems broken to me.
* Cluster set up happens with createCluster(new JobConf())
* After this, AbstractITCommitMRJob creates the MRJob with 
Job.getInstance(getClusterBinding().getConf() ... -> This will end up using the 
previously created JobConf
* JobConf will only read core-site.xml ... so the command line parameters 
-Ds3guard, -Ddynamo -Dauth don't make a difference.

Adding fs.s3a.metadatastore.authoritative=true, 
fs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
 in auth-keys.xml or core-site.xml fixed all the test failures for me. (With 
the additions, the JobConf used by the cluster has these configs, and the tests 
do what they're supposed to).

That isn't the correct fix though. Making sure the test configuration is used 
to create the JobConf for the cluster and jobs would allow the test properties 
to work.

That said, I did see 3 empty (and marked as deleted) files - part_, 
part_0001, _SUCCESS in the s3guard table. I suspect this is a result of the 
committer trying to access a file on the client, getting a cached FileSystem 
instance (same UGI), and the getFileStatus (maybe) creates these S3Guard DDB 
entries?

[~gabor.bota] - do you remember if you were seeing failures on a single 
test only, and did it pass in non-parallel mode? (did the other tests exist 
when the jira was filed)

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as 

[jira] [Comment Edited] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-19 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933718#comment-16933718
 ] 

Siddharth Seth edited comment on HADOOP-16207 at 9/19/19 8:01 PM:
--

Seeing several MR job failures when running tests on HADOOP-16445.

{code}
[ERROR]   
ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
{code}
always fail when run with -Ds3guard -Ddynamo -Dauth (These fail when starting 
with a clean DDB table as well)

The test setup seems broken to me.
* Cluster set up happens with createCluster(new JobConf())
* After this, AbstractITCommitMRJob creates the MRJob with 
Job.getInstance(getClusterBinding().getConf() ... -> This will end up using the 
previously created JobConf
* JobConf will only read core-site.xml ... so the command line parameters 
-Ds3guard, -Ddynamo -Dauth don't make a difference.

Adding fs.s3a.metadatastore.authoritative=true, 
fs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
 in auth-keys.xml or core-site.xml fixed all the test failures for me. (With 
the additions, the JobConf used by the cluster has these configs, and the tests 
do what they're supposed to).

That isn't the correct fix though. Making sure the test configuration is used 
to create the JobConf for the cluster and jobs would allow the test properties 
to work.

That said, I did see 3 empty (and marked as deleted) files - part_, 
part_0001, _SUCCESS in the s3guard table. I suspect this is a result of the 
committer trying to access a file on the client, getting a cached FileSystem 
instance (same UGI), and the getFileStatus (maybe) creates these S3Guard DDB 
entries?

[~gabor.bota] - do you remember if you were seeing failures on a single 
test only, and did it pass in non-parallel mode? (did the other tests exist 
when the jira was filed)


[jira] [Commented] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob

2019-09-19 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933718#comment-16933718
 ] 

Siddharth Seth commented on HADOOP-16207:
-

Seeing several MR job failures when running tests on HADOOP-16445.

{code}
[ERROR]   
ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
[ERROR]   
ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327
 » FileNotFound
{code}
These always fail when run with -Ds3guard -Ddynamo -Dauth (they fail even when 
starting with a clean DDB table).

The test setup seems broken to me.
* Cluster setup happens with createCluster(new JobConf()).
* After this, AbstractITCommitMRJob creates the MR job with 
Job.getInstance(getClusterBinding().getConf() ...), which ends up using the 
previously created JobConf.
* JobConf only reads core-site.xml, so the command-line parameters -Ds3guard, 
-Ddynamo, and -Dauth make no difference.

Adding fs.s3a.metadatastore.authoritative=true and 
fs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
 to auth-keys.xml or core-site.xml fixed all the test failures for me. (With 
these additions, the JobConf used by the cluster has the configs, and the tests 
do what they're supposed to.)

That isn't the correct fix though. Making sure the test configuration is used 
to create the JobConf for the cluster and jobs would allow the test properties 
to work.
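A minimal sketch of that suggested fix, using toy stand-ins rather than the actual AbstractITCommitMRJob/JobConf classes: the first call shows why `createCluster(new JobConf())` drops the test properties, the second shows the effect of threading one configuration through.

```java
import java.util.HashMap;

// Toy stand-in for JobConf; all names here are illustrative only.
class Conf extends HashMap<String, String> { }

class ClusterConfDemo {
    static Conf clusterConf;

    // Simulates createCluster(...): the cluster keeps whatever conf it is given.
    static void createCluster(Conf conf) { clusterConf = conf; }

    public static void main(String[] args) {
        // Properties injected by the test runner (e.g. -Ds3guard -Ddynamo).
        Conf testConf = new Conf();
        testConf.put("fs.s3a.metadatastore.impl", "DynamoDBMetadataStore");

        // Broken pattern: a fresh conf ignores the test configuration.
        createCluster(new Conf());
        System.out.println(clusterConf.containsKey("fs.s3a.metadatastore.impl"));

        // Suggested pattern: build the cluster from the test configuration,
        // so jobs created from getClusterBinding().getConf() see it too.
        createCluster(testConf);
        System.out.println(clusterConf.containsKey("fs.s3a.metadatastore.impl"));
    }
}
```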

That said, I did see 3 empty (and marked as deleted) files - part_, 
part_0001, _SUCCESS - in the s3guard table. I suspect this is a result of the 
committer trying to access a file on the client, getting a cached FileSystem 
instance (same UGI), whose getFileStatus (maybe) creates these S3Guard DDB 
entries?

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of 
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run: 
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath) /* here */
> {code}
> The file is there but there's a tombstone. Possibilities
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's 
> tombstones. 
> Remember: we do have to delete that dest dir before the committer runs unless 
> overwrite==true, so at the start of the run there will be a tombstone. It 
> should be overwritten by a success.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16586) ITestS3GuardFsck, others fail when run using a local metastore

2019-09-18 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16586:

Description: 
Most of these tests fail with a ClassCastException when run against a local 
metastore.

Not sure if these tests are intended to work with dynamo only. The fix (either 
skip for other metastores or fix the test) would depend on the original 
intent.

{code}
---
Test set: org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
---
Tests run: 12, Failures: 0, Errors: 11, Skipped: 1, Time elapsed: 34.653 s <<< 
FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
testIDetectParentTombstoned(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 3.237 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectParentTombstoned(ITestS3GuardFsck.java:190)

testIDetectDirInS3FileInMs(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 1.827 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectDirInS3FileInMs(ITestS3GuardFsck.java:214)

testIDetectLengthMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 2.819 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectLengthMismatch(ITestS3GuardFsck.java:311)

testIEtagMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time 
elapsed: 2.832 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIEtagMismatch(ITestS3GuardFsck.java:373)

testIDetectFileInS3DirInMs(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 2.752 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectFileInS3DirInMs(ITestS3GuardFsck.java:238)

testIDetectModTimeMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 4.103 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectModTimeMismatch(ITestS3GuardFsck.java:346)

testIDetectNoMetadataEntry(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 3.017 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectNoMetadataEntry(ITestS3GuardFsck.java:113)

testIDetectNoParentEntry(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 2.821 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectNoParentEntry(ITestS3GuardFsck.java:136)

testINoEtag(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time elapsed: 
4.493 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testINoEtag(ITestS3GuardFsck.java:403)

testIDetectParentIsAFile(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  
Time elapsed: 2.782 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectParentIsAFile(ITestS3GuardFsck.java:163)

testTombstonedInMsNotDeletedInS3(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)
  Time elapsed: 3.008 s  <<< ERROR!
java.lang.ClassCastException: 
org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore cannot be cast to 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
  at 

[jira] [Updated] (HADOOP-16586) ITestS3GuardFsck, others fail when run using a local metastore

2019-09-18 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16586:

Summary: ITestS3GuardFsck, others fail when run using a local metastore  
(was: ITestS3GuardFsck fails when run using a local metastore)

> ITestS3GuardFsck, others fail when run using a local metastore
> ---
>
> Key: HADOOP-16586
> URL: https://issues.apache.org/jira/browse/HADOOP-16586
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Reporter: Siddharth Seth
>Priority: Major
>
> Most of these tests fail if running against a local metastore with a 
> ClassCastException.
> Not sure if these tests are intended to work with dynamo only. The fix 
> (either ignore in case of other metastores or fix the test) would depend on 
> the original intent.

[jira] [Created] (HADOOP-16586) ITestS3GuardFsck fails when run using a local metastore

2019-09-18 Thread Siddharth Seth (Jira)
Siddharth Seth created HADOOP-16586:
---

 Summary: ITestS3GuardFsck fails when run using a local metastore
 Key: HADOOP-16586
 URL: https://issues.apache.org/jira/browse/HADOOP-16586
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Siddharth Seth


Most of these tests fail with a ClassCastException when run against a local 
metastore.

Not sure if these tests are intended to work with dynamo only. The fix (either 
skip for other metastores or fix the test) would depend on the original 
intent.


[jira] [Updated] (HADOOP-16583) Minor fixes to S3 testing instructions

2019-09-18 Thread Siddharth Seth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16583:

Status: Patch Available  (was: Open)

> Minor fixes to S3 testing instructions
> --
>
> Key: HADOOP-16583
> URL: https://issues.apache.org/jira/browse/HADOOP-16583
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Minor
>
> testing.md has some instructions which don't work any longer, and needs an 
> update.
> Specifically - how to enable s3guard and switch between dynamodb and localdb 
> as the store.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-16584) S3A Test failures when S3Guard is not enabled

2019-09-17 Thread Siddharth Seth (Jira)
Siddharth Seth created HADOOP-16584:
---

 Summary: S3A Test failures when S3Guard is not enabled
 Key: HADOOP-16584
 URL: https://issues.apache.org/jira/browse/HADOOP-16584
 Project: Hadoop Common
  Issue Type: Task
  Components: fs/s3
 Environment: S
Reporter: Siddharth Seth


There are several S3 test failures when S3Guard is not enabled.
All of these tests pass once they are configured to use S3Guard.

{code}
ITestS3GuardTtl#testListingFilteredExpiredItems
[INFO] Running org.apache.hadoop.fs.s3a.ITestS3GuardTtl
[ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 4, Time elapsed: 
102.988 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3GuardTtl
[ERROR] 
testListingFilteredExpiredItems[0](org.apache.hadoop.fs.s3a.ITestS3GuardTtl)  
Time elapsed: 14.675 s  <<< FAILURE!
java.lang.AssertionError:
[Metastrore directory listing of 
s3a://sseth-dev-in/fork-0002/test/testListingFilteredExpiredItems]
Expecting actual not to be null
  at 
org.apache.hadoop.fs.s3a.ITestS3GuardTtl.getDirListingMetadata(ITestS3GuardTtl.java:367)
  at 
org.apache.hadoop.fs.s3a.ITestS3GuardTtl.testListingFilteredExpiredItems(ITestS3GuardTtl.java:335)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
  at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)

[ERROR] 
testListingFilteredExpiredItems[1](org.apache.hadoop.fs.s3a.ITestS3GuardTtl)  
Time elapsed: 44.463 s  <<< FAILURE!
java.lang.AssertionError:
[Metastrore directory listing of 
s3a://sseth-dev-in/fork-0002/test/testListingFilteredExpiredItems]
Expecting actual not to be null
  at 
org.apache.hadoop.fs.s3a.ITestS3GuardTtl.getDirListingMetadata(ITestS3GuardTtl.java:367)
  at 
org.apache.hadoop.fs.s3a.ITestS3GuardTtl.testListingFilteredExpiredItems(ITestS3GuardTtl.java:335)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
  at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)
{code}

Related to no metastore being used: the test failure happens in teardown with 
an NPE, since setup did not complete. This one is likely a simple fix with 
some null checks in the teardown method.
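A sketch of that null-check idea (field and method names are hypothetical, not the actual test code): teardown only touches resources that setup actually managed to initialize, so an aborted setup no longer triggers an NPE.

```java
// Hypothetical stand-in for a test whose setup may abort early; the point is
// simply that teardown() guards every field that setup() may have left null.
class GuardedTeardownDemo {
    Object metastore;  // null when setup aborted before creating it
    Object testDir;    // likewise

    void teardown() {
        if (metastore != null) {
            // close the metastore only if setup created one
        }
        if (testDir != null) {
            // delete test data only if setup created it
        }
    }

    public static void main(String[] args) {
        GuardedTeardownDemo t = new GuardedTeardownDemo();
        t.teardown();  // setup never ran: fields are null, must not throw
        System.out.println("teardown survived incomplete setup");
    }
}
```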
 ITestAuthoritativePath (6 failures all with the same pattern)
{code}
  [ERROR] Tests run: 6, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 8.142 
s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestAuthoritativePath
[ERROR] testPrefixVsDirectory(org.apache.hadoop.fs.s3a.ITestAuthoritativePath)  
Time elapsed: 6.821 s  <<< ERROR!
org.junit.AssumptionViolatedException: FS needs to have a metadatastore.
  at org.junit.Assume.assumeTrue(Assume.java:59)
  at 
org.apache.hadoop.fs.s3a.ITestAuthoritativePath.setup(ITestAuthoritativePath.java:63)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 

[jira] [Created] (HADOOP-16583) Minor fixes to S3 testing instructions

2019-09-17 Thread Siddharth Seth (Jira)
Siddharth Seth created HADOOP-16583:
---

 Summary: Minor fixes to S3 testing instructions
 Key: HADOOP-16583
 URL: https://issues.apache.org/jira/browse/HADOOP-16583
 Project: Hadoop Common
  Issue Type: Task
  Components: fs/s3
Reporter: Siddharth Seth
Assignee: Siddharth Seth


testing.md has some instructions which don't work any longer, and needs an 
update.

Specifically - how to enable s3guard and switch between dynamodb and localdb as 
the store.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-16538) S3AFilesystem trash handling should respect the current UGI

2019-08-28 Thread Siddharth Seth (Jira)
Siddharth Seth created HADOOP-16538:
---

 Summary: S3AFilesystem trash handling should respect the current 
UGI
 Key: HADOOP-16538
 URL: https://issues.apache.org/jira/browse/HADOOP-16538
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Siddharth Seth


S3 move to trash currently relies upon System.getProperty("user.name"). 
Instead, it should rely on the current UGI to figure out the username.

getHomeDirectory needs to be overridden to use UGI instead of 
System.getProperty.
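A sketch of the proposed change; the Ugi class below is a toy stub standing in for Hadoop's UserGroupInformation, and the real fix would call UserGroupInformation.getCurrentUser().getShortUserName() inside an overridden getHomeDirectory().

```java
// Toy stub: stands in for UserGroupInformation, which can carry a doAs()
// identity that differs from the JVM-wide user.name property.
class Ugi {
    private final String shortName;
    Ugi(String shortName) { this.shortName = shortName; }
    String getShortUserName() { return shortName; }
}

class HomeDirDemo {
    // Current behaviour: always the JVM process user.
    static String fromSystemProperty() {
        return "/user/" + System.getProperty("user.name");
    }

    // Proposed behaviour: resolve the user from the current UGI, so a proxy
    // user (doAs) gets its own home/trash directory.
    static String fromUgi(Ugi current) {
        return "/user/" + current.getShortUserName();
    }

    public static void main(String[] args) {
        System.out.println(fromUgi(new Ugi("alice")));  // /user/alice
    }
}
```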



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-08-21 Thread Siddharth Seth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912592#comment-16912592
 ] 

Siddharth Seth commented on HADOOP-16445:
-

Have posted a PR.

On the STS question - at the moment, it is going to end up using the existing 
configuration parameter, i.e. fs.s3a.signing-algorithm, and the overrides for 
S3A/DDB will not have an effect on this. I could add an override for STS as 
well if that makes sense.

For STS - if fs.s3a.signing-algorithm is not set, the signer is not overridden.
For S3 - if fs.s3a.s3.signing-algorithm is set, the signer is overridden with 
this value. Otherwise the existing behaviour continues (similar to what is 
described for STS above).
For DDB - if fs.s3a.ddb.signing-algorithm is set, the signer is overridden 
with this value. Otherwise the existing behaviour continues (similar to what is 
described for STS above).
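The precedence described above can be sketched as a simple lookup. The key names come from the comment; the resolution helper itself is illustrative, not the patch code.

```java
import java.util.HashMap;
import java.util.Map;

class SignerConfigDemo {
    static final String COMMON_KEY = "fs.s3a.signing-algorithm";

    // Service-specific key wins; otherwise fall back to the common key.
    // A null result means "do not override the SDK's default signer".
    static String resolveSigner(Map<String, String> conf, String serviceKey) {
        String specific = conf.get(serviceKey);
        return specific != null ? specific : conf.get(COMMON_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(COMMON_KEY, "CommonSigner");
        conf.put("fs.s3a.s3.signing-algorithm", "S3OnlySigner");

        // S3 has its own override; DDB falls back to the common setting.
        System.out.println(resolveSigner(conf, "fs.s3a.s3.signing-algorithm"));
        System.out.println(resolveSigner(conf, "fs.s3a.ddb.signing-algorithm"));
    }
}
```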

> Allow separate custom signing algorithms for S3 and DDB
> ---
>
> Key: HADOOP-16445
> URL: https://issues.apache.org/jira/browse/HADOOP-16445
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16445.01.patch, HADOOP-16445.02.patch
>
>
> fs.s3a.signing-algorithm allows overriding the signer. This applies to both 
> the S3 and DDB clients. Need to be able to specify separate signing algorithm 
> overrides for S3 and DDB.
>  






[jira] [Commented] (HADOOP-16505) Add ability to register custom signer with AWS SignerFactory

2019-08-14 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907417#comment-16907417
 ] 

Siddharth Seth commented on HADOOP-16505:
-

[~viczsaurav] - any thoughts on how this compares to HADOOP-16445 
(https://issues.apache.org/jira/browse/HADOOP-16445)?

That patch sets up a new config to register "signerName:signerClass" pairs, 
instead of re-using the current config to allow class names.
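A sketch of the "signerName:signerClass" registration being compared here. A real implementation would pass each pair to the AWS SDK's SignerFactory.registerSigner; a plain map stands in for that registry, and the com.example class names are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of parsing "signerName:signerClass" pairs from a single config value.
public class CustomSignerRegistry {
    private static final Map<String, String> REGISTRY = new HashMap<>();

    /** Parses a comma-separated list of name:class pairs and registers each. */
    static void registerCustomSigners(String configValue) {
        if (configValue == null || configValue.trim().isEmpty()) {
            return;
        }
        for (String pair : configValue.split(",")) {
            String[] parts = pair.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed signer entry: " + pair);
            }
            // Real code: SignerFactory.registerSigner(parts[0], signerClass)
            REGISTRY.put(parts[0], parts[1]);
        }
    }

    static String lookup(String signerName) {
        return REGISTRY.get(signerName);
    }

    public static void main(String[] args) {
        registerCustomSigners("MySigner:com.example.MySignerImpl");
        System.out.println(lookup("MySigner"));  // com.example.MySignerImpl
    }
}
```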

> Add ability to register custom signer with AWS SignerFactory
> 
>
> Key: HADOOP-16505
> URL: https://issues.apache.org/jira/browse/HADOOP-16505
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3, hadoop-aws
>Affects Versions: 3.3.0
>Reporter: Saurav Verma
>Assignee: Saurav Verma
>Priority: Major
> Attachments: HADOOP-16505.patch, hadoop-16505-1.patch
>
>
> Currently, the AWS SignerFactory restricts the class of Signer algorithms 
> that can be used. 
> We require an ability to register a custom Signer. The SignerFactory supports 
> this functionality through its {{registerSigner}} method. 
> By providing a fully qualified classname to the existing parameter 
> {{fs.s3a.signing-algorithm}}, the custom signer can be registered.






[jira] [Updated] (HADOOP-16449) Allow an empty credential provider chain, separate chains for S3 and DDB

2019-07-26 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16449:

Status: Open  (was: Patch Available)

> Allow an empty credential provider chain, separate chains for S3 and DDB
> 
>
> Key: HADOOP-16449
> URL: https://issues.apache.org/jira/browse/HADOOP-16449
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16449.01.patch
>
>
> Currently, credentials cannot be empty (falls back to using the default 
> chain). Credentials for S3 and DDB are always the same.
> In some cases it can be useful to use a different credential chain for S3 and 
> DDB, as well as allow for an empty credential chain.






[jira] [Resolved] (HADOOP-16449) Allow an empty credential provider chain, separate chains for S3 and DDB

2019-07-26 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HADOOP-16449.
-
Resolution: Won't Fix

> Allow an empty credential provider chain, separate chains for S3 and DDB
> 
>
> Key: HADOOP-16449
> URL: https://issues.apache.org/jira/browse/HADOOP-16449
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16449.01.patch
>
>
> Currently, credentials cannot be empty (falls back to using the default 
> chain). Credentials for S3 and DDB are always the same.
> In some cases it can be useful to use a different credential chain for S3 and 
> DDB, as well as allow for an empty credential chain.






[jira] [Commented] (HADOOP-16449) Allow an empty credential provider chain, separate chains for S3 and DDB

2019-07-26 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894214#comment-16894214
 ] 

Siddharth Seth commented on HADOOP-16449:
-

Abandoning this patch for now. Will re-visit (and re-open) if required, along 
with StoreContext and the delegation-token-based credentials. Thanks for taking 
a look, Steve.

> Allow an empty credential provider chain, separate chains for S3 and DDB
> 
>
> Key: HADOOP-16449
> URL: https://issues.apache.org/jira/browse/HADOOP-16449
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16449.01.patch
>
>
> Currently, credentials cannot be empty (falls back to using the default 
> chain). Credentials for S3 and DDB are always the same.
> In some cases it can be useful to use a different credential chain for S3 and 
> DDB, as well as allow for an empty credential chain.






[jira] [Commented] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-07-24 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891640#comment-16891640
 ] 

Siddharth Seth commented on HADOOP-16445:
-

bq. github PRs are how we're reviewing patches now.
Will move this over to a PR.

bq. No tests, no review, you know the rules.
The patch does have what I think are unit tests. Do you have specific tests in 
mind (integration tests?) and pointers on where to add them? A possible test 
would be to set up a custom signer via the new configs and use it.

 bq. Which endpoint did you test against, and, for something going anywhere 
near auth, I'm expecting the SSE-KMS and IAM roles to be tested to, including 
the role delegation tokens. thanks
Not sure why SSE-KMS, IAM roles and delegation tokens need to be tested for this 
patch (HADOOP-16449 is the one which changes the way authN tokens can be 
specified). This patch allows for custom signers and does not change 
authentication mechanics.



> Allow separate custom signing algorithms for S3 and DDB
> ---
>
> Key: HADOOP-16445
> URL: https://issues.apache.org/jira/browse/HADOOP-16445
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16445.01.patch, HADOOP-16445.02.patch
>
>
> fs.s3a.signing-algorithm allows overriding the signer. This applies to both 
> the S3 and DDB clients. Need to be able to specify separate signing algorithm 
> overrides for S3 and DDB.
>  






[jira] [Commented] (HADOOP-16449) Allow an empty credential provider chain, separate chains for S3 and DDB

2019-07-23 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890755#comment-16890755
 ] 

Siddharth Seth commented on HADOOP-16449:
-

The patch allows credentials to be overridden independently for S3 and DDB when 
not using delegation tokens. It also allows the credential chain to end up 
empty.

cc [~ste...@apache.org], [~mackrorysd] - please take a look when possible.

> Allow an empty credential provider chain, separate chains for S3 and DDB
> 
>
> Key: HADOOP-16449
> URL: https://issues.apache.org/jira/browse/HADOOP-16449
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16449.01.patch
>
>
> Currently, credentials cannot be empty (falls back to using the default 
> chain). Credentials for S3 and DDB are always the same.
> In some cases it can be useful to use a different credential chain for S3 and 
> DDB, as well as allow for an empty credential chain.






[jira] [Updated] (HADOOP-16449) Allow an empty credential provider chain, separate chains for S3 and DDB

2019-07-23 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16449:

Status: Patch Available  (was: Open)

> Allow an empty credential provider chain, separate chains for S3 and DDB
> 
>
> Key: HADOOP-16449
> URL: https://issues.apache.org/jira/browse/HADOOP-16449
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16449.01.patch
>
>
> Currently, credentials cannot be empty (falls back to using the default 
> chain). Credentials for S3 and DDB are always the same.
> In some cases it can be useful to use a different credential chain for S3 and 
> DDB, as well as allow for an empty credential chain.






[jira] [Updated] (HADOOP-16449) Allow an empty credential provider chain, separate chains for S3 and DDB

2019-07-23 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16449:

Attachment: HADOOP-16449.01.patch

> Allow an empty credential provider chain, separate chains for S3 and DDB
> 
>
> Key: HADOOP-16449
> URL: https://issues.apache.org/jira/browse/HADOOP-16449
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16449.01.patch
>
>
> Currently, credentials cannot be empty (falls back to using the default 
> chain). Credentials for S3 and DDB are always the same.
> In some cases it can be useful to use a different credential chain for S3 and 
> DDB, as well as allow for an empty credential chain.






[jira] [Created] (HADOOP-16449) Allow an empty credential provider chain, separate chains for S3 and DDB

2019-07-23 Thread Siddharth Seth (JIRA)
Siddharth Seth created HADOOP-16449:
---

 Summary: Allow an empty credential provider chain, separate chains 
for S3 and DDB
 Key: HADOOP-16449
 URL: https://issues.apache.org/jira/browse/HADOOP-16449
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Siddharth Seth
Assignee: Siddharth Seth


Currently, credentials cannot be empty (falls back to using the default chain). 
Credentials for S3 and DDB are always the same.

In some cases it can be useful to use a different credential chain for S3 and 
DDB, as well as allow for an empty credential chain.






[jira] [Updated] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-07-23 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16445:

Attachment: HADOOP-16445.02.patch

> Allow separate custom signing algorithms for S3 and DDB
> ---
>
> Key: HADOOP-16445
> URL: https://issues.apache.org/jira/browse/HADOOP-16445
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16445.01.patch, HADOOP-16445.02.patch
>
>
> fs.s3a.signing-algorithm allows overriding the signer. This applies to both 
> the S3 and DDB clients. Need to be able to specify separate signing algorithm 
> overrides for S3 and DDB.
>  






[jira] [Updated] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-07-23 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16445:

Status: Patch Available  (was: Open)

Updated to fix the checkstyle warnings.

> Allow separate custom signing algorithms for S3 and DDB
> ---
>
> Key: HADOOP-16445
> URL: https://issues.apache.org/jira/browse/HADOOP-16445
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16445.01.patch, HADOOP-16445.02.patch
>
>
> fs.s3a.signing-algorithm allows overriding the signer. This applies to both 
> the S3 and DDB clients. Need to be able to specify separate signing algorithm 
> overrides for S3 and DDB.
>  






[jira] [Updated] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-07-23 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16445:

Status: Open  (was: Patch Available)

> Allow separate custom signing algorithms for S3 and DDB
> ---
>
> Key: HADOOP-16445
> URL: https://issues.apache.org/jira/browse/HADOOP-16445
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16445.01.patch
>
>
> fs.s3a.signing-algorithm allows overriding the signer. This applies to both 
> the S3 and DDB clients. Need to be able to specify separate signing algorithm 
> overrides for S3 and DDB.
>  






[jira] [Updated] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-07-22 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16445:

Status: Patch Available  (was: Open)

The patch allows separate signing algorithms to be used for S3 and DDB 
(documentation on usage is in the patch).

Also, a non-standard signer cannot be used without registering it with the 
Amazon SDK. The patch allows such non-standard signers to be registered.

 

[~ste...@apache.org], [~mackrorysd] - could you please take a look when you get 
a chance.

> Allow separate custom signing algorithms for S3 and DDB
> ---
>
> Key: HADOOP-16445
> URL: https://issues.apache.org/jira/browse/HADOOP-16445
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16445.01.patch
>
>
> fs.s3a.signing-algorithm allows overriding the signer. This applies to both 
> the S3 and DDB clients. Need to be able to specify separate signing algorithm 
> overrides for S3 and DDB.
>  






[jira] [Updated] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-07-22 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-16445:

Attachment: HADOOP-16445.01.patch

> Allow separate custom signing algorithms for S3 and DDB
> ---
>
> Key: HADOOP-16445
> URL: https://issues.apache.org/jira/browse/HADOOP-16445
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Attachments: HADOOP-16445.01.patch
>
>
> fs.s3a.signing-algorithm allows overriding the signer. This applies to both 
> the S3 and DDB clients. Need to be able to specify separate signing algorithm 
> overrides for S3 and DDB.
>  






[jira] [Created] (HADOOP-16445) Allow separate custom signing algorithms for S3 and DDB

2019-07-22 Thread Siddharth Seth (JIRA)
Siddharth Seth created HADOOP-16445:
---

 Summary: Allow separate custom signing algorithms for S3 and DDB
 Key: HADOOP-16445
 URL: https://issues.apache.org/jira/browse/HADOOP-16445
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Siddharth Seth
Assignee: Siddharth Seth


fs.s3a.signing-algorithm allows overriding the signer. This applies to both the 
S3 and DDB clients. Need to be able to specify separate signing algorithm 
overrides for S3 and DDB.

 






[jira] [Commented] (HADOOP-15124) Slow FileSystem.Statistics counters implementation

2018-03-22 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409200#comment-16409200
 ] 

Siddharth Seth commented on HADOOP-15124:
-

Adding to [~ste...@apache.org]'s comment earlier about downstream projects 
relying on per thread statistics - Hive-LLAP does rely on this since it can end 
up executing different queries in the same process. It tracks per query 
statistics by pulling from thread statistics - 
[https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/StatsRecordingThreadPool.java]
 . cc [~prasanth_j] - the perf improvement here may interest you.
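A self-contained illustration of the per-thread statistics pattern StatsRecordingThreadPool relies on: each worker thread accumulates its own counters and the pool snapshots them per task to attribute I/O to a query. In Hadoop proper, FileSystem.Statistics#getThreadStatistics() plays the role of the thread-local counter here:

```java
import java.util.concurrent.atomic.AtomicLong;

// Each thread keeps its own counter; a task snapshots and resets it when done,
// so work from different queries sharing the process stays separable.
public class ThreadStatsDemo {
    // Per-thread counter, analogous to a thread's FileSystem.Statistics data.
    private static final ThreadLocal<long[]> BYTES_READ =
            ThreadLocal.withInitial(() -> new long[1]);

    static void recordRead(long bytes) {
        BYTES_READ.get()[0] += bytes;
    }

    static long snapshotAndReset() {
        long[] cell = BYTES_READ.get();
        long value = cell[0];
        cell[0] = 0;                  // reset so the next task starts clean
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong queryTotal = new AtomicLong();
        Runnable task = () -> {
            recordRead(100);
            recordRead(50);
            queryTotal.addAndGet(snapshotAndReset());  // attribute to the query
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(queryTotal.get());  // 300: both threads' reads
    }
}
```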

> Slow FileSystem.Statistics counters implementation
> --
>
> Key: HADOOP-15124
> URL: https://issues.apache.org/jira/browse/HADOOP-15124
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common
>Affects Versions: 2.9.0, 2.8.3, 2.7.5, 3.0.0
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
>  Labels: common, filesystem, statistics
> Attachments: HADOOP-15124.001.patch
>
>
> While profiling 1TB TeraGen job on Hadoop 2.8.2 cluster (Google Dataproc, 2 
> workers, GCS connector) I saw that FileSystem.Statistics code paths Wall time 
> is 5.58% and CPU time is 26.5% of total execution time.
> After switching FileSystem.Statistics implementation to LongAdder, consumed 
> Wall time decreased to 0.006% and CPU time to 0.104% of total execution time.
> Total job runtime decreased from 66 mins to 61 mins.
> These results are not conclusive, because I didn't benchmark multiple times 
> to average results, but regardless of performance gains switching to 
> LongAdder simplifies code and reduces its complexity.
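The quoted improvement can be demonstrated with a pure-JDK sketch: under contention, AtomicLong serialises all increments on one variable, while LongAdder stripes updates across per-thread cells and only sums on read, which matches the write-heavy, read-rarely profile of FileSystem.Statistics counters. Both stay exact:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Eight threads hammer both counter types; the totals agree, and the
// difference is update cost (contended CAS vs mostly-uncontended cell add).
public class CounterContentionDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();
        int threads = 8, increments = 1_000_000;

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    atomic.incrementAndGet();  // contended CAS on one variable
                    adder.increment();         // striped, per-cell add
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // Both report the same total; they differ in cost, not accuracy.
        System.out.println(atomic.get());   // 8000000
        System.out.println(adder.sum());    // 8000000
    }
}
```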






[jira] [Commented] (HADOOP-14138) Remove S3A ref from META-INF service discovery, rely on existing core-default entry

2017-04-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979369#comment-15979369
 ] 

Siddharth Seth commented on HADOOP-14138:
-

bq. those JIRAs are so old they are implicitly dead.
Don't think they're any less relevant today than they were when they were 
filed.
Realistically, though, the JIRAs will likely not be fixed: 1) the change is 
incompatible, and incompatible in a manner that is not easy to detect since it 
does not break compilation; 2) someone needs to actually put in the work to 
make it happen.

bq. To me, having to change defaults is pretty common (we frequently have to 
tweak core-default settings for a shipping product), and being able to do that 
in a default config is very low-friction compared to code changes.
Isn't that what the site files are for?

A lot of people consider the core-default files as documentation. Available 
Config, Default Value, Description.
In Tez we took the approach of explicitly not having a default file, and 
generated an output file from the code defaults.
Hive uses a nice approach where HiveConf.get(ParamName) implicitly picks up 
default values. No *-default.xml file here either.

That said, if we're moving to discussing core-default.xml vs Code defaults - 
probably needs a wider audience.

The change helps with performance, so that's really good. I think this affects 
simple invocations like hadoop fs -ls, and it's really good to see them run 
faster. Hoping that a longer-term change to fix service loaders goes in. 
Unfortunately I will not be able to contribute the patch in any case.

> Remove S3A ref from META-INF service discovery, rely on existing core-default 
> entry
> ---
>
> Key: HADOOP-14138
> URL: https://issues.apache.org/jira/browse/HADOOP-14138
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha3
>
> Attachments: HADOOP-14138.001.patch, HADOOP-14138-branch-2-001.patch
>
>
> As discussed in HADOOP-14132, the shaded AWS library is killing performance 
> starting all hadoop operations, due to classloading on FS service discovery.
> This is despite the fact that there is an entry for fs.s3a.impl in 
> core-default.xml, *we don't need service discovery here*
> Proposed:
> # cut the entry from 
> {{/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
> # when HADOOP-14132 is in, move to that, including declaring an XML file 
> exclusively for s3a entries
> I want this one in first as it's a major performance regression, and one we 
> could actually backport to 2.7.x, just to improve load time slightly there too






[jira] [Commented] (HADOOP-14138) Remove S3A ref from META-INF service discovery, rely on existing core-default entry

2017-04-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975477#comment-15975477
 ] 

Siddharth Seth commented on HADOOP-14138:
-

Not reading core-default was a bug, and is fixed. core-site.xml is independent 
of what is in core-default.xml, if the code defaults were used.

See: HDFS-4820, HADOOP-7956. While searching for these bugs, I found several 
others which relate to removing deprecated entries, fixing default values, etc. 
from the *-default files (within the HDFS project itself).
My concern is that an explicit requirement on the *-default files is being 
introduced, while the files should, in my opinion, be removed.

I get the problem with the service loaders and the additional time introduced. 
I thought there was an alternate plan to reduce this cost while retaining 
service loaders.

Could you please elaborate on "How to pick up credentials" - and why that needs 
to be part of core-default? (And not a conf.get(PARAMETER, DEFAULT)). 

> Remove S3A ref from META-INF service discovery, rely on existing core-default 
> entry
> ---
>
> Key: HADOOP-14138
> URL: https://issues.apache.org/jira/browse/HADOOP-14138
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha3
>
> Attachments: HADOOP-14138.001.patch, HADOOP-14138-branch-2-001.patch
>
>
> As discussed in HADOOP-14132, the shaded AWS library is killing performance 
> starting all hadoop operations, due to classloading on FS service discovery.
> This is despite the fact that there is an entry for fs.s3a.impl in 
> core-default.xml, *we don't need service discovery here*
> Proposed:
> # cut the entry from 
> {{/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
> # when HADOOP-14132 is in, move to that, including declaring an XML file 
> exclusively for s3a entries
> I want this one in first as it's a major performance regression, and one we 
> could actually backport to 2.7.x, just to improve load time slightly there too






[jira] [Commented] (HADOOP-14138) Remove S3A ref from META-INF service discovery, rely on existing core-default entry

2017-04-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971800#comment-15971800
 ] 

Siddharth Seth commented on HADOOP-14138:
-

bq.  I do not want to replicate such ugliness elsewhere.
Absolutely. The way HDFS and YARN insert configuration (global default 
resources) into every instance needs to be avoided.

bq. You start ignoring core-default and core-site, you stop picking up site 
kerberos options
core-site.xml - yes. core-default.xml holds the code defaults. Most places 
access configuration via conf.get(PARAMETER_NAME, PARAMETER_DEFAULT); this is 
what I mean by setting defaults in code rather than having them picked up from 
core-default.xml.

In terms of the FileSystems - looks like we disagree on where they belong. A 
clean / fast service loader would be the correct approach as far as I'm 
concerned, and the mechanism already exists. core-default seems like a 
workaround.

> Remove S3A ref from META-INF service discovery, rely on existing core-default 
> entry
> ---
>
> Key: HADOOP-14138
> URL: https://issues.apache.org/jira/browse/HADOOP-14138
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha3
>
> Attachments: HADOOP-14138.001.patch, HADOOP-14138-branch-2-001.patch
>
>
> As discussed in HADOOP-14132, the shaded AWS library is killing performance 
> starting all hadoop operations, due to classloading on FS service discovery.
> This is despite the fact that there is an entry for fs.s3a.impl in 
> core-default.xml, *we don't need service discovery here*
> Proposed:
> # cut the entry from 
> {{/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
> # when HADOOP-14132 is in, move to that, including declaring an XML file 
> exclusively for s3a entries
> I want this one in first as it's a major performance regression, and one we 
> could actually backport to 2.7.x, just to improve load time slightly there too






[jira] [Commented] (HADOOP-14138) Remove S3A ref from META-INF service discovery, rely on existing core-default entry

2017-04-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960141#comment-15960141
 ] 

Siddharth Seth commented on HADOOP-14138:
-

[~steve_l] - I understand the mechanics behind *-default.xml and *-site.xml. 
When I said "If someone wants to use s3a, I'd expect them to explicitly set it 
up in their Configuration," - their own Configuration could well be 
core-site.xml, which will then be loaded by all Hadoop services.

What I'm asking is why s3a gets special treatment, and an entry in 
core-default.xml.  Along with that, the 5+ additional s3a settings - why do 
they need to be defined in core-default.xml? Should be possible to have the 
default values in code. This could be a separate template, which users can 
include, to get all relevant settings (if custom settings are required). 
Without custom settings, the service loader approach is sufficient to get s3a 
functional, as long as the jar is available.

HDFS does not have an entry in core-default, and relies upon the ServiceLoader 
approach. (fs.hdfs.impl does not exist. fs.AbstractFileSystem.hdfs.impl exists 
- I don't know what this is used for). 

core-default.xml, to me at least, serves more as documentation of defaults. The 
files can go out of sync with the default values defined in code, 
YarnConfiguration for example. It takes additional effort to keep the files in 
sync. There are JIRAs to remove all the *-default.xml files in favor of code 
defaults (I don't expect these to be fixed soon, since such changes would be 
incompatible). For most parameters in these files, the code has default values 
(all the IPC defaults).
I suspect nothing has broken so far, because the defaults exist in code.

In terms of the s3a and service loader problems, HADOOP-14132 sounds like a 
very good fix to have. If I'm understanding this correctly, general FS 
operations will be faster if we don't load all filesystems on the classpath. 
I'm worried that we're introducing a new dependency on core-default by making 
this change, while I think we should be going in the opposite direction and 
getting rid of dependencies on these files.
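A pure-JDK illustration of the META-INF/services mechanism being discussed, using the JDK's own java.nio.file.spi.FileSystemProvider as a stand-in for org.apache.hadoop.fs.FileSystem (which isn't on a stock classpath). The slow-startup problem came from this scan instantiating every registered provider, and its dependencies, before any filesystem is actually used:

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ServiceLoader;

// ServiceLoader scans the classpath/module graph for registered providers.
// Iterating instantiates each one, which is where the classloading cost lands.
public class ServiceDiscoveryDemo {
    public static void main(String[] args) {
        ServiceLoader<FileSystemProvider> loader =
                ServiceLoader.load(FileSystemProvider.class);
        for (FileSystemProvider provider : loader) {
            // Each iteration may trigger classloading of a provider and its deps.
            System.out.println(provider.getScheme() + " -> "
                    + provider.getClass().getName());
        }
    }
}
```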



> Remove S3A ref from META-INF service discovery, rely on existing core-default 
> entry
> ---
>
> Key: HADOOP-14138
> URL: https://issues.apache.org/jira/browse/HADOOP-14138
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha3
>
> Attachments: HADOOP-14138.001.patch, HADOOP-14138-branch-2-001.patch
>
>
> As discussed in HADOOP-14132, the shaded AWS library is killing performance 
> at the start of all hadoop operations, due to classloading on FS service 
> discovery.
> This is despite the fact that there is an entry for fs.s3a.impl in 
> core-default.xml; *we don't need service discovery here*.
> Proposed:
> # cut the entry from 
> {{/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
> # when HADOOP-14132 is in, move to that, including declaring an XML file 
> exclusively for s3a entries
> I want this one in first as it's a major performance regression, and one we 
> could actually backport to 2.7.x, just to improve load time slightly there too.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14138) Remove S3A ref from META-INF service discovery, rely on existing core-default entry

2017-04-05 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957261#comment-15957261
 ] 

Siddharth Seth commented on HADOOP-14138:
-

[~steve_l] - why should s3a entries exist in core-default.xml?
core-default is supposed to contain defaults for most config values, and serves 
as documentation.

If someone wants to use s3a, I'd expect them to explicitly set it up in their 
Configuration, or rely on the ServiceLoader approach - which this jira 
reverses. 
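For reference, the ServiceLoader registration that this jira removes is a one-line provider-configuration resource naming the implementation class (path as given in the proposal; '#' comments are permitted in these files):

```
# hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.s3a.S3AFileSystem
```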







[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2016-11-18 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Attachment: HADOOP-11552.07.patch

Thanks for taking a look, and for pointing out the issue with setDeferredResponse.
Modified to not attempt a sendResponse - that's what is done for non-deferred 
calls as well (LOG and forget).
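A minimal sketch of the handoff pattern this patch enables (hypothetical names, not the actual Hadoop IPC API): the handler thread registers a deferred response and returns immediately, and a worker thread completes the call later.

```java
import java.util.concurrent.*;

// Sketch of server-side RPC handoff (hypothetical class; not Hadoop's Server API).
public class DeferredRpcSketch {
    // Daemon worker pool standing in for whatever subsystem finishes the call.
    private static final ExecutorService WORKERS =
        Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });

    // Called on the RPC handler thread: hand the slow work off and return
    // at once, so the handler thread is free for the next request.
    public static CompletableFuture<String> handle(String request) {
        CompletableFuture<String> deferred = new CompletableFuture<>();
        WORKERS.submit(() -> {
            try {
                Thread.sleep(50); // simulate a long-running operation
                deferred.complete("done:" + request);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                deferred.completeExceptionally(e); // failure path: log and forget
            }
        });
        return deferred;
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> f = handle("submitApplication");
        System.out.println(f.get(5, TimeUnit.SECONDS)); // prints done:submitApplication
    }
}
```

The key point is that the mapping from request to response is carried by the deferred object, not by a blocked handler thread, which is what makes long-lived calls like awaitResponse cheap on the server.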

> Allow handoff on the server side for RPC requests
> -
>
> Key: HADOOP-11552
> URL: https://issues.apache.org/jira/browse/HADOOP-11552
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-11552.05.patch, HADOOP-11552.06.patch, 
> HADOOP-11552.07.patch, HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, 
> HADOOP-11552.3.txt, HADOOP-11552.3.txt, HADOOP-11552.4.txt
>
>
> An RPC server handler thread is tied up for each incoming RPC request. This 
> isn't ideal, since this essentially implies that RPC operations should be 
> short lived, and most operations which could take time end up falling back to 
> a polling mechanism.
> Some use cases where this is useful:
> - YARN submitApplication - which currently submits, followed by a poll to 
> check if the application is accepted while the submit operation is written 
> out to storage. This can be collapsed into a single call.
> - YARN allocate - requests and allocations use the same protocol. New 
> allocations are received via polling.
> The allocate protocol could be split into a request/heartbeat along with a 
> 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
> on a much longer heartbeat interval. awaitResponse is always left active with 
> the RM - and returns the moment something is available.
> MapReduce/Tez task to AM communication is another example of this pattern.
> The same pattern of splitting calls can be used for other protocols as well. 
> This should serve to improve latency, as well as reduce network traffic since 
> the keep-alive heartbeat can be sent less frequently.
> I believe there are some cases in HDFS as well, where the DN gets told to 
> perform some operations when it heartbeats into the NN.






[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2016-11-15 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Attachment: HADOOP-11552.06.patch

Revised to address review comments, and fix the findbugs warnings.







[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2016-11-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Attachment: HADOOP-11552.05.patch

Rebased patch for trunk. This makes some changes for metrics as well.







[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375792#comment-15375792
 ] 

Siddharth Seth commented on HADOOP-13335:
-

bq. a) this doesn't solve the user confusion problem.
The message does more harm than good, and gets worse once users are pointed to 
yarn jar, and then told that YARN_* properties are deprecated in trunk.

bq. b) there is already a way to disable the messages.
Only by unsetting variables, which means yarn jar and hadoop jar are not usable 
in the same shell. This is not a Hive-only problem.

Please reconsider your -1, or suggest an alternate solution to solve this. I've 
already provided quite a few options.
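The workaround mentioned above - unsetting variables - can at least be scoped to a subshell, so that 'yarn jar' and 'hadoop jar' remain usable from the same session (an illustrative sketch; echo stands in for the real command):

```shell
export YARN_OPTS="-Xmx1g"

# Run with YARN_* unset in a subshell; the parent shell keeps its settings.
result=$( (unset YARN_OPTS; echo "YARN_OPTS is ${YARN_OPTS:-unset}") )

echo "$result"              # prints: YARN_OPTS is unset
echo "parent: $YARN_OPTS"   # prints: parent: -Xmx1g
```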

> Add an option to suppress the 'use yarn jar' warning or remove it
> -
>
> Key: HADOOP-13335
> URL: https://issues.apache.org/jira/browse/HADOOP-13335
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HADOOP-13335.01.patch, HADOOP-13335.02.patch, 
> HADOOP-13335.02_branch-2.patch, HADOOP-13335.03.patch, 
> HADOOP-13335.03_branch-2.patch, HADOOP-13335.04.patch, 
> HADOOP-13335.05.branch-2.patch, HADOOP-13335.05.trunk.patch
>
>
> https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation' 
> warning for 'hadoop jar'.
> hadoop jar is used for a lot more than starting jobs. As an example, hive 
> uses it to start all its services (HiveServer2, the hive client, beeline, 
> etc).
> Using 'yarn jar' to start these services / tools doesn't make a lot of 
> sense - there's no relation to yarn other than requiring the classpath to 
> include yarn libraries.
> I'd propose reverting the changes where this message is printed if YARN 
> variables are set (leave it in the help message), or adding a mechanism which 
> would allow users to suppress this WARNING.






[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-11 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Status: Patch Available  (was: Reopened)







[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-11 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Attachment: HADOOP-13335.05.trunk.patch

Patch that takes option 1 and removes the warnings.







[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-11 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Attachment: HADOOP-13335.05.branch-2.patch







[jira] [Commented] (HADOOP-13338) Incompatible change to SortedMapWritable

2016-07-11 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371170#comment-15371170
 ] 

Siddharth Seth commented on HADOOP-13338:
-

Thanks [~ajisakaa]

> Incompatible change to SortedMapWritable
> 
>
> Key: HADOOP-13338
> URL: https://issues.apache.org/jira/browse/HADOOP-13338
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Siddharth Seth
>Priority: Critical
>
> Hive does not compile against Hadoop-2.8.0-SNAPSHOT
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-contrib: Compilation failure
> [ERROR] 
> /Users/sseth/work2/projects/hive/dev/forMvnInstall/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableOutput.java:[215,70]
>  incompatible types: java.lang.Object cannot be converted to 
> java.util.Map.Entry
> {code}
> Looks like the change in HADOOP-10465 causes this.






[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366778#comment-15366778
 ] 

Siddharth Seth commented on HADOOP-13335:
-

bq. I'm pretty much going to block anything that removes the warnings. So we 
can keep this open, but it isn't going to go anywhere.
On what technical grounds?
Printing a warning like this on branch-2 can be considered incompatible. And 
that's ignoring the fact that the warning itself doesn't necessarily make sense.

In terms of deprecation - I already see the following in trunk: 
"hadoop_deprecate_envvar YARN_OPTS HADOOP_OPTS". What exactly are we trying to 
achieve here, between the deprecation and the push for yarn jar via this 
message?







[jira] [Reopened] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reopened HADOOP-13335:
-

[~aw] - lets try coming to a conclusion here. Re-opening. Kindly do not close 
the ticket till there's a resolution.







[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366571#comment-15366571
 ] 

Siddharth Seth commented on HADOOP-13335:
-

bq. It's not and I'm still not sure why you seem to insist this is true when 
both Vinod and I have said that there isn't any intent on it going away anytime 
remotely soon.
Anytime soon - no. From your comments, it does look like at least you think 
this is a candidate for 4.x removal.

In terms of Hive (for 2.x) - yes, an unset works, for the most part anyway. 
However, this warning is not required - and should not be seen by other systems 
/ users. Unsetting variables is a way to suppress the message, but it isn't an 
optimal or efficient way to do this.
If this is deprecation, what's the public API?

There are multiple options here:
1. Get rid of the warning altogether - which I think is the correct thing to 
do. (I haven't heard a good reason to keep it.)
2. Add a variable to suppress this specific warning.
3. Add an option to suppress all such warnings (and apparently there are a lot 
of these in trunk).







[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365652#comment-15365652
 ] 

Siddharth Seth commented on HADOOP-13335:
-

bq. There have been a ton of removals all over the place in trunk already. S3, 
hftp, ... lots and lots of places. Bear in mind that the last opportunity 
Hadoop had to remove content was over five years ago.
Good to know.

bq. Once again: they do not. I'm not sure how many more and different ways I 
can state this point. The rest of that discussion is pretty much moot since the 
first opportunity to make them function the same will be 4.x, easily years off.
And once again - the only difference is the additional YARN_* environment 
variables added on. You seem to agree that having these additional variables is 
unnecessary and confusing, and they should eventually be removed (maybe in 
4.x). If the intent is to deprecate the (yarn jar) command - I don't see the 
point of pointing users towards this when they invoke hadoop jar.

Another way of looking at this: if someone has set the YARN_* options - they 
would do this intentionally, and would be aware of the yarn jar command. If 
they use hadoop jar after this point for something else - they know exactly 
what they're doing, and hadoop should not try pointing them back at yarn jar.

bq. yarn-config.sh and hadoop-config.sh in a way that should be mostly 
conflict-free
Right there is another problem. "Mostly conflict-free" is not a great API, and 
adds more confusion.

bq. Deprecation warnings have always been added to the output throughout the 
history of Hadoop. Expect a lot more to show up in trunk.
Is this particular case supposed to be a deprecation warning?
Most systems also have a mechanism to suppress deprecation messages; we 
should add such an option to the scripts in trunk.

Is there some place where the public API for the hadoop scripts is documented? 
The only places I have seen references along with documentation are within the 
scripts themselves and in the 'usage' output.









[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364973#comment-15364973
 ] 

Siddharth Seth commented on HADOOP-13335:
-

bq. It's going to be hard to take yarn jar or hadoop jar away. It's doubtful 
they will ever get removed. That said, we can at least make them act and work 
the same way. To me, that's the ultimate goal and it's pretty close to what 
happens in trunk:
1. yarn command sucks in yarn-env.sh, hadoop-env.sh, yarn-config.sh and 
hadoop-config.sh in a way that should be mostly conflict-free. (non-yarn 
commands do not pull in yarn-x.sh, obviously)
2. If YARN_OPTS is defined, yarn x (jar, rmadmin, etc) will use it but throw a 
deprecation warning.
3. Otherwise use HADOOP_OPTS

It is going to be hard to remove either of these. I don't know of a case where 
something deprecated in a previous release has actually been removed.
If hadoop jar and yarn jar should behave the same - and 1) 'yarn jar' is not 
the preferred usage, or 2) 'yarn jar' behaves as an alias - I'd be in favor of 
removing these warnings altogether. Don't encourage users to use yarn jar over 
hadoop jar / don't advertise yarn jar.

In terms of removing the warning - I'm all for it, and it is the preferred 
approach to 'fix' this jira (that's the first suggestion in the description).

This is arguable, but printing a new warning in the 2.7.0 release can be 
considered an incompatible change. Other than being an annoyance to users, 
it breaks Hive silent mode, since that was not written to work with extra 
output from the hadoop command.

Let's keep the detailed recommendation in the 'help' output - and remove it 
from the hadoop jar invocation.

bq. Very long term (post-3.x), it would probably be better if hive called 
hadoop-config.sh and/or hadoop-functions.sh directly. This would bypass the 
middleman and give much better control. I'd be very interested to hear what 
sort of holes we have in the functionality here that makes this 
hard/impossible. Off the top, I suspect we need to make one big function of the 
series of function calls in hadoop-config.sh, but would love to hear your 
insight on this.
I don't know enough about the internals of the scripts to have an educated 
opinion on this. The hadoop scripts, the required environment variables, and 
how they interact seem quite complicated. Setting HADOOP_CLIENT_OPTS is 
apparently the way to change the hive client heap size. I would expect hive to 
control this independently, and not depend on variables exported by hadoop 
scripts. One possible usage is for Hadoop to provide basic information - 
CLASSPATH, configs. Products like Hive would build on top of this information, 
rather than trying to use hadoop scripts, and define their own mechanism for 
users to specify various environment variables. hadoop jar is obviously useful 
for custom tools, which want a simple way to execute on a hadoop cluster.

> Add an option to suppress the 'use yarn jar' warning or remove it
> -
>
> Key: HADOOP-13335
> URL: https://issues.apache.org/jira/browse/HADOOP-13335
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HADOOP-13335.01.patch, HADOOP-13335.02.patch, 
> HADOOP-13335.02_branch-2.patch, HADOOP-13335.03.patch, 
> HADOOP-13335.03_branch-2.patch, HADOOP-13335.04.patch
>
>
> https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation' 
> warning for 'hadoop jar'.
> hadoop jar is used for a lot more than starting jobs. As an example - hive 
> uses it to start all its services (HiveServer2, the hive client, beeline, 
> etc.).
> Using 'yarn jar' to start these services / tools doesn't make a lot of 
> sense - there's no relation to yarn other than requiring the classpath to 
> include yarn libraries.
> I'd propose reverting the changes where this message is printed if YARN 
> variables are set (leave it in the help message), or adding a mechanism which 
> would allow users to suppress this WARNING.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363835#comment-15363835
 ] 

Siddharth Seth commented on HADOOP-13335:
-

bq. There are no HDFS_* vars that change how the client operates. ...
Not yet - until someone feels the need to have hdfs used as a separate command 
to do something other than filesystem operations. (Really hope there's no hdfs 
jar command)

bq. I'm not sure there is more clarity required. If a user wants/needs YARN_*, 
then they need to use 'yarn jar'. 
Again - I don't see the difference between hadoop jar and yarn jar other than a 
different set of variables being set and respected by the different commands. 
Stepping back - should the YARN_* parameters exist, and should yarn jar exist? 
If I understand you correctly, I think you're trying to get rid of some of this.
If 'yarn jar' is something that we think is confusing, or something we 
potentially want to get rid of - I'd say it's better to not print any warning 
at all, and leave hadoop jar as is.

The hive binary could unset YARN_OPTS / YARN_CLIENT_OPTS - and leave them 
intact for the session/shell from where the hive binary was invoked.
That said, the hive jira linked to this one proposes moving to using 'yarn jar' 
- and I believe this is mainly because of the WARNING. It would be good for 
users / downstream projects to know how this should be handled: will yarn jar 
survive or not?
cc [~vinodkv] since this discussion has quite a bit to do with yarn, and to 
potentially get more context on yarn jar and the YARN_* variables.
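A minimal sketch of the unset approach described above, under the assumption that the hive launcher runs as its own shell process (modelled here by a subshell): the launcher clears the YARN_* variables for itself, and the caller's session keeps them, because environment changes never propagate back to a parent process.

```shell
# The user's session has YARN options set (example value).
YARN_CLIENT_OPTS="-Dexample=true"
export YARN_CLIENT_OPTS

# The hive launcher (a subshell stands in for it here) unsets the
# variable before doing its work; only the child process is affected.
( unset YARN_CLIENT_OPTS
  echo "launcher sees: ${YARN_CLIENT_OPTS:-unset}" )

# The invoking shell still has its original value intact.
echo "session keeps: ${YARN_CLIENT_OPTS}"
```

This prints "launcher sees: unset" followed by "session keeps: -Dexample=true", which is exactly the "unset in the hive binary, leave the session intact" behavior proposed above.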




[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363564#comment-15363564
 ] 

Siddharth Seth commented on HADOOP-13335:
-

bq. If anything wants to talk to yarn in any way/shape/form and use hadoop's 
bootstrap code, they really, really, really should be using yarn jar. Otherwise 
there is a risk that some things may get mis-configured. We won't be able to 
clean this up until 4.x. (3.x finally deprecates a lot of this craziness from 
the project split back that happened in 0.22...)
I'm still not clear about the utility of yarn jar. It adds some YARN-specific 
variables - which no existing scripts would have set up. Other than that, it 
appears to behave the same as hadoop jar.
What happens when someone wants to talk to both yarn and hdfs? hdfs-config.sh 
is not going to get invoked anywhere.

For trunk, it may make sense to separate the 'jar' sub-command instead of 
clubbing it along with service-specific sub-commands (applications, top, 
rmadmin, nodemanager, etc.) under the 'yarn' command. yarn jar is just 
confusing - and doesn't seem to provide a lot of utility.

For this specific case - Hive could unset the YARN options in its main script. 
Will have to check about the invocations of 'hadoop jar' that happen from 
within a running JVM.

Let's get this patch into branch-2 at least. Hopefully we can get more clarity 
on what yarn jar means, and how it should be handled in trunk.





[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363358#comment-15363358
 ] 

Siddharth Seth commented on HADOOP-13335:
-

[~aw] - when you get a chance, does this look good to go in?




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Status: Patch Available  (was: Open)




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Attachment: HADOOP-13335.04.patch




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Status: Open  (was: Patch Available)

Posted the patch based on master instead of trunk. Will upload another one 
shortly.




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Attachment: HADOOP-13335.03.patch

Renamed the parameter to be prefixed with HADOOP_. Also fixed a typo in the 
parameter name.

The patch is not changing the log line. I'm going to skip changing the 
hadoop_error usage in this patch, and leave that instead for the jira which 
changes this throughout the script.




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Assignee: Siddharth Seth
  Status: Patch Available  (was: Open)




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Attachment: HADOOP-13335.03_branch-2.patch




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Status: Open  (was: Patch Available)




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Status: Patch Available  (was: Open)




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Attachment: HADOOP-13335.02.patch

Patch for master.




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Attachment: HADOOP-13335.02_branch-2.patch

The same patch, renamed for branch-2.




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Status: Open  (was: Patch Available)




[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363161#comment-15363161
 ] 

Siddharth Seth commented on HADOOP-13335:
-

bq. But hive doesn't call yarn, it uses hadoop. hadoop does not import 
yarn-config.sh, which is where yarn configuration parameters are expected to be 
located. YARN_USER_CLASSPATH is not handled or likely defined at all. So none 
of those settings will be defined.
Hive does not use 'yarn' at the moment (and hopefully will not need to). The 
problem comes in when YARN_CLIENT_OPTS / YARN_OPTS are defined in the shell 
from which hive is invoked (for beeline / the CLI, for example). That ends up 
displaying the 'use yarn jar' message, which is confusing.




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Attachment: HADOOP-13335.01.patch

Thanks for the explanation [~aw]. 

If I'm not mistaken, yarn-config.sh imports hadoop-config.sh - so parameters 
like HADOOP_USER_CLASSPATH_FIRST etc. will be respected. YARN_CLASSPATH_FIRST 
is handled after building the hadoop classpath.

I'm not sure how conflicting parameters between HADOOP_CLIENT_OPTS and 
YARN_CLIENT_OPTS can be handled though.

Anyway - from a Hive perspective, having the WARNING show when the user runs 
hive in the same shell as the one used to execute a YARN command is quite 
confusing. Also, given this was introduced in 2.7.0, I think it's useful to 
add an option to disable the warning.

Attaching a simple patch which allows the WARNING to be suppressed. Could you 
please take a look?
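One way such a suppression option can work is sketched below. This is an assumption about the mechanism, not the patch itself: the function and variable names (warn_use_yarn_jar, SUPPRESS_YARN_JAR_WARNING) are hypothetical stand-ins for whatever identifiers the actual patch uses.

```shell
# Hypothetical sketch: gate the deprecation warning on an opt-out variable.
warn_use_yarn_jar() {
  # Print the warning only when the suppression flag is not set to "true".
  if [ "${SUPPRESS_YARN_JAR_WARNING:-false}" != "true" ]; then
    echo 'WARNING: Use "yarn jar" to launch YARN applications.' >&2
  fi
}

warn_use_yarn_jar        # default behavior: the warning is printed to stderr

SUPPRESS_YARN_JAR_WARNING=true
warn_use_yarn_jar        # opted out: nothing is printed
```

A tool like Hive could then export the opt-out variable from its launcher script once, instead of every user having to live with (or work around) the message.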




[jira] [Updated] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-13335:

Status: Patch Available  (was: Open)

> Add an option to suppress the 'use yarn jar' warning or remove it
> -
>
> Key: HADOOP-13335
> URL: https://issues.apache.org/jira/browse/HADOOP-13335
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Siddharth Seth
> Attachments: HADOOP-13335.01.patch
>
>
> https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation' 
> warning for 'hadoop jar'.
> hadoop jar is used for a lot more than starting jobs. As an example, Hive 
> uses it to start all its services (HiveServer2, the Hive client, Beeline, 
> etc.).
> Using 'yarn jar' to start these services / tools doesn't make a lot of 
> sense - there's no relation to YARN other than requiring the classpath to 
> include the YARN libraries.
> I'd propose reverting the changes where this message is printed if YARN 
> variables are set (leaving it in the help message), or adding a mechanism 
> that would allow users to suppress this WARNING.






[jira] [Commented] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-07-05 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362949#comment-15362949
 ] 

Siddharth Seth commented on HADOOP-13335:
-

[~aw] - what does yarn jar mean (as compared to hadoop jar) - is its main 
intent to provide the YARN libraries in the classpath?
Will upload a patch a little later to optionally disable the warning.

> Add an option to suppress the 'use yarn jar' warning or remove it
> -
>
> Key: HADOOP-13335
> URL: https://issues.apache.org/jira/browse/HADOOP-13335
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Siddharth Seth
>
> https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation' 
> warning for 'hadoop jar'.
> hadoop jar is used for a lot more than starting jobs. As an example, Hive 
> uses it to start all its services (HiveServer2, the Hive client, Beeline, 
> etc.).
> Using 'yarn jar' to start these services / tools doesn't make a lot of 
> sense - there's no relation to YARN other than requiring the classpath to 
> include the YARN libraries.
> I'd propose reverting the changes where this message is printed if YARN 
> variables are set (leaving it in the help message), or adding a mechanism 
> that would allow users to suppress this WARNING.






[jira] [Created] (HADOOP-13338) Incompatible change to SortedMapWritable

2016-07-01 Thread Siddharth Seth (JIRA)
Siddharth Seth created HADOOP-13338:
---

 Summary: Incompatible change to SortedMapWritable
 Key: HADOOP-13338
 URL: https://issues.apache.org/jira/browse/HADOOP-13338
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Siddharth Seth
Priority: Critical


Hive does not compile against Hadoop-2.8.0-SNAPSHOT

{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hive-contrib: Compilation failure
[ERROR] 
/Users/sseth/work2/projects/hive/dev/forMvnInstall/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableOutput.java:[215,70]
 incompatible types: java.lang.Object cannot be converted to 
java.util.Map.Entry
{code}

Looks like the change in HADOOP-10465 causes this.
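The breakage is the usual raw-type incompatibility: once a class gains type parameters (as HADOOP-10465 did for SortedMapWritable), callers that still use the raw type get `Object` back from `entrySet()`, and the old cast-free loop no longer compiles. A minimal, self-contained reproduction of that pattern (with a hypothetical `GenerifiedMap` standing in for the real SortedMapWritable):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical minimal reproduction of the HADOOP-10465 style incompatibility:
// GenerifiedMap stands in for the newly generified SortedMapWritable.
public class RawTypeBreakage {
    static class GenerifiedMap<K extends Comparable<K>, V> {
        private final TreeMap<K, V> map = new TreeMap<>();
        void put(K k, V v) { map.put(k, v); }
        Set<Map.Entry<K, V>> entrySet() { return map.entrySet(); }
    }

    static int countRaw() {
        GenerifiedMap raw = new GenerifiedMap();  // raw type, as pre-generics callers used it
        raw.put("k", "v");                        // unchecked warning, but compiles
        int n = 0;
        // With the raw type, entrySet() elements are typed as Object, so the old
        // "for (Map.Entry e : raw.entrySet())" loop now fails with exactly
        // "Object cannot be converted to Map.Entry"; an explicit cast is required.
        for (Object o : raw.entrySet()) {
            Map.Entry e = (Map.Entry) o;
            if (e.getKey() != null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("entries iterated via raw cast: " + countRaw());
    }
}
```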






[jira] [Created] (HADOOP-13335) Add an option to suppress the 'use yarn jar' warning or remove it

2016-06-30 Thread Siddharth Seth (JIRA)
Siddharth Seth created HADOOP-13335:
---

 Summary: Add an option to suppress the 'use yarn jar' warning or 
remove it
 Key: HADOOP-13335
 URL: https://issues.apache.org/jira/browse/HADOOP-13335
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Siddharth Seth


https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation' 
warning for 'hadoop jar'.

hadoop jar is used for a lot more than starting jobs. As an example, Hive uses 
it to start all its services (HiveServer2, the Hive client, Beeline, etc.).
Using 'yarn jar' to start these services / tools doesn't make a lot of 
sense - there's no relation to YARN other than requiring the classpath to 
include the YARN libraries.

I'd propose reverting the changes where this message is printed if YARN 
variables are set (leaving it in the help message), or adding a mechanism 
that would allow users to suppress this WARNING.






[jira] [Commented] (HADOOP-12910) Add new FileSystem API to support asynchronous method calls

2016-06-09 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323237#comment-15323237
 ] 

Siddharth Seth commented on HADOOP-12910:
-

+1 for not using Guava in the public API (unless we are copying the code into 
the Hadoop codebase). Using it there would potentially limit Guava version 
changes within a Hadoop major version.

> Add new FileSystem API to support asynchronous method calls
> ---
>
> Key: HADOOP-12910
> URL: https://issues.apache.org/jira/browse/HADOOP-12910
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: HADOOP-12910-HDFS-9924.000.patch, 
> HADOOP-12910-HDFS-9924.001.patch, HADOOP-12910-HDFS-9924.002.patch
>
>
> Add a new API, namely FutureFileSystem (or AsynchronousFileSystem, if it is a 
> better name).  All the APIs in FutureFileSystem are the same as FileSystem 
> except that the return type is wrapped by Future, e.g.
> {code}
>   //FileSystem
>   public boolean rename(Path src, Path dst) throws IOException;
>   //FutureFileSystem
>   public Future<Boolean> rename(Path src, Path dst) throws IOException;
> {code}
> Note that FutureFileSystem does not extend FileSystem.
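The Future-wrapping idea in the description can be sketched with JDK types only (no Guava, in line with the comment above). `BlockingFs` here is a stand-in interface, not the real org.apache.hadoop.fs.FileSystem, and `FutureFs` is an illustrative name:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch: wrap a blocking filesystem call so it returns a Future,
// mirroring the proposed FutureFileSystem shape.
public class FutureFsSketch {
    interface BlockingFs {
        boolean rename(String src, String dst);
    }

    static class FutureFs {
        private final BlockingFs fs;
        private final ExecutorService pool;
        FutureFs(BlockingFs fs, ExecutorService pool) { this.fs = fs; this.pool = pool; }

        // Same shape as FileSystem#rename, but the result is wrapped in a Future.
        Future<Boolean> rename(String src, String dst) {
            return pool.submit(() -> fs.rename(src, dst));
        }
    }

    static boolean demo() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        FutureFs ffs = new FutureFs((src, dst) -> true, pool);  // always-succeeding fake FS
        boolean renamed = ffs.rename("/a", "/b").get();         // block only when the result is needed
        pool.shutdown();
        return renamed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rename returned " + demo());
    }
}
```

Using JDK `Future` keeps Guava out of the signature entirely; callers that want callbacks can still layer `CompletableFuture` on top.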






[jira] [Commented] (HADOOP-13057) Async IPC server support

2016-04-26 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257619#comment-15257619
 ] 

Siddharth Seth commented on HADOOP-13057:
-

It would - this is long pending. I don't think I'll be able to update the patch 
for the next several weeks. If anyone wants to take a shot at it in the 
meantime, feel free to.

> Async IPC server support
> 
>
> Key: HADOOP-13057
> URL: https://issues.apache.org/jira/browse/HADOOP-13057
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>
> On some applications, the server may run out of handlers when performing many 
> blocking I/O operations during the processing of each call (e.g. calling 
> another service, etc.). A viable solution is increasing the number of 
> handlers, but a large number of threads consumes significant memory (stacks, 
> etc.) and causes performance issues as well.
> After HADOOP-12909, work on asynchronization has been done on the caller 
> side. This is a similar proposal for the server side.
> Suggesting the ability to handle requests asynchronously.
> For example, in such a server, calls may return a Future object instead of an 
> immediate value. The response is then sent to the client in {{onSuccess}} or 
> {{onFailed}} callbacks.
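The callback shape described above can be sketched with `CompletableFuture`: the handler returns a future instead of a value, and the response is sent from a completion callback. `sendResponse` / `sendError` are stand-ins for the real ipc.Server responder, which this sketch does not model:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hedged sketch of an async server handler: blocking work runs on a separate
// I/O pool, so the handler thread is released immediately instead of being
// tied up for the whole call.
public class AsyncHandlerSketch {
    static volatile String lastResponse;

    static CompletableFuture<String> handleCall(String request, ExecutorService ioPool) {
        return CompletableFuture.supplyAsync(() -> "echo:" + request, ioPool);
    }

    static void sendResponse(String r) { lastResponse = r; }
    static void sendError(Throwable t) { lastResponse = "error:" + t.getMessage(); }

    static String demo() throws Exception {
        ExecutorService ioPool = Executors.newSingleThreadExecutor();
        handleCall("ping", ioPool)
            .whenComplete((result, err) -> {      // onSuccess / onFailed in one callback
                if (err == null) sendResponse(result); else sendError(err);
            })
            .get();                               // the demo waits only so it can observe the result
        ioPool.shutdown();
        return lastResponse;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```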





[jira] [Commented] (HADOOP-13057) Async IPC server support

2016-04-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255896#comment-15255896
 ] 

Siddharth Seth commented on HADOOP-13057:
-

HADOOP-11552 targets the same.

> Async IPC server support
> 
>
> Key: HADOOP-13057
> URL: https://issues.apache.org/jira/browse/HADOOP-13057
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>
> On some applications, the server may run out of handlers when performing many 
> blocking I/O operations during the processing of each call (e.g. calling 
> another service, etc.). A viable solution is increasing the number of 
> handlers, but a large number of threads consumes significant memory (stacks, 
> etc.) and causes performance issues as well.
> After HADOOP-12909, work on asynchronization has been done on the caller 
> side. This is a similar proposal for the server side.
> Suggesting the ability to handle requests asynchronously.
> For example, in such a server, calls may return a Future object instead of an 
> immediate value. The response is then sent to the client in {{onSuccess}} or 
> {{onFailed}} callbacks.





[jira] [Commented] (HADOOP-12909) Change ipc.Client to support asynchronous calls

2016-03-10 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190172#comment-15190172
 ] 

Siddharth Seth commented on HADOOP-12909:
-

There are potential problems with supporting client-side async calls without 
fixing the server side - the main one being that all handler threads on the 
server can end up getting blocked. Of course, the same would happen if the 
client app were to create its own threads and make remote calls (FileSystem, 
for instance).
The Future-based approach mentioned here and in other related jiras ends up 
simplifying client code; however, frameworks need to be aware of the potential 
effect on the server.
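One framework-side mitigation for the concern above is to bound the number of in-flight async calls so a single client cannot tie up every handler thread on the server. A hedged sketch, where the remote call body is a stand-in rather than the real ipc.Client API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Hedged sketch: an async caller that applies back-pressure with a semaphore,
// limiting how many requests it has outstanding against the server at once.
public class BoundedAsyncClient {
    private final Semaphore inFlight;
    private final ExecutorService pool;

    BoundedAsyncClient(int maxInFlight, ExecutorService pool) {
        this.inFlight = new Semaphore(maxInFlight);
        this.pool = pool;
    }

    CompletableFuture<String> call(String req) throws InterruptedException {
        inFlight.acquire();                         // block when too many calls are outstanding
        return CompletableFuture
            .supplyAsync(() -> "ok:" + req, pool)   // stand-in for the actual RPC
            .whenComplete((r, e) -> inFlight.release());
    }

    static String demo() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        BoundedAsyncClient c = new BoundedAsyncClient(4, pool);
        String result = c.call("x").get();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```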

> Change ipc.Client to support asynchronous calls
> ---
>
> Key: HADOOP-12909
> URL: https://issues.apache.org/jira/browse/HADOOP-12909
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
>
> In ipc.Client, the underlying mechanism already supports asynchronous 
> calls -- the calls share a connection, the call requests are sent using a 
> thread pool, and the responses can be out of order.  Indeed, a synchronous 
> call is implemented by invoking wait() in the caller thread in order to wait 
> for the server response.
> In this JIRA, we change ipc.Client to support an asynchronous mode.  In 
> asynchronous mode, it returns once the request has been sent out and does 
> not wait for the response from the server.





[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-04-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Target Version/s: 2.8.0  (was: 2.7.0)

 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, 
 HADOOP-11552.3.txt, HADOOP-11552.3.txt, HADOOP-11552.4.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.





[jira] [Commented] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-04-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393094#comment-14393094
 ] 

Siddharth Seth commented on HADOOP-11552:
-

cc/ [~vinodkv] - thoughts on making YARN changes in a branch ?

 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, 
 HADOOP-11552.3.txt, HADOOP-11552.3.txt, HADOOP-11552.4.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.





[jira] [Commented] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-04-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393093#comment-14393093
 ] 

Siddharth Seth commented on HADOOP-11552:
-

I'm interested in getting this patch into a released version of Hadoop. Having 
it in a released version does make it easier to consume for downstream 
projects; and I do intend to use this feature in Tez - and that can serve as 
another testbed. Was hoping to get this into 2.7, but it's too late for that. 
Will change the target version to 2.8 - which gives more breathing room to have 
it reviewed, and tried out in components within Hadoop.

There isn't that much work in the RPC layer itself. Follow-up patches, like the 
shared thread pool, will be more disruptive. When this is used by YARN / HDFS, 
those patches are likely to be more involved and a larger change set. I can 
create jiras for some of the YARN tasks, and would request folks in HDFS to 
create the relevant jiras there.

This could absolutely be done in a branch. If this particular patch is 
considered 'safe' - it'd be good to get it into 2.8 even if the rest of the 
work to use it in sub-components isn't done.

HADOOP-10300 is related, and this patch borrows elements from there - like I 
mentioned in my first comment. If I'm not mistaken, 10300 doesn't allow for a 
return value. Daryn could correct me here if I've understood that incorrectly.

Multiplexing UGIs over a single connection - that's TBD, right? We still use 
distinct connections per UGI if I'm not mistaken. I don't think the patch 
affects this path. Are there plans to support multiplexing responses on a 
connection - i.e. allowing a smaller response through, even if the responder 
isn't done with a previous response on the same connection?


 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, 
 HADOOP-11552.3.txt, HADOOP-11552.3.txt, HADOOP-11552.4.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.





[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-03-31 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Attachment: HADOOP-11552.3.txt

Updated patch to fix the rat check and the javac warning. I don't think the 
unit test failure is related.

bq. I would like to see some code actually using this before we add it, to make 
sure we are getting the APIs right.
[~cmccabe] - the APIs are being added as unstable for now. There's at least one 
follow-up jira to make use of a common thread pool to handle requests and send 
responses - or at least to have the RPC layer provide a pool for responses. At 
the moment, apps would have to set up threads to send responses in parallel.
The test does show how this would be used in an application - the specific 
method on a protocol would need to indicate intent to send a delayed response, 
and relinquish control.
I think it'll be useful to get this in, so that it's possible to start trying 
out this mechanism.
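The "indicate intent, then respond later" handoff described in the comment can be sketched as follows. `DeferredContext` and `submitApplication` are illustrative names, not the actual API added by the HADOOP-11552 patch:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hedged sketch of the handoff pattern: the protocol method relinquishes
// control immediately and hands back a context that some other thread
// completes once the real work (e.g. a storage write) finishes.
public class DeferredResponseSketch {
    static class DeferredContext<T> {
        final CompletableFuture<T> pending = new CompletableFuture<>();
        // Called later, off the handler thread, to send the delayed response.
        void sendResponse(T value) { pending.complete(value); }
    }

    // No handler thread is held while the (simulated) background work runs.
    static DeferredContext<String> submitApplication(String app, ScheduledExecutorService timer) {
        DeferredContext<String> ctx = new DeferredContext<>();
        timer.schedule(() -> ctx.sendResponse("ACCEPTED:" + app), 10, TimeUnit.MILLISECONDS);
        return ctx;
    }

    static String demo() throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        String r = submitApplication("app_1", timer).pending.get();
        timer.shutdown();
        return r;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

This collapses the submit-then-poll round trips described in the issue into a single call whose response arrives whenever the work completes.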

 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, 
 HADOOP-11552.3.txt, HADOOP-11552.3.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.





[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-03-31 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Status: Open  (was: Patch Available)

 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, 
 HADOOP-11552.3.txt, HADOOP-11552.3.txt, HADOOP-11552.4.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.





[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-03-31 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Status: Patch Available  (was: Open)

 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, 
 HADOOP-11552.3.txt, HADOOP-11552.3.txt, HADOOP-11552.4.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.





[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-03-31 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Attachment: HADOOP-11552.4.txt

 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, 
 HADOOP-11552.3.txt, HADOOP-11552.3.txt, HADOOP-11552.4.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.





[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-03-30 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Status: Patch Available  (was: Open)

 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.





[jira] [Updated] (HADOOP-11552) Allow handoff on the server side for RPC requests

2015-03-30 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HADOOP-11552:

Attachment: HADOOP-11552.2.txt

Updated patch with another couple of tests, and error handling.

[~sanjay.radia], [~daryn] - please review. I would appreciate it if you could 
pay additional attention to ensuring this doesn't break the regular flow. I've 
tried to keep the changes to a minimum for the regular flow.

 Allow handoff on the server side for RPC requests
 -

 Key: HADOOP-11552
 URL: https://issues.apache.org/jira/browse/HADOOP-11552
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt


 An RPC server handler thread is tied up for each incoming RPC request. This 
 isn't ideal, since this essentially implies that RPC operations should be 
 short lived, and most operations which could take time end up falling back to 
 a polling mechanism.
 Some use cases where this is useful.
 - YARN submitApplication - which currently submits, followed by a poll to 
 check if the application is accepted while the submit operation is written 
 out to storage. This can be collapsed into a single call.
 - YARN allocate - requests and allocations use the same protocol. New 
 allocations are received via polling.
 The allocate protocol could be split into a request/heartbeat along with a 
 'awaitResponse'. The request/heartbeat is sent only when there's a request or 
 on a much longer heartbeat interval. awaitResponse is always left active with 
 the RM - and returns the moment something is available.
 MapReduce/Tez task to AM communication is another example of this pattern.
 The same pattern of splitting calls can be used for other protocols as well. 
 This should serve to improve latency, as well as reduce network traffic since 
 the keep-alive heartbeat can be sent less frequently.
 I believe there are some cases in HDFS as well, where the DN gets told to 
 perform some operations when it heartbeats into the NN.




