[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-07-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088798#comment-16088798
 ] 

Mingliang Liu commented on HADOOP-13998:


Thanks Steve.

{quote}
For the merge, a big single squashed s3guard patch seems to be the best 
way, "everything in one go"
{quote}
I'm OK with this as the code change will be simple and clear.

{quote}
I think I'll also do a 2.9 backport branch
{quote}
+1 for this. We had some internal effort and this seems very promising.

> initial s3guard preview
> ---
>
> Key: HADOOP-13998
> URL: https://issues.apache.org/jira/browse/HADOOP-13998
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>
> JIRA to link in all the things we think are needed for a preview/merge into 
> trunk



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-07-12 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084812#comment-16084812
 ] 

Aaron Fabbri commented on HADOOP-13998:
---

+1, [~ste...@apache.org]




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-07-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082545#comment-16082545
 ] 

Steve Loughran commented on HADOOP-13998:
-

Update: 

# I don't want to get those new things I mentioned in. Let's get the (fairly 
stable) preview out and then worry about the new features.
# That leaves only a couple of patches HADOOP-14505 and HADOOP-14633
# For the merge, a big single squashed s3guard patch seems to be the best 
way, "everything in one go"
# I think I'll also do a 2.9 backport branch, which should just be java 7 
anonymous classes in place of lambdas in the tests. We have a lot of that 
internally already. That'd be a follow-on to the merge
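
As a minimal sketch of what that kind of backport change looks like (the class and method names are hypothetical and not taken from the S3Guard tests), here is the same {{Callable}} written first as a Java 8 lambda and then as a Java 7 anonymous inner class:

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; not code from the S3Guard test suite.
final class LambdaBackportSketch {

  // Java 8 form: the Callable is written as a lambda.
  static Callable<String> java8Style(final String path) {
    return () -> "checked " + path;
  }

  // Java 7 form: the same Callable as an anonymous inner class.
  static Callable<String> java7Style(final String path) {
    return new Callable<String>() {
      @Override
      public String call() {
        return "checked " + path;
      }
    };
  }

  // Runs a Callable, converting any checked exception to an unchecked one.
  static String invoke(Callable<String> c) {
    try {
      return c.call();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Both forms behave identically; only the second compiles at Java 7 source level.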





[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-07-05 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075159#comment-16075159
 ] 

Mingliang Liu commented on HADOOP-13998:


Thanks Steve for the list, I'll review those related JIRAs.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069945#comment-16069945
 ] 

Steve Loughran commented on HADOOP-13998:
-

As far as I can see.

I'd actually like to get the S3A lambda and retry logic in there too, even 
though it's going to be in the committer where it gets picked up.

Maybe a few more side things too (json ser/deser). Why? I want them in trunk 
for general S3A work, such as implementing all retry error handling.

This is not the committer itself; that I'd like to get into the 13345 branch 
once the preview is out.

Regarding the merge process: vote in commons, 2+ votes. Even though mingliang 
and I are quorate, it'd be good to get some review from others, ideally 
cnauroth and others with experience in the area.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-29 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069165#comment-16069165
 ] 

Mingliang Liu commented on HADOOP-13998:


If I understand correctly, [HADOOP-14457] is the only blocker before this is 
merged to trunk, right?




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038691#comment-16038691
 ] 

Steve Loughran commented on HADOOP-13998:
-

bq. Actually I believe a few of those tests had transient failures at a fairly 
consistent rate (something like 1 in 4 or 1 in 6 test runs if I remember 
correctly) that had always been assumed to be the result of inconsistency. They 
stopped failing entirely once the initial work for list-after-put consistency 
was incorporated.

Yes. That's why our docs on using s3 as a dest now say "don't". The big test 
runs fail as they have validation of the output and catch problems. Other 
people's apps may not do that validation, so they end up getting bad data and 
not noticing.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-02 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035280#comment-16035280
 ] 

Sean Mackrory commented on HADOOP-13998:


[~ste...@apache.org] - regarding that test issue, that would happen if a 
directory was deleted, and a file inside it was then created. If you're using 
the DynamoDB implementation, it should definitely be replacing the tombstone 
for the parent directory when the file is created. If you're using the Local 
implementation, I wonder if that's happening as a result of HADOOP-14457. I'll 
take a closer look at that again and see if I can reproduce, though I thought I 
had added test cases for that sequence.
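
To make the delete-then-create sequence concrete, here is a minimal in-memory sketch. It is a hypothetical model, not the real MetadataStore API: creating a file marks each ancestor directory as present, which replaces any tombstone left by an earlier delete.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; not the real MetadataStore API.
final class TombstoneSketch {
  enum State { PRESENT, DELETED }

  private final Map<String, State> entries = new HashMap<>();

  // Record a delete by leaving a tombstone for the path.
  void delete(String path) {
    entries.put(path, State.DELETED);
  }

  // Record a new file and mark every ancestor directory as present,
  // replacing any tombstone left by an earlier delete.
  void putFile(String path) {
    entries.put(path, State.PRESENT);
    String current = path;
    int slash = current.lastIndexOf('/');
    while (slash > 0) {
      current = current.substring(0, slash);
      entries.put(current, State.PRESENT);
      slash = current.lastIndexOf('/');
    }
  }

  // True if the store holds a tombstone for the path.
  boolean isDeleted(String path) {
    return entries.get(path) == State.DELETED;
  }
}
```

If the ancestor update in {{putFile}} were skipped, a lookup of the parent would still report it as deleted even though a child file exists, which is the symptom described above.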




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-02 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035268#comment-16035268
 ] 

Sean Mackrory commented on HADOOP-13998:


{quote}If these tests were working before you turned s3guard on then they 
weren't catching inconsistencies & so were lucky (as mine were){quote}

Actually I believe a few of those tests had transient failures at a fairly 
consistent rate (something like 1 in 4 or 1 in 6 test runs if I remember 
correctly) that had always been assumed to be the result of inconsistency. They 
stopped failing entirely once the initial work for list-after-put consistency 
was incorporated.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035217#comment-16035217
 ] 

Steve Loughran commented on HADOOP-13998:
-

Regarding tests, I'm seeing a problem with the combination of s3guard and the 
partition committer (and only that committer): a newly created file is where 
it should be, but the parent dir is still tagged as missing. I can GET the 
file, but if I try to list the parent I get rejected:
{code}
2017-06-02 18:19:10,709 [ScalaTest-main-running-S3ACommitDataframeSuite] INFO  s3.S3AOperations (Logging.scala:logInfo(54)) - s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/part-0-7573c876-38e5-4024-8a53-51fa1aa9c9c2-c000.snappy.orc size=384
2017-06-02 18:19:10,709 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(1899)) - Getting path status for s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS  (cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS)
2017-06-02 18:19:10,710 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3guard.MetadataStore (LocalMetadataStore.java:get(151)) - get(s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS) -> file  s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS 3400 UNKNOWN  false S3AFileStatus{path=s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS; isDirectory=false; length=3400; replication=1; blocksize=1048576; modification_time=1496423948811; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE
2017-06-02 18:19:10,710 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerListStatus(1660)) - List status for path: s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc
2017-06-02 18:19:10,710 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(1899)) - Getting path status for s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc  (cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc)
2017-06-02 18:19:10,711 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3guard.MetadataStore (LocalMetadataStore.java:get(151)) - get(s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc) -> file  s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc 0   UNKNOWN  true  FileStatus{path=s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc; isDirectory=false; length=0; replication=0; blocksize=0; modification_time=1496423936532; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}
2017-06-02 18:19:10,719 [dispatcher-event-loop-6] INFO  spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(54)) - MapOutputTrackerMasterEndpoint stopped!
2017-06-02 18:19:10,727 [dispatcher-event-loop-3] INFO  scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(54)) - OutputCommitCoordinator stopped!
2017-06-02 18:19:10,729 [ScalaTest-main-running-S3ACommitDataframeSuite] INFO  spark.SparkContext (Logging.scala:logInfo(54)) - Successfully stopped SparkContext
- Dataframe+partitioned *** FAILED ***
  java.io.FileNotFoundException: Path s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc is recorded as deleted by S3Guard
  at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1906)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1881)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1664)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1640)
  at com.hortonworks.spark.cloud.ObjectStoreOperations$class.validateRowCount(ObjectStoreOperations.scala:340)
  at com.hortonworks.spark.cloud.CloudSuite.validateRowCount(CloudSuite.scala:37)
  at com.hortonworks.spark.cloud.s3.commit.S3ACommitDataframeSuite.testOneFormat(S3ACommitDataframeSuite.scala:107)
  at 

[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035077#comment-16035077
 ] 

Steve Loughran commented on HADOOP-13998:
-

bq. We've run our standard downstream Hive, Spark, MR, Impala, scale, and 
performance tests

If these tests were working *before* you turned s3guard on then they weren't 
catching inconsistencies & so were lucky (as mine were). I'm running my spark 
committer tests with the inconsistent client turned on, and it is repeatedly 
failing the classic & magic committers without s3guard enabled: both depend on 
consistent listing. I also found a brittleness in path cleanup for the magic 
committer: cleanup code *must* handle an FNFE if a file is returned in the 
listing but isn't there in the GET. This is why I'd like the factory for the 
inconsistent client to be in src/main: it lets anyone turn on inconsistency 
for their test runs.
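
The cleanup requirement can be sketched as follows. This is an illustration under stated assumptions: {{java.nio.file}} stands in for the Hadoop FileSystem API, {{NoSuchFileException}} stands in for the FNFE, and the class name is hypothetical. An entry that appears in a (possibly stale) listing but is gone by the time it is accessed is skipped rather than failing the whole cleanup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

// Sketch only: java.nio.file stands in for the Hadoop FileSystem API.
final class TolerantCleanupSketch {

  // Deletes every listed path and returns how many were actually removed.
  // An entry that appeared in the listing but no longer exists is skipped
  // rather than treated as a failure.
  static int deleteListed(List<Path> listing) {
    int deleted = 0;
    for (Path p : listing) {
      try {
        Files.delete(p);
        deleted++;
      } catch (NoSuchFileException e) {
        // Listed but not there: the inconsistency case; carry on.
      } catch (IOException e) {
        // Any other I/O failure is still a real error.
        throw new UncheckedIOException(e);
      }
    }
    return deleted;
  }
}
```

Only the "not found" case is swallowed; other I/O errors still propagate, so genuine cleanup failures stay visible.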

bq. This is a good point. Do you prefer timing-based microbenchmarks, or S3 
request statistics (counts)?

The instrumentation ones are way less brittle; Ming has been fixing a 
nanotimer assertion in WASB which was failing intermittently. I have some tests 
somewhere which call listFiles(recursive) against the amazon landsat store: 
that's the reference example of a deep and wide directory tree.
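
A rough sketch of the count-based style of assertion, as opposed to a timing-based one. Everything here is hypothetical: the real S3A code exposes its own statistics, and {{listTree}} is only a stand-in for an operation that issues one LIST request per directory.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; the real S3A code exposes its own statistics.
final class RequestCountSketch {
  private final AtomicLong listRequests = new AtomicLong();

  // Stand-in for an operation that issues one LIST request per directory.
  void listTree(int directories) {
    for (int i = 0; i < directories; i++) {
      listRequests.incrementAndGet();
    }
  }

  // Runs an operation and returns how many LIST requests it issued.
  long requestsIssuedBy(Runnable operation) {
    long before = listRequests.get();
    operation.run();
    return listRequests.get() - before;
  }
}
```

A test asserting on the returned count gives the same answer on a slow or loaded machine, which is where assertions on elapsed time tend to fail intermittently.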






[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-01 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033340#comment-16033340
 ] 

Aaron Fabbri commented on HADOOP-13998:
---

Thank you for the comments [~liuml07] and [~ste...@apache.org].

{quote}
what functional tests have people done?
{quote}

We've done quite a bit of functional testing.  We've run our standard 
downstream Hive, Spark, MR, Impala, scale, and performance tests.  For 
performance, we generally saw similar or better performance with S3Guard 
enabled (due to short circuit getFileStatus()).

{quote}
Need to look at ... perf on non-s3guard codepaths is impacted, e.g. by new 
requests
{quote}

This is a good point.  Do you prefer timing-based microbenchmarks, or S3 
request statistics (counts)?





[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032994#comment-16032994
 ] 

Steve Loughran commented on HADOOP-13998:
-

linking to HADOOP-14423; need to stop a putObjectDirect with length of -1.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-06-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032962#comment-16032962
 ] 

Steve Loughran commented on HADOOP-13998:
-

It's time to merge in. As the branch is big enough, we should merge trunk into 
the branch again and have everything working nicely; the troublesome SSE-C test 
can be skipped and covered in docs/release notes.

Before then, last chance to do refactorings and renamings of options, etc, and 
review.

h3. Code
Need to look at code to see if
# perf on non-s3guard codepaths is impacted, e.g. by new requests
# config options look good before freezing their names
# all those places which have TODO in them: are they critical? If not, do they 
at least have JIRA coverage somewhere

h3. Docs
# docs: how well do they read for someone who hasn't worked on it (this can 
evolve, obviously)
# release notes. Maybe: make clear it's experimental, safe in non-auth, auth is 
more dangerous

h3. Testing!

what functional tests have people done?

I'm trying to do some downstream testing in my [spark cloud integration 
module|https://github.com/hortonworks-spark/cloud-integration] project, mostly 
on committer. But I've just moved the spark SQL Hive test suite 
{{org.apache.spark.sql.sources.HadoopFsRelationTest}} & its ORC subclass in so 
I can verify that the commit algorithms work there. I'm now trying to switch 
over to the inconsistent client to see if I can make it easy to observe 
inconsistencies in the classic "legacy" file commit algorithms. (ignoring 
Parquet for technical reasons; that'll need a fix in spark itself).

h3. Other logistics
# get some review from [~cnauroth] if he has the time
# submit a single patch to cover the merge




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-05-31 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032102#comment-16032102
 ] 

Mingliang Liu commented on HADOOP-13998:


I support merging back to trunk if we agree there are no blockers. For basic 
listing and delete tracking, the core logic has been finished, reviewed and to 
some degree tested. There are other critical subtasks, but developing in trunk 
is OK with me. As we chose to integrate the S3Guard feature in S3AFilesystem 
itself, it's better to let incoming S3A changes be aware of the fact that 
there may be a metadata store.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-05-30 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030140#comment-16030140
 ] 

Sean Mackrory commented on HADOOP-13998:


I would generally support a merge where we are in development. I feel like 
things are generally sufficiently useful and stable.

At a finer-grained level though I'm adding HADOOP-14448 as a dependency. There 
are tests in trunk that don't play nice with S3Guard. I'm of the opinion that 
it's just a "ignore-the-tests-in-that-scenario-and-document-the-implications" 
scenario though, so it will probably be resolved soon.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-05-30 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030074#comment-16030074
 ] 

Aaron Fabbri commented on HADOOP-13998:
---

Hi [~liuml07], [~mackrorysd], [~ste...@apache.org] and others. Looking for your 
feedback here.

I'd like to start a DISCUSS thread on the email list to propose merging S3Guard 
to trunk if you guys feel like we are at a good point.

I think that active s3guard development will continue for a year or more, so we 
might also want to resolve HADOOP-13345 and create a new "phase 2" S3Guard JIRA 
to hold ongoing subtasks.  Thoughts?






[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-04-05 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957809#comment-15957809
 ] 

Mingliang Liu commented on HADOOP-13998:


Overall we're close to the merge. I link [HADOOP-14215] here as a dependency 
for merging back to {{trunk}} (initial preview). Correct me if that is not 
critical. Thanks.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-03-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901524#comment-15901524
 ] 

Steve Loughran commented on HADOOP-13998:
-

Add: review key hashing so that we are confident it spreads the data widely, 
rather than biased towards a single shard in the database
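
One way such a review could be supported is by measuring how a sample of keys spreads over a fixed number of buckets. The sketch below is hypothetical: it buckets by {{String.hashCode()}}, whereas DynamoDB's actual partitioning function is internal to the service and is not modelled here.

```java
import java.util.List;

// Hypothetical check; DynamoDB's real partitioning is not modelled here.
final class ShardSpreadSketch {

  // Counts how many keys land in each of `shards` buckets when bucketed
  // by String.hashCode().
  static int[] bucketCounts(List<String> keys, int shards) {
    int[] counts = new int[shards];
    for (String key : keys) {
      counts[Math.floorMod(key.hashCode(), shards)]++;
    }
    return counts;
  }

  // Ratio of the fullest bucket to the ideal even share;
  // 1.0 means perfectly even, `shards` means everything is in one bucket.
  static double skew(int[] counts) {
    int total = 0;
    int max = 0;
    for (int c : counts) {
      total += c;
      max = Math.max(max, c);
    }
    return total == 0 ? 0.0 : (double) max * counts.length / total;
  }
}
```

Feeding in a realistic sample of keys and checking that the skew stays close to 1.0 would flag a key scheme that concentrates on one bucket.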




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-02-22 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877812#comment-15877812
 ] 

Aaron Fabbri commented on HADOOP-13998:
---

[~ste...@apache.org] probably merge in latest trunk, do a full round of 
testing, and resolve the documentation JIRA that this depends on (haven't had 
time to do that yet).

FYI I'm currently working on HADOOP-13914, which will change a decent amount of 
code in S3AFileSystem and related tests.




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-02-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876143#comment-15876143
 ] 

Steve Loughran commented on HADOOP-13998:
-

Looking at this, what else do we need?




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-02-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854656#comment-15854656
 ] 

Chris Nauroth commented on HADOOP-13998:


+1 for proceeding with a trunk merge vote after resolving the linked issues.  
(I just added the HADOOP-14051 documentation fix mentioned in the last comment.)




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-02-02 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850796#comment-15850796
 ] 

Aaron Fabbri commented on HADOOP-13998:
---

[~aw] no.. thanks for noticing. I'll fix that (filed HADOOP-14051).




[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-02-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850733#comment-15850733
 ] 

Allen Wittenauer commented on HADOOP-13998:
---

Is the s3guard documentation actually linked from the main index?







[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-02-02 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849666#comment-15849666
 ] 

Aaron Fabbri commented on HADOOP-13998:
---

All of the dependencies for this have patches available for review.  I would 
like to start on empty directory handling improvements, but would prefer to 
merge the feature branch to trunk first to avoid having to maintain more 
S3AFileSystem diffs.

*I'm proposing we merge* HADOOP-13345 to trunk as soon as the dependent 
JIRAs linked here are committed.  A summary of where we stand follows.  
I look forward to feedback from [~ste...@apache.org], [~cnauroth], [~eddyxu], 
[~mackrorysd], and the rest of the community.

The main feature we want for the initial version is listing consistency, and 
we've accomplished that.

For testing, we have completed (off the top of my head):
- List consistency tests with failure injection (HADOOP-13793).  This 
integration test forces a delay in visibility of certain files by wrapping the 
AWS S3 client, then asserts that listing is consistent.  The test fails without 
S3Guard and succeeds with it. 
- All existing S3 integration tests with and without S3Guard.  The filesystem 
contract tests have been invaluable here. (HADOOP-13589 makes these very easy 
to run).
- MetadataStore contract tests that ensure that the API semantics of the 
DynamoDB and in-memory reference implementations are correct.
- MetadataStore scale tests that can be used to force DynamoDB service 
throttling and verify that we are robust to it.
- Unit tests for different parts of the S3Guard logic.
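
To illustrate the failure-injection idea in the first item above, here is a minimal sketch. It is a toy model, not the HADOOP-13793 test and not S3Guard code: the class names (DelayedVisibilityStore, GuardedStore), the "DELAY" key marker, the 5000 ms delay and the hand-advanced logical clock are all assumptions made for this example. It only shows the general shape: a wrapper hides certain keys from listings for a while, and a consistent side index of writes is merged into the listing to compensate.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: none of these types are Hadoop or AWS SDK classes. */
public class ListingConsistencySketch {

  /** Hypothetical store whose listings hide marked keys until a delay has passed. */
  static class DelayedVisibilityStore {
    private final Map<String, Long> visibleAt = new TreeMap<>();
    private final String delayMarker;
    private final long delayMillis;
    private long now = 0; // logical clock, advanced by hand so the run is deterministic

    DelayedVisibilityStore(String delayMarker, long delayMillis) {
      this.delayMarker = delayMarker;
      this.delayMillis = delayMillis;
    }

    /** Stores a key; keys containing the marker become listable only after the delay. */
    void put(String key) {
      long delay = key.contains(delayMarker) ? delayMillis : 0;
      visibleAt.put(key, now + delay);
    }

    /** Lists only the keys under the prefix that are visible at the current logical time. */
    Set<String> list(String prefix) {
      Set<String> visible = new TreeSet<>();
      for (Map.Entry<String, Long> entry : visibleAt.entrySet()) {
        if (entry.getKey().startsWith(prefix) && entry.getValue() <= now) {
          visible.add(entry.getKey());
        }
      }
      return visible;
    }

    void advance(long millis) {
      now += millis;
    }
  }

  /** Hypothetical guarded view: every write is also recorded in a consistent side index. */
  static class GuardedStore {
    private final DelayedVisibilityStore store;
    private final Set<String> index = new TreeSet<>();

    GuardedStore(DelayedVisibilityStore store) {
      this.store = store;
    }

    void put(String key) {
      store.put(key);
      index.add(key);
    }

    /** Merges the possibly stale store listing with the keys recorded in the index. */
    Set<String> list(String prefix) {
      Set<String> merged = new TreeSet<>(store.list(prefix));
      for (String key : index) {
        if (key.startsWith(prefix)) {
          merged.add(key);
        }
      }
      return merged;
    }
  }

  public static void main(String[] args) {
    DelayedVisibilityStore raw = new DelayedVisibilityStore("DELAY", 5000);
    GuardedStore guarded = new GuardedStore(raw);
    guarded.put("dir/a");
    guarded.put("dir/b-DELAY");

    System.out.println("raw listing:     " + raw.list("dir/"));
    System.out.println("guarded listing: " + guarded.list("dir/"));

    raw.advance(5000); // the injected delay has now elapsed
    System.out.println("raw after delay: " + raw.list("dir/"));
  }
}
```

In this model the raw listing omits dir/b-DELAY until the clock has advanced past the injected delay, while the guarded listing includes it immediately. That mirrors, in miniature, a test that fails without the guard and passes with it.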

In addition to this upstream testing, my colleagues have run a couple of our 
in-house test harnesses against S3Guard.  This includes Hive, Spark, and a 
number of other components.  All the testing is looking great so far.








[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-01-18 Thread Sunil Govind (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828132#comment-15828132
 ] 

Sunil Govind commented on HADOOP-13998:
---

Mmm










[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-01-18 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828128#comment-15828128
 ] 

Steve Loughran commented on HADOOP-13998:
-

Production code changes

* HADOOP-13985

Testing improvements

* HADOOP-13589
* HADOOP-13995
* HADOOP-13876





