[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-07-11 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882964#comment-16882964
 ] 

Gabor Bota commented on HADOOP-13980:
-

[~mackrorysd], I've created the subtasks. I no longer remember why I wanted to 
handle {{Export metadatastore and S3 bucket hierarchy}} and {{export scan 
results in human readable format}} differently; it makes no sense to me now to 
separate those. 

> We can probably break this into more subtasks. It would be best if the 
> implementation had a sequence of specific "fixers" to address specific 
> discrepancies. "fixMissingParents", "fixOutOfDateEntries", etc.
I have handlers in the PR: the violation handler can log and fix the error. We 
can do complex things in a handler, but a handler cannot yet see other 
violations, only the one it handles.
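
A minimal sketch of the handler idea, written in Python rather than the Java of the PR. Every name here ({{Violation}}, {{handle_all}}, the violation kinds) is hypothetical and not taken from the patch; the point is only that each handler receives a single violation and has no view of the others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Violation:
    kind: str  # hypothetical label, e.g. "MISSING_PARENT"
    path: str


# A handler sees exactly one violation and returns a log line.
Handler = Callable[[Violation], str]


def log_only(v: Violation) -> str:
    """Default handler: report the violation without fixing anything."""
    return f"{v.kind}: {v.path}"


def handle_all(violations: List[Violation],
               handlers: Dict[str, Handler]) -> List[str]:
    """Dispatch each violation to the handler registered for its kind."""
    messages = []
    for v in violations:
        handler = handlers.get(v.kind, log_only)
        messages.append(handler(v))
    return messages
```

A fixer for one kind of discrepancy would then be registered under that kind, while everything else falls back to logging.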


> S3Guard CLI: Add fsck check command
> ---
>
> Key: HADOOP-13980
> URL: https://issues.apache.org/jira/browse/HADOOP-13980
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
> compares S3 with MetadataStore, and returns a failure status if any 
> invariants are violated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-07-09 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881233#comment-16881233
 ] 

Andrew Olson commented on HADOOP-13980:
---

+1 for TSV. The human readability and Excel compatibility could be helpful, and 
the format is still easily machine-parseable and usable as a job input file 
format if MR processing needs to be done.




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-07-09 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881222#comment-16881222
 ] 

Steve Loughran commented on HADOOP-13980:
-

My HADOOP-16384 JIRA exports scan results as a TSV file.

Different listings:
* full dump of metastore
* treewalk of S3A FS; shows connected files
* listFiles(recursive) of S3AFS
* LIST / of the S3 store; shows what is really there

Have a play with that patch. 

FAQ:
* Why TSV and not CSV? It avoids escaping commas, and it lines up better in 
editors.
* Why TSV and not Avro? It avoids writing an avsc file, requiring the Avro JAR 
on the CP to work, and Avro JAR version problems.
* Why TSV and not JSON? Can't be arsed. Oh, and you can import TSVs into Google 
Sheets, open them in Excel, etc.
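
To illustrate the "avoids escaping commas" point, here is a small Python sketch of a TSV dump. The column names and the {{write_tsv}} helper are assumptions made for illustration; they are not the format produced by the HADOOP-16384 patch.

```python
import csv
import io
from typing import Iterable, Tuple

# Hypothetical columns; the real dump format may differ.
HEADER = ("path", "type", "length", "is_deleted")


def write_tsv(rows: Iterable[Tuple[str, str, int, bool]]) -> str:
    """Render rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()
```

A path containing a comma passes through unquoted, because only tabs, quotes and newlines need special treatment in this layout.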




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-07-08 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880424#comment-16880424
 ] 

Sean Mackrory commented on HADOOP-13980:


{quote}Export metadatastore and S3 bucket hierarchy{quote}
Is this different in some way from "export scan results in human readable 
format"? I thought about maybe having a machine-readable export that we could 
import, if that might help with supportability. I've personally never seen a 
support issue that it would've helped with, but it's just something to think 
about...

{quote}Implement the fixing mechanism{quote}
We can probably break this into more subtasks. It would be best if the 
implementation had a sequence of specific "fixers" to address specific 
discrepancies. "fixMissingParents", "fixOutOfDateEntries", etc.

> S3Guard CLI: Add fsck check command
> ---
>
> Key: HADOOP-13980
> URL: https://issues.apache.org/jira/browse/HADOOP-13980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
> compares S3 with MetadataStore, and returns a failure status if any 
> invariants are violated.






[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-07-08 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880359#comment-16880359
 ] 

Gabor Bota commented on HADOOP-13980:
-

I started to work on this and I'll create some subtasks for the things we want 
the checker to do.
First thoughts:
* Checking metadata consistency between S3 and the metadatastore, and logging it
* Checking internal consistency of the MetadataStore
* Export metadatastore and S3 bucket hierarchy
* Export scan results in human readable format
* Implement the fixing mechanism

As you can see the first thing that will be implemented is the consistency 
checker. If you agree with this (so no concerns or ideas) I'll create these 
sub-tasks and create a pull request for the first one.




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-06-19 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867530#comment-16867530
 ] 

Gabor Bota commented on HADOOP-13980:
-

I already have a defined inconsistency in the WIP docs:
 * A file exists under a path for which there is a tombstone entry in the MS
 * *R1*: If the S3 bucket contains the directory (so it is not deleted), 
re-create the directory entry. 
 * *R2*: If there's no path for the directory in S3 (so the file does not 
exist), add a tombstone entry (isDeleted=true) to the MS for the file.

Note: If the object exists in S3, the 'directory' cannot be deleted in S3, 
because it's just a prefix, so we just need to check for the object. We can add 
a tombstone if it's deleted because we treat S3 as consistent when checking.
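
The two resolutions could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: the metadata store is modelled as a plain dict, S3 as a set of keys, and the function and field names are hypothetical.

```python
from typing import Dict, Set


def resolve_file_under_tombstone(file_path: str, parent: str,
                                 s3_keys: Set[str],
                                 ms: Dict[str, dict]) -> str:
    """Apply R1 or R2 to one file entry whose parent is tombstoned in the MS."""
    if file_path in s3_keys:
        # R1: the object exists, so its prefix exists too;
        # re-create the directory entry.
        ms[parent] = {"is_dir": True, "is_deleted": False}
        return "R1"
    # R2: the object is gone; S3 is treated as consistent, so tombstone the file.
    ms[file_path] = {"is_dir": False, "is_deleted": True}
    return "R2"
```

Only the object itself is looked up, in line with the note that a 'directory' in S3 is just a prefix.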




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-06-18 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866468#comment-16866468
 ] 

Steve Loughran commented on HADOOP-13980:
-

Extra inconsistency I've somehow managed to create in my own code:

* directory is a tombstone
* child exists

I think this may be from some of the rename optimisations of HADOOP-15183: are 
we trying to be too clever in walking up the tree in addAncestors, and we 
should always try to write up to the top? 




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-06-17 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865709#comment-16865709
 ] 

Gabor Bota commented on HADOOP-13980:
-

Also, I agree with [~fabbri]'s idea that we should not consider "auth mode" as 
a factor.
We just check whether a directory with is_auth==true actually contains the same 
contents as the bucket on S3. There is no need to check the configuration to 
see whether auth mode is enabled; only the flag on the dir listing matters.




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-06-17 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865663#comment-16865663
 ] 

Gabor Bota commented on HADOOP-13980:
-

Thanks [~ste...@apache.org], [~fabbri] for the spec draft and for the ideas. I 
started to create a draft of the specs a while ago but haven't updated it much 
lately, so I took [~ste...@apache.org]'s ideas, put them into my doc, and 
reformatted it.
Here's the doc: 
https://docs.google.com/document/d/1Gcl_dVLl0x7PCxfsFjp-ClBlbP4klo9hjocOKCGTi3s/edit?usp=sharing

All of you are welcome to comment in the doc. Note that it's still WIP and will 
be updated. The final version will be uploaded here before/during the 
implementation.




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-06-13 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863204#comment-16863204
 ] 

Steve Loughran commented on HADOOP-13980:
-

I'd like the ability to get a dump of the state in a format we could analyse 
for support calls, recreating problems etc, 

Proposed: make this Avro




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-06-05 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857029#comment-16857029
 ] 

Aaron Fabbri commented on HADOOP-13980:
---

Thanks for your draft of FSCK requirements [~ste...@apache.org]. This is a good 
start.

One thing that comes to mind: I don't know that we want to consider "auth mode" 
as a factor here.  Erring on the side of over-explaining this stuff for clarity:

There are two main authoritative mode flags in play:

(1) per-directory metastore bit that says "this directory is fully loaded into 
the metastore"

(2) s3a client config bit fs.s3a.metadatastore.authoritative, which allows s3a 
to short-circuit (skip) s3 on some metadata queries. This one is just a runtime 
client behavior flag. You could have multiple clients with different settings 
sharing a bucket. FSCK could also have a different config.  I think you'll 
still want some FSCK options to select the level of enforcement / paranoia as 
you outline, just don't think it needs to be conflated with client's allow auth 
flag. I'd imagine this as a growing set of invariant checks that can be 
categorized into something like basic / paranoid / full.

Whether or not a s3a client has metadatastore.authoritative bit set in its 
config doesn't really affect the contents of the metadata store or its 
relationship to the underlying storage (s3) state**.  If the is_authoritative 
bit is set on a directory in the metastore, however, that directory listing 
from metadatastore should *match* the listing of that dir from s3. If the bit 
is not set, the metastore listing should be a subset of the s3 listing.
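
As a sketch, the per-directory rule reduces to a set comparison. The function below is hypothetical and assumes both listings have already been reduced to sets of live child names (tombstones excluded).

```python
from typing import Set


def listing_is_consistent(ms_children: Set[str],
                          s3_children: Set[str],
                          is_authoritative: bool) -> bool:
    """Check one directory listing against S3 using the per-directory bit."""
    if is_authoritative:
        # Fully loaded directory: the two listings must match exactly.
        return ms_children == s3_children
    # Otherwise the metastore may be missing entries, but must not add any.
    return ms_children <= s3_children
```

Note that the client-side fs.s3a.metadatastore.authoritative setting does not appear anywhere in the check.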

I would also split the consistency checks into two categories: 
MetadataStore-specific and generic. The majority of the checks here are generic 
tests that work with any MetadataStore. DDB also needs to check its internal 
consistency (since it uses the ancestor-exists invariant to avoid table scans).

Also agreed you'll need table scans here, but how do we expose this for FSCK 
only? FSCK traditionally reaches below the FS to check its structures (e.g. 
ext3 fsck uses a block device below the ext3 fs to check the on-disk format, 
right?).

 

** some nuance here, if we want to discuss further.




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-05-29 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851113#comment-16851113
 ] 

Steve Loughran commented on HADOOP-13980:
-

My thoughts on the topic. I managed to get a DDB table inconsistent today: fun. 
If you have a tombstone marker but child entries in the DDB underneath, an 
rm(dir) succeeds (no directory), but exists() on the child underneath still 
returns true.

h1. What would an {{fsck}} operation against S3Guard actually do?

The goal of an {{fsck}} operation is to:

# verify that the DDB table is internally consistent
# verify that the DDB table is consistent with the store.


h2. Definitions of inconsistency

h3. Both modes

* All entries in DDB other than the root entry have a parent entry.
* Except in the special case that the object store itself is in this state: no 
entries have a parent which is a file.
* Every file entry has a directory entry as a parent, or it is in the root 
directory.
* Every directory entry has a directory entry as a parent, or it is in the root 
directory.
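
The parent-entry rules above could be checked along these lines. This is a sketch only: the table is modelled as a dict from absolute path to "dir" or "file", and {{find_parent_violations}} is a hypothetical name.

```python
import posixpath
from typing import Dict, List


def find_parent_violations(entries: Dict[str, str]) -> List[str]:
    """entries maps an absolute path to 'dir' or 'file'; '/' is the root."""
    problems = []
    for path in sorted(entries):
        if path == "/":
            continue
        parent = posixpath.dirname(path)
        if parent == "/":
            continue  # entries in the root directory are always acceptable
        kind = entries.get(parent)
        if kind is None:
            problems.append(f"orphan: {path}")
        elif kind != "dir":
            problems.append(f"parent is a file: {path}")
    return problems
```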


h3. Auth mode

* No file exists under a path for which there is a tombstone entry.
* Every directory entry has a matching directory in the store (getFileStatus() 
on the path => isDirectory)
* Every file entry has a path in the S3 store where the size, versionId and 
etag match.

h3. Non-auth mode

* Where a path has a tombstone marker, no file exists at that path. If one 
does, update DDB.
* Where a path has an entry, the file exists. If not, update DDB.
* Where a path points to a file, the size and any etag and version ID match 
those of the object. 



h2. Assumptions

* The S3 bucket store is consistent w.r.t. any changes and is not actively 
being updated.
* No other fsck of the S3 bucket+matching DDB table is in progress.
* Other S3 buckets may share the same DDB table, and may be in active use.
* We don't care about throttling/conflict with other users, even in the same 
table (i.e. it's on demand or isolated).
* We aren't trying to optimise for minimising throttling with S3, though 
performance always matters.
* We have lots of memory in a local process to cache information like a 
directory listing.
* The store could have many millions of entries.
* There are more files than directories.
* Nobody wants to address inconsistencies by deleting data.
* Some of the entries in the table lack etags and versions.

h1. Operations we could do

h2. DDB internal consistency

* verify that the DDB table is a consistent tree: that there are no orphan 
entries or entries under a deleted directory
* optionally: create the parent entries, including overwriting tombstones.

This will fix consistency within the table, without making any assertions about 
the consistency with the store


h2. S3Guard to S3 consistency

* all files in S3Guard exist in S3
* if an entry lists an etag, that matches the real status
* if an entry includes a version, a file with that versionID exists
* if an entry is a tombstone marker, there is no S3 file at that path
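
A rough sketch of these four checks, assuming the metastore file entries and an S3 listing have both been loaded into dicts keyed by path. The field names and the function are hypothetical, and the version check is simplified to a comparison against the current object version rather than a lookup of that specific versionID.

```python
from typing import Dict, List


def check_metastore_against_s3(ms: Dict[str, dict],
                               s3: Dict[str, dict]) -> List[str]:
    """Report metastore file entries that disagree with the S3 listing."""
    problems = []
    for path in sorted(ms):
        entry = ms[path]
        obj = s3.get(path)
        if entry.get("is_deleted"):
            if obj is not None:
                problems.append(f"tombstone but object exists: {path}")
            continue
        if obj is None:
            problems.append(f"missing in S3: {path}")
            continue
        # Entries without an etag or version are skipped, not flagged.
        for field in ("etag", "version_id"):
            expected = entry.get(field)
            if expected is not None and expected != obj.get(field):
                problems.append(f"{field} mismatch: {path}")
    return problems
```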

h2. S3 to S3Guard consistency

* All entries in S3 exist in the DDB table. This is only valid in auth mode, as 
in non-auth mode it is moot.
* There are no entries in S3 for which the DDB entry is a tombstone marker.
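
The reverse direction might look like this, again as a sketch over in-memory structures with hypothetical names. The {{auth_mode}} argument only controls whether a missing entry counts as a problem, following the first bullet.

```python
from typing import Dict, List, Set


def check_s3_against_metastore(s3_keys: Set[str],
                               ms: Dict[str, dict],
                               auth_mode: bool) -> List[str]:
    """Report S3 objects that the metastore is missing or has tombstoned."""
    problems = []
    for key in sorted(s3_keys):
        entry = ms.get(key)
        if entry is None:
            if auth_mode:
                problems.append(f"missing from metastore: {key}")
        elif entry.get("is_deleted"):
            problems.append(f"tombstoned in metastore: {key}")
    return problems
```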

h1. Actions on success

* Collect the entire listing and export it as: CSV, XML, Avro, JSON

h1. Actions on failure

h2. All

* Collect the entire listing and export it as: CSV, XML, Avro, JSON
* Generate report on the problem.
* Fail if there is an inconsistency

h2. DDB internal consistency
* DDB internal consistency: add missing parent entries, replace tombstones with 
entries.
* Purge all tombstones irrespective of age

h2. S3Guard to S3 consistency

* If a file does not exist, delete that entry.
* If a directory does not exist, delete that entry.
* Update versionID, etag and size with any new values.
* Log files deleted, updated.

h2. S3 to S3Guard consistency

* Add any new files with parents. This is the import operation.
* Log/record files added.


h1. How to implement (efficiently)


h2. DDB consistency

Requirement: every item is either root or has a parent directory which has not 
been deleted.

It is not enough to list the children off root and then recurse down the tree, 
because that will not find orphan entries (though it will find file-under-file 
and file-under-tombstone errors). 

A breadth first search where we cache all entries in the parent level would 
work, if we can create queries for parent paths like "/*/*". 

# list all entries at a depth, {{d}}.
# verify that all entries have a parent in the directory list of depth {{d-1}}.
# add all directory entries to the directory list of depth {{d}}
# repeat until there is a listing with no child entries
# then do a final scan of all children of arbitrary depth under that level 
(these are implicitly orphan)
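
The five steps can be sketched over an in-memory copy of the table. This is an illustration under assumptions: the dict stands in for whatever per-depth query DDB would support, entry values are "dir", "file" or "tombstone", and {{find_orphans}} is a hypothetical name.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List


def find_orphans(entries: Dict[str, str]) -> List[str]:
    """Level-by-level scan; entries maps absolute path -> entry type."""
    by_depth = defaultdict(list)
    for path in entries:
        if path != "/":
            by_depth[path.count("/")].append(path)

    orphans = []
    parents = {"/"}  # directory list for depth d-1
    d = 1
    while by_depth.get(d):  # step 4: stop at the first empty level
        dirs = set()
        for path in sorted(by_depth[d]):        # step 1
            if posixpath.dirname(path) not in parents:
                orphans.append(path)            # step 2
            if entries[path] == "dir":
                dirs.add(path)                  # step 3
        parents = dirs
        d += 1
    # Step 5: anything below the first empty level is implicitly orphan.
    for deeper in sorted(k for k in by_depth if k > d):
        orphans.extend(sorted(by_depth[deeper]))
    return orphans
```

Because tombstones are not added to the directory list, children of a deleted directory are reported as well.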

It's too expensive to do an S3 HEAD for every entry in the DDB table; and too 
slow. 

Better to do a bulk list under a path 

[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-03-29 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805187#comment-16805187
 ] 

Gabor Bota commented on HADOOP-13980:
-

I started a doc on this a while ago; I can add these new requirements to it 
and upload it after the guava27 update.




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-03-29 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805147#comment-16805147
 ] 

Steve Loughran commented on HADOOP-13980:
-

+prune all obsolete tombstones (HADOOP-14000/HADOOP-16184).
+update store with etags & versions
+add ability to generate report (format?) and save to any hadoop FS
 




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-01-31 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757479#comment-16757479
 ] 

Andrew Olson commented on HADOOP-13980:
---

[~gabor.bota] Ok great, I imagined that it should be. In the event that the 
bucket is very large, that option could be helpful to split up the work into 
manageable chunks. Or if for some reason it contains non-S3A objects that were 
uploaded and need to be excluded from the check.




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-01-31 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757449#comment-16757449
 ] 

Gabor Bota commented on HADOOP-13980:
-

I haven't started working on the feature yet, but it seems like a viable 
feature to add.




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-01-31 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757434#comment-16757434
 ] 

Andrew Olson commented on HADOOP-13980:
---

Could this be limited to a specified parent directory?




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2017-11-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243835#comment-16243835
 ] 

Steve Loughran commented on HADOOP-13980:
-

What about `fsck --check` and `fsck --fix`? We have a `s3guard import` 
command, but it assumes the table is unpopulated. Here I'm thinking "we have 
the table, but it may have diverged after failures. Check, or check and fix"

> S3Guard CLI: Add fsck check command
> ---
>
> Key: HADOOP-13980
> URL: https://issues.apache.org/jira/browse/HADOOP-13980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
> compares S3 with MetadataStore, and returns a failure status if any 
> invariants are violated.


