[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-11-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787445#comment-17787445
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

hadoop-yetus commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1817346717

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  |  Shelldocs was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 13s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  35m  2s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  16m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   4m 43s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  18m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   8m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   7m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |  31m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  67m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 41s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  33m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 42s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  17m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |  16m 24s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 35s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |  14m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m  1s |  |  No new issues.  |
   | +1 :green_heart: |  javadoc  |   8m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   7m 22s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |  32m  2s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  67m 37s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 783m 45s | 
[/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/5/artifact/out/patch-unit-root.txt)
 |  root in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 1158m 10s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   |   | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6000 |
   | Optional Tests | dupname asflicense codespell detsecrets xmllint compile 
javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle 
shellcheck shelldocs |
   | uname | Linux 3aa23921fa44 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 798a48a97da0e8c6263b513e5859a71a4516ef8b |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787075#comment-17787075
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mehakmeet commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1815897072

   
[/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/4/artifact/out/patch-unit-root.txt)
 says the build was a success but still gave a -1. Going to put an empty patch up
for Yetus again.
   CC: @mukund-thakur @steveloughran 




> Analyzing S3A Audit Logs 
> -
>
> Key: HADOOP-18257
> URL: https://issues.apache.org/jira/browse/HADOOP-18257
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Sravani Gadey
>Assignee: Mehakmeet Singh
>Priority: Major
>  Labels: pull-request-available
>
> The main aim is to analyze S3A Audit logs to give better insights in Hive and 
> Spark jobs.
> Steps involved are:
>  * Merging audit log files containing huge number of audit logs collected 
> from a job containing various S3 requests.
>  * Parsing audit logs using regular expressions i.e., dividing them into key 
> value pairs.
>  * Converting the key value pairs into CSV file and AVRO file formats.
>  * Querying on data which would give better insights for different jobs.
>  * Visualizing the audit logs on Zeppelin or Jupyter notebook with graphs.
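
The parsing step above can be sketched in miniature. This is an illustration only: it covers just the first few fields of the S3 server access log format, and the pattern, class, and field names are assumptions rather than code from the PR, whose parser uses a far longer regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: split one S3 server access log entry into key-value pairs. */
public class AuditLogSketch {

  // Much-reduced pattern covering only the leading fields of an entry
  // (owner, bucket, bracketed timestamp, remote IP); the rest is ignored.
  private static final Pattern ENTRY = Pattern.compile(
      "(?<owner>\\S+) (?<bucket>\\S+) \\[(?<timestamp>[^\\]]+)\\] (?<remoteip>\\S+) .*");

  public static Map<String, String> parse(String line) {
    Map<String, String> map = new LinkedHashMap<>();
    Matcher m = ENTRY.matcher(line);
    if (m.matches()) {
      for (String key : new String[]{"owner", "bucket", "timestamp", "remoteip"}) {
        map.put(key, m.group(key));
      }
    }
    return map;
  }

  public static void main(String[] args) {
    String entry = "183c9826 bucket-london [13/May/2021:11:26:06 +0000]"
        + " 109.157.171.174 arn:aws:iam::152813717700:user/dev REST.PUT.OBJECT";
    System.out.println(parse(entry));
  }
}
```

Running `main` prints the map of the four extracted fields; the merge-and-parse step described above applies this same match-then-extract-groups shape over every entry in the merged logs.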



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-11-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786520#comment-17786520
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

hadoop-yetus commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1813345909

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  |  Shelldocs was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 57s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   8m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   7m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   2m  6s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  11m 26s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   4m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   4m 57s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |  16m 57s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  16m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m  1s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   8m  1s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   7m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 59s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   6m 54s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m  0s |  |  No new issues.  |
   | +1 :green_heart: |  javadoc  |   4m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   4m 55s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |  17m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m  9s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 630m 50s | 
[/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/4/artifact/out/patch-unit-root.txt)
 |  root in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 832m 18s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6000 |
   | Optional Tests | dupname asflicense codespell detsecrets xmllint compile 
javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle 
shellcheck shelldocs |
   | uname | Linux 198df243d65b 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8cd07fdca6657e735d6ee4c1f723604fc9e3887e |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-11-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784214#comment-17784214
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mukund-thakur commented on code in PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#discussion_r1387231564


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/mapreduce/S3AAuditLogMergerAndParser.java:
##
@@ -83,17 +83,20 @@ public HashMap parseAuditLog(String 
singleAuditLog) {
   return auditLogMap;
 }
 final Matcher matcher = LOG_ENTRY_PATTERN.matcher(singleAuditLog);
-boolean patternMatching = matcher.matches();
-if (patternMatching) {
+boolean patternMatched = matcher.matches();
+if (patternMatched) {
   for (String key : AWS_LOG_REGEXP_GROUPS) {
 try {
   final String value = matcher.group(key);
   auditLogMap.put(key, value);
 } catch (IllegalStateException e) {
+  LOG.debug("Skipping key :{} due to no matching with the audit log "
+  + "pattern :", key);
   LOG.debug(String.valueOf(e));
 }
   }
 }
+LOG.info("MMT audit map: {}", auditLogMap);

Review Comment:
   nit: is MMT needed? :P
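
For reference on the per-key `try`/`catch` in the hunk above: with the JDK regex API, once `matches()` has returned true, a named group that exists in the pattern but did not participate in the match yields `null` rather than an exception. A small standalone sketch (the pattern here is invented for illustration, not the production `LOG_ENTRY_PATTERN`):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of JDK named-group behaviour relevant to the per-key extraction loop. */
public class NamedGroupDemo {

  public static String groupOrNull(Matcher m, String name) {
    // After a successful matches(), a named group that exists in the pattern
    // but did not participate in the match returns null rather than throwing.
    return m.group(name);
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("(?<verb>\\S+)( (?<path>\\S+))?");
    Matcher m = p.matcher("REST.PUT.OBJECT");
    if (m.matches()) {
      System.out.println(groupOrNull(m, "verb")); // REST.PUT.OBJECT
      System.out.println(groupOrNull(m, "path")); // null
    }
  }
}
```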



##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/audit/TestS3AAuditLogMergerAndParser.java:
##
@@ -0,0 +1,273 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.Map;
+
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+
+/**
+ * This will implement different tests on S3AAuditLogMergerAndParser class.
+ */
+public class TestS3AAuditLogMergerAndParser extends AbstractS3ATestBase {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(TestS3AAuditLogMergerAndParser.class);
+
+  /**
+   * A real log entry.
+   * This is derived from a real log entry on a test run.
+   * If this needs to be updated, please do it from a real log.
+   * Splitting this up across lines has a tendency to break things, so
+   * be careful making changes.
+   */
+  static final String SAMPLE_LOG_ENTRY =
+  "183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a400"
+  + " bucket-london"
+  " [13/May/2021:11:26:06 +0000]"
+  + " 109.157.171.174"
+  + " arn:aws:iam::152813717700:user/dev"
+  + " M7ZB7C4RTKXJKTM9"
+  + " REST.PUT.OBJECT"
+  + " fork-0001/test/testParseBrokenCSVFile"
+  + " \"PUT /fork-0001/test/testParseBrokenCSVFile HTTP/1.1\""
+  + " 200"
+  + " -"
+  + " -"
+  + " 794"
+  + " 55"
+  + " 17"
+  + " \"https://audit.example.org/hadoop/1/op_create/;
+  + "e8ede3c7-8506-4a43-8268-fe8fcbb510a4-0278/"
+  + "?op=op_create"
+  + "=fork-0001/test/testParseBrokenCSVFile"
+  + "=alice"
+  + "=2eac5a04-2153-48db-896a-09bc9a2fd132"
+  + "=e8ede3c7-8506-4a43-8268-fe8fcbb510a4-0278=154"
+  + "=e8ede3c7-8506-4a43-8268-fe8fcbb510a4=156&"
+  + "ts=1620905165700\""
+  + " \"Hadoop 3.4.0-SNAPSHOT, java/1.8.0_282 vendor/AdoptOpenJDK\""
+  + " -"
+  + " TrIqtEYGWAwvu0h1N9WJKyoqM0TyHUaY+ZZBwP2yNf2qQp1Z/0="
+  + " SigV4"
+  + " ECDHE-RSA-AES128-GCM-SHA256"
+  + " AuthHeader"
+  + " bucket-london.s3.eu-west-2.amazonaws.com"
+  + " TLSv1.2" + "\n";
+
+  static final String SAMPLE_LOG_ENTRY_1 =
+  "01234567890123456789"
+  + " bucket-london1"
+  " [13/May/2021:11:26:06 +0000]"
+  + " 109.157.171.174"
+  + " arn:aws:iam::152813717700:user/dev"
+  + " M7ZB7C4RTKXJKTM9"
+  + " REST.PUT.OBJECT"
+  + " 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-11-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783304#comment-17783304
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

steveloughran commented on code in PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#discussion_r1383731616


##
hadoop-tools/hadoop-aws/src/test/resources/TestAuditLogs/sampleLog1:
##
@@ -0,0 +1,36 @@
+183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a400 bucket-london 
[13/May/2021:11:26:06 +0000] 109.157.171.174 arn:aws:iam::152813717700:user/dev 
M7ZB7C4RTKXJKTM9 REST.PUT.OBJECT fork-0001/test/testParseBrokenCSVFile "PUT 
/fork-0001/test/testParseBrokenCSVFile HTTP/1.1" 200 - - 794 55 17 
"https://audit.example.org/hadoop/1/op_create/e8ede3c7-8506-4a43-8268-fe8fcbb510a4-0278/?op=op_create=fork-0001/test/testParseBrokenCSVFile=alice=2eac5a04-2153-48db-896a-09bc9a2fd132=e8ede3c7-8506-4a43-8268-fe8fcbb510a4-0278=154=e8ede3c7-8506-4a43-8268-fe8fcbb510a4=156=1620905165700;
 "Hadoop 3.4.0-SNAPSHOT, java/1.8.0_282 vendor/AdoptOpenJDK" - 
TrIqtEYGWAwvu0h1N9WJKyoqM0TyHUaY+ZZBwP2yNf2qQp1Z/0= SigV4 
ECDHE-RSA-AES128-GCM-SHA256 AuthHeader bucket-london.s3.eu-west-2.amazonaws.com 
TLSv1.2";

Review Comment:
   add an excludes in apache-rat-plugin
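
Such an exclusion would normally go in the module `pom.xml`; a sketch only — the path glob is an assumption based on the test resource location quoted above:

```xml
<!-- Sketch only: exclude the sample audit logs from RAT license checks. -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <exclude>src/test/resources/TestAuditLogs/**</exclude>
    </excludes>
  </configuration>
</plugin>
```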



##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/audit/TestS3AAuditLogMergerAndParser.java:
##
@@ -0,0 +1,273 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.Map;
+
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+
+/**
+ * This will implement different tests on S3AAuditLogMergerAndParser class.
+ */
+public class TestS3AAuditLogMergerAndParser extends AbstractS3ATestBase {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(TestS3AAuditLogMergerAndParser.class);
+
+  /**
+   * A real log entry.
+   * This is derived from a real log entry on a test run.
+   * If this needs to be updated, please do it from a real log.
+   * Splitting this up across lines has a tendency to break things, so
+   * be careful making changes.
+   */
+  static final String SAMPLE_LOG_ENTRY =
+  "183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a400"
+  + " bucket-london"
+  " [13/May/2021:11:26:06 +0000]"
+  + " 109.157.171.174"
+  + " arn:aws:iam::152813717700:user/dev"
+  + " M7ZB7C4RTKXJKTM9"
+  + " REST.PUT.OBJECT"
+  + " fork-0001/test/testParseBrokenCSVFile"
+  + " \"PUT /fork-0001/test/testParseBrokenCSVFile HTTP/1.1\""
+  + " 200"
+  + " -"
+  + " -"
+  + " 794"
+  + " 55"
+  + " 17"
+  + " \"https://audit.example.org/hadoop/1/op_create/;
+  + "e8ede3c7-8506-4a43-8268-fe8fcbb510a4-0278/"
+  + "?op=op_create"
+  + "=fork-0001/test/testParseBrokenCSVFile"
+  + "=alice"
+  + "=2eac5a04-2153-48db-896a-09bc9a2fd132"
+  + "=e8ede3c7-8506-4a43-8268-fe8fcbb510a4-0278=154"
+  + "=e8ede3c7-8506-4a43-8268-fe8fcbb510a4=156&"
+  + "ts=1620905165700\""
+  + " \"Hadoop 3.4.0-SNAPSHOT, java/1.8.0_282 vendor/AdoptOpenJDK\""
+  + " -"
+  + " TrIqtEYGWAwvu0h1N9WJKyoqM0TyHUaY+ZZBwP2yNf2qQp1Z/0="
+  + " SigV4"
+  + " ECDHE-RSA-AES128-GCM-SHA256"
+  + " AuthHeader"
+  + " bucket-london.s3.eu-west-2.amazonaws.com"
+  + " TLSv1.2" + "\n";
+
+  static final String SAMPLE_LOG_ENTRY_1 =
+  "01234567890123456789"
+  + " bucket-london1"
+  " [13/May/2021:11:26:06 +0000]"
+  + " 109.157.171.174"
+  + " arn:aws:iam::152813717700:user/dev"
+  + " M7ZB7C4RTKXJKTM9"
+  + " 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783112#comment-17783112
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mehakmeet commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1794196516

   Addressed the review comments. CC: @mukund-thakur @steveloughran 






[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-10-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771525#comment-17771525
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

hadoop-yetus commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1745239088

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  |  Shelldocs was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m 57s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 22s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m  0s |  |  No new issues.  |
   | +1 :green_heart: |  javadoc  |   0m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 50s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 58s |  |  hadoop-aws in the patch passed. 
 |
   | -1 :x: |  asflicense  |   0m 42s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/3/artifact/out/results-asflicense.txt)
 |  The patch generated 2 ASF License warnings.  |
   |  |   | 135m 39s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6000 |
   | Optional Tests | dupname asflicense codespell detsecrets xmllint compile 
javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle 
shellcheck shelldocs |
   | uname | Linux c5cdbb7413f0 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 
13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ac7266bc5e6c19e62a2e0e657db70cb4c0226376 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/3/testReport/ |
   | Max. process+thread count | 681 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770306#comment-17770306
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mehakmeet commented on code in PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#discussion_r1340942956


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AuditTool.java:
##
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+import org.apache.hadoop.util.ExitUtil;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_COMMAND_ARGUMENT_ERROR;
+import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_FAIL;
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_SUCCESS;
+
+/**
+ * AuditTool is a Command Line Interface.
+ * Its functionality is to parse the audit log files
+ * and generate avro file.
+ */
+public class AuditTool extends Configured implements Tool, Closeable {

Review Comment:
   I think initially we went with that, but then changed it to be a standalone 
audit log tool; not quite sure why we didn't go that route. Were there any 
plans to remove the s3guard tool in the future? We would have to separate this 
out then.







[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769341#comment-17769341
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mukund-thakur commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1736255549

   > Not currently. Is that something we would have to write the logic off of, 
I'll have to check the code for it? Specifically for number of mappers maybe we 
could have a threshold of number of files and then paginate them based on that.
   
   Looks like this is completely serial now, but you can treat it as a 
follow-up and add support for it in the future once this gets used. 
Just create a Jira for now. 
   
   







[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769340#comment-17769340
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mukund-thakur commented on code in PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#discussion_r1337744566


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AuditTool.java:
##
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+import org.apache.hadoop.util.ExitUtil;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_COMMAND_ARGUMENT_ERROR;
+import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_FAIL;
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_SUCCESS;
+
+/**
+ * AuditTool is a Command Line Interface.
+ * Its functionality is to parse the audit log files
+ * and generate avro file.
+ */
+public class AuditTool extends Configured implements Tool, Closeable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AuditTool.class);
+
+  private final S3AAuditLogMergerAndParser s3AAuditLogMergerAndParser =
+  new S3AAuditLogMergerAndParser();
+
+  /**
+   * Name of this tool: {@value}.
+   */
+  public static final String AUDIT_TOOL =
+  "org.apache.hadoop.fs.s3a.audit.AuditTool";
+
+  /**
+   * Purpose of this tool: {@value}.
+   */
+  public static final String PURPOSE =
+  "\n\nUSAGE:\nMerge, parse audit log files and convert into avro file "
+  + "for "
+  + "better "
+  + "visualization";
+
+  // Exit codes
+  private static final int SUCCESS = EXIT_SUCCESS;
+  private static final int FAILURE = EXIT_FAIL;
+  private static final int INVALID_ARGUMENT = EXIT_COMMAND_ARGUMENT_ERROR;
+
+  private static final String USAGE =
+  "bin/hadoop " + "Class" + " DestinationPath" + " SourcePath" + "\n" +

Review Comment:
   Okay, change one BUCKET to source_bucket and the other to destination_bucket.
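Applied to the USAGE constant, that suggestion might look something like this sketch (the class name and string layout are assumptions, not the PR's code):

```java
/**
 * Sketch of the USAGE string with the reviewer's suggested bucket names;
 * the class name is an assumption for this illustration.
 */
public class UsageSketch {

  public static final String AUDIT_TOOL =
      "org.apache.hadoop.fs.s3a.audit.AuditTool";

  public static final String USAGE =
      "bin/hadoop Class DestinationPath SourcePath\n"
          + "bin/hadoop " + AUDIT_TOOL
          + " s3a://destination_bucket s3a://source_bucket\n";
}
```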








[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769126#comment-17769126
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

steveloughran commented on code in PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#discussion_r1337069562


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AuditTool.java:
##
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+import org.apache.hadoop.util.ExitUtil;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_COMMAND_ARGUMENT_ERROR;
+import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_FAIL;
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_SUCCESS;
+
+/**
+ * AuditTool is a Command Line Interface.
+ * Its functionality is to parse the audit log files
+ * and generate avro file.
+ */
+public class AuditTool extends Configured implements Tool, Closeable {

Review Comment:
   what about making this something the hadoop s3guard command can invoke?



##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/audit/TestS3AAuditLogMergerAndParser.java:
##
@@ -0,0 +1,273 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.Map;
+
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+
+/**
+ * This will implement different tests on S3AAuditLogMergerAndParser class.
+ */
+public class TestS3AAuditLogMergerAndParser extends AbstractS3ATestBase {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(TestS3AAuditLogMergerAndParser.class);
+
+  /**
+   * A real log entry.
+   * This is derived from a real log entry on a test run.
+   * If this needs to be updated, please do it from a real log.
+   * Splitting this up across lines has a tendency to break things, so
+   * be careful making changes.
+   */
+  static final String SAMPLE_LOG_ENTRY =
+  "183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a400"
+  + " bucket-london"
+  + " [13/May/2021:11:26:06 +]"
+  + " 109.157.171.174"
+  + " arn:aws:iam::152813717700:user/dev"
+  + " M7ZB7C4RTKXJKTM9"
+  + " REST.PUT.OBJECT"
+  + " fork-0001/test/testParseBrokenCSVFile"
+  + " \"PUT 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768740#comment-17768740
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

hadoop-yetus commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1733914140

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 35s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  56m 27s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m  6s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | -1 :x: |  spotbugs  |   0m 53s | 
[/branch-spotbugs-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/2/artifact/out/branch-spotbugs-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in trunk failed.  |
   | +1 :green_heart: |  shadedclient  |   5m 17s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 40s | 
[/patch-mvninstall-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/2/artifact/out/patch-mvninstall-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch failed.  |
   | -1 :x: |  compile  |   0m 26s | 
[/patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/2/artifact/out/patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt)
 |  hadoop-aws in the patch failed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 26s | 
[/patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/2/artifact/out/patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt)
 |  hadoop-aws in the patch failed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.  |
   | -1 :x: |  compile  |   0m 24s | 
[/patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/2/artifact/out/patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05.  |
   | -1 :x: |  javac  |   0m 24s | 
[/patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/2/artifact/out/patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/buildtool-patch-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/2/artifact/out/buildtool-patch-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  The patch fails to run checkstyle in hadoop-aws  |
   | -1 :x: |  mvnsite  |   0m 22s | 
[/patch-mvnsite-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/2/artifact/out/patch-mvnsite-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch failed.  |
   | -1 :x: |  javadoc  |   0m 24s | 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768659#comment-17768659
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mehakmeet commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1733516788

   > Have you tested this on actual files? And specifically so many files... 
total size in GB's kind of scale testing?
   
   Have tested this but not at scale. Will do that.
   Example test:
   
   ```
   ❯ bin/hadoop org.apache.hadoop.fs.s3a.audit.AuditTool 
s3a://mehakmeet-singh-data/logdir2/ s3a://mehakmeet-singh-data/logsdir/
   16:48:41,319 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
   16:49:58,339 INFO mapreduce.S3AAuditLogMergerAndParser: Successfully 
generated avro data
   16:49:58,839 INFO mapreduce.S3AAuditLogMergerAndParser: Successfully parsed 
:7547 audit logs and 6718 referrer headers logs in the logs
   16:49:58,854 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics 
system...
   16:49:58,854 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
stopped.
   16:49:58,854 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
shutdown complete.
   ```
   Since we're reading each file serially, line by line, I would assume this 
would be a lot slower in that scenario. Optimisation can be a follow-up patch. 
   
   > If there are so many files, for example 1000, does it launch multiple 
mappers to process x files by each mapper based on the splits?
   Not currently. Is that something we would have to write the logic for? 
I'll have to check the code. Specifically, for the number of mappers, maybe we 
could have a threshold on the number of files and then paginate them based on that.
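The threshold-and-paginate idea could be sketched as follows (a hypothetical illustration; the class and method names are assumptions, not code from the PR):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of batching audit log files so each mapper gets at
 * most a fixed number of files. Names and the threshold are assumptions,
 * not code from the PR.
 */
public class LogFileBatcher {

  /** Split the file list into consecutive batches of at most batchSize. */
  public static <T> List<List<T>> batch(List<T> files, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < files.size(); i += batchSize) {
      batches.add(new ArrayList<>(
          files.subList(i, Math.min(i + batchSize, files.size()))));
    }
    return batches;
  }
}
```

With, say, 1000 files and a threshold of 100 files per mapper, this would yield 10 batches that could each be handed to one mapper.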







[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768655#comment-17768655
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mehakmeet commented on code in PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#discussion_r1335767883


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AuditTool.java:
##
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+import org.apache.hadoop.util.ExitUtil;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_COMMAND_ARGUMENT_ERROR;
+import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_FAIL;
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_SUCCESS;
+
+/**
+ * AuditTool is a Command Line Interface.
+ * Its functionality is to parse the audit log files
+ * and generate avro file.
+ */
+public class AuditTool extends Configured implements Tool, Closeable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AuditTool.class);
+
+  private final S3AAuditLogMergerAndParser s3AAuditLogMergerAndParser =
+  new S3AAuditLogMergerAndParser();
+
+  /**
+   * Name of this tool: {@value}.
+   */
+  public static final String AUDIT_TOOL =
+  "org.apache.hadoop.fs.s3a.audit.AuditTool";
+
+  /**
+   * Purpose of this tool: {@value}.
+   */
+  public static final String PURPOSE =
+  "\n\nUSAGE:\nMerge, parse audit log files and convert into avro file "
+  + "for "
+  + "better "
+  + "visualization";
+
+  // Exit codes
+  private static final int SUCCESS = EXIT_SUCCESS;
+  private static final int FAILURE = EXIT_FAIL;
+  private static final int INVALID_ARGUMENT = EXIT_COMMAND_ARGUMENT_ERROR;
+
+  private static final String USAGE =
+  "bin/hadoop " + "Class" + " DestinationPath" + " SourcePath" + "\n" +
+  "bin/hadoop " + AUDIT_TOOL + " s3a://BUCKET" + " s3a://BUCKET" + 
"\n";
+
+  private PrintWriter out;
+
+  public AuditTool() {
+super();
+  }
+
+  /**
+   * Tells us the usage of the AuditTool by commands.
+   *
+   * @return the string USAGE
+   */
+  public String getUsage() {
+return USAGE + PURPOSE;
+  }
+
+  public String getName() {
+return AUDIT_TOOL;
+  }
+
+  /**
+   * This run method in AuditTool takes source and destination path of bucket,
+   * and check if there are directories and pass these paths to merge and
+   * parse audit log files.
+   *
+   * @param args argument list
+   * @return SUCCESS i.e, '0', which is an exit code
+   * @throws Exception on any failure.
+   */
+  @Override
+  public int run(String[] args) throws Exception {
+List paths = Arrays.asList(args);
+if(paths.size() == 2) {
+  // Path of audit log files
+  Path logsPath = new Path(paths.get(1));
+  // Path of destination directory
+  Path destPath = new Path(paths.get(0));
+
+  // Setting the file system
+  URI fsURI = new URI(logsPath.toString());
+  FileSystem fileSystem = FileSystem.get(fsURI, new Configuration());
+
+  FileStatus fileStatus = fileSystem.getFileStatus(logsPath);
+  if (fileStatus.isFile()) {
+errorln("Expecting a directory, but " + logsPath.getName() + " is a"
++ " file which was passed as an argument");
+throw invalidArgs(
+"Expecting a directory, but " + logsPath.getName() + " is a"
++ " file which was passed as an argument");
+  }
+  

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768647#comment-17768647
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mehakmeet commented on code in PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#discussion_r1335750686


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AuditTool.java:
##
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+import org.apache.hadoop.util.ExitUtil;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_COMMAND_ARGUMENT_ERROR;
+import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_FAIL;
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_SUCCESS;
+
+/**
+ * AuditTool is a Command Line Interface.
+ * Its functionality is to parse the audit log files
+ * and generate avro file.
+ */
+public class AuditTool extends Configured implements Tool, Closeable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AuditTool.class);
+
+  private final S3AAuditLogMergerAndParser s3AAuditLogMergerAndParser =
+  new S3AAuditLogMergerAndParser();
+
+  /**
+   * Name of this tool: {@value}.
+   */
+  public static final String AUDIT_TOOL =
+  "org.apache.hadoop.fs.s3a.audit.AuditTool";
+
+  /**
+   * Purpose of this tool: {@value}.
+   */
+  public static final String PURPOSE =
+  "\n\nUSAGE:\nMerge, parse audit log files and convert into avro file "
+  + "for "
+  + "better "
+  + "visualization";
+
+  // Exit codes
+  private static final int SUCCESS = EXIT_SUCCESS;
+  private static final int FAILURE = EXIT_FAIL;
+  private static final int INVALID_ARGUMENT = EXIT_COMMAND_ARGUMENT_ERROR;
+
+  private static final String USAGE =
+  "bin/hadoop " + "Class" + " DestinationPath" + " SourcePath" + "\n" +

Review Comment:
   this essentially defines the command in its verbose form vs. an example. This is 
what we'll see:
   ```
   ❯ bin/hadoop org.apache.hadoop.fs.s3a.audit.AuditTool
   bin/hadoop Class DestinationPath SourcePath
   bin/hadoop org.apache.hadoop.fs.s3a.audit.AuditTool s3a://BUCKET s3a://BUCKET
   ```
   I'll simplify this; it may be confusing.



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AuditTool.java:
##
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.slf4j.Logger;
+import 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-09-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767309#comment-17767309
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mukund-thakur commented on code in PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#discussion_r1317864139


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AuditTool.java:
##
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.audit.mapreduce.S3AAuditLogMergerAndParser;
+import org.apache.hadoop.util.ExitUtil;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_COMMAND_ARGUMENT_ERROR;
+import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_FAIL;
+import static 
org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_SUCCESS;
+
+/**
+ * AuditTool is a Command Line Interface.
+ * Its functionality is to parse the audit log files
+ * and generate avro file.
+ */
+public class AuditTool extends Configured implements Tool, Closeable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AuditTool.class);
+
+  private final S3AAuditLogMergerAndParser s3AAuditLogMergerAndParser =
+  new S3AAuditLogMergerAndParser();
+
+  /**
+   * Name of this tool: {@value}.
+   */
+  public static final String AUDIT_TOOL =
+  "org.apache.hadoop.fs.s3a.audit.AuditTool";
+
+  /**
+   * Purpose of this tool: {@value}.
+   */
+  public static final String PURPOSE =
+  "\n\nUSAGE:\nMerge, parse audit log files and convert into avro file "
+  + "for "
+  + "better "
+  + "visualization";
+
+  // Exit codes
+  private static final int SUCCESS = EXIT_SUCCESS;
+  private static final int FAILURE = EXIT_FAIL;
+  private static final int INVALID_ARGUMENT = EXIT_COMMAND_ARGUMENT_ERROR;
+
+  private static final String USAGE =
+  "bin/hadoop " + "Class" + " DestinationPath" + " SourcePath" + "\n" +
+  "bin/hadoop " + AUDIT_TOOL + " s3a://BUCKET" + " s3a://BUCKET" + 
"\n";
+
+  private PrintWriter out;
+
+  public AuditTool() {
+super();
+  }
+
+  /**
+   * Tells us the usage of the AuditTool by commands.
+   *
+   * @return the string USAGE
+   */
+  public String getUsage() {
+return USAGE + PURPOSE;
+  }
+
+  public String getName() {
+return AUDIT_TOOL;
+  }
+
+  /**
+   * Takes the source and destination bucket paths, checks that they refer
+   * to directories, and passes them on to merge and parse the audit log
+   * files.
+   *
+   * @param args argument list
+   * @return SUCCESS, i.e. '0', as the exit code
+   * @throws Exception on any failure.
+   */
+  @Override
+  public int run(String[] args) throws Exception {
+    List<String> paths = Arrays.asList(args);
+    if (paths.size() == 2) {
+      // Path of audit log files
+      Path logsPath = new Path(paths.get(1));
+      // Path of destination directory
+      Path destPath = new Path(paths.get(0));
+
+      // Setting the file system
+      URI fsURI = new URI(logsPath.toString());
+      FileSystem fileSystem = FileSystem.get(fsURI, new Configuration());
+
+      FileStatus fileStatus = fileSystem.getFileStatus(logsPath);

Review Comment:
   nit: put the variable name as logsFileStatus



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/mapreduce/S3AAuditLogMergerAndParser.java:
##
@@ -0,0 +1,281 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the 

[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760311#comment-17760311
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

hadoop-yetus commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1698859376

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  4s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m  5s |  |  trunk passed  |
   | -1 :x: |  shadedclient  |  36m 46s |  |  branch has errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m  6s |  |  the patch passed  |
   | -1 :x: |  shadedclient  |  36m 24s |  |  patch has errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 29s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 136m  2s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6000 |
   | Optional Tests | dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle |
   | uname | Linux b7186fac4495 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e4f11882fc6ab6c18ced62866fb8b713fed8fda5 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/1/testReport/ |
   | Max. process+thread count | 457 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6000/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2023-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760267#comment-17760267
 ] 

ASF GitHub Bot commented on HADOOP-18257:
-

mehakmeet opened a new pull request, #6000:
URL: https://github.com/apache/hadoop/pull/6000

   ### Description of PR
   This is a follow-up to #4383 PR with most of the code from that PR already 
in place.
   Adding support for an Audit Tool to merge and parse audit logs into an Avro file.
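To make the merge step concrete, here is a minimal stand-alone sketch; the class and method names are invented for illustration and are not the real S3AAuditLogMergerAndParser API. Merging amounts to concatenating the entries of the per-worker audit log files into a single sequence before parsing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: "merging" audit logs here means
 * concatenating the entries of many per-worker log files, in source
 * order, into one sequence that can then be parsed as a whole.
 */
public class AuditLogMergeSketch {

  /** Concatenates the lines of each source, preserving source order. */
  static List<String> mergeLines(List<List<String>> sources) {
    List<String> merged = new ArrayList<>();
    for (List<String> source : sources) {
      merged.addAll(source);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> merged = mergeLines(List.of(
        List.of("entry-1", "entry-2"),  // log file from one worker
        List.of("entry-3")));           // log file from another worker
    System.out.println(merged.size()); // prints 3
  }
}
```

In the real tool the sources are log files under an S3A or local path rather than in-memory lists, but the same concatenation idea applies.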
   ### How was this patch tested?
   `mvn clean verify -Dparallel-tests -DtestsThreadCount=4 -Dscale`
   
   ```
   Tests run: 454, Failures: 0, Errors: 0, Skipped: 4
   
   Tests run: 1171, Failures: 0, Errors: 0, Skipped: 138
   
   Tests run: 135, Failures: 0, Errors: 1, Skipped: 10 (Timeout, unrelated)
   ```
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Analyzing S3A Audit Logs 
> -
>
> Key: HADOOP-18257
> URL: https://issues.apache.org/jira/browse/HADOOP-18257
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Sravani Gadey
>Assignee: Mehakmeet Singh
>Priority: Major
>
> The main aim is to analyze S3A audit logs to give better insights into Hive
> and Spark jobs.
> Steps involved are:
>  * Merging audit log files containing a huge number of audit logs collected
> from a job issuing various S3 requests.
>  * Parsing audit logs using regular expressions, i.e. dividing them into key
> value pairs.
>  * Converting the key value pairs into CSV and AVRO file formats.
>  * Querying the data, which would give better insights for different jobs.
>  * Visualizing the audit logs on a Zeppelin or Jupyter notebook with graphs.
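As a rough illustration of the parsing and CSV-conversion steps above, the sketch below splits a simplified `key=value` style entry into a key/value map and renders one CSV row. The regex and the field names (`verb`, `key`, `status`) are assumptions made up for this example, not the actual S3A audit log format or the production regexes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative sketch only: parses a simplified "k1=v1 k2=v2 ..." log
 * entry into key/value pairs, then renders the values as one CSV row.
 */
public class AuditLogParseSketch {

  // Matches one key=value token; values may be double-quoted to allow spaces.
  private static final Pattern KV =
      Pattern.compile("(\\w+)=(\"[^\"]*\"|\\S+)");

  /** Splits one log entry into ordered key/value pairs. */
  static Map<String, String> parseEntry(String line) {
    Map<String, String> fields = new LinkedHashMap<>();
    Matcher m = KV.matcher(line);
    while (m.find()) {
      String value = m.group(2);
      // Strip surrounding quotes from quoted values.
      if (value.startsWith("\"") && value.endsWith("\"")) {
        value = value.substring(1, value.length() - 1);
      }
      fields.put(m.group(1), value);
    }
    return fields;
  }

  /** Renders the parsed values as a single CSV row (no escaping). */
  static String toCsvRow(Map<String, String> fields) {
    return String.join(",", fields.values());
  }

  public static void main(String[] args) {
    String entry = "verb=GET key=data/part-0000 status=200";
    System.out.println(toCsvRow(parseEntry(entry))); // prints GET,data/part-0000,200
  }
}
```

Once every entry is reduced to the same ordered key set, emitting CSV or an Avro record per entry is a straightforward mapping over the parsed fields.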



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2022-05-31 Thread Sravani Gadey (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544171#comment-17544171
 ] 

Sravani Gadey commented on HADOOP-18257:


Yeah sure. Already started creating patches for each sub-task. Will get the 
things done one by one.

> Analyzing S3A Audit Logs 
> -
>
> Key: HADOOP-18257
> URL: https://issues.apache.org/jira/browse/HADOOP-18257
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Sravani Gadey
>Assignee: Sravani Gadey
>Priority: Major
>






[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2022-05-30 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17543874#comment-17543874
 ] 

Steve Loughran commented on HADOOP-18257:
-

assigned it to you; you can do the same for the others. i propose

* doing this in hadoop-aws for future releases
* copying this into cloudstore afterwards, so we can ship early and run against 
older versions

> Analyzing S3A Audit Logs 
> -
>
> Key: HADOOP-18257
> URL: https://issues.apache.org/jira/browse/HADOOP-18257
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Sravani Gadey
>Assignee: Sravani Gadey
>Priority: Major
>






[jira] [Commented] (HADOOP-18257) Analyzing S3A Audit Logs

2022-05-25 Thread Sravani Gadey (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541949#comment-17541949
 ] 

Sravani Gadey commented on HADOOP-18257:


cc [~ste...@apache.org] [~mehakmeetSingh] 

> Analyzing S3A Audit Logs 
> -
>
> Key: HADOOP-18257
> URL: https://issues.apache.org/jira/browse/HADOOP-18257
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Sravani Gadey
>Priority: Major
>


