Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-03-07 Thread via GitHub


abstractdog merged PR #5054:
URL: https://github.com/apache/hive/pull/5054





Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-03-07 Thread via GitHub


zhangbutao commented on PR #5054:
URL: https://github.com/apache/hive/pull/5054#issuecomment-1985206960

   > @zhangbutao : thanks for your comments, according to the last comment, I'm considering this as approved, let me know if it's otherwise
   
   Yes, feel free to merge this change. :)





Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-03-07 Thread via GitHub


abstractdog commented on PR #5054:
URL: https://github.com/apache/hive/pull/5054#issuecomment-1985203059

   @zhangbutao : thanks for your comments; according to the last comment, I'm considering this as approved, let me know if it's otherwise





Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-23 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1501361266


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private List<String> localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = Arrays.asList(localDirs);
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);
+  }
+
+  private long getInterval(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
+  }
+
+  private void cleanup() {
+Instant deleteBefore = Instant.now().minus(fileModifyTimeThresholdSec, 
ChronoUnit.SECONDS);

Review Comment:
   HOURS is just the default unit for config values like "2"; the unit used here should correspond to the minus operation's first argument
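   
   A minimal sketch of the pairing being discussed, reusing the names from the quoted diff (assumes Hive's HiveConf on the classpath; this is an editorial sketch, not necessarily the final merged code):
   ```java
   import java.time.Instant;
   import java.time.temporal.ChronoUnit;
   import java.util.concurrent.TimeUnit;
   
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hive.conf.HiveConf;
   
   class ThresholdSketch {
     static Instant deleteBefore(Configuration conf) {
       // resolve the configured threshold to seconds once...
       long thresholdSec = HiveConf.getTimeVar(conf,
           HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD, TimeUnit.SECONDS);
       // ...and subtract it with the matching ChronoUnit
       return Instant.now().minus(thresholdSec, ChronoUnit.SECONDS);
     }
   }
   ```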






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-23 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1501361100


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private List<String> localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;

Review Comment:
   this depends on another review comment; I think we'll keep this in SECONDS, so we can choose between fileModifyTimeThresholdSec and fileModifyTimeThreshold






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-23 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1501361157


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private List<String> localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = Arrays.asList(localDirs);
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,

Review Comment:
   we resolve the time units to seconds internally, so it would make sense to 
log them as they are






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-23 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1501360964


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private List<String> localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = Arrays.asList(localDirs);
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);

Review Comment:
   let's agree on this, and then we can resolve other comments I think






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-23 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1501360812


##
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemon.java:
##
@@ -83,26 +95,67 @@ public void tearDown() {
 for (String mSource : METRICS_SOURCES) {
   ms.unregisterSource(mSource);
 }
-daemon.shutdown();
+if (daemon != null) {
+  daemon.shutdown();
+}
   }
 
   @Test(expected = IllegalArgumentException.class)
   public void testEnforceProperNumberOfIOThreads() throws IOException {
-Configuration thisHiveConf = new HiveConf();
-HiveConf.setVar(thisHiveConf, HiveConf.ConfVars.LLAP_DAEMON_SERVICE_HOSTS, 
"@llap");
-HiveConf.setIntVar(thisHiveConf, 
HiveConf.ConfVars.LLAP_DAEMON_NUM_EXECUTORS, 4);
-HiveConf.setIntVar(thisHiveConf, 
HiveConf.ConfVars.LLAP_IO_THREADPOOL_SIZE, 3);
-
-LlapDaemon thisDaemon = new LlapDaemon(thisHiveConf, 1, 
LlapDaemon.getTotalHeapSize(), false, false,
--1, new String[1], 0, false, 0,0, 0, -1, "TestLlapDaemon");
-thisDaemon.close();
+HiveConf.setIntVar(hiveConf, HiveConf.ConfVars.LLAP_IO_THREADPOOL_SIZE, 3);
+
+daemon = new LlapDaemon(hiveConf, 4, LlapDaemon.getTotalHeapSize(), true, 
false,
+-1, new String[1], 0, false, 0,0, 0, defaultWebPort, 
"TestLlapDaemon");
+  }
+
+   @Test
+  public void testLocalDirCleaner() throws IOException, InterruptedException {
+HiveConf.setTimeVar(hiveConf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, 2, TimeUnit.SECONDS);
+HiveConf.setTimeVar(hiveConf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD, 1,
+TimeUnit.SECONDS);

Review Comment:
   it's SECONDS: we set the exact amount of time that we use here in the test
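   
   For reference, a hedged sketch (a fragment assuming the test class's existing imports, not the test code itself) of the set/read round-trip in SECONDS being referred to; the value is written and read back with the same unit, so no truncation occurs:
   ```java
   HiveConf conf = new HiveConf();
   HiveConf.setTimeVar(conf, HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, 2, TimeUnit.SECONDS);
   // read back with the same unit -> 2
   long intervalSec = HiveConf.getTimeVar(conf,
       HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
   ```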






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-23 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1501360630


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private List<String> localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = Arrays.asList(localDirs);
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);

Review Comment:
   we cannot use HOURS as the default unit here, because that way, every config with a shorter period will be truncated to 0h:
   ```
   2024-02-23T23:03:24,716  INFO [main] impl.LocalDirCleaner: Initialized local 
dir cleaner: interval: 0h, threshold: 0h
   ```
   this happened when I tried to test this (in the unit test I need to 
configure SECONDS to get a result in a timely manner)
   ```
   HiveConf.setTimeVar(hiveConf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, 2, TimeUnit.SECONDS);
   HiveConf.setTimeVar(hiveConf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD, 1,
   TimeUnit.SECONDS);
   ```
   
   I think we can change the default unit in HiveConf to HOURS, but we're free 
to use SECONDS inside LocalDirCleaner, as it's the class' internal behavior
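   
   A self-contained, JDK-only illustration of the truncation described above (an editorial sketch, not code from the PR):
   ```java
   import java.util.concurrent.TimeUnit;
   
   public class UnitTruncationDemo {
     public static void main(String[] args) {
       // resolving a 2-second value in HOURS floors it to 0, which is what the "0h" log line shows
       System.out.println(TimeUnit.SECONDS.toHours(2));   // 0
       // resolving it in SECONDS keeps the configured value
       System.out.println(TimeUnit.SECONDS.toSeconds(2)); // 2
     }
   }
   ```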






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-23 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1501359175


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5508,6 +5508,17 @@ public static enum ConfVars {
 LLAP_TASK_TIME_SUMMARY(
 "hive.llap.task.time.print.summary", false,
 "Display queue and runtime of tasks by host for every query executed 
by the shell."),
+
+LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL(
+"hive.llap.local.dir.cleaner.cleanup.interval", "2h", new 
TimeValidator(TimeUnit.HOURS),
+  "Interval by which the LocalDirCleaner service in LLAP daemon checks for 
stale/old files." +
+  "Under normal circumstances, local files are cleaned up properly, so 
it's not recommended to" +
+  "set this more frequent than a couple of hours. Default is 2 hours."),
+
LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD("hive.llap.local.dir.cleaner.file.modify.time.threshold",
 "24h",
+new TimeValidator(TimeUnit.HOURS),
+  "Threshold time for LocalDirCleaner: if a regular file's modify time is 
older than this value, the file gets deleted." +
+  "Defaults to 86400s (1 day), which is a reasonable period for a local 
file to be considered as a stale one."),

Review Comment:
   right, I forgot to update this one :)
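   
   For reference, a hedged sketch of the entry after applying the suggested description fix (the exact merged wording may differ):
   ```java
   LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD("hive.llap.local.dir.cleaner.file.modify.time.threshold", "24h",
       new TimeValidator(TimeUnit.HOURS),
       "Threshold time for LocalDirCleaner: if a regular file's modify time is older than this value, the file gets deleted. " +
       "Defaults to 24h, which is a reasonable period for a local file to be considered as a stale one."),
   ```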






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-22 Thread via GitHub


zhangbutao commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1499298765


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private List<String> localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;

Review Comment:
   Maybe
   `private long fileModifyTimeThresholdHour;`
   or
   `private long fileModifyTimeThreshold;`



##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private List<String> localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = Arrays.asList(localDirs);
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,

Review Comment:
   Maybe the LOG should also use the `hour` unit.



Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-22 Thread via GitHub


zhangbutao commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1499289540


##
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemon.java:
##
@@ -83,26 +95,67 @@ public void tearDown() {
 for (String mSource : METRICS_SOURCES) {
   ms.unregisterSource(mSource);
 }
-daemon.shutdown();
+if (daemon != null) {
+  daemon.shutdown();
+}
   }
 
   @Test(expected = IllegalArgumentException.class)
   public void testEnforceProperNumberOfIOThreads() throws IOException {
-Configuration thisHiveConf = new HiveConf();
-HiveConf.setVar(thisHiveConf, HiveConf.ConfVars.LLAP_DAEMON_SERVICE_HOSTS, 
"@llap");
-HiveConf.setIntVar(thisHiveConf, 
HiveConf.ConfVars.LLAP_DAEMON_NUM_EXECUTORS, 4);
-HiveConf.setIntVar(thisHiveConf, 
HiveConf.ConfVars.LLAP_IO_THREADPOOL_SIZE, 3);
-
-LlapDaemon thisDaemon = new LlapDaemon(thisHiveConf, 1, 
LlapDaemon.getTotalHeapSize(), false, false,
--1, new String[1], 0, false, 0,0, 0, -1, "TestLlapDaemon");
-thisDaemon.close();
+HiveConf.setIntVar(hiveConf, HiveConf.ConfVars.LLAP_IO_THREADPOOL_SIZE, 3);
+
+daemon = new LlapDaemon(hiveConf, 4, LlapDaemon.getTotalHeapSize(), true, 
false,
+-1, new String[1], 0, false, 0,0, 0, defaultWebPort, 
"TestLlapDaemon");
+  }
+
+   @Test
+  public void testLocalDirCleaner() throws IOException, InterruptedException {
+HiveConf.setTimeVar(hiveConf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, 2, TimeUnit.SECONDS);
+HiveConf.setTimeVar(hiveConf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD, 1,
+TimeUnit.SECONDS);

Review Comment:
   nit
   ```suggestion
   TimeUnit.HOURS);
   ```






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-22 Thread via GitHub


zhangbutao commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1499289083


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private List<String> localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = Arrays.asList(localDirs);
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);

Review Comment:
   nit:
   ```suggestion
   TimeUnit.HOURS);
   ```






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-22 Thread via GitHub


zhangbutao commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1499286791


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5508,6 +5508,17 @@ public static enum ConfVars {
 LLAP_TASK_TIME_SUMMARY(
 "hive.llap.task.time.print.summary", false,
 "Display queue and runtime of tasks by host for every query executed 
by the shell."),
+
+LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL(
+"hive.llap.local.dir.cleaner.cleanup.interval", "2h", new 
TimeValidator(TimeUnit.HOURS),
+  "Interval by which the LocalDirCleaner service in LLAP daemon checks for 
stale/old files." +
+  "Under normal circumstances, local files are cleaned up properly, so 
it's not recommended to" +
+  "set this more frequent than a couple of hours. Default is 2 hours."),
+
LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD("hive.llap.local.dir.cleaner.file.modify.time.threshold",
 "24h",
+new TimeValidator(TimeUnit.HOURS),
+  "Threshold time for LocalDirCleaner: if a regular file's modify time is 
older than this value, the file gets deleted." +
+  "Defaults to 86400s (1 day), which is a reasonable period for a local 
file to be considered as a stale one."),

Review Comment:
   nit:
   ```suggestion
 "Defaults to 24h, which is a reasonable period for a local file to be 
considered as a stale one."),
   ```






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-22 Thread via GitHub


sonarcloud[bot] commented on PR #5054:
URL: https://github.com/apache/hive/pull/5054#issuecomment-1959113379

   ## Quality Gate passed
   
   Issues  
   [120 New issues](https://sonarcloud.io/project/issues?id=apache_hive=5054=false=true)
   
   Measures  
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5054=false=true)  
   No data about Coverage  
   No data about Duplication  
   
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5054)
   
   





Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-21 Thread via GitHub


zhangbutao commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1497681596


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private String[] localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = localDirs;
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);
+  }
+
+  private long getInterval(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
+  }
+
+  private void cleanup() {
+Instant deleteBefore = Instant.now().minus(fileModifyTimeThresholdSec, 
ChronoUnit.SECONDS);
+
+for (String localDir : localDirs) {
+  Path pathLocalDir = Paths.get(localDir);
+  cleanupPath(deleteBefore, pathLocalDir);
+}
+  }
+
+  private void cleanupPath(Instant deleteBefore, Path pathLocalDir) {
+LOG.info("Cleaning up files older than {} from {}", deleteBefore, 
pathLocalDir);
+
+try (Stream<Path> files = Files.walk(pathLocalDir)) {

Review Comment:
   yes, might be risky. There seems to be no better way. 






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-21 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1497656070


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private String[] localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = localDirs;
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);
+  }
+
+  private long getInterval(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
+  }
+
+  private void cleanup() {
+Instant deleteBefore = Instant.now().minus(fileModifyTimeThresholdSec, 
ChronoUnit.SECONDS);
+
+for (String localDir : localDirs) {
+  Path pathLocalDir = Paths.get(localDir);
+  cleanupPath(deleteBefore, pathLocalDir);
+}
+  }
+
+  private void cleanupPath(Instant deleteBefore, Path pathLocalDir) {
+LOG.info("Cleaning up files older than {} from {}", deleteBefore, 
pathLocalDir);
+
+try (Stream<Path> files = Files.walk(pathLocalDir)) {

Review Comment:
   deleting an application folder might be risky: for long-running apps it's possible that we're using the same application folder for more than a day, so the modification/creation date of the folder doesn't really help there; I cannot think of a solution where we can safely and easily remove unused folders
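   
   For context, a hedged sketch of how such a per-file modification-time check can look (the quoted hunk is cut off before the method body, so this is an editorial reconstruction, not necessarily the PR's exact code; it assumes the imports and LOG field of the quoted class):
   ```java
   private void cleanupPath(Instant deleteBefore, Path pathLocalDir) {
     LOG.info("Cleaning up files older than {} from {}", deleteBefore, pathLocalDir);
     try (Stream<Path> files = Files.walk(pathLocalDir)) {
       files.filter(Files::isRegularFile).forEach(f -> {
         try {
           // delete only regular files whose modification time is older than the threshold
           FileTime modified = Files.getLastModifiedTime(f);
           if (modified.toInstant().isBefore(deleteBefore)) {
             Files.delete(f);
           }
         } catch (IOException e) {
           LOG.warn("Failed to check/delete {}", f, e);
         }
       });
     } catch (IOException e) {
       LOG.warn("Failed to walk {}", pathLocalDir, e);
     }
   }
   ```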






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-21 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1497656070


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private String[] localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = localDirs;
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);
+  }
+
+  private long getInterval(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
+  }
+
+  private void cleanup() {
+Instant deleteBefore = Instant.now().minus(fileModifyTimeThresholdSec, 
ChronoUnit.SECONDS);
+
+for (String localDir : localDirs) {
+  Path pathLocalDir = Paths.get(localDir);
+  cleanupPath(deleteBefore, pathLocalDir);
+}
+  }
+
+  private void cleanupPath(Instant deleteBefore, Path pathLocalDir) {
+LOG.info("Cleaning up files older than {} from {}", deleteBefore, 
pathLocalDir);
+
+try (Stream<Path> files = Files.walk(pathLocalDir)) {

Review Comment:
   deleting an application folder might be risky: for long-running apps it's possible that we're using the same application folder for a day, so the modification/creation date doesn't really help there; I cannot really think of a solution where we can safely and easily remove unused folders






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-19 Thread via GitHub


zhangbutao commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1494276114


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private String[] localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = localDirs;
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);
+  }
+
+  private long getInterval(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
+  }
+
+  private void cleanup() {
+Instant deleteBefore = Instant.now().minus(fileModifyTimeThresholdSec, 
ChronoUnit.SECONDS);
+
+for (String localDir : localDirs) {
+  Path pathLocalDir = Paths.get(localDir);
+  cleanupPath(deleteBefore, pathLocalDir);
+}
+  }
+
+  private void cleanupPath(Instant deleteBefore, Path pathLocalDir) {
+LOG.info("Cleaning up files older than {} from {}", deleteBefore, 
pathLocalDir);
+
+try (Stream<Path> files = Files.walk(pathLocalDir)) {

Review Comment:
   Can `/apps/llap/work/usercache/hive/appcache/` be the root folder? Can we only check the first directory layer?
   List the root folder `/apps/llap/work/usercache/hive/appcache/` and get all the sub-directories whose names start with `application_`?
   new File("/apps/llap/work/usercache/hive/appcache/").listFiles(fn -> fn.getName().startsWith("application_"));
   
   I want to know if it is enough to delete the directory `/apps/llap/work/usercache/hive/appcache/application_1707917402901_000` instead of walking all files under this directory?
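   A minimal, hypothetical sketch of that first-level approach (the appcache path and the age check are illustrative only, not the PR's code):
   ```java
   // Sketch: list only the immediate application_* sub-directories of the
   // appcache root and delete a whole directory tree once it looks old.
   // Note: using the directory's own mtime is a simplification; it may not
   // reflect the age of the files inside it.
   import java.io.File;
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.time.Instant;
   import java.util.Comparator;
   import java.util.stream.Stream;

   public class AppCacheCleanerSketch {
     public static void cleanOldAppDirs(File appCacheRoot, Instant deleteBefore) throws IOException {
       File[] appDirs = appCacheRoot.listFiles(fn -> fn.getName().startsWith("application_"));
       if (appDirs == null) {
         return; // root does not exist or is not a directory
       }
       for (File appDir : appDirs) {
         if (Instant.ofEpochMilli(appDir.lastModified()).isBefore(deleteBefore)) {
           try (Stream<Path> walk = Files.walk(appDir.toPath())) {
             // delete children before parents
             walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
           }
         }
       }
     }
   }
   ```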



Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-18 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1494124553


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private String[] localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = localDirs;
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);
+  }
+
+  private long getInterval(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
+  }
+
+  private void cleanup() {
+Instant deleteBefore = Instant.now().minus(fileModifyTimeThresholdSec, 
ChronoUnit.SECONDS);
+
+for (String localDir : localDirs) {
+  Path pathLocalDir = Paths.get(localDir);
+  cleanupPath(deleteBefore, pathLocalDir);
+}
+  }
+
+  private void cleanupPath(Instant deleteBefore, Path pathLocalDir) {
+LOG.info("Cleaning up files older than {} from {}", deleteBefore, 
pathLocalDir);
+
+try (Stream<Path> files = Files.walk(pathLocalDir)) {

Review Comment:
   we have a root folder like:
   ```
   /apps/llap/work/
   ```
   
   and we'll have dangling files like:
   ```
   
/apps/llap/work/usercache/hive/appcache/application_1707917402901_0001/3/output/attempt_1707917402901_0001_3_05_02_0_10414:
   total 16
   drwxr-xr-x  2 hive hive   44 Feb 14 13:41 .
   drwxr-xr-x 63 hive hive 4096 Feb 14 13:41 ..
   -rw-r--r--  1 hive hive  425 Feb 14 13:41 file.out
   -rw-r--r--  1 hive hive   32 Feb 14 13:41 file.out.index
   
   
/apps/llap/work/usercache/hive/appcache/application_1707917402901_0001/3/output/attempt_1707917402901_0001_3_05_05_0_10416:
   total 16
   drwxr-xr-x  2 hive hive   44 Feb 14 13:41 .
   drwxr-xr-x 63 hive hive 4096 Feb 14 13:41 ..
   -rw-r--r--  1 hive hive  383 Feb 14 13:41 file.out
   -rw-r--r--  1 hive hive   32 Feb 14 13:41 file.out.index
   
   ```
   
   so the point is to find all dangling intermediate files, which might be a few layers below the root work dir; Files.walk() seemed to be the way to find them all, and I'm a bit concerned about how complicated it would be to achieve that otherwise
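   As an illustration only (not the PR's code), the same age-based cleanup can also be expressed with Files.find, which hands over the file attributes during the traversal; the root path and the 24h threshold below are assumptions:
   ```java
   // Sketch: delete regular files older than a threshold, at any depth under
   // the LLAP work dir, in a single depth-unbounded traversal.
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.Paths;
   import java.time.Instant;
   import java.time.temporal.ChronoUnit;
   import java.util.stream.Stream;

   public class WalkCleanupSketch {
     public static void main(String[] args) throws IOException {
       Path workDir = Paths.get("/apps/llap/work");                      // example root local dir
       Instant deleteBefore = Instant.now().minus(24, ChronoUnit.HOURS); // assumed threshold

       try (Stream<Path> stale = Files.find(workDir, Integer.MAX_VALUE,
           (path, attrs) -> attrs.isRegularFile()
               && attrs.lastModifiedTime().toInstant().isBefore(deleteBefore))) {
         stale.forEach(p -> {
           try {
             Files.delete(p); // best effort, matching the cleaner's intent
           } catch (IOException e) {
             System.err.println("Failed to delete " + p + ": " + e);
           }
         });
       }
     }
   }
   ```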


Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-18 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1494118910


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5508,6 +5508,17 @@ public static enum ConfVars {
 LLAP_TASK_TIME_SUMMARY(
 "hive.llap.task.time.print.summary", false,
 "Display queue and runtime of tasks by host for every query executed 
by the shell."),
+
+LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL(
+"hive.llap.local.dir.cleaner.cleanup.interval", "7200s", new 
TimeValidator(TimeUnit.SECONDS),
+  "Interval by which the LocalDirCleaner service in LLAP daemon checks for 
stale/old files. \n" +
+  "Under normal circumstances, local files are cleaned up properly, so 
it's not recommended to \n" +
+  "set this more frequent than a couple of hours. Default is 2 hours."),
+
LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD("hive.llap.local.dir.cleaner.file.modify.time.threshold",
 "86400s",

Review Comment:
   makes sense, I can change the default unit to hours
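   For illustration only, if the default unit were hours, the interval entry above might look like the sketch below (the "2h" default is an assumption, not necessarily the committed value; the threshold entry would change the same way):
   ```java
   // Sketch of the same ConfVars entry with an hour-based default and validator.
   LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL(
       "hive.llap.local.dir.cleaner.cleanup.interval", "2h", new TimeValidator(TimeUnit.HOURS),
       "Interval by which the LocalDirCleaner service in LLAP daemon checks for stale/old files. \n" +
       "Under normal circumstances, local files are cleaned up properly, so it's not recommended to \n" +
       "set this more frequent than a couple of hours. Default is 2 hours."),
   ```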






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-18 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1494118286


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private String[] localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = localDirs;
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);
+  }
+
+  private long getInterval(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
+  }
+
+  private void cleanup() {
+Instant deleteBefore = Instant.now().minus(fileModifyTimeThresholdSec, 
ChronoUnit.SECONDS);
+
+for (String localDir : localDirs) {
+  Path pathLocalDir = Paths.get(localDir);
+  cleanupPath(deleteBefore, pathLocalDir);
+}

Review Comment:
   good catch, I'll change






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-18 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1494117549


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private String[] localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

Review Comment:
   yes, I think so; this is not something to be over-optimized. Old files are not expected to be present at all, and if there are any, we don't have a performance requirement for this cleanup
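   For reference, a single scheduler thread is indeed enough for a best-effort job like this; a minimal sketch (the thread name is made up here, not taken from the PR):
   ```java
   // Sketch: single scheduler thread, marked as daemon and given a name so it
   // is easy to spot in thread dumps and never keeps the JVM alive on its own.
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;

   public class CleanerSchedulerSketch {
     public static ScheduledExecutorService newCleanerScheduler() {
       return Executors.newSingleThreadScheduledExecutor(r -> {
         Thread t = new Thread(r, "LocalDirCleaner"); // hypothetical thread name
         t.setDaemon(true);
         return t;
       });
     }
   }
   ```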






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-18 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1494116227


##
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LocalDirCleaner.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.daemon.impl;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.attribute.FileTime;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.service.AbstractService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LocalDirCleaner is an LLAP daemon service to clean up old local files. 
Under normal circumstances,
+ * intermediate/local files are cleaned up (typically at end of the DAG), but 
daemons crash sometimes,
+ * and the attached local disk might end up being the same when a new daemon 
starts (this applies to
+ * on-prem as well as cloud scenarios).
+ */
+public class LocalDirCleaner extends AbstractService {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LocalDirCleaner.class);
+
+  private String[] localDirs;
+
+  private long cleanupIntervalSec;
+  private long fileModifyTimeThresholdSec;
+
+  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
+
+  public LocalDirCleaner(String[] localDirs, Configuration conf) {
+super("LocalDirCleaner");
+this.localDirs = localDirs;
+this.cleanupIntervalSec = getInterval(conf);
+this.fileModifyTimeThresholdSec = getFileModifyTimeThreshold(conf);
+LOG.info("Initialized local dir cleaner: interval: {}s, threshold: {}s", 
cleanupIntervalSec,
+fileModifyTimeThresholdSec);
+  }
+
+  @Override
+  public void serviceStart() throws IOException {
+scheduler.scheduleAtFixedRate(this::cleanup, 0, cleanupIntervalSec, 
TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void serviceStop() throws IOException {
+// we can shutdown this service now and ignore leftovers, because under 
normal circumstances,
+// files from the local dirs are cleaned up (so LocalDirCleaner is a best 
effort utility)
+scheduler.shutdownNow();
+  }
+
+  private long getFileModifyTimeThreshold(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_FILE_MODIFY_TIME_THRESHOLD,
+TimeUnit.SECONDS);
+  }
+
+  private long getInterval(Configuration conf) {
+return HiveConf.getTimeVar(conf, 
HiveConf.ConfVars.LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL, TimeUnit.SECONDS);
+  }
+
+  private void cleanup() {
+Instant deleteBefore = Instant.now().minus(fileModifyTimeThresholdSec, 
ChronoUnit.SECONDS);
+
+for (String localDir : localDirs) {
+  Path pathLocalDir = Paths.get(localDir);
+  cleanupPath(deleteBefore, pathLocalDir);
+}
+  }
+
+  private void cleanupPath(Instant deleteBefore, Path pathLocalDir) {
+LOG.info("Cleaning up files older than {} from {}", deleteBefore, 
pathLocalDir);
+
+try (Stream<Path> files = Files.walk(pathLocalDir)) {
+  files.filter(f -> {
+try {
+  FileTime modified = Files.getLastModifiedTime(f);
+  LOG.debug("Checking: {}, modified: {}", f, modified);
+  return Files.isRegularFile(f) && 
modified.toInstant().isBefore(deleteBefore);
+} catch (IOException ex) {
+  LOG.warn("IOException caught while checking file for deletion", ex);
+  return false;
+}
+  }).forEach(f -> {
+try {
+  LOG.info("Delete old local file: {}", f);
+  Files.delete(f);
+} catch (IOException ex) {
+  LOG.warn("Failed to delete file", ex);
+}
+  });
+} catch (IOException e) {
+  throw new RuntimeException(e);

Review Comment:
   right, logging is enough, we're not handling this exception anyway
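   A sketch of what the agreed change could look like in cleanupPath (log-and-continue instead of rethrowing; it reuses the LOG, imports and fields of the LocalDirCleaner class quoted above and is not necessarily the exact committed code):
   ```java
     private void cleanupPath(Instant deleteBefore, Path pathLocalDir) {
       LOG.info("Cleaning up files older than {} from {}", deleteBefore, pathLocalDir);

       try (Stream<Path> files = Files.walk(pathLocalDir)) {
         files.filter(f -> {
           try {
             FileTime modified = Files.getLastModifiedTime(f);
             LOG.debug("Checking: {}, modified: {}", f, modified);
             return Files.isRegularFile(f) && modified.toInstant().isBefore(deleteBefore);
           } catch (IOException ex) {
             LOG.warn("IOException caught while checking file for deletion", ex);
             return false;
           }
         }).forEach(f -> {
           try {
             LOG.info("Delete old local file: {}", f);
             Files.delete(f);
           } catch (IOException ex) {
             LOG.warn("Failed to delete file", ex);
           }
         });
       } catch (IOException e) {
         // as agreed above: log instead of rethrowing, the cleanup is best effort
         LOG.warn("IOException caught while walking local dir {}", pathLocalDir, e);
       }
     }
   ```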




Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-14 Thread via GitHub


zhangbutao commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1490231311


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5508,6 +5508,17 @@ public static enum ConfVars {
 LLAP_TASK_TIME_SUMMARY(
 "hive.llap.task.time.print.summary", false,
 "Display queue and runtime of tasks by host for every query executed 
by the shell."),
+
+LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL(

Review Comment:
   Ok, will come back to check this after vacation. :)






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-10 Thread via GitHub


sonarcloud[bot] commented on PR #5054:
URL: https://github.com/apache/hive/pull/5054#issuecomment-1936936795

   ## Quality Gate passed
   Issues: [2 New issues](https://sonarcloud.io/project/issues?id=apache_hive=5054=false=true)
   Measures: [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5054=false=true); no data about Coverage; no data about Duplication
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5054)
   
   





Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-09 Thread via GitHub


sonarcloud[bot] commented on PR #5054:
URL: https://github.com/apache/hive/pull/5054#issuecomment-1936588310

   ## Quality Gate passed
   Issues: [4 New issues](https://sonarcloud.io/project/issues?id=apache_hive=5054=false=true)
   Measures: [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5054=false=true); no data about Coverage; no data about Duplication
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5054)
   
   





Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-09 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1484727510


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5508,6 +5508,17 @@ public static enum ConfVars {
 LLAP_TASK_TIME_SUMMARY(
 "hive.llap.task.time.print.summary", false,
 "Display queue and runtime of tasks by host for every query executed 
by the shell."),
+
+LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL(

Review Comment:
   checked 
[ClearDanglingScratchDir](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/session/ClearDanglingScratchDir.java),
 it seems to be specific to scratch dir cleanup and it's too smart for this scenario; I feel it would be more complicated to make it general and reuse it than to keep a simple utility for this case: ClearDanglingScratchDir checks lock files, works on HDFS, has a command line utility, etc., whereas LocalDirCleaner only removes old files from a local folder






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-01 Thread via GitHub


abstractdog commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1474339506


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5508,6 +5508,17 @@ public static enum ConfVars {
 LLAP_TASK_TIME_SUMMARY(
 "hive.llap.task.time.print.summary", false,
 "Display queue and runtime of tasks by host for every query executed 
by the shell."),
+
+LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL(

Review Comment:
   I'll check: maybe the utility itself can be reused but with keeping a 
separate config






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-02-01 Thread via GitHub


zhangbutao commented on code in PR #5054:
URL: https://github.com/apache/hive/pull/5054#discussion_r1474211349


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5508,6 +5508,17 @@ public static enum ConfVars {
 LLAP_TASK_TIME_SUMMARY(
 "hive.llap.task.time.print.summary", false,
 "Display queue and runtime of tasks by host for every query executed 
by the shell."),
+
+LLAP_LOCAL_DIR_CLEANER_CLEANUP_INTERVAL(

Review Comment:
   Can we reuse the existing cleaning feature?
   
https://github.com/apache/hive/blob/e892ce7212b2f0c9e55092d89076b129c86c8172/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3894-L3898






Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-01-31 Thread via GitHub


abstractdog commented on PR #5054:
URL: https://github.com/apache/hive/pull/5054#issuecomment-1920601513

   unrelated test failure, I can rerun it eventually; this can be reviewed





Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-01-31 Thread via GitHub


sonarcloud[bot] commented on PR #5054:
URL: https://github.com/apache/hive/pull/5054#issuecomment-1919998817

   ## Quality Gate passed
   The SonarCloud Quality Gate passed, but some issues were introduced.
   [4 New issues](https://sonarcloud.io/project/issues?id=apache_hive=5054=false=true)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5054=false=true)
   No data about Coverage
   No data about Duplication
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5054)
   
   





Re: [PR] HIVE-28051: LLAP: cleanup local folders on startup and periodically [hive]

2024-01-31 Thread via GitHub


abstractdog commented on PR #5054:
URL: https://github.com/apache/hive/pull/5054#issuecomment-1919361710

   > I think you missed updating the commit message with Hive jira
   
   LOL, thanks, fixed

