[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing
[ https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481380#comment-16481380 ]

ASF GitHub Bot commented on NIFI-4165:
--

Github user alopresto commented on the issue:

https://github.com/apache/nifi/pull/2502

@markap14 Sorry, I got distracted from this review. I have revisited it and have some points I'd like to discuss:

* I rebased against `master`, as there have obviously been some changes there. These fall into a couple of places:
** the version bump to `1.7.0-SNAPSHOT` in the `pom.xml` for both this artifact and a dependency
** changes to `FlowFileQueue`, which `DummyFlowFileQueue` must implement
* I added some logic to `RemoveFlowFilesWithMissingContent` that loads the *master key* from the expected `bootstrap.conf` file in order to handle a `nifi.properties` file with encrypted configuration values.
* The other NiFi Toolkit components have a `*.bat`/`*.sh` script that runs them. This provides a couple of features:
** named command-line arguments instead of positional arguments
** setting up `$JAVA_HOME` and the classpath rather than calling `java` directly on the command line
* The `jar-with-dependencies` in `maven-assembly-plugin` only seems to run when you use `mvn clean compile assembly:single`, rather than being tied to the `install` phase via a profile (see [Stack Overflow](https://stackoverflow.com/a/574650/70465)). Please let me know if I'm missing something here.

I ran the scenario you suggested by generating some flowfiles into a queue and then removing the contents of the `content_repository` directory.
When I did that, I got this message:

```
hw12203:/Users/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo (pr2502) alopresto 149s @ 17:17:25
$ cd target/
hw12203:...ers/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo/target (pr2502) alopresto 0s @ 17:17:31
$ java -cp nifi-toolkit-flowfile-repo-1.7.0-SNAPSHOT-jar-with-dependencies.jar:../../nifi-toolkit-assembly/target/nifi-toolkit-1.7.0-SNAPSHOT-bin/nifi-toolkit-1.7.0-SNAPSHOT/lib/slf4j-api-1.7.25.jar org.apache.nifi.toolkit.repos.flowfile.RemoveFlowFilesWithMissingContent ~/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/conf/nifi.properties ~/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/flowfile_repository/
17:17:35.865 [main] INFO org.apache.nifi.properties.NiFiPropertiesLoader - Loaded 148 properties from /Users/alopresto/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/conf/nifi.properties
17:17:35.872 [main] DEBUG org.apache.nifi.properties.ProtectedNiFiProperties - Loaded 148 properties (including 0 protection schemes) into ProtectedNiFiProperties
17:17:35.872 [main] DEBUG org.apache.nifi.properties.ProtectedNiFiProperties - No protected properties
Cannot find or cannot read ./content_repository or it is not a directory
hw12203:...ers/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo/target (pr2502) alopresto 0s @ 17:17:36
$
```

The directory definitely exists:

```
hw12203:...space/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT (pr2502) alopresto 0s @ 17:17:48
$ ll
total 416
drwxr-xr-x   17 alopresto  staff   578B May 10 16:40 ./
drwxr-xr-x    3 alopresto  staff   102B May 10 10:20 ../
-rw-r--r--    1 alopresto  staff   119K Mar 13 17:25 LICENSE
-rw-r--r--    1 alopresto  staff    80K May 10 09:23 NOTICE
-rw-r--r--    1 alopresto  staff   4.4K Dec 13 15:56 README
drwxr-xr-x    8 alopresto  staff   272B May 10 10:20 bin/
drwxr-xr-x   12 alopresto  staff   408B May 18 16:51 conf/
drwxr-xr-x    2 alopresto  staff    68B May 18 16:51 content_repository/
drwxr-xr-x    6 alopresto  staff   204B May 18 16:50 database_repository/
drwxr-xr-x    3 alopresto  staff   102B May 10 10:20 docs/
drwxr-xr-x    5 alopresto  staff   170B May 18 16:52 flowfile_repository/
drwxr-xr-x  113 alopresto  staff   3.8K May 10 10:20 lib/
drwxr-xr-x   10 alopresto  staff   340B May 18 17:00 logs/
drwxr-xr-x    9 alopresto  staff   306B May 18 16:51 provenance_repository/
drwxr-xr-x    4 alopresto  staff   136B May 18 16:49 run/
drwxr-xr-x    3 alopresto  staff   102B May 10 16:40 state/
drwxr-xr-x    5 alopresto  staff   170B May 18 16:50 work/
hw12203:...space/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT (pr2502) alopresto 0s @ 17:19:35
$ ll content_repository/
total 0
drwxr-xr-x   2 alopresto  staff    68B May 18 16:51 ./
drwxr-xr-x  17 alopresto  staff   578B May 10 16:40 ../
```

I believe this is because in the default `nifi.properties` file,
[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing
[ https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383582#comment-16383582 ]

ASF GitHub Bot commented on NIFI-4165:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2502

@alopresto This is not really for a corrupted flowfile repository but rather for a flowfile repo that points to content that no longer exists. So the easiest thing would be to create a GenerateFlowFile processor that generates at least 1 byte of data, then stop NiFi with data queued up and blow away the content repo, or change your nifi.properties to look at a different location for the content repo.

> Update NiFi FlowFile Repository Toolkit to provide ability to remove
> FlowFiles whose content is missing
> ---
>
>                 Key: NIFI-4165
>                 URL: https://issues.apache.org/jira/browse/NIFI-4165
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Tools and Build
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>
> The FlowFile Repo toolkit has the ability to address issues with flowfile
> repo corruption due to sudden power loss. Another problem that has been known
> to occur is if content goes missing from the content repository for whatever
> reason (say some process deletes some of the files) then the FlowFile Repo
> can contain a lot of FlowFiles whose content is missing. This causes a lot of
> problems with stack traces being dumped to logs and the flow taking a really
> long time to get back to normal. We should update the toolkit to provide a
> mechanism for pointing to a FlowFile Repo and Content Repo, then writing out
> a new FlowFile Repo that removes any FlowFile whose content is missing.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
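markap14's reproduction recipe above can be partly scripted. The sketch below is hypothetical and not part of the NiFi Toolkit. It assumes NiFi has already been stopped (for example with `bin/nifi.sh stop`) with FlowFiles still queued, that the content repository is the default `content_repository` directory under the NiFi home, and that its location is set by the `nifi.content.repository.directory.default` property, as in the default `nifi.properties`. `empty_content_repo` covers the "blow away the content repo" option and `repoint_content_repo` covers the "different location" option; both function names are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the NiFi Toolkit) for reproducing
# "FlowFiles whose content is missing" on a NiFi instance that has been
# stopped with data still queued.
set -euo pipefail

# Assumed property key for the content repository location.
CONTENT_REPO_KEY='nifi.content.repository.directory.default'

# empty_content_repo NIFI_HOME
# Option 1: delete everything under NIFI_HOME/content_repository
# (assumed default location), keeping the directory itself.
empty_content_repo() {
  local repo="$1/content_repository"
  [ -d "$repo" ] || { echo "no directory $repo" >&2; return 1; }
  find "$repo" -mindepth 1 -delete
}

# repoint_content_repo NIFI_HOME NEW_DIR
# Option 2: point the content repository property at NEW_DIR (absolute path),
# create NEW_DIR, and keep a .bak copy of the original nifi.properties.
repoint_content_repo() {
  local nifi_home=$1 new_dir=$2
  local props="$nifi_home/conf/nifi.properties"

  case "$new_dir" in
    /*) ;;
    *) echo "NEW_DIR must be an absolute path" >&2; return 1 ;;
  esac
  [ -r "$props" ] || { echo "cannot read $props" >&2; return 1; }
  cp "$props" "$props.bak"

  # Rewrite only the line starting with "<key>="; awk exits 1 if the key is absent.
  if ! awk -v key="$CONTENT_REPO_KEY" -v val="$new_dir" '
      index($0, key "=") == 1 { print key "=" val; found = 1; next }
      { print }
      END { exit !found }
    ' "$props.bak" > "$props"; then
    cp "$props.bak" "$props"   # restore the original contents
    echo "$CONTENT_REPO_KEY not found in $props" >&2
    return 1
  fi
  mkdir -p "$new_dir"
}
```

For example, `repoint_content_repo "$NIFI_HOME" /tmp/empty_content_repo` (placeholder paths) leaves a `.bak` copy of the original file next to it. The helper insists on an absolute path so the result does not depend on the working directory of whatever process later reads the file, and it rewrites the file with `awk` rather than `sed -i`, whose in-place syntax differs between GNU and BSD/macOS `sed`.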
[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing
[ https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382673#comment-16382673 ]

ASF GitHub Bot commented on NIFI-4165:
--

Github user alopresto commented on the issue:

https://github.com/apache/nifi/pull/2502

@markap14 do you have an example corrupted flowfile repository to test this against?
[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing
[ https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382670#comment-16382670 ]

ASF GitHub Bot commented on NIFI-4165:
--

Github user alopresto commented on the issue:

https://github.com/apache/nifi/pull/2502

Reviewing...
[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing
[ https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382024#comment-16382024 ]

ASF GitHub Bot commented on NIFI-4165:
--

GitHub user markap14 opened a pull request:

https://github.com/apache/nifi/pull/2502

NIFI-4165: Added RemoveFlowFilesWithMissingContent.java and associated helper classes

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
### Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/markap14/nifi NIFI-4165

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2502.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2502

commit 1109ceb33d51815b90067cc078705dd1c91fc596
Author: Mark Payne
Date:   2017-07-07T16:54:21Z

    NIFI-4165: Added RemoveFlowFilesWithMissingContent.java and associated helper classes
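The issue asks the toolkit to point at a FlowFile Repo and a Content Repo and write out a new FlowFile Repo without the FlowFiles whose content is missing; this pull request adds `RemoveFlowFilesWithMissingContent` for that purpose. The shell sketch below only models the filtering decision and is a toy: the real FlowFile repository is a binary write-ahead log handled by the Java classes in the pull request, whereas here the input is a hypothetical tab-separated listing of `flowfile-id` and `claim-path` records, with claim paths relative to the content repository directory. `filter_missing_content` is a made-up name.

```shell
#!/usr/bin/env bash
# Toy model only: the real FlowFile repository is a binary write-ahead log,
# not a text file. Each input record here is "flowfile-id<TAB>claim-path",
# with claim-path relative to the content repository directory
# (a hypothetical format invented for this sketch).
set -euo pipefail

# filter_missing_content CONTENT_REPO_DIR < records > kept_records
# Writes records whose content file exists to stdout; reports each dropped
# record, plus a summary, on stderr.
filter_missing_content() {
  local repo=$1 id claim kept=0 dropped=0
  if [ ! -d "$repo" ] || [ ! -r "$repo" ]; then
    echo "cannot read content repository $repo" >&2
    return 1
  fi
  while IFS=$'\t' read -r id claim; do
    [ -n "$id" ] || continue                # skip blank lines
    if [ -f "$repo/$claim" ]; then
      printf '%s\t%s\n' "$id" "$claim"      # content present: keep the record
      kept=$((kept + 1))
    else
      echo "dropping FlowFile $id: missing content $claim" >&2
      dropped=$((dropped + 1))
    fi
  done
  echo "kept $kept, dropped $dropped" >&2
}
```

For example, `filter_missing_content /path/to/content_repository < records.tsv > kept.tsv` (placeholder paths) leaves the surviving records in `kept.tsv` and logs the dropped ones to stderr. Writing survivors to a new output instead of editing the input in place mirrors the issue's request to write out a new FlowFile Repo.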