[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing

2018-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481380#comment-16481380
 ] 

ASF GitHub Bot commented on NIFI-4165:
--

Github user alopresto commented on the issue:

https://github.com/apache/nifi/pull/2502
  
@markap14 sorry I got distracted from this review. I have revisited it and 
I have some points I'd like to discuss:

* I rebased against `master`, as there have obviously been some changes 
there. These fall into a couple of places:
** the version bump to `1.7.0-SNAPSHOT` in the `pom.xml` for both this 
artifact and a dependency
** there have been changes to `FlowFileQueue` which `DummyFlowFileQueue` 
must implement
* I added some logic to `RemoveFlowFilesWithMissingContent` which loads the 
*master key* from the expected `bootstrap.conf` file in order to handle a 
`nifi.properties` file with encrypted configuration values. 
* The other NiFi Toolkit components have a `*.bat`/`*.sh` script which 
allows them to be run. This provides a couple of features:
** named command-line arguments as opposed to positional arguments
** setting up `$JAVA_HOME` and the classpath rather than calling `java` 
directly on the command line
* The `jar-with-dependencies` in `maven-assembly-plugin` only seems to run 
when you use `mvn clean compile assembly:single` rather than being tied to the 
`install` phase via a profile (see [Stack 
Overflow](https://stackoverflow.com/a/574650/70465)). Please let me know if I'm 
missing something here.
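The second bullet describes loading the *master key* from `bootstrap.conf`. A minimal sketch of that lookup, assuming `bootstrap.conf` sits in the same `conf/` directory as `nifi.properties` and stores the key under `nifi.bootstrap.sensitive.key` (as NiFi 1.x's `bootstrap.conf` does). `BootstrapKeyReader` and `readMasterKey` are hypothetical names for illustration, not the code added to the PR.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Properties;

/**
 * Hypothetical helper: reads the master key used for encrypted nifi.properties
 * values. Assumes bootstrap.conf lives next to nifi.properties in conf/.
 */
final class BootstrapKeyReader {

    // Property name used by NiFi 1.x bootstrap.conf for the hex-encoded master key.
    static final String MASTER_KEY_PROPERTY = "nifi.bootstrap.sensitive.key";

    private BootstrapKeyReader() {
    }

    /** Returns the master key, or empty if bootstrap.conf is missing or the key is blank. */
    static Optional<String> readMasterKey(Path nifiPropertiesPath) throws IOException {
        Path confDir = nifiPropertiesPath.toAbsolutePath().getParent();
        Path bootstrapConf = confDir.resolve("bootstrap.conf");
        if (!Files.isReadable(bootstrapConf)) {
            return Optional.empty();
        }
        Properties bootstrap = new Properties();
        try (Reader reader = Files.newBufferedReader(bootstrapConf, StandardCharsets.UTF_8)) {
            bootstrap.load(reader);
        }
        String key = bootstrap.getProperty(MASTER_KEY_PROPERTY, "").trim();
        return key.isEmpty() ? Optional.empty() : Optional.of(key);
    }
}
```

Returning an empty `Optional` when no key is configured lets a caller fall back to treating `nifi.properties` as unencrypted, which corresponds to the `No protected properties` case in the log below.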

I ran the scenario you suggested by generating some flowfiles into a queue 
and then removing the `content_repository` directory contents. When I did that, 
I got this message:

```

hw12203:/Users/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo (pr2502) alopresto
 149s @ 17:17:25 $ cd target/

hw12203:...ers/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo/target (pr2502) alopresto
 0s @ 17:17:31 $ java -cp \
    nifi-toolkit-flowfile-repo-1.7.0-SNAPSHOT-jar-with-dependencies.jar:../../nifi-toolkit-assembly/target/nifi-toolkit-1.7.0-SNAPSHOT-bin/nifi-toolkit-1.7.0-SNAPSHOT/lib/slf4j-api-1.7.25.jar \
    org.apache.nifi.toolkit.repos.flowfile.RemoveFlowFilesWithMissingContent \
    ~/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/conf/nifi.properties \
    ~/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/flowfile_repository/
17:17:35.865 [main] INFO org.apache.nifi.properties.NiFiPropertiesLoader - Loaded 148 properties from /Users/alopresto/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/conf/nifi.properties
17:17:35.872 [main] DEBUG org.apache.nifi.properties.ProtectedNiFiProperties - Loaded 148 properties (including 0 protection schemes) into ProtectedNiFiProperties
17:17:35.872 [main] DEBUG org.apache.nifi.properties.ProtectedNiFiProperties - No protected properties
Cannot find or cannot read ./content_repository or it is not a directory

hw12203:...ers/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo/target (pr2502) alopresto
 0s @ 17:17:36 $
```

The directory definitely exists:

```

hw12203:...space/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT (pr2502) alopresto
 0s @ 17:17:48 $ ll
total 416
drwxr-xr-x   17 alopresto  staff   578B May 10 16:40 ./
drwxr-xr-x    3 alopresto  staff   102B May 10 10:20 ../
-rw-r--r--    1 alopresto  staff   119K Mar 13 17:25 LICENSE
-rw-r--r--    1 alopresto  staff    80K May 10 09:23 NOTICE
-rw-r--r--    1 alopresto  staff   4.4K Dec 13 15:56 README
drwxr-xr-x    8 alopresto  staff   272B May 10 10:20 bin/
drwxr-xr-x   12 alopresto  staff   408B May 18 16:51 conf/
drwxr-xr-x    2 alopresto  staff    68B May 18 16:51 content_repository/
drwxr-xr-x    6 alopresto  staff   204B May 18 16:50 database_repository/
drwxr-xr-x    3 alopresto  staff   102B May 10 10:20 docs/
drwxr-xr-x    5 alopresto  staff   170B May 18 16:52 flowfile_repository/
drwxr-xr-x  113 alopresto  staff   3.8K May 10 10:20 lib/
drwxr-xr-x   10 alopresto  staff   340B May 18 17:00 logs/
drwxr-xr-x    9 alopresto  staff   306B May 18 16:51 provenance_repository/
drwxr-xr-x    4 alopresto  staff   136B May 18 16:49 run/
drwxr-xr-x    3 alopresto  staff   102B May 10 16:40 state/
drwxr-xr-x    5 alopresto  staff   170B May 18 16:50 work/

hw12203:...space/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT (pr2502) alopresto
 0s @ 17:19:35 $ ll content_repository/
total 0
drwxr-xr-x   2 alopresto  staff    68B May 18 16:51 ./
drwxr-xr-x  17 alopresto  staff   578B May 10 16:40 ../
```
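The error names `./content_repository`, a relative path; the default `nifi.properties` sets `nifi.content.repository.directory.default=./content_repository`, and the tool above was launched from `target/`, so such a path would resolve against that working directory rather than the NiFi home. A minimal sketch of resolving it against the NiFi home instead, assuming `nifi.properties` lives in `<NiFi home>/conf/`. `RepoPathResolver` is a hypothetical helper, not the PR's code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: resolve a repository directory from nifi.properties
 * against the NiFi home directory instead of the JVM's working directory.
 */
final class RepoPathResolver {

    private RepoPathResolver() {
    }

    /** Assumes nifi.properties is in <NiFi home>/conf/, so NiFi home is conf/'s parent. */
    static Path resolveRepoDir(Path nifiPropertiesPath, String configuredDir) {
        Path configured = Paths.get(configuredDir);
        if (configured.isAbsolute()) {
            return configured.normalize();
        }
        Path confDir = nifiPropertiesPath.toAbsolutePath().getParent();
        Path nifiHome = confDir == null ? null : confDir.getParent();
        if (nifiHome == null) {
            throw new IllegalArgumentException("Cannot derive NiFi home from " + nifiPropertiesPath);
        }
        // "./content_repository" becomes <NiFi home>/content_repository
        return nifiHome.resolve(configured).normalize();
    }

    /** True only for an existing, readable directory (the condition the error message reports). */
    static boolean isUsableDirectory(Path dir) {
        return Files.isDirectory(dir) && Files.isReadable(dir);
    }
}
```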

I believe this is because in the default `nifi.properties` file, the 
content 

[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing

2018-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383582#comment-16383582
 ] 

ASF GitHub Bot commented on NIFI-4165:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2502
  
@alopresto This is not really for a corrupted flowfile repository but 
rather for a flowfile repo that points to content that no longer exists. So the 
easiest thing would be to create a GenerateFlowFile processor that generates at 
least 1 byte of data, then stop NiFi with data queued up and blow away the 
content repo, or change your nifi.properties to point to a different location 
for the content repo.
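The "blow away the content repo" step of this test setup can be sketched in Java as follows, to be run only while NiFi is stopped. `ContentRepoWiper` is a hypothetical test helper; it keeps the `content_repository` directory itself so that only the content goes missing while the configured location stays valid.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical test helper: delete everything under the content repository
 * directory but keep the directory itself.
 */
final class ContentRepoWiper {

    private ContentRepoWiper() {
    }

    /** Returns the number of files and directories deleted. */
    static int wipeContents(Path contentRepoDir) throws IOException {
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(contentRepoDir)) {
            toDelete = walk
                    .filter(p -> !p.equals(contentRepoDir))
                    // Reverse order visits children before their parent directories.
                    .sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return toDelete.size();
    }
}
```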


> Update NiFi FlowFile Repository Toolkit to provide ability to remove 
> FlowFiles whose content is missing
> ---
>
> Key: NIFI-4165
> URL: https://issues.apache.org/jira/browse/NIFI-4165
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Tools and Build
>Reporter: Mark Payne
>Assignee: Mark Payne
>Priority: Major
>
> The FlowFile Repo toolkit can address flowfile repo corruption caused by 
> sudden power loss. Another known problem occurs when content goes missing 
> from the content repository for whatever reason (say some process deletes 
> some of the files): the FlowFile Repo can then contain a lot of FlowFiles 
> whose content is missing. This causes stack traces to be dumped to the logs 
> and the flow to take a very long time to get back to normal. We should update 
> the toolkit to provide a mechanism for pointing to a FlowFile Repo and a 
> Content Repo, then writing out a new FlowFile Repo that omits any FlowFile 
> whose content is missing.
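The filtering step in this description can be sketched as follows. `FlowFileEntry`, its fields, and the `<content repo>/<section>/<claim id>` file layout are simplifying assumptions, not NiFi's actual repository classes or serialization format; the real repository also supports multiple content containers and lets several content claims share one file at different offsets, which this sketch ignores. Writing the surviving records out as a new FlowFile Repo is not shown.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch of the filtering idea only: keep FlowFiles whose content file still
 * exists. All types and the on-disk layout here are hypothetical simplifications.
 */
final class MissingContentFilter {

    /** Hypothetical, simplified view of one FlowFile repository record. */
    static final class FlowFileEntry {
        final long id;
        final String section; // assumed non-null whenever claimId is non-null
        final String claimId; // null when the FlowFile has no content claim

        FlowFileEntry(long id, String section, String claimId) {
            this.id = id;
            this.section = section;
            this.claimId = claimId;
        }
    }

    private MissingContentFilter() {
    }

    static boolean contentExists(FlowFileEntry entry, Path contentRepoDir) {
        if (entry.claimId == null) {
            // No content claim (e.g. a zero-byte FlowFile): nothing can be missing.
            return true;
        }
        return Files.isRegularFile(contentRepoDir.resolve(entry.section).resolve(entry.claimId));
    }

    /** Returns the entries to keep, preserving their original order. */
    static List<FlowFileEntry> retainWithContent(List<FlowFileEntry> entries, Path contentRepoDir) {
        return entries.stream()
                .filter(entry -> contentExists(entry, contentRepoDir))
                .collect(Collectors.toList());
    }
}
```

FlowFiles without a content claim are kept, since there is no content that could be missing; that is presumably also why the suggested test uses a GenerateFlowFile that produces at least 1 byte.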



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382673#comment-16382673
 ] 

ASF GitHub Bot commented on NIFI-4165:
--

Github user alopresto commented on the issue:

https://github.com/apache/nifi/pull/2502
  
@markap14 do you have an example corrupted flowfile repository to test this 
against?




[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382670#comment-16382670
 ] 

ASF GitHub Bot commented on NIFI-4165:
--

Github user alopresto commented on the issue:

https://github.com/apache/nifi/pull/2502
  
Reviewing...




[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382024#comment-16382024
 ] 

ASF GitHub Bot commented on NIFI-4165:
--

GitHub user markap14 opened a pull request:

https://github.com/apache/nifi/pull/2502

NIFI-4165: Added RemoveFlowFilesWithMissingContent.java and associated helper classes

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that the format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markap14/nifi NIFI-4165

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2502.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2502


commit 1109ceb33d51815b90067cc078705dd1c91fc596
Author: Mark Payne 
Date:   2017-07-07T16:54:21Z

NIFI-4165: Added RemoveFlowFilesWithMissingContent.java and associated 
helper classes



