[jira] [Commented] (NIFI-5289) NoClassDefFoundError for org.junit.Assert When Using nifi-mock

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509215#comment-16509215
 ] 

ASF GitHub Bot commented on NIFI-5289:
--

Github user MartinPayne commented on the issue:

https://github.com/apache/nifi/pull/2780
  
@joewitt Compile scope is the correct scope in this case. If JUnit 4 were 
only used in the tests for NiFi Mock, test scope would be correct. However, the 
[NiFi Mock code uses JUnit 4 as a compile-time 
dependency](https://github.com/apache/nifi/blob/f8466cb16d6723ddc3bf5f0e7f8ce8a47d27cbe5/nifi-mock/src/main/java/org/apache/nifi/util/StandardProcessorTestRunner.java#L74),
 so JUnit 4 needs to be brought into consuming projects as a transitive 
dependency.

As per the [Maven dependency scope 
table](https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope),
 making it a compile-scope dependency means it is added to the test classpath 
of a consuming project if that project declares NiFi Mock with test scope. It 
would only get included all over the place if consuming projects had declared 
NiFi Mock with compile or runtime scope; if that were the case, NiFi Mock 
itself would already be getting included all over the place.


> NoClassDefFoundError for org.junit.Assert When Using nifi-mock
> --
>
> Key: NIFI-5289
> URL: https://issues.apache.org/jira/browse/NIFI-5289
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.6.0
>Reporter: Martin Payne
>Priority: Minor
>
> When using the NiFi Mock framework but not using JUnit 4, tests fail with a 
> NoClassDefFoundError for org.junit.Assert. This is because nifi-mock sets the 
> scope of junit to "provided", which means it's not pulled into consuming 
> projects as a transitive dependency. It should be set to "compile" so that 
> users don't have to set an explicit JUnit dependency in their projects.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi issue #2780: NIFI-5289 - Changed nifi-mock junit Dependency to Compile ...

2018-06-11 Thread MartinPayne
Github user MartinPayne commented on the issue:

https://github.com/apache/nifi/pull/2780
  
@joewitt Compile scope is the correct scope in this case. If JUnit 4 were 
only used in the tests for NiFi Mock, test scope would be correct. However, the 
[NiFi Mock code uses JUnit 4 as a compile-time 
dependency](https://github.com/apache/nifi/blob/f8466cb16d6723ddc3bf5f0e7f8ce8a47d27cbe5/nifi-mock/src/main/java/org/apache/nifi/util/StandardProcessorTestRunner.java#L74),
 so JUnit 4 needs to be brought into consuming projects as a transitive 
dependency.

As per the [Maven dependency scope 
table](https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope),
 making it a compile-scope dependency means it is added to the test classpath 
of a consuming project if that project declares NiFi Mock with test scope. It 
would only get included all over the place if consuming projects had declared 
NiFi Mock with compile or runtime scope; if that were the case, NiFi Mock 
itself would already be getting included all over the place.


---


[jira] [Commented] (NIFI-5289) NoClassDefFoundError for org.junit.Assert When Using nifi-mock

2018-06-11 Thread Martin Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509210#comment-16509210
 ] 

Martin Payne commented on NIFI-5289:


[~mike.thomsen] I was writing some tests for a custom processor. We use JUnit 
5, so JUnit 4 was not on the test classpath. I would expect the same behaviour 
with other test frameworks that aren't JUnit 4, too.

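For anyone reproducing this, here is a minimal sketch of the failing setup, 
with only nifi-mock and JUnit 5 on the test classpath (MyProcessor is a 
hypothetical custom processor):

    // Minimal reproduction sketch; MyProcessor is hypothetical.
    // StandardProcessorTestRunner references org.junit.Assert internally, so
    // without JUnit 4 on the test classpath this fails with
    // NoClassDefFoundError: org/junit/Assert.
    import org.apache.nifi.util.TestRunner;
    import org.apache.nifi.util.TestRunners;
    import org.junit.jupiter.api.Test;

    class MyProcessorTest {

        @Test
        void triggersProcessor() {
            final TestRunner runner = TestRunners.newTestRunner(MyProcessor.class);
            runner.enqueue(new byte[0]);
            runner.run(); // NoClassDefFoundError for org.junit.Assert without JUnit 4
        }
    }
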
> NoClassDefFoundError for org.junit.Assert When Using nifi-mock
> --
>
> Key: NIFI-5289
> URL: https://issues.apache.org/jira/browse/NIFI-5289
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.6.0
>Reporter: Martin Payne
>Priority: Minor
>
> When using the NiFi Mock framework but not using JUnit 4, tests fail with a 
> NoClassDefFoundError for org.junit.Assert. This is because nifi-mock sets the 
> scope of junit to "provided", which means it's not pulled into consuming 
> projects as a transitive dependency. It should be set to "compile" so that 
> users don't have to set an explicit JUnit dependency in their projects.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5214) Add a REST lookup service

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509110#comment-16509110
 ] 

ASF GitHub Bot commented on NIFI-5214:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2723#discussion_r194604410
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-lookup-services-bundle/nifi-lookup-services/src/main/java/org/apache/nifi/lookup/RestLookupService.java
 ---
@@ -0,0 +1,435 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.lookup;
+
+import com.burgstaller.okhttp.AuthenticationCacheInterceptor;
+import com.burgstaller.okhttp.CachingAuthenticatorDecorator;
+import com.burgstaller.okhttp.digest.CachingAuthenticator;
+import com.burgstaller.okhttp.digest.DigestAuthenticator;
+import okhttp3.Credentials;
+import okhttp3.MediaType;
+import okhttp3.OkHttpClient;
+import okhttp3.Request;
+import okhttp3.RequestBody;
+import okhttp3.Response;
+import org.apache.nifi.annotation.behavior.DynamicProperties;
+import org.apache.nifi.annotation.behavior.DynamicProperty;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnDisabled;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.attribute.expression.language.PreparedQuery;
+import org.apache.nifi.attribute.expression.language.Query;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.controller.AbstractControllerService;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.proxy.ProxyConfiguration;
+import org.apache.nifi.proxy.ProxyConfigurationService;
+import org.apache.nifi.proxy.ProxySpec;
+import org.apache.nifi.record.path.FieldValue;
+import org.apache.nifi.record.path.RecordPath;
+import org.apache.nifi.record.path.validation.RecordPathValidator;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.ssl.SSLContextService;
+import org.apache.nifi.util.StringUtils;
+
+import javax.net.ssl.SSLContext;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.Proxy;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+import static org.apache.commons.lang3.StringUtils.trimToEmpty;
+
+@Tags({ "rest", "lookup", "json", "xml", "http" })
+@CapabilityDescription("Use a REST service to enrich records.")
+@DynamicProperties({
+@DynamicProperty(name = "*", value = "*", description = "All dynamic 
properties are added as HTTP headers with the name " +
+"as the header name and the value as the header value.")
+})
+public class RestLookupService extends AbstractControllerService implements LookupService<Record> {
+static final PropertyDescriptor URL = new PropertyDescriptor.Builder()
+.name("rest-lookup-url")
+.displayName("URL")
+

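For illustration, the dynamic-property behaviour described by the 
@DynamicProperty annotation above amounts to something like the following 
sketch; this is not the PR's actual implementation, and the endpointUrl and 
context handling are assumed:

    import java.util.Map;
    import okhttp3.Request;
    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.controller.ConfigurationContext;

    class DynamicHeaderSketch {
        // Map every dynamic property onto an HTTP header of the outgoing request.
        static Request buildRequest(final ConfigurationContext context, final String endpointUrl) {
            final Request.Builder builder = new Request.Builder().url(endpointUrl);
            for (final Map.Entry<PropertyDescriptor, String> entry : context.getProperties().entrySet()) {
                final PropertyDescriptor descriptor = entry.getKey();
                if (descriptor.isDynamic() && entry.getValue() != null) {
                    builder.addHeader(descriptor.getName(), entry.getValue());
                }
            }
            return builder.build();
        }
    }
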
[GitHub] nifi pull request #2723: NIFI-5214 Added REST LookupService

2018-06-11 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2723#discussion_r194604410
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-lookup-services-bundle/nifi-lookup-services/src/main/java/org/apache/nifi/lookup/RestLookupService.java
 ---
@@ -0,0 +1,435 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.lookup;
+
+import com.burgstaller.okhttp.AuthenticationCacheInterceptor;
+import com.burgstaller.okhttp.CachingAuthenticatorDecorator;
+import com.burgstaller.okhttp.digest.CachingAuthenticator;
+import com.burgstaller.okhttp.digest.DigestAuthenticator;
+import okhttp3.Credentials;
+import okhttp3.MediaType;
+import okhttp3.OkHttpClient;
+import okhttp3.Request;
+import okhttp3.RequestBody;
+import okhttp3.Response;
+import org.apache.nifi.annotation.behavior.DynamicProperties;
+import org.apache.nifi.annotation.behavior.DynamicProperty;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnDisabled;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.attribute.expression.language.PreparedQuery;
+import org.apache.nifi.attribute.expression.language.Query;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.controller.AbstractControllerService;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.proxy.ProxyConfiguration;
+import org.apache.nifi.proxy.ProxyConfigurationService;
+import org.apache.nifi.proxy.ProxySpec;
+import org.apache.nifi.record.path.FieldValue;
+import org.apache.nifi.record.path.RecordPath;
+import org.apache.nifi.record.path.validation.RecordPathValidator;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.ssl.SSLContextService;
+import org.apache.nifi.util.StringUtils;
+
+import javax.net.ssl.SSLContext;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.Proxy;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+import static org.apache.commons.lang3.StringUtils.trimToEmpty;
+
+@Tags({ "rest", "lookup", "json", "xml", "http" })
+@CapabilityDescription("Use a REST service to enrich records.")
+@DynamicProperties({
+@DynamicProperty(name = "*", value = "*", description = "All dynamic 
properties are added as HTTP headers with the name " +
+"as the header name and the value as the header value.")
+})
+public class RestLookupService extends AbstractControllerService implements LookupService<Record> {
+static final PropertyDescriptor URL = new PropertyDescriptor.Builder()
+.name("rest-lookup-url")
+.displayName("URL")
+.description("The URL for the REST endpoint. Expression language 
is evaluated against the lookup key/value pairs, " +
+"not flowfile attributes or variable registry.")
+

[jira] [Updated] (NIFI-5252) Allow arbitrary headers in PutEmail processor

2018-06-11 Thread Dustin Rodrigues (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Rodrigues updated NIFI-5252:
---
Status: Patch Available  (was: Open)

> Allow arbitrary headers in PutEmail processor
> -
>
> Key: NIFI-5252
> URL: https://issues.apache.org/jira/browse/NIFI-5252
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Dustin Rodrigues
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5252) Allow arbitrary headers in PutEmail processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509093#comment-16509093
 ] 

ASF GitHub Bot commented on NIFI-5252:
--

GitHub user dtrodrigues opened a pull request:

https://github.com/apache/nifi/pull/2787

NIFI-5252 - support arbitrary headers in PutEmail processor

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dtrodrigues/nifi NIFI-5252

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2787.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2787


commit 250736cc14ffb6c44925fe606b5de67d7a53638a
Author: Dustin Rodrigues 
Date:   2018-06-12T02:00:28Z

NIFI-5252 - support arbitrary headers in PutEmail processor




> Allow arbitrary headers in PutEmail processor
> -
>
> Key: NIFI-5252
> URL: https://issues.apache.org/jira/browse/NIFI-5252
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Dustin Rodrigues
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi pull request #2787: NIFI-5252 - support arbitrary headers in PutEmail p...

2018-06-11 Thread dtrodrigues
GitHub user dtrodrigues opened a pull request:

https://github.com/apache/nifi/pull/2787

NIFI-5252 - support arbitrary headers in PutEmail processor

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dtrodrigues/nifi NIFI-5252

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2787.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2787


commit 250736cc14ffb6c44925fe606b5de67d7a53638a
Author: Dustin Rodrigues 
Date:   2018-06-12T02:00:28Z

NIFI-5252 - support arbitrary headers in PutEmail processor




---


[jira] [Commented] (NIFI-5213) Allow AvroReader with explicit schema to read files with embedded schema

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509022#comment-16509022
 ] 

ASF GitHub Bot commented on NIFI-5213:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194590103
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReaderWithExplicitSchema.java
 ---
@@ -17,33 +17,61 @@
 
 package org.apache.nifi.avro;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.SequenceInputStream;
 
 import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileStream;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.BinaryDecoder;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DecoderFactory;
-import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.commons.io.input.TeeInputStream;
 import org.apache.nifi.serialization.MalformedRecordException;
 import org.apache.nifi.serialization.record.RecordSchema;
 
 public class AvroReaderWithExplicitSchema extends AvroRecordReader {
 private final InputStream in;
 private final RecordSchema recordSchema;
 private final DatumReader<GenericRecord> datumReader;
-private final BinaryDecoder decoder;
+private BinaryDecoder decoder;
 private GenericRecord genericRecord;
+private DataFileStream<GenericRecord> dataFileStream;
 
-public AvroReaderWithExplicitSchema(final InputStream in, final 
RecordSchema recordSchema, final Schema avroSchema) throws IOException, 
SchemaNotFoundException {
+public AvroReaderWithExplicitSchema(final InputStream in, final 
RecordSchema recordSchema, final Schema avroSchema) throws IOException {
 this.in = in;
 this.recordSchema = recordSchema;
 
-datumReader = new GenericDatumReader<GenericRecord>(avroSchema);
-decoder = DecoderFactory.get().binaryDecoder(in, null);
+datumReader = new GenericDatumReader<>(avroSchema);
+ByteArrayOutputStream baos = new ByteArrayOutputStream();
+TeeInputStream teeInputStream = new TeeInputStream(in, baos);
+// Try to parse as a DataFileStream, if it works, glue the streams 
back together and delegate calls to the DataFileStream
+try {
+dataFileStream = new DataFileStream<>(teeInputStream, new 
GenericDatumReader<>());
+} catch (IOException ioe) {
+// Carry on, hopefully a raw Avro file
+// Need to be able to re-read the bytes read so far, and the 
InputStream passed in doesn't support reset. Use the TeeInputStream in
+// conjunction with SequenceInputStream to glue the two 
streams back together for future reading
+ByteArrayInputStream bais = new 
ByteArrayInputStream(baos.toByteArray());
+SequenceInputStream sis = new SequenceInputStream(bais, in);
+decoder = DecoderFactory.get().binaryDecoder(sis, null);
+}
+if (dataFileStream != null) {
+// Verify the schemas are the same
+Schema embeddedSchema = dataFileStream.getSchema();
+if (!embeddedSchema.equals(avroSchema)) {
+throw new IOException("Explicit schema does not match 
embedded schema");
--- End diff --

I thought schema evolution was supported in other ways, such as including 
optional (possibly missing) fields to support a transition to/from 
additional/deleted fields, but I admit I don't have my mind wrapped around the 
whole thing. In this case it was driven by the Avro API: if the file has a 
schema, there is a much more fluent API for reading the records than if it does 
not. That API is not for the case when someone wants to impose a schema on a 
file that already has one; I'm not sure that's a case for schema evolution 
(i.e. the embedded schema is not correct?). The alternate API is for "raw" Avro 
files that don't have an embedded schema and instead need an external one for 
processing. TBH I don't know how to parse an Avro file that has an embedded 
schema with their API by imposing an external one. This was the middle ground, 
allowing it as long as the external schema matched the embedded one. Thoughts?

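For what it's worth, one possible way to impose an external schema on a file 
that already has an embedded one is Avro's reader/writer schema resolution; a 
minimal sketch, assuming GenericDatumReader.setExpected() behaves as documented 
(the embedded schema becomes the writer schema, the external one the reader 
schema):

    import java.io.InputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileStream;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    class ExpectedSchemaSketch {
        static void read(final InputStream in, final Schema externalSchema) throws Exception {
            final GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
            datumReader.setExpected(externalSchema); // impose the external (reader) schema
            try (DataFileStream<GenericRecord> stream = new DataFileStream<>(in, datumReader)) {
                while (stream.hasNext()) {
                    final GenericRecord record = stream.next(); // resolved against externalSchema
                }
            }
        }
    }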

> Allow AvroReader with explicit schema to read files with embedded schema
> 
>
> Key: NIFI-5213
> URL: https://issues.apache.org/jira/browse/NIFI-5213

[GitHub] nifi pull request #2718: NIFI-5213: Allow AvroReader to process files w embe...

2018-06-11 Thread mattyb149
Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194590103
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReaderWithExplicitSchema.java
 ---
@@ -17,33 +17,61 @@
 
 package org.apache.nifi.avro;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.SequenceInputStream;
 
 import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileStream;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.BinaryDecoder;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DecoderFactory;
-import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.commons.io.input.TeeInputStream;
 import org.apache.nifi.serialization.MalformedRecordException;
 import org.apache.nifi.serialization.record.RecordSchema;
 
 public class AvroReaderWithExplicitSchema extends AvroRecordReader {
 private final InputStream in;
 private final RecordSchema recordSchema;
 private final DatumReader<GenericRecord> datumReader;
-private final BinaryDecoder decoder;
+private BinaryDecoder decoder;
 private GenericRecord genericRecord;
+private DataFileStream<GenericRecord> dataFileStream;
 
-public AvroReaderWithExplicitSchema(final InputStream in, final 
RecordSchema recordSchema, final Schema avroSchema) throws IOException, 
SchemaNotFoundException {
+public AvroReaderWithExplicitSchema(final InputStream in, final 
RecordSchema recordSchema, final Schema avroSchema) throws IOException {
 this.in = in;
 this.recordSchema = recordSchema;
 
-datumReader = new GenericDatumReader<GenericRecord>(avroSchema);
-decoder = DecoderFactory.get().binaryDecoder(in, null);
+datumReader = new GenericDatumReader<>(avroSchema);
+ByteArrayOutputStream baos = new ByteArrayOutputStream();
+TeeInputStream teeInputStream = new TeeInputStream(in, baos);
+// Try to parse as a DataFileStream, if it works, glue the streams 
back together and delegate calls to the DataFileStream
+try {
+dataFileStream = new DataFileStream<>(teeInputStream, new 
GenericDatumReader<>());
+} catch (IOException ioe) {
+// Carry on, hopefully a raw Avro file
+// Need to be able to re-read the bytes read so far, and the 
InputStream passed in doesn't support reset. Use the TeeInputStream in
+// conjunction with SequenceInputStream to glue the two 
streams back together for future reading
+ByteArrayInputStream bais = new 
ByteArrayInputStream(baos.toByteArray());
+SequenceInputStream sis = new SequenceInputStream(bais, in);
+decoder = DecoderFactory.get().binaryDecoder(sis, null);
+}
+if (dataFileStream != null) {
+// Verify the schemas are the same
+Schema embeddedSchema = dataFileStream.getSchema();
+if (!embeddedSchema.equals(avroSchema)) {
+throw new IOException("Explicit schema does not match 
embedded schema");
--- End diff --

I thought schema evolution was supported in other ways, such as including 
optional (possibly missing) fields to support a transition to/from 
additional/deleted fields, but I admit I don't have my mind wrapped around the 
whole thing. In this case it was driven by the Avro API: if the file has a 
schema, there is a much more fluent API for reading the records than if it does 
not. That API is not for the case when someone wants to impose a schema on a 
file that already has one; I'm not sure that's a case for schema evolution 
(i.e. the embedded schema is not correct?). The alternate API is for "raw" Avro 
files that don't have an embedded schema and instead need an external one for 
processing. TBH I don't know how to parse an Avro file that has an embedded 
schema with their API by imposing an external one. This was the middle ground, 
allowing it as long as the external schema matched the embedded one. Thoughts?


---


[jira] [Commented] (NIFI-5213) Allow AvroReader with explicit schema to read files with embedded schema

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509019#comment-16509019
 ] 

ASF GitHub Bot commented on NIFI-5213:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194589636
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReaderWithExplicitSchema.java
 ---
@@ -17,33 +17,61 @@
 
 package org.apache.nifi.avro;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.SequenceInputStream;
 
 import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileStream;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.BinaryDecoder;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DecoderFactory;
-import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.commons.io.input.TeeInputStream;
 import org.apache.nifi.serialization.MalformedRecordException;
 import org.apache.nifi.serialization.record.RecordSchema;
 
 public class AvroReaderWithExplicitSchema extends AvroRecordReader {
 private final InputStream in;
 private final RecordSchema recordSchema;
 private final DatumReader<GenericRecord> datumReader;
-private final BinaryDecoder decoder;
+private BinaryDecoder decoder;
 private GenericRecord genericRecord;
+private DataFileStream<GenericRecord> dataFileStream;
 
-public AvroReaderWithExplicitSchema(final InputStream in, final 
RecordSchema recordSchema, final Schema avroSchema) throws IOException, 
SchemaNotFoundException {
+public AvroReaderWithExplicitSchema(final InputStream in, final 
RecordSchema recordSchema, final Schema avroSchema) throws IOException {
 this.in = in;
 this.recordSchema = recordSchema;
 
-datumReader = new GenericDatumReader<GenericRecord>(avroSchema);
-decoder = DecoderFactory.get().binaryDecoder(in, null);
+datumReader = new GenericDatumReader<>(avroSchema);
+ByteArrayOutputStream baos = new ByteArrayOutputStream();
+TeeInputStream teeInputStream = new TeeInputStream(in, baos);
+// Try to parse as a DataFileStream, if it works, glue the streams 
back together and delegate calls to the DataFileStream
+try {
+dataFileStream = new DataFileStream<>(teeInputStream, new 
GenericDatumReader<>());
--- End diff --

Sounds about right :) I was playing around with gluing these together and 
as soon as it "worked" I stopped touching it. Will take a closer look, thanks!

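For reference, the "glue" trick under discussion boils down to this sketch: 
everything consumed during the probe was copied into a buffer by the 
TeeInputStream, so those bytes can be replayed ahead of the untouched remainder 
of the stream.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.io.SequenceInputStream;

    class ReplaySketch {
        // After a failed probe of a TeeInputStream wrapped around `original`, rebuild
        // a stream that yields the already-consumed bytes (captured in `buffer`) first.
        static InputStream glueBackTogether(final InputStream original, final ByteArrayOutputStream buffer) {
            return new SequenceInputStream(new ByteArrayInputStream(buffer.toByteArray()), original);
        }
    }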

> Allow AvroReader with explicit schema to read files with embedded schema
> 
>
> Key: NIFI-5213
> URL: https://issues.apache.org/jira/browse/NIFI-5213
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
>
> AvroReader allows the choice of schema access strategy from such options as 
> Use Embedded Schema, Use Schema Name, Use Schema Text, etc. If the incoming 
> Avro files will have embedded schemas, then Use Embedded Schema is best 
> practice for the Avro Reader. However it is not intuitive that if the same 
> schema that is embedded in the file is specified by name (using a schema 
> registry) or explicitly via Schema Text, that errors can occur. This has been 
> noticed in QueryRecord for example, and the error is also not intuitive or 
> descriptive (it is often an ArrayIndexOutOfBoundsException).
> To provide a better user experience, it would be an improvement for 
> AvroReader to be able to successfully process Avro files with embedded 
> schemas, even when the Schema Access Strategy is not "Use Embedded Schema". 
> Of course, the explicit schema would have to match the embedded schema, or an 
> error would be reported (and rightfully so).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi pull request #2718: NIFI-5213: Allow AvroReader to process files w embe...

2018-06-11 Thread mattyb149
Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194589636
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReaderWithExplicitSchema.java
 ---
@@ -17,33 +17,61 @@
 
 package org.apache.nifi.avro;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.SequenceInputStream;
 
 import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileStream;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.BinaryDecoder;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DecoderFactory;
-import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.commons.io.input.TeeInputStream;
 import org.apache.nifi.serialization.MalformedRecordException;
 import org.apache.nifi.serialization.record.RecordSchema;
 
 public class AvroReaderWithExplicitSchema extends AvroRecordReader {
 private final InputStream in;
 private final RecordSchema recordSchema;
 private final DatumReader<GenericRecord> datumReader;
-private final BinaryDecoder decoder;
+private BinaryDecoder decoder;
 private GenericRecord genericRecord;
+private DataFileStream<GenericRecord> dataFileStream;
 
-public AvroReaderWithExplicitSchema(final InputStream in, final 
RecordSchema recordSchema, final Schema avroSchema) throws IOException, 
SchemaNotFoundException {
+public AvroReaderWithExplicitSchema(final InputStream in, final 
RecordSchema recordSchema, final Schema avroSchema) throws IOException {
 this.in = in;
 this.recordSchema = recordSchema;
 
-datumReader = new GenericDatumReader<GenericRecord>(avroSchema);
-decoder = DecoderFactory.get().binaryDecoder(in, null);
+datumReader = new GenericDatumReader<>(avroSchema);
+ByteArrayOutputStream baos = new ByteArrayOutputStream();
+TeeInputStream teeInputStream = new TeeInputStream(in, baos);
+// Try to parse as a DataFileStream, if it works, glue the streams 
back together and delegate calls to the DataFileStream
+try {
+dataFileStream = new DataFileStream<>(teeInputStream, new 
GenericDatumReader<>());
--- End diff --

Sounds about right :) I was playing around with gluing these together and 
as soon as it "worked" I stopped touching it. Will take a closer look, thanks!


---


[GitHub] nifi pull request #2718: NIFI-5213: Allow AvroReader to process files w embe...

2018-06-11 Thread mattyb149
Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194589527
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+public class TestAvroReaderWithExplicitSchema {
+
+@Test
+public void testAvroExplicitReaderWithSchemalessFile() throws 
Exception {
+File avroFileWithEmbeddedSchema = new 
File("src/test/resources/avro/avro_schemaless.avro");
+FileInputStream fileInputStream = new 
FileInputStream(avroFileWithEmbeddedSchema);
+Schema dataSchema = new Schema.Parser().parse(new 
File("src/test/resources/avro/avro_schemaless.avsc"));
+RecordSchema recordSchema = new 
SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, 
null);
+
+AvroReaderWithExplicitSchema avroReader = new 
AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+avroReader.nextAvroRecord();
--- End diff --

Probably; I think I did too much copy-paste in the tests instead of 
validating the individual things.


---


[jira] [Commented] (NIFI-5213) Allow AvroReader with explicit schema to read files with embedded schema

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509018#comment-16509018
 ] 

ASF GitHub Bot commented on NIFI-5213:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194589527
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+public class TestAvroReaderWithExplicitSchema {
+
+@Test
+public void testAvroExplicitReaderWithSchemalessFile() throws 
Exception {
+File avroFileWithEmbeddedSchema = new 
File("src/test/resources/avro/avro_schemaless.avro");
+FileInputStream fileInputStream = new 
FileInputStream(avroFileWithEmbeddedSchema);
+Schema dataSchema = new Schema.Parser().parse(new 
File("src/test/resources/avro/avro_schemaless.avsc"));
+RecordSchema recordSchema = new 
SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, 
null);
+
+AvroReaderWithExplicitSchema avroReader = new 
AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+avroReader.nextAvroRecord();
--- End diff --

Probably; I think I did too much copy-paste in the tests instead of 
validating the individual things.


> Allow AvroReader with explicit schema to read files with embedded schema
> 
>
> Key: NIFI-5213
> URL: https://issues.apache.org/jira/browse/NIFI-5213
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
>
> AvroReader allows the choice of schema access strategy from such options as 
> Use Embedded Schema, Use Schema Name, Use Schema Text, etc. If the incoming 
> Avro files will have embedded schemas, then Use Embedded Schema is best 
> practice for the Avro Reader. However it is not intuitive that if the same 
> schema that is embedded in the file is specified by name (using a schema 
> registry) or explicitly via Schema Text, that errors can occur. This has been 
> noticed in QueryRecord for example, and the error is also not intuitive or 
> descriptive (it is often an ArrayIndexOutOfBoundsException).
> To provide a better user experience, it would be an improvement for 
> AvroReader to be able to successfully process Avro files with embedded 
> schemas, even when the Schema Access Strategy is not "Use Embedded Schema". 
> Of course, the explicit schema would have to match the embedded schema, or an 
> error would be reported (and rightfully so).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5176) NiFi needs to be buildable on Java 9

2018-06-11 Thread Jeff Storck (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Storck updated NIFI-5176:
--
Description: 
While retaining source/target compatibility of 1.8, NiFi needs to be buildable 
on Java 9.

The following issues have been encountered while attempting to run a Java 
1.8-built NiFi on Java 9:
||Issue||Solution||
|Groovy compiler not parsing Groovy code correctly on Java 9|Updated 
maven-compiler-plugin to 3.7.0, and included dependencies for 
groovy-eclipse-compiler:2.9.3-01 and groovy-eclipse-batch:2.4.15-01|
|Antlr3 issue with ambiguous method calls|Explicit cast to ValidationContext 
needed in TestHL7Query.java|
|jaxb2-maven-plugin not compatible with Java 9|Switched to maven-jaxb-plugin|
|hbase-client:1.1.2 depends on jdk.tools:jdk.tools:1.7|Excluded this dependency 
*(needs testing)*|
|nifi-enrich-processors uses package com.sun.jndi.dns, which does not exist| |

  was:
While retaining source/target compatibility of 1.8, NiFi needs to be buildable 
on Java 9.

The following issues have been encountered while attempting to run a Java 
1.8-built NiFi on Java 9:
||Issue||Solution||
|Groovy compiler not parsing Groovy code correctly on Java 9|Updated 
maven-compiler-plugin to 3.7.0, and included dependencies for 
groovy-eclipse-compiler:2.9.3-01 and groovy-eclipse-batch:2.4.15-01|
|Antlr3 issue with ambiguous method calls|Explicit cast to ValidationContext|
|jaxb2-maven-plugin not compatible with Java 9|Switched to maven-jaxb-plugin|
|hbase-client:1.1.2 depends on jdk.tools:jdk.tools:1.7|Excluded this dependency 
*(needs testing)*|
|nifi-enrich-processors uses package com.sun.jndi.dns, which does not exist| |

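As a generic illustration of the Antlr3 row above (the overloads here are 
hypothetical; the actual fix is in TestHL7Query.java), an ambiguous call 
resolved by an explicit cast looks like this:

    class AmbiguityDemo {
        interface ValidationContext { }

        static void evaluate(final ValidationContext context) { }
        static void evaluate(final String expression) { }

        public static void main(final String[] args) {
            // evaluate(null);                    // does not compile: ambiguous method call
            evaluate((ValidationContext) null);   // explicit cast selects one overload
        }
    }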

> NiFi needs to be buildable on Java 9
> 
>
> Key: NIFI-5176
> URL: https://issues.apache.org/jira/browse/NIFI-5176
> Project: Apache NiFi
>  Issue Type: Sub-task
>Reporter: Jeff Storck
>Assignee: Jeff Storck
>Priority: Major
>
> While retaining source/target compatibility of 1.8, NiFi needs to be 
> buildable on Java 9.
> The following issues have been encountered while attempting to run a Java 
> 1.8-built NiFi on Java 9:
> ||Issue||Solution||
> |Groovy compiler not parsing Groovy code correctly on Java 9|Updated 
> maven-compiler-plugin to 3.7.0, and included dependencies for 
> groovy-eclipse-compiler:2.9.3-01 and groovy-eclipse-batch:2.4.15-01|
> |Antlr3 issue with ambiguous method calls|Explicit cast to ValidationContext 
> needed in TestHL7Query.java|
> |jaxb2-maven-plugin not compatible with Java 9|Switched to maven-jaxb-plugin|
> |hbase-client:1.1.2 depends on jdk.tools:jdk.tools:1.7|Excluded this 
> dependency *(needs testing)*|
> |nifi-enrich-processors uses package com.sun.jndi.dns, which does not exist| |



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5176) NiFi needs to be buildable on Java 9

2018-06-11 Thread Jeff Storck (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Storck updated NIFI-5176:
--
Description: 
While retaining source/target compatibility of 1.8, NiFi needs to be buildable 
on Java 9.

The following issues have been encountered while attempting to run a Java 
1.8-built NiFi on Java 9:
||Issue||Solution||
|Groovy compiler not parsing Groovy code correctly on Java 9|Updated 
maven-compiler-plugin to 3.7.0, and included dependencies for 
groovy-eclipse-compiler:2.9.3-01 and groovy-eclipse-batch:2.4.15-01|
|Antlr3 issue with ambiguous method calls|Explicit cast to ValidationContext|
|jaxb2-maven-plugin not compatible with Java 9|Switched to maven-jaxb-plugin|
|hbase-client:1.1.2 depends on jdk.tools:jdk.tools:1.7|Excluded this dependency 
*(needs testing)*|
|nifi-enrich-processors uses package com.sun.jndi.dns, which does not exist| |

  was:
While retaining source/target compatibility of 1.8, NiFi needs to be buildable 
on Java 9.

The following issues have been encountered while attempting to run a Java 
1.8-built NiFi on Java 9:
||Issue||Solution||
|Groovy compiler not parsing Groovy code correctly on Java 9|Updated 
maven-compiler-plugin to 3.7.0, and included dependencies for 
groovy-eclipse-compiler:2.9.3-01 and groovy-eclipse-batch:2.4.15-01|
|Antlr isn't able to process grammars| |


> NiFi needs to be buildable on Java 9
> 
>
> Key: NIFI-5176
> URL: https://issues.apache.org/jira/browse/NIFI-5176
> Project: Apache NiFi
>  Issue Type: Sub-task
>Reporter: Jeff Storck
>Assignee: Jeff Storck
>Priority: Major
>
> While retaining source/target compatibility of 1.8, NiFi needs to be 
> buildable on Java 9.
> The following issues have been encountered while attempting to run a Java 
> 1.8-built NiFi on Java 9:
> ||Issue||Solution||
> |Groovy compiler not parsing Groovy code correctly on Java 9|Updated 
> maven-compiler-plugin to 3.7.0, and included dependencies for 
> groovy-eclipse-compiler:2.9.3-01 and groovy-eclipse-batch:2.4.15-01|
> |Antlr3 issue with ambiguous method calls|Explicit cast to ValidationContext|
> |jaxb2-maven-plugin not compatible with Java 9|Switched to maven-jaxb-plugin|
> |hbase-client:1.1.2 depends on jdk.tools:jdk.tools:1.7|Excluded this 
> dependency *(needs testing)*|
> |nifi-enrich-processors uses package com.sun.jndi.dns, which does not exist| |



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (NIFI-5296) Add EL Support with Variable Registry scope on SSL context service

2018-06-11 Thread Andy LoPresto (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508990#comment-16508990
 ] 

Andy LoPresto edited comment on NIFI-5296 at 6/12/18 12:23 AM:
---

[~pvillard] can you describe the use case where you feel VR EL support is 
necessary for these fields? I am guessing you want to be able to specify a 
keystore and truststore path in the VR and allow an SSLContextService to 
reference it. I have a couple of concerns with this approach:

# It allows any user with variable permissions to see the path to the keystore 
and truststore. In the current setup, the controller service can be deployed 
with literal paths by an administrator/trusted user, and the service can be 
referenced by dependent components without exposing the actual file system 
paths to a less-trusted user (given the proper absence of permissions on the 
controller service). Currently, a user who does not have view permission to the 
{{StandardSSLContextService}} cannot see the explicit path at all, even if they 
have view permission to the referencing {{InvokeHTTP}} processor (for example). 
In this new scenario, they would be able to see the explicit path in the 
Variables window, and see the UUID of the referencing SSLCS (see 
[https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#unauthorized-referencing-components])
 
# It allows the keystore and truststore path to change without visibility on 
the configuration dialog. Given EL evaluation, the processor will evaluate not 
only VR-specific variables, but also system properties and OS-level environment 
variables. This means that a malicious or incidental occurrence could change 
the path used to locate the keystore and truststore without the NiFi user being 
aware

The original work that introduced the unit tests you reference was done in 
[NIFI-4274|https://issues.apache.org/jira/browse/NIFI-4274] because of a [Stack 
Overflow question|https://stackoverflow.com/a/45575232/70465] where the 
documentation (which said EL was not supported) and the behavior (incorrectly 
evaluating EL) did not match. 

I appreciate the work you put into this PR. If you have a specific use case 
that really requires this behavior, let's discuss it. If this is just to bring 
these properties inline with the baseline EL evaluation, I would ask that we do 
not implement this at this time. 

There is some ongoing work/discussion about how to handle the sensitive 
properties in conjunction with the VR, Registry, flow versioning, setting from 
external sources, etc. and I believe it requires a cohesive approach with 
substantial threat modeling to avoid introducing serious issues into NiFi. We 
have traditionally held to a "secure but severe" policy where some features are 
absent because of the strict principle that "sensitive properties are always 
protected/blocked". It may be time to re-evaluate that, but introducing these 
changes piecemeal is dangerous in my opinion. 


was (Author: alopresto):
[~pvillard] can you describe the use case where you feel VR EL support is 
necessary for these fields? I am guessing you want to be able to specify a 
keystore and truststore path in the VR and allow an SSLContextService to 
reference it. I have a couple of concerns with this approach:

# It allows any user with variable permissions to see the path to the keystore 
and truststore. In the current setup, the controller service can be deployed 
with literal paths by an administrator/trusted user, and the service can be 
referenced by dependent components without exposing the actual file system 
paths to a less-trusted user (given the proper absence of permissions on the 
controller service). Currently, a user who does not have view permission to the 
{{StandardSSLContextService}} cannot see the explicit path at all, even if they 
have view permission to the referencing {{InvokeHTTP}} processor (for example). 
In this new scenario, they would be able to see the explicit path in the 
Variables window, and see the UUID of the referencing SSLCS (see 
[https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#unauthorized-referencing-components])
 
# It allows the keystore and truststore path to change without visibility on 
the configuration dialog. Given EL evaluation, the processor will evaluate not 
only VR-specific variables, but also system properties and OS-level environment 
variables. This means that a malicious or incidental occurrence could change 
the path used to locate the keystore and truststore without the NiFi user being 
aware
# The original work that introduced the unit tests you reference was done in 
[NIFI-4274|https://issues.apache.org/jira/browse/NIFI-4274] because of a [Stack 
Overflow question|https://stackoverflow.com/a/45575232/70465] where the 
documentation (which said EL was not supported) and the behavior (incorrectly 
evaluating EL) did not match. 

I 

[jira] [Commented] (NIFI-5296) Add EL Support with Variable Registry scope on SSL context service

2018-06-11 Thread Andy LoPresto (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508990#comment-16508990
 ] 

Andy LoPresto commented on NIFI-5296:
-

[~pvillard] can you describe the use case where you feel VR EL support is 
necessary for these fields? I am guessing you want to be able to specify a 
keystore and truststore path in the VR and allow an SSLContextService to 
reference it. I have a couple of concerns with this approach:

# It allows any user with variable permissions to see the path to the keystore 
and truststore. In the current setup, the controller service can be deployed 
with literal paths by an administrator/trusted user, and the service can be 
referenced by dependent components without exposing the actual file system 
paths to a less-trusted user (given the proper absence of permissions on the 
controller service). Currently, a user who does not have view permission to the 
{{StandardSSLContextService}} cannot see the explicit path at all, even if they 
have view permission to the referencing {{InvokeHTTP}} processor (for example). 
In this new scenario, they would be able to see the explicit path in the 
Variables window, and see the UUID of the referencing SSLCS (see 
[https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#unauthorized-referencing-components])
 
# It allows the keystore and truststore path to change without visibility on 
the configuration dialog. Given EL evaluation, the processor will evaluate not 
only VR-specific variables, but also system properties and OS-level environment 
variables. This means that a malicious or incidental occurrence could change 
the path used to locate the keystore and truststore without the NiFi user being 
aware
# The original work that introduced the unit tests you reference was done in 
[NIFI-4274|https://issues.apache.org/jira/browse/NIFI-4274] because of a [Stack 
Overflow question|https://stackoverflow.com/a/45575232/70465] where the 
documentation (which said EL was not supported) and the behavior (incorrectly 
evaluating EL) did not match. 

I appreciate the work you put into this PR. If you have a specific use case 
that really requires this behavior, let's discuss it. If this is just to bring 
these properties inline with the baseline EL evaluation, I would ask that we do 
not implement this at this time. 

There is some ongoing work/discussion about how to handle the sensitive 
properties in conjunction with the VR, Registry, flow versioning, setting from 
external sources, etc. and I believe it requires a cohesive approach with 
substantial threat modeling to avoid introducing serious issues into NiFi. We 
have traditionally held to a "secure but severe" policy where some features are 
absent because of the strict principle that "sensitive properties are always 
protected/blocked". It may be time to re-evaluate that, but introducing these 
changes piecemeal is dangerous in my opinion. 

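To make the second concern concrete, a sketch of the evaluation path under 
discussion (the property name and context handling are assumed for 
illustration, not the PR's actual code):

    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.controller.ConfigurationContext;
    import org.apache.nifi.expression.ExpressionLanguageScope;
    import org.apache.nifi.processor.util.StandardValidators;

    class KeystoreElSketch {
        static final PropertyDescriptor KEYSTORE = new PropertyDescriptor.Builder()
                .name("Keystore Filename")
                .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
                .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
                .build();

        static String resolve(final ConfigurationContext context) {
            // Evaluation consults the variable registry, JVM system properties, and OS
            // environment variables; the configuration dialog still shows only the
            // literal ${keystore.path} expression, so path changes are invisible there.
            return context.getProperty(KEYSTORE).evaluateAttributeExpressions().getValue();
        }
    }
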
> Add EL Support with Variable Registry scope on SSL context service
> --
>
> Key: NIFI-5296
> URL: https://issues.apache.org/jira/browse/NIFI-5296
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Major
>
> Add EL support on Truststore and Keystore filename properties with Variable 
> Registry scope.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5213) Allow AvroReader with explicit schema to read files with embedded schema

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508944#comment-16508944
 ] 

ASF GitHub Bot commented on NIFI-5213:
--

Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194575493
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+public class TestAvroReaderWithExplicitSchema {
+
+    @Test
+    public void testAvroExplicitReaderWithSchemalessFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_schemaless.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
--- End diff --

As-is, this wouldn't test anything that I can tell with that method because 
all of the errors are caught in that method. Should have some assertions around 
it IMO.
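
For illustration, a minimal assertion-based version of that call might look 
like this (field name and expected value are hypothetical, and static imports 
from org.junit.Assert are assumed):

```java
GenericRecord record = avroReader.nextAvroRecord();
assertNotNull("Expected at least one record", record);
// "someField" and its expected value are placeholders for real test data
assertEquals("expected-value", String.valueOf(record.get("someField")));
```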


> Allow AvroReader with explicit schema to read files with embedded schema
> 
>
> Key: NIFI-5213
> URL: https://issues.apache.org/jira/browse/NIFI-5213
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
>
> AvroReader allows the choice of schema access strategy from such options as 
> Use Embedded Schema, Use Schema Name, Use Schema Text, etc. If the incoming 
> Avro files will have embedded schemas, then Use Embedded Schema is best 
> practice for the Avro Reader. However it is not intuitive that if the same 
> schema that is embedded in the file is specified by name (using a schema 
> registry) or explicitly via Schema Text, that errors can occur. This has been 
> noticed in QueryRecord for example, and the error is also not intuitive or 
> descriptive (it is often an ArrayIndexOutOfBoundsException).
> To provide a better user experience, it would be an improvement for 
> AvroReader to be able to successfully process Avro files with embedded 
> schemas, even when the Schema Access Strategy is not "Use Embedded Schema". 
> Of course, the explicit schema would have to match the embedded schema, or an 
> error would be reported (and rightfully so).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5213) Allow AvroReader with explicit schema to read files with embedded schema

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508942#comment-16508942
 ] 

ASF GitHub Bot commented on NIFI-5213:
--

Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194575528
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+public class TestAvroReaderWithExplicitSchema {
+
+    @Test
+    public void testAvroExplicitReaderWithSchemalessFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_schemaless.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
+    }
+
+    @Test
+    public void testAvroExplicitReaderWithEmbeddedSchemaFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_embed_schema.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
--- End diff --

Same deal.


> Allow AvroReader with explicit schema to read files with embedded schema
> 
>
> Key: NIFI-5213
> URL: https://issues.apache.org/jira/browse/NIFI-5213
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
>
> AvroReader allows the choice of schema access strategy from such options as 
> Use Embedded Schema, Use Schema Name, Use Schema Text, etc. If the incoming 
> Avro files will have embedded schemas, then Use Embedded Schema is best 
> practice for the Avro Reader. However it is not intuitive that if the same 
> schema that is embedded in the file is specified by name (using a schema 
> registry) or explicitly via Schema Text, that errors can occur. This has been 
> noticed in QueryRecord for example, and the error is also not intuitive or 
> descriptive (it is often an ArrayIndexOutOfBoundsException).
> To provide a better user experience, it would be an improvement for 
> AvroReader to be able to successfully process Avro files with embedded 
> schemas, even when the Schema Access Strategy is not "Use Embedded Schema". 
> Of course, the explicit schema would have to match the embedded schema, or an 
> error would be reported (and rightfully so).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5213) Allow AvroReader with explicit schema to read files with embedded schema

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508943#comment-16508943
 ] 

ASF GitHub Bot commented on NIFI-5213:
--

Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194575701
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+public class TestAvroReaderWithExplicitSchema {
+
+    @Test
+    public void testAvroExplicitReaderWithSchemalessFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_schemaless.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
+    }
+
+    @Test
+    public void testAvroExplicitReaderWithEmbeddedSchemaFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_embed_schema.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
+    }
+
+@Test(expected = IOException.class)
--- End diff --

I assume that comes from this, right?

```
if (dataFileStream != null) {
    return dataFileStream.hasNext() ? dataFileStream.next() : null;
}
```


> Allow AvroReader with explicit schema to read files with embedded schema
> 
>
> Key: NIFI-5213
> URL: https://issues.apache.org/jira/browse/NIFI-5213
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
>
> AvroReader allows the choice of schema access strategy from such options as 
> Use Embedded Schema, Use Schema Name, Use Schema Text, etc. If the incoming 
> Avro files will have embedded schemas, then Use Embedded Schema is best 
> practice for the Avro Reader. However it is not intuitive that if the same 
> schema that is embedded in the file is specified by name (using a schema 
> registry) or explicitly via Schema Text, that errors can occur. This has been 
> noticed in QueryRecord for example, and the error is also not intuitive or 
> descriptive (it is often an ArrayIndexOutOfBoundsException).
> To provide a better user experience, it would be an improvement for 
> AvroReader to be able to successfully process Avro files with embedded 
> schemas, even when the Schema Access Strategy is not "Use Embedded Schema". 
> Of course, the explicit schema would have to match the embedded 

[jira] [Commented] (NIFI-5213) Allow AvroReader with explicit schema to read files with embedded schema

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508941#comment-16508941
 ] 

ASF GitHub Bot commented on NIFI-5213:
--

Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194576777
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReaderWithExplicitSchema.java
 ---
@@ -17,33 +17,61 @@
 
 package org.apache.nifi.avro;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.SequenceInputStream;
 
 import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileStream;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.BinaryDecoder;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DecoderFactory;
-import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.commons.io.input.TeeInputStream;
 import org.apache.nifi.serialization.MalformedRecordException;
 import org.apache.nifi.serialization.record.RecordSchema;
 
 public class AvroReaderWithExplicitSchema extends AvroRecordReader {
     private final InputStream in;
     private final RecordSchema recordSchema;
     private final DatumReader<GenericRecord> datumReader;
-    private final BinaryDecoder decoder;
+    private BinaryDecoder decoder;
     private GenericRecord genericRecord;
+    private DataFileStream<GenericRecord> dataFileStream;
 
-    public AvroReaderWithExplicitSchema(final InputStream in, final RecordSchema recordSchema, final Schema avroSchema) throws IOException, SchemaNotFoundException {
+    public AvroReaderWithExplicitSchema(final InputStream in, final RecordSchema recordSchema, final Schema avroSchema) throws IOException {
         this.in = in;
         this.recordSchema = recordSchema;
 
-        datumReader = new GenericDatumReader(avroSchema);
-        decoder = DecoderFactory.get().binaryDecoder(in, null);
+        datumReader = new GenericDatumReader<>(avroSchema);
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        TeeInputStream teeInputStream = new TeeInputStream(in, baos);
+        // Try to parse as a DataFileStream, if it works, glue the streams back together and delegate calls to the DataFileStream
+        try {
+            dataFileStream = new DataFileStream<>(teeInputStream, new GenericDatumReader<>());
--- End diff --

I don't see where `decoder` is initialized outside of the try block, but 
it's used in `nextAvroRecord`. Shouldn't you initialize it here so it's 
guaranteed to be properly initialized when `nextAvroRecord` is called or are 
you just relying on the first if statement in that method as a null check?
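
One way to make that initialization unconditional, sketched against the names 
in the diff above (illustrative only, not the PR's code):

```java
// Build the decoder over the re-glued stream whether or not the input parsed
// as a container file, so nextAvroRecord() never sees a null decoder.
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
SequenceInputStream sis = new SequenceInputStream(bais, in);
decoder = DecoderFactory.get().binaryDecoder(sis, null);
```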


> Allow AvroReader with explicit schema to read files with embedded schema
> 
>
> Key: NIFI-5213
> URL: https://issues.apache.org/jira/browse/NIFI-5213
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
>
> AvroReader allows the choice of schema access strategy from such options as 
> Use Embedded Schema, Use Schema Name, Use Schema Text, etc. If the incoming 
> Avro files will have embedded schemas, then Use Embedded Schema is best 
> practice for the Avro Reader. However it is not intuitive that if the same 
> schema that is embedded in the file is specified by name (using a schema 
> registry) or explicitly via Schema Text, that errors can occur. This has been 
> noticed in QueryRecord for example, and the error is also not intuitive or 
> descriptive (it is often an ArrayIndexOutOfBoundsException).
> To provide a better user experience, it would be an improvement for 
> AvroReader to be able to successfully process Avro files with embedded 
> schemas, even when the Schema Access Strategy is not "Use Embedded Schema". 
> Of course, the explicit schema would have to match the embedded schema, or an 
> error would be reported (and rightfully so).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi pull request #2718: NIFI-5213: Allow AvroReader to process files w embe...

2018-06-11 Thread MikeThomsen
Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194575701
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+public class TestAvroReaderWithExplicitSchema {
+
+    @Test
+    public void testAvroExplicitReaderWithSchemalessFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_schemaless.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
+    }
+
+    @Test
+    public void testAvroExplicitReaderWithEmbeddedSchemaFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_embed_schema.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
+    }
+
+@Test(expected = IOException.class)
--- End diff --

I assume that comes from this, right?

```
if (dataFileStream != null) {
    return dataFileStream.hasNext() ? dataFileStream.next() : null;
}
```


---


[GitHub] nifi pull request #2718: NIFI-5213: Allow AvroReader to process files w embe...

2018-06-11 Thread MikeThomsen
Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194575528
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+public class TestAvroReaderWithExplicitSchema {
+
+    @Test
+    public void testAvroExplicitReaderWithSchemalessFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_schemaless.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
+    }
+
+    @Test
+    public void testAvroExplicitReaderWithEmbeddedSchemaFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_embed_schema.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
--- End diff --

Same deal.


---


[GitHub] nifi pull request #2718: NIFI-5213: Allow AvroReader to process files w embe...

2018-06-11 Thread MikeThomsen
Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194575493
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+public class TestAvroReaderWithExplicitSchema {
+
+    @Test
+    public void testAvroExplicitReaderWithSchemalessFile() throws Exception {
+        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_schemaless.avro");
+        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
+        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
+        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
+
+        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
+        avroReader.nextAvroRecord();
--- End diff --

As-is, this wouldn't test anything that I can tell with that method because 
all of the errors are caught in that method. Should have some assertions around 
it IMO.


---


[GitHub] nifi pull request #2718: NIFI-5213: Allow AvroReader to process files w embe...

2018-06-11 Thread MikeThomsen
Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194576777
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReaderWithExplicitSchema.java
 ---
@@ -17,33 +17,61 @@
 
 package org.apache.nifi.avro;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.SequenceInputStream;
 
 import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileStream;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.BinaryDecoder;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DecoderFactory;
-import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.commons.io.input.TeeInputStream;
 import org.apache.nifi.serialization.MalformedRecordException;
 import org.apache.nifi.serialization.record.RecordSchema;
 
 public class AvroReaderWithExplicitSchema extends AvroRecordReader {
     private final InputStream in;
     private final RecordSchema recordSchema;
     private final DatumReader<GenericRecord> datumReader;
-    private final BinaryDecoder decoder;
+    private BinaryDecoder decoder;
     private GenericRecord genericRecord;
+    private DataFileStream<GenericRecord> dataFileStream;
 
-    public AvroReaderWithExplicitSchema(final InputStream in, final RecordSchema recordSchema, final Schema avroSchema) throws IOException, SchemaNotFoundException {
+    public AvroReaderWithExplicitSchema(final InputStream in, final RecordSchema recordSchema, final Schema avroSchema) throws IOException {
         this.in = in;
         this.recordSchema = recordSchema;
 
-        datumReader = new GenericDatumReader(avroSchema);
-        decoder = DecoderFactory.get().binaryDecoder(in, null);
+        datumReader = new GenericDatumReader<>(avroSchema);
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        TeeInputStream teeInputStream = new TeeInputStream(in, baos);
+        // Try to parse as a DataFileStream, if it works, glue the streams back together and delegate calls to the DataFileStream
+        try {
+            dataFileStream = new DataFileStream<>(teeInputStream, new GenericDatumReader<>());
--- End diff --

I don't see where `decoder` is initialized outside of the try block, but 
it's used in `nextAvroRecord`. Shouldn't you initialize it here so it's 
guaranteed to be properly initialized when `nextAvroRecord` is called or are 
you just relying on the first if statement in that method as a null check?


---


[GitHub] nifi issue #2785: NIFI-5296 - Add EL Support with Variable Registry scope on...

2018-06-11 Thread alopresto
Github user alopresto commented on the issue:

https://github.com/apache/nifi/pull/2785
  
Reviewing...


---


[jira] [Commented] (NIFI-5296) Add EL Support with Variable Registry scope on SSL context service

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508931#comment-16508931
 ] 

ASF GitHub Bot commented on NIFI-5296:
--

Github user alopresto commented on the issue:

https://github.com/apache/nifi/pull/2785
  
Reviewing...


> Add EL Support with Variable Registry scope on SSL context service
> --
>
> Key: NIFI-5296
> URL: https://issues.apache.org/jira/browse/NIFI-5296
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Major
>
> Add EL support on Truststore and Keystore filename properties with Variable 
> Registry scope.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5298) Fix typo in FDS README.md

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508928#comment-16508928
 ] 

ASF GitHub Bot commented on NIFI-5298:
--

GitHub user alopresto opened a pull request:

https://github.com/apache/nifi-fds/pull/6

NIFI-5298 Fixed typo in README.md.

Thank you for submitting a contribution to Apache NiFi Flow Design System.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced
 in the commit message?

- [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [x] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] Have you ensured that a full build completes and that the full suite of 
unit tests is executed via npm run clean:install at the root nifi-fds folder?
- [ ] Have you written or updated the Apache NiFi Flow Design System demo 
application to demonstrate any new functionality, provide examples of usage, 
and to verify your changes via npm start at the nifi-fds/target folder?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-fds?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-fds?

### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alopresto/nifi-fds NIFI-5298

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi-fds/pull/6.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6


commit a8f734547fad2fa73015a1d5d1bdd44a5181df68
Author: Andy LoPresto 
Date:   2018-06-11T23:07:43Z

NIFI-5298 Fixed typo in README.md.




> Fix typo in FDS README.md
> -
>
> Key: NIFI-5298
> URL: https://issues.apache.org/jira/browse/NIFI-5298
> Project: Apache NiFi
>  Issue Type: Task
>  Components: Documentation & Website, FDS
>Affects Versions: 0.1.0
>Reporter: Andy LoPresto
>Assignee: Andy LoPresto
>Priority: Trivial
>  Labels: documentation, typo
>
> Fix a typo in the NiFi FDS README.md. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi-fds pull request #6: NIFI-5298 Fixed typo in README.md.

2018-06-11 Thread alopresto
GitHub user alopresto opened a pull request:

https://github.com/apache/nifi-fds/pull/6

NIFI-5298 Fixed typo in README.md.

Thank you for submitting a contribution to Apache NiFi Flow Design System.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced
 in the commit message?

- [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [x] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] Have you ensured that a full build completes and that the full suite of 
unit tests is executed via npm run clean:install at the root nifi-fds folder?
- [ ] Have you written or updated the Apache NiFi Flow Design System demo 
application to demonstrate any new functionality, provide examples of usage, 
and to verify your changes via npm start at the nifi-fds/target folder?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-fds?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-fds?

### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alopresto/nifi-fds NIFI-5298

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi-fds/pull/6.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6


commit a8f734547fad2fa73015a1d5d1bdd44a5181df68
Author: Andy LoPresto 
Date:   2018-06-11T23:07:43Z

NIFI-5298 Fixed typo in README.md.




---


[jira] [Created] (NIFI-5298) Fix typo in FDS README.md

2018-06-11 Thread Andy LoPresto (JIRA)
Andy LoPresto created NIFI-5298:
---

 Summary: Fix typo in FDS README.md
 Key: NIFI-5298
 URL: https://issues.apache.org/jira/browse/NIFI-5298
 Project: Apache NiFi
  Issue Type: Task
  Components: Documentation & Website, FDS
Affects Versions: 0.1.0
Reporter: Andy LoPresto
Assignee: Andy LoPresto


Fix a typo in the NiFi FDS README.md. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5213) Allow AvroReader with explicit schema to read files with embedded schema

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508920#comment-16508920
 ] 

ASF GitHub Bot commented on NIFI-5213:
--

Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194572498
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReaderWithExplicitSchema.java
 ---
@@ -17,33 +17,61 @@
 
 package org.apache.nifi.avro;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.SequenceInputStream;
 
 import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileStream;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.BinaryDecoder;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DecoderFactory;
-import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.commons.io.input.TeeInputStream;
 import org.apache.nifi.serialization.MalformedRecordException;
 import org.apache.nifi.serialization.record.RecordSchema;
 
 public class AvroReaderWithExplicitSchema extends AvroRecordReader {
     private final InputStream in;
     private final RecordSchema recordSchema;
     private final DatumReader<GenericRecord> datumReader;
-    private final BinaryDecoder decoder;
+    private BinaryDecoder decoder;
     private GenericRecord genericRecord;
+    private DataFileStream<GenericRecord> dataFileStream;
 
-    public AvroReaderWithExplicitSchema(final InputStream in, final RecordSchema recordSchema, final Schema avroSchema) throws IOException, SchemaNotFoundException {
+    public AvroReaderWithExplicitSchema(final InputStream in, final RecordSchema recordSchema, final Schema avroSchema) throws IOException {
         this.in = in;
         this.recordSchema = recordSchema;
 
-        datumReader = new GenericDatumReader(avroSchema);
-        decoder = DecoderFactory.get().binaryDecoder(in, null);
+        datumReader = new GenericDatumReader<>(avroSchema);
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        TeeInputStream teeInputStream = new TeeInputStream(in, baos);
+        // Try to parse as a DataFileStream, if it works, glue the streams back together and delegate calls to the DataFileStream
+        try {
+            dataFileStream = new DataFileStream<>(teeInputStream, new GenericDatumReader<>());
+        } catch (IOException ioe) {
+            // Carry on, hopefully a raw Avro file
+            // Need to be able to re-read the bytes read so far, and the InputStream passed in doesn't support reset. Use the TeeInputStream in
+            // conjunction with SequenceInputStream to glue the two streams back together for future reading
+            ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
+            SequenceInputStream sis = new SequenceInputStream(bais, in);
+            decoder = DecoderFactory.get().binaryDecoder(sis, null);
+        }
+        if (dataFileStream != null) {
+            // Verify the schemas are the same
+            Schema embeddedSchema = dataFileStream.getSchema();
+            if (!embeddedSchema.equals(avroSchema)) {
+                throw new IOException("Explicit schema does not match embedded schema");
--- End diff --

@mattyb149 How does it handle schema evolution in this case? It's possible 
that the Kafka producer has `Corporate Schema v1` and NiFi is configured with 
`Corporate Schema v2` and v2 gracefully allows an upgrade from v1 via Avro 
schema evolution rules. Or am I missing something about that being not really a 
thing WRT the Record API?
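
For reference, a schema-evolution-aware alternative to the strict equals() 
comparison could use Avro's SchemaCompatibility API (a sketch only, treating 
avroSchema as the reader schema and embeddedSchema as the writer schema):

```java
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

// COMPATIBLE means records written with embeddedSchema can be read with
// avroSchema, which permits legal evolution (e.g. added fields with defaults)
// instead of requiring byte-for-byte identical schemas.
SchemaPairCompatibility compat =
        SchemaCompatibility.checkReaderWriterCompatibility(avroSchema, embeddedSchema);
if (compat.getType() != SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE) {
    throw new IOException("Explicit schema cannot read records written with the embedded schema");
}
```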


> Allow AvroReader with explicit schema to read files with embedded schema
> 
>
> Key: NIFI-5213
> URL: https://issues.apache.org/jira/browse/NIFI-5213
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
>
> AvroReader allows the choice of schema access strategy from such options as 
> Use Embedded Schema, Use Schema Name, Use Schema Text, etc. If the incoming 
> Avro files will have embedded schemas, then Use Embedded Schema is best 
> practice for the Avro Reader. However it is not intuitive that if the same 
> schema that is embedded in the file is 

[GitHub] nifi pull request #2718: NIFI-5213: Allow AvroReader to process files w embe...

2018-06-11 Thread MikeThomsen
Github user MikeThomsen commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2718#discussion_r194572498
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReaderWithExplicitSchema.java
 ---
@@ -17,33 +17,61 @@
 
 package org.apache.nifi.avro;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.SequenceInputStream;
 
 import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileStream;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.BinaryDecoder;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DecoderFactory;
-import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.commons.io.input.TeeInputStream;
 import org.apache.nifi.serialization.MalformedRecordException;
 import org.apache.nifi.serialization.record.RecordSchema;
 
 public class AvroReaderWithExplicitSchema extends AvroRecordReader {
     private final InputStream in;
     private final RecordSchema recordSchema;
     private final DatumReader<GenericRecord> datumReader;
-    private final BinaryDecoder decoder;
+    private BinaryDecoder decoder;
     private GenericRecord genericRecord;
+    private DataFileStream<GenericRecord> dataFileStream;
 
-    public AvroReaderWithExplicitSchema(final InputStream in, final RecordSchema recordSchema, final Schema avroSchema) throws IOException, SchemaNotFoundException {
+    public AvroReaderWithExplicitSchema(final InputStream in, final RecordSchema recordSchema, final Schema avroSchema) throws IOException {
         this.in = in;
         this.recordSchema = recordSchema;
 
-        datumReader = new GenericDatumReader(avroSchema);
-        decoder = DecoderFactory.get().binaryDecoder(in, null);
+        datumReader = new GenericDatumReader<>(avroSchema);
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        TeeInputStream teeInputStream = new TeeInputStream(in, baos);
+        // Try to parse as a DataFileStream, if it works, glue the streams back together and delegate calls to the DataFileStream
+        try {
+            dataFileStream = new DataFileStream<>(teeInputStream, new GenericDatumReader<>());
+        } catch (IOException ioe) {
+            // Carry on, hopefully a raw Avro file
+            // Need to be able to re-read the bytes read so far, and the InputStream passed in doesn't support reset. Use the TeeInputStream in
+            // conjunction with SequenceInputStream to glue the two streams back together for future reading
+            ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
+            SequenceInputStream sis = new SequenceInputStream(bais, in);
+            decoder = DecoderFactory.get().binaryDecoder(sis, null);
+        }
+        if (dataFileStream != null) {
+            // Verify the schemas are the same
+            Schema embeddedSchema = dataFileStream.getSchema();
+            if (!embeddedSchema.equals(avroSchema)) {
+                throw new IOException("Explicit schema does not match embedded schema");
--- End diff --

@mattyb149 How does it handle schema evolution in this case? It's possible 
that the Kafka producer has `Corporate Schema v1` and NiFi is configured with 
`Corporate Schema v2` and v2 gracefully allows an upgrade from v1 via Avro 
schema evolution rules. Or am I missing something about that being not really a 
thing WRT the Record API?
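
As a stand-alone illustration of the tee-and-glue pattern used in the 
constructor above (all names are local to this sketch):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

import org.apache.commons.io.input.TeeInputStream;

class TeeAndGlueSketch {
    // Probe the head of a non-resettable stream, then return a stream that
    // replays the consumed bytes ahead of the untouched remainder.
    static InputStream probeThenRestore(InputStream in) throws IOException {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        TeeInputStream tee = new TeeInputStream(in, copy);
        tee.read(new byte[4]); // e.g. peek at Avro's "Obj" magic bytes
        return new SequenceInputStream(new ByteArrayInputStream(copy.toByteArray()), in);
    }
}
```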


---


[jira] [Updated] (NIFI-5297) Add EL Support with Variable Registry scope in ScanAttribute

2018-06-11 Thread Mike Thomsen (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Thomsen updated NIFI-5297:
---
   Resolution: Fixed
Fix Version/s: 1.7.0
   Status: Resolved  (was: Patch Available)

> Add EL Support with Variable Registry scope in ScanAttribute
> 
>
> Key: NIFI-5297
> URL: https://issues.apache.org/jira/browse/NIFI-5297
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Trivial
> Fix For: 1.7.0
>
>
> Add EL support with Variable Registry scope for the Dictionary File property 
> in the ScanAttribute processor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5297) Add EL Support with Variable Registry scope in ScanAttribute

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508910#comment-16508910
 ] 

ASF GitHub Bot commented on NIFI-5297:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2786


> Add EL Support with Variable Registry scope in ScanAttribute
> 
>
> Key: NIFI-5297
> URL: https://issues.apache.org/jira/browse/NIFI-5297
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Trivial
>
> Add EL support with Variable Registry scope for the Dictionary File property 
> in the ScanAttribute processor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5297) Add EL Support with Variable Registry scope in ScanAttribute

2018-06-11 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508909#comment-16508909
 ] 

ASF subversion and git services commented on NIFI-5297:
---

Commit ee18ead16c7b0697560e27c63cf1dd06b1c38c4f in nifi's branch 
refs/heads/master from [~pvillard]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ee18ead ]

NIFI-5297 - EL support in ScanAttribute

This closes #2786

Signed-off-by: Mike Thomsen 


> Add EL Support with Variable Registry scope in ScanAttribute
> 
>
> Key: NIFI-5297
> URL: https://issues.apache.org/jira/browse/NIFI-5297
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Trivial
>
> Add EL support with Variable Registry scope for the Dictionary File property 
> in the ScanAttribute processor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi pull request #2786: NIFI-5297 - EL support in ScanAttribute

2018-06-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2786


---


[jira] [Commented] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508807#comment-16508807
 ] 

ASF GitHub Bot commented on NIFI-5292:
--

Github user MikeThomsen commented on the issue:

https://github.com/apache/nifi/pull/2782
  
@markap14 can you review?

@joewitt can you review the L&N? I took a stab at updating the v6 client's 
NOTICE, but am not sure if it's right.


> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.7.0
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>  Labels: Migration
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.
>  
> Migration note: Anyone using the existing client service component will have 
> to create a new one that corresponds to the version of ElasticSearch they are 
> using.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi issue #2782: NIFI-5292 Renamed ElasticSearch client service impl to sho...

2018-06-11 Thread MikeThomsen
Github user MikeThomsen commented on the issue:

https://github.com/apache/nifi/pull/2782
  
@markap14 can you review?

@joewitt can you review the L&N? I took a stab at updating the v6 client's 
NOTICE, but am not sure if it's right.


---


[jira] [Commented] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508802#comment-16508802
 ] 

ASF GitHub Bot commented on NIFI-5292:
--

Github user MikeThomsen commented on the issue:

https://github.com/apache/nifi/pull/2782
  
Looked over the transitive dependencies: HdrHistogram ([license info; 
appears public 
domain](https://github.com/HdrHistogram/HdrHistogram/blob/master/LICENSE.txt)) 
and SnakeYaml (ASL, unknown copyright) appear to be the only two that need to 
be added to the NOTICE for the v6 version.


> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.7.0
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>  Labels: Migration
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.
>  
> Migration note: Anyone using the existing client service component will have 
> to create a new one that corresponds to the version of ElasticSearch they are 
> using.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi issue #2782: NIFI-5292 Renamed ElasticSearch client service impl to sho...

2018-06-11 Thread MikeThomsen
Github user MikeThomsen commented on the issue:

https://github.com/apache/nifi/pull/2782
  
Looked over the transitive dependencies: HdrHistogram ([license info; 
appears public 
domain](https://github.com/HdrHistogram/HdrHistogram/blob/master/LICENSE.txt)) 
and SnakeYaml (ASL, unknown copyright) appear to be the only two that need to 
be added to the NOTICE for the v6 version.


---


[jira] [Commented] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread Mike Thomsen (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508716#comment-16508716
 ] 

Mike Thomsen commented on NIFI-5292:


[~pvillard] Ok. I added a label and put a description here. Hopefully that'll 
come up at release time.

> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.7.0
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>  Labels: Migration
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.
>  
> Migration note: Anyone using the existing client service component will have 
> to create a new one that corresponds to the version of ElasticSearch they are 
> using.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread Mike Thomsen (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Thomsen updated NIFI-5292:
---
Description: 
The current version of the impl is 5.X, but has a generic name that will be 
confusing down the road.

Add an ES 6.X client service as well.

 

Migration note: Anyone using the existing client service component will have to 
create a new one that corresponds to the version of ElasticSearch they are 
using.

  was:
The current version of the impl is 5.X, but has a generic name that will be 
confusing down the road.

Add an ES 6.X client service as well.


> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.7.0
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>  Labels: Migration
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.
>  
> Migration note: Anyone using the existing client service component will have 
> to create a new one that corresponds to the version of ElasticSearch they are 
> using.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread Mike Thomsen (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Thomsen updated NIFI-5292:
---
Affects Version/s: 1.7.0

> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.7.0
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>  Labels: Migration
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread Mike Thomsen (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Thomsen updated NIFI-5292:
---
Labels: Migration  (was: )

> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.7.0
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>  Labels: Migration
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5287) LookupRecord should supply flowfile attributes to the lookup service

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508672#comment-16508672
 ] 

ASF GitHub Bot commented on NIFI-5287:
--

Github user MikeThomsen commented on the issue:

https://github.com/apache/nifi/pull/2777
  
@markap14 @ijokarumawak updated based on the last comment.


> LookupRecord should supply flowfile attributes to the lookup service
> 
>
> Key: NIFI-5287
> URL: https://issues.apache.org/jira/browse/NIFI-5287
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>
> -LookupRecord should supply the flowfile attributes to the lookup service. It 
> should be done as follows:-
>  # -Provide a regular expression to choose which attributes are used.-
>  # -The chosen attributes should be foundation of the coordinates map used 
> for the lookup.-
>  # -If a configured key collides with a flowfile attribute, it should 
> override the flowfile attribute in the coordinate map.-
> Mark had the right idea:
>  
> I would propose an alternative approach, which would be to add a new method 
> to the interface that has a default implementation:
> {{default Optional<T> lookup(Map<String, Object> coordinates, Map<String, 
> String> context) throws LookupFailureException \{ return lookup(coordinates); 
> } }}
> Where {{context}} is used for the FlowFile attributes (I'm referring to it as 
> {{context}} instead of {{attributes}} because there may well be a case where 
> we want to provide some other value that is not specifically a FlowFile 
> attribute). Here is why I am suggesting this:
>  * It provides a clean interface that properly separates the data's 
> coordinates from FlowFile attributes.
>  * It prevents any collisions between FlowFile attribute names and 
> coordinates.
>  * It maintains backward compatibility, and we know that it won't change the 
> behavior of existing services or processors/components using those services - 
> even those that may have been implemented by others outside of the Apache 
> realm.
>  * If attributes are passed in by a Processor, those attributes will be 
> ignored anyway unless the Controller Service is specifically updated to make 
> use of those attributes, such as via Expression Language. In such a case, the 
> Controller Service can simply be updated at that time to make use of the new 
> method instead of the existing method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] nifi issue #2777: NIFI-5287 Made LookupRecord able to take in flowfile attri...

2018-06-11 Thread MikeThomsen
Github user MikeThomsen commented on the issue:

https://github.com/apache/nifi/pull/2777
  
@markap14 @ijokarumawak updated based on the last comment.


---


[jira] [Updated] (NIFI-5287) LookupRecord should supply flowfile attributes to the lookup service

2018-06-11 Thread Mike Thomsen (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Thomsen updated NIFI-5287:
---
Description: 
-LookupRecord should supply the flowfile attributes to the lookup service. It 
should be done as follows:-
 # -Provide a regular expression to choose which attributes are used.-
 # -The chosen attributes should be the foundation of the coordinates map used 
for the lookup.-
 # -If a configured key collides with a flowfile attribute, it should override 
the flowfile attribute in the coordinate map.-

Mark had the right idea:

 

I would propose an alternative approach, which would be to add a new method to 
the interface that has a default implementation:

{{default Optional<T> lookup(Map<String, Object> coordinates, Map<String, 
String> context) throws LookupFailureException \{ return lookup(coordinates); } }}

Where {{context}} is used for the FlowFile attributes (I'm referring to it as 
{{context}} instead of {{attributes}} because there may well be a case where we 
want to provide some other value that is not specifically a FlowFile 
attribute). Here is why I am suggesting this:
 * It provides a clean interface that properly separates the data's coordinates 
from FlowFile attributes.
 * It prevents any collisions between FlowFile attribute names and coordinates.
 * It maintains backward compatibility, and we know that it won't change the 
behavior of existing services or processors/components using those services - 
even those that may have been implemented by others outside of the Apache realm.
 * If attributes are passed in by a Processor, those attributes will be ignored 
anyway unless the Controller Service is specifically updated to make use of 
those attributes, such as via Expression Language. In such a case, the 
Controller Service can simply be updated at that time to make use of the new 
method instead of the existing method.
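
A minimal sketch of what that default method could look like, assuming a 
simplified stand-in for the LookupService interface (the real interface 
extends ControllerService and declares additional methods):

```
import java.util.Map;
import java.util.Optional;

// Simplified stand-in for the exception type; not the committed NiFi class.
class LookupFailureException extends Exception {
}

interface LookupService<T> {

    // Existing contract: lookup keyed solely by coordinates.
    Optional<T> lookup(Map<String, Object> coordinates) throws LookupFailureException;

    // Proposed overload: context carries FlowFile attributes (or other values).
    // The default implementation ignores it, so existing services keep their
    // current behavior unless they explicitly override this method.
    default Optional<T> lookup(Map<String, Object> coordinates, Map<String, String> context)
            throws LookupFailureException {
        return lookup(coordinates);
    }
}
```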

  was:
LookupRecord should supply the flowfile attributes to the lookup service. It 
should be done as follows:
 # Provide a regular expression to choose which attributes are used.
 # The chosen attributes should be the foundation of the coordinates map used 
for the lookup.
 # If a configured key collides with a flowfile attribute, it should override 
the flowfile attribute in the coordinate map.


> LookupRecord should supply flowfile attributes to the lookup service
> 
>
> Key: NIFI-5287
> URL: https://issues.apache.org/jira/browse/NIFI-5287
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>
> -LookupRecord should supply the flowfile attributes to the lookup service. It 
> should be done as follows:-
>  # -Provide a regular expression to choose which attributes are used.-
>  # -The chosen attributes should be the foundation of the coordinates map 
> used for the lookup.-
>  # -If a configured key collides with a flowfile attribute, it should 
> override the flowfile attribute in the coordinate map.-
> Mark had the right idea:
>  
> I would propose an alternative approach, which would be to add a new method 
> to the interface that has a default implementation:
> {{default Optional<T> lookup(Map<String, Object> coordinates, Map<String, 
> String> context) throws LookupFailureException \{ return lookup(coordinates); 
> } }}
> Where {{context}} is used for the FlowFile attributes (I'm referring to it as 
> {{context}} instead of {{attributes}} because there may well be a case where 
> we want to provide some other value that is not specifically a FlowFile 
> attribute). Here is why I am suggesting this:
>  * It provides a clean interface that properly separates the data's 
> coordinates from FlowFile attributes.
>  * It prevents any collisions between FlowFile attribute names and 
> coordinates.
>  * It maintains backward compatibility, and we know that it won't change the 
> behavior of existing services or processors/components using those services - 
> even those that may have been implemented by others outside of the Apache 
> realm.
>  * If attributes are passed in by a Processor, those attributes will be 
> ignored anyway unless the Controller Service is specifically updated to make 
> use of those attributes, such as via Expression Language. In such a case, the 
> Controller Service can simply be updated at that time to make use of the new 
> method instead of the existing method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread Pierre Villard (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508641#comment-16508641
 ] 

Pierre Villard commented on NIFI-5292:
--

[~mike.thomsen] - I believe the 'migration' label has already been used in the 
past to tag JIRAs that require specific actions during the RM process. However, 
I don't think this has ever been formalized. It'd be nice to add a paragraph 
about it to the release guide ([https://nifi.apache.org/release-guide.html]).

> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5297) Add EL Support with Variable Registry scope in ScanAttribute

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508622#comment-16508622
 ] 

ASF GitHub Bot commented on NIFI-5297:
--

GitHub user pvillard31 opened a pull request:

https://github.com/apache/nifi/pull/2786

NIFI-5297 - EL support in ScanAttribute

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pvillard31/nifi NIFI-5297

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2786


commit 589e631ec4f038b0cd4919305671a25b0ba436c0
Author: Pierre Villard 
Date:   2018-06-11T19:39:19Z

NIFI-5297 - EL support in ScanAttribute




> Add EL Support with Variable Registry scope in ScanAttribute
> 
>
> Key: NIFI-5297
> URL: https://issues.apache.org/jira/browse/NIFI-5297
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Trivial
>
> Add EL support with Variable Registry scope for the Dictionary File property 
> in the ScanAttribute processor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5297) Add EL Support with Variable Registry scope in ScanAttribute

2018-06-11 Thread Pierre Villard (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-5297:
-
Status: Patch Available  (was: Open)

> Add EL Support with Variable Registry scope in ScanAttribute
> 
>
> Key: NIFI-5297
> URL: https://issues.apache.org/jira/browse/NIFI-5297
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Trivial
>
> Add EL support with Variable Registry scope for the Dictionary File property 
> in the ScanAttribute processor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread Mike Thomsen (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508615#comment-16508615
 ] 

Mike Thomsen commented on NIFI-5292:


[~pvillard] is there an established way of doing that which will be easy to 
pick up in the release notes and migration guide?

> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (NIFI-5297) Add EL Support with Variable Registry scope in ScanAttribute

2018-06-11 Thread Pierre Villard (JIRA)
Pierre Villard created NIFI-5297:


 Summary: Add EL Support with Variable Registry scope in 
ScanAttribute
 Key: NIFI-5297
 URL: https://issues.apache.org/jira/browse/NIFI-5297
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Reporter: Pierre Villard
Assignee: Pierre Villard


Add EL support with Variable Registry scope for the Dictionary File property in 
the ScanAttribute processor.
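
As a rough sketch of the change (illustrative only, not the committed 
ScanAttribute source; the property name mirrors the existing Dictionary File 
property):

```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.expression.ExpressionLanguageScope;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.util.StandardValidators;

public class ScanAttributeElSketch {

    // VARIABLE_REGISTRY scope: the expression may reference registry variables
    // and environment/system properties, but not per-FlowFile attributes.
    static final PropertyDescriptor DICTIONARY_FILE = new PropertyDescriptor.Builder()
            .name("Dictionary File")
            .description("Path to the dictionary of terms, e.g. ${dictionary.dir}/terms.txt")
            .required(true)
            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
            .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
            .build();

    // With this scope the value is resolved without a FlowFile in hand.
    static String resolveDictionaryPath(final ProcessContext context) {
        return context.getProperty(DICTIONARY_FILE)
                .evaluateAttributeExpressions()
                .getValue();
    }
}
```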



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5296) Add EL Support with Variable Registry scope on SSL context service

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508594#comment-16508594
 ] 

ASF GitHub Bot commented on NIFI-5296:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2785
  
    @alopresto - could you have a look? I want to be sure I'm not doing 
something that is disallowed for some reason (I removed the unit tests that 
specifically checked that EL was not allowed, and I hope I'm not missing 
something here).
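
For context, the kind of assertion involved would flip roughly as follows (a 
sketch; 'runner' is a nifi-mock TestRunner and 'service' the SSL context 
service under test, both assumed to be initialized elsewhere):

```
// Previously a test might assert that supplying an expression made the
// service invalid; with VARIABLE_REGISTRY scope enabled, the same setup
// should now pass validation.
runner.setProperty(service, StandardSSLContextService.KEYSTORE, "${keystore.path}");
runner.assertValid(service);
```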


> Add EL Support with Variable Registry scope on SSL context service
> --
>
> Key: NIFI-5296
> URL: https://issues.apache.org/jira/browse/NIFI-5296
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Major
>
> Add EL support on Truststore and Keystore filename properties with Variable 
> Registry scope.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5296) Add EL Support with Variable Registry scope on SSL context service

2018-06-11 Thread Pierre Villard (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-5296:
-
Status: Patch Available  (was: Open)

> Add EL Support with Variable Registry scope on SSL context service
> --
>
> Key: NIFI-5296
> URL: https://issues.apache.org/jira/browse/NIFI-5296
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Major
>
> Add EL support on Truststore and Keystore filename properties with Variable 
> Registry scope.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5296) Add EL Support with Variable Registry scope on SSL context service

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508591#comment-16508591
 ] 

ASF GitHub Bot commented on NIFI-5296:
--

GitHub user pvillard31 opened a pull request:

https://github.com/apache/nifi/pull/2785

NIFI-5296 - Add EL Support with Variable Registry scope on SSL contex…

…t service

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pvillard31/nifi NIFI-5296

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2785


commit c38e53f295d04055351dcb76f539e1338b229c30
Author: Pierre Villard 
Date:   2018-06-11T19:18:40Z

NIFI-5296 - Add EL Support with Variable Registry scope on SSL context 
service




> Add EL Support with Variable Registry scope on SSL context service
> --
>
> Key: NIFI-5296
> URL: https://issues.apache.org/jira/browse/NIFI-5296
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Major
>
> Add EL support on Truststore and Keystore filename properties with Variable 
> Registry scope.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508555#comment-16508555
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user markobean commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194513579
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/FlowController.java
 ---
@@ -4919,6 +4925,22 @@ private void updateRemoteProcessGroups() {
 return new 
ArrayList<>(provenanceRepository.getEvents(firstEventId, maxRecords));
 }
 
+public AuthorizationResult checkConnectableAuthorization(final String 
componentId) {
--- End diff --

Correct. This was moved to ControllerFacade.java. I will remove it from 
FlowController.java.


> Provenance authorization refactoring
> 
>
> Key: NIFI-4907
> URL: https://issues.apache.org/jira/browse/NIFI-4907
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.5.0
>Reporter: Mark Bean
>Assignee: Mark Bean
>Priority: Major
>
> Currently, the 'view the data' component policy is too tightly coupled with 
> Provenance queries. The 'query provenance' policy should be the only policy 
> required for viewing Provenance query results. Both 'view the component' and 
> 'view the data' policies should be used to refine the appropriate visibility 
> of event details - but not the event itself.
> 1) Component Visibility
> The authorization of Provenance events is inconsistent with the behavior of 
> the graph. For example, if a user does not have 'view the component' policy, 
> the graph shows this component as a "black box" (no details such as name, 
> UUID, etc.) However, when querying Provenance, this component will show up 
> including the Component Type and the Component Name. This is in effect a 
> violation of the policy. These component details should be obscured in the 
> Provenance event displayed if user does not have the appropriate 'view the 
> component' policy.
> 2) Data Visibility
> For a Provenance query, all events should be visible as long as the user 
> performing the query belongs to the 'query provenance' global policy. As 
> mentioned above, some information about the component may be obscured 
> depending on 'view the component' policy, but the event itself should be 
> visible. Additionally, details of the event (clicking the View Details "i" 
> icon) should only be accessible if the user belongs to the 'view the data' 
> policy for the affected component. If the user is not in the appropriate 
> 'view the data' policy, a popup warning should be displayed indicating the 
> reason details are not visible with more specific detail than the current 
> "Contact the system administrator".
> 3) Lineage Graphs
> As with the Provenance table view recommendation above, the lineage graph 
> should display all events. Currently, if the lineage graph includes an event 
> belonging to a component which the user does not have 'view the data', it is 
> shown on the graph as "UNKNOWN". As with Data Visibility mentioned above, the 
> graph should indicate the event type as long as the user is in the 'view the 
> component'. Subsequent "View Details" on the event should only be visible if 
> the user is in the 'view the data' policy.
> In summary, for Provenance query results and lineage graphs, all events 
> should be shown. Component Name and Component Type information should be 
> conditionally visible depending on the corresponding component policy 'view 
> the component' policy. Event details including Provenance event type and 
> FlowFile information should be conditionally available depending on the 
> corresponding component policy 'view the data'. Inability to display event 
> details should provide feedback to the user indicating the reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508548#comment-16508548
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194512038
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java
 ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO 
createProvenanceEventDto(final ProvenanceEventRecord
 // sets the component details if it can find the component still 
in the flow
 setComponentDetails(dto);
 
-// only include all details if not summarizing
-if (!summarize) {
-// convert the attributes
-final Comparator<AttributeDTO> attributeComparator = new 
Comparator<AttributeDTO>() {
-@Override
-public int compare(AttributeDTO a1, AttributeDTO a2) {
-return 
Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-}
-};
+//try {
+//AuthorizationResult result = 
flowController.checkConnectableAuthorization(event.getComponentId());
+AuthorizationResult result = 
checkConnectableAuthorization(event.getComponentId());
+if (Result.Denied.equals(result.getResult())) {
+dto.setComponentType("Processor"); // is this always a 
Processor?
+dto.setComponentName(dto.getComponentId());
+dto.setEventType("UNKNOWN");
--- End diff --

Yes, I agree. The event type should be controlled by the new provenance 
event policy. It is not controlled by the component policy that protects the 
component name and component type. 


> Provenance authorization refactoring
> 
>
> Key: NIFI-4907
> URL: https://issues.apache.org/jira/browse/NIFI-4907
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.5.0
>Reporter: Mark Bean
>Assignee: Mark Bean
>Priority: Major
>
> Currently, the 'view the data' component policy is too tightly coupled with 
> Provenance queries. The 'query provenance' policy should be the only policy 
> required for viewing Provenance query results. Both 'view the component' and 
> 'view the data' policies should be used to refine the appropriate visibility 
> of event details - but not the event itself.
> 1) Component Visibility
> The authorization of Provenance events is inconsistent with the behavior of 
> the graph. For example, if a user does not have 'view the component' policy, 
> the graph shows this component as a "black box" (no details such as name, 
> UUID, etc.) However, when querying Provenance, this component will show up 
> including the Component Type and the Component Name. This is in effect a 
> violation of the policy. These component details should be obscured in the 
> Provenance event displayed if user does not have the appropriate 'view the 
> component' policy.
> 2) Data Visibility
> For a Provenance query, all events should be visible as long as the user 
> performing the query belongs to the 'query provenance' global policy. As 
> mentioned above, some information about the component may be obscured 
> depending on 'view the component' policy, but the event itself should be 
> visible. Additionally, details of the event (clicking the View Details "i" 
> icon) should only be accessible if the user belongs to the 'view the data' 
> policy for the affected component. If the user is not in the appropriate 
> 'view the data' policy, a popup warning should be displayed indicating the 
> reason details are not visible with more specific detail than the current 
> "Contact the system administrator".
> 3) Lineage Graphs
> As with the Provenance table view recommendation above, the lineage graph 
> should display all events. Currently, if the lineage graph includes an event 
> belonging to a component which the user does not have 'view the data', it is 
> shown on the graph as "UNKNOWN". As with Data Visibility mentioned above, the 
> graph should indicate the event type as long as the user is in the 'view the 
> component'. Subsequent "View Details" on the event should only be visible if 
> the user is in the 'view the data' policy.
> In summary, for Provenance query results and lineage graphs, all events 
> should be shown. Component Name and Component Type information should be 
> conditionally visible depending on the corresponding component policy 'view 
> the component' policy. Event details including Provenance event type and 
> FlowFile information should be conditionally available depending on the 
> corresponding component policy 'view the data'. Inability to display event 
> details should provide feedback to the user indicating the reason.

[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508544#comment-16508544
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user markobean commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194510876
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java
 ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO 
createProvenanceEventDto(final ProvenanceEventRecord
 // sets the component details if it can find the component still 
in the flow
 setComponentDetails(dto);
 
-// only include all details if not summarizing
-if (!summarize) {
-// convert the attributes
-final Comparator<AttributeDTO> attributeComparator = new 
Comparator<AttributeDTO>() {
-@Override
-public int compare(AttributeDTO a1, AttributeDTO a2) {
-return 
Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-}
-};
+//try {
+//AuthorizationResult result = 
flowController.checkConnectableAuthorization(event.getComponentId());
+AuthorizationResult result = 
checkConnectableAuthorization(event.getComponentId());
+if (Result.Denied.equals(result.getResult())) {
+dto.setComponentType("Processor"); // is this always a 
Processor?
+dto.setComponentName(dto.getComponentId());
+dto.setEventType("UNKNOWN");
--- End diff --

If we choose to _not_ redact event type, that makes life easier. Currently, 
it displays "UNKNOWN" in the table (when 'view provenance' is enabled and 'view 
the component' is not). But, the event type IS displayed in the lineage graph. 
We need to get to consistency one way or the other on this. I'm leaning towards 
allowing the event type info to be visible since this is a characteristic of 
provenance (i.e. 'view provenance') and not a characteristic of 'view the 
component'.


> Provenance authorization refactoring
> 
>
> Key: NIFI-4907
> URL: https://issues.apache.org/jira/browse/NIFI-4907
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.5.0
>Reporter: Mark Bean
>Assignee: Mark Bean
>Priority: Major
>
> Currently, the 'view the data' component policy is too tightly coupled with 
> Provenance queries. The 'query provenance' policy should be the only policy 
> required for viewing Provenance query results. Both 'view the component' and 
> 'view the data' policies should be used to refine the appropriate visibility 
> of event details - but not the event itself.
> 1) Component Visibility
> The authorization of Provenance events is inconsistent with the behavior of 
> the graph. For example, if a user does not have 'view the component' policy, 
> the graph shows this component as a "black box" (no details such as name, 
> UUID, etc.) However, when querying Provenance, this component will show up 
> including the Component Type and the Component Name. This is in effect a 
> violation of the policy. These component details should be obscured in the 
> Provenance event displayed if user does not have the appropriate 'view the 
> component' policy.
> 2) Data Visibility
> For a Provenance query, all events should be visible as long as the user 
> performing the query belongs to the 'query provenance' global policy. As 
> mentioned above, some information about the component may be obscured 
> depending on 'view the component' policy, but the event itself should be 
> visible. Additionally, details of the event (clicking the View Details "i" 
> icon) should only be accessible if the user belongs to the 'view the data' 
> policy for the affected component. If the user is not in the appropriate 
> 'view the data' policy, a popup warning should be displayed indicating the 
> reason details are not visible with more specific detail than the current 
> "Contact the system administrator".
> 3) Lineage Graphs
> As with the Provenance table view recommendation above, the lineage graph 
> should display all events. Currently, if the lineage graph includes an event 
> belonging to a component which the user does not have 'view the data', it is 
> shown on the graph as "UNKNOWN". As with Data Visibility mentioned above, the 
> graph should indicate the event type as long as the user is in the 'view the 
> component'. Subsequent "View Details" on the event should only be visible if 
> the user is in the 'view the data' policy.
> In summary, for Provenance query results and lineage graphs, all events 
> should be shown. Component Name and Component Type information should be 
> conditionally visible depending on the corresponding component policy 'view 
> the component' policy. Event details including Provenance event type and 
> FlowFile information should be conditionally available depending on the 
> corresponding component policy 'view the data'. Inability to display event 
> details should provide feedback to the user indicating the reason.


[jira] [Created] (NIFI-5296) Add EL Support with Variable Registry scope on SSL context service

2018-06-11 Thread Pierre Villard (JIRA)
Pierre Villard created NIFI-5296:


 Summary: Add EL Support with Variable Registry scope on SSL 
context service
 Key: NIFI-5296
 URL: https://issues.apache.org/jira/browse/NIFI-5296
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Reporter: Pierre Villard
Assignee: Pierre Villard


Add EL support on Truststore and Keystore filename properties with Variable 
Registry scope.
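
A sketch of the service side (assumed names, not the committed 
StandardSSLContextService code); a controller service is configured once at 
enable time, so the expression is evaluated against the Variable Registry only:

```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.controller.ConfigurationContext;
import org.apache.nifi.expression.ExpressionLanguageScope;
import org.apache.nifi.processor.util.StandardValidators;

public class SslContextElSketch {

    static final PropertyDescriptor KEYSTORE = new PropertyDescriptor.Builder()
            .name("Keystore Filename")
            .description("Path to the keystore, e.g. ${keystore.dir}/keystore.jks")
            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
            .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
            .build();

    // No FlowFile exists at enable time, so only the Variable Registry (and
    // environment/system properties) can be referenced by the expression.
    static String resolveKeystorePath(final ConfigurationContext context) {
        return context.getProperty(KEYSTORE)
                .evaluateAttributeExpressions()
                .getValue();
    }
}
```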



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5292) Rename existing ElasticSearch client service impl to specify it is for 5.X

2018-06-11 Thread Pierre Villard (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508523#comment-16508523
 ] 

Pierre Villard commented on NIFI-5292:
--

Can we add a note somewhere (or add a label to this JIRA) to mention this 
change in the release notes / upgrade path once 1.7.0 is released?

> Rename existing ElasticSearch client service impl to specify it is for 5.X
> --
>
> Key: NIFI-5292
> URL: https://issues.apache.org/jira/browse/NIFI-5292
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike Thomsen
>Assignee: Mike Thomsen
>Priority: Major
>
> The current version of the impl is 5.X, but has a generic name that will be 
> confusing down the road.
> Add an ES 6.X client service as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508510#comment-16508510
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194499578
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java
 ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO 
createProvenanceEventDto(final ProvenanceEventRecord
 // sets the component details if it can find the component still 
in the flow
 setComponentDetails(dto);
 
-// only include all details if not summarizing
-if (!summarize) {
-// convert the attributes
-final Comparator<AttributeDTO> attributeComparator = new 
Comparator<AttributeDTO>() {
-@Override
-public int compare(AttributeDTO a1, AttributeDTO a2) {
-return 
Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-}
-};
+//try {
+//AuthorizationResult result = 
flowController.checkConnectableAuthorization(event.getComponentId());
+AuthorizationResult result = 
checkConnectableAuthorization(event.getComponentId());
+if (Result.Denied.equals(result.getResult())) {
+dto.setComponentType("Processor"); // is this always a 
Processor?
+dto.setComponentName(dto.getComponentId());
+dto.setEventType("UNKNOWN");
+}
 
-final SortedSet<AttributeDTO> attributes = new 
TreeSet<>(attributeComparator);
+//authorizeData(event);
+final AuthorizationResult dataResult = 
checkAuthorizationForData(event); //(authorizer, RequestAction.READ, user, 
event.getAttributes());
 
-final Map updatedAttrs = 
event.getUpdatedAttributes();
-final Map previousAttrs = 
event.getPreviousAttributes();
+// only include all details if not summarizing and approved
+if (!summarize && 
Result.Approved.equals(dataResult.getResult())) {
--- End diff --

If the user is not authorized for the data of a component we should still 
be able to return a non-summary. In this case, we should just be leaving out 
any of the data fields in the ProvenanceEventDto. I would consider these fields 
data fields as they are associated with either attributes, content, or replay 
(all of which requires data policies to execute).

```
private Collection<AttributeDTO> attributes;

private Boolean contentEqual;
private Boolean inputContentAvailable;
private String inputContentClaimSection;
private String inputContentClaimContainer;
private String inputContentClaimIdentifier;
private Long inputContentClaimOffset;
private String inputContentClaimFileSize;
private Long inputContentClaimFileSizeBytes;
private Boolean outputContentAvailable;
private String outputContentClaimSection;
private String outputContentClaimContainer;
private String outputContentClaimIdentifier;
private Long outputContentClaimOffset;
private String outputContentClaimFileSize;
private Long outputContentClaimFileSizeBytes;

private Boolean replayAvailable;
private String replayExplanation;
private String sourceConnectionIdentifier;
```
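
In code, the suggested shape is roughly the following (a sketch; the setter 
names are assumed from the field list above, not verified against 
ProvenanceEventDTO):

```
// Always return the event itself; gate only the data-backed fields on the
// data-policy check. Component name/type stay governed by the component policy.
if (Result.Approved.equals(dataResult.getResult())) {
    dto.setAttributes(attributeDtos);             // attribute fields
    dto.setInputContentAvailable(inputAvailable); // content-claim fields ...
    dto.setReplayAvailable(replayAvailable);      // replay fields
}
```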


> Provenance authorization refactoring
> 
>
> Key: NIFI-4907
> URL: https://issues.apache.org/jira/browse/NIFI-4907
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.5.0
>Reporter: Mark Bean
>Assignee: Mark Bean
>Priority: Major
>
> Currently, the 'view the data' component policy is too tightly coupled with 
> Provenance queries. The 'query provenance' policy should be the only policy 
> required for viewing Provenance query results. Both 'view the component' and 
> 'view the data' policies should be used to refine the appropriate visibility 
> of event details - but not the event itself.
> 1) Component Visibility
> The authorization of Provenance events is inconsistent with the behavior of 
> the graph. For example, if a user does not have 'view the component' policy, 
> the graph shows this component as a "black box" (no details such as name, 
> UUID, etc.) However, when querying Provenance, this component will show up 
> including the Component Type and the Component Name. This is in effect a 
> violation of the policy. These component details should be obscured in the 
> Provenance event displayed if user does not have the appropriate 'view the 
> component' policy.

[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508505#comment-16508505
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194495379
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/FlowController.java
 ---
@@ -4919,6 +4925,22 @@ private void updateRemoteProcessGroups() {
 return new 
ArrayList<>(provenanceRepository.getEvents(firstEventId, maxRecords));
 }
 
+public AuthorizationResult checkConnectableAuthorization(final String 
componentId) {
--- End diff --

I don't believe this is called.


> Provenance authorization refactoring
> 
>
> Key: NIFI-4907
> URL: https://issues.apache.org/jira/browse/NIFI-4907
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.5.0
>Reporter: Mark Bean
>Assignee: Mark Bean
>Priority: Major
>
> Currently, the 'view the data' component policy is too tightly coupled with 
> Provenance queries. The 'query provenance' policy should be the only policy 
> required for viewing Provenance query results. Both 'view the component' and 
> 'view the data' policies should be used to refine the appropriate visibility 
> of event details - but not the event itself.
> 1) Component Visibility
> The authorization of Provenance events is inconsistent with the behavior of 
> the graph. For example, if a user does not have 'view the component' policy, 
> the graph shows this component as a "black box" (no details such as name, 
> UUID, etc.) However, when querying Provenance, this component will show up 
> including the Component Type and the Component Name. This is in effect a 
> violation of the policy. These component details should be obscured in the 
> Provenance event displayed if user does not have the appropriate 'view the 
> component' policy.
> 2) Data Visibility
> For a Provenance query, all events should be visible as long as the user 
> performing the query belongs to the 'query provenance' global policy. As 
> mentioned above, some information about the component may be obscured 
> depending on 'view the component' policy, but the event itself should be 
> visible. Additionally, details of the event (clicking the View Details "i" 
> icon) should only be accessible if the user belongs to the 'view the data' 
> policy for the affected component. If the user is not in the appropriate 
> 'view the data' policy, a popup warning should be displayed indicating the 
> reason details are not visible with more specific detail than the current 
> "Contact the system administrator".
> 3) Lineage Graphs
> As with the Provenance table view recommendation above, the lineage graph 
> should display all events. Currently, if the lineage graph includes an event 
> belonging to a component which the user does not have 'view the data', it is 
> shown on the graph as "UNKNOWN". As with Data Visibility mentioned above, the 
> graph should indicate the event type as long as the user is in the 'view the 
> component'. Subsequent "View Details" on the event should only be visible if 
> the user is in the 'view the data' policy.
> In summary, for Provenance query results and lineage graphs, all events 
> should be shown. Component Name and Component Type information should be 
> conditionally visible depending on the corresponding component policy 'view 
> the component' policy. Event details including Provenance event type and 
> FlowFile information should be conditionally available depending on the 
> corresponding component policy 'view the data'. Inability to display event 
> details should provide feedback to the user indicating the reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508507#comment-16508507
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194498155
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java
 ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO 
createProvenanceEventDto(final ProvenanceEventRecord
 // sets the component details if it can find the component still 
in the flow
 setComponentDetails(dto);
 
-// only include all details if not summarizing
-if (!summarize) {
-// convert the attributes
-final Comparator<AttributeDTO> attributeComparator = new 
Comparator<AttributeDTO>() {
-@Override
-public int compare(AttributeDTO a1, AttributeDTO a2) {
-return 
Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-}
-};
+//try {
+//AuthorizationResult result = 
flowController.checkConnectableAuthorization(event.getComponentId());
+AuthorizationResult result = 
checkConnectableAuthorization(event.getComponentId());
+if (Result.Denied.equals(result.getResult())) {
+dto.setComponentType("Processor"); // is this always a 
Processor?
+dto.setComponentName(dto.getComponentId());
+dto.setEventType("UNKNOWN");
+}
 
-final SortedSet<AttributeDTO> attributes = new 
TreeSet<>(attributeComparator);
+//authorizeData(event);
+final AuthorizationResult dataResult = 
checkAuthorizationForData(event); //(authorizer, RequestAction.READ, user, 
event.getAttributes());
--- End diff --

We only need to authorize for the data if the event is a non-summary. For 
instance, when we're pulling back 1000 summaries to load the provenance table 
we don't need to check any data policies.
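
Concretely, the suggested short-circuit looks something like this (a sketch 
reusing the names from the diff above):

```
// Summary rows skip the per-event data authorization entirely; only
// non-summary requests pay for the data-policy check.
if (!summarize) {
    final AuthorizationResult dataResult = checkAuthorizationForData(event);
    if (Result.Approved.equals(dataResult.getResult())) {
        // populate attribute / content-claim / replay fields here
    }
}
```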


> Provenance authorization refactoring
> 
>
> Key: NIFI-4907
> URL: https://issues.apache.org/jira/browse/NIFI-4907
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.5.0
>Reporter: Mark Bean
>Assignee: Mark Bean
>Priority: Major
>
> Currently, the 'view the data' component policy is too tightly coupled with 
> Provenance queries. The 'query provenance' policy should be the only policy 
> required for viewing Provenance query results. Both 'view the component' and 
> 'view the data' policies should be used to refine the appropriate visibility 
> of event details - but not the event itself.
> 1) Component Visibility
> The authorization of Provenance events is inconsistent with the behavior of 
> the graph. For example, if a user does not have 'view the component' policy, 
> the graph shows this component as a "black box" (no details such as name, 
> UUID, etc.) However, when querying Provenance, this component will show up 
> including the Component Type and the Component Name. This is in effect a 
> violation of the policy. These component details should be obscured in the 
> Provenance event displayed if user does not have the appropriate 'view the 
> component' policy.
> 2) Data Visibility
> For a Provenance query, all events should be visible as long as the user 
> performing the query belongs to the 'query provenance' global policy. As 
> mentioned above, some information about the component may be obscured 
> depending on 'view the component' policy, but the event itself should be 
> visible. Additionally, details of the event (clicking the View Details "i" 
> icon) should only be accessible if the user belongs to the 'view the data' 
> policy for the affected component. If the user is not in the appropriate 
> 'view the data' policy, a popup warning should be displayed indicating the 
> reason details are not visible with more specific detail than the current 
> "Contact the system administrator".
> 3) Lineage Graphs
> As with the Provenance table view recommendation above, the lineage graph 
> should display all events. Currently, if the lineage graph includes an event 
> belonging to a component which the user does not have 'view the data', it is 
> shown on the graph as "UNKNOWN". As with Data Visibility mentioned above, the 
> graph should indicate the event type as long as the user is in the 'view the 
> component'. Subsequent "View Details" on the event should only be visible if 
> the user is in the 'view the data' policy.
> In summary, for Provenance query results and lineage graphs, all events 
> should be shown. Component Name and Component Type information should be 
> conditionally visible depending on the corresponding component policy 'view 
> the component' policy. Event details including Provenance event type and 
> FlowFile information should be conditionally available depending on the 
> corresponding component policy 'view the data'. Inability to display event 
> details should provide feedback to the user indicating the reason.

[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508508#comment-16508508
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194496260
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java
 ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO 
createProvenanceEventDto(final ProvenanceEventRecord
 // sets the component details if it can find the component still 
in the flow
 setComponentDetails(dto);
 
-// only include all details if not summarizing
-if (!summarize) {
-// convert the attributes
-final Comparator<AttributeDTO> attributeComparator = new 
Comparator<AttributeDTO>() {
-@Override
-public int compare(AttributeDTO a1, AttributeDTO a2) {
-return 
Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-}
-};
+//try {
+//AuthorizationResult result = 
flowController.checkConnectableAuthorization(event.getComponentId());
+AuthorizationResult result = 
checkConnectableAuthorization(event.getComponentId());
+if (Result.Denied.equals(result.getResult())) {
+dto.setComponentType("Processor"); // is this always a 
Processor?
+dto.setComponentName(dto.getComponentId());
+dto.setEventType("UNKNOWN");
--- End diff --

Do you think that we need to redact the event type when the user does not 
have permissions to the component policy? I would have considered this field 
under the new provenance event policy.


> Provenance authorization refactoring
> 
>
> Key: NIFI-4907
> URL: https://issues.apache.org/jira/browse/NIFI-4907
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.5.0
>Reporter: Mark Bean
>Assignee: Mark Bean
>Priority: Major
>
> Currently, the 'view the data' component policy is too tightly coupled with 
> Provenance queries. The 'query provenance' policy should be the only policy 
> required for viewing Provenance query results. Both 'view the component' and 
> 'view the data' policies should be used to refine the appropriate visibility 
> of event details - but not the event itself.
> 1) Component Visibility
> The authorization of Provenance events is inconsistent with the behavior of 
> the graph. For example, if a user does not have 'view the component' policy, 
> the graph shows this component as a "black box" (no details such as name, 
> UUID, etc.) However, when querying Provenance, this component will show up 
> including the Component Type and the Component Name. This is in effect a 
> violation of the policy. These component details should be obscured in the 
> Provenance event displayed if user does not have the appropriate 'view the 
> component' policy.
> 2) Data Visibility
> For a Provenance query, all events should be visible as long as the user 
> performing the query belongs to the 'query provenance' global policy. As 
> mentioned above, some information about the component may be obscured 
> depending on 'view the component' policy, but the event itself should be 
> visible. Additionally, details of the event (clicking the View Details "i" 
> icon) should only be accessible if the user belongs to the 'view the data' 
> policy for the affected component. If the user is not in the appropriate 
> 'view the data' policy, a popup warning should be displayed indicating the 
> reason details are not visible with more specific detail than the current 
> "Contact the system administrator".
> 3) Lineage Graphs
> As with the Provenance table view recommendation above, the lineage graph 
> should display all events. Currently, if the lineage graph includes an event 
> belonging to a component which the user does not have 'view the data', it is 
> shown on the graph as "UNKNOWN". As with Data Visibility mentioned above, the 
> graph should indicate the event type as long as the user is in the 'view the 
> component' policy. Subsequent "View Details" on the event should only be 
> visible if the user is in the 'view the data' policy.
> In summary, for Provenance query results and lineage graphs, all events 
> should be shown. Component Name and Component Type information should be 
> conditionally visible depending on the corresponding 'view the component' 
> policy. Event details including Provenance event type and FlowFile 
> information should be conditionally available depending on the corresponding 
> component policy 'view the data'. Inability to display event details should 
> provide feedback to the user indicating the reason.

[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508506#comment-16508506
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194495873
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO createProvenanceEventDto(final ProvenanceEventRecord
         // sets the component details if it can find the component still in the flow
         setComponentDetails(dto);
 
-        // only include all details if not summarizing
-        if (!summarize) {
-            // convert the attributes
-            final Comparator<AttributeDTO> attributeComparator = new Comparator<AttributeDTO>() {
-                @Override
-                public int compare(AttributeDTO a1, AttributeDTO a2) {
-                    return Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-                }
-            };
+        //try {
+        //AuthorizationResult result = flowController.checkConnectableAuthorization(event.getComponentId());
+        AuthorizationResult result = checkConnectableAuthorization(event.getComponentId());
--- End diff --

Why not check the authorization within `setComponentDetails`? In there you 
already have the components to authorize and you'll know the corresponding type.




[jira] [Commented] (NIFI-4907) Provenance authorization refactoring

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508509#comment-16508509
 ] 

ASF GitHub Bot commented on NIFI-4907:
--

Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194503331
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO createProvenanceEventDto(final ProvenanceEventRecord
         // sets the component details if it can find the component still in the flow
         setComponentDetails(dto);
 
-        // only include all details if not summarizing
-        if (!summarize) {
-            // convert the attributes
-            final Comparator<AttributeDTO> attributeComparator = new Comparator<AttributeDTO>() {
-                @Override
-                public int compare(AttributeDTO a1, AttributeDTO a2) {
-                    return Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-                }
-            };
+        //try {
+        //AuthorizationResult result = flowController.checkConnectableAuthorization(event.getComponentId());
+        AuthorizationResult result = checkConnectableAuthorization(event.getComponentId());
+        if (Result.Denied.equals(result.getResult())) {
+            dto.setComponentType("Processor"); // is this always a Processor?
+            dto.setComponentName(dto.getComponentId());
+            dto.setEventType("UNKNOWN");
+        }
 
-        final SortedSet<AttributeDTO> attributes = new TreeSet<>(attributeComparator);
+        //authorizeData(event);
+        final AuthorizationResult dataResult = checkAuthorizationForData(event); //(authorizer, RequestAction.READ, user, event.getAttributes());
--- End diff --

Also, it appears that checkAuthorizationForData is verifying READ access to 
the data of the corresponding component. This check is already done as part of 
the checkAuthorizationForReplay method, which appears to be the only place the 
replay authorization check is performed. It likely makes sense to refactor some 
of this so that we're only checking READ permission to the data of the 
corresponding component once. The remainder of the replay authorization check 
only needs to be performed when we're populating the data fields (i.e., once 
READ to the data of the corresponding component is approved). See the sketch below.
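
For illustration, a minimal sketch of that consolidation, assuming a
hypothetical populateDataFields helper and a checkRemainingReplayAuthorization
method holding the replay-only portion of the existing check (both names are
placeholders, not part of the actual patch):

```
// Sketch only: READ to the data of the corresponding component is
// verified exactly once, up front.
final AuthorizationResult dataResult = checkAuthorizationForData(event);

if (!summarize && Result.Approved.equals(dataResult.getResult())) {
    // populate attribute/content/replay fields only when READ is approved
    populateDataFields(dto, event);

    // run only the replay-specific checks; the READ-to-data portion of the
    // old checkAuthorizationForReplay has already been performed above
    checkRemainingReplayAuthorization(dto, event);
}
```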





[GitHub] nifi pull request #2703: NIFI-4907: add 'view provenance' component policy

2018-06-11 Thread mcgilman
Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194495379
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/FlowController.java ---
@@ -4919,6 +4925,22 @@ private void updateRemoteProcessGroups() {
         return new ArrayList<>(provenanceRepository.getEvents(firstEventId, maxRecords));
     }
 
+    public AuthorizationResult checkConnectableAuthorization(final String componentId) {
--- End diff --

I don't believe this is called.


---






[GitHub] nifi pull request #2703: NIFI-4907: add 'view provenance' component policy

2018-06-11 Thread mcgilman
Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194498155
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO createProvenanceEventDto(final ProvenanceEventRecord
         // sets the component details if it can find the component still in the flow
         setComponentDetails(dto);
 
-        // only include all details if not summarizing
-        if (!summarize) {
-            // convert the attributes
-            final Comparator<AttributeDTO> attributeComparator = new Comparator<AttributeDTO>() {
-                @Override
-                public int compare(AttributeDTO a1, AttributeDTO a2) {
-                    return Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-                }
-            };
+        //try {
+        //AuthorizationResult result = flowController.checkConnectableAuthorization(event.getComponentId());
+        AuthorizationResult result = checkConnectableAuthorization(event.getComponentId());
+        if (Result.Denied.equals(result.getResult())) {
+            dto.setComponentType("Processor"); // is this always a Processor?
+            dto.setComponentName(dto.getComponentId());
+            dto.setEventType("UNKNOWN");
+        }
 
-        final SortedSet<AttributeDTO> attributes = new TreeSet<>(attributeComparator);
+        //authorizeData(event);
+        final AuthorizationResult dataResult = checkAuthorizationForData(event); //(authorizer, RequestAction.READ, user, event.getAttributes());
--- End diff --

We only need to authorize for the data if the event is a non-summary. For 
instance, when we're pulling back 1000 summaries to load the provenance table 
we don't need to check any data policies.


---


[GitHub] nifi pull request #2703: NIFI-4907: add 'view provenance' component policy

2018-06-11 Thread mcgilman
Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2703#discussion_r194499578
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/controller/ControllerFacade.java ---
@@ -1389,104 +1420,119 @@ private ProvenanceEventDTO createProvenanceEventDto(final ProvenanceEventRecord
         // sets the component details if it can find the component still in the flow
         setComponentDetails(dto);
 
-        // only include all details if not summarizing
-        if (!summarize) {
-            // convert the attributes
-            final Comparator<AttributeDTO> attributeComparator = new Comparator<AttributeDTO>() {
-                @Override
-                public int compare(AttributeDTO a1, AttributeDTO a2) {
-                    return Collator.getInstance(Locale.US).compare(a1.getName(), a2.getName());
-                }
-            };
+        //try {
+        //AuthorizationResult result = flowController.checkConnectableAuthorization(event.getComponentId());
+        AuthorizationResult result = checkConnectableAuthorization(event.getComponentId());
+        if (Result.Denied.equals(result.getResult())) {
+            dto.setComponentType("Processor"); // is this always a Processor?
+            dto.setComponentName(dto.getComponentId());
+            dto.setEventType("UNKNOWN");
+        }
 
-        final SortedSet<AttributeDTO> attributes = new TreeSet<>(attributeComparator);
+        //authorizeData(event);
+        final AuthorizationResult dataResult = checkAuthorizationForData(event); //(authorizer, RequestAction.READ, user, event.getAttributes());
 
-        final Map<String, String> updatedAttrs = event.getUpdatedAttributes();
-        final Map<String, String> previousAttrs = event.getPreviousAttributes();
+        // only include all details if not summarizing and approved
+        if (!summarize && Result.Approved.equals(dataResult.getResult())) {
--- End diff --

If the user is not authorized for the data of a component, we should still be 
able to return a non-summary. In this case, we should just leave out the data 
fields in the ProvenanceEventDTO. I would consider the following fields data 
fields, as they are associated with either attributes, content, or replay (all 
of which require data policies to execute):

```
private Collection<AttributeDTO> attributes;

private Boolean contentEqual;
private Boolean inputContentAvailable;
private String inputContentClaimSection;
private String inputContentClaimContainer;
private String inputContentClaimIdentifier;
private Long inputContentClaimOffset;
private String inputContentClaimFileSize;
private Long inputContentClaimFileSizeBytes;
private Boolean outputContentAvailable;
private String outputContentClaimSection;
private String outputContentClaimContainer;
private String outputContentClaimIdentifier;
private Long outputContentClaimOffset;
private String outputContentClaimFileSize;
private Long outputContentClaimFileSizeBytes;

private Boolean replayAvailable;
private String replayExplanation;
private String sourceConnectionIdentifier;
```
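
A minimal sketch of that approach, assuming conventional setters for the
fields listed above and a hypothetical populateDataFields helper (neither is
taken from the actual patch):

```
// Sketch only: the non-summary DTO is always returned; data fields are
// simply left unset when READ to the data of the component is denied.
final AuthorizationResult dataResult = checkAuthorizationForData(event);

if (Result.Approved.equals(dataResult.getResult())) {
    populateDataFields(dto, event); // hypothetical helper for the fields above
} else {
    dto.setReplayAvailable(false);
    dto.setReplayExplanation("Access is denied to the data of the source component.");
}
```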


---


[jira] [Updated] (NIFI-5270) my ftp password is "${password}" so nifi's LISTFtp won't use it.

2018-06-11 Thread Andy LoPresto (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy LoPresto updated NIFI-5270:

Component/s: Core UI
 Core Framework

> my ftp password is "${password}" so nifi's LISTFtp won't use it.
> 
>
> Key: NIFI-5270
> URL: https://issues.apache.org/jira/browse/NIFI-5270
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework, Core UI
>Affects Versions: 1.6.0
>Reporter: eric twilegar
>Priority: Major
>  Labels: expression-language, passwords, registry, security, 
> variable
>
> I'm joking of course, but if that was your password the processor would fail, 
> as it would consider it an expression and not a password.
> In all seriousness though, we really do need something like an 
> "isPasswordExpression" checkbox for all controllers. This would also allow 
> NiFi Registry to not consider them secrets, so you don't have to cut and 
> paste ${ftp_password} after deploying a version. Maybe just adding 
> passwordExpression vs. sharing the property is a better idea. 
>  
> I didn't test whether you can escape the password in some way, so there is a 
> chance this isn't a bug.
>  
>  
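
For context, whether a value is treated as Expression Language is declared per
property on the PropertyDescriptor; a rough sketch of a literal (EL-free)
password property under that model (illustrative only, not the actual ListFTP
descriptor):

```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.expression.ExpressionLanguageScope;
import org.apache.nifi.processor.util.StandardValidators;

static final PropertyDescriptor PASSWORD = new PropertyDescriptor.Builder()
        .name("Password")
        .description("The password to use when connecting to the server.")
        .required(true)
        .sensitive(true)
        // with no Expression Language support, a literal value such as
        // "${password}" is used as-is rather than evaluated
        .expressionLanguageSupported(ExpressionLanguageScope.NONE)
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        .build();
```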





[jira] [Updated] (NIFI-5270) my ftp password is "${password}" so nifi's LISTFtp won't use it.

2018-06-11 Thread Andy LoPresto (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy LoPresto updated NIFI-5270:

Labels: expression-language passwords registry security variable  (was: )



[jira] [Updated] (NIFI-5270) my ftp password is "${password}" so nifi's LISTFtp won't use it.

2018-06-11 Thread Andy LoPresto (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy LoPresto updated NIFI-5270:

Affects Version/s: 1.6.0



[jira] [Commented] (NIFI-5270) my ftp password is "${password}" so nifi's LISTFtp won't use it.

2018-06-11 Thread Andy LoPresto (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508486#comment-16508486
 ] 

Andy LoPresto commented on NIFI-5270:
-

Hi Eric,

There has been some on-going discussion of this (pre-dating the NiFi Registry 
effort) and how it would relate to the Variable Registry. The effort has been 
paused a bit while other priorities have come up. 

I think the last discussion I recall had landed on "password guidance should 
explicitly prohibit literal passwords of the format {{'${xxx}'}}" as this was 
backward compatible with the existing {{PropertyDescriptor}} definitions and 
did not require additional work. Now may be a good time to re-evaluate that 
decision and perform new work for a future release. 

https://issues.apache.org/jira/browse/NIFI-2653
https://issues.apache.org/jira/browse/NIFI-3046
https://issues.apache.org/jira/browse/NIFI-3110
https://issues.apache.org/jira/browse/NIFI-3311
https://issues.apache.org/jira/browse/NIFI-3439
https://issues.apache.org/jira/browse/NIFI-4557
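
A rough sketch of what that guidance could look like if enforced by a
validator; NO_LITERAL_EL_PASSWORD is hypothetical and not an existing NiFi
validator:

```
import org.apache.nifi.components.ValidationResult;
import org.apache.nifi.components.Validator;

// Hypothetical validator: rejects sensitive values that look like an
// unevaluated Expression Language reference, per the guidance above.
static final Validator NO_LITERAL_EL_PASSWORD = (subject, input, context) -> {
    final boolean looksLikeExpression = input != null && input.matches("\\$\\{.+\\}");
    return new ValidationResult.Builder()
            .subject(subject)
            .input("********") // never echo a sensitive value back
            .valid(!looksLikeExpression)
            .explanation("literal passwords of the form ${xxx} are not allowed")
            .build();
};
```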



[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508396#comment-16508396
 ] 

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194430127
  
--- Diff: 
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ---
@@ -0,0 +1,803 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.hadoop;
+
+import java.io.IOException;
+import java.security.PrivilegedExceptionAction;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsAction;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.TriggerSerially;
+import org.apache.nifi.annotation.behavior.TriggerWhenEmpty;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.SeeAlso;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.components.AllowableValue;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.PropertyValue;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.processors.hadoop.GetHDFSFileInfo.HDFSFileInfoRequest.Groupping;
+
+@TriggerSerially
+@TriggerWhenEmpty
+@InputRequirement(Requirement.INPUT_ALLOWED)
+@Tags({"hadoop", "HDFS", "get", "list", "ingest", "source", "filesystem"})
+@CapabilityDescription("Retrieves a listing of files and directories from HDFS. "
+        + "This processor creates a FlowFile(s) that represents the HDFS file/dir with relevant information. "
+        + "Main purpose of this processor to provide functionality similar to HDFS Client, i.e. count, du, ls, test, etc. "
+        + "Unlike ListHDFS, this processor is stateless, supports incoming connections and provides information on a dir level. "
+)
+@WritesAttributes({
+    @WritesAttribute(attribute="hdfs.objectName", description="The name of the file/dir found on HDFS."),
+    @WritesAttribute(attribute="hdfs.path", description="The path is set to the absolute path of the object's parent directory on HDFS. "
+            + "For example, if an object is a directory 'foo', under directory '/bar' then 'hdfs.objectName' will have value 'foo', and 'hdfs.path' will be '/bar'"),
+    @WritesAttribute(attribute="hdfs.type", description="The type of an object. Possible values: directory, file, link"),
+    @WritesAttribute(attribute="hdfs.owner", 

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508397#comment-16508397
 ] 

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194424610
  
--- Diff: 
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ---
@@ -0,0 +1,803 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.hadoop;
+
+import java.io.IOException;
+import java.security.PrivilegedExceptionAction;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsAction;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.TriggerSerially;
+import org.apache.nifi.annotation.behavior.TriggerWhenEmpty;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.SeeAlso;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.components.AllowableValue;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.PropertyValue;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.processors.hadoop.GetHDFSFileInfo.HDFSFileInfoRequest.Groupping;
+
+@TriggerSerially
+@TriggerWhenEmpty
--- End diff --

@TriggerWhenEmpty is used to make the framework trigger the processor even 
when the incoming connection is empty; is that necessary in this case?

You typically use this when you need to perform some kind of check/clean-up 
even when no flow files are available, as in the sketch below.
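
For reference, a minimal sketch of a processor that legitimately needs
@TriggerWhenEmpty (the class and relationship are illustrative, not from this
PR):

```
import java.util.Collections;
import java.util.Set;

import org.apache.nifi.annotation.behavior.InputRequirement;
import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
import org.apache.nifi.annotation.behavior.TriggerWhenEmpty;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

// Sketch: this processor must be triggered even when the incoming connection
// is empty, so that time-based housekeeping can still run.
@TriggerWhenEmpty
@InputRequirement(Requirement.INPUT_REQUIRED)
public class ExampleCleanupProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) {
        final FlowFile flowFile = session.get();
        if (flowFile == null) {
            // still invoked with an empty queue; do any cleanup, then yield
            // so the framework does not busy-loop this processor
            context.yield();
            return;
        }
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```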


> Add GetHdfsFileInfo Processor
> -
>
> Key: NIFI-4906
> URL: https://issues.apache.org/jira/browse/NIFI-4906
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Ed Berezitsky
>Assignee: Ed Berezitsky
>Priority: Major
>  Labels: patch, pull-request-available
> Attachments: NiFi-GetHDFSFileInfo.pdf, gethdfsfileinfo.patch
>
>
> Add *GetHdfsFileInfo* Processor to be able to get stats from a file system.
> This processor should support recursive scan, getting information of 
> directories and files.
> _File-level info required_: name, path, length, modified timestamp, last 
> access timestamp, owner, group, permissions.
> _Directory-level info required_: name, path, sum of lengths of files under a 
> dir, count of files under a dir, modified 

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508401#comment-16508401
 ] 

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194456502
  

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508392#comment-16508392
 ] 

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194421999
  

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508395#comment-16508395
 ] 

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194443340
  

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508400#comment-16508400
 ] 

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194446685
  

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508391#comment-16508391
 ] 

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194421861
  
--- Diff: 
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ---
@@ -0,0 +1,803 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.hadoop;
+
+import java.io.IOException;
+import java.security.PrivilegedExceptionAction;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsAction;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.TriggerSerially;
+import org.apache.nifi.annotation.behavior.TriggerWhenEmpty;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.SeeAlso;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.components.AllowableValue;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.PropertyValue;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.processors.hadoop.GetHDFSFileInfo.HDFSFileInfoRequest.Groupping;
+
+@TriggerSerially
+@TriggerWhenEmpty
+@InputRequirement(Requirement.INPUT_ALLOWED)
+@Tags({"hadoop", "HDFS", "get", "list", "ingest", "source", "filesystem"})
+@CapabilityDescription("Retrieves a listing of files and directories from HDFS. "
+        + "This processor creates FlowFile(s) that represent the HDFS file/dir with relevant information. "
+        + "The main purpose of this processor is to provide functionality similar to the HDFS client, i.e. count, du, ls, test, etc. "
+        + "Unlike ListHDFS, this processor is stateless, supports incoming connections, and provides information on a dir level. "
+)
+@WritesAttributes({
+    @WritesAttribute(attribute="hdfs.objectName", description="The name of the file/dir found on HDFS."),
+    @WritesAttribute(attribute="hdfs.path", description="The path is set to the absolute path of the object's parent directory on HDFS. "
+        + "For example, if an object is a directory 'foo' under directory '/bar', then 'hdfs.objectName' will have the value 'foo' and 'hdfs.path' will be '/bar'."),
+    @WritesAttribute(attribute="hdfs.type", description="The type of an object. Possible values: directory, file, link"),
+    @WritesAttribute(attribute="hdfs.owner", description="The user that owns the object in HDFS"),
+    @WritesAttribute(attribute="hdfs.group", description="The group that owns the object in HDFS"),
+    @WritesAttribute(attribute="hdfs.lastModified", description="The timestamp 
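
The hdfs.objectName / hdfs.path split described in the quoted annotations maps directly onto Hadoop's FileStatus API. A minimal sketch of that mapping, assuming only stock org.apache.hadoop.fs types (HdfsAttributeSketch and toAttributes are illustrative names, not the PR's actual code):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;

    public class HdfsAttributeSketch {

        // Illustrative only: builds the hdfs.* attributes the annotations describe.
        static Map<String, String> toAttributes(final FileStatus status) {
            final Map<String, String> attributes = new HashMap<>();
            final Path path = status.getPath();

            // For '/bar/foo': hdfs.objectName -> 'foo', hdfs.path -> '/bar'
            attributes.put("hdfs.objectName", path.getName());
            attributes.put("hdfs.path",
                    path.getParent() == null ? "/" : path.getParent().toUri().getPath());

            // directory / file / link, matching the documented hdfs.type values
            final String type = status.isDirectory() ? "directory"
                    : status.isSymlink() ? "link" : "file";
            attributes.put("hdfs.type", type);

            attributes.put("hdfs.owner", status.getOwner());
            attributes.put("hdfs.group", status.getGroup());
            attributes.put("hdfs.lastModified", String.valueOf(status.getModificationTime()));
            return attributes;
        }
    }

Downstream of the processor, the absolute path can then be rebuilt in NiFi Expression Language as ${hdfs.path}/${hdfs.objectName}.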

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508398#comment-16508398 ]

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194437894
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508399#comment-16508399 ]

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194428165
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508394#comment-16508394 ]

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194427937
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---

[jira] [Commented] (NIFI-4906) Add GetHdfsFileInfo Processor

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508393#comment-16508393 ]

ASF GitHub Bot commented on NIFI-4906:
--

Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194421928
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---
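
The quoted description's point about providing information "on a dir level" comes down to walking the tree with the plain Hadoop FileSystem API. A minimal sketch of such a walk, again assuming only stock org.apache.hadoop.fs classes (the HdfsWalkSketch class and walk helper are hypothetical, not the PR's code):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWalkSketch {

        // Depth-first listing: returns the FileStatus of every object under 'dir',
        // the raw material for per-directory grouping, counts, and du-style sums.
        static List<FileStatus> walk(final FileSystem fs, final Path dir) throws IOException {
            final List<FileStatus> found = new ArrayList<>();
            for (final FileStatus status : fs.listStatus(dir)) {
                found.add(status);
                if (status.isDirectory()) {
                    found.addAll(walk(fs, status.getPath()));
                }
            }
            return found;
        }
    }

Re-reading the tree on every trigger like this is what keeps the processor stateless: nothing is remembered between runs, which is exactly the contrast the description draws with ListHDFS.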

[GitHub] nifi pull request #2639: NIFI-4906 Add GetHDFSFileInfo

2018-06-11 Thread bbende
Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194437894
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---

[GitHub] nifi pull request #2639: NIFI-4906 Add GetHDFSFileInfo

2018-06-11 Thread bbende
Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194421999
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---

[GitHub] nifi pull request #2639: NIFI-4906 Add GetHDFSFileInfo

2018-06-11 Thread bbende
Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194421861
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---

[GitHub] nifi pull request #2639: NIFI-4906 Add GetHDFSFileInfo

2018-06-11 Thread bbende
Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194430127
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---

[GitHub] nifi pull request #2639: NIFI-4906 Add GetHDFSFileInfo

2018-06-11 Thread bbende
Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194428165
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---

[GitHub] nifi pull request #2639: NIFI-4906 Add GetHDFSFileInfo

2018-06-11 Thread bbende
Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2639#discussion_r194421928
  
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java ---
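
Taken together, the @TriggerSerially, @TriggerWhenEmpty, and INPUT_ALLOWED annotations in the quoted header imply an onTrigger that must behave sensibly whether or not an upstream connection (and a queued FlowFile) exists. A hedged sketch of that common NiFi pattern, under the assumption that the processor yields when connected but empty (the class name and the remove() handling are illustrative only, not the PR's implementation):

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.exception.ProcessException;

    public class InputAllowedSketch extends AbstractProcessor {

        @Override
        public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
            final FlowFile incoming = session.get();
            if (incoming == null && context.hasIncomingConnection()) {
                // An upstream connection exists but nothing is queued yet:
                // yield rather than run source-style, so @TriggerWhenEmpty
                // scheduling does not spin on an empty queue.
                context.yield();
                return;
            }
            // With no incoming connection the processor acts as a pure source;
            // with an incoming FlowFile, properties can be evaluated against
            // its attributes. Real work (the HDFS lookup and emitting results)
            // is omitted in this sketch.
            if (incoming != null) {
                session.remove(incoming);
            }
        }
    }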
