[GitHub] nifi pull request: NIFI-856 Implements experimental ListenLumberja...

2016-05-30 Thread apiri
Github user apiri commented on the pull request:

https://github.com/apache/nifi/pull/290#issuecomment-222586092
  
Hey @trixpan,

Scoped out your new changes and left a couple of comments.  Changes look good; 
I just had a question on whether the SSL Context should be made required, and 
there is, I believe, an unused message demarcator field.  Let me know your 
thoughts on those and we can get this incorporated and closed out.
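
Illustrative only, not the PR's actual code: in a NiFi processor, making a property mandatory normally comes down to the required(true) flag on the PropertyDescriptor builder. A minimal sketch (the name and description strings here are made up):

    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.ssl.SSLContextService;

    public class SslPropertySketch {
        // Hypothetical descriptor illustrating the "required" question above;
        // flipping required(true) to required(false) keeps the property optional.
        public static final PropertyDescriptor SSL_CONTEXT_SERVICE = new PropertyDescriptor.Builder()
                .name("SSL Context Service")
                .description("The SSL Context Service used to provide certificates for TLS connections.")
                .required(true)
                .identifiesControllerService(SSLContextService.class)
                .build();
    }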

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request: NIFI-856 Implements experimental ListenLumberja...

2016-05-30 Thread apiri
Github user apiri commented on a diff in the pull request:

https://github.com/apache/nifi/pull/290#discussion_r65120353
  
--- Diff: nifi-nar-bundles/nifi-lumberjack-bundle/nifi-lumberjack-processors/src/main/java/org/apache/nifi/processors/lumberjack/ListenLumberjack.java ---
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.lumberjack;
+
+import com.google.gson.Gson;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.SeeAlso;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.flowfile.attributes.FlowFileAttributeKey;
+import org.apache.nifi.processor.DataUnit;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.util.listen.AbstractListenEventBatchingProcessor;
+import org.apache.nifi.processor.util.listen.dispatcher.AsyncChannelDispatcher;
+import org.apache.nifi.processor.util.listen.dispatcher.ChannelDispatcher;
+import org.apache.nifi.processor.util.listen.dispatcher.SocketChannelDispatcher;
+import org.apache.nifi.processor.util.listen.event.EventFactory;
+import org.apache.nifi.processor.util.listen.handler.ChannelHandlerFactory;
+import org.apache.nifi.processor.util.listen.response.ChannelResponder;
+import org.apache.nifi.processor.util.listen.response.ChannelResponse;
+import org.apache.nifi.processors.lumberjack.event.LumberjackEvent;
+import org.apache.nifi.processors.lumberjack.event.LumberjackEventFactory;
+import org.apache.nifi.processors.lumberjack.frame.LumberjackEncoder;
+import org.apache.nifi.processors.lumberjack.handler.LumberjackSocketChannelHandlerFactory;
+import org.apache.nifi.processors.lumberjack.response.LumberjackChannelResponse;
+import org.apache.nifi.processors.lumberjack.response.LumberjackResponse;
+import org.apache.nifi.ssl.SSLContextService;
+
+import javax.net.ssl.SSLContext;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Collection;
+import java.util.ArrayList;
+import java.util.concurrent.BlockingQueue;
+
+
+@InputRequirement(InputRequirement.Requirement.INPUT_FORBIDDEN)
+@Tags({"listen", "lumberjack", "tcp", "logs"})
+@CapabilityDescription("Listens for Lumberjack messages being sent to a given port over TCP. Each message will be " +
+"acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain data " +
+"portion of one or more Lumberjack frames. In the case where the Lumberjack frames contain syslog messages, the " +
+"output of this processor can be sent to a ParseSyslog processor for further processing.")
+@WritesAttributes({
+@WritesAttribute(attribute="lumberjack.sender", description="The sending host of the messages."),
+@WritesAttribute(attribute="lumberjack.port", description="The sending port the messages were received over."),
+@WritesAttribute(attribute="lumberjack.sequencenumber", description="The sequence number of the message. Only included if  is 1."),
+@WritesAttribute(attribute="lumberjack.*", description="The keys and respective values as sent by the lumberjack producer. Only included if  is 1."),
+@WritesAttribute(attribute="

[GitHub] nifi pull request: NIFI-856 Implements experimental ListenLumberja...

2016-05-30 Thread apiri
Github user apiri commented on a diff in the pull request:

https://github.com/apache/nifi/pull/290#discussion_r65119862
  
--- Diff: nifi-nar-bundles/nifi-lumberjack-bundle/nifi-lumberjack-processors/src/main/java/org/apache/nifi/processors/lumberjack/ListenLumberjack.java ---

[GitHub] nifi pull request: NIFI-1754 Rollback log messages should include ...

2016-05-30 Thread jskora
Github user jskora closed the pull request at:

https://github.com/apache/nifi/pull/430




[GitHub] nifi pull request: NIFI-1754 Rollback log messages should include ...

2016-05-30 Thread jskora
Github user jskora commented on the pull request:

https://github.com/apache/nifi/pull/430#issuecomment-222580134
  
Closing this as it was replaced by [Pull Request 478](https://github.com/apache/nifi/pull/478).




[GitHub] nifi pull request: NIFI-1754 Rollback log messages should include ...

2016-05-30 Thread jskora
GitHub user jskora opened a pull request:

https://github.com/apache/nifi/pull/478

NIFI-1754 Rollback log messages should include the flowfile filename …

…and UUID to assist in flow management.  Incorporates debug logging into 
StandardProcessSession.rollback() to list the FlowFile records retrieved from 
the session.
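
A hypothetical sketch of the kind of per-record debug logging described above (not the actual patch; the helper and variable names are invented, though FlowFile and CoreAttributes are standard NiFi APIs):

    import java.util.Collection;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.flowfile.attributes.CoreAttributes;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    class RollbackLoggingSketch {
        private static final Logger LOG = LoggerFactory.getLogger(RollbackLoggingSketch.class);

        // Hypothetical helper: log the filename and UUID of every record being rolled
        // back so the flow manager can tell which FlowFiles were returned to their queues.
        static void logRollback(final Collection<FlowFile> recordsToRollback) {
            if (LOG.isDebugEnabled()) {
                for (final FlowFile flowFile : recordsToRollback) {
                    LOG.debug("Rolling back FlowFile filename={}, uuid={}",
                            flowFile.getAttribute(CoreAttributes.FILENAME.key()),
                            flowFile.getAttribute(CoreAttributes.UUID.key()));
                }
            }
        }
    }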

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jskora/nifi NIFI-1754-V3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/478.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #478


commit d1838dc410a3e15e0f85a5cc1385112b0135e861
Author: Joe Skora 
Date:   2016-05-25T19:51:02Z

NIFI-1754 Rollback log messages should include the flowfile filename and 
UUID to assist in flow management.  Incorporates debug logging into 
StandardProcessSession.rollback() to list the FlowFile records retrieved from 
the session.






[GitHub] nifi pull request: NIFI-1663: Add ConvertAvroToORC processor

2016-05-30 Thread mattyb149
GitHub user mattyb149 opened a pull request:

https://github.com/apache/nifi/pull/477

NIFI-1663: Add ConvertAvroToORC processor

There is a test template here: 
https://gist.github.com/mattyb149/3644803e8e0642346cf02b89ef62b411

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mattyb149/nifi NIFI-1663

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #477


commit 9e55151346a015b17713976637f10ee4cc90e6ce
Author: Matt Burgess 
Date:   2016-05-31T02:03:35Z

NIFI-1663: Add ConvertAvroToORC processor






[GitHub] nifi pull request: NIFI-1942 Processor to validate CSV against use...

2016-05-30 Thread pvillard31
GitHub user pvillard31 opened a pull request:

https://github.com/apache/nifi/pull/476

NIFI-1942 Processor to validate CSV against user-supplied schema

This processor is designed to validate a CSV-formatted FlowFile against a 
user-supplied schema.

It leverages Cell Processors from the super-csv library and gives the following 
options to define the expected schema (see the sketch after this list):

- ParseBigDecimal
- ParseBool
- ParseChar
- ParseDate
- ParseDouble
- ParseInt
- ParseLong
- Optional
- DMinMax
- Equals
- ForbidSubStr
- LMinMax
- NotNull
- Null
- RequireHashCode
- RequireSubStr
- Strlen
- StrMinMax
- StrNotNullOrEmpty
- StrRegEx
- Unique
- UniqueHashCode

Nested cell processors are not supported except with Optional.
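
For readers unfamiliar with super-csv, here is a minimal standalone sketch of how a chain of Cell Processors validates and parses rows. It is not the processor's code; the column names, date format and sample data are made up:

    import java.io.StringReader;
    import java.util.List;

    import org.supercsv.cellprocessor.Optional;
    import org.supercsv.cellprocessor.ParseDate;
    import org.supercsv.cellprocessor.ParseInt;
    import org.supercsv.cellprocessor.constraint.StrNotNullOrEmpty;
    import org.supercsv.cellprocessor.ift.CellProcessor;
    import org.supercsv.io.CsvListReader;
    import org.supercsv.prefs.CsvPreference;

    public class CellProcessorSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical schema: name (non-empty string), age (integer),
            // birthDate (optional, yyyy-MM-dd). Nesting is used only with Optional.
            final CellProcessor[] processors = new CellProcessor[] {
                    new StrNotNullOrEmpty(),
                    new ParseInt(),
                    new Optional(new ParseDate("yyyy-MM-dd"))
            };

            final String csv = "name,age,birthDate\nJohn,42,1974-01-31\nJane,37,\n";
            try (CsvListReader reader = new CsvListReader(new StringReader(csv), CsvPreference.STANDARD_PREFERENCE)) {
                reader.getHeader(true); // skip the header line
                List<Object> row;
                while ((row = reader.read(processors)) != null) {
                    // Each cell has been parsed/validated by its processor; an invalid
                    // cell would raise a SuperCsvCellProcessorException instead.
                    System.out.println(row);
                }
            }
        }
    }

Presumably the processor builds an array along these lines from the user-supplied schema, with each schema token mapping onto one of the options listed above.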

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pvillard31/nifi validate-csv

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #476


commit 30b8ad2e3dddbad07ce39e54db78f9426aed6001
Author: Pierre Villard 
Date:   2016-05-27T08:05:16Z

NIFI-1942 Processor to validate CSV against user-supplied schema






[GitHub] nifi pull request: NIFI-1908 Added encoding-version attribute to T...

2016-05-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/471




[GitHub] nifi pull request: NIFI-1908 Added encoding-version attribute to T...

2016-05-30 Thread mcgilman
Github user mcgilman commented on the pull request:

https://github.com/apache/nifi/pull/471#issuecomment-222539446
  
Other changes look good! Will merge to master. 

During my testing of a 1.x template in a 0.x instance I discovered a 
separate issue: templates from a 1.x instance currently do not include the 
child group contents. I've created a new JIRA to address this issue [1]. 
Testing with a sub-group-less 1.x template worked fine in a 0.x instance.

[1] https://issues.apache.org/jira/browse/NIFI-1941




[GitHub] nifi pull request: NIFI-1908 Added encoding-version attribute to T...

2016-05-30 Thread mcgilman
Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/471#discussion_r65097776
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/StandardNiFiServiceFacade.java ---
@@ -1354,55 +1349,55 @@ private void validateSnippetContents(final FlowSnippetDTO flow) {
 @Override
 public FlowEntity copySnippet(final String groupId, final String snippetId, final Double originX, final Double originY, final String idGenerationSeed) {
 final FlowDTO flowDto = revisionManager.get(groupId,
-rev -> {
-// create the new snippet
-final FlowSnippetDTO snippet = snippetDAO.copySnippet(groupId, snippetId, originX, originY, idGenerationSeed);
-
-// validate the new snippet
-validateSnippetContents(snippet);
+rev -> {
+// create the new snippet
+final FlowSnippetDTO snippet = snippetDAO.copySnippet(groupId, snippetId, originX, originY, idGenerationSeed);
 
-// save the flow
-controllerFacade.save();
-
-// drop the snippet
-snippetDAO.dropSnippet(snippetId);
+// validate the new snippet
+validateSnippetContents(snippet);
 
-// identify all components added
-final Set identifiers = new HashSet<>();
-snippet.getProcessors().stream()
-.map(proc -> proc.getId())
-.forEach(id -> identifiers.add(id));
-snippet.getConnections().stream()
-.map(conn -> conn.getId())
-.forEach(id -> identifiers.add(id));
-snippet.getInputPorts().stream()
-.map(port -> port.getId())
-.forEach(id -> identifiers.add(id));
-snippet.getOutputPorts().stream()
-.map(port -> port.getId())
-.forEach(id -> identifiers.add(id));
-snippet.getProcessGroups().stream()
-.map(group -> group.getId())
-.forEach(id -> identifiers.add(id));
-snippet.getRemoteProcessGroups().stream()
-.map(remoteGroup -> remoteGroup.getId())
-.forEach(id -> identifiers.add(id));
-snippet.getRemoteProcessGroups().stream()
-.flatMap(remoteGroup -> remoteGroup.getContents().getInputPorts().stream())
-.map(remoteInputPort -> remoteInputPort.getId())
-.forEach(id -> identifiers.add(id));
-snippet.getRemoteProcessGroups().stream()
-.flatMap(remoteGroup -> remoteGroup.getContents().getOutputPorts().stream())
-.map(remoteOutputPort -> remoteOutputPort.getId())
-.forEach(id -> identifiers.add(id));
+// save the flow
+controllerFacade.save();
 
-final ProcessGroup processGroup = processGroupDAO.getProcessGroup(groupId);
-return revisionManager.get(identifiers,
-() -> {
-final ProcessGroupStatus groupStatus = controllerFacade.getProcessGroupStatus(groupId);
-return dtoFactory.createFlowDto(processGroup, groupStatus, snippet, revisionManager);
-});
-});
+// drop the snippet
+snippetDAO.dropSnippet(snippetId);
+
+// identify all components added
+final Set identifiers = new HashSet<>();
+snippet.getProcessors().stream()
+.map(proc -> proc.getId())
+.forEach(id -> identifiers.add(id));
+snippet.getConnections().stream()
+.map(conn -> conn.getId())
+.forEach(id -> identifiers.add(id));
+snippet.getInputPorts().stream()
+.map(port -> port.getId())
+.forEach(id -> identifiers.add(id));
+snippet.getOutputPorts().stream()
+.map(port -> port.getId())
+.forEach(id -> identifiers.add(id));
+snippet.getProcessGroups().stream()
+.map(group -> group.getId())
+.forEach(id -> identifiers.add(id));

[GitHub] nifi pull request: NIFI-1908 Added encoding-version attribute to T...

2016-05-30 Thread mcgilman
Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/471#discussion_r65096361
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/StandardNiFiServiceFacade.java ---

[GitHub] nifi pull request: NIFI-1908 Added encoding-version attribute to T...

2016-05-30 Thread jtstorck
Github user jtstorck commented on a diff in the pull request:

https://github.com/apache/nifi/pull/471#discussion_r65096222
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/StandardNiFiServiceFacade.java ---

[GitHub] nifi pull request: NIFI-1908 Added encoding-version attribute to T...

2016-05-30 Thread mcgilman
Github user mcgilman commented on a diff in the pull request:

https://github.com/apache/nifi/pull/471#discussion_r65095238
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/StandardNiFiServiceFacade.java ---

[GitHub] nifi pull request: NIFI-1850 - Initial Commit for JSON-to-JSON Sch...

2016-05-30 Thread mcgilman
Github user mcgilman commented on the pull request:

https://github.com/apache/nifi/pull/424#issuecomment-222531466
  
@YolandaMDavis I'm having some issues saving the JSON specification in the 
Custom UI while running in a clustered instance. When I click save, the value 
for the specification is reset to the previous value and I do not get any 
message indicating an issue.




Re: Use VFS with (Put|List|.*)HDFS

2016-05-30 Thread Andre
Bryan, Matt

Thanks for the input, much appreciated.

Perhaps then the simplest option may be to give the user the option to
compile against one particular vendor, while ensuring the default project
build points to the Apache-licensed code.

This way the user is given the following choices:

- Primary platform is open-source/HDP/CDH: the cluster can be accessed via
native HDFS, or WebHDFS can be used to access the 3rd-party
implementations[*].

- Primary platform is MapR: build against the MapR JARs, which gives access
via MapR-FS, and hopefully plain HDFS still works (depending on the protocol
version supported by the MapR JARs). If nothing works, WebHDFS should come
to the rescue[*].

- Primary platform is something else: use WebHDFS or, if possible, add
another opt-in profile to the pom.

I am not sure whether licensing would prevent the profile from being
included, but I have submitted an initial suggestion of what the profile may
look like anyhow.

What do you think?


[*] WebHdfs functionality looks promising (see NIFI-1924) but is yet to be
confirmed.
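
For what it's worth, the appeal of the VFS route discussed in this thread is that the URI scheme picks the provider, so the same processor code could in principle talk to hdfs://, maprfs:// or other back ends. A minimal Commons VFS sketch of that idea (the URI is hypothetical, and it assumes a matching provider, e.g. the Pentaho HDFS/MapR-FS one, is registered on the classpath):

    import org.apache.commons.vfs2.FileObject;
    import org.apache.commons.vfs2.FileSystemManager;
    import org.apache.commons.vfs2.VFS;

    public class VfsListingSketch {
        public static void main(String[] args) throws Exception {
            // The scheme portion of the URI decides which registered provider handles it.
            final FileSystemManager fsManager = VFS.getManager();
            final FileObject dir = fsManager.resolveFile("hdfs://namenode.example.com:8020/data/incoming");
            for (final FileObject child : dir.getChildren()) {
                System.out.println(child.getName().getBaseName()
                        + " (" + child.getContent().getSize() + " bytes)");
            }
        }
    }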


[GitHub] nifi pull request: - Add Maven profile to compile nifi-hadoop-libr...

2016-05-30 Thread trixpan
GitHub user trixpan opened a pull request:

https://github.com/apache/nifi/pull/475

- Add Maven profile to compile nifi-hadoop-libraries-nar using MapR jars

  * Create a Maven profile to allow users to choose to use MapR 
Hadoop-compatible JARs when building NiFi.

This should cause no changes to the default behaviour, just eliminate the 
need to clone, modify, build and copy NAR bundles over the standard NiFi 
artifacts.

Unless the profile is explicitly requested, the build will still include 
just the Apache-licensed artifacts.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/trixpan/nifi MapR-Profile

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #475


commit e8663e2c094a32cb80eaaff604c91bef5af28450
Author: Andre F de Miranda 
Date:   2016-05-30T13:59:50Z

* Create a Maven profile to allow users to choose to
  use MapR Hadoop-compatible JARs when building NiFi






Re: Use VFS with (Put|List|.*)HDFS

2016-05-30 Thread Bryan Rosander
Hey all,

The pentaho-hdfs-vfs artifact is in the process of being deprecated and 
superceded by vfs providers in their big data plugin.

They wouldn't be sufficient for outside use either way as they depend on the 
big data plugin, Kettle, and the correct shim (see Matt's comment) to work.

A more generic provider with some way to swap dependencies and implementations 
could be a way forward.

This could be a lot simpler than the above if it only cared about providing vfs 
access to hdfs/maprfs.

Thanks,
Bryan


Re: Use VFS with (Put|List|.*)HDFS

2016-05-30 Thread Matt Burgess
As a former Pentaho employee, I can add some details around this:

- Pentaho does/did have a fork of Apache VFS. Mostly it was the
application of bug fixes from the 2.x line against the 1.x codebase. Since
then they have contributed fixes from their branch back to the 2.x
fork, and when I left they were hoping for an upcoming release of
Apache VFS to include their fixes (so they didn't have to depend on a
local or SNAPSHOT build). IIRC the Apache VFS project was not under
(very) active development at the time. Perhaps with fixes for Accumulo
this has changed.

- Pentaho's HDFS and MapRFS providers are not part of the VFS
fork/project, nor do I (or did we) believe they should be. The
dependencies for these providers are pretty big, and specific to the
Hadoop ecosystem. Having them as separate drop-in modules is a better
idea IMO. Also, the Hadoop JARs are "provided" dependencies of these
modules (so Hadoop is not included), this is so the same VFS provider
can (hopefully) be used against different versions of Hadoop. I guess
in that sense they could be added to Apache VFS proper, but then it's
on the VFS project to handle the other issues (see below).

- Including the HDFS or MapRFS provider along with Apache VFS is not
enough to make it work. Pentaho had a classloading configuration where
they would bundle all the necessary code, JARs, configs, etc. into
what they call a "shim". They have shims for multiple versions of
multiple distributions of Hadoop (HDP, CDH, MapR, etc.).  To support
this in NiFi, we would need some mechanism to get the right JARs from
the right places.  If NiFi is on a cluster node, then the correct
Hadoop libraries are available already. In any case, it probably would
only work as a "Bring Your Own Hadoop". Currently NiFi includes a
vanilla version of Hadoop such that it can exist outside the Hadoop
cluster and still use client libraries/protocols to move data into
Hadoop. With the advent of the Extension Registry, the Bring Your Own
Hadoop solution becomes more viable.

- Even with a Bring Your Own Hadoop solution, we'd need to isolate
that functionality behind a service for the purpose of allowing
multiple Hadoop vendors, versions, etc.  That would allow such things
as a migration from one vendor's Hadoop to another. The alternatives
include WebHDFS and HttpFS as you mentioned (if they are enabled on
the system), and possibly Knox and/or Falcon depending on your use
case. As of now, Pentaho still only supports one Vendor/Version of
Hadoop at a time, though this has been logged for improvement
(http://jira.pentaho.com/browse/PDI-8121).

- MapR has a somewhat different architecture than many other vendors.
For example, they have a native client the user must install on each
node intending to communicate with Hadoop. The JARs are installed in a
specific location and thus NiFi would need a way for the user to point
to that location. This is hopefully just a configuration issue (i.e.
where to point the Bring Your Own Hadoop solution), but I believe they
also have a different security model, so I imagine there will be some
MapR-specific processing to be done around Kerberos, e.g.

There is a NiFi Jira that covers multiple Hadoop versions
(https://issues.apache.org/jira/browse/NIFI-710), hopefully good
discussion like this will inform that case (and end up in the
comments).

Regards,
Matt

P.S. Mr. Rosander :) If you're reading this, I'd love to get your
comments on this thread as well, thanks!

On Sun, May 29, 2016 at 10:41 AM, Jim Hughes  wrote:
> Hi Andre,
>
> Your plan seems reasonable to me.  The shortest path to verifying it might
> be to drop in the pentaho-hdfs-vfs artifacts and remove any conflicting VFS
> providers (or just their config (1)) from the Apache VFS jars.
>
> Some of the recent effort in VFS has been to address bugs which were
> relevant to Apache Accumulo.  From hearing about that, it sounds like VFS
> may have a little smaller team.
>
> That said, it might be worth asking the Pentaho folks 1) if they could
> contribute their project to VFS and 2) how they leverage it. They might have
> some guidance about how to use their project as a replacement for the HDFS
> parts of VFS.
>
> Good luck!
>
> Jim
>
> 1.  I'd peek at files like this:
> https://github.com/pentaho/pentaho-hdfs-vfs/blob/master/res/META-INF/vfs-providers.xml.
>
> On 5/29/2016 8:10 AM, Andre wrote:
>>
>> All,
>>
>> Not sure how many other MapR users are effectively using NiFi (I only know
>> two others) but as you may remember from old threads, integrating some
>> different flavours of HDFS-compatible APIs can sometimes be puzzling and
>> can require recompilation of bundles.
>>
>> However, recompilation doesn't solve scenarios where, for whatever reason,
>> a user may want to use more than one HDFS provider (e.g. MapR + HDP, or
>> Isilon + MapR) and the HDFS versions are distinct (e.g.
>>
>> While WebHDFS and HttpFs are good palliative solutions to some of this
>> issue, they have their own limitations, the more st