[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-09-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496607#comment-15496607
 ] 

ASF subversion and git services commented on NIFI-1942:
---

Commit d838f61291d2582592754a37314911b701c6891b in nifi's branch 
refs/heads/master from [~pvillard]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=d838f61 ]

NIFI-1942 Processor to validate CSV against user-supplied schema

This closes #476

Signed-off-by: jpercivall 


> Create a processor to validate CSV against a user-supplied schema
> -
>
> Key: NIFI-1942
> URL: https://issues.apache.org/jira/browse/NIFI-1942
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Minor
> Attachments: ValidateCSV.xml
>
>
> In order to extend the set of "quality control" processors, it would be 
> interesting to have a processor that validates CSV-formatted flow files against a 
> user-specified schema.
> Flow files that validate against the schema would be routed to the "valid" 
> relationship, whereas flow files that do not would be routed to the "invalid" 
> relationship.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-09-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496612#comment-15496612
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/476




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-09-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496603#comment-15496603
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/476
  
+1

Visually verified code and did a contrib check build. In a standalone 
instance ran multiple different schemas through ValidateCSV testing each of the 
properties, all worked as expected. Thanks @pvillard31, I will merge it in.




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-09-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494469#comment-15494469
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/476
  
Thanks @JPercivall !
I made the modification in the ``OnScheduled`` method and it is now working 
as expected. I also took the liberty of squashing my commits. Let me know if there 
is anything else.




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-09-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493968#comment-15493968
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/476
  
@pvillard31, I know what the problem with the end-line characters is. When 
going from the UI to Java, the characters are escaped so that what you input is 
transferred over to Java as is. So when you type the characters "\" and "n" 
into the UI, the Java string will end up being those two characters, *not* the 
interpreted value "\n".

There's been some discussion about this before and about how we need to make some 
change, but it hasn't been a top priority. For now, what is done is something 
like what is done here[1]: the default value is escaped, and then it is interpreted 
in the OnScheduled[2] method or in a separate method[3].

[1] 
https://github.com/apache/nifi/blob/1373bf672586ba5ddcfa697c45c832ccc79425cb/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/AbstractListenEventBatchingProcessor.java#L61-L61
[2] 
https://github.com/apache/nifi/blob/1373bf672586ba5ddcfa697c45c832ccc79425cb/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/AbstractListenEventBatchingProcessor.java#L97-L97
[3] 
https://github.com/apache/nifi/blob/cd846c8d627efb2606f72b6af009358dec27be63/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/put/AbstractPutEventProcessor.java#L566-L566
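
The interpretation step described above can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical and are not taken from the linked NiFi sources, and the set of supported escapes (\n, \r, \t and \\) is an assumption.

```java
// Hypothetical helper, not taken from the NiFi code base.
final class EscapeSketch {

    /** Turns the two-character sequences \n, \r, \t and \\ into the characters they denote. */
    static String interpret(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                switch (next) {
                    case 'n': out.append('\n'); i++; continue;
                    case 'r': out.append('\r'); i++; continue;
                    case 't': out.append('\t'); i++; continue;
                    case '\\': out.append('\\'); i++; continue;
                    default: break; // unknown escape: keep the backslash as typed
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String typedInUi = "\\n";                  // what the property holds: backslash + 'n'
        String interpreted = interpret(typedInUi); // a single line-feed character
        System.out.println(typedInUi.length());    // prints 2
        System.out.println(interpreted.length());  // prints 1
    }
}
```

In a processor, a call like this would presumably run once when the processor is scheduled, so that the per-FlowFile code only ever sees the interpreted value.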




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414387#comment-15414387
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/476
  
Hey @JPercivall, I think I've addressed most of your comments:
- changed log level
- fixed the decrement (it was not related to the header but rather to the 'finally' 
in the 'while' loop)
- strategy naming and description
- fixed first line handling and added a unit test for that and to confirm 
Unique() behavior when validating line by line
- changed exception catching

What remains is what you observed when displaying the content of flow files after 
they have been processed by the processor. I've reproduced your observation but I didn't 
find any explanation. By any chance, do you know if there is some specific 
encoding related to the UI display? (I remember some discussions regarding how 
the carriage return is processed (shift + enter instead of \n) when used in a 
property in some processors.) If you have ideas, let me know!

Thanks.




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-08-03 Thread Joseph Percivall (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405986#comment-15405986
 ] 

Joseph Percivall commented on NIFI-1942:


Hey [~pvillard], it seems work has stagnated a bit. I'm gonna remove the fix 
version of "1.0.0". When this gets merged in we can add the proper Fix Version. 
Let me know if you have any questions.



[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383284#comment-15383284
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/476
  
@pvillard31, sorry for not realizing this earlier, but a great improvement 
that could be made is to route like RouteText does: it can either route 
the whole FlowFile when it fails or route each line to the respective 
valid/invalid destination, where invalid rows would have an attribute detailing 
what went wrong.

I believe all that would need to be done is to create FlowFiles with each 
reading of the row and, if it fails, add it to the invalid list (along with 
adding an attribute to it).
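
The line-by-line idea can be sketched roughly as follows. This is a plain-Java illustration under stated assumptions: ordinary lists stand in for FlowFiles and relationships, the `reason` field stands in for the suggested attribute, and `checkRow` is a made-up rule (two columns, the second an integer) rather than anything from the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustration only: plain collections stand in for FlowFiles and relationships.
final class LineRoutingSketch {

    /** A rejected row together with the reason it was rejected. */
    static final class InvalidRow {
        final String row;
        final String reason;
        InvalidRow(String row, String reason) { this.row = row; this.reason = reason; }
    }

    /** Rows that passed, and rows that failed along with their reasons. */
    static final class Result {
        final List<String> valid = new ArrayList<>();
        final List<InvalidRow> invalid = new ArrayList<>();
    }

    /** The validator returns null for an acceptable row, otherwise an error message. */
    static Result route(List<String> rows, Function<String, String> validator) {
        Result result = new Result();
        for (String row : rows) {
            String error = validator.apply(row);
            if (error == null) {
                result.valid.add(row);
            } else {
                result.invalid.add(new InvalidRow(row, error));
            }
        }
        return result;
    }

    /** Hypothetical rule: exactly two columns, the second one an integer. */
    static String checkRow(String row) {
        String[] cells = row.split(",", -1);
        if (cells.length != 2) {
            return "expected 2 columns but found " + cells.length;
        }
        try {
            Integer.parseInt(cells[1].trim());
        } catch (NumberFormatException e) {
            return "'" + cells[1] + "' is not an integer";
        }
        return null;
    }

    public static void main(String[] args) {
        Result r = route(Arrays.asList("a,1", "b,x", "c"), LineRoutingSketch::checkRow);
        System.out.println("valid rows: " + r.valid);
        for (InvalidRow bad : r.invalid) {
            System.out.println("invalid row '" + bad.row + "': " + bad.reason);
        }
    }
}
```

Routing the whole FlowFile instead would amount to stopping at the first error and sending the original content to a single destination.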


> Create a processor to validate CSV against a user-supplied schema
> -
>
> Key: NIFI-1942
> URL: https://issues.apache.org/jira/browse/NIFI-1942
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Minor
> Fix For: 1.0.0
>
> Attachments: ValidateCSV.xml
>
>
> In order to extend the set of "quality control" processors, it would be 
> interesting to have a processor validating CSV formatted flow files against a 
> user-specified schema.
> Flow file validated against schema would be routed to "valid" relationship 
> although flow file not validated against schema would be routed to "invalid" 
> relationship.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383255#comment-15383255
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r71247145
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
@@ -408,9 +407,12 @@ public void process(final InputStream in) throws IOException {
 listReader.read();
 }
 while(listReader.read(cellProcs) != null) {}
-} catch (final IOException | SuperCsvCellProcessorException e) {
+} catch (final IOException e) {
 valid.set(false);
 logger.error("Failed to validate {} against schema due to {}", new Object[]{flowFile}, e);
+} catch (final SuperCsvCellProcessorException e) {
+valid.set(false);
+logger.info("Failed to validate {} against schema due to {}; routing to 'invalid'", new Object[]{flowFile}, e);
--- End diff --

Actually some form of this should be made into an attribute to be added to 
invalid FlowFiles. That way users can know what is actually invalid. 




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383208#comment-15383208
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r71243836
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
@@ -408,9 +407,12 @@ public void process(final InputStream in) throws IOException {
 listReader.read();
 }
 while(listReader.read(cellProcs) != null) {}
-} catch (final IOException | SuperCsvCellProcessorException e) {
+} catch (final IOException e) {
 valid.set(false);
 logger.error("Failed to validate {} against schema due to {}", new Object[]{flowFile}, e);
+} catch (final SuperCsvCellProcessorException e) {
+valid.set(false);
+logger.info("Failed to validate {} against schema due to {}; routing to 'invalid'", new Object[]{flowFile}, e);
--- End diff --

This should be debug, not info: it will occur as part of normal 
operation and is only really relevant for debugging.




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381508#comment-15381508
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/476
  
I'm trying to use a couple of the cell processors listed in the accompanying doc 
you link to[1], but some of them don't seem to be available 
(notably "IsIncludedIn"[2]). I was trying to make sure a column had a value in a 
set of strings ("male" or "female"). 

Is there a reason for not including all the available cell processors?

[1] http://super-csv.github.io/super-csv/cell_processors.html
[2] 
http://super-csv.github.io/super-csv/apidocs/org/supercsv/cellprocessor/constraint/IsIncludedIn.html
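
For reference, the kind of check being asked for (a cell whose value must belong to a fixed set of strings) can be sketched in plain Java. The class below is hypothetical and does not use Super CSV; exact, case-sensitive matching and rejection of null cells are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a "value must be in this set" constraint; not Super CSV code.
final class IncludedInSketch {

    private final Set<String> allowed;

    IncludedInSketch(String... allowedValues) {
        this.allowed = new HashSet<>(Arrays.asList(allowedValues));
    }

    /** True when the cell is non-null and exactly equal to one of the allowed values. */
    boolean accepts(String cell) {
        return cell != null && allowed.contains(cell);
    }

    public static void main(String[] args) {
        IncludedInSketch gender = new IncludedInSketch("male", "female");
        System.out.println(gender.accepts("female")); // prints true
        System.out.println(gender.accepts("other"));  // prints false
    }
}
```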




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381503#comment-15381503
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r71088596
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
 ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.standard;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.supercsv.cellprocessor.Optional;
+import org.supercsv.cellprocessor.ParseBigDecimal;
+import org.supercsv.cellprocessor.ParseBool;
+import org.supercsv.cellprocessor.ParseChar;
+import org.supercsv.cellprocessor.ParseDate;
+import org.supercsv.cellprocessor.ParseDouble;
+import org.supercsv.cellprocessor.ParseInt;
+import org.supercsv.cellprocessor.ParseLong;
+import org.supercsv.cellprocessor.constraint.DMinMax;
+import org.supercsv.cellprocessor.constraint.Equals;
+import org.supercsv.cellprocessor.constraint.ForbidSubStr;
+import org.supercsv.cellprocessor.constraint.LMinMax;
+import org.supercsv.cellprocessor.constraint.NotNull;
+import org.supercsv.cellprocessor.constraint.RequireHashCode;
+import org.supercsv.cellprocessor.constraint.RequireSubStr;
+import org.supercsv.cellprocessor.constraint.StrMinMax;
+import org.supercsv.cellprocessor.constraint.StrNotNullOrEmpty;
+import org.supercsv.cellprocessor.constraint.StrRegEx;
+import org.supercsv.cellprocessor.constraint.Strlen;
+import org.supercsv.cellprocessor.constraint.Unique;
+import org.supercsv.cellprocessor.constraint.UniqueHashCode;
+import org.supercsv.cellprocessor.ift.CellProcessor;
+import org.supercsv.exception.SuperCsvCellProcessorException;
+import org.supercsv.io.CsvListReader;
+import org.supercsv.prefs.CsvPreference;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"csv", "schema", "validation"})
+@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema. " +
+"Take a look at the additional documentation of this processor for some schema examples.")
+public class ValidateCsv extends AbstractProcessor {
+
+private final static List allowedOperators = 

[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381482#comment-15381482
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r71088067
  
--- Diff: nifi-assembly/NOTICE ---
@@ -193,6 +193,8 @@ The following binary components are provided under the Apache Software License v
 
   (ASLv2) opencsv (net.sf.opencsv:opencsv:2.3)
 
+  (ASLv2) Super CSV (net.sf.supercsv:super-csv:2.4.0)
--- End diff --

This is an Apache 2.0 licensed import with no NOTICE, so it can be used without 
adding anything to the NOTICE or LICENSE file. This applies to both this assembly 
NOTICE and the nar NOTICE.




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-16 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380828#comment-15380828
 ] 

Joseph Witt commented on NIFI-1942:
---

[~JPercivall] [~pvillard] since this is a new extension, please consider removing 
this from 1.0 and assigning the fix version once it is ready.



[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376183#comment-15376183
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/476
  
@pvillard31 I left a couple comments. Do you by chance have a template 
and/or example csv data I can use to validate the processor?




[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375884#comment-15375884
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r70719085
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
 ---
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.standard;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.supercsv.cellprocessor.Optional;
+import org.supercsv.cellprocessor.ParseBigDecimal;
+import org.supercsv.cellprocessor.ParseBool;
+import org.supercsv.cellprocessor.ParseChar;
+import org.supercsv.cellprocessor.ParseDate;
+import org.supercsv.cellprocessor.ParseDouble;
+import org.supercsv.cellprocessor.ParseInt;
+import org.supercsv.cellprocessor.ParseLong;
+import org.supercsv.cellprocessor.constraint.DMinMax;
+import org.supercsv.cellprocessor.constraint.Equals;
+import org.supercsv.cellprocessor.constraint.ForbidSubStr;
+import org.supercsv.cellprocessor.constraint.LMinMax;
+import org.supercsv.cellprocessor.constraint.NotNull;
+import org.supercsv.cellprocessor.constraint.RequireHashCode;
+import org.supercsv.cellprocessor.constraint.RequireSubStr;
+import org.supercsv.cellprocessor.constraint.StrMinMax;
+import org.supercsv.cellprocessor.constraint.StrNotNullOrEmpty;
+import org.supercsv.cellprocessor.constraint.StrRegEx;
+import org.supercsv.cellprocessor.constraint.Strlen;
+import org.supercsv.cellprocessor.constraint.Unique;
+import org.supercsv.cellprocessor.constraint.UniqueHashCode;
+import org.supercsv.cellprocessor.ift.CellProcessor;
+import org.supercsv.exception.SuperCsvCellProcessorException;
+import org.supercsv.io.CsvListReader;
+import org.supercsv.prefs.CsvPreference;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"csv", "schema", "validation"})
+@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema")
+public class ValidateCsv extends AbstractProcessor {
+
+private final static List allowedOperators = 
Arrays.asList("ParseBigDecimal", "ParseBool", "ParseChar", "ParseDate",
+"ParseDouble", 

[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375880#comment-15375880
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r70718795
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
 ---
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.standard;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.supercsv.cellprocessor.Optional;
+import org.supercsv.cellprocessor.ParseBigDecimal;
+import org.supercsv.cellprocessor.ParseBool;
+import org.supercsv.cellprocessor.ParseChar;
+import org.supercsv.cellprocessor.ParseDate;
+import org.supercsv.cellprocessor.ParseDouble;
+import org.supercsv.cellprocessor.ParseInt;
+import org.supercsv.cellprocessor.ParseLong;
+import org.supercsv.cellprocessor.constraint.DMinMax;
+import org.supercsv.cellprocessor.constraint.Equals;
+import org.supercsv.cellprocessor.constraint.ForbidSubStr;
+import org.supercsv.cellprocessor.constraint.LMinMax;
+import org.supercsv.cellprocessor.constraint.NotNull;
+import org.supercsv.cellprocessor.constraint.RequireHashCode;
+import org.supercsv.cellprocessor.constraint.RequireSubStr;
+import org.supercsv.cellprocessor.constraint.StrMinMax;
+import org.supercsv.cellprocessor.constraint.StrNotNullOrEmpty;
+import org.supercsv.cellprocessor.constraint.StrRegEx;
+import org.supercsv.cellprocessor.constraint.Strlen;
+import org.supercsv.cellprocessor.constraint.Unique;
+import org.supercsv.cellprocessor.constraint.UniqueHashCode;
+import org.supercsv.cellprocessor.ift.CellProcessor;
+import org.supercsv.exception.SuperCsvCellProcessorException;
+import org.supercsv.io.CsvListReader;
+import org.supercsv.prefs.CsvPreference;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"csv", "schema", "validation"})
+@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema")
+public class ValidateCsv extends AbstractProcessor {
+
+private final static List<String> allowedOperators = Arrays.asList("ParseBigDecimal", "ParseBool", "ParseChar", "ParseDate",
+"ParseDouble", 

[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375873#comment-15375873
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r70718276
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
 ---
@@ -0,0 +1,408 @@

[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375857#comment-15375857
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r70716978
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
 ---
@@ -0,0 +1,408 @@

[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375849#comment-15375849
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r70716478
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
 ---
@@ -0,0 +1,408 @@

[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375832#comment-15375832
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r70715716
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
 ---
@@ -0,0 +1,408 @@
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"csv", "schema", "validation"})
+@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema")
--- End diff --

Ah duh, I should have just read the additional docs portion. A comment here 
reminding users to check out the additional details would suffice.


> Create a processor to validate CSV against a 

[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375823#comment-15375823
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/476#discussion_r70715212
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
 ---
@@ -0,0 +1,408 @@
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"csv", "schema", "validation"})
+@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema")
--- End diff --

Is there an example CSV schema or other documentation we can point to 
here? I know nothing about CSV validation, and if I were given the task of 
adding and configuring this processor I would be 

[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375479#comment-15375479
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/476
  
Hey @JPercivall, that should be OK now, thanks!


> Create a processor to validate CSV against a user-supplied schema
> -
>
> Key: NIFI-1942
> URL: https://issues.apache.org/jira/browse/NIFI-1942
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Minor
> Fix For: 1.0.0
>
>
> In order to extend the set of "quality control" processors, it would be 
> useful to have a processor that validates CSV-formatted flow files against a 
> user-specified schema.
> Flow files that validate against the schema would be routed to the "valid" 
> relationship, while flow files that do not would be routed to the "invalid" 
> relationship.
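
The routing described in the issue can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the pull request: it does not use Super CSV or the NiFi API, and the class name CsvSchemaSketch, the three column checks, and EXAMPLE_SCHEMA are hypothetical names chosen for the example. The checks are only loosely modelled on the cell processors the diff imports (ParseInt, ParseBool, StrNotNullOrEmpty).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class CsvSchemaSketch {

    // Hypothetical per-column checks; each returns true when the cell is acceptable.
    static final Predicate<String> PARSE_INT = cell -> {
        try {
            Integer.parseInt(cell.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    };

    static final Predicate<String> PARSE_BOOL = cell -> {
        String c = cell.trim();
        return c.equalsIgnoreCase("true") || c.equalsIgnoreCase("false");
    };

    static final Predicate<String> NOT_EMPTY = cell -> !cell.trim().isEmpty();

    // Example schema (an assumption for this sketch): one check per column, in column order.
    static final List<Predicate<String>> EXAMPLE_SCHEMA =
            Arrays.asList(PARSE_INT, NOT_EMPTY, PARSE_BOOL);

    /** Returns "valid" if every non-blank line satisfies the schema, otherwise "invalid". */
    static String route(String csv, List<Predicate<String>> schema) {
        for (String line : csv.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Naive split: quoted fields and embedded commas are not handled.
            String[] cells = line.split(",", -1);
            if (cells.length != schema.size()) {
                return "invalid";
            }
            for (int i = 0; i < cells.length; i++) {
                if (!schema.get(i).test(cells[i])) {
                    return "invalid";
                }
            }
        }
        return "valid";
    }

    public static void main(String[] args) {
        System.out.println(route("1,alice,true\n2,bob,false\n", EXAMPLE_SCHEMA)); // prints "valid"
        System.out.println(route("1,alice,true\nx,bob,false\n", EXAMPLE_SCHEMA)); // prints "invalid"
    }
}
```

The sketch treats the whole input as one unit, so a single failing cell sends the entire content to "invalid", which matches the per-flow-file routing the issue describes.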





[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375308#comment-15375308
 ] 

ASF GitHub Bot commented on NIFI-1942:
--

Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/476
  
Hey @pvillard31, looks like this has merge conflicts. Could you address 
them?

