[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496607#comment-15496607 ] ASF subversion and git services commented on NIFI-1942: --- Commit d838f61291d2582592754a37314911b701c6891b in nifi's branch refs/heads/master from [~pvillard] [ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=d838f61 ] NIFI-1942 Processor to validate CSV against user-supplied schema This closes #476 Signed-off-by: jpercivall> Create a processor to validate CSV against a user-supplied schema > - > > Key: NIFI-1942 > URL: https://issues.apache.org/jira/browse/NIFI-1942 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Pierre Villard >Assignee: Pierre Villard >Priority: Minor > Attachments: ValidateCSV.xml > > > In order to extend the set of "quality control" processors, it would be > interesting to have a processor validating CSV formatted flow files against a > user-specified schema. > Flow file validated against schema would be routed to "valid" relationship > although flow file not validated against schema would be routed to "invalid" > relationship. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496612#comment-15496612 ] ASF GitHub Bot commented on NIFI-1942: -- Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/476
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496603#comment-15496603 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/476 +1 Visually verified the code and did a contrib check build. In a standalone instance, I ran several different schemas through ValidateCSV, testing each of the properties; all worked as expected. Thanks @pvillard31, I will merge it in.
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494469#comment-15494469 ] ASF GitHub Bot commented on NIFI-1942: -- Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/476 Thanks @JPercivall! I made the modification in the ``OnScheduled`` method and it is now working as expected. I also took the liberty of squashing my commits. Let me know if there is anything else.
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493968#comment-15493968 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/476 @pvillard31, I know what the problem with the end-line characters is. When going from the UI to Java, the characters are escaped so that what you input is transferred over to Java as is. So when you type the characters "\" and "n" into the UI, the Java string will end up being those two characters, *not* the interpreted value "\n". There's been some discussion about it before and about how we need to make some change, but it hasn't been a top priority. For now, the workaround is something like what is done here [1]: the default value is escaped, and it is then interpreted in the OnScheduled method [2] or in a separate method [3].
[1] https://github.com/apache/nifi/blob/1373bf672586ba5ddcfa697c45c832ccc79425cb/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/AbstractListenEventBatchingProcessor.java#L61-L61
[2] https://github.com/apache/nifi/blob/1373bf672586ba5ddcfa697c45c832ccc79425cb/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/AbstractListenEventBatchingProcessor.java#L97-L97
[3] https://github.com/apache/nifi/blob/cd846c8d627efb2606f72b6af009358dec27be63/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/put/AbstractPutEventProcessor.java#L566-L566
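The escaping behaviour described in that comment can be illustrated with a small, self-contained sketch. This is not the code behind the links above; `PropertyEscapes` and `interpret` are hypothetical names chosen for illustration, and the set of recognised sequences (`\n`, `\r`, `\t`, `\\`) is an assumption. It only shows the general idea of turning the literal two-character sequence a user typed into the single control character it stands for, as a processor might do once when it is scheduled.

```java
import java.util.Objects;

/** Hypothetical helper (not part of NiFi): interprets escape sequences typed literally into a property value. */
final class PropertyEscapes {

    private PropertyEscapes() {}

    /**
     * Replaces two-character sequences such as backslash + 'n' with the single
     * character they denote. Unknown sequences and a trailing lone backslash
     * are kept exactly as typed.
     */
    static String interpret(final String raw) {
        Objects.requireNonNull(raw, "raw");
        final StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            final char c = raw.charAt(i);
            if (c != '\\' || i + 1 == raw.length()) {
                out.append(c); // ordinary character, or a backslash with nothing after it
                continue;
            }
            final char next = raw.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case '\\': out.append('\\'); break;
                default: out.append('\\').append(next); // leave unknown escapes untouched
            }
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        final String fromUi = "\\r\\n"; // four characters: backslash, r, backslash, n
        final String interpreted = interpret(fromUi);
        System.out.println(fromUi.length());      // length of the value as typed
        System.out.println(interpreted.length()); // length after interpretation
    }
}
```

Doing the interpretation once, when the processor is scheduled, rather than for every FlowFile keeps the per-FlowFile path free of string manipulation; whether that fits a given processor is a design choice, not something this sketch settles.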
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414387#comment-15414387 ] ASF GitHub Bot commented on NIFI-1942: -- Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/476 Hey @JPercivall, I think I've addressed most of your comments:
- changed log level
- fixed the decrement (it was not related to the header but rather to the 'finally' in the 'while' loop)
- strategy naming and description
- fixed first line handling and added a unit test for that and to confirm Unique() behavior when validating line by line
- changed exception catching
What remains is what you observed when displaying the content of flow files after they have been processed by the processor. I've reproduced your observation but I didn't find any explanation. By any chance, do you know if there is some specific encoding related to the UI display? (I remember some discussions regarding how the carriage return is processed (shift + enter instead of \n) when used in a property in some processors.) If you have ideas, let me know! Thanks.
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405986#comment-15405986 ] Joseph Percivall commented on NIFI-1942: Hey [~pvillard], it seems work has stagnated a bit. I'm gonna remove the fix version of "1.0.0". When this gets merged in we can add the proper Fix Version. Let me know if you have any questions.
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383284#comment-15383284 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/476 @pvillard31, sorry for not realizing this earlier, but a great improvement that could be made is to route like RouteText does: it can either route the whole FlowFile when it fails, or route each line to the respective valid/invalid destination, with invalid rows carrying an attribute detailing what went wrong. I believe all that would need to be done is to create FlowFiles as each row is read and, if a row fails, add it to the invalid list (along with adding an attribute to it).
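The line-by-line routing idea in that comment can be sketched without any NiFi or Super CSV classes. Everything below is hypothetical and for illustration only: `LineByLineRouting`, `InvalidRow`, `Result`, `route`, and the toy rule `secondColumnIsInt` are assumed names, plain strings stand in for FlowFiles, and the `reason` field stands in for the attribute that would describe what went wrong.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch: split rows into "valid" and "invalid", keeping a reason for each rejected row. */
final class LineByLineRouting {

    /** A rejected row together with the reason it failed (stand-in for a FlowFile attribute). */
    static final class InvalidRow {
        final String row;
        final String reason;

        InvalidRow(final String row, final String reason) {
            this.row = row;
            this.reason = reason;
        }
    }

    /** Rows destined for each relationship. */
    static final class Result {
        final List<String> valid = new ArrayList<>();
        final List<InvalidRow> invalid = new ArrayList<>();
    }

    /** Applies the check to every row; an empty Optional means the row passed. */
    static Result route(final List<String> rows, final Function<String, Optional<String>> check) {
        final Result result = new Result();
        for (final String row : rows) {
            final Optional<String> problem = check.apply(row);
            if (problem.isPresent()) {
                result.invalid.add(new InvalidRow(row, problem.get()));
            } else {
                result.valid.add(row);
            }
        }
        return result;
    }

    /** Toy rule used only for this example: the second column must parse as an integer. */
    static Optional<String> secondColumnIsInt(final String row) {
        final String[] cells = row.split(",", -1);
        if (cells.length < 2) {
            return Optional.of("expected at least 2 columns, found " + cells.length);
        }
        try {
            Integer.parseInt(cells[1].trim());
            return Optional.empty();
        } catch (final NumberFormatException e) {
            return Optional.of("'" + cells[1] + "' is not an integer");
        }
    }

    public static void main(final String[] args) {
        final Result r = route(Arrays.asList("John,22", "Bob,abc"), LineByLineRouting::secondColumnIsInt);
        System.out.println("valid rows: " + r.valid);
        for (final InvalidRow bad : r.invalid) {
            System.out.println("invalid row: " + bad.row + " (" + bad.reason + ")");
        }
    }
}
```

Note that a naive split on commas ignores quoting, and that stateful checks such as uniqueness would need to see rows in order; a real implementation would rely on a CSV reader rather than `String.split`.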
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383255#comment-15383255 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on a diff in the pull request: https://github.com/apache/nifi/pull/476#discussion_r71247145
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
@@ -408,9 +407,12 @@ public void process(final InputStream in) throws IOException {
         listReader.read();
     }
     while(listReader.read(cellProcs) != null) {}
-} catch (final IOException | SuperCsvCellProcessorException e) {
+} catch (final IOException e) {
     valid.set(false);
     logger.error("Failed to validate {} against schema due to {}", new Object[]{flowFile}, e);
+} catch (final SuperCsvCellProcessorException e) {
+    valid.set(false);
+    logger.info("Failed to validate {} against schema due to {}; routing to 'invalid'", new Object[]{flowFile}, e);
--- End diff --
Actually, some form of this message should be made into an attribute added to invalid FlowFiles. That way users can know what is actually invalid.
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383208#comment-15383208 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on a diff in the pull request: https://github.com/apache/nifi/pull/476#discussion_r71243836
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
@@ -408,9 +407,12 @@ public void process(final InputStream in) throws IOException {
         listReader.read();
     }
     while(listReader.read(cellProcs) != null) {}
-} catch (final IOException | SuperCsvCellProcessorException e) {
+} catch (final IOException e) {
     valid.set(false);
     logger.error("Failed to validate {} against schema due to {}", new Object[]{flowFile}, e);
+} catch (final SuperCsvCellProcessorException e) {
+    valid.set(false);
+    logger.info("Failed to validate {} against schema due to {}; routing to 'invalid'", new Object[]{flowFile}, e);
--- End diff --
This should be debug, not info: it will occur as part of normal operation and is only really relevant for debugging.
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381508#comment-15381508 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/476 I'm trying to use a couple of the cell processors listed in the accompanying doc you link to [1], but there seem to be some that aren't available (notably "IsIncludedIn" [2]). I was trying to make sure a column had a value in a set of strings ("male" or "female"). Is there a reason for not including all the available cell processors? [1] http://super-csv.github.io/super-csv/cell_processors.html [2] http://super-csv.github.io/super-csv/apidocs/org/supercsv/cellprocessor/constraint/IsIncludedIn.html
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381503#comment-15381503 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on a diff in the pull request: https://github.com/apache/nifi/pull/476#discussion_r71088596
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
@@ -0,0 +1,433 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.standard;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.supercsv.cellprocessor.Optional;
+import org.supercsv.cellprocessor.ParseBigDecimal;
+import org.supercsv.cellprocessor.ParseBool;
+import org.supercsv.cellprocessor.ParseChar;
+import org.supercsv.cellprocessor.ParseDate;
+import org.supercsv.cellprocessor.ParseDouble;
+import org.supercsv.cellprocessor.ParseInt;
+import org.supercsv.cellprocessor.ParseLong;
+import org.supercsv.cellprocessor.constraint.DMinMax;
+import org.supercsv.cellprocessor.constraint.Equals;
+import org.supercsv.cellprocessor.constraint.ForbidSubStr;
+import org.supercsv.cellprocessor.constraint.LMinMax;
+import org.supercsv.cellprocessor.constraint.NotNull;
+import org.supercsv.cellprocessor.constraint.RequireHashCode;
+import org.supercsv.cellprocessor.constraint.RequireSubStr;
+import org.supercsv.cellprocessor.constraint.StrMinMax;
+import org.supercsv.cellprocessor.constraint.StrNotNullOrEmpty;
+import org.supercsv.cellprocessor.constraint.StrRegEx;
+import org.supercsv.cellprocessor.constraint.Strlen;
+import org.supercsv.cellprocessor.constraint.Unique;
+import org.supercsv.cellprocessor.constraint.UniqueHashCode;
+import org.supercsv.cellprocessor.ift.CellProcessor;
+import org.supercsv.exception.SuperCsvCellProcessorException;
+import org.supercsv.io.CsvListReader;
+import org.supercsv.prefs.CsvPreference;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"csv", "schema", "validation"})
+@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema. " +
+        "Take a look at the additional documentation of this processor for some schema examples.")
+public class ValidateCsv extends AbstractProcessor {
+
+    private final static List allowedOperators =
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381482#comment-15381482 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on a diff in the pull request: https://github.com/apache/nifi/pull/476#discussion_r71088067
--- Diff: nifi-assembly/NOTICE ---
@@ -193,6 +193,8 @@ The following binary components are provided under the Apache Software License v
 (ASLv2) opencsv (net.sf.opencsv:opencsv:2.3)
+ (ASLv2) Super CSV (net.sf.supercsv:super-csv:2.4.0)
--- End diff --
This is an Apache 2.0 licensed import with no NOTICE; it can be used without adding anything to the NOTICE or LICENSE file. This applies to this assembly NOTICE and the nar NOTICE.
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380828#comment-15380828 ] Joseph Witt commented on NIFI-1942: --- [~JPercivall] [~pvillard] since this is a new extension, please consider removing this from 1.0 and assigning the fix version once it is ready.
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376183#comment-15376183 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/476 @pvillard31 I left a couple comments. Do you by chance have a template and/or example csv data I can use to validate the processor?
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375884#comment-15375884 ] ASF GitHub Bot commented on NIFI-1942: -- Github user JPercivall commented on a diff in the pull request: https://github.com/apache/nifi/pull/476#discussion_r70719085
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.standard;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.supercsv.cellprocessor.Optional;
+import org.supercsv.cellprocessor.ParseBigDecimal;
+import org.supercsv.cellprocessor.ParseBool;
+import org.supercsv.cellprocessor.ParseChar;
+import org.supercsv.cellprocessor.ParseDate;
+import org.supercsv.cellprocessor.ParseDouble;
+import org.supercsv.cellprocessor.ParseInt;
+import org.supercsv.cellprocessor.ParseLong;
+import org.supercsv.cellprocessor.constraint.DMinMax;
+import org.supercsv.cellprocessor.constraint.Equals;
+import org.supercsv.cellprocessor.constraint.ForbidSubStr;
+import org.supercsv.cellprocessor.constraint.LMinMax;
+import org.supercsv.cellprocessor.constraint.NotNull;
+import org.supercsv.cellprocessor.constraint.RequireHashCode;
+import org.supercsv.cellprocessor.constraint.RequireSubStr;
+import org.supercsv.cellprocessor.constraint.StrMinMax;
+import org.supercsv.cellprocessor.constraint.StrNotNullOrEmpty;
+import org.supercsv.cellprocessor.constraint.StrRegEx;
+import org.supercsv.cellprocessor.constraint.Strlen;
+import org.supercsv.cellprocessor.constraint.Unique;
+import org.supercsv.cellprocessor.constraint.UniqueHashCode;
+import org.supercsv.cellprocessor.ift.CellProcessor;
+import org.supercsv.exception.SuperCsvCellProcessorException;
+import org.supercsv.io.CsvListReader;
+import org.supercsv.prefs.CsvPreference;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"csv", "schema", "validation"})
+@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema")
+public class ValidateCsv extends AbstractProcessor {
+
+    private final static List allowedOperators = Arrays.asList("ParseBigDecimal", "ParseBool", "ParseChar", "ParseDate",
+            "ParseDouble",
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375880#comment-15375880 ]

ASF GitHub Bot commented on NIFI-1942:
--------------------------------------

Github user JPercivall commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/476#discussion_r70718795

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
    @@ -0,0 +1,408 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.InputStreamReader;
    +import java.util.ArrayList;
    +import java.util.Arrays;
    +import java.util.Collection;
    +import java.util.Collections;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.concurrent.atomic.AtomicReference;
    +
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.SideEffectFree;
    +import org.apache.nifi.annotation.behavior.SupportsBatching;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.components.ValidationContext;
    +import org.apache.nifi.components.ValidationResult;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.supercsv.cellprocessor.Optional;
    +import org.supercsv.cellprocessor.ParseBigDecimal;
    +import org.supercsv.cellprocessor.ParseBool;
    +import org.supercsv.cellprocessor.ParseChar;
    +import org.supercsv.cellprocessor.ParseDate;
    +import org.supercsv.cellprocessor.ParseDouble;
    +import org.supercsv.cellprocessor.ParseInt;
    +import org.supercsv.cellprocessor.ParseLong;
    +import org.supercsv.cellprocessor.constraint.DMinMax;
    +import org.supercsv.cellprocessor.constraint.Equals;
    +import org.supercsv.cellprocessor.constraint.ForbidSubStr;
    +import org.supercsv.cellprocessor.constraint.LMinMax;
    +import org.supercsv.cellprocessor.constraint.NotNull;
    +import org.supercsv.cellprocessor.constraint.RequireHashCode;
    +import org.supercsv.cellprocessor.constraint.RequireSubStr;
    +import org.supercsv.cellprocessor.constraint.StrMinMax;
    +import org.supercsv.cellprocessor.constraint.StrNotNullOrEmpty;
    +import org.supercsv.cellprocessor.constraint.StrRegEx;
    +import org.supercsv.cellprocessor.constraint.Strlen;
    +import org.supercsv.cellprocessor.constraint.Unique;
    +import org.supercsv.cellprocessor.constraint.UniqueHashCode;
    +import org.supercsv.cellprocessor.ift.CellProcessor;
    +import org.supercsv.exception.SuperCsvCellProcessorException;
    +import org.supercsv.io.CsvListReader;
    +import org.supercsv.prefs.CsvPreference;
    +
    +@EventDriven
    +@SideEffectFree
    +@SupportsBatching
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"csv", "schema", "validation"})
    +@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema")
    +public class ValidateCsv extends AbstractProcessor {
    +
    +    private final static List allowedOperators = Arrays.asList("ParseBigDecimal", "ParseBool", "ParseChar", "ParseDate",
    +            "ParseDouble",
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375873#comment-15375873 ]

ASF GitHub Bot commented on NIFI-1942:
--------------------------------------

Github user JPercivall commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/476#discussion_r70718276

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
    @@ -0,0 +1,408 @@
    [...]
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375857#comment-15375857 ]

ASF GitHub Bot commented on NIFI-1942:
--------------------------------------

Github user JPercivall commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/476#discussion_r70716978

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
    @@ -0,0 +1,408 @@
    [...]
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375849#comment-15375849 ]

ASF GitHub Bot commented on NIFI-1942:
--------------------------------------

Github user JPercivall commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/476#discussion_r70716478

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
    @@ -0,0 +1,408 @@
    [...]
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375832#comment-15375832 ]

ASF GitHub Bot commented on NIFI-1942:
--------------------------------------

Github user JPercivall commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/476#discussion_r70715716

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
    @@ -0,0 +1,408 @@
    [...]
    +@EventDriven
    +@SideEffectFree
    +@SupportsBatching
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"csv", "schema", "validation"})
    +@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema")
    --- End diff --

    Ah duh, I should have just read the additional docs portion. A comment here reminding users to check out the additional details would suffice.

> Create a processor to validate CSV against a
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375823#comment-15375823 ]

ASF GitHub Bot commented on NIFI-1942:
--------------------------------------

Github user JPercivall commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/476#discussion_r70715212

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java ---
    @@ -0,0 +1,408 @@
    [...]
    +@EventDriven
    +@SideEffectFree
    +@SupportsBatching
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"csv", "schema", "validation"})
    +@CapabilityDescription("Validates the contents of FlowFiles against a user-specified CSV schema")
    --- End diff --

    Is there some sort of example CSV schema or other documentation we can point to here? I know nothing about CSV validation and if I were given the task to add/configure this processor I would be
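For readers with the same question as the review comment above, here is a rough sketch of what a column-by-column CSV schema can look like. It is an illustration only and makes several assumptions: the class and method names (CsvSchemaSketch, operator, parseSchema, route) are hypothetical, the three operator names are borrowed from the imports in the quoted diff but given simplified stand-in semantics, and the code uses neither Super CSV nor the processor's actual schema syntax. Cells are split naively on commas, so quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CsvSchemaSketch {

    // Hypothetical, simplified stand-ins for three operator names that
    // appear in the diff's imports. These are NOT the Super CSV classes.
    static Predicate<String> operator(String name) {
        switch (name.trim()) {
            case "ParseInt":
                return cell -> {
                    try {
                        Integer.parseInt(cell);
                        return true;
                    } catch (NumberFormatException e) {
                        return false;
                    }
                };
            case "ParseBool":
                return cell -> cell.equalsIgnoreCase("true") || cell.equalsIgnoreCase("false");
            case "StrNotNullOrEmpty":
                return cell -> !cell.isEmpty();
            default:
                throw new IllegalArgumentException("Unknown operator: " + name);
        }
    }

    // Assumed schema format: one operator name per column, comma-separated.
    static List<Predicate<String>> parseSchema(String schema) {
        List<Predicate<String>> checks = new ArrayList<>();
        for (String name : schema.split(",")) {
            checks.add(operator(name));
        }
        return checks;
    }

    // Returns "valid" if every line has one cell per schema column and every
    // cell passes its column's check; otherwise returns "invalid".
    static String route(String csv, String schema) {
        List<Predicate<String>> checks = parseSchema(schema);
        for (String line : csv.split("\n")) {
            String[] cells = line.split(",", -1); // keep trailing empty cells
            if (cells.length != checks.size()) {
                return "invalid";
            }
            for (int i = 0; i < cells.length; i++) {
                if (!checks.get(i).test(cells[i])) {
                    return "invalid";
                }
            }
        }
        return "valid";
    }

    public static void main(String[] args) {
        String schema = "ParseInt,StrNotNullOrEmpty,ParseBool";
        System.out.println(route("1,alice,true\n2,bob,false", schema)); // prints "valid"
        System.out.println(route("1,alice,true\nx,bob,false", schema)); // prints "invalid"
    }
}
```

The two return values mirror the "valid" and "invalid" relationships named in the issue description: a single failing cell, or a line with the wrong number of columns, is enough to send the whole input to "invalid".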
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375479#comment-15375479 ]

ASF GitHub Bot commented on NIFI-1942:
--------------------------------------

Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/476

    Hey @JPercivall, that should be OK now, thanks!

> Create a processor to validate CSV against a user-supplied schema
> -----------------------------------------------------------------
>
>                 Key: NIFI-1942
>                 URL: https://issues.apache.org/jira/browse/NIFI-1942
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>            Priority: Minor
>             Fix For: 1.0.0
>
>
> In order to extend the set of "quality control" processors, it would be
> interesting to have a processor validating CSV formatted flow files against a
> user-specified schema.
> Flow file validated against schema would be routed to "valid" relationship
> although flow file not validated against schema would be routed to "invalid"
> relationship.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (NIFI-1942) Create a processor to validate CSV against a user-supplied schema
[ https://issues.apache.org/jira/browse/NIFI-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375308#comment-15375308 ]

ASF GitHub Bot commented on NIFI-1942:
--------------------------------------

Github user JPercivall commented on the issue:

    https://github.com/apache/nifi/pull/476

    Hey @pvillard31, looks like this has merge conflicts. Could you address them?

> Create a processor to validate CSV against a user-supplied schema
> -----------------------------------------------------------------
>
>                 Key: NIFI-1942
>                 URL: https://issues.apache.org/jira/browse/NIFI-1942
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>            Priority: Minor
>             Fix For: 1.0.0
>
>
> In order to extend the set of "quality control" processors, it would be
> interesting to have a processor validating CSV formatted flow files against a
> user-specified schema.
> Flow file validated against schema would be routed to "valid" relationship
> although flow file not validated against schema would be routed to "invalid"
> relationship.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)