[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448686#comment-16448686
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2587


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
> Fix For: 1.7.0
>
> Attachments: 0001-NIFI-4185-Minor-tweaks.patch
>
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448684#comment-16448684
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2587
  
This is now merged to master. Thanks again for the contribution!


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
> Fix For: 1.7.0
>
> Attachments: 0001-NIFI-4185-Minor-tweaks.patch
>
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448682#comment-16448682
 ] 

ASF subversion and git services commented on NIFI-4185:
---

Commit d21bd3870b1edd107bc6ef12bc5f12820a10bebb in nifi's branch 
refs/heads/master from [~jope]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=d21bd38 ]

NIFI-4185: Add XML Record Reader


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
> Attachments: 0001-NIFI-4185-Minor-tweaks.patch
>
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448683#comment-16448683
 ] 

ASF subversion and git services commented on NIFI-4185:
---

Commit 0e736f59fdc29db471cecb4c7a885bf378e9ae71 in nifi's branch 
refs/heads/master from [~markap14]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=0e736f5 ]

NIFI-4185: Minor tweaks to XML Record Reader, around documentation and error 
handling

This closes #2587.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
> Attachments: 0001-NIFI-4185-Minor-tweaks.patch
>
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448663#comment-16448663
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@JohannesDaniel that would be great! I think the Record attribute stuff 
will significantly improve how we are able to handle XML-based records. But I 
think the approach that you've taken here is a great first step and will be 
very beneficial to the community. I'd recommend going ahead with the XML Writer 
and then we can work in the attribute-related stuff later.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
> Attachments: 0001-NIFI-4185-Minor-tweaks.patch
>
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448620#comment-16448620
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r183489279
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -84,6 +84,10 @@ public XMLRecordReader(InputStream in, RecordSchema 
schema, String rootName, Str
 
 try {
 final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
--- End diff --

@tballison Thank you for the advice! I refactored this in a way that only 
the local part is considered. 
:)


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
> Attachments: 0001-NIFI-4185-Minor-tweaks.patch
>
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448612#comment-16448612
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@markap14 thanks for the help with the patch file. I am fine with it :)

If you have not planned to do that yourself, I would start implementing a 
XMLWriter as soon as this has been merged. Or should we first discuss your plan 
with the Record attributes you mentioned above?


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
> Attachments: 0001-NIFI-4185-Minor-tweaks.patch
>
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448494#comment-16448494
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@JohannesDaniel this is great! I've been doing a good bit of testing to 
ensure that everything works as expected. I had just a few more comments, 
mostly around the descriptions in the property descriptor and the handling of 
invalid values for the "Expect Records as Array" property. To make it a little 
easier to understand, I'm just going to attach a PATCH file to the JIRA. Can 
you please look over the patch file? If you're okay with the patch file, then 
I'm ready to merge. Thanks again, you've put in a lot of work here to make this 
consistent with the current approaches and to provide a huge new capability for 
many users!


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448129#comment-16448129
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user tballison commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r183390424
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -84,6 +84,10 @@ public XMLRecordReader(InputStream in, RecordSchema 
schema, String rootName, Str
 
 try {
 final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
--- End diff --

Might want to avoid XEE vulnerability via improved configuration of 
XMLInputFactory?

https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#XMLInputFactory_.28a_StAX_parser.29


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447249#comment-16447249
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@markap14 
- Added EL for record format property
- Removed record tag validation


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446364#comment-16446364
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@markap14 thank you for the response. 

I will simply remove that record tag validation as there are indeed many 
ways to do that before the data is processed by this reader. 

There is one little corner case, I need to discuss: 
Assuming we have the following data:
```
value1...
```
If the reader is used with (coerce==true), the field "map_field" can be 
parsed by defining a map in the schema. The embedded key fields do not have to 
be defined, its values only have to be of the defined type for the map. 

If the reader is used with (coerce==false && dropUnknown==true), the reader 
will parse all fields that exist in the schema ignoring its type. However, the 
data above will not be parsable even if the map exists in the schema. In this 
case, the reader identifies "map_field" as a field that exists in the schema, 
but the reader is not aware that it is of type map. Therefore, the reader will 
not parse the embedded key fields, as they don't exist in the schema. The field 
"map_field" will be classified as an empty field and not added to the record.

Furthermore, even if the reader is used with (coerce==false && 
dropUnknown==true), it will be type-aware to some extent. The reader first 
checks, whether fields exist in the schema. If that is the case, the reader 
additionally will check whether they are of type record (or of type array 
embedding records, respectively). If that is also the case, the reader will 
retrieve the subschema in order to be enabled to check whether subtags of the 
current tag are known. 



> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446291#comment-16446291
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@JohannesDaniel thanks for the update! I commented above re: the use of 
Expression Language in the property descriptor.
I do still feel like the check for 'record tag names' is unnecessary, as 
the reader should not be responsible for filtering the data but rather just for 
reading it. There already exist mechanisms for filtering the data (You could 
use PartitionRecord + RouteOnAttribute, ValidateRecord, or QueryRecord just off 
the top of my head to achieve this). Additionally, we have the Schema for the 
Record Reader. So if the element name matches the top-level Schema name (or one 
of them, if the top-level field is a UNION/CHOICE element), then we could use 
that. So, with your example above, if you only want to read the 
`` part, your schema should indicate that the top-level field 
name is `record`. In that case, it should filter out the `other` record. Does 
that make sense? 


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446275#comment-16446275
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r183152900
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
--- End diff --

Yes, exactly. I would use a property to convey this. I would be okay with 
allowing Expression Language to be used, or just allowing for a 'true'/'false' 
without Expression Language (I think in most cases, you'll want one or the 
other, not dependent upon each individual FlowFile). But if you think EL is 
important then I won't argue that point :)
One other option, which we do in a few different processors, would be to 
offer a third option that looks at a well-known attribute. So you could choose 
'true' (treat outer element as a wrapper), 'false' (treat each flowfile as a 
single record), or 'use xml.stream attribute', and when that is selected, the 
'xml.stream' attribute would be looked at to determine how to handle it - a 
value of 'true' would mean it's a stream of multiple records, 'false' would 
mean it's only 1 record, missing or any other value would throw an Exception. I 
don't have  strong preference one way or another how this should be handled, 
but wanted to present options that we typically use.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444908#comment-16444908
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@markap14 @pvillard31 
- I refactored some code as the cases (coerce==true && drop==false) and 
(coerce==false && drop==true) in some cases showed an unexpected behavior
- Data like contentcontentcontent now can be 
parsed
- Maps (e. g. 
value1value2) are now supported
- The reader is now able to parse single records (e. g. 
) as well as arrays of records (e. g. 
). I added a property to make it configurable 
whether the reader shall expect a single record or an array. One question: As 
there are only two options for this, I defined AllowableValues for this 
property. Despite that, I think it would be reasonable to enable EL for this 
property. But how can this be realized?
- I removed the root validation, but remained the check for record tag 
names in order to support processing data like this 
 (tag "other" will be ignored if check 
for record tag name is activated)


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439880#comment-16439880
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181851731
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439106#comment-16439106
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181653767
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436305#comment-16436305
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181221789
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
+.name("validate_root_tag")
+.displayName("Validate Root Tag")
+.description("If this property is set, the name of root tags 
(e. g. ...) of incoming FlowFiles will be 
evaluated against this value. " +
+"In the case of a mismatch, an exception is thrown. 
The treatment of such FlowFiles depends on the implementation " +
+"of respective Processors.")
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+.expressionLanguageSupported(true)
+.required(false)
+.build();
+
+public static final PropertyDescriptor VALIDATE_RECORD_TAG = new 
PropertyDescriptor.Builder()
--- End diff --

(non-record shall be skipped)


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436293#comment-16436293
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@markap14 thank you for the comprehensive review. I will start refactoring 
the implementations with respect to the improvements that are clear.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436291#comment-16436291
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181218609
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436286#comment-16436286
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181218182
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436285#comment-16436285
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181217494
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436281#comment-16436281
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181215801
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
--- End diff --

ok, I will activate namespaces and implement some tests for this.


> Add XML record reader & writer services
> 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436279#comment-16436279
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181215337
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
--- End diff --

when I started implementing this reader, I was wondering, how the reader 
knows whether to parse wrapped records or a single record. unfortunately we 
dont have an unambiguous indicator like we have for json: [ vs. { 
I considered to make it configurable with EL whether the reader shall 
expect a single record or an array of records. what do you think?


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436269#comment-16436269
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181213403
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
+.name("validate_root_tag")
+.displayName("Validate Root Tag")
+.description("If this property is set, the name of root tags 
(e. g. ...) of incoming FlowFiles will be 
evaluated against this value. " +
+"In the case of a mismatch, an exception is thrown. 
The treatment of such FlowFiles depends on the implementation " +
+"of respective Processors.")
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+.expressionLanguageSupported(true)
+.required(false)
+.build();
+
+public static final PropertyDescriptor VALIDATE_RECORD_TAG = new 
PropertyDescriptor.Builder()
--- End diff --

my original intention actually was to enable users to parse recordsets like 
this
```

  ...
  ...
  ...


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436271#comment-16436271
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r181213473
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
+.name("validate_root_tag")
+.displayName("Validate Root Tag")
+.description("If this property is set, the name of root tags 
(e. g. ...) of incoming FlowFiles will be 
evaluated against this value. " +
+"In the case of a mismatch, an exception is thrown. 
The treatment of such FlowFiles depends on the implementation " +
+"of respective Processors.")
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+.expressionLanguageSupported(true)
+.required(false)
+.build();
+
+public static final PropertyDescriptor VALIDATE_RECORD_TAG = new 
PropertyDescriptor.Builder()
+.name("validate_record_tag")
+.displayName("Validate Record Tag")
+.description("If this property is set, the name of record tags 
(e. g. ...) of incoming FlowFiles will be 
evaluated against this value. " +
+"In the case of a mismatch, the respective record will 
be skipped. If this property is not set, each level 2 starting tag will be 
treated " +
+"as the beginning of a record.")
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+.expressionLanguageSupported(true)
+.required(false)
+.build();
+
+public static final PropertyDescriptor ATTRIBUTE_PREFIX = new 
PropertyDescriptor.Builder()
+.name("attribute_prefix")
+.displayName("Attribute Prefix")
+.description("If this property is set, the name of attributes 
will be appended by a prefix when they are added to a record.")
--- End diff --

ok


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
> 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428741#comment-16428741
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179838657
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428731#comment-16428731
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179831387
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428732#comment-16428732
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179830847
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428726#comment-16428726
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179819928
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
+.name("validate_root_tag")
+.displayName("Validate Root Tag")
+.description("If this property is set, the name of root tags 
(e. g. ...) of incoming FlowFiles will be 
evaluated against this value. " +
+"In the case of a mismatch, an exception is thrown. 
The treatment of such FlowFiles depends on the implementation " +
+"of respective Processors.")
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+.expressionLanguageSupported(true)
+.required(false)
+.build();
+
+public static final PropertyDescriptor VALIDATE_RECORD_TAG = new 
PropertyDescriptor.Builder()
--- End diff --

Likewise, I think we should remove this property and this sort of 
validation as well. If the user wants to validate some specific XML element 
names, the ValidateRecord processor is a great solution for that, and provides 
far more flexible validation via schema.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428745#comment-16428745
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179834679
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428735#comment-16428735
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179827839
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428736#comment-16428736
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179820052
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
+.name("validate_root_tag")
+.displayName("Validate Root Tag")
+.description("If this property is set, the name of root tags 
(e. g. ...) of incoming FlowFiles will be 
evaluated against this value. " +
+"In the case of a mismatch, an exception is thrown. 
The treatment of such FlowFiles depends on the implementation " +
+"of respective Processors.")
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+.expressionLanguageSupported(true)
+.required(false)
+.build();
+
+public static final PropertyDescriptor VALIDATE_RECORD_TAG = new 
PropertyDescriptor.Builder()
+.name("validate_record_tag")
+.displayName("Validate Record Tag")
+.description("If this property is set, the name of record tags 
(e. g. ...) of incoming FlowFiles will be 
evaluated against this value. " +
+"In the case of a mismatch, the respective record will 
be skipped. If this property is not set, each level 2 starting tag will be 
treated " +
+"as the beginning of a record.")
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+.expressionLanguageSupported(true)
+.required(false)
+.build();
+
+public static final PropertyDescriptor ATTRIBUTE_PREFIX = new 
PropertyDescriptor.Builder()
+.name("attribute_prefix")
+.displayName("Attribute Prefix")
+.description("If this property is set, the name of attributes 
will be appended by a prefix when they are added to a record.")
--- End diff --

I think this is supposed to say "prepended with a prefix"


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428730#comment-16428730
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179829486
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428737#comment-16428737
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179833060
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428729#comment-16428729
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179829341
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428725#comment-16428725
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179820952
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
--- End diff --

I think the requirement that XML data must be wrapped in some sort of 
wrapper is going to be problematic. While this will be a fairly common case, so 
that multiple XML elements can be combined into a single FlowFile, it is also 
going to be common (probably more common) that each XML element will be its own 
standalone Record. This is especially important if this Reader is used for 
something like ListenTCPRecord or ConsumeKafkaRecord, where the data is 
received from elsewhere so no processor has a chance to wrap the content prior 
to using the XML Reader. I think we need to support both ignoring the 
outer-most element as well as incorporating the outer-most element.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428733#comment-16428733
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179830412
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428738#comment-16428738
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179833900
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428742#comment-16428742
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179835305
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428740#comment-16428740
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179829981
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428728#comment-16428728
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179827156
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428727#comment-16428727
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179824910
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
--- End diff --

I think this approach may lead to some odd behaviors if the incoming XML is 
actually namespace aware. For example, if 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428743#comment-16428743
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179834908
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428734#comment-16428734
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179829864
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428739#comment-16428739
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179831931
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
+private StartElement currentRecordStartTag;
+
+private final XMLEventReader xmlEventReader;
+
+private final Supplier LAZY_DATE_FORMAT;
+private final Supplier LAZY_TIME_FORMAT;
+private final Supplier LAZY_TIMESTAMP_FORMAT;
+
+public XMLRecordReader(InputStream in, RecordSchema schema, String 
rootName, String recordName, String attributePrefix, String contentFieldName,
+   final String dateFormat, final String 
timeFormat, final String timestampFormat, final ComponentLog logger) throws 
MalformedRecordException {
+this.schema = schema;
+this.recordName = recordName;
+this.attributePrefix = attributePrefix;
+this.contentFieldName = contentFieldName;
+this.logger = logger;
+
+final DateFormat df = dateFormat == null ? null : 
DataTypeUtils.getDateFormat(dateFormat);
+final DateFormat tf = timeFormat == null ? null : 
DataTypeUtils.getDateFormat(timeFormat);
+final DateFormat tsf = timestampFormat == null ? null : 
DataTypeUtils.getDateFormat(timestampFormat);
+
+LAZY_DATE_FORMAT = () -> df;
+LAZY_TIME_FORMAT = () -> tf;
+LAZY_TIMESTAMP_FORMAT = () -> tsf;
+
+try {
+final XMLInputFactory xmlInputFactory = 
XMLInputFactory.newInstance();
+
+// Avoid namespace replacements
+
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
+
+xmlEventReader = xmlInputFactory.createXMLEventReader(in);
+final StartElement rootTag = getNextStartTag();
 

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428744#comment-16428744
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179840389
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
--- End diff --

More specifically, I think that if the following were the content of a 
FlowFile:
```

  John Doe
  123
  01/01/2017

```
Then I would expect to have this parse as a single Record that would match 
this schema:
```
{
  "name": "person", "namespace": "nifi",
  "type": "record",
  "fields": [
{ "name": "name", "type": "string" },
{ "name": "id", "type": "int" },
{ "name": "dob", "type": "date" }
  ]
}
```
Additionally, I would expect to be able to set a property that indicates 
that the outer-most XML element is simply a wrapper. If that property were set 
to "true", then I would expect to use that exact same schema to parse the 
following XML:
```

  
John Doe
123
01/01/2017
  
  
Jane Doe
124
01/01/2016
  
  
Jake Doe
125
01/01/2015
  

```
In this case, the 'people' element is just a wrapper and could just as 
easily be an element named 'root' or 'foo' or 'bar'.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428724#comment-16428724
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179822292
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLRecordReader.java
 ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.SimpleRecordSchema;
+import org.apache.nifi.serialization.record.DataType;
+import org.apache.nifi.serialization.record.MapRecord;
+import org.apache.nifi.serialization.record.Record;
+import org.apache.nifi.serialization.record.RecordField;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.serialization.record.type.ArrayDataType;
+import org.apache.nifi.serialization.record.type.RecordDataType;
+import org.apache.nifi.serialization.record.util.DataTypeUtils;
+
+import javax.xml.stream.XMLEventReader;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.events.Attribute;
+import javax.xml.stream.events.Characters;
+import javax.xml.stream.events.StartElement;
+import javax.xml.stream.events.XMLEvent;
+import java.io.IOException;
+import java.io.InputStream;
+import java.text.DateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public class XMLRecordReader implements RecordReader {
+
+private final ComponentLog logger;
+private final RecordSchema schema;
+private final String recordName;
+private final String attributePrefix;
+private final String contentFieldName;
+
+// thread safety required?
--- End diff --

Record Readers don't need to be thread-safe, only the RecordReaderFactory 
does.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428723#comment-16428723
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r179819209
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
--- End diff --

I am actually in favor of removing this property all together. In order to 
properly read the records, the Record Readers will need to validate syntax of 
the data, but I don't believe that it should be validating arbitrary semantic 
meanings. I.e., I don't think that we should be checking the name of the 
outer-most element for any specific name. 


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425771#comment-16425771
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2587
  
Hey @JohannesDaniel - just made new tests and can't reproduce what I saw 
previously... so let's forget about it :) (thanks for the unit tests though!)

I tested the EL support for tag validation, working well, thanks for the 
addition. I'd just make one suggestion: right now for the record tag, "in the 
case of a mismatch, the respective record will be skipped". I'd suggest to make 
this behavior configurable through a parameter in the CS: skip the record 
(current behavior) or throw an exception for the flow file (as you're doing for 
the root tag validation).

Regarding the performance test, it was a GFF => ConvertRecord (XML to JSON) 
=> UpdateAttribute. Don't remember the exact numbers but IIRC it was around 
1300 flowfile per second with an XML payload of 6KB (single record in my case).

Overall, it's really cool!
I'll try to have a look over the code during the week.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421843#comment-16421843
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on the issue:

https://github.com/apache/nifi/pull/2587
  
Can you maybe post the XML that led to the empty record?


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421842#comment-16421842
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on the issue:

https://github.com/apache/nifi/pull/2587
  
Hi @pvillard31 

thank you for your comments! I realized all your suggestions. I like your 
news regarding the performance :-) Which kind of transformation did you test? 
XML => Record or XML => JSON (e. g. with ConvertRecord)?

For any reason some tests disappeared for a certain commit at my local git 
(probably, I wanted to reorder the tests,  but deleted them, omg ...). However, 
I inserted them again (this is why there are many more tests now). 

In addition, I adjusted the definition about how namespaces shall be 
treated. 

I implemented several tests for XMLReader to verify that the usage of 
expression language works as expected.

However, I was not able to reproduce your observation regarding the empty 
record for the header 
```

```

I implemented the following tests:
```
testSimpleRecordWithHeader()
testSimpleRecordWithHeaderNoValidation()
```

Actually, they work as expected. 


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421837#comment-16421837
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r178470638
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
--- End diff --

done. additionally, I added some tests for class XMLReader


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421838#comment-16421838
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r178470648
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/resources/docs/org.apache.nifi.xml.XMLReader/additionalDetails.html
 ---
@@ -0,0 +1,378 @@
+
+
+
+
+
+XMLReader
+
+
+
+
+
+The XMLReader Controller Service reads XML content and creates 
Record objects. The Controller Service
+must be configured with a schema that describes the structure of 
the XML data. Fields in the XML data
+that are not defined in the schema will be skipped.
+
+
+Records are expected in the second level of the XML data, embedded 
within an enclosing root tag:
+
+
+
+root
+  record
+field1content/field1
+field2content/field2
+  /record
+  record
+field1content/field1
+field2content/field2
+  /record
+/root
+
+
+
+
+For the following examples, it is assumed that the exemplary 
records are enclosed by a root tag.
+
+
+Example 1: Simple Fields
+
+
+The simplest kind of data within XML data are tags / fields only 
containing content (no attributes, no embedded tags).
+They can be described in the schema by simple types (e. g. INT, 
STRING, ...).
+
+
+
+
+record
+  simple_fieldcontent/simple_field
+/record
+
+
+
+
+This record can be described by a schema containing one field (e. 
g. of type string). By providing this schema,
+the reader expects zero or one occurrences of "simple_field" in 
the record.
+
+
+
+
+{
+  "namespace": "nifi",
+  "name": "test",
+  "type": "record",
+  "fields": [
+{ "name": "simple_field", "type": "string" }
+  ]
+}
+
+
+
+Example 2: Arrays with Simple Fields
+
+
+Arrays are considered as repetitive tags / fields in XML data. For 
the following XML data, "array_field" is considered
+to be an array enclosing simple fields, whereas "simple_field" is 
considered to be a simple field not enclosed in
+an array.
+
+
+
+
+record
+  array_fieldcontent/array_field
+  array_fieldcontent/array_field
+  simple_fieldcontent/simple_field
+/record
+
+
+
+
+This record can be described by the following schema:
+
+
+
+
+{
+  "namespace": "nifi",
+  "name": "test",
+  "type": "record",
+  "fields": [
+{ "name": "array_field", "type":
+  { "type": "array", "items": string }
+},
+{ "name": "simple_field", "type": "string" }
+  ]
+}
+
+
+
+
+If a field in a schema is embedded in an array, the reader expects 
zero, one or more occurrences of the field
+in a record. The field "array_field" principally also could be 
defined as a simple field, but then the second occurrence
+of this field would replace the first in the record object. 
Moreover, the field "simple_field" could also be defined
+as an array. In this case, the reader would put it into the record 
object as an array with one element.
+
+
+Example 3: Tags with Attributes
+
+
+XML fields frequently not only contain content, but also 
attributes. The following record contains a field with
+an attribute "attr" and content:
+
+
+
+
+record
+  field_with_attribute attr="attr_content"content 
of field/field_with_attribute
+   

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-04-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421836#comment-16421836
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r178470625
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/resources/docs/org.apache.nifi.xml.XMLReader/additionalDetails.html
 ---
@@ -0,0 +1,378 @@
+
+
+
+
+
+XMLReader
+
+
+
+
+
+The XMLReader Controller Service reads XML content and creates 
Record objects. The Controller Service
+must be configured with a schema that describes the structure of 
the XML data. Fields in the XML data
+that are not defined in the schema will be skipped.
+
+
+Records are expected in the second level of the XML data, embedded 
within an enclosing root tag:
+
+
+
+root
+  record
+field1content/field1
+field2content/field2
+  /record
+  record
+field1content/field1
+field2content/field2
+  /record
+/root
+
+
+
+
+For the following examples, it is assumed that the exemplary 
records are enclosed by a root tag.
+
+
+Example 1: Simple Fields
+
+
+The simplest kind of data within XML data are tags / fields only 
containing content (no attributes, no embedded tags).
+They can be described in the schema by simple types (e. g. INT, 
STRING, ...).
+
+
+
+
+record
+  simple_fieldcontent/simple_field
+/record
+
+
+
+
+This record can be described by a schema containing one field (e. 
g. of type string). By providing this schema,
+the reader expects zero or one occurrences of "simple_field" in 
the record.
+
+
+
+
+{
+  "namespace": "nifi",
+  "name": "test",
+  "type": "record",
+  "fields": [
+{ "name": "simple_field", "type": "string" }
+  ]
+}
+
+
+
+Example 2: Arrays with Simple Fields
+
+
+Arrays are considered as repetitive tags / fields in XML data. For 
the following XML data, "array_field" is considered
+to be an array enclosing simple fields, whereas "simple_field" is 
considered to be a simple field not enclosed in
+an array.
+
+
+
+
+record
+  array_fieldcontent/array_field
+  array_fieldcontent/array_field
+  simple_fieldcontent/simple_field
+/record
+
+
+
+
+This record can be described by the following schema:
+
+
+
+
+{
+  "namespace": "nifi",
+  "name": "test",
+  "type": "record",
+  "fields": [
+{ "name": "array_field", "type":
+  { "type": "array", "items": string }
+},
+{ "name": "simple_field", "type": "string" }
+  ]
+}
+
+
+
+
+If a field in a schema is embedded in an array, the reader expects 
zero, one or more occurrences of the field
+in a record. The field "array_field" principally also could be 
defined as a simple field, but then the second occurrence
+of this field would replace the first in the record object. 
Moreover, the field "simple_field" could also be defined
+as an array. In this case, the reader would put it into the record 
object as an array with one element.
+
+
+Example 3: Tags with Attributes
+
+
+XML fields frequently not only contain content, but also 
attributes. The following record contains a field with
+an attribute "attr" and content:
+
+
+
+
+record
+  field_with_attribute attr="attr_content"content 
of field/field_with_attribute
+   

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421441#comment-16421441
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r178437056
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/resources/docs/org.apache.nifi.xml.XMLReader/additionalDetails.html
 ---
@@ -0,0 +1,378 @@
+
+
+
+
+
+XMLReader
+
+
+
+
+
+The XMLReader Controller Service reads XML content and creates 
Record objects. The Controller Service
+must be configured with a schema that describes the structure of 
the XML data. Fields in the XML data
+that are not defined in the schema will be skipped.
+
+
+Records are expected in the second level of the XML data, embedded 
within an enclosing root tag:
+
+
+
+root
+  record
+field1content/field1
+field2content/field2
+  /record
+  record
+field1content/field1
+field2content/field2
+  /record
+/root
+
+
+
+
+For the following examples, it is assumed that the exemplary 
records are enclosed by a root tag.
+
+
+Example 1: Simple Fields
+
+
+The simplest kind of data within XML data are tags / fields only 
containing content (no attributes, no embedded tags).
+They can be described in the schema by simple types (e. g. INT, 
STRING, ...).
+
+
+
+
+record
+  simple_fieldcontent/simple_field
+/record
+
+
+
+
+This record can be described by a schema containing one field (e. 
g. of type string). By providing this schema,
+the reader expects zero or one occurrences of "simple_field" in 
the record.
+
+
+
+
+{
+  "namespace": "nifi",
+  "name": "test",
+  "type": "record",
+  "fields": [
+{ "name": "simple_field", "type": "string" }
+  ]
+}
+
+
+
+Example 2: Arrays with Simple Fields
+
+
+Arrays are considered as repetitive tags / fields in XML data. For 
the following XML data, "array_field" is considered
+to be an array enclosing simple fields, whereas "simple_field" is 
considered to be a simple field not enclosed in
+an array.
+
+
+
+
+record
+  array_fieldcontent/array_field
+  array_fieldcontent/array_field
+  simple_fieldcontent/simple_field
+/record
+
+
+
+
+This record can be described by the following schema:
+
+
+
+
+{
+  "namespace": "nifi",
+  "name": "test",
+  "type": "record",
+  "fields": [
+{ "name": "array_field", "type":
+  { "type": "array", "items": string }
+},
+{ "name": "simple_field", "type": "string" }
+  ]
+}
+
+
+
+
+If a field in a schema is embedded in an array, the reader expects 
zero, one or more occurrences of the field
+in a record. The field "array_field" principally also could be 
defined as a simple field, but then the second occurrence
+of this field would replace the first in the record object. 
Moreover, the field "simple_field" could also be defined
+as an array. In this case, the reader would put it into the record 
object as an array with one element.
+
+
+Example 3: Tags with Attributes
+
+
+XML fields frequently not only contain content, but also 
attributes. The following record contains a field with
+an attribute "attr" and content:
+
+
+
+
+record
+  field_with_attribute attr="attr_content"content 
of field/field_with_attribute
+   

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421442#comment-16421442
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r178437600
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
+.name("validate_root_tag")
+.displayName("Validate Root Tag")
+.description("If this property is set, the name of root tags 
(e. g. ...) of incoming FlowFiles will be 
evaluated against this value. " +
+"In the case of a mismatch, an exception is thrown. 
The treatment of such FlowFiles depends on the implementation " +
+"of respective Processors.")
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+.required(false)
+.build();
+
+public static final PropertyDescriptor VALIDATE_RECORD_TAG = new 
PropertyDescriptor.Builder()
--- End diff --

Same here (and for the next properties).


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421440#comment-16421440
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r178437593
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/xml/XMLReader.java
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.xml;
+
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+@Tags({"xml", "record", "reader", "parser"})
+@CapabilityDescription("Reads XML content and creates Record objects. 
Records are expected in the second level of " +
+"XML data, embedded in an enclosing root tag.")
+public class XMLReader extends SchemaRegistryService implements 
RecordReaderFactory {
+
+public static final PropertyDescriptor VALIDATE_ROOT_TAG = new 
PropertyDescriptor.Builder()
--- End diff --

Could this property supports expression language against incoming flow 
files? I don't think that's an easy change (and it could introduce a perf hit) 
but that would allow using the same reader for completely different XML inputs.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421443#comment-16421443
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2587#discussion_r178437093
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/resources/docs/org.apache.nifi.xml.XMLReader/additionalDetails.html
 ---
@@ -0,0 +1,378 @@
+
+
+
+
+
+XMLReader
+
+
+
+
+
+The XMLReader Controller Service reads XML content and creates 
Record objects. The Controller Service
+must be configured with a schema that describes the structure of 
the XML data. Fields in the XML data
+that are not defined in the schema will be skipped.
+
+
+Records are expected in the second level of the XML data, embedded 
within an enclosing root tag:
+
+
+
+root
+  record
+field1content/field1
+field2content/field2
+  /record
+  record
+field1content/field1
+field2content/field2
+  /record
+/root
+
+
+
+
+For the following examples, it is assumed that the exemplary 
records are enclosed by a root tag.
+
+
+Example 1: Simple Fields
+
+
+The simplest kind of data within XML data are tags / fields only 
containing content (no attributes, no embedded tags).
+They can be described in the schema by simple types (e. g. INT, 
STRING, ...).
+
+
+
+
+record
+  simple_fieldcontent/simple_field
+/record
+
+
+
+
+This record can be described by a schema containing one field (e. 
g. of type string). By providing this schema,
+the reader expects zero or one occurrences of "simple_field" in 
the record.
+
+
+
+
+{
+  "namespace": "nifi",
+  "name": "test",
+  "type": "record",
+  "fields": [
+{ "name": "simple_field", "type": "string" }
+  ]
+}
+
+
+
+Example 2: Arrays with Simple Fields
+
+
+Arrays are considered as repetitive tags / fields in XML data. For 
the following XML data, "array_field" is considered
+to be an array enclosing simple fields, whereas "simple_field" is 
considered to be a simple field not enclosed in
+an array.
+
+
+
+
+record
+  array_fieldcontent/array_field
+  array_fieldcontent/array_field
+  simple_fieldcontent/simple_field
+/record
+
+
+
+
+This record can be described by the following schema:
+
+
+
+
+{
+  "namespace": "nifi",
+  "name": "test",
+  "type": "record",
+  "fields": [
+{ "name": "array_field", "type":
+  { "type": "array", "items": string }
+},
+{ "name": "simple_field", "type": "string" }
+  ]
+}
+
+
+
+
+If a field in a schema is embedded in an array, the reader expects 
zero, one or more occurrences of the field
+in a record. The field "array_field" principally also could be 
defined as a simple field, but then the second occurrence
+of this field would replace the first in the record object. 
Moreover, the field "simple_field" could also be defined
+as an array. In this case, the reader would put it into the record 
object as an array with one element.
+
+
+Example 3: Tags with Attributes
+
+
+XML fields frequently not only contain content, but also 
attributes. The following record contains a field with
+an attribute "attr" and content:
+
+
+
+
+record
+  field_with_attribute attr="attr_content"content 
of field/field_with_attribute
+   

[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419507#comment-16419507
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2587
  
Thanks for pinging me @JohannesDaniel - I'll also try to have a look over 
the WE / next week. It'd be a great addition!


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418054#comment-16418054
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@JohannesDaniel thanks for the contribution! This is definitely something 
that I've been wanting to implement for a while but haven't had a chance yet. 
Will be happy to review.


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416274#comment-16416274
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

Github user JohannesDaniel commented on the issue:

https://github.com/apache/nifi/pull/2587
  
@pvillard31 here we go :)


> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416226#comment-16416226
 ] 

ASF GitHub Bot commented on NIFI-4185:
--

GitHub user JohannesDaniel opened a pull request:

https://github.com/apache/nifi/pull/2587

NIFI-4185 Add XML Record Reader

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI- where  is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JohannesDaniel/nifi NIFI-4185

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2587.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2587


commit 9b4bd0dd8f1d30bfe1597d4cd069df414eb968a0
Author: JohannesDaniel 
Date:   2018-03-06T23:02:43Z

Add XML Record Reader




> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-17 Thread Johannes Peter (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403399#comment-16403399
 ] 

Johannes Peter commented on NIFI-4185:
--

Hi [~pvillard],

for this reader I have not planned to require an XSD schema. My intention is 
that it can be configured in the same way like readers of other formats. I 
therefore translate Avro definitions to XML structures that are expected by the 
reader. Generally, the reader expects an array containing zero, one or more 
records. I use StAX as its pulling logic suits well to the record-lookup 
requirement.

BTW: Do you have an idea which XML structure the reader could expect when users 
define a map in their schema? Maybe something like this?

{code}




content
content or object


...



{code}

> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-16 Thread Pierre Villard (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402528#comment-16402528
 ] 

Pierre Villard commented on NIFI-4185:
--

Hi [~jope],

I'm really interested by this work and will be happy to have a look when you 
submit a PR. A quick question: are you going to require an XSD schema in your 
reader? or will you infer the schema by reading the input XML data? (XSD would 
be cool because it allows much more specifications than an Avro schema)

I'm asking because one of the issue if you don't have an XSD is to make a 
difference between an array of 1 record (that could be translated into a single 
record) and an array of multiple records.

Example: I want the two below inputs


content


and


content


content


to be converted with the same output schema having an array of records.

Does it make sense?

> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Assignee: Johannes Peter
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2018-03-11 Thread Johannes Peter (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394487#comment-16394487
 ] 

Johannes Peter commented on NIFI-4185:
--

[~alopresto]:
 Started implementing an XML Record Reader. Shall I create a separate ticket 
for this?

Similar to the JSON readers, the XML reader will expect either a single record 
(e. g. content ... ) or an array of 
records (e. g. content ... 
 ... )

The reader will be aligned with common transformators. "Normal" fields (e. g. 
String, Integer) can be described by simple key-value pairs:

XML definition


content
123



Schema definition
{
"name": "testschema",
"namespace": "nifi",
"type": "record",
"fields": [
{ "name": "field1", "type": "string" },
{ "name": "field2", "type": "int" }
]
}

Parsing of attributes or nested fields require the definition of nested records 
and a field name for the content (optional, a prefix for attributes can be 
defined):
Property: CONTENT_FIELD=content_field
Property: ATTRIBUTE_PREFIX=attr.

XML definition


some text

some nested text
some other nested text




Schema definition
{  
"name": "testschema",
"namespace": "nifi",
"type": "record",
"fields": [
{
"name": "field1", 
"type": {
"name": "NestedRecord",
"type": "record",
"fields" : [
{"name": "attr.attribute", "type": 
"string"},
{"name": "content_field", "type": 
"string"}
]
}
},
{
"name": "field2", 
"type": {
"name": "NestedRecord",
"type": "record",
"fields" : [
{"name": "attr.attribute", "type": 
"string"},
{"name": "nested1", "type": "string"},
{"name": "nested2", "type": "string"}
]
}
}
]
}

What do you say?

> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>Priority: Major
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4185) Add XML record reader & writer services

2017-07-14 Thread Andy LoPresto (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088201#comment-16088201
 ] 

Andy LoPresto commented on NIFI-4185:
-

With the new record model, the prior approach is OBE. 

> Add XML record reader & writer services
> ---
>
> Key: NIFI-4185
> URL: https://issues.apache.org/jira/browse/NIFI-4185
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Andy LoPresto
>  Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)