[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user markap14 commented on the issue: https://github.com/apache/nifi/pull/2587 This is now merged to master. Thanks again for the contribution! ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user markap14 commented on the issue: https://github.com/apache/nifi/pull/2587 @JohannesDaniel that would be great! I think the Record attribute stuff will significantly improve how we are able to handle XML-based records. But I think the approach that you've taken here is a great first step and will be very beneficial to the community. I'd recommend going ahead with the XML Writer and then we can work in the attribute-related stuff later. ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user JohannesDaniel commented on the issue: https://github.com/apache/nifi/pull/2587 @markap14 thanks for the help with the patch file. I am fine with it :) If you have not planned to do that yourself, I would start implementing a XMLWriter as soon as this has been merged. Or should we first discuss your plan with the Record attributes you mentioned above? ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user markap14 commented on the issue: https://github.com/apache/nifi/pull/2587 @JohannesDaniel this is great! I've been doing a good bit of testing to ensure that everything works as expected. I had just a few more comments, mostly around the descriptions in the property descriptor and the handling of invalid values for the "Expect Records as Array" property. To make it a little easier to understand, I'm just going to attach a PATCH file to the JIRA. Can you please look over the patch file? If you're okay with the patch file, then I'm ready to merge. Thanks again, you've put in a lot of work here to make this consistent with the current approaches and to provide a huge new capability for many users! ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user JohannesDaniel commented on the issue: https://github.com/apache/nifi/pull/2587 @markap14 - Added EL for record format property - Removed record tag validation ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user JohannesDaniel commented on the issue: https://github.com/apache/nifi/pull/2587 @markap14 thank you for the response. I will simply remove that record tag validation as there are indeed many ways to do that before the data is processed by this reader. There is one little corner case, I need to discuss: Assuming we have the following data: ``` value1... ``` If the reader is used with (coerce==true), the field "map_field" can be parsed by defining a map in the schema. The embedded key fields do not have to be defined, its values only have to be of the defined type for the map. If the reader is used with (coerce==false && dropUnknown==true), the reader will parse all fields that exist in the schema ignoring its type. However, the data above will not be parsable even if the map exists in the schema. In this case, the reader identifies "map_field" as a field that exists in the schema, but the reader is not aware that it is of type map. Therefore, the reader will not parse the embedded key fields, as they don't exist in the schema. The field "map_field" will be classified as an empty field and not added to the record. Furthermore, even if the reader is used with (coerce==false && dropUnknown==true), it will be type-aware to some extent. The reader first checks, whether fields exist in the schema. If that is the case, the reader additionally will check whether they are of type record (or of type array embedding records, respectively). If that is also the case, the reader will retrieve the subschema in order to be enabled to check whether subtags of the current tag are known. ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user markap14 commented on the issue: https://github.com/apache/nifi/pull/2587 @JohannesDaniel thanks for the update! I commented above re: the use of Expression Language in the property descriptor. I do still feel like the check for 'record tag names' is unnecessary, as the reader should not be responsible for filtering the data but rather just for reading it. There already exist mechanisms for filtering the data (You could use PartitionRecord + RouteOnAttribute, ValidateRecord, or QueryRecord just off the top of my head to achieve this). Additionally, we have the Schema for the Record Reader. So if the element name matches the top-level Schema name (or one of them, if the top-level field is a UNION/CHOICE element), then we could use that. So, with your example above, if you only want to read the `` part, your schema should indicate that the top-level field name is `record`. In that case, it should filter out the `other` record. Does that make sense? ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user JohannesDaniel commented on the issue: https://github.com/apache/nifi/pull/2587 @markap14 @pvillard31 - I refactored some code as the cases (coerce==true && drop==false) and (coerce==false && drop==true) in some cases showed an unexpected behavior - Data like contentcontentcontent now can be parsed - Maps (e. g. value1value2) are now supported - The reader is now able to parse single records (e. g. ) as well as arrays of records (e. g. ). I added a property to make it configurable whether the reader shall expect a single record or an array. One question: As there are only two options for this, I defined AllowableValues for this property. Despite that, I think it would be reasonable to enable EL for this property. But how can this be realized? - I removed the root validation, but remained the check for record tag names in order to support processing data like this (tag "other" will be ignored if check for record tag name is activated) ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user JohannesDaniel commented on the issue: https://github.com/apache/nifi/pull/2587 @markap14 thank you for the comprehensive review. I will start refactoring the implementations with respect to the improvements that are clear. ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2587 Hey @JohannesDaniel - just made new tests and can't reproduce what I saw previously... so let's forget about it :) (thanks for the unit tests though!) I tested the EL support for tag validation, working well, thanks for the addition. I'd just make one suggestion: right now for the record tag, "in the case of a mismatch, the respective record will be skipped". I'd suggest to make this behavior configurable through a parameter in the CS: skip the record (current behavior) or throw an exception for the flow file (as you're doing for the root tag validation). Regarding the performance test, it was a GFF => ConvertRecord (XML to JSON) => UpdateAttribute. Don't remember the exact numbers but IIRC it was around 1300 flowfile per second with an XML payload of 6KB (single record in my case). Overall, it's really cool! I'll try to have a look over the code during the week. ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user JohannesDaniel commented on the issue: https://github.com/apache/nifi/pull/2587 Can you maybe post the XML that led to the empty record? ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user JohannesDaniel commented on the issue: https://github.com/apache/nifi/pull/2587 Hi @pvillard31 thank you for your comments! I realized all your suggestions. I like your news regarding the performance :-) Which kind of transformation did you test? XML => Record or XML => JSON (e. g. with ConvertRecord)? For any reason some tests disappeared for a certain commit at my local git (probably, I wanted to reorder the tests, but deleted them, omg ...). However, I inserted them again (this is why there are many more tests now). In addition, I adjusted the definition about how namespaces shall be treated. I implemented several tests for XMLReader to verify that the usage of expression language works as expected. However, I was not able to reproduce your observation regarding the empty record for the header ``` ``` I implemented the following tests: ``` testSimpleRecordWithHeader() testSimpleRecordWithHeaderNoValidation() ``` Actually, they work as expected. ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2587 Thanks for pinging me @JohannesDaniel - I'll also try to have a look over the WE / next week. It'd be a great addition! ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user markap14 commented on the issue: https://github.com/apache/nifi/pull/2587 @JohannesDaniel thanks for the contribution! This is definitely something that I've been wanting to implement for a while but haven't had a chance yet. Will be happy to review. ---
[GitHub] nifi issue #2587: NIFI-4185 Add XML Record Reader
Github user JohannesDaniel commented on the issue: https://github.com/apache/nifi/pull/2587 @pvillard31 here we go :) ---