[jira] [Commented] (NIFI-12426) Support microseconds in RegexDateTimeMatcher
[ https://issues.apache.org/jira/browse/NIFI-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790794#comment-17790794 ]

David Handermann commented on NIFI-12426:
-----------------------------------------

Thanks for providing the JoltTransform as a potential option, that is a helpful point of reference for existing flows.

Although DateTimeFormatter is largely compatible with SimpleDateFormat expressions, there are a few subtle differences, which was one of the reasons for not moving to it directly on the 1.x branch. It might be possible to consider a way to opt-in to DateTimeFormatter, but it would take some evaluation to consider how complicated it would be to implement a conditional approach.

There is still additional work to do on the main branch to move away from SimpleDateFormat in places, so the best strategy seems to be completing more of those changes and evaluating potential solutions for the 1.x branch at that time. Definitely open to other ideas on the best solution, as it is difficult to work around in cases where the input has microsecond or nanosecond precision as you highlighted.

> Support microseconds in RegexDateTimeMatcher
> --------------------------------------------
>
>                 Key: NIFI-12426
>                 URL: https://issues.apache.org/jira/browse/NIFI-12426
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>             Fix For: 1.25.0, 2.0.0
>
>
> If a timestamp in the input has microseconds and a RecordReader is using
> Infer Schema, the data type will be inferred as a string rather than a
> timestamp regardless of the Timestamp Format property in the reader. Although
> SimpleDateFormat doesn't support microseconds, it is forgiving in the parsing
> of a timestamp string and accepts ".SS" as a milliseconds format even
> though the microseconds will not be honored.
> However when inferring the schema, the input must also pass the
> RegexDateTimeMatcher which checks that it "looks like" a timestamp and within
> the legitimate length boundaries. This matcher enforces a 3-digit length of
> milliseconds and will fail to match input with microseconds. This matcher
> should accept 6 digits of fractional seconds and allow the other matchers to
> proceed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
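The change requested in the issue description can be illustrated with a small regular-expression check. This is only a sketch under stated assumptions: the class name `FractionalSecondsCheck`, the method `looksLikeTimestamp`, and the `yyyy-MM-dd HH:mm:ss` layout are hypothetical and are not taken from the NiFi RegexDateTimeMatcher source. It shows a pattern that allows one to six fractional digits rather than exactly three.

```java
import java.util.regex.Pattern;

public class FractionalSecondsCheck {
    // Hypothetical pattern for "yyyy-MM-dd HH:mm:ss" with an optional fraction.
    // A strict 3-digit rule would use \\d{3}; this sketch allows 1 to 6 digits.
    private static final Pattern TIMESTAMP = Pattern.compile(
            "\\d{4}-\\d{2}-\\d{2}[ T]\\d{2}:\\d{2}:\\d{2}(\\.\\d{1,6})?");

    // Returns true when the whole input "looks like" a timestamp.
    static boolean looksLikeTimestamp(String text) {
        return TIMESTAMP.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeTimestamp("2023-11-28 12:34:56.123"));     // true
        System.out.println(looksLikeTimestamp("2023-11-28 12:34:56.123456"));  // true
        System.out.println(looksLikeTimestamp("2023-11-28 12:34:56.1234567")); // false
    }
}
```

Passing a check like this only means the value has the shape of a timestamp; whether the parser behind it honors the extra digits is a separate question.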
[jira] [Commented] (NIFI-12426) Support microseconds in RegexDateTimeMatcher
[ https://issues.apache.org/jira/browse/NIFI-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790792#comment-17790792 ]

Matt Burgess commented on NIFI-12426:
-------------------------------------

Ugh, I forgot it doesn't parse correctly. Are we going to backport the DateTimeFormatter stuff to 1.x? As a workaround the user will have to manually truncate the microseconds in the meantime.

If you can configure a RecordReader to ensure it will infer timestamps as strings (such as JsonTreeReader with no Timestamp Format set), you can use JoltTransform if you know the timestamp fields and length of the values, using a spec such as:

{{[ { "operation": "modify-overwrite-beta", "spec": { "ts_evt": "=substring(@(1,ts_evt),0,23)" } } ]}}

or a ScriptedTransformRecord using a Regex to programmatically find all the timestamp fields with microseconds and truncate them. Then downstream you can use a Reader that infers using a Timestamp Format of milliseconds and it will infer the values as timestamps correctly.
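The regex-based truncation mentioned in the workaround could look roughly like the following. This is a plain Java sketch and not a ScriptedTransformRecord script: the class `FractionTruncator`, the method `truncateToMillis`, and the assumed ISO-like timestamp layout are hypothetical. For a value such as `2023-11-28T12:34:56.123456`, keeping three fractional digits leaves the same 23 characters that the `substring(...,0,23)` call in the Jolt spec keeps.

```java
import java.util.regex.Pattern;

public class FractionTruncator {
    // Group 1 captures the timestamp through the first three fractional digits;
    // the trailing \\d+ matches any extra digits (microseconds or nanoseconds).
    private static final Pattern EXTRA_FRACTION = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2}[ T]\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\d+");

    // Drops fractional digits beyond milliseconds; other text is left unchanged.
    static String truncateToMillis(String value) {
        return EXTRA_FRACTION.matcher(value).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(truncateToMillis("2023-11-28T12:34:56.123456"));
        System.out.println(truncateToMillis("2023-11-28T12:34:56.123"));
    }
}
```

Truncation discards the sub-millisecond digits instead of rounding them, which matches the intent of the workaround but loses precision.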
[jira] [Commented] (NIFI-12426) Support microseconds in RegexDateTimeMatcher
[ https://issues.apache.org/jira/browse/NIFI-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790765#comment-17790765 ]

David Handermann commented on NIFI-12426:
-----------------------------------------

Thanks for highlighting this issue [~mattyb149].

It is worth noting that SimpleDateFormat does not round microseconds to milliseconds when parsing a timestamp string containing microsecond precision, and the resulting timestamp is off by several seconds. For this reason, it may be better to avoid changing the RegexDateTimeMatcher until the SimpleDateFormat references have been changed to use DateTimeFormatter, which supports nanosecond precision. Otherwise, passing the check could result in unexpected timestamp conversion.
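The parsing difference described in this comment can be reproduced with the JDK alone. In the sketch below, the class `MicrosecondParsing` and its method names are hypothetical, and the patterns are chosen only for illustration. With its default lenient settings, SimpleDateFormat reads all six fractional digits as a count of milliseconds, while DateTimeFormatter with a six-digit fraction pattern keeps them as a fraction of a second.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;

public class MicrosecondParsing {
    // Parses with SimpleDateFormat and returns epoch milliseconds (UTC).
    static long parseLegacy(String text) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Parses with DateTimeFormatter using a six-digit fraction-of-second pattern.
    static LocalDateTime parseModern(String text) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSS"));
    }

    public static void main(String[] args) {
        long base = parseLegacy("2023-11-28 12:34:56.000");
        long legacy = parseLegacy("2023-11-28 12:34:56.123456");
        // The six digits are treated as 123456 milliseconds, about 123 seconds.
        System.out.println(legacy - base); // 123456
        System.out.println(parseModern("2023-11-28 12:34:56.123456").getNano()); // 123456000
    }
}
```

The size of the offset depends on the fractional digits in the input, since they are added as milliseconds to the parsed second.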