whsoul created KAFKA-9436:
-----------------------------

             Summary: New Kafka Connect SMT for plainText => Struct(or Map)
                 Key: KAFKA-9436
                 URL: https://issues.apache.org/jira/browse/KAFKA-9436
             Project: Kafka
          Issue Type: Improvement
          Components: KafkaConnect
            Reporter: whsoul

I'd like to parse plain-text rows, convert them to Struct (or Map) data, and load them into document-oriented databases such as MongoDB, Elasticsearch, etc. with an SMT.

For example, a plain-text Apache log line:

{code:java}
"111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
{code}

An SMT Connect config with a regular expression, like the one below, can easily transform the plain text into Struct (or Map) data:

{code:java}
"transforms": "TimestampTopic, RegexTransform",
"transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
"transforms.RegexTransform.struct.field": "message",
"transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
"transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
{code}

I have a PR for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
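
To make the regex-plus-mapping idea above concrete, here is a minimal stand-alone sketch that uses only `java.util.regex`. It is not the SMT from the PR and does not use the Kafka Connect API. The class name `RegexToMapSketch`, the `parse` helper, and the reading of `:NUMBER` as "convert this field to a `Long`" are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the transform proposed in the PR.
class RegexToMapSketch {

    // Same expression as "transforms.RegexTransform.regex" in the config above.
    static final Pattern ACCESS_LOG = Pattern.compile(
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
        + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"");

    // Same value as "transforms.RegexTransform.mapping": one name per capture group.
    static final String MAPPING =
        "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
        + "Response,BytesSent,Ms:NUMBER,Referrer,UserAgent";

    // The sample Apache log line from the description.
    static final String SAMPLE_LINE =
        "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
        + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
        + "\"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) "
        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    // Maps capture group i+1 to the i-th name in the mapping string.
    static Map<String, Object> parse(String line, Pattern pattern, String mapping) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("line does not match: " + line);
        }
        String[] fields = mapping.split(",");
        if (fields.length != m.groupCount()) {
            throw new IllegalArgumentException("mapping has " + fields.length
                + " names but the regex has " + m.groupCount() + " groups");
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            String[] nameAndType = fields[i].trim().split(":", 2);
            String raw = m.group(i + 1);
            // Assumption: a ":NUMBER" suffix means the value should be numeric; Long is used here.
            boolean numeric = nameAndType.length == 2 && nameAndType[1].equals("NUMBER");
            row.put(nameAndType[0], numeric ? (Object) Long.valueOf(raw) : raw);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = parse(SAMPLE_LINE, ACCESS_LOG, MAPPING);
        row.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Note that the regex allows `-` in the `Ms` position, which this sketch would reject with a `NumberFormatException`; a real transform would need to decide how to represent such values.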