Tamas Nemeth created GOBBLIN-231:
------------------------------------

             Summary: Grok to Json Converter
                 Key: GOBBLIN-231
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-231
             Project: Apache Gobblin
          Issue Type: New Feature
          Components: gobblin-core
            Reporter: Tamas Nemeth
            Assignee: Abhishek Tiwari
            Priority: Minor


The converter turns text into JSON based on a Grok pattern.
GrokToJsonConverter accepts an already deserialized text row (a String).

Converts text to JSON based on a Grok pattern. The schema is represented as a
JsonArray, the same interface used by CsvToJsonConverter.
Each text record is represented by a String.
The converter only supports Grok patterns in which every group is named, because
it uses the group names as column names.
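
The named-group requirement can be illustrated with a minimal Python sketch. It uses plain regular-expression named groups rather than a Grok library, and the pattern, function name and sample line are made up for illustration; it is not the converter's implementation.

```python
import json
import re

# Hypothetical pattern: every capturing group is named, so each captured
# value has an obvious column name.
PATTERN = re.compile(r"^(?P<clientip>\S+) (?P<verb>[A-Z]+) (?P<response>\d+)$")


def text_to_record(line):
    """Return a dict keyed by group name, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


record = text_to_record("10.0.0.1 GET 200")
print(json.dumps(record))
```

An unnamed group would still match, but its value would have no key in the resulting record, which is presumably why such patterns are not supported.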

The following config properties can be set:

The grok pattern to use for the conversion:
converter.grok_to_json.pattern=^%{IPORHOST:clientip} (?:-|%{USER:ident}) (?:-|%{USER:auth}) \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|-)\" %{NUMBER:response} (?:-|%{NUMBER:bytes})
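
To show what a pattern like the one above captures, here is a rough Python sketch that expands each `%{NAME:field}` reference into a named regex group and applies the result to a sample access-log line. The `GROK_DEFS` table holds simplified stand-ins for the default Grok definitions, and `expand` and the sample line are hypothetical; a real Grok library resolves these references from its pattern files.

```python
import re

# Simplified stand-ins for a few default Grok patterns (assumptions for
# illustration; the real definitions are stricter).
GROK_DEFS = {
    "IPORHOST": r"[\w.:-]+",
    "USER": r"[\w.@-]+",
    "HTTPDATE": r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def expand(grok_pattern):
    """Replace each %{NAME:field} with a named regex group (?P<field>...)."""
    return GROK_REF.sub(
        lambda m: "(?P<%s>%s)" % (m.group(2), GROK_DEFS[m.group(1)]),
        grok_pattern,
    )


# Same pattern as in the property above, without the properties-file
# escaping of the double quotes.
COMMON_LOG = (
    r'^%{IPORHOST:clientip} (?:-|%{USER:ident}) (?:-|%{USER:auth}) '
    r'\[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}'
    r'(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response} '
    r'(?:-|%{NUMBER:bytes})'
)

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
fields = re.match(expand(COMMON_LOG), line).groupdict()
print(fields)
```

Groups that take the `-` alternative, such as `ident` in this sample, come back as `None` in this sketch.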

Path to the grok patterns (if not set it will use the default ones):
converter.grok_to_json.patterns=/tmp/grok_patterns

Treat empty string as null value:
converter.grok_to_json.empty_as_null=true

Specify the string that represents a null value:
converter.grok_to_json.null_string=null
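
One plausible reading of the two null-related properties, sketched in Python. The exact semantics are an assumption here: a captured value is turned into JSON null when it is empty (if `empty_as_null` is set) or when it equals the configured null string. The function and parameter names are hypothetical.

```python
def normalize(value, empty_as_null=True, null_string="null"):
    """Map missing, empty, or null-marker values to None (JSON null)."""
    if value is None:
        return None
    # Assumed meaning of converter.grok_to_json.empty_as_null
    if empty_as_null and value == "":
        return None
    # Assumed meaning of converter.grok_to_json.null_string
    if null_string is not None and value == null_string:
        return None
    return value


print([normalize(v) for v in ["", "null", "42"]])
```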

Example of schema:

[
  {
    "columnName": "Day",
    "comment": "",
    "isNullable": "true",
    "dataType": {
      "type": "string"
    }
  },
  {
    "columnName": "Pageviews",
    "comment": "",
    "isNullable": "true",
    "dataType": {
      "type": "long"
    }
  }
]
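
As a sketch of how such a schema could drive the output, the following Python snippet casts captured strings according to each column's `dataType`. It assumes that `columnName` matches a Grok group name and that only the `string` and `long` types from the example occur; `CASTS` and `apply_schema` are hypothetical names, not part of Gobblin.

```python
import json

SCHEMA = json.loads("""
[
  {"columnName": "Day", "comment": "", "isNullable": "true",
   "dataType": {"type": "string"}},
  {"columnName": "Pageviews", "comment": "", "isNullable": "true",
   "dataType": {"type": "long"}}
]
""")

# Only the two types used in the example schema are handled here.
CASTS = {"string": str, "long": int}


def apply_schema(captured, schema=SCHEMA):
    """Build a typed record from captured string values, column by column."""
    out = {}
    for column in schema:
        name = column["columnName"]
        raw = captured.get(name)
        if raw is None:
            # isNullable is stored as the string "true"/"false" in the schema.
            if column["isNullable"] != "true":
                raise ValueError("missing value for non-nullable column " + name)
            out[name] = None
        else:
            out[name] = CASTS[column["dataType"]["type"]](raw)
    return out


row = apply_schema({"Day": "2017-08-30", "Pageviews": "1024"})
print(json.dumps(row))
```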

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
