Hello! I want to receive syslog messages, parse each one with regular expressions into fields (for example username, source IP, and destination IP), and store the data in HBase with one column per field. I know how to set up the syslog source, but how should I go about the extraction and storage?
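To make "parse into fields" concrete, here is a minimal sketch in plain Java, using only `java.util.regex`, for a single log format. The `user=... src=... dst=...` layout, the class name `SyslogFieldExtractor`, and the field names are hypothetical; substitute your real format. The resulting map is the "field name to value" data that would end up either in event headers or in HBase column qualifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogFieldExtractor {
    // Hypothetical format: "... user=alice src=10.0.0.1 dst=10.0.0.2"
    // Named groups (?<name>...) let us look up each field by name.
    private static final Pattern LINE = Pattern.compile(
        "user=(?<username>\\S+) src=(?<srcIp>\\d{1,3}(?:\\.\\d{1,3}){3}) dst=(?<dstIp>\\d{1,3}(?:\\.\\d{1,3}){3})");

    private static final String[] FIELDS = {"username", "srcIp", "dstIp"};

    /** Returns field name -> value, or an empty map if the line does not match. */
    public static Map<String, String> extract(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.find()) {
            for (String name : FIELDS) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {username=alice, srcIp=10.0.0.1, dstIp=10.0.0.2}
        System.out.println(extract("<34>Oct 11 22:14:15 host sshd: user=alice src=10.0.0.1 dst=10.0.0.2"));
    }
}
```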
My thoughts:

1. Can I use a Regex Extractor Interceptor, with my own serializer implementation, to extract the data into multiple headers on the event, and then have the AsyncHBase sink's serializer simply store those header values in columns?
2. Or should I pass the data to the AsyncHBase sink unaltered and implement everything in the sink's serializer?

It is worth noting that the input comes in different formats, so my extraction logic isn't one simple regex; it will probably contain a lot of ifs, for example to extract the username, because it won't always be in the same place in the log.

Which approach is best, is there another approach, or am I getting this wrong?

- Alaa Ali
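One way to avoid a long chain of ifs for the multi-format case is to keep an ordered list of patterns and let the first one that matches supply the fields. The sketch below assumes three hypothetical formats (the sample lines in the comments are invented) and a hypothetical class `MultiFormatExtractor`; a format that lacks a field simply omits it. The logic is independent of which approach you choose: it could be called from a custom Flume `Interceptor` that copies the map into `event.getHeaders()` (approach 1), or from a custom `AsyncHbaseEventSerializer` (approach 2). That Flume wiring is not shown here and should be checked against the documentation for your Flume version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiFormatExtractor {
    // Hypothetical formats, tried in order; the first match wins.
    private static final List<Pattern> FORMATS = List.of(
        // e.g. "Accepted password for alice from 10.0.0.1 to 10.0.0.2 port 22"
        Pattern.compile("for (?<username>\\S+) from (?<srcIp>[\\d.]+) to (?<dstIp>[\\d.]+)"),
        // e.g. "src=10.0.0.3 dst=10.0.0.4 user=bob"
        Pattern.compile("src=(?<srcIp>[\\d.]+) dst=(?<dstIp>[\\d.]+) user=(?<username>\\S+)"),
        // e.g. "session opened for user carol" (no IPs in this format)
        Pattern.compile("session opened for user (?<username>\\S+)")
    );

    private static final List<String> FIELDS = List.of("username", "srcIp", "dstIp");

    /** Returns field name -> value from the first matching format, or an empty map. */
    public static Map<String, String> extract(String line) {
        for (Pattern p : FORMATS) {
            Matcher m = p.matcher(line);
            if (!m.find()) {
                continue; // try the next format
            }
            Map<String, String> fields = new LinkedHashMap<>();
            for (String name : FIELDS) {
                try {
                    String value = m.group(name);
                    if (value != null) {
                        fields.put(name, value);
                    }
                } catch (IllegalArgumentException noSuchGroup) {
                    // This format does not define this field; leave it out.
                }
            }
            return fields;
        }
        return Map.of(); // no known format matched
    }

    public static void main(String[] args) {
        System.out.println(extract("Accepted password for alice from 10.0.0.1 to 10.0.0.2 port 22"));
        System.out.println(extract("src=10.0.0.3 dst=10.0.0.4 user=bob"));
        System.out.println(extract("session opened for user carol"));
    }
}
```

Adding a new log format then means appending one pattern to `FORMATS` rather than adding another branch.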
