I just googled and found this. Not sure if there is a better one. http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
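
A minimal sketch of the static interceptor route from earlier in the thread, assuming the source name from George's flume.conf. The interceptor name i1 and the value demo-row are made-up placeholders. The static interceptor stamps the same header on every event, so every event would be written under the same row key; that is fine for checking that the serializer stops throwing, but probably not what you want beyond a demo.

```
# Attach one interceptor (hypothetical name "i1") to the existing syslog source.
logger-agent.sources.Syslog-UDP.interceptors = i1
logger-agent.sources.Syslog-UDP.interceptors.i1.type = static
# Header name the serializer reads via getHeaders().get("rowKey").
logger-agent.sources.Syslog-UDP.interceptors.i1.key = rowKey
# Placeholder value; identical for every event.
logger-agent.sources.Syslog-UDP.interceptors.i1.value = demo-row
```

For a row key that differs per event, the morphline interceptor or a custom interceptor would have to compute the header instead.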
On Wed, Oct 30, 2013 at 12:34 AM, George Pang <[email protected]> wrote:
> Is there a tutorial for this topic out there?
>
> Thanks,
>
> George
>
>
> On Tue, Oct 29, 2013 at 6:50 PM, George Pang <[email protected]> wrote:
>
>> Hi Brock,
>>
>> The morphline command addValue looks like the one I need, but how can I
>> add the event header key-value pair?
>>
>> Thank you,
>>
>> George
>>
>>
>> On Tue, Oct 29, 2013 at 1:02 PM, George Pang <[email protected]> wrote:
>>
>>> Hi Brock,
>>>
>>> Yes, I think the morphline interceptor is what I am looking for.
>>> I am studying it now.
>>>
>>> Thank you,
>>>
>>> George
>>>
>>>
>>> On Tue, Oct 29, 2013 at 12:56 PM, Brock Noland <[email protected]> wrote:
>>>
>>>> In a very simple demo you could use the static interceptor:
>>>> http://flume.apache.org/FlumeUserGuide.html#static-interceptor
>>>>
>>>> but you probably want to use the morphline interceptor or a custom
>>>> interceptor:
>>>> http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
>>>>
>>>>
>>>> On Tue, Oct 29, 2013 at 2:52 PM, Hari Shreedharan <[email protected]> wrote:
>>>>
>>>>> Nope. You need to insert it at some other location.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Hari
>>>>>
>>>>> On Tuesday, October 29, 2013 at 12:48 PM, George Pang wrote:
>>>>>
>>>>> Hi Hari,
>>>>>
>>>>> Is inserting a rowKey header into the event something I can do in
>>>>> flume.conf? I tried to do that, but I am new to Flume.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> George
>>>>>
>>>>>
>>>>> On Tue, Oct 29, 2013 at 12:40 PM, Hari Shreedharan <[email protected]> wrote:
>>>>>
>>>>> Did you insert a rowKey header into the event? If the header is not
>>>>> there, you are obviously going to get null returned from
>>>>> currentEvent.getHeaders().get("rowKey"). You need to insert the header
>>>>> into the event at some point.
>>>>>
>>>>> Thanks,
>>>>> Hari
>>>>>
>>>>> On Tuesday, October 29, 2013 at 12:30 PM, George Pang wrote:
>>>>>
>>>>> Hi Ashish,
>>>>>
>>>>> Actually it starts with the headers. The example code has
>>>>> String rowKeyStr = currentEvent.getHeaders().get("rowKey");
>>>>> but no such header is found. If I remove this line, the rest
>>>>> complains that it is unable to deliver the event. But I checked the
>>>>> event, and it is not null.
>>>>>
>>>>> I am trying to use Flume to save to HBase, following the example at
>>>>> http://blog.cloudera.com/blog/2012/11/streaming-data-into-apache-hbase-using-apache-flume/
>>>>> for a customized serializer.
>>>>>
>>>>> flume.conf:
>>>>>
>>>>> logger-agent.sources = Syslog-UDP
>>>>> logger-agent.sinks = Syslog-HBase
>>>>> logger-agent.channels = Syslog-HBase-Channel
>>>>>
>>>>> logger-agent.sources.Syslog-UDP.channels = Syslog-HBase-Channel
>>>>> logger-agent.sinks.Syslog-HBase.channel = Syslog-HBase-Channel
>>>>>
>>>>> logger-agent.sources.Syslog-UDP.type = syslogudp
>>>>> logger-agent.sources.Syslog-UDP.port = 5140
>>>>> logger-agent.sources.Syslog-UDP.host = localhost
>>>>>
>>>>> logger-agent.sinks.Syslog-HBase.type = org.apache.flume.sink.hbase.AsyncHBaseSink
>>>>> logger-agent.sinks.Syslog-HBase.table = syslog2
>>>>> logger-agent.sinks.Syslog-HBase.columnFamily = cluster
>>>>> logger-agent.sinks.Syslog-HBase.serializer.payloadColumn = dev
>>>>> logger-agent.sinks.Syslog-HBase.serializer.incrementColumn = icol
>>>>> logger-agent.sinks.Syslog-HBase.serializer.columns = forum,inbound,outbound
>>>>> logger-agent.sinks.Syslog-HBase.batchSize = 5000
>>>>> logger-agent.sinks.Syslog-HBase.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
>>>>>
>>>>> logger-agent.channels.Syslog-HBase-Channel.type = memory
>>>>>
>>>>>
>>>>> Flume version: 1.4
>>>>>
>>>>> org.apache.flume.FlumeException: No row key found in headers!
>>>>>     at com.ib.SplittingSerializer.setEvent(SplittingSerializer.java:43)
>>>>>     at org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:184)
>>>>>     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>>>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>> Thank you,
>>>>>
>>>>> George
>>>>>
>>>>>
>>>>> On Tue, Oct 29, 2013 at 2:29 AM, Ashish <[email protected]> wrote:
>>>>>
>>>>> George,
>>>>>
>>>>> Can you share more details about what you are trying to achieve? If
>>>>> possible, please share the Flume version, agent configuration and
>>>>> exception stack trace.
>>>>> You may also look at the HBase sink for more info:
>>>>> http://flume.apache.org/FlumeUserGuide.html#hbasesinks
>>>>>
>>>>>
>>>>> On Tue, Oct 29, 2013 at 2:50 PM, George Pang <[email protected]> wrote:
>>>>>
>>>>> I use the serializer example in this blog post:
>>>>> http://blog.cloudera.com/blog/2012/11/streaming-data-into-apache-hbase-using-apache-flume/
>>>>>
>>>>> but got "Unable to deliver event. Exception follows.
>>>>> java.lang.NullPointerException". From looking it up in forums, I think it
>>>>> may be caused by an empty header. If so, how is a timestamp header added?
>>>>> If not, what causes the event delivery to fail?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> George
>>>>>
>>>>>
>>>>> --
>>>>> thanks
>>>>> ashish
>>>>>
>>>>> Blog: http://www.ashishpaliwal.com/blog
>>>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>>>>
>>>>
>>>> --
>>>> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>>>>
>>>
>>
>

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
