Thank you, but I am not sure I can insert a header using the example in this blog post. I am missing part of the picture.
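Brock suggests further down the thread that an interceptor on the source can add headers to each event from flume.conf alone. The fragment below is a minimal sketch of that idea for the logger-agent configuration quoted in this thread. It chains two of Flume's built-in interceptors: the timestamp interceptor, which adds a "timestamp" header, and the static interceptor, which adds a fixed "rowKey" header. The interceptor names ts and rk and the value demo-row are hypothetical placeholders. The static interceptor gives every event the same rowKey, so every event would land in the same HBase row. That is only useful for a quick demo. Per-event keys need the morphline interceptor or a custom interceptor.

```properties
# Sketch for the logger-agent in this thread; "ts", "rk" and "demo-row" are
# hypothetical names chosen for illustration.
# Interceptors run in the listed order on each event the source receives.
logger-agent.sources.Syslog-UDP.interceptors = ts rk

# Built-in timestamp interceptor: sets a "timestamp" header (epoch milliseconds).
logger-agent.sources.Syslog-UDP.interceptors.ts.type = timestamp

# Built-in static interceptor: sets the same "rowKey" header on every event.
logger-agent.sources.Syslog-UDP.interceptors.rk.type = static
logger-agent.sources.Syslog-UDP.interceptors.rk.key = rowKey
logger-agent.sources.Syslog-UDP.interceptors.rk.value = demo-row
```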
George

On Wed, Oct 30, 2013 at 6:56 AM, Brock Noland wrote:
> I just googled and found this. Not sure if there is a better one.
>
> http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
>
> On Wed, Oct 30, 2013 at 12:34 AM, George Pang wrote:
>
>> Is there a tutorial for this topic out there?
>>
>> Thanks,
>>
>> George
>>
>> On Tue, Oct 29, 2013 at 6:50 PM, George Pang wrote:
>>
>>> Hi Brock,
>>>
>>> The morphline command addValues looks like the one I need, but how can I add the event header key-value pair?
>>>
>>> Thank you,
>>>
>>> George
>>>
>>> On Tue, Oct 29, 2013 at 1:02 PM, George Pang wrote:
>>>
>>>> Hi Brock,
>>>>
>>>> Yes, I think the morphline interceptor is what I am looking for. I am studying it now.
>>>>
>>>> Thank you,
>>>>
>>>> George
>>>>
>>>> On Tue, Oct 29, 2013 at 12:56 PM, Brock Noland wrote:
>>>>
>>>>> In a very simple demo you could use the static interceptor:
>>>>> http://flume.apache.org/FlumeUserGuide.html#static-interceptor
>>>>>
>>>>> but you probably want to use the morphline interceptor or a custom interceptor:
>>>>> http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
>>>>>
>>>>> On Tue, Oct 29, 2013 at 2:52 PM, Hari Shreedharan wrote:
>>>>>
>>>>>> Nope. You need to insert it at some other location.
>>>>>>
>>>>>> Thanks,
>>>>>> Hari
>>>>>>
>>>>>> On Tuesday, October 29, 2013 at 12:48 PM, George Pang wrote:
>>>>>>
>>>>>> Hi Hari,
>>>>>>
>>>>>> Is it (inserting a rowKey header into the event) something I can do in flume.conf? I tried to do that, but I am new to Flume.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> George
>>>>>>
>>>>>> On Tue, Oct 29, 2013 at 12:40 PM, Hari Shreedharan wrote:
>>>>>>
>>>>>> Did you insert a rowKey header into the event?
If the header is not >>>>>> there, you are obviously going to get null returned from >>>>>> currentEvent.getHeaders().get(“rowKey”). You need to insder the header >>>>>> into >>>>>> the event at some point. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Hari >>>>>> >>>>>> On Tuesday, October 29, 2013 at 12:30 PM, George Pang wrote: >>>>>> >>>>>> Hi Ashish, >>>>>> >>>>>> Actually it starts with headers. In the example code has " String >>>>>> rowKeyStr = currentEvent.getHeaders().get("rowKey");" but there is no >>>>>> such >>>>>> header found. If I get rid of this line, the rest will complain unable to >>>>>> deliver event. But I checked the event, it's not null. >>>>>> >>>>>> I am trying to use flume to save to hbase, and use the example >>>>>> http://blog.cloudera.com/blog/2012/11/streaming-data-into-apache-hbase-using-apache-flume/for >>>>>> customized serializer. >>>>>> >>>>>> flume.conf: >>>>>> >>>>>> logger-agent.sources = Syslog-UDP >>>>>> logger-agent.sinks = Syslog-HBase >>>>>> logger-agent.channels = Syslog-HBase-Channel >>>>>> >>>>>> logger-agent.sources.Syslog-UDP.channels = Syslog-HBase-Channel >>>>>> logger-agent.sinks.Syslog-HBase.channel = Syslog-HBase-Channel >>>>>> >>>>>> logger-agent.sources.Syslog-UDP.type = syslogudp >>>>>> logger-agent.sources.Syslog-UDP.port = 5140 >>>>>> logger-agent.sources.Syslog-UDP.host = localhost >>>>>> >>>>>> logger-agent.sinks.Syslog-HBase.type = org.apache.flume.sink.hbase. 
>>>>>> AsyncHBaseSink >>>>>> logger-agent.sinks.Syslog-HBase.table = syslog2 >>>>>> logger-agent.sinks.Syslog-HBase.columnFamily = cluster >>>>>> logger-agent.sinks.Syslog-HBase.serializer.payloadColumn = dev >>>>>> logger-agent.sinks.Syslog-HBase.serializer.incrementColumn = icol >>>>>> logger-agent.sinks.Syslog-HBase.serializer.columns = >>>>>> forum,inbound,outbound >>>>>> logger-agent.sinks.Syslog-HBase.batchSize = 5000 >>>>>> logger-agent.sinks.Syslog-HBase.serializer = >>>>>> org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer >>>>>> >>>>>> logger-agent.channels.Syslog-HBase-Channel.type = memory >>>>>> >>>>>> >>>>>> Flume version: 1.4 >>>>>> >>>>>> org.apache.flume.FlumeException: No row key found in headers! >>>>>> at >>>>>> com.ib.SplittingSerializer.setEvent(SplittingSerializer.java:43) >>>>>> at >>>>>> org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:184) >>>>>> at >>>>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68) >>>>>> at >>>>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) >>>>>> at java.lang.Thread.run(Thread.java:662) >>>>>> >>>>>> Thank you, >>>>>> >>>>>> George >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Oct 29, 2013 at 2:29 AM, Ashish <[email protected]>wrote: >>>>>> >>>>>> George, >>>>>> >>>>>> Can you share more details about what you are trying to achieve? If >>>>>> possible, please share Flume version, Agent configuration and exception >>>>>> stacktrace. >>>>>> You may also look at HBase Sink for more info >>>>>> http://flume.apache.org/FlumeUserGuide.html#hbasesinks >>>>>> >>>>>> >>>>>> On Tue, Oct 29, 2013 at 2:50 PM, George Pang <[email protected]>wrote: >>>>>> >>>>>> I use the serializer example in this blog post: >>>>>> http://blog.cloudera.com/blog/2012/11/streaming-data-into-apache-hbase-using-apache-flume/ >>>>>> >>>>>> but got "Unable to deliver event. Exception follows. >>>>>> java.lang.NullPointerException". 
>>>>>> From looking it up in forums, I think it may be caused by an empty header. If so, how is a timestamp header added? If not, what causes the event delivery to fail?
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> George
>>>>>>
>>>>>> --
>>>>>> thanks
>>>>>> ashish
>>>>>>
>>>>>> Blog: http://www.ashishpaliwal.com/blog
>>>>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>>>>
>>>>> --
>>>>> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
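Hari's point in the thread is that the serializer gets null whenever nothing has put a rowKey header on the event, and the blog's serializer then throws "No row key found in headers!". If the serializer should tolerate a missing header rather than fail, the lookup can fall back to something else. Below is a minimal sketch of a hypothetical helper; RowKeys.rowKeyFor is not part of Flume or of the blog's code. It uses the "rowKey" header if present. Otherwise it uses the "timestamp" header (which Flume's timestamp interceptor can set) plus a random UUID suffix. If neither header is there, it returns a bare random UUID. It depends only on java.util, so it can be tried outside Flume.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical helper, not part of Flume or of the Cloudera blog's serializer:
 * picks an HBase row key from Flume event headers instead of failing when
 * the "rowKey" header is missing.
 */
final class RowKeys {
    private RowKeys() {}

    /**
     * Returns the "rowKey" header if it is present and non-empty; otherwise the
     * "timestamp" header followed by a random UUID; otherwise a bare random UUID.
     */
    static String rowKeyFor(Map<String, String> headers) {
        String key = headers == null ? null : headers.get("rowKey");
        if (key != null && !key.isEmpty()) {
            return key;
        }
        String ts = headers == null ? null : headers.get("timestamp");
        // Several events can share one millisecond, so the UUID suffix keeps
        // timestamp-based keys from colliding and overwriting each other.
        if (ts != null && !ts.isEmpty()) {
            return ts + "-" + UUID.randomUUID();
        }
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("rowKey", "host1-42");
        System.out.println(rowKeyFor(headers)); // prints host1-42

        headers.clear();
        headers.put("timestamp", "1383100000000");
        // Prints 1383100000000- followed by a random UUID.
        System.out.println(rowKeyFor(headers));
    }
}
```

Inside a serializer's setEvent, the call could replace the direct currentEvent.getHeaders().get("rowKey") lookup, for example RowKeys.rowKeyFor(currentEvent.getHeaders()).getBytes(StandardCharsets.UTF_8), assuming the serializer keeps the row key as bytes. The random fallbacks stop events from overwriting each other but make row keys non-reproducible, so setting an explicit rowKey header upstream is still the cleaner fix.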
