Here is some material to get started with morphlines:

http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/index.html

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/addValues

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/generateUUID

Wolfgang.

On Oct 30, 2013, at 6:53 PM, Ashish wrote:

> George,
> 
> Just to get things working, you can use the UUID Interceptor:
> http://flume.apache.org/FlumeUserGuide.html#uuid-interceptor
> 
> Set its headerName property to rowKey and the code should work. I have not 
> used this myself, but if it still doesn't work, let us know and I will 
> quickly hack out a working example.
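> 
> A minimal sketch of that suggestion, reusing the agent and source names from 
> George's flume.conf further down the thread. The interceptor name "uuid" is 
> arbitrary, and the type and headerName property are as I understand them 
> from the UUID Interceptor section of the Flume User Guide linked above, so 
> treat this as an untested assumption rather than a verified configuration:
> 
> ```
> # Attach one interceptor, here named "uuid", to the existing syslog source
> logger-agent.sources.Syslog-UDP.interceptors = uuid
> logger-agent.sources.Syslog-UDP.interceptors.uuid.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
> # Write the generated UUID into the header that the serializer reads
> logger-agent.sources.Syslog-UDP.interceptors.uuid.headerName = rowKey
> ```
> 
> With this in place every event should carry a distinct rowKey header, so 
> a serializer that uses that header as the row key would write each event 
> to its own HBase row.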
> 
> 
> On Thu, Oct 31, 2013 at 1:22 AM, George Pang wrote:
> Thank you, but I am not sure I can insert a header using the example in 
> this blog post. I am missing part of the whole picture. 
> 
> George
> 
> 
> On Wed, Oct 30, 2013 at 6:56 AM, Brock Noland wrote:
> I just googled and found this. Not sure if there is a better one.
> 
> http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
> 
> 
> On Wed, Oct 30, 2013 at 12:34 AM, George Pang wrote:
> Is there a tutorial for this topic out there? 
> 
> Thanks, 
> 
> George
> 
> 
> On Tue, Oct 29, 2013 at 6:50 PM, George Pang wrote:
> Hi Brock, 
> 
> The morphline command addValues looks like the one I need, but how can I add 
> the event header key-value pair?
> 
> Thank you, 
> 
> George
> 
> 
> On Tue, Oct 29, 2013 at 1:02 PM, George Pang wrote:
> Hi Brock, 
> 
> Yes, I think the morphline interceptor is what I am looking for. I am 
> studying it now. 
> 
> Thank you, 
> 
> George
> 
> 
> On Tue, Oct 29, 2013 at 12:56 PM, Brock Noland wrote:
> In a very simple demo you could use the static interceptor:
> http://flume.apache.org/FlumeUserGuide.html#static-interceptor
> 
> but you probably want to use the morphline interceptor or a custom interceptor:
> http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
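> 
> For the static interceptor, a minimal sketch against the agent and source 
> names in George's flume.conf quoted below might look like this. The 
> interceptor name "static1" and the value "demo-row" are made up for 
> illustration, and the type, key and value properties are as I understand 
> them from the static interceptor section of the user guide, so this is an 
> untested assumption:
> 
> ```
> # Attach one interceptor, here named "static1", to the existing syslog source
> logger-agent.sources.Syslog-UDP.interceptors = static1
> logger-agent.sources.Syslog-UDP.interceptors.static1.type = static
> # Every event gets the same header: rowKey=demo-row
> logger-agent.sources.Syslog-UDP.interceptors.static1.key = rowKey
> logger-agent.sources.Syslog-UDP.interceptors.static1.value = demo-row
> ```
> 
> Because the value is fixed, a serializer that uses the rowKey header as the 
> HBase row key would write every event to the same row, which is why this 
> only suits a very simple demo.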
> 
> 
> On Tue, Oct 29, 2013 at 2:52 PM, Hari Shreedharan wrote:
> Nope. You need to insert it at some other location. 
> 
> 
> Thanks,
> Hari
> 
> On Tuesday, October 29, 2013 at 12:48 PM, George Pang wrote:
> 
>> Hi Hari, 
>> 
>> Is inserting a rowKey header into the event something I can do in 
>> flume.conf? I tried to do that, but I am new to Flume. 
>> 
>> Thank you, 
>> 
>> George
>> 
>> 
>> On Tue, Oct 29, 2013 at 12:40 PM, Hari Shreedharan wrote:
>>> Did you insert a rowKey header into the event? If the header is not there, 
>>> currentEvent.getHeaders().get("rowKey") is going to return null. You need 
>>> to insert the header into the event at some point.
>>> 
>>> 
>>> Thanks,
>>> Hari
>>> 
>>> On Tuesday, October 29, 2013 at 12:30 PM, George Pang wrote:
>>> 
>>>> Hi Ashish, 
>>>> 
>>>> Actually, it starts with the headers. The example code has 
>>>> "String rowKeyStr = currentEvent.getHeaders().get("rowKey");" but no such 
>>>> header is found. If I remove this line, the rest complains that it is 
>>>> unable to deliver the event. But I checked the event, and it is not null. 
>>>> 
>>>> I am trying to use Flume to save to HBase, using the customized 
>>>> serializer example from 
>>>> http://blog.cloudera.com/blog/2012/11/streaming-data-into-apache-hbase-using-apache-flume/
>>>> 
>>>> flume.conf:
>>>> 
>>>> logger-agent.sources = Syslog-UDP
>>>> logger-agent.sinks = Syslog-HBase
>>>> logger-agent.channels = Syslog-HBase-Channel
>>>> 
>>>> logger-agent.sources.Syslog-UDP.channels = Syslog-HBase-Channel
>>>> logger-agent.sinks.Syslog-HBase.channel = Syslog-HBase-Channel
>>>> 
>>>> logger-agent.sources.Syslog-UDP.type = syslogudp
>>>> logger-agent.sources.Syslog-UDP.port = 5140
>>>> logger-agent.sources.Syslog-UDP.host = localhost
>>>> 
>>>> logger-agent.sinks.Syslog-HBase.type = org.apache.flume.sink.hbase.AsyncHBaseSink
>>>> logger-agent.sinks.Syslog-HBase.table = syslog2
>>>> logger-agent.sinks.Syslog-HBase.columnFamily = cluster
>>>> logger-agent.sinks.Syslog-HBase.serializer.payloadColumn = dev
>>>> logger-agent.sinks.Syslog-HBase.serializer.incrementColumn = icol
>>>> logger-agent.sinks.Syslog-HBase.serializer.columns = forum,inbound,outbound
>>>> logger-agent.sinks.Syslog-HBase.batchSize = 5000
>>>> logger-agent.sinks.Syslog-HBase.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
>>>> 
>>>> logger-agent.channels.Syslog-HBase-Channel.type = memory 
>>>> 
>>>> 
>>>> Flume version: 1.4
>>>> 
>>>> org.apache.flume.FlumeException: No row key found in headers!
>>>>     at com.ib.SplittingSerializer.setEvent(SplittingSerializer.java:43)
>>>>     at org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:184)
>>>>     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>>     at java.lang.Thread.run(Thread.java:662)
>>>> 
>>>> Thank you, 
>>>> 
>>>> George
>>>> 
>>>> 
>>>> 
>>>> On Tue, Oct 29, 2013 at 2:29 AM, Ashish wrote:
>>>>> George,
>>>>> 
>>>>> Can you share more details about what you are trying to achieve? If 
>>>>> possible, please share the Flume version, agent configuration and 
>>>>> exception stack trace.
>>>>> You may also look at the HBase sink documentation for more info: 
>>>>> http://flume.apache.org/FlumeUserGuide.html#hbasesinks
>>>>> 
>>>>> 
>>>>> On Tue, Oct 29, 2013 at 2:50 PM, George Pang wrote:
>>>>>> I used the serializer example in this blog post: 
>>>>>> http://blog.cloudera.com/blog/2012/11/streaming-data-into-apache-hbase-using-apache-flume/
>>>>>> 
>>>>>> but got "Unable to deliver event. Exception follows. 
>>>>>> java.lang.NullPointerException". From looking it up in forums, I think 
>>>>>> it may be caused by an empty header. If so, how is a timestamp header 
>>>>>> added? If not, what causes the event delivery to fail? 
>>>>>> 
>>>>>> Thank you, 
>>>>>> 
>>>>>> George
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> 
> -- 
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
> 
> 
> 
> 
> -- 
> thanks
> ashish
> 
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
