Never mind. Stupid me. I load the config from property files and there was an extra space there. Thanks, Chen
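[Editor's note] For anyone hitting the same thing: `java.util.Properties` strips leading whitespace from a value but keeps trailing whitespace, so a stray space after a host or port silently survives into the client config. A minimal sketch (the `flume.port` key is made up for illustration; Chen's actual key names are not in the thread):

```java
import java.io.StringReader;
import java.util.Properties;

public class TrailingSpaceDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Note the stray space after the port value.
        props.load(new StringReader("flume.port=10001 \n"));

        String raw = props.getProperty("flume.port");
        // Properties.load keeps the trailing space in the value.
        System.out.println("[" + raw + "]"); // prints [10001 ]

        // Integer.parseInt(raw) would throw NumberFormatException,
        // so trim() the value before using it.
        int port = Integer.parseInt(raw.trim());
        System.out.println(port); // prints 10001
    }
}
```

A trailing space on a hostname value fails the same way ("localhost " is not a resolvable host), which can surface as exactly the kind of connection error seen below.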
On Fri, Jan 10, 2014 at 5:47 PM, Chen Wang <[email protected]> wrote:

I tried to telnet, and also get connection refused:

    telnet localhost 4141
    Trying ::1...
    telnet: connect to address ::1: Connection refused
    Trying 127.0.0.1...
    telnet: connect to address 127.0.0.1: Connection refused

On Fri, Jan 10, 2014 at 5:15 PM, Chen Wang <[email protected]> wrote:

Hey guys,
I think I still need some help with the custom Flume client. Right now I have finished the Avro sink client in my Storm bolt. On a test machine, I started a Flume agent:

    StormAgent.sources = avro
    StormAgent.channels = MemChannel
    StormAgent.sinks = HDFS

    StormAgent.sources.avro.type = avro
    StormAgent.sources.avro.channels = MemChannel
    StormAgent.sources.avro.bind = localhost
    StormAgent.sources.avro.port = 10001

I assume this will automatically wait on localhost:10001?

When I run my LoadBalancingRpcClient on the same machine, I receive a connection refused exception:

    org.apache.flume.FlumeException: NettyAvroRpcClient { host: localhost, port: 10001 }: RPC connection error
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:161)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:115)
        at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:590)
        at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
        at org.apache.flume.api.LoadBalancingRpcClient.createClient(LoadBalancingRpcClient.java:214)
        at org.apache.flume.api.LoadBalancingRpcClient.getClient(LoadBalancingRpcClient.java:197)
        at org.apache.flume.api.LoadBalancingRpcClient.append(LoadBalancingRpcClient.java:71)
        at com.walmartlabs.targeting.storm.bolt.HubbleStreamAvroSinkBolt.execute(HubbleStreamAvroSinkBolt.java:89)
        at backtype.storm.daemon.executor$fn__4050$tuple_action_fn__4052.invoke(executor.clj:566)
        at backtype.storm.daemon.executor$mk_task_receiver$fn__3976.invoke(executor.clj:345)
        at backtype.storm.disruptor$clojure_handler$reify__1606.onEvent(disruptor.clj:43)
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:84)
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58)
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62)
        at backtype.storm.daemon.executor$fn__4050$fn__4059$fn__4106.invoke(executor.clj:658)
        at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:10001

Is this still some config issue? I tried the IP address as well, but get the same error. I am this close now...
Thank you for any help!
Chen

On Thu, Jan 9, 2014 at 10:09 PM, Chen Wang <[email protected]> wrote:

Ashish,
Interestingly enough, I was initially doing 1 too, and had a working version. But I finally gave it up because in my bolt I have to flush to HDFS either when the data reaches a certain size or when a timer fires, which is exactly what Flume offers. It also had the complexity of grouping entries within the same partition, while with Flume that is a piece of cake.

Thank you so much for all your input. It helped me a lot!
Chen

On Thu, Jan 9, 2014 at 10:00 PM, Ashish <[email protected]> wrote:

Got it!

My first reaction was to use an HDFS bolt to write data directly to HDFS, but I couldn't find an implementation of one. My knowledge of Storm is limited. If the data is already flowing through Storm, you have two options:
1. Write a bolt to dump data to HDFS.
2. Write a Flume bolt using the RPC client as recommended in this thread, and reuse Flume's capabilities.
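[Editor's note] Chen's agent snippet above declares `StormAgent.sinks = HDFS` but never shows the sink itself, and the flush-on-size-or-timeout behaviour he mentions maps onto the HDFS sink's roll settings. A sketch of the missing keys (path and thresholds are illustrative, not from the thread):

```properties
StormAgent.sinks.HDFS.type = hdfs
StormAgent.sinks.HDFS.channel = MemChannel
StormAgent.sinks.HDFS.hdfs.path = /flume/events
# Roll a new file every 5 minutes or every ~128 MB, whichever
# comes first; setting a trigger to 0 disables it.
StormAgent.sinks.HDFS.hdfs.rollInterval = 300
StormAgent.sinks.HDFS.hdfs.rollSize = 134217728
StormAgent.sinks.HDFS.hdfs.rollCount = 0
```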
If you already have a Flume installation running, #2 is the quickest way to get going. Even otherwise, installing and running Flume is like a walk in the park :)

You can also watch the related discussion on https://issues.apache.org/jira/browse/FLUME-1286. There is some good info in the JIRA.

thanks
ashish

On Fri, Jan 10, 2014 at 11:08 AM, Chen Wang <[email protected]> wrote:

Ashish,
Since we already use Storm for other real-time processing, I want to reuse it. The biggest advantage of using Storm in this case is that I can use a Storm spout to read from our socket server continuously, and the Storm framework ensures it never stops. Meanwhile, I can also easily filter/translate the data in a bolt before sending it to Flume. For this data stream, my first step is to get it into HDFS, but I will add real-time processing soon.
Does that make sense to you?
Thanks,
Chen

On Thu, Jan 9, 2014 at 9:29 PM, Ashish <[email protected]> wrote:

Why do you need Storm? Are you doing any real-time processing? If not, IMHO, avoid Storm.

You can use something like this:

    Socket -> Load-Balanced RPC Client -> Flume Topology with HA

What application-level protocol are you using at the socket level?

On Fri, Jan 10, 2014 at 10:50 AM, Chen Wang <[email protected]> wrote:

Jeff, Joao,
Thanks for the pointers!
I think I am getting close here:
1. Set up a cluster of Flume agents with redundancy, source as Avro, sink as HDFS.
2. Use Storm (not strictly necessary) to read from our socket server, then in the bolt use a Flume client (load-balancing RPC client) to send the event to the agents set up in step 1.
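[Editor's note] The client side of step 2 boils down to handing a `Properties` object to Flume's `RpcClientFactory.getInstance(Properties)` and then calling `append(event)` on the returned client. A sketch of just the properties, per the load-balancing client section of the Flume Developer Guide (host names are placeholders, and the Flume/Storm calls are left in comments so the class compiles standalone):

```java
import java.util.Properties;

public class LoadBalancedClientProps {

    // Properties understood by Flume's RpcClientFactory.getInstance(Properties)
    // for a load-balancing RPC client; host names are placeholders.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("client.type", "default_loadbalance");
        props.setProperty("hosts", "h1 h2");
        props.setProperty("hosts.h1", "agent1.example.com:10001");
        props.setProperty("hosts.h2", "agent2.example.com:10001");
        props.setProperty("host-selector", "round_robin"); // or "random"
        props.setProperty("backoff", "true"); // temporarily back off failed hosts
        return props;
    }

    public static void main(String[] args) {
        // Inside the bolt you would then do (needs flume-ng-sdk on the classpath):
        //   RpcClient client = RpcClientFactory.getInstance(build());
        //   client.append(EventBuilder.withBody(data));
        System.out.println(build().getProperty("hosts"));
    }
}
```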
Then I get all the benefits of Storm and Flume. Does this setup look right to you?
Thank you very much,
Chen

On Thu, Jan 9, 2014 at 8:58 PM, Joao Salcedo <[email protected]> wrote:

Hi Chen,

Maybe it would be worth checking this:

http://flume.apache.org/FlumeDeveloperGuide.html#loadbalancing-rpc-client

Regards,

Joao

On Fri, Jan 10, 2014 at 3:50 PM, Jeff Lord <[email protected]> wrote:

Have you taken a look at the load-balancing RPC client?

On Thu, Jan 9, 2014 at 8:43 PM, Chen Wang <[email protected]> wrote:

Jeff,
I read that presentation at the beginning, but didn't find a solution for my use case. To simplify: I have only one data source (composed of 5 socket servers), and I am looking for a fault-tolerant deployment of Flume that can read from this single data source and sink to HDFS in a fault-tolerant way: when one node dies, another Flume node picks up and continues.
Thanks,
Chen

On Thu, Jan 9, 2014 at 7:49 PM, Jeff Lord <[email protected]> wrote:

Chen,

Have you taken a look at this presentation on Planning and Deploying Flume from ApacheCon?

http://archive.apachecon.com/na2013/presentations/27-Wednesday/Big_Data/11:45-Mastering_Sqoop_for_Data_Transfer_for_Big_Data-Arvind_Prabhakar/Arvind%20Prabhakar%20-%20Planning%20and%20Deploying%20Apache%20Flume.pdf

It may have the answers you need.
Best,

Jeff

On Thu, Jan 9, 2014 at 7:24 PM, Chen Wang <[email protected]> wrote:

Thanks Saurabh.
If that is the case, I am actually thinking about using a Storm spout to talk to our socket server so that the Storm cluster takes care of the socket-reading part. Then on each Storm node, start a Flume agent listening on an RPC port and writing to HDFS (with failover). Then in the Storm bolt, simply send the data over RPC so that Flume can pick it up.
What do you think of this setup? It takes care of failover both on the source (by Storm) and on the sink (by Flume), but it looks a little complicated to me.
Chen

On Thu, Jan 9, 2014 at 7:18 PM, Saurabh B <[email protected]> wrote:

Hi Chen,

I don't think Flume has a way to configure multiple sources pointing at the same data source. Of course you can do that, but you will end up with duplicate data. Flume offers failover at the sink level.

On Thu, Jan 9, 2014 at 6:56 PM, Chen Wang <[email protected]> wrote:

OK, so after more research :) it seems that what I need is failover for the agent source (not failover for the sink): if one agent dies, another agent of the same kind starts running. Does Flume support this scenario?
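[Editor's note] To make Saurabh's point concrete: sink-level failover in Flume is configured with a sink group using the failover processor. A sketch per the Flume User Guide (the agent/sink names `a1`, `k1`, `k2` are illustrative):

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# Higher priority wins: events go to k1 until it fails, then fall back to k2.
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Cap (ms) on how long a failed sink is penalized before being retried.
a1.sinkgroups.g1.processor.maxpenalty = 10000
```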
Thanks,
Chen

On Thu, Jan 9, 2014 at 3:12 PM, Chen Wang <[email protected]> wrote:

After reading more docs, it seems that if I want to achieve my goal, I have to do the following:
1. Have one agent with the custom source running on one node. This agent reads from those 5 socket servers and sinks to some kind of sink (maybe another socket?).
2. On one or more other machines, set up collectors that read from the agent sink in 1 and sink to HDFS.
3. Have a master node managing the nodes in 1 and 2.

But this seems like overkill in my case: in 1, I can already sink to HDFS. Since data becomes available at the socket servers much faster than the data-translation part can consume it, I want to be able to add more translation nodes later. So what is the correct setup?
Thanks,
Chen

On Thu, Jan 9, 2014 at 2:38 PM, Chen Wang <[email protected]> wrote:

Guys,
In my environment, the client is 5 socket servers. I therefore wrote a custom source spawning 5 threads, each reading from one of them indefinitely, and the sink is HDFS (a Hive table). This works fine when running a flume-ng agent.

But how can I deploy this in distributed mode (a cluster)? I am confused about the 3 tiers (agent, collector, storage) mentioned in the docs. Do they apply to my case? How can I separate my agent/collector/storage?
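[Editor's note] The agent/collector/master layout Chen is reading about belongs to the old Flume 0.9 architecture; Flume NG has no master, and tiers are chained directly by pointing a first-tier Avro sink at a second-tier Avro source. A sketch (host names and ports are made up; channel wiring omitted for brevity):

```properties
# Tier 1 (runs the custom socket-reading source), forwards over Avro:
agent1.sinks.avroOut.type = avro
agent1.sinks.avroOut.hostname = collector1.example.com
agent1.sinks.avroOut.port = 4545

# Tier 2 (collector), receives Avro and writes to HDFS:
collector1.sources.avroIn.type = avro
collector1.sources.avroIn.bind = 0.0.0.0
collector1.sources.avroIn.port = 4545
```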
Apparently I can only have one agent running: multiple agents would result in duplicates from the socket server. But I want another agent to take over if one dies, and I would also like horizontal scalability for writing to HDFS. How can I achieve all this?

Thank you very much for your advice.
Chen

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
