All, I have run Flume agents on a pseudo-distributed Cloudera VM, ingesting tweets from Twitter. When I paste the same configurations into the Flume section of Ambari, I do not get any data from Twitter. The Ambari screen says the agents are running, but when I list the target HDFS directory, I see no files:
[root@namenode PBX]# hadoop fs -ls /user/flume/tweets
[root@namenode PBX]# hadoop fs -ls /user/flume/tweets
[root@namenode PBX]# hadoop fs -ls /user/flume/tweets/
[root@namenode PBX]#
I have attached the cluster parameters in a PDF.
Here is the URL I am using to add the configuration to the Flume agents:
http://namenode.localdomain.com:8080/#/main/services/FLUME/configs
Here is the configuration for the Twitter agent:
# defining the source for the agent for Twitter
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemoryChannel
TwitterAgent.sources.Twitter.consumerKey = (just removing for security)
TwitterAgent.sources.Twitter.accessToken = (removing)
TwitterAgent.sources.Twitter.accessTokenSecret =(removing)
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sources.Twitter.maxBatchSize = 10
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200
# defining the interceptors
TwitterAgent.sources.Twitter.interceptors = i1
TwitterAgent.sources.Twitter.interceptors.i1.type = timestamp
# defining the sink for the agent
TwitterAgent.sinks.HDFS.channel = MemoryChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/flume/tweets/%Y/%m/%d
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 100000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 6000
TwitterAgent.sinks.HDFS.hdfs.filePrefix = events-
# defining the channel for the agent
TwitterAgent.channels.MemoryChannel.type = memory
TwitterAgent.channels.MemoryChannel.capacity = 10000
TwitterAgent.channels.MemoryChannel.transactionCapacity = 10000
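
One sanity check that may be worth running on the configuration above: Flume only activates components that are named in the agent-level lists (TwitterAgent.sources, TwitterAgent.channels, TwitterAgent.sinks), and those lines do not appear in what I pasted. The Python sketch below is a minimal check, assuming the pasted text is the whole agent configuration; the helper names are hypothetical and not part of Flume or Ambari, and the sample is a shortened copy of the properties above.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def undeclared_components(props, agent):
    """Return configured components missing from the agent's active lists."""
    missing = {}
    for kind in ("sources", "channels", "sinks"):
        # Active list, e.g. "TwitterAgent.sources = Twitter" (space separated).
        declared = set(props.get(f"{agent}.{kind}", "").split())
        prefix = f"{agent}.{kind}."
        # Component names that have at least one property configured.
        configured = {
            key[len(prefix):].split(".")[0]
            for key in props
            if key.startswith(prefix)
        }
        gap = sorted(configured - declared)
        if gap:
            missing[kind] = gap
    return missing


# Shortened copy of the configuration above (assumption: nothing else is set).
sample = """
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemoryChannel
TwitterAgent.sinks.HDFS.channel = MemoryChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.channels.MemoryChannel.type = memory
"""

print(undeclared_components(parse_properties(sample), "TwitterAgent"))
```

If the dictionary that is printed is not empty, the listed components are configured but never declared as active for the agent, which would be one possible reason for an agent that reports as running while writing nothing to HDFS.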
David Novogrodsky
http://www.linkedin.com/in/davidnovogrodsky
Attachment: aMBARIsETuP.pdf (Adobe PDF document)
