[jira] [Commented] (FLUME-2851) Flume Source IgnorePattern multiple regex
[ https://issues.apache.org/jira/browse/FLUME-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046581#comment-15046581 ]

Gonzalo Herreros commented on FLUME-2851:
-----------------------------------------

Please use the distribution list for questions. You have to build a regular expression, which you can test with an online tool before pasting it into the Flume configuration. I guess what you are looking for is:

(Pqr\.log)|(Xyz\.log)

but I haven't tested it.

> Flume Source IgnorePattern multiple regex
> -----------------------------------------
>
>                 Key: FLUME-2851
>                 URL: https://issues.apache.org/jira/browse/FLUME-2851
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Shashikant Kulkarni
>
> As I understood from the documentation, I can specify a file pattern that
> will be ignored by the Flume agent. But when I tried it, I found that I can
> specify only one file pattern to be ignored. How can I specify the value so
> that multiple files can be ignored by the agent? For example, I have a spool
> directory containing 3 different log files:
> Abc.log
> Pqr.log
> Xyz.log
> Now I need to configure the agent source so that it reads only the file
> Abc.log and ignores Pqr.log and Xyz.log. How do I specify this using
> agent1.sources.source1.ignorePattern= ?
> Please help me if I am missing the correct regex for this. Is it a bug?
> Thanks.
> [~tomwhite]: Please help

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
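The suggested alternation can be checked offline before touching the agent configuration. A minimal sketch (the class name is illustrative, and it assumes the spooling source tests ignorePattern against the whole file name):

```java
import java.util.regex.Pattern;

public class IgnorePatternCheck {
    public static void main(String[] args) {
        // The alternation suggested above: ignore Pqr.log and Xyz.log.
        // matches() requires the whole file name to match the pattern.
        Pattern ignore = Pattern.compile("(Pqr\\.log)|(Xyz\\.log)");

        String[] files = {"Abc.log", "Pqr.log", "Xyz.log"};
        for (String f : files) {
            System.out.println(f + " ignored=" + ignore.matcher(f).matches());
        }
    }
}
```

Run against the three file names above, only Pqr.log and Xyz.log should report ignored=true.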
[jira] [Commented] (FLUME-2818) Problems with Avro data and not Json and no data in HDFS
[ https://issues.apache.org/jira/browse/FLUME-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046590#comment-15046590 ]

Gonzalo Herreros commented on FLUME-2818:
-----------------------------------------

Please use the distribution list to ask for help, especially if you are new to Flume. I don't think the files are corrupted; Avro is a binary format that looks like JSON. If you send your configuration and your table definition to the user list, maybe somebody can help you. Otherwise it's impossible.

> Problems with Avro data and not Json and no data in HDFS
> --------------------------------------------------------
>
>                 Key: FLUME-2818
>                 URL: https://issues.apache.org/jira/browse/FLUME-2818
>             Project: Flume
>          Issue Type: Request
>          Components: Sinks+Sources
>    Affects Versions: v1.5.2
>         Environment: HDP-2.3.0.0-2557 Sandbox
>            Reporter: Kettler Karl
>            Priority: Critical
>             Fix For: v1.5.2
>
> Flume supplies twitter data in Avro format and not in JSON. Why?
>
> Flume Config Agent:
> TwitterAgent.sources = Twitter
> TwitterAgent.channels = MemChannel
> TwitterAgent.sinks = HDFS
> TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
> TwitterAgent.sources.Twitter.channels = MemChannel
> TwitterAgent.sources.Twitter.consumerKey = xxx
> TwitterAgent.sources.Twitter.consumerSecret = xxx
> TwitterAgent.sources.Twitter.accessToken = xxx
> TwitterAgent.sources.Twitter.accessTokenSecret = xxx
> TwitterAgent.sources.Twitter.maxBatchSize = 10
> TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200
> TwitterAgent.sources.Twitter.keywords = United Nations
> TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL
> # HDFS Sink
> TwitterAgent.sinks.HDFS.channel = MemChannel
> TwitterAgent.sinks.HDFS.type = hdfs
> TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S
> TwitterAgent.sinks.HDFS.hdfs.filePrefix = events
> TwitterAgent.sinks.HDFS.hdfs.round = true
> TwitterAgent.sinks.HDFS.hdfs.roundValue = 5
> TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute
> TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
> TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
> TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
> TwitterAgent.channels.MemChannel.type = memory
> TwitterAgent.channels.MemChannel.capacity = 1000
> TwitterAgent.channels.MemChannel.transactionCapacity = 100
>
> Twitter Data from Flume (Avro container header, embedded schema, then binary record data):
> Obj avro.schema
> {"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}
> [binary Avro record data omitted]
>
> After loading this twitter data into an HDFS table, it is not possible to
> convert it to JSON with avro-tools-1.7.7.jar. We get the error message
> "No data". If we try to read this file we get the following error:
> java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json
> Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.EOFException
>
> I hope you could help us.
> Kind regards,
> Karl
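A side note on the configuration quoted above: with fileType = DataStream the HDFS sink writes the TwitterSource's Avro container bytes through unchanged, so the resulting files are Avro, not JSON, and avro-tools can typically only decode a file after the sink has rolled and closed it. A hedged sketch of roll settings that force files to be finalized (the parameter names are the stock HDFS sink ones; the values are illustrative):

```
# Roll files so they are closed and finalized before reading them back
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600   # seconds; 0 disables time-based roll
TwitterAgent.sinks.HDFS.hdfs.rollSize = 67108864  # bytes; roll at ~64 MB
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0        # 0 = do not roll on event count
```

An EOFException from avro-tools on an otherwise well-formed file is often just a file the sink is still writing.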
[jira] [Updated] (FLUME-2851) Flume Source IgnorePattern multiple regex
[ https://issues.apache.org/jira/browse/FLUME-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Kulkarni updated FLUME-2851:
---------------------------------------
    Description: 
As I understood from the documentation, I can specify a file pattern that will be ignored by the Flume agent. But when I tried it, I found that I can specify only one file pattern to be ignored. How can I specify the value so that multiple files can be ignored by the agent? For example, I have a spool directory containing 3 different log files:
Abc.log
Pqr.log
Xyz.log
Now I need to configure the agent source so that it reads only the file Abc.log and ignores Pqr.log and Xyz.log. How do I specify this using agent1.sources.source1.ignorePattern= ?
Please help me if I am missing the correct regex for this. Is it a bug? Thanks.
[~tomwhite]: Please help

  was:
As I understood from the documentation, I can specify a file pattern that will be ignored by the Flume agent. But when I tried it, I found that I can specify only one file pattern to be ignored. How can I specify the value so that multiple files can be ignored by the agent? For example, I have a spool directory containing 3 different log files:
Abc.log
Pqr.log
Xyz.log
Now I need to configure the agent source so that it reads only the file Abc.log and ignores Pqr.log and Xyz.log. How do I specify this using agent1.sources.source1.ignorePattern= ?
Please help me if I am missing the correct regex for this. Is it a bug? Thanks.
[jira] [Commented] (FLUME-2851) Flume Source IgnorePattern multiple regex
[ https://issues.apache.org/jira/browse/FLUME-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046677#comment-15046677 ]

Shashikant Kulkarni commented on FLUME-2851:
--------------------------------------------

Thank you [~gherreros] for your comments. I will use the distribution list for questions like this in the future. I will try this kind of regex and see whether it works.

Thanks,
Shashikant
Re: FLUME-2719
I am getting DNS errors while trying to get to ASF JIRA right now. I will take a look later.

Thanks,
Hari

On Tue, Dec 8, 2015 at 7:24 PM, Gonzalo Herreros wrote:
> Hi,
>
> Could you have a look at this issue
> https://issues.apache.org/jira/browse/FLUME-2719 and provide some
> feedback/review on the patch?
>
> Thanks,
> Gonzalo
[jira] [Created] (FLUME-2852) Kafka Source/Sink should optionally read/write Avro Datums
Tristan Stevens created FLUME-2852:
-----------------------------------

             Summary: Kafka Source/Sink should optionally read/write Avro Datums
                 Key: FLUME-2852
                 URL: https://issues.apache.org/jira/browse/FLUME-2852
             Project: Flume
          Issue Type: Improvement
          Components: Sinks+Sources
    Affects Versions: v1.6.0
            Reporter: Tristan Stevens

Currently the Kafka Sink writes only the event body to Kafka, rather than an Avro Datum. This works fine when used with a Kafka Source or with a Kafka Channel; however, it means that any Flume headers are lost when transported via Kafka.

The request is to implement an equivalent of the Kafka Channel's parseAsFlumeEvent parameter for the sink/source, so that Avro Datums can optionally be read from and written to Kafka.
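For reference, the Kafka Channel behavior the request points to is controlled by a single property. A hedged configuration sketch (the agent/channel names, broker, and ZooKeeper addresses are illustrative; the parameter names are from the Flume 1.6 Kafka Channel):

```
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.brokerList = kafka-broker:9092
a1.channels.c1.zookeeperConnect = zk-host:2181
a1.channels.c1.topic = flume-channel
# When true, events on the topic are wrapped as AvroFlumeEvent datums,
# preserving Flume headers; when false, only the raw body is written.
a1.channels.c1.parseAsFlumeEvent = true
```

The improvement asks for the same toggle on the Kafka Source and Sink.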
[jira] [Commented] (FLUME-2852) Kafka Source/Sink should optionally read/write Avro Datums
[ https://issues.apache.org/jira/browse/FLUME-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046919#comment-15046919 ]

Jeff Holoman commented on FLUME-2852:
-------------------------------------

Just want to make sure I understand the issue, as I'm working on all of the Kafka components as part of FLUME-2820. Are you suggesting writing Flume's AvroFlumeEvent as the message? Would you propose to persist the schema in the header? Can you help me understand your flow?
FLUME-2719
Hi,

Could you have a look at this issue https://issues.apache.org/jira/browse/FLUME-2719 and provide some feedback/review on the patch?

Thanks,
Gonzalo
[jira] [Assigned] (FLUME-2852) Kafka Source/Sink should optionally read/write Avro Datums
[ https://issues.apache.org/jira/browse/FLUME-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Holoman reassigned FLUME-2852:
-----------------------------------

    Assignee: Jeff Holoman
[jira] [Commented] (FLUME-2852) Kafka Source/Sink should optionally read/write Avro Datums
[ https://issues.apache.org/jira/browse/FLUME-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046941#comment-15046941 ]

Tristan Stevens commented on FLUME-2852:
----------------------------------------

Yes, this would absolutely be the AvroFlumeEvent, and it would be without the schema, as in the Avro Sink / Avro Source pairing. Here is the code from the Kafka Channel:

    if (parseAsFlumeEvent) {
        if (!tempOutStream.isPresent()) {
            tempOutStream = Optional.of(new ByteArrayOutputStream());
        }
        if (!writer.isPresent()) {
            writer = Optional.of(new SpecificDatumWriter<AvroFlumeEvent>(AvroFlumeEvent.class));
        }
        tempOutStream.get().reset();
        AvroFlumeEvent e = new AvroFlumeEvent(
            toCharSeqMap(event.getHeaders()),
            ByteBuffer.wrap(event.getBody()));
        encoder = EncoderFactory.get()
            .directBinaryEncoder(tempOutStream.get(), encoder);
        writer.get().write(e, encoder);
        // Not really possible to avoid this copy :(
        serializedEvents.get().add(tempOutStream.get().toByteArray());
    } else {
        serializedEvents.get().add(event.getBody());
    }

The flow in this case is:
Syslog Source -> Memory Channel -> Kafka Sink -> Kafka Broker -> Kafka Source -> Memory Channel -> HDFS Sink

Although in the future I'd like to make it:
Syslog Source -> Kafka Channel -> Kafka Sink -> Kafka Broker -> Kafka Source -> Kafka Channel -> HDFS Sink

N.B. The three tiers run in different sites.
[jira] [Commented] (FLUME-2818) Problems with Avro data and not Json and no data in HDFS
[ https://issues.apache.org/jira/browse/FLUME-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047203#comment-15047203 ]

Cord Thomas commented on FLUME-2818:
------------------------------------

Thank you Gonzalo. I would not typically think to do that, as the norm is to first search a forum to see whether anyone has experienced and/or resolved your issue. Karl's seemed very similar to mine. I understand this is not a forum and apologize for the intrusion.

Just to close this, in case anyone else runs into this thread with the same experience as mine: I have since resolved (at least notionally) my problem, now that I understand the sources, sinks and channels a little better. I had thought that the sinks wanted Avro as the standard and would magically transform it back to a JSON structure in the sinking process. Understanding that this is not the case, I have moved to the old Cloudera TwitterSource implementation for the time being and will look to learn more about the "highly experimental" Flume TwitterSource and Avro.
[jira] [Created] (FLUME-2853) Allow for YAML configuration files
Christopher White created FLUME-2853:
-------------------------------------

             Summary: Allow for YAML configuration files
                 Key: FLUME-2853
                 URL: https://issues.apache.org/jira/browse/FLUME-2853
             Project: Flume
          Issue Type: Improvement
          Components: Configuration
            Reporter: Christopher White
            Priority: Minor

Allow for YAML-formatted configuration files (http://www.yaml.org/spec/1.2/spec.html). This provides:
* A more condensed format than properties files
* Less typo-prone repetition of common prefixes
* The ability to define a value once and reuse it via references (see [spec - Structures - Example 2.10|http://www.yaml.org/spec/1.2/spec.html#id2760395])

For example, compare the following properties file and a potential YAML equivalent:

{code:title=agent.properties}
host1.sources = source1
host1.channels = channel1
host1.sinks = sink1
host1.sources.source1.type = seq
host1.sources.source1.channels = channel1
host1.channels.channel1.type = memory
host1.channels.channel1.capacity = 1
host1.sinks.sink1.type = null
host1.sinks.sink1.channel = channel1
{code}

{code:title=agent.yaml}
host1:
  sources:
    _: source1
    source1:
      type: seq
      channels: channel1
  channels:
    _: channel1
    channel1:
      type: memory
      capacity: 1
  sinks:
    _: sink1
    sink1:
      type: null
      channel: channel1
{code}
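A loader for such a format could map the nested keys back onto the existing dotted properties namespace, so the rest of the configuration system stays untouched. A minimal sketch of that flattening step using plain Maps (class and method names are hypothetical; no YAML parser is involved, the nested structure stands in for a parsed document):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigFlattener {
    // Recursively flatten {host1: {sources: {source1: {type: seq}}}}
    // into "host1.sources.source1.type = seq" style keys.
    public static Map<String, String> flatten(String prefix, Map<String, Object> node) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                out.putAll(flatten(key, child));
            } else {
                out.put(key, String.valueOf(e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Build the nested structure a YAML parser might produce
        Map<String, Object> source1 = new LinkedHashMap<>();
        source1.put("type", "seq");
        source1.put("channels", "channel1");
        Map<String, Object> sources = new LinkedHashMap<>();
        sources.put("source1", source1);
        Map<String, Object> host1 = new LinkedHashMap<>();
        host1.put("sources", sources);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("host1", host1);

        System.out.println(flatten("", root));
        // → {host1.sources.source1.type=seq, host1.sources.source1.channels=channel1}
    }
}
```

The `_:` convention in the proposal (listing active component names) would need a small special case in such a loader, mapping e.g. `sources._` to the `host1.sources` list key.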