[jira] [Created] (FLUME-2748) ThriftLegacySource produces exception due to wrongly compiled thrift definitions
Tobias Heintz created FLUME-2748:

Summary: ThriftLegacySource produces exception due to wrongly compiled thrift definitions
Key: FLUME-2748
URL: https://issues.apache.org/jira/browse/FLUME-2748
Project: Flume
Issue Type: Bug
Components: Sinks+Sources
Affects Versions: v1.6.0
Reporter: Tobias Heintz

We are in the process of upgrading our Flume installation from 0.9.2 to 1.6.0. Currently we are using the ThriftLegacySource to allow the Flume server to receive messages without having to update all components at the same time. For every received message, we are seeing this exception:
{code}
2015-07-24 17:15:28,892 (pool-3-thread-5) [ERROR - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:215)] Error occurred during processing of message.
java.lang.NullPointerException
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}
I've done some digging in the code and it appears that there is an error in the Java classes that were compiled from the legacy thrift definitions: the method [{{append}} is defined as {{oneway}}|https://github.com/apache/flume/blob/344e0accae5675fd3d14b8414531528607865aae/flume-ng-legacy-sources/flume-thrift-source/src/main/thrift/flumeCompatibility.thrift#L61]; however, in the compiled class, the method [{{isOneway()}} returns {{false}}|https://github.com/apache/flume/blob/344e0accae5675fd3d14b8414531528607865aae/flume-ng-legacy-sources/flume-thrift-source/src/main/java/com/cloudera/flume/handlers/thrift/ThriftFlumeEventServer.java#L223].
This then leads to the NullPointerException when the [ProcessFunction tries to write the result|https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/ProcessFunction.java#L53] back to the producer. I'm not sure how this happened; maybe the very old version (0.7) of the thrift compiler is at fault here. The fix, however, would simply be to make the {{isOneway()}} method return {{true}}.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
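To make the reported failure concrete, here is a simplified, hypothetical Java model of the check the report points to; it is not the libthrift or Flume source, and every class and method name in it is illustrative. It assumes, as the linked ProcessFunction.java line suggests, that the handler runs first and a reply is written only when isOneway() returns false, so a compiled append that produces no result object fails exactly at the write.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the oneway check described in the report.
// It is NOT the libthrift source; all names here are illustrative only.
public class OnewayDemo {

    /** Stand-in for a generated result struct that can be written to the wire. */
    interface Result {
        void write(List<String> wire);
    }

    /** Stand-in for a generated process function. */
    static abstract class ProcessFunction {
        protected abstract boolean isOneway();

        protected abstract Result getResult(String event);

        /** Runs the handler, then writes a reply only for non-oneway methods. */
        final void process(String event, List<String> wire) {
            Result result = getResult(event);
            if (!isOneway()) {
                // A null result reaches this call when isOneway() wrongly returns false.
                result.write(wire);
            }
        }
    }

    /** Models the compiled 'append' function: the handler returns no result object. */
    static ProcessFunction append(boolean oneway, List<String> received) {
        return new ProcessFunction() {
            @Override
            protected boolean isOneway() {
                return oneway;
            }

            @Override
            protected Result getResult(String event) {
                received.add(event); // in this model the handler runs before any reply
                return null;         // no reply object for a oneway call
            }
        };
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        List<String> wire = new ArrayList<>();

        append(true, received).process("evt-1", wire); // proposed fix: no reply attempted
        try {
            append(false, received).process("evt-2", wire); // current compiled class
        } catch (NullPointerException e) {
            System.out.println("NullPointerException with isOneway() == false");
        }
        System.out.println("received=" + received + " replies=" + wire);
    }
}
```

In this model, flipping the flag to true (the fix proposed above) is enough: no reply is attempted, so the null result is never dereferenced.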
[jira] [Updated] (FLUME-2458) Separate hdfs tmp directory for flume hdfs sink
[ https://issues.apache.org/jira/browse/FLUME-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neerja Khattar updated FLUME-2458:
--
Assignee: Neerja Khattar

Separate hdfs tmp directory for flume hdfs sink
---
Key: FLUME-2458
URL: https://issues.apache.org/jira/browse/FLUME-2458
Project: Flume
Issue Type: Improvement
Components: Sinks+Sources
Affects Versions: v1.5.0.1
Reporter: Sverre Bakke
Assignee: Neerja Khattar
Priority: Minor
Attachments: FLUME-2458.patch, patch-2458.txt

The current HDFS sink writes temporary files to the same directory in which the final file will be stored. This is a problem for several reasons:

1) File moving: when MapReduce fetches a list of files to be processed and then tries to process files that are gone by then (i.e. they have been renamed from .tmp to their final name), the MapReduce job will crash.

2) File type: when MapReduce decides how to process a file, it looks at the file extension. If the file is compressed, MapReduce will decompress it for you. If the file has a .tmp extension (in the same folder), however, MapReduce will treat a compressed file as an uncompressed file, thus breaking the results of the MapReduce job.

I propose that the sink gets an optional tmp path for storing these files to avoid these issues.
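The proposal can be sketched as follows, using the local file system through java.nio.file as a stand-in for HDFS. The class TmpDirWriter, its directories and the file name are hypothetical and are not taken from the attached patches. The in-progress file lives only in the tmp directory and is moved into the final directory once it is complete, so a job that lists the final directory never sees a .tmp file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the proposed "separate tmp path" behaviour.
// java.nio on the local disk stands in for HDFS; names are illustrative only.
public class TmpDirWriter {
    private final Path tmpDir;   // where in-progress files are written
    private final Path finalDir; // the directory that downstream jobs read

    public TmpDirWriter(Path tmpDir, Path finalDir) throws IOException {
        this.tmpDir = Files.createDirectories(tmpDir);
        this.finalDir = Files.createDirectories(finalDir);
    }

    /** Writes data to tmpDir/name.tmp, then moves the finished file to finalDir/name. */
    public Path writeAndCommit(String name, String data) throws IOException {
        Path inProgress = tmpDir.resolve(name + ".tmp");
        Files.write(inProgress, data.getBytes(StandardCharsets.UTF_8));
        // Only the completed file, with its final name and extension, appears in finalDir.
        return Files.move(inProgress, finalDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("flume-sketch");
        TmpDirWriter writer = new TmpDirWriter(base.resolve("tmp"), base.resolve("data"));
        Path done = writer.writeAndCommit("events.1438000000000.log", "payload");
        System.out.println(done.getFileName());
    }
}
```

Because the final directory only ever receives complete files with their real extension, both problems described above (files vanishing mid-job and .tmp extensions hiding compression) would be avoided in this model.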
[jira] [Created] (FLUME-2749) Kerberos configuration error when using short names in multiple HDFS Sinks
Johny Rufus created FLUME-2749:
--
Summary: Kerberos configuration error when using short names in multiple HDFS Sinks
Key: FLUME-2749
URL: https://issues.apache.org/jira/browse/FLUME-2749
Project: Flume
Issue Type: Bug
Affects Versions: v1.6.0
Reporter: Johny Rufus
Assignee: Johny Rufus

When we have more than one HDFS Sink configured in Kerberos mode, and the principal is configured with a short name like 'flume' (without the @REALM information), we get:

java.lang.IllegalStateException: Cannot use multiple kerberos principals in the same agent. Must restart agent to use new principal or keytab. Previous = fl...@example.com (auth:KERBEROS), New = flume
    at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
    at org.apache.flume.auth.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:131)
    at org.apache.flume.auth.FlumeAuthenticationUtil.getAuthenticator(FlumeAuthenticationUtil.java:67)
    at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:261)
[jira] [Commented] (FLUME-2749) Kerberos configuration error when using short names in multiple HDFS Sinks
[ https://issues.apache.org/jira/browse/FLUME-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643838#comment-14643838 ]

ASF subversion and git services commented on FLUME-2749:

Commit 1161b044930579ebb803685753eb5b3363ee5178 in flume's branch refs/heads/flume-1.7 from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=1161b04 ]

FLUME-2749. Fix kerberos configuration error when using short names in multiple HDFS Sinks (Johny Rufus via Hari)
[jira] [Commented] (FLUME-2749) Kerberos configuration error when using short names in multiple HDFS Sinks
[ https://issues.apache.org/jira/browse/FLUME-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643839#comment-14643839 ]

Hari Shreedharan commented on FLUME-2749:
-
Committed! Thanks Johny!

Fix For: v1.7.0
[jira] [Commented] (FLUME-2749) Kerberos configuration error when using short names in multiple HDFS Sinks
[ https://issues.apache.org/jira/browse/FLUME-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643875#comment-14643875 ]

Hudson commented on FLUME-2749:
---
UNSTABLE: Integrated in Flume-trunk-hbase-1 #114 (See [https://builds.apache.org/job/Flume-trunk-hbase-1/114/])
FLUME-2749. Fix kerberos configuration error when using short names in multiple HDFS Sinks (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=a4946111383b3dfdb4c128fe5390ff3983213cbb)
* flume-ng-auth/src/main/java/org/apache/flume/auth/FlumeAuthenticationUtil.java
* flume-ng-auth/src/main/java/org/apache/flume/auth/KerberosAuthenticator.java
* flume-ng-auth/src/test/java/org/apache/flume/auth/TestFlumeAuthenticator.java
* flume-ng-auth/src/main/java/org/apache/flume/auth/KerberosUser.java
Jenkins build became unstable: Flume-trunk-hbase-1 #114
See https://builds.apache.org/job/Flume-trunk-hbase-1/114/changes
Thread safety of my RegexExtractorInterceptorSerializer implementation
Hello! I'm Chinese and my English is poor, so please bear with me. My GitHub project for Flume: https://github.com/hotfey/flume.ng.1.5.2

My setup: four Flume agents run on 4 machines, each using log files as the source and Avro as the sink. One more agent on another machine uses Avro as the source and HDFS as the sink.

I created a class that implements RegexExtractorInterceptorSerializer (attached, and also on GitHub). Every line of my log files, and therefore every event, starts with a timestamp. I implemented RegexExtractorInterceptorSerializer only to create HDFS directories based on that timestamp (e.g. the timestamp 28/Jul/2015 should create the HDFS directory .../2015/07/28).

However, when I start all the agents, I don't know how to make my implementation thread-safe. For example, if the timestamp on one source machine is 28/Jul/2015 and on another is 21/Jun/2015, the directories actually created may be any of .../2015/06/21, .../2015/06/28, .../2015/07/21 or .../2015/07/28.

Can you give me some advice on this? That's all, thanks!

Best wishes!
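Without seeing the attached class this is only a guess, but directories that mix the year, month and day of different events usually point to shared mutable state: per-event values kept in instance fields, or a single java.text.SimpleDateFormat used from several threads (its Javadoc states that date formats are not synchronized). Below is a sketch of a stateless alternative. It assumes the regex captures a date such as 28/Jul/2015 and that the serializer only needs a serialize(String) method like the one in Flume's RegexExtractorInterceptorSerializer interface; the Flume interface is deliberately not imported so the example compiles on its own, and the class name DatePathSerializer is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical stand-in for a RegexExtractorInterceptorSerializer implementation.
// The Flume interface is not imported here; only the serialize(String) shape is mirrored.
public final class DatePathSerializer {

    // DateTimeFormatter is immutable and thread-safe (unlike java.text.SimpleDateFormat),
    // so a single shared instance can be used by every thread.
    private static final DateTimeFormatter IN =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy", Locale.ENGLISH);
    private static final DateTimeFormatter OUT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd", Locale.ENGLISH);

    /** Turns a captured value such as "28/Jul/2015" into "2015/07/28". */
    public String serialize(String value) {
        // All per-event state lives in local variables, so concurrent calls cannot interfere.
        LocalDate date = LocalDate.parse(value.trim(), IN);
        return date.format(OUT);
    }

    public static void main(String[] args) throws InterruptedException {
        DatePathSerializer s = new DatePathSerializer();
        // Two threads share one serializer; each prints its own, unmixed date path.
        Thread a = new Thread(() -> System.out.println(s.serialize("28/Jul/2015")));
        Thread b = new Thread(() -> System.out.println(s.serialize("21/Jun/2015")));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

The two lines may print in either order, but each one is a complete, consistent path for its own input. The key design choice is that the class holds no mutable fields at all.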
[jira] [Updated] (FLUME-2749) Kerberos configuration error when using short names in multiple HDFS Sinks
[ https://issues.apache.org/jira/browse/FLUME-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johny Rufus updated FLUME-2749:
---
Attachment: FLUME-2749.patch

Modified to use the pre-1.6 style of checking whether the user trying to log in is different from the already logged-in user (using the KerberosUser class, which stores the configured principal and keytab).
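A simplified, hypothetical model of the check described in this patch comment: the authenticator remembers the configured principal and keytab in a KerberosUser value and compares the next sink's configuration with those configured values, instead of with the realm-qualified name of the logged-in user (the mismatch visible in "Previous = fl...@example.com (auth:KERBEROS), New = flume"). The class name KerberosLoginCheck, the field layout and the keytab paths are assumptions for illustration; this is not the committed code, and the actual keytab login is omitted.

```java
import java.util.Objects;

// Hypothetical, simplified model of the approach in the patch comment; not Flume's code.
public class KerberosLoginCheck {

    /** Value object holding the configured principal and keytab. */
    static final class KerberosUser {
        final String principal;
        final String keytab;

        KerberosUser(String principal, String keytab) {
            this.principal = principal;
            this.keytab = keytab;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof KerberosUser)) {
                return false;
            }
            KerberosUser other = (KerberosUser) o;
            return Objects.equals(principal, other.principal)
                    && Objects.equals(keytab, other.keytab);
        }

        @Override
        public int hashCode() {
            return Objects.hash(principal, keytab);
        }
    }

    private KerberosUser prevUser; // configuration used by the first login, if any

    /** Called once per sink; an identical configuration simply reuses the login. */
    synchronized void authenticate(String principal, String keytab) {
        KerberosUser newUser = new KerberosUser(principal, keytab);
        // Configured values are compared with configured values, so a short name
        // like "flume" matches the previous "flume" regardless of the realm.
        if (prevUser != null && !prevUser.equals(newUser)) {
            throw new IllegalStateException("Cannot use multiple kerberos principals in the same agent. "
                    + "Previous = " + prevUser.principal + ", New = " + principal);
        }
        // A real implementation would perform the keytab login here on first use (omitted).
        prevUser = newUser;
    }

    public static void main(String[] args) {
        KerberosLoginCheck auth = new KerberosLoginCheck();
        auth.authenticate("flume", "/etc/flume/flume.keytab"); // first HDFS sink
        auth.authenticate("flume", "/etc/flume/flume.keytab"); // second sink: accepted
        try {
            auth.authenticate("other", "/etc/flume/other.keytab"); // genuinely different user
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this approach, a second HDFS sink using the same short name is accepted, while a genuinely different principal or keytab is still rejected, which preserves the original intent of the error.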
[jira] [Commented] (FLUME-2749) Kerberos configuration error when using short names in multiple HDFS Sinks
[ https://issues.apache.org/jira/browse/FLUME-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643812#comment-14643812 ]

Hari Shreedharan commented on FLUME-2749:
-
Looks good. Running tests now.