Re: All but one TMs connect when JM has more than 16G of memory

2015-09-30 Thread Robert Metzger
Hi Robert, the problem here is that YARN's scheduler (there are different schedulers in YARN: FIFO, CapacityScheduler, ...) is not giving Flink's ApplicationMaster/JobManager all the containers it is requesting. By increasing the size of the AM/JM container, there is probably no memory left to fit

Re: All but one TMs connect when JM has more than 16G of memory

2015-10-01 Thread Robert Metzger
us will check the > documentation later to see how to deploy the TMs and JM on separate > machines each, since that is not what's happening at the moment, but this > is what I'd like to have. Thanks again and see you in an hour. > > Cheers > Robert > > On Wed,

Re: All but one TMs connect when JM has more than 16G of memory

2015-10-01 Thread Robert Metzger
esource Manager sits, and all the TMs with the remaining > Node Managers? > > Robert > > On Thu, Oct 1, 2015 at 10:53 AM, Robert Metzger > wrote: > >> Hi, >> >> It is interesting to note that when I set both >> yarn.nodemanager.resource.memory-mb >

Re: All but one TMs connect when JM has more than 16G of memory

2015-10-01 Thread Robert Metzger
ting the master and slaves > with SLURM's srun instead of ssh (I guess a slight modification of > start-cluster.sh should do the job). > > On Thu, Oct 1, 2015 at 11:30 AM, Robert Metzger > wrote: > >> Hi, >> there is currently no option for forcing certain containers

Re: DataSet transformation

2015-10-01 Thread Robert Metzger
Hi, for that you have to collect the dataset to your local machine and then transform the collection into the array. Note that this only advised for small data sets. Robert On Thu, Oct 1, 2015 at 2:13 PM, Lydia Ickler wrote: > Hi all, > > so I have a case class Spectrum(mz: Float, intensity:
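The collect-then-convert pattern described above can be sketched in plain Java. Here a `java.util.List` stands in for the result of Flink's `DataSet#collect()`, which ships the data set contents to the client (only advisable for small data sets):

```java
import java.util.Arrays;
import java.util.List;

public class CollectToArray {
    public static void main(String[] args) {
        // Stand-in for DataSet#collect(): Flink returns the data set
        // contents as a java.util.List on the client side.
        List<Float> collected = Arrays.asList(1.0f, 2.0f, 3.0f);

        // Transform the collection into an array.
        Float[] asArray = collected.toArray(new Float[0]);
        System.out.println(asArray.length);
    }
}
```

The element type `Float` is illustrative; with a case class like `Spectrum`, the same `toArray` call applies.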

Re: data flow example on cluster

2015-10-02 Thread Robert Metzger
I think there is a version mismatch between the Flink version you've used to compile your job and the Flink version installed on the cluster. Maven automagically pulls newer 0.10-SNAPSHOT versions every time you're building your job. On Fri, Oct 2, 2015 at 11:45 AM, Lydia Ickler wrote: > Hi Til

Re: data flow example on cluster

2015-10-02 Thread Robert Metzger
on set to 0.8 > > 0.8-incubating-SNAPSHOT > > how can I change it to the newest? > 0.10-SNAPSHOT > Is not working > > Am 02.10.2015 um 11:48 schrieb Robert Metzger : > > I think there is a version mismatch between the Flink version you've used > to compile

Re: data flow example on cluster

2015-10-02 Thread Robert Metzger
ly re-create your project's POM files with a new >> quickstart? >> >> I think that the POMS between 0.8-incubating-SNAPSHOT and 0.10-SNAPSHOT >> may not be quite compatible any more... >> >> On Fri, Oct 2, 2015 at 12:07 PM, Robert Metzger >> wrote: >>

Re: Error trying to access JM through proxy

2015-10-05 Thread Robert Metzger
I filed a bug for this issue in our bug tracker https://issues.apache.org/jira/browse/FLINK-2821 (even though we can not do much about it, we should track the resolution of the issue). On Mon, Oct 5, 2015 at 5:34 AM, Stephan Ewen wrote: > I think this is yet another problem caused by Akka's over

Re: Processing S3 data with Apache Flink

2015-10-05 Thread Robert Metzger
Hi Kostia, thank you for writing to the Flink mailing list. I actually started to try out our S3 File system support after I saw your question on StackOverflow [1]. I found that our S3 connector is very broken. I had to resolve two more issues with it, before I was able to get the same exception y

Re: Processing S3 data with Apache Flink

2015-10-06 Thread Robert Metzger
advance! > Kostia > > Thank you, > Konstantin Kudryavtsev > > On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger > wrote: > >> Hi Kostia, >> >> thank you for writing to the Flink mailing list. I actually started to >> try out our S3 File system

Re: Processing S3 data with Apache Flink

2015-10-06 Thread Robert Metzger
tter visibility > > WBR, > Kostia > > Thank you, > Konstantin Kudryavtsev > > On Tue, Oct 6, 2015 at 2:12 PM, Robert Metzger > wrote: > >> Mh. I tried out the code I've posted yesterday and it was working >> immediately. >> The security settin

Re: FlinkKafkaConsumer bootstrap.servers vs. broker hosts

2015-10-14 Thread Robert Metzger
Hi Juho, sorry for the late reply, I was busy with Flink Forward :) The Flink Kafka Consumer needs both addresses. Kafka uses the bootstrap servers to connect to the brokers to consume messages. The Zookeeper connection is used to commit the offsets of the consumer group once a state snapshot in

Re: Building Flink with hadoop 2.6

2015-10-14 Thread Robert Metzger
Hi Gwen, can you tell us the "mvn" command you're using for building Flink? On Wed, Oct 14, 2015 at 4:37 PM, Gwenhael Pasquiers < gwenhael.pasqui...@ericsson.com> wrote: > Hi ; > > > > We need to test some things with flink and hadoop 2.6 (the trunc method). > > > > We’ve set up a build task o

Re: Building Flink with hadoop 2.6

2015-10-14 Thread Robert Metzger
” mvn clean install -DskipTests -Dhadoop.version=2.6.0 “. > > > > > > *From:* Robert Metzger [mailto:rmetz...@apache.org] > *Sent:* mercredi 14 octobre 2015 16:47 > *To:* user@flink.apache.org > *Subject:* Re: Building Flink with hadoop 2.6 > > > > Hi Gwen, >

Re: Building Flink with hadoop 2.6

2015-10-14 Thread Robert Metzger
> mvn command, but it is in the one you distributed on your website. > > > > *From:* Robert Metzger [mailto:rmetz...@apache.org] > *Sent:* mercredi 14 octobre 2015 16:54 > *To:* user@flink.apache.org > *Subject:* Re: Building Flink with hadoop 2.6 > > > > Grea

Re: Building Flink with hadoop 2.6

2015-10-14 Thread Robert Metzger
One more thing regarding the truncate method: It's supported as of Hadoop 2.7.0 (https://issues.apache.org/jira/browse/HDFS-3107) On Wed, Oct 14, 2015 at 5:00 PM, Robert Metzger wrote: > Ah, I know what's causing this issue. > In the latest 0.10-SNAPSHOT, we have removed log4j from
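Putting the thread together: the build command quoted earlier in the thread, with the Hadoop version bumped to 2.7.0 so that truncate is available (a sketch; adjust the version to your cluster):

```shell
mvn clean install -DskipTests -Dhadoop.version=2.7.0
```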

Re: Building Flink with hadoop 2.6

2015-10-14 Thread Robert Metzger
ed error so it just probably > moved. > > > > Thanks ! > > > > Gwen’ > > > > *From:* Robert Metzger [mailto:rmetz...@apache.org] > *Sent:* mercredi 14 octobre 2015 17:06 > > *To:* user@flink.apache.org > *Subject:* Re: Building Flink with hadoop 2.6 > >

Re: flink kafka question

2015-10-14 Thread Robert Metzger
I would also suggest to create a mapper after the source. Make sure the mapper is chained to the kafka source, then, you'll not really see a big delay in the timestamp written to redis. Just out of curiosity, why do you need to write a timestamp to redis for each record from Kafka? On Wed, Oct 14

Re: Running on a firewalled Yarn cluster?

2015-11-02 Thread Robert Metzger
Hi Niels, so the problem is that you can not submit a job to Flink using the "/bin/flink" tool, right? I assume Flink and its TaskManagers properly start and connect to each other (the number of TaskManagers is shown correctly in the web interface). I see the following solutions for the problem a

Re: Could not load the task's invokable class.

2015-11-04 Thread Robert Metzger
It seems that Eclipse's export jar functionality is broken. But since maven is working properly, I assume the issue is resolved. For the other error, can you post the source code of the main() method somewhere? There are some hints how to resolve the issue in the exceptions (use of .returns() anno

Re: Zeppelin Integration

2015-11-04 Thread Robert Metzger
For those interested, Trevor wrote a blog post describing how to setup Spark, Flink and Zeppelin, both locally and on clusters: http://trevorgrant.org/2015/11/03/apache-casserole-a-delicious-big-data-recipe-for-the-whole-family/ Thanks Trevor for the great tutorial! On Thu, Oct 22, 2015 at 4:23 PM

Re: Running continuously on yarn with kerberos

2015-11-05 Thread Robert Metzger
Hi Niels, thank you for analyzing the issue so properly. I agree with you. It seems that HDFS and HBase are using their own tokens which we need to transfer from the client to the YARN containers. We should be able to port the fix from Spark (which they got from Storm) into our YARN client. I think

Re: IT's with POJO's

2015-11-05 Thread Robert Metzger
Hi Nick, both JIRA and the mailing list are good. In this case I'd say JIRA would be better because then everybody has the full context of the discussion. The issue is fixed in 0.10, which is not yet released. You can work around the issue by implementing a custom SourceFunction which returns th

Re: Running on a firewalled Yarn cluster?

2015-11-05 Thread Robert Metzger
g/jira/browse/FLINK-2960 >> >> Best regards, >> Max >> >> On Tue, Nov 3, 2015 at 11:02 AM, Niels Basjes wrote: >> >>> Hi, >>> >>> I forgot to answer your other question: >>> >>> On Mon, Nov 2, 2015 at 4:34 PM, Robert Me

Re: Running on a firewalled Yarn cluster?

2015-11-05 Thread Robert Metzger
via the proxy that runs > over the master on a single fixed port. > > Niels > > On Thu, Nov 5, 2015 at 2:46 PM, Robert Metzger > wrote: > >> While discussing with my colleagues about the issue today, we came up >> with another approach to resolve the issue: >>

Re: Published test artifacts for flink streaming

2015-11-05 Thread Robert Metzger
Hi Nick, we are usually publishing the test artifacts. Can you try and replace the tag by test-jar: org.apache.flink flink-streaming-core ${flink.version} test-jar test On Thu, Nov 5, 2015 at 8:18 PM, Nick Dimiduk wrote: > Hello, > > I'm attempting integration tests for my
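The flattened XML in the snippet above corresponds to a Maven dependency along these lines (group/artifact names and the version property as they appear in the snippet):

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-core</artifactId>
  <version>${flink.version}</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
```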

Re: Implementing samza table/stream join

2015-11-11 Thread Robert Metzger
In Flink 0.9.1 keyBy is called "groupBy()". We've reworked the DataStream API between 0.9 and 0.10, that's why we had to rename the method. On Wed, Nov 11, 2015 at 9:37 AM, Stephan Ewen wrote: > I would encourage you to use the 0.10 version of Flink. Streaming has made > some major improvements

Re: finite subset of an infinite data stream

2015-11-11 Thread Robert Metzger
I think what you call "union" is a "connected stream" in Flink. Have a look at this example: https://gist.github.com/fhueske/4ea5422edb5820915fa4 It shows how to dynamically update a list of filters by external requests. Maybe that's what you are looking for? On Wed, Nov 11, 2015 at 12:15 PM, St

Re: Cluster installation gives java.lang.NoClassDefFoundError for everything

2015-11-11 Thread Robert Metzger
Is "jar tf /users/camelia/thecluster/flink-0.9.1/lib/flink-dist-0.9.1.jar" listing for example the "org/apache/flink/client/CliFrontend" class? On Wed, Nov 11, 2015 at 12:09 PM, Maximilian Michels wrote: > Hi Camelia, > > Flink 0.9.X supports Java 6. So this can't be the issue. > > Out of curios

Re: Join Stream with big ref table

2015-11-13 Thread Robert Metzger
Hi Arnaud, I'm happy that you were able to resolve the issue. If you are still interested in the first approach, you could try some things, for example using only one slot per task manager (the slots share the heap of the TM). Regards, Robert On Fri, Nov 13, 2015 at 9:18 AM, LINZ, Arnaud wrote:

Re: Apache Flink Operator State as Query Cache

2015-11-13 Thread Robert Metzger
Hi Welly, Flink 0.10.0 is out, it's just not announced yet. It's available on maven central and the global mirrors are currently syncing it. This mirror for example has the update already: http://apache.mirror.digionline.de/flink/flink-0.10.0/ On Fri, Nov 13, 2015 at 9:50 AM, Welly Tambunan wrote:

Re: Hello World Flink 0.10

2015-11-13 Thread Robert Metzger
Hi Kamil, The EOFTrigger is not part of Flink. However, I've also tried implementing the Hello World from the presentation here: https://github.com/rmetzger/scratch/blob/flink0.10-scala2.11/src/main/scala/com/dataartisans/Job.scala Stephan Ewen told me that there is a more elegant way of implemen

Re: Hello World Flink 0.10

2015-11-13 Thread Robert Metzger
s I understand the line ".trigger(new EOFTrigger)" is > simply not needed in this case (it looks that it works as expected when I > remove it). > > Thanks for your help. > > pt., 13.11.2015 o 11:01 użytkownik Robert Metzger > napisał: > >> Hi Kamil, >> &

Re: Crash in a simple "mapper style" streaming app likely due to a memory leak ?

2015-11-13 Thread Robert Metzger
sage, I will try using a monitoring app on the yarn jvm to understand. > > > > How do I set this yarn.heap-cutoff-ratio parameter for a specific > application ? I don’t want to modify the “root-protected” flink-conf.yaml > for all the users & flink jobs with that v

Re: MaxPermSize on yarn

2015-11-16 Thread Robert Metzger
Hi Gwen, there is a configuration value called "env.java.opts", that allows you to pass custom JVM args to JM and TM JVMs. I hope that helps. On Mon, Nov 16, 2015 at 5:30 PM, Gwenhael Pasquiers < gwenhael.pasqui...@ericsson.com> wrote: > Hi, > > > > We’re having some OOM permgen exceptions wh

Re: How to read from a Kafka topic from the beginning

2015-11-17 Thread Robert Metzger
Hi Will, In Kafka's consumer configuration [1] there is a configuration parameter called "auto.offset.reset". Setting it to "smallest" will tell the consumer to start reading a topic from the smallest available offset. You can pass the configuration using the properties of the Kafka consumer. [
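A minimal sketch of the configuration described above. Only the `Properties` object is shown; the `FlinkKafkaConsumer` constructor that would receive it is omitted, and the addresses are placeholder assumptions:

```java
import java.util.Properties;

public class KafkaFromBeginning {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder addresses; replace with your brokers/ZooKeeper.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "my-consumer-group");
        // Start reading from the smallest available offset,
        // i.e. from the beginning of the topic.
        props.setProperty("auto.offset.reset", "smallest");

        System.out.println(props.getProperty("auto.offset.reset"));
    }
}
```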

Re: MaxPermSize on yarn

2015-11-17 Thread Robert Metzger
incorrectly passed to > the VM (maybe the “-XX:” causes some issues in the scripts). > > > > We’ll investigate. > > > > *From:* Robert Metzger [mailto:rmetz...@apache.org] > *Sent:* lundi 16 novembre 2015 19:18 > *To:* user@flink.apache.org > *Subject:* Re: MaxP

Re: Flink on EC2

2015-11-17 Thread Robert Metzger
Hi Thomas, I'm sorry that nobody responded further. As you've probably noticed, there is a lot of traffic on the mailing lists and sometimes stuff gets lost. Were you able to get the S3 file system running with Flink? If not, let's try to figure out why it is not picking up the config correctly

Re: Creating a representative streaming workload

2015-11-18 Thread Robert Metzger
Hey Vasia, I think a very common workload would be an event stream from web servers of an online shop. Usually, these shops have multiple servers, so events arrive out of order. I think there are plenty of different use cases that you can build around that data: - Users perform different actions t

Re: YARN High Availability

2015-11-19 Thread Robert Metzger
I agree with Aljoscha. Many companies install Flink (and its config) in a central directory and users share that installation. On Thu, Nov 19, 2015 at 10:45 AM, Aljoscha Krettek wrote: > I think we should find a way to randomize the paths where the HA stuff > stores data. If users don’t realize

Re: flink yarn-session failure

2015-11-19 Thread Robert Metzger
The exception is thrown even before Flink code is executed, so I assume that your YARN setup is not properly working. Did you try running any other YARN application on the setup? I suspect that other systems like MapReduce or Spark will also not run on the environment. Maybe the yarn-site.xml on t

Re: flink yarn-session failure

2015-11-19 Thread Robert Metzger
> > > > Could someone provide me a correct yarn-site.xml in order to make it work? > Should the yarn-site.xml be the same in both namenode and datanodes? Sorry > for this question but different tutorials on google refer to different > configurations and i am confused. > > T

Re: Flink execution time benchmark.

2015-11-19 Thread Robert Metzger
Hi Saleh, The new web interface in Flink 0.10 also has a REST API that you can use for querying job information. GET http://localhost:8081/jobs/83547b683ad5b388355a49911168fbc7 will give you the following JSON object: { "jid":"83547b683ad5b388355a49911168fbc7", "name":"com.dataartisans.
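The GET request from the snippet, issued with curl (job id is the example one from the mail; 8081 is the web interface default port):

```shell
curl http://localhost:8081/jobs/83547b683ad5b388355a49911168fbc7
```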

Re: Error handling

2015-11-19 Thread Robert Metzger
Hi Nick, regarding the Kafka example: What happens is that the FlinkKafkaConsumer will throw an exception. The JobManager then cancels the entire job and restarts it. It will then try to continue reading from the last valid checkpoint or the consumer offset in zookeeper. Since the data in the topi

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Robert Metzger
Hi Ovidiu, you can submit multiple programs to a running Flink cluster (or a YARN session). Flink does currently not have any queuing mechanism. The JobManager will reject a program if there are not enough free resources for it. If there are enough resources for multiple programs, they'll run conc

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Robert Metzger
.10/faq.html>.* > > Is the scheduling features considered for next releases? > > Thank you. > Best regards, > Ovidiu > > On 20 Nov 2015, at 11:59, Robert Metzger wrote: > > Hi Ovidiu, > > you can submit multiple programs to a running Flink cluster (or a YARN >

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Robert Metzger
understand > correctly, is to either wait for the current job to finish (assuming it has > acquired all the available resources) or to stop the current job, in case I > have other jobs with higher priorities. This could be related also to the > resource elasticity you mentioned. >

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Robert Metzger
oin as a > contributor also. > > Best regards, > Ovidiu > > > On 20 Nov 2015, at 14:53, Robert Metzger wrote: > > Hi Ovidiu, > > good choice on your research topic ;) > > I think doing some hands on experiments will help you to understand much > bett

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Robert Metzger
Hi, It seems that you've set the "fs.hdfs.hadoopconf" configuration parameter to a file. I think you have to set it to the directory containing the configuration files. Sorry, I know that's not very intuitive, but in Hadoop the settings live in different files: (hdfs|yarn|core)-site.xml. On Sat, Nov 21, 20

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Robert Metzger
he description. So I should have known :/ > > On additional question though. If I use the flink binary for Hadoop > 1.2.1 and run flink in standalone mode, should I use the *-hadoop1 > dependencies even If I am not interacting with HDFS 1.x? > > Cheers, > > Konstantin > > O

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Robert Metzger
g to S3 in this > job. So I am using the Hadoop S3 FileSystem classes, but that's it. > > Cheers, > > Konstantin > > > On 21.11.2015 15:16, Robert Metzger wrote: > > Hi, > > > > great to hear that its working. I've updated the documentation (for

Re: How to pass hdp.version to flink on yarn

2015-11-23 Thread Robert Metzger
Hi, In Flink the configuration parameter for passing custom JVM options is "env.java.opts". I would recommend putting it into the conf/flink-conf.yaml like this: env.java.opts: "-Dhdp.version=2.3.0.0-2557 -Dhdp.version=2.3.0.0-2557" Please let me know if this works. Maybe you are the first user
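As a config sketch, the suggested entry in conf/flink-conf.yaml (the hdp.version value is the one from the mail; substitute your own HDP release):

```yaml
# conf/flink-conf.yaml
env.java.opts: "-Dhdp.version=2.3.0.0-2557"
```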

[VOTE] Release Apache Flink 0.10.1 (release-0.10.0-rc1)

2015-11-23 Thread Robert Metzger
Hi All, this is the first bugfix release for the 0.10 series of Flink. I've CC'ed the user@ list if users are interested in helping to verify the release. It contains fixes for critical issues, in particular: - FLINK-3021 Fix class loading issue for streaming sources - FLINK-2974 Add periodic off

Re: Using Hadoop Input/Output formats

2015-11-25 Thread Robert Metzger
I agree with Stephan. Reading static files is quite uncommon with the DataStream API. Before we add such a method, we should add a convenience method for Kafka ;) But in general, I'm not a big fan of adding too many of these methods because they pull in so many external classes, which lead to brea

Re: Running on a firewalled Yarn cluster?

2015-11-25 Thread Robert Metzger
obServer's random port binding? Or, to control the port BlobServer binds >>> to? >>> >>> Cheers, >>> >>> Cory >>> >>> On Thu, Nov 5, 2015 at 8:07 AM, Niels Basjes wrote: >>> >>>> That is what I tried. Couldn

Re: flink connectors

2015-11-27 Thread Robert Metzger
Maybe there is a maven mirror you can access from your network? This site contains a list of some mirrors http://stackoverflow.com/questions/5233610/what-are-the-official-mirrors-of-the-maven-central-repository You don't have to use the maven tool, you can also manually browse for the jars and dow

[ANNOUNCE] Flink 0.10.1 released

2015-11-27 Thread Robert Metzger
The Flink PMC is pleased to announce the availability of Flink 0.10.1. The official release announcement: http://flink.apache.org/news/2015/11/27/release-0.10.1.html Release binaries: http://apache.openmirror.de/flink/flink-0.10.1/ Please update your maven dependencies to the new 0.10.1 version a

Re: Working with protobuf wrappers

2015-12-01 Thread Robert Metzger
Also, we don't add serializers automatically for DataStream programs. I've opened a JIRA for this a while ago. On Tue, Dec 1, 2015 at 10:20 AM, Till Rohrmann wrote: > Hi Kryzsztof, > > it's true that we once added the Protobuf serializer automatically. > However, due to versioning conflicts (see

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Robert Metzger
Hi Mihail, the issue is actually a bug in Kafka. We have a JIRA in Flink for this as well: https://issues.apache.org/jira/browse/FLINK-3067 Sadly, we haven't released a fix for it yet. Flink 0.10.2 will contain a fix. Since the kafka connector is not contained in the flink binary, you can just s

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Robert Metzger
a wrote: > Hi, > > I think Robert meant to write setting the connector dependency to > 1.0-SNAPSHOT. > > Cheers, > Gyula > > Robert Metzger ezt írta (időpont: 2015. dec. 1., K, > 17:10): > >> Hi Mihail, >> >> the issue is actually a bug in Kaf

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Robert Metzger
this is a Zookeeper version > mismatch. > > On Tue, Dec 1, 2015 at 5:16 PM, Robert Metzger > wrote: > > Hi Gyula, > > > > no, I didn't ;) We still deploy 0.10-SNAPSHOT versions from the > > "release-0.10" branch to Apache's maven snapshot repos

Re: Question about DataStream serialization

2015-12-01 Thread Robert Metzger
Hi Radu, both emails reached the mailing list :) You can not reference to DataSets or DataStreams from inside user defined functions. Both are just abstractions for a data set or stream, so the elements are not really inside the set. We don't have any support for mixing the DataSet and DataStrea

Re: Working with protobuf wrappers

2015-12-01 Thread Robert Metzger
a simple snippet of a Flink program that >show when it's better to the original Person, its POJO version or it's >Tuple version (assuming that is a flat object)? >3. Does this further change when I use Table APIs? > > > Best, > Flavio > > O

Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Robert Metzger
No, it's not yet merged into the source repo of Flink. You can find the code here: https://github.com/apache/flink/pull/1425 You can also check out the code of the PR or download the PR contents as a patch and apply it to the Flink source. I think the change will be merged tomorrow and then you'll

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Robert Metzger
:218) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584) > at java.lang.Thread.run(Thread.java:745) > > > > > 2015-12-01 17:42 GMT+01:00 Maximilian Michels : > >> Thanks! I've linked the issue in JIRA. >> >> On Tue, Dec 1, 2015

Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-03 Thread Robert Metzger
Hi Welly, the fix has been merged and should be available in 0.10-SNAPSHOT. On Wed, Dec 2, 2015 at 10:12 AM, Maximilian Michels wrote: > Hi Welly, > > We still have to decide on the next release date but I would expect > Flink 0.10.2 within the next weeks. If you can't work around the union > l

Re: HA Mode and standalone containers compatibility ?

2015-12-03 Thread Robert Metzger
There is a configuration parameter called "yarn.properties-file.location" which allows setting a custom path for the properties file. If the batch and streaming jobs are using different configuration files, it should work. On Thu, Dec 3, 2015 at 1:51 PM, Ufuk Celebi wrote: > I opened an issue fo

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-12-04 Thread Robert Metzger
I think we need to find a solution for this problem soon. Another user is most likely affected: http://stackoverflow.com/q/34090808/568695 I've filed a JIRA for the problem: https://issues.apache.org/jira/browse/FLINK-3121 On Mon, Nov 30, 2015 at 5:58 PM, Aljoscha Krettek wrote: > Maybe. In th

Re: Read Kafka topic from the beginning

2015-12-05 Thread Robert Metzger
Hi Vladimir, Does current Kafka Consumer implementation allow to read all messages in a > topic from the beginning or from a specific offset. For reading from the beginning, setting "auto.offset.reset" to "smallest" will do the job. Reading from a specific offset is not supported yet, but it

Re: [IE] Re: passing environment variables to flink program

2015-12-05 Thread Robert Metzger
Just a little note, the feature requested in FLINK-2954 has been implemented and is available in 1.0-SNAPSHOT now. Please let me know if it's working as expected. On Wed, Nov 4, 2015 at 6:35 PM, Jian Jiang wrote: > I have to continue the evaluat

Re: Running on a firewalled Yarn cluster?

2015-12-10 Thread Robert Metzger
pache/flink/blob/master/docs/setup/yarn_setup.md#running-flink-on-yarn-behind-firewalls Please let me know if that fixes your issues. Note that the fix is only available in 1.0-SNAPSHOT. On Wed, Nov 25, 2015 at 6:58 PM, Robert Metzger wrote: > Hi, > I just wanted to let you know that I di

Re: Serialisation problem

2015-12-12 Thread Robert Metzger
Hi, Can you check the log output in your IDE or the log files of the Flink client (./bin/flink). The TypeExtractor is logging why a POJO is not recognized as a POJO. The log statements look like this: 20:42:43,465 INFO org.apache.flink.api.java.typeutils.TypeExtractor - class com.dataarti

Re: S3 Input/Output with temporary credentials (IAM Roles)

2015-12-12 Thread Robert Metzger
Hi Vladimir, Flink is using Hadoop's S3 File System implementation. It seems that this feature is not supported by their implementation: https://issues.apache.org/jira/browse/HADOOP-9680 This issue contains some more information: https://issues.apache.org/jira/browse/HADOOP-9384 It seems that the

Re: Configure log4j with XML files

2015-12-21 Thread Robert Metzger
As an additional note: Flink sends all files in the /lib folder to all YARN containers. So you could place the XML file in "/lib" and override the properties. I think you need to delete the log4j properties from the conf/ directory, then at least on YARN, we'll not set the -Dlog4j.configurati

Re: kafka integration issue

2016-01-05 Thread Robert Metzger
I think the problem is that you only set the version of the Kafka connector to 1.0-SNAPSHOT, not for the rest of the Flink dependencies. On Tue, Jan 5, 2016 at 6:18 PM, Alex Rovner wrote: > Thanks Till for the info. I tried switching to 1.0-SNAPSHOT and now facing > another error: > > Caused by:
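The fix described above amounts to keeping every Flink artifact on the same version in the job's POM, typically via a single property (a sketch; the artifact names are illustrative assumptions):

```xml
<properties>
  <flink.version>1.0-SNAPSHOT</flink.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>${flink.version}</version>
  </dependency>
</dependencies>
```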

Re: kafka integration issue

2016-01-05 Thread Robert Metzger
it/flink/flink-dist/target/flink-1.0-SNAPSHOT-bin/flink-1.0-SNAPSHOT/bin/flink > run -c com.magnetic.KafkaWordCount > ~/git/flink-poc/target/flink-poc-1.0-SNAPSHOT.jar > > On Tue, Jan 5, 2016 at 12:47 PM Robert Metzger > wrote: > >> I think the problem is that you only set t

Re: Problem to show logs in task managers

2016-01-06 Thread Robert Metzger
Maybe the isConverged() method is never called? For making that sure, just throw a RuntimeException inside the method to see whats happening. On Wed, Jan 6, 2016 at 3:58 PM, Ana M. Martinez wrote: > Hi Till, > > I am afraid it does not work in any case. > > I am following the steps you indicate

Re: sources not available for flink-streaming

2016-01-06 Thread Robert Metzger
I also saw that this is not working for SNAPSHOT releases (for stable releases it seems to work). We are publishing the source artifacts, as you can see here https://repository.apache.org/content/repositories/snapshots/org/apache/flink/flink-streaming-scala/1.0-SNAPSHOT/ Maybe it's an issue with

Re: Where is Flink 0.10.1 documentation?

2016-01-08 Thread Robert Metzger
I updated the version in the "release-0.10" branch, that should fix the issue: http://git-wip-us.apache.org/repos/asf/flink/commit/1d05dbe5 On Fri, Jan 8, 2016 at 10:25 AM, Stephan Ewen wrote: > Hi! > > I think we missed updating the variable "version" in the > "docs/_config.yml" for the 0.10.1

Re: DeserializationSchema isEndOfStream usage?

2016-01-11 Thread Robert Metzger
Hi David, In theory isEndOfStream() is absolutely the right way to go for stopping data sources in Flink. That it's not working as expected is a bug. I have a pending pull request for adding a Kafka 0.9 connector, which fixes this issue as well (for all supported Kafka versions). Sorry for the
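A standalone sketch of the idea: a deserialization schema whose isEndOfStream() tells the source to stop when a marker element arrives. The class below only mirrors the shape of Flink's DeserializationSchema and implements no Flink interface, so it runs on its own; the marker and names are illustrative assumptions:

```java
import java.nio.charset.StandardCharsets;

public class StoppingSchema {
    // Hypothetical marker element that ends the stream.
    private static final String END_MARKER = "#EOF#";

    public String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    // Mirrors DeserializationSchema#isEndOfStream: returning true
    // signals the source to stop consuming.
    public boolean isEndOfStream(String nextElement) {
        return END_MARKER.equals(nextElement);
    }

    public static void main(String[] args) {
        StoppingSchema schema = new StoppingSchema();
        String a = schema.deserialize("hello".getBytes(StandardCharsets.UTF_8));
        String b = schema.deserialize("#EOF#".getBytes(StandardCharsets.UTF_8));
        System.out.println(schema.isEndOfStream(a));
        System.out.println(schema.isEndOfStream(b));
    }
}
```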

Re: Flink v0.10.2

2016-01-13 Thread Robert Metzger
Hi, there are currently no planned releases. I would actually like to start preparing for the 1.0 release soon, but the community needs to discuss that first. How urgently do you need a 0.10.2 release? If this is the last blocker for using Flink in production at your company, I can push for the b

Re: akka.pattern.AskTimeoutException

2016-01-15 Thread Robert Metzger
Hi Frederick, sorry for the delayed response. I have no idea what the problem could be. Has the exception been thrown from the env.execute() call? Why did you set the akka.ask.timeout to 10k seconds? On Wed, Jan 13, 2016 at 2:13 AM, Frederick Ayala wrote: > Hi, > > I am having an error whil

Re: global function over partitions

2016-01-15 Thread Robert Metzger
Hi Radu, I'm sorry for the delayed response. I'm not sure what the purpose of DataStream.global() actually is. I've opened a JIRA to document or remove it: https://issues.apache.org/jira/browse/FLINK-3240. For getting the final metrics, you can just call ".timeWindowAll()", without a ".global()"

Re: Frequent exceptions killing streaming job

2016-01-16 Thread Robert Metzger
Hi Nick, I'm sorry you ran into the issue. Is it possible that Flink's Kafka consumer falls back in the topic so far that the offsets it's requesting are invalid? For that, the retention time of Kafka has to be pretty short. Skipping records under load is something currently not supported by Fli

Re: Compile fails with scala 2.11.4

2016-01-18 Thread Robert Metzger
How did you start the Flink for Scala 2.11 compilation? On Mon, Jan 18, 2016 at 11:41 AM, Chiwan Park wrote: > Hi Ritesh, > > This problem seems already reported [1]. Flink community is investigating > this issue. I think that if you don’t need Scala 2.11, use Scala 2.10 until > the issue is solved

Re: Accessing configuration in RichFunction

2016-01-18 Thread Robert Metzger
Hi Christian, I think the DataStream API does not allow you to pass any parameter to the open(Configuration) method. That method is only used in the DataSet (Batch) API, and its use is discouraged. A much better option to pass a Configuration into your function is as follows: Configuration mapC
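The alternative hinted at in the truncated mail is usually constructor parameters: user functions are serialized together with their fields, so a value passed at construction time travels to the cluster with the function. A plain-Java sketch (no Flink classes; names are illustrative):

```java
import java.io.Serializable;

public class ParameterizedFunction {
    // Stand-in for a Flink RichMapFunction: user functions are
    // serialized, so fields set in the constructor ship with them.
    static class PrefixMapper implements Serializable {
        private final String prefix;

        PrefixMapper(String prefix) {
            this.prefix = prefix;
        }

        String map(String value) {
            return prefix + value;
        }
    }

    public static void main(String[] args) {
        // The parameter is baked into the function instance up front.
        PrefixMapper mapper = new PrefixMapper("user-");
        System.out.println(mapper.map("42"));
    }
}
```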

Re: JDBCInputFormat GC overhead limit exceeded error

2016-01-19 Thread Robert Metzger
Hi Max, which version of Flink are you using? On Tue, Jan 19, 2016 at 5:35 PM, Maximilian Bode < maximilian.b...@tngtech.com> wrote: > Hi everyone, > > I am facing a problem using the JDBCInputFormat which occurred in a larger > Flink job. As a minimal example I can reproduce it when just writin

Re: Compile fails with scala 2.11.4

2016-01-20 Thread Robert Metzger
.version=2.11.4 > > I hope I'm doing it right ? > > Thanks, > > *Ritesh Kumar Singh,* > *https://riteshtoday.wordpress.com/* <https://riteshtoday.wordpress.com/> > > On Mon, Jan 18, 2016 at 12:03 PM, Robert Metzger > wrote: > >> How did start the Flink for Scala 2.11 comp

Re: flink 1.0-SNAPSHOT scala 2.11 compilation error

2016-01-20 Thread Robert Metzger
Hey David, the issue should be resolved now. Please let me know if it's still an issue for you. Regards, Robert On Fri, Jan 15, 2016 at 4:02 PM, David Kim wrote: > Thanks Till! I'll keep an eye out on the JIRA issue. Many thanks for the > prompt reply. > > Cheers, > David > > On Fri, Jan 15, 2

Re: Frequent exceptions killing streaming job

2016-01-20 Thread Robert Metzger
would be desirable sometimes. >>>> >>>> More control over skipping the records could be something to implement >>>> in an extended version of the Kafka Consumer. A user could define a policy >>>> that, in case consumer falls behind producer more than X

Re: Actual byte-streams in multiple-node pipelines

2016-01-20 Thread Robert Metzger
Hi Tal, that sounds like an interesting use case. I think I need a bit more details about your use case to see how it can be done with Flink. You said you need low latency, what latency is acceptable for you? Also, I was wondering how are you going to feed the input data into Flink? If the data i

Re: Results of testing Flink quickstart against 0.10-SNAPSHOT and 1.0-SNAPSHOT (re. Dependency on non-existent org.scalamacros:quasiquotes_2.11:)

2016-01-20 Thread Robert Metzger
Hi Prez, thanks a lot for the thorough research you did on this issue. The issue with "1.0-SNAPSHOT with fetched binary dependencies" should be resolved by a fix I pushed to master yesterday: a) The "change-scala-version" script wasn't adapted to the renamed examples directory, that's why it f

Re: integration with a scheduler

2016-01-20 Thread Robert Metzger
Hi Serkan, I would suggest having a look at the "./bin/flink" tool. It allows you to start ("run") and stop ("cancel") batch and streaming jobs. Flink doesn't support suspending jobs. You can also use the JobManager web interface (default port: 8081) to get the status of the job and also to canc
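As a sketch, the typical CLI interaction looks like this (the jar path and job ID are placeholders; check `./bin/flink --help` for the exact options of your Flink version):

```shell
# Submit a batch or streaming job
./bin/flink run ./examples/WordCount.jar

# List running jobs to find the job ID
./bin/flink list

# Cancel a running job by its ID (placeholder ID shown)
./bin/flink cancel a1b2c3d4e5f6
```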

Re: Could not upload the jar files to the job manager IOException

2016-01-20 Thread Robert Metzger
Hi, can you check the log file of the JobManager you're trying to submit the job to? Maybe there you can find helpful information why it failed. On Wed, Jan 20, 2016 at 3:23 PM, Ana M. Martinez wrote: > Hi all, > > I am running some experiments with flink in an Amazon cluster and every > now an

Re: parallelism parameter and output relation

2016-01-20 Thread Robert Metzger
Hi Serkan, yes, with parallelism=1 you'll get one file; with any higher parallelism, Flink creates a directory containing a file for each parallel instance. In your case, Flink cannot create (or write to) the file because there is already a directory with the same name. Can you delete the directory and
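For reference, pinning the sink parallelism to 1 so that a single output file is produced looks roughly like this (a sketch; the input and output path are invented for illustration, and it needs Flink on the classpath):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class SingleFileSink {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> result = env.fromElements("a", "b", "c");

        // With parallelism 1 this writes one file at the given path.
        // With higher parallelism, Flink creates a directory of that
        // name containing one file per parallel sink instance.
        result.writeAsText("/tmp/flink-output").setParallelism(1);

        env.execute("single-file sink sketch");
    }
}
```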

Re: groupBy(int) is undefined for the type SingleOutputStreamOperator while running streaming example provided on webpage

2016-01-20 Thread Robert Metzger
Hi, which Flink version are you using? Starting from Flink 0.10, "groupBy" has been renamed to "keyBy". Where did you find the "groupBy" example? On Wed, Jan 20, 2016 at 3:37 PM, Vinaya M S wrote: > Hi, > > I'm new to Flink. Can anyone help me with the error below. > > Exception in thread "ma
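The rename boils down to a one-word change at the call site; a minimal sketch (the tuple stream is invented for illustration, and it needs Flink on the classpath):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyBySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // In Flink 0.10+, keyBy replaces the old groupBy on DataStream.
        env.fromElements(Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3))
           .keyBy(0)   // was: .groupBy(0) before Flink 0.10
           .sum(1)
           .print();

        env.execute("keyBy sketch");
    }
}
```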

Re: groupBy(int) is undefined for the type SingleOutputStreamOperator while running streaming example provided on webpage

2016-01-20 Thread Robert Metzger
Thank you. > > On Wed, Jan 20, 2016 at 9:41 AM, Robert Metzger > wrote: > >> Hi. >> >> which Flink version are you using? >> Starting from Flink 0.10., "groupBy" has been renamed to "keyBy". >> >> Where did you find the "grou

Re: DeserializationSchema isEndOfStream usage?

2016-01-20 Thread Robert Metzger
Thanks Robert! I'll be keeping tabs on the PR. > > Cheers, > David > > On Mon, Jan 11, 2016 at 4:04 PM, Robert Metzger > wrote: > >> Hi David, >> >> In theory isEndOfStream() is absolutely the right way to go for stopping >> data sources in Flink. >

Re: Could not upload the jar files to the job manager IOException

2016-01-21 Thread Robert Metzger
a.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) > at java.net.SocketOutputStream.write(SocketOutputStream.java:153) > at org.apache.flink.runtime.blob.BlobUtils.writeLength(BlobUtils.java:262) > at > org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.ja
