Re: Migrating createTemporaryView to new Table api.

2021-10-14 Thread Niels Basjes
Thanks. That works like a charm. Niels On Thu, Oct 14, 2021 at 5:17 AM Caizhi Weng wrote: > Hi! > > To implement the renaming of fields with the new API, try this: > > tableEnv.createTemporaryView( > "AgentStream", > inputStre

Migrating createTemporaryView to new Table api.

2021-10-13 Thread Niels Basjes
ect I'm doing something wrong regarding the mentioned "generic raw type" and the way I'm trying to define the Schema. What I essentially am looking for is the correct way to give the 3 provided columns a new name and type. How do I do this correctly in the new API? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Flink to BigTable

2021-01-24 Thread Niels Basjes
Hi, I haven't tried it myself yet but there is a Flink connector for HBase and I remember someone telling me that Google has made a library available which is effectively the HBase client which talks to BigTable in the backend. Like I said: I haven't tried this yet myself. Niels Bas

Re: [ANNOUNCE] Apache Flink 1.12.0 released

2020-12-11 Thread Niels Basjes
mmunity who > made this release possible! > > Regards, > Dian & Robert > > -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: [Flink Unit Tests] Unit test for Flink streaming codes

2020-08-03 Thread Niels Basjes
erminate a stream. Niels On Mon, Aug 3, 2020 at 5:24 PM Vijayendra Yadav wrote: > Thank you Arvid, David and Niels for your valuable inputs. One last > Question: How do I terminate the flink streaming execution environment > after the integration test is completed? > > Regards

Re: [Flink Unit Tests] Unit test for Flink streaming codes

2020-08-01 Thread Niels Basjes
No, I only have Java. On Fri, 31 Jul 2020, 21:57 Vijayendra Yadav, wrote: > Thank You Niels. Would you have something for the scala object class. Say > for example if I want to implement a unit test ( not integration test) for > below code or similar : > > > https://githu

Re: [Flink Unit Tests] Unit test for Flink streaming codes

2020-07-31 Thread Niels Basjes
Does this test in one of my own projects do what you are looking for? https://github.com/nielsbasjes/yauaa/blob/1e1ceb85c507134614186e3e60952112a2daabff/udfs/flink/src/test/java/nl/basjes/parse/useragent/flink/TestUserAgentAnalysisMapperClass.java#L107 On Fri, 31 Jul 2020, 20:20 Vijayendra Yadav

Re: Handle idle kafka source in Flink 1.9

2020-07-22 Thread Niels Basjes
Have a look at this presentation I gave a few weeks ago. https://youtu.be/bQmz7JOmE_4 Niels Basjes On Wed, 22 Jul 2020, 08:51 bat man, wrote: > Hi Team, > > Can someone share their experiences handling this. > > Thanks. > > On Tue, Jul 21, 2020 at 11:30 AM bat man wrote

Re: Chaining the creation of a WatermarkStrategy doesn't work?

2020-07-08 Thread Niels Basjes
Thanks guys, It is clear this is a Java thing. Niels On Wed, Jul 8, 2020 at 9:56 AM Tzu-Li (Gordon) Tai wrote: > Ah, didn't realize Chesnay has it answered already, sorry for the > concurrent > reply :) > > > > -- > Sent from: > http://apache-flink-user

Chaining the creation of a WatermarkStrategy doesn't work?

2020-07-07 Thread Niels Basjes
Why is that? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: [PROPOSAL] Contribute training materials to Apache Flink

2020-04-09 Thread Niels Basjes
Hi, Sounds like a very nice thing to have as part of the project ecosystem. Niels On Thu, Apr 9, 2020 at 8:10 PM David Anderson wrote: > Dear Flink Community! > > For some time now Ververica has been hosting some freely available Apache > Flink training materi

Deploying native Kubernetes via yaml files?

2020-03-24 Thread Niels Basjes
s / Met vriendelijke groeten, Niels Basjes

Re: [Kubernetes] java.lang.OutOfMemoryError: Metaspace ?

2020-03-02 Thread Niels Basjes
I've put some information about my situation in the ticket https://issues.apache.org/jira/browse/FLINK-16142?focusedCommentId=17049679&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17049679 On Mon, Mar 2, 2020 at 2:55 PM Arvid Heise wrote: > Hi Niels

[Kubernetes] java.lang.OutOfMemoryError: Metaspace ?

2020-03-02 Thread Niels Basjes
Hi, I'm running a lot of batch jobs on Kubernetes once in a while I get this exception. What is causing this? How can I fix this? Niels Basjes java.lang.OutOfMemoryError: Metaspace at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader

Re: Giving useful names to the SQL steps/operators.

2020-03-01 Thread Niels Basjes
Thanks. On Sat, Feb 29, 2020 at 4:20 PM Yuval Itzchakov wrote: > > Unfortunately, it isn't possible. You can't set names to steps like > ordinary Java/Scala functions. > > On Sat, 29 Feb 2020, 17:11 Niels Basjes, wrote: > >> Hi, >> >> I

Writing a DataSet to ElasticSearch

2020-03-01 Thread Niels Basjes
ernative I came up with is to write the output of my batch to a file and then load that (with a stream) into ES. What is the proper solution? Is there an OutputFormat for ES I can use that I overlooked? -- Best regards / Met vriendelijke groeten, Niels Basjes

Giving useful names to the SQL steps/operators.

2020-02-29 Thread Niels Basjes
NetworkType, clicks, visitors) exceeded the 80 characters length limit and was truncated. As you can see this impacts not only the names of the steps but also the metrics. My question if it is possible to specify a name for the step, similar to what I can do in the Java code? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: [Native Kubernetes] [Ceph S3] Unable to load credentials from service endpoint.

2020-02-28 Thread Niels Basjes
now ALL jobs in this Flink cluster have the same credentials. Is there a way to set the S3 credentials on a per job or even per connection basis? Niels Basjes On Fri, Feb 28, 2020 at 4:38 AM Yang Wang wrote: > Hi Niels, > > Glad to hear that you are trying Flink native K8s integration

[Native Kubernetes] [Ceph S3] Unable to load credentials from service endpoint.

2020-02-27 Thread Niels Basjes
setup. [default] access_key = myAccessKey secret_key = mySecretKey host_base = s3.example.nl *I'm stuck, please help:* - What is causing the differences in behaviour between local and in k8s? It works locally but not in the cluster. - How do I figure out what network it is trying to reach in k8s? Thanks. -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: [Flink 1.10] How do I use LocalCollectionOutputFormat now that writeUsingOutputFormat is deprecated?

2020-02-22 Thread Niels Basjes
.collect(resultDataStream) .forEachRemaining(result::add); assertEquals(2, result.size()); And as you explained because the 'collect' already does an execute this works like a charm. Niels On Sat, Feb 22, 2020 at 1:38 AM Robert Metzger wrote: > Hey, > you are right. I'

Re: [Flink 1.10] How do I use LocalCollectionOutputFormat now that writeUsingOutputFormat is deprecated?

2020-02-21 Thread Niels Basjes
. On Fri, Feb 21, 2020 at 1:00 PM Robert Metzger wrote: > Hey Niels, > > This minimal Flink job executes in Flink 1.10: > > public static void main(String[] args) throws Exception { >final StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutio

Re: [Flink 1.10] How do I use LocalCollectionOutputFormat now that writeUsingOutputFormat is deprecated?

2020-02-18 Thread Niels Basjes
finitionDataStream(TestUserAgentAnalysisMapperInline.java:144) Did I do something wrong? Is this a bug in the DataStreamUtils ? Niels Basjes On Mon, Feb 17, 2020 at 8:56 AM Tzu-Li Tai wrote: > Hi, > > To collect the elements of a DataStream (usually only meant for testing > pur

[Flink 1.10] How do I use LocalCollectionOutputFormat now that writeUsingOutputFormat is deprecated?

2020-02-14 Thread Niels Basjes
t regards / Met vriendelijke groeten, Niels Basjes

Re: Flink SQL: How to tag a column as 'rowtime' from a Avro based DataStream?

2019-09-10 Thread Niels Basjes
Hi. Can you give me an example of the actual syntax of such a cast? On Tue, 10 Sep 2019, 16:30 Fabian Hueske, wrote: > Hi Niels, > > I think (not 100% sure) you could also cast the event time attribute to > TIMESTAMP before you emit the table. > This should remove the event tim

Re: Flink SQL: How to tag a column as 'rowtime' from a Avro based DataStream?

2019-08-21 Thread Niels Basjes
upleType = new RowTypeInfo(fieldTypes); DataStream resultSet = tableEnv.toAppendStream(resultTable, tupleType); Which gives me the desired DataStream. Niels Basjes On Wed, Aug 14, 2019 at 5:13 PM Timo Walther wrote: > Hi Niels, > > if you are coming from DataStream

Flink SQL: How to tag a column as 'rowtime' from a Avro based DataStream?

2019-08-14 Thread Niels Basjes
that the timestamp column show be treated as the rowtime. How do I do that? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Is the provided Serializer/TypeInformation checked "too late"?

2019-07-09 Thread Niels Basjes
Hi Timo, Thanks for the clarification. It reassuring to hear that my code does the right thing. I'll just ignore these messages for now. Niels On Mon, 8 Jul 2019, 15:09 Timo Walther, wrote: > Hi Niels, > > the type handling evolved during the years and is a bit messed up

Is the provided Serializer/TypeInformation checked "too late"?

2019-07-08 Thread Niels Basjes
th mentioned examples the correct serialization classes when running. So what is happening here? Did I forget to do a required call? So is this a bug? Is the provided serialization via TypeInformation 'skipped' during startup and only used during runtime? -- Best regards / Met vriendelijke groeten, Niels Basjes

BigQuery source ?

2019-05-31 Thread Niels Basjes
have not been able to find anything yet. Any pointers/hints/code fragments are welcome. Thanks -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-11 Thread Niels Basjes
Hi, The Beam project has something in this area that is simply a page within their documentation website: https://beam.apache.org/documentation/sdks/java-thirdparty/ Niels Basjes On Fri, Mar 8, 2019 at 11:39 PM Bowen Li wrote: > > Confluent hub for Kafka is another good example of this k

Re: Clean shutdown of streaming job

2018-10-22 Thread Niels van Kaam
to exactly-once guarantee. -- Niels On Mon, Oct 22, 2018 at 1:26 AM Ning Shi wrote: > I'm implementing a streaming job that consumes events from Kafka and > writes results to Cassandra. The writes to Cassandra are not > idempotent. In preparation for planned maintenance events like F

Re: Manual trigger the window in fold operator or incremental aggregation

2018-10-17 Thread Niels van Kaam
Sorry, I would not know that. I have worked with custom triggers, but haven't actually had to implement a custom window function yet. By looking at the interfaces I would not say that is possible. Niels On Wed, Oct 17, 2018 at 2:18 PM Ahmad Hassan wrote: > Hi Niels, > > Can

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Niels van Kaam
Hi All, Thanks for the responses, the finished source explains my issue then. I can work around the problem by letting my sources negotiate a "final" checkpoint via zookeeper. @Paul, I think your answer was meant for the earlier question asked by Joshua? Cheers, Niels On Wed, Oct 1

Re: Manual trigger the window in fold operator or incremental aggregation

2018-10-17 Thread Niels van Kaam
ting https://ci.apache.org/projects/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/windowing/triggers/Trigger.html . Note that this does not change the window, but just causes the windowedstream to emit intermediate results to the next operator. Does this answer your question? Cheers, Nie

Checkpointing when one of the sources has completed

2018-10-17 Thread Niels van Kaam
ed to continue when one or more sources of a job are in the "FINISHED" state? Cheers, Niels

Re: [DISCUSS] Dropping flink-storm?

2018-09-29 Thread Niels Basjes
I would drop it. Niels Basjes On Sat, 29 Sep 2018, 10:38 Kostas Kloudas, wrote: > +1 to drop it as nobody seems to be willing to maintain it and it also > stands in the way for future developments in Flink. > > Cheers, > Kostas > > > On Sep 29, 2018, at 8:19 AM, Tzu-Li

Re: Order of events in a Keyed Stream

2018-07-29 Thread Niels Basjes
. Niels Basjes On Sun, Jul 29, 2018 at 9:25 AM, Congxian Qiu wrote: > Hi, > Maybe the messages of the same key should be in the *same partition* of > Kafka topic > > 2018-07-29 11:01 GMT+08:00 Hequn Cheng : > >> Hi harshvardhan, >> If 1.the messages exist on the

Dealing with an asynchronous source (and sink) in Flink 1.5.0. Await.Result() does not complete.

2018-06-13 Thread Niels van Kaam
(or Sink for that matter)? With Await.result I do make sure the calls are created and awaited within a single checkpoint. Any other suggestions where to look for the problem, or explanation why this issue could occur when upgrading from 1.4.2 to 1.5.0? Thank you for your help! Cheers, Niels

Re: Akka Http used in custom RichSourceFunction

2018-05-25 Thread Niels van Kaam
: https://doc.akka.io/docs/akka/2.5.12/general/configuration.html#When_using_JarJar__OneJar__Assembly_or_any_jar-bundler Example POM: https://github.com/nvankaam/websocketclient Niels On Fri, May 25, 2018 at 11:00 AM Piotr Nowojski wrote: > Hi, > > Yes, this might be the cause of the issue

Re: Akka Http used in custom RichSourceFunction

2018-05-25 Thread Niels van Kaam
nment.java:1501) at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:629) at net.vankaam.flink.WebSocketSample$.main(WebSocketSample.scala:42) at net.vankaam.flink.WebSocketSample.main(WebSocketSample.scala) Cheers, Niels On Thu, May 24,

Akka Http used in custom RichSourceFunction

2018-05-24 Thread Niels van Kaam
is conflict? If not, does this mean I should avoid using Akka (or at least other versions than Flink's) within my sources/sinks? Or can I safely catch and ignore the error? My dependencies are: Flink: 1.4.2 akka-actor: 2.5.12 akka-stream: 2.5.12 akka-http: 10.1.1 Thank you for your help! Cheers, Niels

Re: "Close()" aborts last transaction in TwoPhaseCommitSinkFunction

2018-03-09 Thread Niels van Kaam
Thank you! I already have a custom source function so adding the hacky solution would not be too much effort. Looking forward to the "proper" solution! Niels On Fri, Mar 9, 2018, 16:00 Piotr Nowojski wrote: > Hi, > > Short answer is: no, at the moment clean shutdown is

"Close()" aborts last transaction in TwoPhaseCommitSinkFunction

2018-03-09 Thread Niels van Kaam
our and perform a commit, but then I would perform a commit without getting the checkpoint completed notification, thus not properly maintaining exactly once guarantees Is (and how is) it possible to have end-to-end exactly once guarantees when dealing with (sometimes) finite jobs? Thanks! Niels

Re: Fat jar fails deployment (streaming job too large)

2018-02-27 Thread Niels
In case it's useful I've found how to enable bit more debug logging on the jobmanager: flink_jobmanager_log.txt -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nab

Re: Fat jar fails deployment (streaming job too large)

2018-02-27 Thread Niels
e fat jar isn't deployed. See the attached files for the logs of the jobmanager and the deployer. Let me know if I can provide you with any additional info. Thanks for your help! Cheers, Niels Flink_deploy_log.txt <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.c

Fat jar fails deployment (streaming job too large)

2018-02-27 Thread Niels
meout to several minutes. Any help would be very much appreciated! Thanks, Niels -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Managed State Custom Serializer with Avro

2018-02-26 Thread Niels
, Niels -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Managed State Custom Serializer with Avro

2018-02-19 Thread Niels Denissen
Hi Till, Thanks for the quick reply, I'm using 1.3.2 atm. Cheers, Niels On Feb 19, 2018 19:10, "Till Rohrmann" wrote: > Hi Niels, > > which version of Flink are you using? Currently, Flink does not support to > upgrade the TypeSerializer itself, if I'm not mis

Managed State Custom Serializer with Avro

2018-02-19 Thread Niels
serializer provided is only used for writes. If this is indeed the case it explains our aforementioned problem. If you have any pointers as to whether this is true and what a possible solution would be that would be very much appreciated! Thanks! Niels -- Sent from: http://apache-fli

Re: HBase config settings go missing within Yarn.

2017-10-26 Thread Niels Basjes
I have an idea how we can reduce the impact this class of problem. If we can detect that we are running in a distributed environment then in order to use HBase you MUST have an hbase-site.xml I'll see if I can make a proof of concept. Niels On Wed, Oct 25, 2017 at 11:27 AM, Till Rohrmann

Re: HBase config settings go missing within Yarn.

2017-10-24 Thread Niels Basjes
roach. Niels Basjes On Tue, Oct 24, 2017 at 11:29 AM, Niels Basjes wrote: > Minor correction: The HBase jar files are on the classpath, just in a > different order. > > On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes wrote: > >> I did some more digging. >> >>

Re: HBase config settings go missing within Yarn.

2017-10-24 Thread Niels Basjes
Minor correction: The HBase jar files are on the classpath, just in a different order. On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes wrote: > I did some more digging. > > I added extra code to print both the environment variables and the > classpath that is used by the HBaseConf

Re: HBase config settings go missing within Yarn.

2017-10-24 Thread Niels Basjes
y needs the HBase client (Jar, packaged into application) and the HBase zookeeper settings (present on the machine where it is started). Niels Basjes On Mon, Oct 23, 2017 at 10:23 AM, Piotr Nowojski wrote: > Till do you have some idea what is going on? I do not see any meaningful >

Re: HBase config settings go missing within Yarn.

2017-10-22 Thread Niels Basjes
://github.com/nielsbasjes/FlinkHBaseConnectProblem Niels Basjes On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski wrote: > Is this /etc/hbase/conf/hbase-site.xml file is present on all of the > machines? If yes, could you share your code? > > On 20 Oct 2017, at 16:29, Niels Basjes

Re: HBase config settings go missing within Yarn.

2017-10-20 Thread Niels Basjes
sk/job actually running in the cluster does not have the same settings. So it seems in the transition into the cluster the application does not copy everything it has available locally for some reason. There is a very high probability I did something wrong, I'm just not seeing it at this mome

Re: HBase config settings go missing within Yarn.

2017-10-20 Thread Niels Basjes
To facilitate you guys helping me I put this test project on github: https://github.com/nielsbasjes/FlinkHBaseConnectProblem Niels Basjes On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes wrote: > Hi, > > Ik have a Flink 1.3.2 application that I want to run on a Hadoop yarn > cluster

HBase config settings go missing within Yarn.

2017-10-20 Thread Niels Basjes
s. As a workaround I currently put this extra line in my code which I know is nasty but "works on my cluster" hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml")); What am I doing wrong? What is the right way to fix this? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: PartitionNotFoundException when running in yarn-session.

2017-10-13 Thread Niels Basjes
Hi I did some tests and it turns out I was really overloading the cluster which caused the problems. I tried the timeout setting but that didn't help. Simply 'not overloading' the system did help. Thanks. Niels On Thu, Oct 12, 2017 at 10:42 AM, Ufuk Celebi wrote: > He

Re: PartitionNotFoundException when running in yarn-session.

2017-10-12 Thread Niels Basjes
have expected something similar as with 'good old' MapReduce: The missing task is simply resubmitted and ran again. Why doesn't that happen? Niels On Wed, Oct 11, 2017 at 8:49 AM, Ufuk Celebi wrote: > Hey Niels, > > any update on this? > > – Ufuk > > > On Mon,

PartitionNotFoundException when running in yarn-session.

2017-10-09 Thread Niels Basjes
.2? Thanks. -- Best regards / Met vriendelijke groeten, Niels Basjes

Re:Re: Just do a survey, how many people give up the storm and turn to Flink ?

2017-08-20 Thread Niels Basjes
If you combine Storm with Redis for managing state you still do not have "exactly once" when a failure of a processing node occurs. With Flink you do have that. Niels On 19 Aug 2017 12:00, "mingleizhang" <18717838...@163.com> wrote: > Thanks Niels for your answer

Re: Just do a survey, how many people give up the storm and turn to Flink ?

2017-08-19 Thread Niels Basjes
st 2 years. But comparing the two at this moment would still let me choose Flink. Niels On 19 Aug 2017 10:28, "mingleizhang" <18717838...@163.com> wrote: > Hi, flink user > > I just want to do a survey as the subject said. How many of you that > used to use *storm

Re: [POLL] Who still uses Java 7 with Flink ?

2017-07-13 Thread Niels Basjes
+1 For dropping java 1.7 On 13 Jul 2017 04:11, "Jark Wu" wrote: > +1 for dropping Java 7 > > 2017-07-13 9:34 GMT+08:00 ☼ R Nair (रविशंकर नायर) < > ravishankar.n...@gmail.com>: > >> +1 for dropping Java 1.7. >> >> On Wed, Jul 12, 2017 at 9:10 PM, Kurt Young wrote: >> >>> +1 for droppint Java 7,

Re: Writing groups of Windows to files

2017-07-04 Thread Niels Basjes
by capping the maximum number of events in a session to a value that is only reached by robots (for which the whole idea os sessions is bogus anyway). Thanks for the tips. Niels On Tue, Jul 4, 2017 at 12:07 PM, Fabian Hueske wrote: > You are right. Building a large record might result in an O

Re: Writing groups of Windows to files

2017-07-04 Thread Niels Basjes
roblem (in my mind) that I have with this is that a single session with a LOT of events would bring the system to a halt because it can trigger OOM events. How should I handle those? Niels

Writing groups of Windows to files

2017-06-30 Thread Niels Basjes
rds / Met vriendelijke groeten, Niels Basjes

Re: Periodic flush sink?

2017-04-29 Thread Niels Basjes
nstruct to fail? Any edge cases I missed? Niels private transient BufferedMutator mutator = null; private transient Timer timer = null; @Override public void open(Configuration parameters) throws Exception { org.apache.hadoop.conf.Configuration hbaseConfig = HBaseConfiguration.create(); C

Periodic flush sink?

2017-04-29 Thread Niels Basjes
he buffers atleast every few seconds. Simply implement a standard Java TimerTask and fire that using a Timer? Or is there a better way of doing that in Flink? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Flink job on secure Yarn fails after many hours

2017-04-14 Thread Niels Basjes
Hi, No, this issue is now gone for us. The fixed in 1.2.0 ensured that we are now able to run jobs on our cluster beyond the 7 days limit. Niels On Wed, Apr 12, 2017 at 5:35 PM, Robert Metzger wrote: > Niels, are you still facing this issue? > > As far as I understood it, the securit

Re: Streaming file source?

2017-01-20 Thread Niels Basjes
Thanks! This sounds really close to what I had in mind. I'll use this first and see how far I get. Niels On Fri, Jan 20, 2017 at 11:27 AM, Stephan Ewen wrote: > Hi Niels! > > There is the Continuous File Monitoring Source, used via > > StreamExecutionEnvironment.read

Streaming file source?

2017-01-20 Thread Niels Basjes
data which (in the live situation) come from a single Kafka partition. I hate reinventing the wheel so I'm wondering is something like this already been built by someone? If so, where can I find it? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Kafka KeyedStream source

2017-01-18 Thread Niels Basjes
to determine the partition it should be able to make to work. Right now I simply provide the 'sessionId' to Kafka and thenit is probably a hashing function IN kafka that does the magic. I'm not sure if we can control that enough with Kafka right now. Niels On Mon, Jan 16, 2017 a

Re: Kafka KeyedStream source

2017-01-11 Thread Niels Basjes
ns I would read these records in and I could filter the data more efficiently because the data would not need to go over the network before this filter. Afterwards I can scale it up to 'many' tasks for the heavier processing that follows. As a concept: Could that be made to work? Niels

Kafka KeyedStream source

2017-01-05 Thread Niels Basjes
produces a keyed data stream? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: microsecond resolution

2016-12-06 Thread Niels Basjes
you MUST make sure you do the same with things like watermarks. And if you want to have a watermark that is 5 seconds before the current time stamp you must be sure to substract 500 instead of 5000 fom the timestamp. Niels Basjes On Mon, Dec 5, 2016 at 2:48 PM, jeff jacobson

Re: Flink Avro Kafka Reading/Writing

2016-11-12 Thread Niels Basjes
consumer side. See: https://issues.apache.org/jira/browse/AVRO-1704 https://github.com/apache/avro/blob/master/lang/java/ipc/src/test/java/org/apache/avro/message/TestCustomSchemaStore.java Niels Basjes On Fri, Nov 11, 2016 at 3:05 PM, daviD wrote: > Hi All, > > Does anyone know if Flink

Re: Unit testing a Kafka stream based application?

2016-10-27 Thread Niels Basjes
Thanks. This is exactly what I needed. Niels On Wed, Oct 26, 2016 at 11:03 AM, Robert Metzger wrote: > Hi Niels, > > Sorry for the late response. > > you can launch a Kafka Broker within a JVM and use it for testing purposes. > Flink's Kafka connector is using that a lo

Unit testing a Kafka stream based application?

2016-10-21 Thread Niels Basjes
. -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Delaying starting the jobmanager in yarn?

2016-08-26 Thread Niels Basjes
Thanks! I'm going to work with this next week. Have a nice weekend. Niels On Fri, Aug 26, 2016 at 2:49 PM, Maximilian Michels wrote: > It is a bit more involved as I thought. We could simply the API further: > > import org.apache.flink.client.program.PackagedPro

Re: Running multiple jobs on yarn (without yarn-session)

2016-08-25 Thread Niels Basjes
owse/FLINK-4495 for you Niels Basjes On Thu, Aug 25, 2016 at 3:34 PM, Maximilian Michels wrote: > Hi Niels, > > This is with 1.1.1? We could fix this in the upcoming 1.1.2 release by > only using automatic shut down for detached jobs. In all other cases > we should be able to shut

Re: Delaying starting the jobmanager in yarn?

2016-08-25 Thread Niels Basjes
Sounds good. Is there a basic example somewhere I can have a look at? Niels On Thu, Aug 25, 2016 at 2:55 PM, Maximilian Michels wrote: > Hi Niels, > > If you're using 1.1.1, then you can instantiate the > YarnClusterDescriptor and supply it with the Flink jar and &g

Running multiple jobs on yarn (without yarn-session)

2016-08-25 Thread Niels Basjes
on it works but then I have the troubles of starting a (detached yarn-session) AND to terminate that thing again after my jobs have run. -- Best regards / Met vriendelijke groeten, Niels Basjes

Delaying starting the jobmanager in yarn?

2016-08-25 Thread Niels Basjes
o run? Can we 'manually' start and stop the jobmanager in yarn in some way from our java code? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Batch jobs with a very large number of input splits

2016-08-23 Thread Niels Basjes
jobs in yarn-session then you MUST specify the parallelism for all steps or otherwise it will fill the yarn-session completely and you cannot run multiple jobs in parallel. Is this conclusion correct? Niels Basjes On Fri, Aug 19, 2016 at 3:18 PM, Robert Metzger wrote: > Hi Niels, > >

Re: Checking for existance of output directory/files before running a batch job

2016-08-22 Thread Niels Basjes
path (i.e. "foo") and run it anywhere (local, Yarn, Mesos, etc.) without any problems. What do you guys think? Is this desirable? Possible? Niels. On Fri, Aug 19, 2016 at 3:22 PM, Robert Metzger wrote: > Ooops. Looks like Google Mail / Apache / the internet needs 13 minutes to >

Checking for existance of output directory/files before running a batch job

2016-08-18 Thread Niels Basjes
onment yet I was unable to get the 'correct' filesystem from there. What is the proper way to check this? -- Best regards / Met vriendelijke groeten, Niels Basjes

Batch jobs with a very large number of input splits

2016-08-18 Thread Niels Basjes
vriendelijke groeten, Niels Basjes

Re: Running yarn-session a kerberos secured Yarn/HBase cluster.

2016-08-01 Thread Niels Basjes
https://github.com/apache/flink/pull/2317 On Mon, Aug 1, 2016 at 11:54 AM, Niels Basjes wrote: > Thanks for the pointers towards the work you are doing here. > I'll put up a patch for the jars and such in the next few days. > https://issues.apache.org/jira/browse/FLINK-4287 &g

Re: Running yarn-session a kerberos secured Yarn/HBase cluster.

2016-08-01 Thread Niels Basjes
Thanks for the pointers towards the work you are doing here. I'll put up a patch for the jars and such in the next few days. https://issues.apache.org/jira/browse/FLINK-4287 Niels Basjes On Mon, Aug 1, 2016 at 11:46 AM, Stephan Ewen wrote: > Thank you for the breakdown of the

Running yarn-session a kerberos secured Yarn/HBase cluster.

2016-08-01 Thread Niels Basjes
all long running jobs) I would really like to have this to be a 'long lived' thing. As far as I know this is just the tip of the security ice berg and I would like to know what the correct approach is to solve this. Thanks. -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Debugging watermarks?

2016-05-26 Thread Niels Basjes
Thanks guys, Using the above code as a reference I was quickly able to find the problems in my code. Niels Basjes On Sun, May 22, 2016 at 2:00 PM, Stephan Ewen wrote: > Hi Niels! > > It may also be interesting for you to know that with the extension of the > metrics and the

Debugging watermarks?

2016-05-21 Thread Niels Basjes
ctual data. Thanks. Niels Basjes

Re: throttled stream

2016-04-16 Thread Niels Basjes
Simple idea: create a map function that only does "sleep 1/5 second" and put that in your pipeline somewhere. Niels On 16 Apr 2016 22:38, "Chen Bekor" wrote: > is there a way to consume a kafka stream using flink with a predefined > rate limit (eg 5 events per second

Re: Flink job on secure Yarn fails after many hours

2016-03-19 Thread Niels Basjes
fails once in a while and have an automatic restart feature (i.e. shell script with a while true loop). The best guess at a root cause is this https://issues.apache.org/jira/browse/HDFS-9276 If you have a real solution or a reference to a related bug report to this problem then please share! Nie

Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread Niels Basjes
/jira/browse/AVRO/ Thanks Niels Basjes On Thu, Mar 10, 2016 at 4:11 PM, David Kim wrote: > Hello! > > Just wanted to check up on this again. Has anyone else seen this before or > have any suggestions? > > Thanks! > David > > On Tue, Mar 8, 2016 at 12

Re: How to ensure exactly-once semantics in output to Kafka?

2016-02-05 Thread Niels Basjes
Skip Get message 8 -> Read from Kafka --> Already have this --> Skip Get message 9 -> Read from Kafka --> Not yet in Kafka --> Write and resume normal operations. Like I said: This is just the first rough idea I had on a possible direction how this can be solved without the latency imp

Re: How to ensure exactly-once semantics in output to Kafka?

2016-02-05 Thread Niels Basjes
isturbance. What do you think? Niels Basjes On Fri, Feb 5, 2016 at 11:57 AM, Stephan Ewen wrote: > Hi Niels! > > In general, exactly once output requires transactional cooperation from > the target system. Kafka has that on the roadmap, we should be able to > integrate that onc

How to ensure exactly-once semantics in output to Kafka?

2016-02-05 Thread Niels Basjes
) each message that is read from Kafka (my input) is written to Kafka (my output) exactly once? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Comparison of storm and flink

2016-01-24 Thread Niels Basjes
g situation. As a final note: I've been hacking at Storm for over a year now and last summer I found Flink. Today Storm is for me no longer an option and we are taking down what we already had running. Niels Basjes On 23 Jan 2016 20:59, "Vinaya M S" wrote: > Hi Flink user grou

Re: Redeployements and state

2016-01-22 Thread Niels Basjes
he configured checkpoint persistance and recovers the most recent one. Apparently there is a mismatch between what I think is useful and what has been implemented so far. Am I missing something or should I submit this as a Jira ticket for a later version? Niels Basjes On Mon, Jan 18, 2016 at 12:

  1   2   >