application with kerberos that includes
a solution for the kerberos ticket timeout ?
Thanks
Niels Basjes
Hi,
Thanks for your feedback.
So I guess I'll have to talk to the security guys about having special
kerberos ticket expiry times for these types of jobs.
Niels Basjes
On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <m...@apache.org> wrote:
> Hi Niels,
>
> Thank you fo
>> build, it'll likely take 30-40 minutes. Subsequent builds might take 10
>> minutes approx. [I have the same PC configuration.]
>>
>> -- Sachin Goel
>> Computer Science, IIT Delhi
>> m. +91-9871457685
>>
>> On Sun, Nov 8, 2015 at 2:05 AM, Niels
? (Perhaps the
application manager can proxy the RPC calls?)
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Great!
I'll watch the issue and give it a test once I see a working patch.
Niels Basjes
On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels <m...@apache.org> wrote:
> Hi Niels,
>
> Thanks a lot for reporting this issue. I think it is a very common setup
> in corporate infr
x yourself? If you are too busy
> at the moment, we can also discuss how we share the work (I'm implementing
> it, you test the fix)
>
>
> Robert
>
> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> Update on the status so far. I suspect I
:57, "Maximilian Michels" <m...@apache.org> wrote:
> Thank you for looking into the problem, Niels. Let us know if you need
> anything. We would be happy to merge a pull request once you have verified
> the fix.
>
> On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes <ni..
this sounds like the most viable solution.
I don't know how they implemented this in MR.
I know the way they did it actually works on our clusters (with firewalls).
Niels Basjes
On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <rmetz...@apache.org> wrote:
> Hi Niels,
>
> so the prob
me your firewall allows outside connections from that
> port range.
> So we also have a new approach:
>
> f) Allocate the YARN application master (and blob manager) within a
> user-specified port-range.
>
> This would be really easy to implement, because we would just need to go
start and connect to each
> other (the number of TaskManagers is shown correctly in the web interface).
>
Correct. Flink starts (i see the jobmanager UI) but the actual job is not
started.
Niels Basjes
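As a hedged illustration of approach (f) above: Flink exposes a configuration option for constraining where the YARN application master binds (the exact option name and accepted syntax should be verified against the configuration reference of your Flink version):

```yaml
# flink-conf.yaml -- restrict the YARN application master to a port range
# (hedged example; verify the option name for your Flink version)
yarn.application-master.port: 50100-50200
```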
e the complete
> logs available and could you share them?
>
>
> Best regards,
> Max
>
> On Wed, Dec 2, 2015 at 11:47 AM, Niels Basjes <ni...@basjes.nl> wrote:
> > Hi,
> >
> >
> > We have a Kerberos secured Yarn cluster here and I'm experimenting w
option to update Hadoop and redeploy the
> >> job. Would be great if you could do that and let us know how it turns
> >> out.
> >>
> >> Cheers,
> >> Max
> >>
> >> On Wed, Dec 2, 2015 at 3:45 PM, Niels Basjes <ni...@basjes.nl> wrote:
Aljoscha, one way to make this more flexible is to
> enhance what you can do with custom state:
> - State has timeouts (for cleanup)
> - Functions allow you to schedule event-time progress notifications
>
> Stephan
>
>
>
> On Thu, Dec 10, 2015 at 11:55 AM, Niels B
tance of the QueueSource itself will be running
>> in each parallel instance of the source operator. And there is no way for
>> there being communication between the trigger and source, since they might
>> not even run on the same machine in the end.
>>
>> Cheers,
>> Aljo
to my Trigger and then onEventTime just output a 'new event' ?
What do you recommend?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
uch as bytes,
> records in, out of Streaming sources and sinks?
>
> On Tue, Dec 15, 2015 at 5:24 AM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> Hi,
>>
>> @Ufuk: I added the env.disableOperatorChaining() and indeed now I see two
>> things on the screen and the
I working in the right direction for what I'm trying to achieve; or
should I use a different API? a different approach?
Thanks
--
Best regards / Met vriendelijke groeten,
Niels Basjes
I now understand I have to be more careful with these timers!
Niels Basjes
On Fri, Nov 27, 2015 at 11:28 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:
> Hi Niels,
> do the records that arrive from Kafka already have the session ID or do
> you want to assign them inside
People seem to
> like working on state directly, but it should clean up automatically.
>
> Can you see if your use case fits onto windows, otherwise open a ticket
> for state expiry?
>
> Greetings,
> Stephan
>
>
> On Thu, Nov 26, 2015 at 10:42 PM, Niels Basjes <ni...@b
ng I want to be able to 'reprocess' everything from
the start of the queue.
Here the matter of 'event time' becomes a big question for me; in those
'replay' situations the event time will progress at a much higher speed
than the normal 1sec/sec.
How does this work in Apache Flink?
Niels Basjes
On Fri, Nov
the bottleneck of the computation. I
> would be very interested in seeing how this behaves since I only did tests
> with regular time windows, where the first if statement almost always
> directly returns, which is very cheap.
>
> Cheers,
> Aljoscha
> > On 27 Nov 2015, at 13:5
eamTimeCharacteristic(EventTime);
>
> or:
>
> env.getConfig().enableTimestamps();
>
> I know, not very intuitive.
>
> Cheers,
> Aljoscha
>
> > On 30 Nov 2015, at 14:47, Niels Basjes <ni...@basjes.nl> wrote:
> >
> > Hi,
> >
> > I'm exper
er way of doing this.
Niels Basjes
On Mon, Nov 30, 2015 at 9:29 PM, Stephan Ewen <se...@apache.org> wrote:
> Hi Niels!
>
> Nice use case that you have!
> I think you can solve this super nicely with Flink, such that "replay" and
> "realtime" are literally
groeten,
Niels Basjes
|
>>> +--> (window session) --> (rolling
>>> sink)
>>>
>>>
>>> You can put this all into one operator that accumulates the session
>>> elements but still immediately emits the new records (the real
162)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:440)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$1.onEvent(StreamTask.java:574)
... 8 more
Niels
On Tue, Dec 1, 2015 at 4:41 PM, Niels Basjes <ni...@basjes.nl> wrote:
> Thanks!
> I'm going to study this code clo
Thanks guys,
Using the above code as a reference I was quickly able to find the problems
in my code.
Niels Basjes
On Sun, May 22, 2016 at 2:00 PM, Stephan Ewen <se...@apache.org> wrote:
> Hi Niels!
>
> It may also be interesting for you to know that with the extension o
; > Hello,
> >
> > You are probably looking for this feature:
> > https://issues.apache.org/jira/browse/FLINK-2976
> >
> > Best,
> > Gábor
> >
> >
> >
> >
> > 2016-01-14 11:05 GMT+01:00 Niels Basjes <ni...@basjes.nl>:
>
Skip
Get message 8 -> Read from Kafka --> Already have this --> Skip
Get message 9 -> Read from Kafka --> Not yet in Kafka --> Write and resume
normal operations.
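The skip/resume decision in the steps above can be sketched in plain Java. All names here are hypothetical; a real implementation would learn `lastWrittenId` by reading back the tail of the Kafka output topic after a restart.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the skip/resume idea described above. This is an
// illustration of the rough idea only, not Niels's actual implementation.
public class ResumeSketch {
    private long lastWrittenId;                 // highest id already in Kafka
    private final List<Long> written = new ArrayList<>();

    public ResumeSketch(long lastWrittenId) {
        this.lastWrittenId = lastWrittenId;
    }

    /** Returns true if the message was written, false if it was skipped. */
    public boolean process(long messageId) {
        if (messageId <= lastWrittenId) {
            return false;                       // already have this -> skip
        }
        written.add(messageId);                 // not yet in Kafka -> write
        lastWrittenId = messageId;
        return true;                            // ... and resume normally
    }

    public List<Long> getWritten() {
        return written;
    }
}
```

With `lastWrittenId = 8`, messages 7 and 8 are skipped and message 9 is written, matching the walkthrough above.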
Like I said: This is just the first rough idea I had on a possible
direction how this can be solved without the latenc
s a final note: I've been hacking at Storm for over a year now and last
summer I found Flink. Today Storm is for me no longer an option and we are
taking down what we already had running.
Niels Basjes
On 23 Jan 2016 20:59, "Vinaya M S" <vinay...@gmail.com> wrote:
> Hi Flink us
ersistance and recovers the most recent one.
Apparently there is a mismatch between what I think is useful and what has
been implemented so far.
Am I missing something or should I submit this as a Jira ticket for a later
version?
Niels Basjes
On Mon, Jan 18, 2016 at 12:13 PM, Maximilian
/jira/browse/AVRO/
Thanks
Niels Basjes
On Thu, Mar 10, 2016 at 4:11 PM, David Kim <david@braintreepayments.com>
wrote:
> Hello!
>
> Just wanted to check up on this again. Has anyone else seen this before or
> have any suggestions?
>
> Thanks!
> David
>
> On Tu
fails once in a while and have an
automatic restart feature (i.e. shell script with a while true loop).
The best guess at a root cause is this
https://issues.apache.org/jira/browse/HDFS-9276
If you have a real solution or a reference to a related bug report to this
problem then please share!
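The "shell script with a while true loop" restart wrapper mentioned above can be sketched as a small bash script. The launcher name is hypothetical, and a bounded retry count (with `false` standing in for a job that keeps failing) is used here so the sketch terminates:

```shell
#!/usr/bin/env bash
# Sketch of the automatic-restart wrapper mentioned above.
# "./run-my-flink-job.sh" is a hypothetical launcher name.
max_retries=3
attempt=0
while [ "$attempt" -lt "$max_retries" ]; do
    attempt=$((attempt + 1))
    # ./run-my-flink-job.sh && break   # real version: stop once the job succeeds
    false && break                     # demo stand-in: job always "fails"
    echo "attempt $attempt failed; restarting..." >&2
    sleep 1
done
echo "gave up after $attempt attempts"
```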
Ni
ctual data.
Thanks.
Niels Basjes
Simple idea: create a map function that only does "sleep 1/5 second" and
put that in your pipeline somewhere.
Niels
On 16 Apr 2016 22:38, "Chen Bekor" wrote:
> is there a way to consume a kafka stream using flink with a predefined
> rate limit (eg 5 events per second)
>
>
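The sleep-based throttle suggested above can be sketched in plain Java. Inside a Flink pipeline this body would sit in a `MapFunction`'s `map()`, where the sleep creates back-pressure on the upstream Kafka source; the class and method names here are illustrative:

```java
// Sketch of the sleep-based rate limit suggested above (~5 events/second
// via a 200 ms pause per element).
public class ThrottleSketch {
    private final long sleepMillis;

    public ThrottleSketch(double eventsPerSecond) {
        this.sleepMillis = (long) (1000.0 / eventsPerSecond);
    }

    public long getSleepMillis() {
        return sleepMillis;
    }

    /** Pass an element through after the configured pause. */
    public <T> T throttle(T element) throws InterruptedException {
        Thread.sleep(sleepMillis);
        return element;
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottleSketch throttle = new ThrottleSketch(5.0); // 200 ms per element
        for (int i = 0; i < 3; i++) {
            System.out.println(throttle.throttle(i));
        }
    }
}
```

Note this limits each parallel instance, not the whole job: with parallelism N the overall rate is roughly N times the configured one.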
ng jobs) I would really like to have this to be a 'long lived'
thing.
As far as I know this is just the tip of the security iceberg and I would
like to know what the correct approach is to solve this.
Thanks.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
https://github.com/apache/flink/pull/2317
On Mon, Aug 1, 2016 at 11:54 AM, Niels Basjes <ni...@basjes.nl> wrote:
> Thanks for the pointers towards the work you are doing here.
> I'll put up a patch for the jars and such in the next few days.
> https://issues.apache.org/jira/br
> file order deterministic.
>
> Stephan
>
>
> On Fri, Jan 20, 2017 at 11:20 AM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> Hi,
>>
>> For testing and optimizing a streaming application I want to have a "100%
>> accurate repeatable" sub
data which (in the live situation) come
from a single Kafka partition.
I hate reinventing the wheel, so I'm wondering whether something like this
has already been built by someone.
If so, where can I find it?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
file system.
>>
>> For example, hdfs:///path/to/foo
>>
>> If that doesn't work, do you have the same Hadoop configuration on the
>> machine where you test?
>>
>> Cheers,
>> Max
>>
>> On Thu, Aug 18, 2016 at 2:02 PM, Niels Basjes <ni...@basjes
?
Can we 'manually' start and stop the jobmanager in yarn in some way from
our java code?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
but then I have the
trouble of starting a (detached) yarn-session AND terminating that thing
again after my jobs have run.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
ble to get
the 'correct' filesystem from there.
What is the proper way to check this?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
ke groeten,
Niels Basjes
>
> descriptor.setLocalJarPath(new Path(flinkJarPath));
> descriptor.setTaskManagerCount(2);
> descriptor.setName("Testing the YarnClusterClient");
>
> final YarnClusterClient client = descriptor.deploy();
> client.run(packagedProgra
jobs in
yarn-session then you MUST specify the parallelism for all steps,
otherwise it will fill the yarn-session completely and you cannot run
multiple jobs in parallel.
Is this conclusion correct?
Niels Basjes
On Fri, Aug 19, 2016 at 3:18 PM, Robert Metzger <rmetz...@apache.org> wrote:
st/java/org/
> apache/flink/streaming/connectors/kafka/KafkaConsumerTestBase.java#L1408
>
> I hope you can find the right code lines to copy for your purposes.
>
> Regards,
> Robert
>
>
> On Fri, Oct 21, 2016 at 4:00 PM, Niels Basjes <ni...@basjes.nl> wrote:
>
>
.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
at the consumer side.
See:
https://issues.apache.org/jira/browse/AVRO-1704
https://github.com/apache/avro/blob/master/lang/java/ipc/src/test/java/org/apache/avro/message/TestCustomSchemaStore.java
Niels Basjes
On Fri, Nov 11, 2016 at 3:05 PM, daviD <duno...@yahoo.com> wrote:
> Hi All,
>
> D
able to adapt).
>
> Cheers,
> Gordon
>
> [1] http://apache-flink-mailing-list-archive.1008284.
> n3.nabble.com/kafka-partition-assignment-td12123.html
>
> On January 6, 2017 at 1:38:05 AM, Niels Basjes (ni...@basjes.nl) wrote:
>
> Hi,
>
> In my scenario I have click
then you MUST make
sure you do the same with things like watermarks.
And if you want to have a watermark that is 5 seconds before the current
timestamp you must be sure to subtract 500 instead of 5000 from the
timestamp.
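A hedged sketch of the unit arithmetic above, assuming timestamps are compressed by a factor of 10 (that factor is my inference from the 500-vs-5000 numbers; adjust it to your actual scaling):

```java
// Hedged sketch of the watermark/timestamp unit pitfall described above.
// The compression factor of 10 is an assumption inferred from the text.
public class WatermarkScale {
    static final long SCALE = 10;              // hypothetical compression factor
    static final long REAL_LAG_MILLIS = 5000;  // "5 seconds behind" in real time

    static long scaledTimestamp(long epochMillis) {
        return epochMillis / SCALE;
    }

    // The lag must be expressed in the SAME scaled unit as the timestamps:
    // subtract 5000 / 10 = 500, not 5000.
    static long watermark(long scaledTs) {
        return scaledTs - (REAL_LAG_MILLIS / SCALE);
    }

    public static void main(String[] args) {
        long ts = scaledTimestamp(100_000L);  // -> 10000 in scaled units
        System.out.println(watermark(ts));    // prints 9500
    }
}
```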
Niels Basjes
On Mon, Dec 5, 2016 at 2:48 PM, jeff jacobson
produces a keyed data stream?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
vriendelijke groeten,
Niels Basjes
Hi Fabian,
On Fri, Jun 30, 2017 at 6:27 PM, Fabian Hueske wrote:
> If I understand your use case correctly, you'd like to hold back all
> events of a session until it ends/timesout and then write all events out.
> So, instead of aggregating per session (the common use case),
Since all records of a session are emitted by a single WindowFunction
> call, these records won't be interrupted by a barrier. Hence, you'll have a
> "consistent" state for all windows when a checkpoint is triggered.
>
> I'm afraid, I'm not aware of a simpler solution f
atleast every few seconds.
Simply implement a standard Java TimerTask and fire that using a Timer?
Or is there a better way of doing that in Flink?
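The plain-JDK `Timer`/`TimerTask` approach from the question above looks roughly like this. One caveat worth stating: such a timer thread runs outside Flink's checkpointing and threading model, so anything it touches from within a Flink operator must be properly synchronized.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of firing a periodic action with java.util.Timer, as asked above.
public class PeriodicFire {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch fired = new CountDownLatch(3);
        Timer timer = new Timer("periodic-fire", true); // daemon thread
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                fired.countDown(); // in a real job: emit/flush something here
            }
        }, 0L, 100L); // start immediately, fire every 100 ms (seconds in practice)
        boolean firedThreeTimes = fired.await(5, TimeUnit.SECONDS);
        timer.cancel();
        System.out.println("fired three times: " + firedThreeTimes);
    }
}
```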
--
Best regards / Met vriendelijke groeten,
Niels Basjes
gt; in many small hfiles, leading to more work for the compaction.
>
> FYI
>
> On Sat, Apr 29, 2017 at 7:32 AM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> Hi,
>>
>> I have a sink that writes my records into HBase.
>>
>> The data stream is a
Hi,
The company I work for switched about 2 years ago because of these reasons
AT THAT moment!
1) Storm doesn't run on Yarn
2) Storm doesn't support stateful processing components.
3) Storm has a bad Java api.
4) Storm is not fast enough.
Some of these things have changed over the last 2 years.
. Yes, storm does not support stateful
> processing components. So, I have to use something like Redis to store its
> state.
>
>
>
>
>
> At 2017-08-19 16:57:13, "Niels Basjes" <ni...@basj.es> wrote:
>
> Hi,
>
> The company I work for switched ab
report back when you have more info :-)
>
> – Ufuk
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-
> 1+%3A+Fine+Grained+Recovery+from+Task+Failures
>
> [2] https://issues.apache.org/jira/browse/FLINK-4256
>
> On Thu, Oct 12, 2017 at 10:17 AM, Niels Basjes
s.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Minor correction: The HBase jar files are on the classpath, just in a
different order.
On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes <ni...@basjes.nl> wrote:
> I did some more digging.
>
> I added extra code to print both the environment variables and the
> clas
.
Niels Basjes
On Tue, Oct 24, 2017 at 11:29 AM, Niels Basjes <ni...@basjes.nl> wrote:
> Minor correction: The HBase jar files are on the classpath, just in a
> different order.
>
> On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> I did
he HBase client (Jar, packaged into application)
and the HBase zookeeper settings (present on the machine where it is
started).
Niels Basjes
On Mon, Oct 23, 2017 at 10:23 AM, Piotr Nowojski <pi...@data-artisans.com>
wrote:
> Till do you have some idea what is going on? I do not see any mean
protected String mapResultToOutType(Result result) {
> return new
> String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
> }
>
> @Override
> protected Scan getScanner() {
> return new Scan();
> }
>
://github.com/nielsbasjes/FlinkHBaseConnectProblem
Niels Basjes
On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <pi...@data-artisans.com>
wrote:
> Is this /etc/hbase/conf/hbase-site.xml file is present on all of the
> machines? If yes, could you share your code?
>
> On 20
orkaround I currently put this extra line in my code which I know is
nasty but "works on my cluster"
hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
What am I doing wrong?
What is the right way to fix this?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
To facilitate you guys helping me I put this test project on github:
https://github.com/nielsbasjes/FlinkHBaseConnectProblem
Niels Basjes
On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <ni...@basjes.nl> wrote:
> Hi,
>
> I have a Flink 1.3.2 application that I want to run o
.
Niels Basjes
On Sun, Jul 29, 2018 at 9:25 AM, Congxian Qiu
wrote:
> Hi,
> Maybe the messages of the same key should be in the *same partition* of
> Kafka topic
>
> 2018-07-29 11:01 GMT+08:00 Hequn Cheng :
>
>> Hi harshvardhan,
>> If 1.the messages
I would drop it.
Niels Basjes
On Sat, 29 Sep 2018, 10:38 Kostas Kloudas,
wrote:
> +1 to drop it as nobody seems to be willing to maintain it and it also
> stands in the way for future developments in Flink.
>
> Cheers,
> Kostas
>
> > On Sep 29, 2018, at 8:19 AM, Tzu-Li
not been able to find anything yet.
Any pointers/hints/code fragments are welcome.
Thanks
--
Best regards / Met vriendelijke groeten,
Niels Basjes
d as the rowtime.
How do I do that?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
new RowTypeInfo(fieldTypes);
DataStream resultSet =
tableEnv.toAppendStream(resultTable, tupleType);
Which gives me the desired DataStream.
Niels Basjes
On Wed, Aug 14, 2019 at 5:13 PM Timo Walther wrote:
> Hi Niels,
>
> if you are coming from DataStream API, all y
t time property (and thereby the
> TimeIndicatorTypeInfo) and you wouldn't know to fiddle with the output
> types.
>
> Best, Fabian
>
> Am Mi., 21. Aug. 2019 um 10:51 Uhr schrieb Niels Basjes :
>
>> Hi,
>>
>> It has taken me quite a bit of time to figure this out.
>
ntioned
examples the correct serialization classes when running.
So what is happening here?
Did I forget to do a required call?
So is this a bug?
Is the provided serialization via TypeInformation 'skipped' during
startup and only used during runtime?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
aware of these confusions and the Table
> & SQL API will hopefully not use the TypeExtractor anymore in 1.10. This
> is what I am working on at the moment.
>
> Regards,
> Timo
>
> [0]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/types_serialization.ht
,
Niels Basjes
ive I came up with is to write the output of my batch to a
file and then load that (with a stream) into ES.
What is the proper solution?
Is there an OutputFormat for ES I can use that I overlooked?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Thanks.
On Sat, Feb 29, 2020 at 4:20 PM Yuval Itzchakov wrote:
>
> Unfortunately, it isn't possible. You can't set names to steps like
> ordinary Java/Scala functions.
>
> On Sat, 29 Feb 2020, 17:11 Niels Basjes, wrote:
>
>> Hi,
>>
>> I'm playing
Hi,
I'm running a lot of batch jobs on Kubernetes; once in a while I get this
exception.
What is causing this?
How can I fix this?
Niels Basjes
java.lang.OutOfMemoryError: Metaspace
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java
now ALL jobs in this Flink cluster have the same
credentials.
Is there a way to set the S3 credentials on a per job or even per
connection basis?
Niels Basjes
On Fri, Feb 28, 2020 at 4:38 AM Yang Wang wrote:
> Hi Niels,
>
> Glad to hear that you are trying Flink native K8s integration and
[default]
access_key = myAccessKey
secret_key = mySecretKey
host_base = s3.example.nl
*I'm stuck, please help:*
- What is causing the differences in behaviour between local and in k8s?
It works locally but not in the cluster.
- How do I figure out what network it is trying to reach in k8s?
Thanks.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
>> "-XX:MaxMetaspaceSize"
>> by default. The default value is 96m, loading too many classes will cause
>> "OutOfMemoryError: Metaspace"[1]. So you need to increase the configured
>> value.
>>
>>
>> [1].
>> https://ci.apache.org/projects/
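Following the advice above about raising the metaspace limit, the corresponding `flink-conf.yaml` entries look roughly like this (hedged example; these option names exist from Flink 1.10/1.11 onward, so verify them against the configuration reference of your version):

```yaml
# flink-conf.yaml -- raise the JVM metaspace limits above the 96m default
taskmanager.memory.jvm-metaspace.size: 256m
jobmanager.memory.jvm-metaspace.size: 256m
```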
finitionDataStream(TestUserAgentAnalysisMapperInline.java:144)
Did I do something wrong?
Is this a bug in the DataStreamUtils ?
Niels Basjes
On Mon, Feb 17, 2020 at 8:56 AM Tzu-Li Tai wrote:
> Hi,
>
> To collect the elements of a DataStream (usually only meant for testing
>
gentAnalysisMapperInline class is doing some magic
> that breaks with the StreamGraphGenerator?
>
> Best,
> Robert
>
> On Tue, Feb 18, 2020 at 9:59 AM Niels Basjes wrote:
>
>> Hi Gordon,
>>
>> Thanks. This works for me.
>>
>> I find it strange tha
defined" error.
> However, if you have collect(), print(), execute(), then the print() is
> filling the stream graph again, and you are executing two Flink jobs: the
> collect job and the execute job.
>
> I hope I got it right this time :)
>
> Best,
> Robert
>
> On Fri
/ Met vriendelijke groeten,
Niels Basjes
ache.org, and to add the
> exercises to flink-playgrounds -- but these points can be discussed
> separately once we've established that the community wants this content.
>
> Looking forward to hearing what you think!
>
> Best regards,
> David
>
--
Best regards / Met vriendelijke groeten,
Niels Basjes
iendelijke groeten,
Niels Basjes
b.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/socket/SocketWindowWordCount.scala
>
> Regards,
> Vijay
>
> On Fri, Jul 31, 2020 at 12:22 PM Niels Basjes wrote:
>
>> Does this test in one
logic under
>>> test in the middle. That may be a part of your pipeline or even the whole
>>> pipeline.
>>>
>>> If you want to have some scala inspiration, have a look at:
>>>
>>> https://github.com/apache/flink/blob/5f0183fe79d10ac36101f60f2589062a39630f96/flin
Does this test in one of my own projects do what you are looking for?
https://github.com/nielsbasjes/yauaa/blob/1e1ceb85c507134614186e3e60952112a2daabff/udfs/flink/src/test/java/nl/basjes/parse/useragent/flink/TestUserAgentAnalysisMapperClass.java#L107
On Fri, 31 Jul 2020, 20:20 Vijayendra
)
Why is that?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
ing-list-archive.2336050.n4.nabble.com/
>
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Have a look at this presentation I gave a few weeks ago.
https://youtu.be/bQmz7JOmE_4
Niels Basjes
On Wed, 22 Jul 2020, 08:51 bat man, wrote:
> Hi Team,
>
> Can someone share their experiences handling this.
>
> Thanks.
>
> On Tue, Jul 21, 2020 at 11:30 AM bat man wrote
gt; made this release possible!
>
> Regards,
> Dian & Robert
>
>
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Hi,
I haven't tried it myself yet but there is a Flink connector for HBase and
I remember someone telling me that Google has made a library available
which is effectively the HBase client which talks to BigTable in the
backend.
Like I said: I haven't tried this yet myself.
Niels Basjes
Op zo
g something wrong regarding the mentioned "generic
raw type" and the way I'm trying to define the Schema.
What I essentially am looking for is the correct way to give the 3
provided columns a new name and type.
How do I do this correctly in the new API?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
mTableEnvironment#fromDataStream [1]
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.14/api/java/org/apache/flink/table/api/bridge/java/StreamTableEnvironment.html#fromDataStream-org.apache.flink.streaming.api.datastream.DataStream-org.apache.flink.table.api.Schema-
>
> Niels Basjes 于2021年1