Hi Gyula,
Thanks for your response. It seems I will use filter and map for now, as that
really makes the intention clear and is not a big performance hit.
Thanks again.
Cheers
On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra wrote:
> Hey Welly,
>
> If you call filter and map one after the other li
Hey Welly,
If you call filter and map one after the other like you mentioned, these
operators will be chained and executed as if they were running in the same
operator.
The only small performance overhead comes from the fact that the output of
the filter will be copied before passing it as input t
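For illustration, a minimal, self-contained sketch against the DataStream API
(the class name and sample records are made up): the two variants below are
functionally equivalent, and with chaining the filter/map variant runs in a
single task just like the flatMap variant.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FilterMapVsFlatMap {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> lines = env.fromElements("keep:1", "drop:2", "keep:3");

        // Variant 1: filter followed by map. The two operators are chained and
        // run in the same task; the only extra cost is the copy of each record
        // that passes the filter before it is handed to the map.
        DataStream<String> viaFilterMap = lines
                .filter(s -> s.startsWith("keep"))
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String s) {
                        return s.toUpperCase();
                    }
                });

        // Variant 2: the same logic fused into a single flatMap.
        DataStream<String> viaFlatMap = lines
                .flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public void flatMap(String s, Collector<String> out) {
                        if (s.startsWith("keep")) {
                            out.collect(s.toUpperCase());
                        }
                    }
                });

        viaFilterMap.print();
        viaFlatMap.print();
        env.execute("filter+map vs flatMap");
    }
}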
Hi All,
I would like to filter some items from the event stream. I think there are
two ways of doing this: using the regular pipeline filter(...).map(...), or
using flatMap to do both in the same operator.
Is there any performance improvement if we use flatMap? As that will be done
in one o
When workers spill more than 128 files, I have seen these fully merged into
one or more much larger files. Does the following parameter allow more
files to be stored without requiring the intermediate merge-sort? I have
changed it to 1024 without effect. Also, it appears that the entire set of
smal
OK. As far as I can see, only the method name was changed. It was an
unnecessary modification that caused the incompatibility.
F.
On Wed, Sep 2, 2015 at 8:53 PM, Márton Balassi
wrote:
> Dear Ferenc,
>
> The Kafka consumer implementation was modified from 0.9.0 to 0.9.1,
> please use the new code. [1]
>
>
Dear Ferenc,
The Kafka consumer implementation was modified from 0.9.0 to 0.9.1, please
use the new code. [1]
I suspect that your com.nventdata.kafkaflink.sink.FlinkKafkaTopicWriterSink
depends on the way the Flink code used to look in 0.9.0; if you take a
closer look, Robert changed the function
Hi,
I tried to use the latest 0.9.1 release but I got:
java.lang.NoSuchMethodError:
org.apache.flink.util.NetUtils.ensureCorrectHostnamePort(Ljava/lang/String;)V
at
com.nventdata.kafkaflink.sink.FlinkKafkaTopicWriterSink.<init>(FlinkKafkaTopicWriterSink.java:69)
at
com.nventdata.kafkaflink.sink.FlinkKa
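As an aside, one way to avoid this class of problem is to not depend on
Flink-internal utilities from user code at all. A minimal, made-up sketch of
parsing a "host:port" string directly in the sink's own code (the helper name
and checks are illustrative, not part of any Flink API):

import java.net.InetSocketAddress;

public final class HostPortParser {

    private HostPortParser() {
    }

    // Parses "host:port" without relying on Flink-internal utilities, so the
    // user code does not break when those internals change between releases.
    public static InetSocketAddress parse(String hostnamePort) {
        int colon = hostnamePort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException(
                    "Expected \"host:port\" but got: " + hostnamePort);
        }
        String host = hostnamePort.substring(0, colon);
        int port = Integer.parseInt(hostnamePort.substring(colon + 1));
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return new InetSocketAddress(host, port);
    }
}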
Hi Robert and Jay,
Thanks for your answers. The petstore jobs could indeed be used as a Rosetta
Code for Flink and Spark.
Regarding the memory requirements, that is very good news to me. Just 2 GB
of RAM is certainly a modest amount of memory; you could even use some
single-board computers for that
Hi Kostas,
Thanks a lot for your answer. It's nice to know there are more training
videos on their way, they will be on my watch list. I guess you'll be using
the data Artisans channel for the new videos too.
Greetings,
Juan
2015-09-02 14:30 GMT+02:00 Kostas Tzoumas :
> Hi Juan,
>
> Flink is
Thanks for clarifying. shuffle() is similar to rebalance(); however,
it redistributes randomly and not in a round-robin fashion.
The problem you describe sounds like a bug to me, though. I have included
the dev list. Maybe someone else can step in so we can determine whether
there is a bug or not.
-Matthias
O
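For reference, a minimal sketch of the difference (sample data and parallelism
chosen arbitrarily): rebalance() distributes records round-robin over the
downstream tasks, while shuffle() picks a random downstream task for each
record.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RebalanceVsShuffle {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Integer> numbers = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8);

        // Round-robin redistribution: each downstream printer gets every n-th record.
        numbers.rebalance().print().setParallelism(4);

        // Random redistribution: each record goes to a randomly picked printer.
        numbers.shuffle().print().setParallelism(4);

        env.execute("rebalance vs shuffle sketch");
    }
}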
Hello,
When I try to run a storm topology with a Kafka Spout on top of Flink, I
get an NPE at:
15:00:32,853 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask
- Error closing stream operators after an exception.
java.lang.NullPointerException
at storm.kafka.KafkaSpout.close(KafkaSp
I'm not sure what you mean by "Crucial cleanup is in shutdown hooks". Could
you elaborate?
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Wed, Sep 2, 2015 at 10:25 PM, Stephan Ewen wrote:
> You can always shut down a cluster manually (via shutdown()) and if the
> JVM simply exi
You can always shut down a cluster manually (via shutdown()), and if the JVM
simply exits, all is well too. Crucial cleanup is done in shutdown hooks.
On Wed, Sep 2, 2015 at 6:22 PM, Till Rohrmann
wrote:
> If I'm not mistaken, then the cluster should be properly terminated when
> it gets garbage
If I'm not mistaken, then the cluster should be properly terminated when it
gets garbage collected, and thus also when the main method exits.
On Wed, Sep 2, 2015 at 6:14 PM, Sachin Goel
wrote:
> If I'm right, all Tests use either the MultipleProgramTestBase or
> JavaProgramTestBase. Those shut dow
Hi,
You are right, but in fact it does not solve my problem, since I have
parallelism 100 everywhere. Each of my 100 sources emits only a few lines (say
14 at most), and only the first 14 downstream nodes will receive data.
The same problem occurs when replacing rebalance() with shuffle().
But I found a workaround: s
If I'm right, all tests use either the MultipleProgramTestBase or
JavaProgramTestBase. Those shut down the cluster explicitly anyway.
I will check whether this is the case.
Regards
Sachin
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Wed, Sep 2, 2015 at 9:40 PM, Till Rohrmann
Okay. No problem.
Any suggestions for the correct context though? :')
I don't think something like a {{FlinkProgram}} class is a good idea [the user
would need to override a {{program}} method, and we would make sure the
cluster is set up only once and torn down properly only after the user code
finishes
Maybe we can create a single PlanExecutor for the LocalEnvironment which is
used when calling execute. This of course entails that we don’t call stop
on the LocalCluster. For cases where the program exits after calling
execute, this should be fine because all resources will then be released
anyway.
Oh sorry, then I got the wrong context. I somehow thought it was about test
cases because I read `MultipleProgramTestBase` etc. Sorry my bad.
On Wed, Sep 2, 2015 at 6:00 PM, Sachin Goel
wrote:
> I was under the impression that the @AfterClass annotation can only be
> used in test classes.
> Even
I was under the impression that the @AfterClass annotation can only be used
in test classes.
Even so, the idea is that a user program running in the IDE should not be
starting up the cluster several times [my primary concern is the addition
of the persist operator], and we certainly cannot ask the
Hi,
If I understand you correctly, you want to have 100 mappers. Thus, you
need to apply .setParallelism() after .map():
> addSource(myFileSource).rebalance().map(myFileMapper).setParallelism(100)
The order of calls you used sets the DOP for the source to 100 (which
might be ignored, if the
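A minimal, runnable sketch of this placement rule (the sample elements stand
in for the real file-name source, and the mapper body stands in for the heavy
per-file work):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismPlacement {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for a source that emits only a handful of file names.
        DataStream<String> fileNames = env.fromElements("part-00", "part-01", "part-02");

        // setParallelism() configures the operator it directly follows, so to
        // get 100 parallel mappers it has to come after map(), not after the
        // source or the rebalance().
        fileNames
                .rebalance()
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String fileName) {
                        // Placeholder for the heavy per-file work.
                        return "processed " + fileName;
                    }
                })
                .setParallelism(100)
                .print();

        env.execute("parallelism placement sketch");
    }
}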
Why is it not possible to shut down the local cluster? Can’t you shut it
down in the @AfterClass method?
On Wed, Sep 2, 2015 at 4:56 PM, Sachin Goel
wrote:
> Yes. That will work too. However, then it isn't possible to shut down the
> local cluster. [Is it necessary to do so or does it shut dow
Hi,
I have a source that provides only a few items, since it gives file names to the
mappers. The mapper opens the file and processes its records. As the files are
huge, one input line (a filename) results in substantial work for the next stage.
My topology looks like:
addSource(myFileSource).rebalance().setParal
Yes. That will work too. However, then it isn't possible to shut down the
local cluster. [Is it necessary to do so or does it shut down automatically
when the program exits? I'm not entirely sure.]
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Wed, Sep 2, 2015 at 7:59 PM, Steph
Totally, it is the first time I am using them and I thought they were the same.
I will try it ASAP.
Thanks
On 2 Sep 2015, at 16:41, Till Rohrmann <trohrm...@apache.org> wrote:
Can it be that you confuse the logback configuration file with the log4j
configuration file
Can it be that you are confusing the logback configuration file with the log4j
configuration file? The log4j.properties file should look like:
log4j.rootLogger=INFO, console
# Log all infos to the console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
Have a look at some other tests, like the checkpointing tests. They start
one cluster manually and keep it running. They connect against it using the
remote environment ("localhost", miniCluster.getJobManagerRpcPort()).
That works nicely...
On Wed, Sep 2, 2015 at 4:23 PM, Sachin Goel
wrote:
> H
Thanks for answering.
Yes, I did that; in fact it is working for my logger, but I don't know how to
limit the Flink logger using the log4j properties.
If I use this line, as suggested by the docs, I get an error in the log4j
properties file: the level should not be in that position.
Instead, doing som
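In case it helps, a minimal log4j.properties sketch of the usual way to do
this with plain log4j 1.x: keep your own logging at INFO and add a per-package
entry that raises the threshold for Flink's packages only.

log4j.rootLogger=INFO, console

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n

# Raise the level for Flink's internal loggers only
log4j.logger.org.apache.flink=WARN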
I see.
Manual serialization also implies manual deserialization (on the workers
only), which would give a better exception.
BTW: there is an opportunity to fix two problems with one patch: the
framesize overflow for the input format, and the serialization.
On Wed, Sep 2, 2015 at 4:16 PM, Maximil
Hmmm, interesting, it looks to be working magically now... :) I must have
written some code late at night that magically fixed it and forgot. The
original errors I was getting were Kryo-related.
The objects aren't being serialized on write to anything useful, but that's
an easy fix, I'm sure.
Onward and
Hi all
While using LocalEnvironment, if the program triggers execution several
times, the {{LocalFlinkMiniCluster}} is started just as many times. This
can consume a lot of time in setting up and tearing down the cluster.
Further, this interferes with a new functionality I'm working on based on
persis
OK, but that would not prevent the above error, right? Serializing is
not the issue here.
Nevertheless, it would catch all errors during initial serialization.
Deserializing has its own hazards due to possible classloader issues.
On Wed, Sep 2, 2015 at 4:05 PM, Stephan Ewen wrote:
> Yes, even ser
Hi Michele,
Please supply a log4j.properties file path as a Java VM property like
so: -Dlog4j.configuration=/path/to/log4j.properties
Your IDE should have an option to adjust VM arguments.
Cheers,
Max
On Wed, Sep 2, 2015 at 9:10 AM, Michele Bertoni
wrote:
> Hi everybody, I just found that in v
Yes, even serialize in the constructor. Then the failure (if serialization
does not work) comes immediately.
On Wed, Sep 2, 2015 at 4:02 PM, Maximilian Michels wrote:
> Nice suggestion. So you want to serialize and deserialize the InputFormats
> on the Client to check whether they can be transfe
Nice suggestion. So you want to serialize and deserialize the InputFormats
on the Client to check whether they can be transferred correctly? Merely
serializing is not enough because the above Exception occurs during
deserialization.
On Wed, Sep 2, 2015 at 2:29 PM, Stephan Ewen wrote:
> We should
If a lot of the data is generated locally, this may run into the same issue
Greg had with oversized payloads (dropped by Akka).
On Wed, Sep 2, 2015 at 3:21 PM, Robert Metzger wrote:
> I'm starting a new discussion thread for the bigpetstore-flink integration
> ...
>
>
> I took a closer look into
Hey, thanks!
Those are just seeds, the files aren't large.
The scale out data is the transactions.
The seed data needs to be the same, shipped to ALL nodes, and then
the nodes generate transactions.
On Wed, Sep 2, 2015 at 9:21 AM, Robert Metzger wrote:
> I'm starting a new discussion thread
Okay, I see.
As I said before, I was not able to reproduce the serialization issue
you've reported.
Can you maybe post the exception you are seeing?
On Wed, Sep 2, 2015 at 3:32 PM, jay vyas
wrote:
> Hey, thanks!
>
> Those are just seeds, the files aren't large.
>
> The scale out data is the tra
I'm starting a new discussion thread for the bigpetstore-flink integration
...
I took a closer look into the code you've posted.
It seems to me that you are generating a lot of data locally on the client,
before you actually submit a job to Flink. (Both "customers" and "stores"
are generated loca
@Jay: I've looked into your code, but I was not able to reproduce the
issue.
I'll start a new discussion thread on the user@flink list for the
Flink-BigPetStore discussion. I don't want to take over Juan's
hardware-requirements discussion ;)
On Wed, Sep 2, 2015 at 3:01 PM, Jay Vyas
wrote:
> Jus
Just running the main class is sufficient
> On Sep 2, 2015, at 8:59 AM, Robert Metzger wrote:
>
> Hey jay,
>
> How can I reproduce the error?
>
>> On Wed, Sep 2, 2015 at 2:56 PM, jay vyas wrote:
>> We're also working on a bigpetstore implementation of flink which will help
>> onboard spark/m
Hey jay,
How can I reproduce the error?
On Wed, Sep 2, 2015 at 2:56 PM, jay vyas
wrote:
> We're also working on a bigpetstore implementation of flink which will
> help onboard spark/mapreduce folks.
>
> I have prototypical code here that runs a simple job in memory,
> contributions welcome,
>
>
We're also working on a BigPetStore implementation for Flink which will help
onboard Spark/MapReduce folks.
I have prototypical code here that runs a simple job in memory; contributions
are welcome. Right now there is a serialization error:
https://github.com/bigpetstore/bigpetstore-flink .
On Wed, Se
I have actually run Flink nodes with 50 MB of memory and processed multiple
gigabytes, but that is truly a toy setup for experimentation.
As Robert said, a Mini-Cluster with two local workers (each around 300-400
MB memory) plus a master node (200-300 MB) gives you 1 GB of total needed
memory and
Hi Juan,
I think the recommendations in the Spark guide are quite good, and are
similar to what I would recommend for Flink as well.
Depending on the workloads you are interested in running, you can certainly use
Flink with less than 8 GB per machine. I think you can start Flink
TaskManagers with 500
Hi Juan,
Flink is quite nimble regarding hardware requirements; people have run it on
old-ish laptops as well as on the largest instances available from cloud
providers. I will let others chime in with more details.
I am not aware of something along the lines of a cheatsheet that you
mention. If you actually
We should try to improve the exception here. More people will run into this
issue and the exception should help them understand it well.
How about we do eager serialization into a set of byte arrays? Then the
serializability issue comes immediately when the program is constructed,
rather than late
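A rough sketch of that idea (not the actual Flink change, just an illustration
with made-up names): round-trip the object through Java serialization as soon
as it is handed over, so a non-serializable user function or input format
fails at program construction time.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class EagerSerializationCheck {

    private EagerSerializationCheck() {
    }

    // Serializes and immediately deserializes the given object. If either step
    // fails, the user gets the error at program-construction time instead of a
    // cryptic failure during job submission or on the workers.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(object);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalArgumentException(
                    "Object cannot be serialized for shipping to the cluster: " + object, e);
        }
    }
}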
Answering myself: I have found some nice training material at
http://dataartisans.github.io/flink-training. There are even YouTube videos
for some of the slides:
- http://dataartisans.github.io/flink-training/overview/intro.html
https://www.youtube.com/watch?v=XgC6c4Wiqvs
- http://da
Here's the JIRA issue: https://issues.apache.org/jira/browse/FLINK-2608
On Wed, Sep 2, 2015 at 12:49 PM, Maximilian Michels wrote:
> Hi Andreas,
>
> Thank you for reporting the problem and including the code to reproduce
> the problem. I think there is a problem with the class serialization or
>
Hi list,
I'm new to Flink, and I find this project very interesting. I have
experience with Apache Spark, and from what I've seen so far I find that Flink
provides an API at a similar abstraction level but based on single-record
processing instead of batch processing. I've read on Quora that Flink
exten
Hi Andreas,
Thank you for reporting the problem and including the code to reproduce it.
I think there is a problem with the class serialization or deserialization.
Arrays.asList uses a private ArrayList class (java.util.Arrays$ArrayList)
which is not the one you would normally use
(java.u
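A hedged workaround sketch, assuming the private Arrays$ArrayList class is
indeed the culprit: copy the list into a plain java.util.ArrayList before
handing it to Flink (written against a recent Flink version, where print()
triggers execution).

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BroadcastListWorkaround {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // new ArrayList<>(...) copies the elements out of the private
        // java.util.Arrays$ArrayList into the regular java.util.ArrayList,
        // which serializes and deserializes without surprises.
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3));

        DataSet<Integer> data = env.fromCollection(numbers);
        data.print();
    }
}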
Hi Sachin,
I also noticed that the GitHub integration is not working properly. I'll
ask the Apache Infra team.
On Wed, Sep 2, 2015 at 10:20 AM, Sachin Goel
wrote:
> Hi all
> Is there some issue with travis integration? The last three pull requests
> do not have their build status on Github page
Hi,
I run into a bug when trying to broadcast a list of integers created with
"Arrays.asList(...)".
For example, if you try to run this "wordcount" example, you can
reproduce the bug:
public class WordCountExample {
public static void main(String[] args) throws Exception {
Hi,
StormFileSpout uses a simple FileReader internally and cannot deal with
HDFS. It would be a nice extension to have. I just opened a JIRA for it:
https://issues.apache.org/jira/browse/FLINK-2606
Jerry, feel free to work on this feature and contribute code to Flink ;)
-Matthias
On 09/02/2015 07:52 A
Hi all
Is there some issue with the Travis integration? The last three pull requests
do not have their build status on the GitHub page. The builds are getting
triggered, though.
Regards
Sachin
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
Hi everybody, I just found that in version 0.9.1 it is possible to disable
that verbose console output. Can you please explain how to do it, both in the
IDE and in the local environment?
Especially in the IDE, I am able to set the log4j properties for my own logger,
but everything I try for the Flink-internal one does not work