Hi Flinksters,
we recently had a discussion in our working group about which language we should
use with Flink. To bring it to the point: most people would like to use
Python because they are familiar with it and there is a nice scientific
stack to e.g. print and analyse the results. But our concern is
You can also do myDir.getFileSystem().exists(myDir), but I don't think
there is a shorter way...
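For readers outside a Flink job, the same existence check in plain Java (a java.nio analogy, not Flink's own FileSystem API; the class and method names here are made up for illustration):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DirExists {
    // Plain-Java analogue of Flink's fs.exists(path): true iff the
    // directory is present on the local file system.
    static boolean dirExists(String dir) {
        Path p = Paths.get(dir);
        return Files.exists(p) && Files.isDirectory(p);
    }

    public static void main(String[] args) {
        // The JVM temp dir always exists, so this prints "true".
        System.out.println(dirExists(System.getProperty("java.io.tmpdir")));
    }
}
```

Inside a Flink job, `myDir.getFileSystem().exists(myDir)` plays the same role against whichever file system (local, HDFS, ...) backs the path.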
On Mon, Jun 29, 2015 at 12:39 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Hi to all,
in my job I have to check if a directory exists and currently I have to
write:
Path myDir = new
It is not quite easy to understand what you are trying to do.
Can you post your program here? Then we can take a look and give you a good
answer...
On Mon, Jun 29, 2015 at 3:47 PM, Juan Fumero
juan.jose.fumero.alfo...@oracle.com wrote:
Is there any other way to apply the function in parallel
Hi Stephan,
so should I use another method instead of collect? It seems
multithreading is not working with this.
Juan
On Mon, 2015-06-29 at 14:51 +0200, Stephan Ewen wrote:
Hi Juan!
This is an artifact of a workaround right now. The actual collect()
logic happens in the flatMap() and
Hi Juan!
This is an artifact of a workaround right now. The actual collect() logic
happens in the flatMap() and the sink is a dummy that executes nothing. The
flatMap writes the data to be collected to the accumulator that delivers
it back.
Greetings,
Stephan
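Conceptually, the workaround Stephan describes behaves like parallel tasks writing into a shared accumulator that the client reads afterwards. A plain-Java analogy (not Flink's actual implementation; the class name and values are made up):

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

public class AccumulatorAnalogy {
    public static void main(String[] args) {
        // Parallel "flatMap" tasks write their results into a shared accumulator...
        ConcurrentLinkedQueue<Integer> accumulator = new ConcurrentLinkedQueue<>();
        IntStream.range(0, 4).parallel().forEach(i -> accumulator.add(i * i));
        // ...and the client reads the accumulated values back afterwards.
        List<Integer> result = accumulator.stream().sorted().toList();
        System.out.println(result); // [0, 1, 4, 9]
    }
}
```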
On Mon, Jun 29, 2015 at 2:30 PM,
In general, avoid collect if you can. Collect brings data to the client,
where the computation is not parallel any more.
Try to do as much on the DataSet as possible.
On Mon, Jun 29, 2015 at 2:58 PM, Juan Fumero
juan.jose.fumero.alfo...@oracle.com wrote:
Hi Stephan,
so should I use
Is there any other way to apply the function in parallel and return the
result to the client in parallel?
Thanks
Juan
On Mon, 2015-06-29 at 15:01 +0200, Stephan Ewen wrote:
In general, avoid collect if you can. Collect brings data to the
client, where the computation is not parallel any
Yeah, sorry. I would like to do something simple like this, but using
Java Threads.
DataSet<Tuple2<Integer, Integer>> input = env.fromCollection(in);
DataSet<Integer> output = input.map(new HighWorkLoad());
ArrayList<Integer> result = output.consume(); // ? like collect but in
parallel, some operation
@Stephan: I don't think there is a way to deal with this. In my
understanding, the (main) purpose of the user@ list is not to report Flink
bugs. It is a forum for users to help each other.
Flink committers happen to know a lot about the system, so it's easy for
them to help users. Also, it's a good
Hi to all,
I'm restarting the discussion about a problem I already discussed on this
mailing list (but that started with a different subject).
I'm running Flink 0.9.0 on CDH 5.1.3 so I compiled the sources as:
mvn clean install -Dhadoop.version=2.3.0-cdh5.1.3
-Dhbase.version=0.98.1-cdh5.1.3
Hi,
I am starting with Flink. I have tried to look for the documentation
but I haven't found it clear.
I wonder about the difference between these two states:
FlatMap RUNNING vs DataSink RUNNING.
Is FlatMap doing any data transformation? Compilation? At which
point is it actually executing the
Hi Flavio!
I had a look at the logs. There seems nothing suspicious - at some point,
the TaskManager and JobManager declare each other unreachable.
A pretty common cause for that is that the JVMs stall for a long time due
to garbage collection. The JobManager cannot see the difference between a
I think that actually there's an Exception thrown within the code that I
suspect is not reported anywhere.. could it be?
On Mon, Jun 29, 2015 at 3:28 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Which file and which JVM options do I have to modify to try options 1 and
3..?
1. Don't
thanks both for answering,
that's what I expected.
I was using join at first but sadly I had to move from join to cogroup because
I need outer join.
The alternative to the cogroup is to "complete" the inner join by extracting from
the original dataset what did not match in the cogroup by
ok thanks!
then for now I will use it until true outer join is ready
On 29 Jun 2015, at 18:22, Fabian Hueske
fhue...@gmail.com wrote:
Yes, if you need outer join semantics you have to go with CoGroup.
Some members of the Flink community are working on
Hi Flavio!
Can you post the JobManager's log here? It should have the message about
what is going wrong...
Stephan
On Mon, Jun 29, 2015 at 11:43 AM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Hi to all,
I'm restarting the discussion about a problem I already discussed on this
mailing
Something similar in flink-0.10-SNAPSHOT:
06/29/2015 10:33:46 CHAIN Join(Join at main(TriangleCount.java:79)) -
Combine (Reduce at main(TriangleCount.java:79))(222/224) switched to FAILED
java.lang.Exception: The slot in which the task was executed has been
released. Probably loss of
Hi Hawin!
The performance tuning of Kafka is much trickier than that of Flink. Your
performance bottleneck may be Kafka at this point, not Flink.
To make Kafka fast, make sure you have the right setup for the data
directories, and you set up zookeeper properly (for good throughput).
To test the
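A sketch of the kind of broker settings meant by "the right setup for the data directories" (the property names are real Kafka broker options; the values are purely illustrative, not recommendations):

```properties
# server.properties (Kafka broker) -- illustrative values only
# Spread log directories across physical disks for throughput:
log.dirs=/disk1/kafka-logs,/disk2/kafka-logs
num.io.threads=8
num.network.threads=4
```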
Thank you for the answer Robert!
I realize it's a single JVM running, yet I would expect programs to behave
in the same way, i.e. serialization to happen (even if not necessary), in
order to catch this kind of bug before cluster deployment.
Is this simply not possible or is it a design choice we
Hi, I was trying to run my program in the Flink web environment (the local one).
When I run it I get the graph of the planned execution, but in each node there
is a parallelism = 1, instead I think it runs with par = 8 (8 cores, I always
get 8 outputs).
What does that mean?
Is that wrong or is it