That's not the point. In machine learning, one often divides a data set X
into, for example, three sets: one for training, one for validation, and one for
the final testing. The sets are usually created randomly according to some
ratio. Thus it would be important to keep the ratio and to do the whole
Hi Hawin!
If you are creating code for such an output into different
files/partitions, it would be amazing if you could contribute this code to
Flink.
It seems like a very common use case, so this functionality will be useful
to other users as well!
Greetings,
Stephan
On Tue, Jun 23, 2015 at
A very simple way to achieve this is to generate a random variate on the
driver that describes a mapping of datapoints to samples. Then you
simply join the dataset with this mapping to generate the samples.
This approach requires you to know the size of the dataset in advance,
but has the
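
The mapping-and-join idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not Flink code: the helper names make_assignment and split_by_join, the sample names, and the 60/20/20 ratio are all hypothetical, and the data points are assumed to already carry an index from 0 to n-1.

```python
import random

def make_assignment(n, ratios, seed=0):
    """Build, on the 'driver', a shuffled list mapping each index to a sample name.

    The per-sample counts are fixed up front, so the ratio is kept exactly
    (up to rounding), which is why n must be known in advance.
    """
    names = list(ratios)
    counts = [round(n * ratios[name]) for name in names]
    # Put any rounding remainder into the first sample so the counts sum to n.
    counts[0] += n - sum(counts)
    labels = [name for name, count in zip(names, counts) for _ in range(count)]
    random.Random(seed).shuffle(labels)
    return labels

def split_by_join(indexed_points, assignment):
    """Join (index, point) pairs with the index -> sample mapping."""
    samples = {}
    for idx, point in indexed_points:
        samples.setdefault(assignment[idx], []).append(point)
    return samples

# Hypothetical data set of 100 points, indexed 0..99.
data = list(enumerate(range(100)))
assignment = make_assignment(len(data), {"train": 0.6, "validation": 0.2, "test": 0.2})
samples = split_by_join(data, assignment)
```

In a distributed setting the second step would be an actual join between the data set and the mapping; the dictionary lookup here only stands in for that.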
Hey Aaron,
thanks for preparing the example. I've checked it out and tried it with a
similar setup (12 task managers with 1 slot each, running the job with
parallelism of 12).
I couldn't reproduce the problem. What have you configured in the slaves
file? I think Flink does not allow you to
Hi Max,
Thanks for noticing! Fixed on the master and for the 0.9.1 release.
Cheers,
Max
On Tue, Jun 23, 2015 at 5:09 PM, Maximilian Alber
alber.maximil...@gmail.com wrote:
Hi Flinksters,
just some minor:
http://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html
in the
The Apache Flink community is pleased to announce the availability of the
0.9.0 release.
Apache Flink is an open source platform for scalable batch and stream data
processing. Flink's core consists of a streaming dataflow engine that
provides data distribution, communication, and fault tolerance
Thanks. My setup is actually 3 task managers x 4 slots. I played with the
parallelism and found that at low values, the error did not occur. I can
only conclude that there is some form of data shuffling that is occurring
that is sensitive to the data source. Yes, seems a little odd to me as
Hi to all,
I'm facing an OutOfMemoryError: PermGen space when running my job multiple
times from the web client interface.
Where do I need to increase it?
The full stacktrace is:
org.apache.flink.client.program.ProgramInvocationException: The program's
entry point class '' caused an exception
Hi everybody,
this question may sound stupid, but I would like to be clear about
what happens if inside a dataset transformation (e.g. a map) I use something
that is declared somewhere else, like a variable or a dataset, and not passed
as broadcast dataset nor parameter in the constructor of a
ok thanks Matthias
On 24 Jun 2015 21:00, Matthias J. Sax mj...@informatik.hu-berlin.de
wrote:
Hi,
you need to increase the JVM parameter -XX:MaxPermSize=
The default value should be something like 64m.
Just add the flag to the JVM_ARGS variable in bin/webclient.sh (line 33).
- Compare
Aaron,
Can you check how the TaskManagers register at the JobManager? When you
look at the 'TaskManagers' section in the JobManager's web interface (at
port 8081), what does it say as the TaskManager host names?
Does it list host1, host2, host3...?
Thanks,
Stephan
Written on 24.06.2015 at 20:31 by