I think you can work on it. By the way, there are actually two serializers.
For Scala, CaseClassSerializer is responsible for tuples as well. In Java,
TupleSerializer is responsible for, well, Tuples.
On Tue, 16 Jun 2015 at 06:25 Shiti Saxena wrote:
> Hi,
>
> Can I work on the issue with TupleSe
Hi,
Can I work on the issue with TupleSerializer or is someone working on it?
On Mon, Jun 15, 2015 at 11:20 AM, Aljoscha Krettek
wrote:
> Hi,
> the reason why this doesn't work is that the TupleSerializer cannot deal
> with null values:
>
> @Test
> def testAggregationWithNull(): Unit = {
>
> v
Cross is a quadratic operation. As such, it produces very large results even
on moderate inputs, which can easily exceed memory and disk space if the
subsequent operation requires gathering all data (such as the sort in your
case).
If you use 10 MB of 100-byte elements on both inputs (100K element
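The arithmetic behind that warning can be sketched as follows (a rough estimate; the 200 bytes per pair ignores any serialization overhead):

```java
// Back-of-the-envelope size of a self-cross of a 10 MB input of 100-byte
// elements: the result has |input|^2 pairs, each carrying both elements.
public class CrossSize {
    public static void main(String[] args) {
        long elems = 10L * 1024 * 1024 / 100; // ~100K elements per input
        long pairs = elems * elems;           // quadratic blow-up
        long bytes = pairs * 200;             // two 100-byte elements per pair
        System.out.println(pairs);            // ~1.1e10 pairs
        System.out.println(bytes / (1L << 40)); // ~2 TiB of raw pair data
    }
}
```

So even a 10 MB input can legitimately exhaust 50 GB of disk long before the cross product is fully materialized.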
Hi,
I get the following *"No space left on device" IOException* when using
the Cross operator below.
The inputs for the operator are each just *10MB* in size (same input for
IN1 and IN2; 1000 tuples) and I get the exception after Flink manages to
fill *50GB* of SSD space and the partition
Hi,
using partitionCustom, the data distribution depends only on your
probability distribution. If it is uniform, you should be fine (i.e.,
choosing the channel like
> private final Random random = new Random(System.currentTimeMillis());
> int partition(K key, int numPartitions) {
>   return random.nextInt(numPartitions);
> }
Thanks!
Ok, so for a random shuffle I need partitionCustom. But in that case the
data might end up unbalanced?
As for the splitting: is there no way to get exact sizes?
Cheers,
Max
On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann wrote:
> Hi Max,
>
> you can always shuffle your elements usin
Hi everyone!
Thanks! It seems it was the variable that caused the problem. Making it an
inner class solved the issue.
Cheers,
Max
On Mon, Jun 15, 2015 at 2:58 PM, Kruse, Sebastian
wrote:
> Hi everyone,
>
>
>
> I did not reproduce it, but I think the problem here is rather the anonymous
> class. It looks li
Ok great, I will try this out and get back to you. Thanks =)
On Mon, Jun 15, 2015 at 2:52 PM, Till Rohrmann wrote:
> Hi Tamara,
>
> you can instruct Flink to write the current memory statistics to the log
> by setting taskmanager.debug.memory.startLogThread: true in the Flink
> configuration. Fu
Hi everyone,
I did not reproduce it, but I think the problem here is rather the anonymous
class. It looks like it is created within a class, not an object. Thus it is
not “static” in Java terms, which means that its surrounding class (the job
class) will also be serialized. And in this job class,
Hi Tamara,
you can instruct Flink to write the current memory statistics to the log by
setting taskmanager.debug.memory.startLogThread: true in the Flink
configuration. Furthermore, you can control the logging interval with
taskmanager.debug.memory.logIntervalMs where the interval is specified in
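For reference, here is how those two settings would look in conf/flink-conf.yaml (the interval value is an arbitrary example; the key name suggests it is in milliseconds):

```yaml
# periodically log TaskManager memory statistics
taskmanager.debug.memory.startLogThread: true
# logging interval in milliseconds (5 s here; pick what suits your run)
taskmanager.debug.memory.logIntervalMs: 5000
```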
Hi Tamara,
what kind of information do you need? Something like, size and usage of
in-memory sort buffers or hash tables?
Some information might be written to the DEBUG logs, but I'm not sure about
that. Beyond the logs, I doubt that Flink monitors memory usage.
Cheers, Fabian
2015-06-15 14:34 GMT+02:00 T
Hi,
I am running some experiments on Flink and was wondering if there is some
way to monitor the memory usage of a Flink Job (running locally and on a
cluster). I need to run multiple jobs and compare their memory usage.
Cheers,
Tamara
Hi Max,
you can always shuffle your elements using the rebalance method. What Flink
does here is distribute the elements of each partition among all available
TaskManagers. This happens in a round-robin fashion and is thus not
completely random.
A different means is the partitionCustom method w
I think you need to implement your own Partitioner and hand it over via
DataSet.partitionCustom(partitioner, field).
(Just specify any field you like; as you don't want to group by key, it
doesn't matter.)
When implementing the partitioner, you can ignore the key parameter and
compute the output c
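A standalone sketch of that idea (plain Java, with a minimal stand-in for Flink's Partitioner interface so it runs without Flink on the classpath; the key parameter is ignored as described):

```java
import java.util.Random;

// Minimal stand-in for org.apache.flink.api.common.functions.Partitioner,
// so this sketch compiles and runs by itself.
interface SimplePartitioner<K> {
    int partition(K key, int numPartitions);
}

public class RandomPartitioner implements SimplePartitioner<Long> {
    // one RNG per partitioner instance; the key is ignored on purpose
    private final Random random = new Random();

    @Override
    public int partition(Long key, int numPartitions) {
        return random.nextInt(numPartitions); // uniform over all channels
    }

    public static void main(String[] args) {
        RandomPartitioner p = new RandomPartitioner();
        int[] counts = new int[4];
        for (int i = 0; i < 400_000; i++) {
            counts[p.partition((long) i, 4)]++;
        }
        for (int c : counts) {
            System.out.println(c); // each channel gets roughly 100,000 elements
        }
    }
}
```

Since the channel choice is uniform and independent of the key, the splits come out balanced on average, but their sizes are binomially distributed rather than exact.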
Hi Max,
the problem is that you’re trying to serialize the companion object of
scala.util.Random. Try creating an instance of the scala.util.Random class
and using this instance within your RichFilterFunction to generate the
random numbers.
Cheers,
Till
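The shape of that fix as a standalone sketch (plain Java; in the real job this would be a RichFilterFunction, with the RNG created in open() so the closure stays serializable; the class name and 50% keep fraction are made up for the demo):

```java
import java.io.Serializable;
import java.util.Random;

// Keep the RNG as instance state created on the worker (as Flink's
// RichFilterFunction.open() would do), instead of referencing the
// scala.util.Random companion object from the closure.
public class SampleFilter implements Serializable {
    private transient Random random;   // not serialized with the function
    private final double keepFraction;

    SampleFilter(double keepFraction) { this.keepFraction = keepFraction; }

    // In Flink, this initialization would live in open(Configuration).
    void open() { random = new Random(); }

    boolean filter(Object ignored) {
        return random.nextDouble() < keepFraction;
    }

    public static void main(String[] args) {
        SampleFilter f = new SampleFilter(0.5);
        f.open();
        int kept = 0;
        for (int i = 0; i < 100_000; i++) {
            if (f.filter(i)) kept++;
        }
        System.out.println(kept); // roughly 50,000
    }
}
```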
On Mon, Jun 15, 2015 at 1:56 PM Maximilian
Hi Flinksters,
I would like to randomly choose an element of my data set. But somehow I
cannot use scala.util inside my filter function:
val sample_x = X filter(new RichFilterFunction[Vector](){
var i: Int = -1
override def open(config: Configuration) = {
i = scal
Hi Flinksters,
I would like to shuffle the elements in my data set and then split it into
two according to some ratio. Each element in the data set has a unique id.
Is there a nice way to do this with the Flink API?
(It would be nice to have guaranteed random shuffling.)
Thanks!
Cheers,
Max
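One hypothetical way to get the ratio split, sketched in plain Java under the assumption that each element carries a unique long id: seed a well-mixed RNG with the id and compare against the ratio. The decision is deterministic per id, so the two splits are disjoint and exhaustive, but their sizes are only approximately exact:

```java
import java.util.SplittableRandom;

// Per-element split decision: the same id always lands in the same split,
// and the first split receives ~ratio of the elements.
public class RatioSplit {
    static boolean inFirstSplit(long id, double ratio) {
        // SplittableRandom mixes its seed well, so sequential ids are fine
        return new SplittableRandom(id).nextDouble() < ratio;
    }

    public static void main(String[] args) {
        int first = 0;
        for (long id = 0; id < 100_000; id++) {
            if (inFirstSplit(id, 0.7)) first++;
        }
        System.out.println(first); // roughly 70,000
    }
}
```

In a Flink job this predicate would back two filter calls, one keeping `inFirstSplit(id, ratio)` and the other its negation.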
+1 for giving only those modules a version suffix which depend on Scala.
On Sun, Jun 14, 2015 at 8:03 PM Robert Metzger wrote:
> There was already a discussion regarding the two options here [1], back
> then we had a majority for giving all modules a scala suffix.
>
> I'm against giving all modu
Hi Gianmarco,
The processing time is quadratic in the size of the individual elements. I
was already applying the strategy you proposed, but tried to find out
whether there is a way of balancing the sub-items of these large items over
the workers without shuffling the whole dataset. However, I