Hi Flavio,
Do you have an example? The DistinctOperator should return a typed output
just like all the other operators do.
Best,
Max
On Fri, Apr 17, 2015 at 10:07 AM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Hi guys,
I'm trying to make (in Java) a project().distinct() but then I
No, it's not, but at the moment there is AFAIK no other way around it. There
is an issue for proper outer join support [1].
[1] https://issues.apache.org/jira/browse/FLINK-687
On Fri, Apr 17, 2015 at 10:01 AM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Could resolve the problem but the fact
Hi guys,
I'm trying to make (in Java) a project().distinct() but then I cannot
create the generated dataset with a typed tuple because the distinct
operator returns just an untyped Tuple.
Is this an error in the APIs or am I doing something wrong?
Best,
Flavio
That could resolve the problem, but is it safe to accumulate stuff in a
local variable if the datasets are huge..?
On Fri, Apr 17, 2015 at 9:54 AM, Till Rohrmann till.rohrm...@gmail.com
wrote:
If it's fine when you have null string values in the cases where
D1.f1!=a1 or D1.f2!=a2 then a
Could you explain a little more in detail this caching mechanism with a
simple code snippet...?
Thanks,
Flavio
On Apr 17, 2015 1:12 PM, Fabian Hueske fhue...@gmail.com wrote:
If you know that the group cardinality of one input is always 1 (or 0) you
can make that input the one to cache in
Hi to all,
I have this strange exception in my program, do you know what could be the
cause of it?
java.lang.RuntimeException: Hash join exceeded maximum number of
recursions, without reducing partitions enough to be memory resident.
Probable cause: Too many duplicate keys.
at
Hi,
the log4j.properties file looks fine. The issue is that the resources
folder is not marked as a source/resource folder, which is why Eclipse is not
adding the file to the classpath.
Have a look here:
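The link above is truncated in the archive. If the project is Maven-based, a typical fix (a sketch of the standard Maven convention, not necessarily what was linked) is to declare the resources folder in the pom.xml and then refresh the Eclipse project via Maven > Update Project:

```xml
<build>
  <resources>
    <resource>
      <!-- the standard Maven resources directory; adjust if yours differs -->
      <directory>src/main/resources</directory>
    </resource>
  </resources>
</build>
```

With this in place, m2e marks the folder as a resource folder and log4j.properties ends up on the classpath.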
Hi Squirrels,
I have some trouble with a delta-iteration transitive closure program [1].
When I run the program, I get the following error:
java.io.EOFException
at
org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.nextSegment(InMemoryPartition.java:333)
at
I just set the -Xmx in the VM parameters to 10g and I have 8 virtual cores
(4 physical).
On Fri, Apr 17, 2015 at 2:48 PM, Robert Metzger rmetz...@apache.org wrote:
How much memory are you giving to each Flink TaskManager?
On Fri, Apr 17, 2015 at 2:05 PM, Flavio Pompermaier
If you know that the group cardinality of one input is always 1 (or 0) you
can make that input the one to cache in memory and stream the other input
with potentially more group elements.
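The idea Fabian describes can be sketched in plain Java, outside Flink, as a build/probe hash join (record types and the join output format here are made-up illustrations): the side whose groups contain at most one element is loaded into a HashMap, and the potentially larger side is streamed against it.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    // Build side: group cardinality is at most 1, so one map entry per key.
    // Probe side: streamed, may contain many elements per key.
    static List<String> join(Map<Integer, String> buildSide,
                             Iterable<Map.Entry<Integer, String>> probeSide) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> probe : probeSide) {
            String match = buildSide.get(probe.getKey());
            if (match != null) {                 // inner-join semantics
                out.add(match + "|" + probe.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> build = new HashMap<>();
        build.put(1, "a");
        build.put(2, "b");
        List<Map.Entry<Integer, String>> probe = Arrays.asList(
            new AbstractMap.SimpleEntry<>(1, "x"),
            new AbstractMap.SimpleEntry<>(1, "y"),
            new AbstractMap.SimpleEntry<>(3, "z"));
        System.out.println(join(build, probe)); // [a|x, a|y]
    }
}
```

Only the build side has to fit in memory; the probe side is consumed one record at a time, which is exactly why the cardinality-1 input is the one to cache.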
2015-04-17 4:09 GMT-05:00 Flavio Pompermaier pomperma...@okkam.it:
That would be very helpful...
Thanks
Here are some rough corner points for serialization efficiency in Flink:
- Tuples are a bit more efficient than POJOs, because they do not support
(and encode) possible subclasses and they do not involve any reflection
code at all.
- Arrays are more efficient than collections (collections go in
Hi Robert,
I forgot to update about this error.
The root cause was an OOM caused by Jena RDF serialization, which was making
the entire job fail.
I also created a thread on StackOverflow:
https://stackoverflow.com/questions/29660894/jena-thrift-serialization-oom-due-to-gc-overhead
That worked like a charm!
Thanks a lot Fabian!
On Fri, Apr 17, 2015 at 12:37 PM, Fabian Hueske fhue...@gmail.com wrote:
The problem is caused by the project() operator.
The Java compiler cannot infer its concrete return type and defaults to Tuple.
You can help the compiler like this:
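Fabian's example is truncated in the archive; presumably it is Flink's explicit type-argument syntax, something like `ds.<Tuple2<String, Integer>>project(0, 2).distinct(0)` (field indices and tuple type hypothetical). The underlying Java mechanism, an explicit type witness on a generic method call, can be shown in plain Java:

```java
import java.util.ArrayList;
import java.util.List;

public class TypeWitnessDemo {
    // A generic method whose return type the compiler must infer,
    // analogous to DataSet.project(...) in Flink's Java API.
    static <T> List<T> emptyTypedList() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // The explicit type argument pins down the generic parameter,
        // just like ds.<Tuple2<...>>project(...) does for project().
        List<String> names = TypeWitnessDemo.<String>emptyTypedList();
        names.add("flink");
        System.out.println(names); // [flink]
    }
}
```

Without the witness, some call positions make the compiler fall back to a less specific type, which is exactly the untyped-Tuple symptom described above.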
I think running the program multiple times is a reasonable way to start
working on this.
I would try and see whether this can be re-written as a non-nested
iteration. Nested iteration algorithms may have much more overhead
to start with.
Stephan
On Tue, Apr 14, 2015 at 3:53 PM, Benoît
Hi!
After a quick look over the code, this seems like a bug. One corner case of
the overflow handling code does not check for the out-of-memory
condition.
I would like to wait and see whether Robert Waury has some ideas about that; he is
the one most familiar with the code.
I would guess, though,
Hi guys,
how can I read a CSV but keep the header in some variable instead of
throwing it away?
Thanks in advance,
Flavio
I don't think there is any built-in functionality for that.
You can probably read the header with some custom code (in the driver
program) and use the CSV reader (skipping the header) as a regular data
source.
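A minimal sketch of that suggestion in plain Java (file contents and column handling are made up for illustration): read the first line yourself in the driver program, then hand the rest of the file to the CSV source. In Flink's CsvReader the "skipping the header" part is `ignoreFirstLine()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CsvHeader {
    // Split a header line into column names (comma-delimited assumed).
    static String[] parseHeader(String firstLine) {
        return firstLine == null ? new String[0] : firstLine.split(",");
    }

    // Driver-program side: read just the first line of the file.
    static String[] readHeader(Path csv) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(csv)) {
            return parseHeader(r.readLine());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.write(tmp, Arrays.asList("id,name,score", "1,flink,0.9"));
        System.out.println(Arrays.toString(readHeader(tmp))); // [id, name, score]
        // The data itself would then go through Flink's CSV source, e.g.
        // env.readCsvFile(tmp.toString()).ignoreFirstLine()...  (Flink API)
    }
}
```

The header ends up in a local variable of the driver program, while the distributed read never sees it.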
On Fri, Apr 17, 2015 at 5:03 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Hi
Yes, using a map() function that transforms the input lines into a
String array.
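That map step can be sketched in plain Java (the comma delimiter and the surrounding Flink code are assumptions): splitting with a negative limit keeps trailing empty columns, so a 32-column line always yields a 32-element array.

```java
import java.util.Arrays;

public class LineToArray {
    // The logic a Flink MapFunction<String, String[]> would apply per line:
    // split(",", -1) keeps trailing empty fields, so the array length
    // always equals the number of columns in the line.
    static String[] toColumns(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String[] cols = toColumns("a,b,,d,");
        System.out.println(cols.length);           // 5
        System.out.println(Arrays.toString(cols)); // [a, b, , d, ]
        // In Flink (sketch): env.readTextFile(path).map(LineToArray::toColumns)
        // would give a DataSet<String[]>.
    }
}
```

Note that a plain `split(",")` would silently drop trailing empty fields, which is a common source of off-by-N column counts.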
On Fri, Apr 17, 2015 at 9:03 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:
My CSV has 32 columns, is there a way to create a DataSet<String[]>?
On Apr 17, 2015 7:53 PM, Stephan Ewen se...@apache.org