[jira] [Created] (FLINK-3773) Scanners are left unclosed in SqlExplainTest

2016-04-16 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3773:
-

 Summary: Scanners are left unclosed in SqlExplainTest
 Key: FLINK-3773
 URL: https://issues.apache.org/jira/browse/FLINK-3773
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


e.g.
{code}
String source = new Scanner(new File(testFilePath +
"../../src/test/scala/resources/testFilter0.out"))
{code}
Scanner implements AutoCloseable.
Using try-with-resources would be a good pattern for closing the Scanners.
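
A minimal sketch of the suggested pattern follows. The class name ScannerCloseSketch, the helper readAll, and the "\\A" delimiter idiom are assumptions made for illustration; they are not taken from SqlExplainTest.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ScannerCloseSketch {
    // Reads the whole file into a String. The Scanner, and the file handle
    // it wraps, is closed automatically when the try block exits, even if
    // an exception is thrown while reading.
    static String readAll(File file) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            // "\\A" matches only at the start of input, so next() returns
            // the entire content as a single token.
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }
}
```

A test could then call readAll(new File(path)) wherever it currently builds a Scanner inline, so no Scanner instance escapes unclosed.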



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: GSoC Project Proposal Draft: Code Generation in Serializers

2016-04-16 Thread Gábor Horváth
Hi!

Table API already uses code generation and the Janino compiler [1]. Is it a
dependency that is ok to add to flink-core? In case it is ok, I think I
will use the same in order to be consistent with the other code generation
efforts.

I started to look at the Table API code generation [2] and it uses Scala
extensively. There are several Scala features that can make Java code
generation easier such as pattern matching and string interpolation. I did
not see any Scala code in flink-core yet. Is it ok to implement the code
generation inside the flink-core using Scala?

Regards,
Gábor

[1] http://unkrig.de/w/Janino
[2]
https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/codegen/CodeGenerator.scala

On 18 March 2016 at 19:37, Gábor Horváth  wrote:

> Thank you! I finalized the project.
>
>
> On 18 March 2016 at 10:29, Márton Balassi 
> wrote:
>
>> Thanks Gábor, now I also see it on the internal GSoC interface. I have
>> indicated that I wish to mentor your project, I think you can hit finalize
>> on your project there.
>>
>> On Mon, Mar 14, 2016 at 11:16 AM, Gábor Horváth 
>> wrote:
>>
>> > Hi,
>> >
>> > I have updated this draft to include preliminary benchmarks, mentioned the
>> > interaction of annotations with savepoints, extended it with a timeline,
>> > and some notes about Scala case classes.
>> >
>> > Regards,
>> > Gábor
>> >
>> > On 9 March 2016 at 16:12, Gábor Horváth  wrote:
>> >
>> > > Hi!
>> > >
>> > > As far as I can see the formatting was not correct in my previous mail. A
>> > > better formatted version is available here:
>> > > https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk
>> > > Sorry for that.
>> > >
>> > > Regards,
>> > > Gábor
>> > >
>> > > On 9 March 2016 at 15:51, Gábor Horváth  wrote:
>> > >
>> > >> Hi, I did not want to send this proposal out before I have some
>> > >> initial benchmarks, but this issue was mentioned on the mailing list
>> > >> (http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html),
>> > >> and I wanted to make this information available to be able to
>> > >> incorporate this into that discussion. I have written this draft with
>> > >> the help of Gábor Gévay and Márton Balassi and I am open to every
>> > >> suggestion.
>> > >>
>> > >>
>> > >> The proposal draft:
>> > >> Code Generation in Serializers and Comparators of Apache Flink
>> > >>
>> > >> I am doing my last semester of my MSc studies and I’m a former GSoC
>> > >> student in the LLVM project. I plan to improve the serialization code in
>> > >> Flink during this summer. The current implementation of the serializers
>> > >> can be a performance bottleneck in some scenarios. These performance
>> > >> problems were also reported on the mailing list recently [1]. I plan to
>> > >> implement code generation into the serializers to improve the performance
>> > >> (as Stephan Ewen also suggested.)
>> > >>
>> > >> TODO: I plan to include some preliminary benchmarks in this section.
>> > >> Performance problems with the current serializers
>> > >>
>> > >>    1. PojoSerializer uses reflection for accessing the fields, which is
>> > >>       slow (eg. [2])
>> > >>       - This is also a serious problem for the comparators
>> > >>    2. When deserializing fields of primitive types (eg. int), the reusing
>> > >>       overload of the corresponding field serializers cannot really do any
>> > >>       reuse, because boxed primitive types are immutable in Java. This
>> > >>       results in lots of object creations. [3][7]
>> > >>    3. The loop to call the field serializers makes virtual function calls,
>> > >>       that cannot be speculatively devirtualized by the JVM or predicted by
>> > >>       the CPU, because different serializer subclasses are invoked for the
>> > >>       different fields. (And the loop cannot be unrolled, because the
>> > >>       number of iterations is not a compile time constant.) See also the
>> > >>       following discussion on the mailing list [1].
>> > >>    4. A POJO field can have the value null, so the serializer inserts 1
>> > >>       byte null tags, which wastes space. (Also, the type extractor logic
>> > >>       does not distinguish between primitive types and their boxed
>> > >>       versions, so even an int field has a null tag.)
>> > >>    5. Subclass tags also add a byte at the beginning of every POJO
>> > >>    6. getLength() does not know the size in most cases [4]
>> > >>       Knowing the size of a type when serialized has numerous performance
>> > >>
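
To make the first item of the quoted list more concrete, here is a small sketch contrasting reflective field access with the direct field access that a generated serializer could use instead. This is only an illustration under stated assumptions: the Point class and both methods are hypothetical, and neither reproduces Flink's PojoSerializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;

class PojoAccessSketch {
    // Hypothetical POJO used only for this illustration.
    static class Point {
        public int x;
        public int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Reflection-based variant: every field is looked up generically and
    // Field.get() returns a boxed Integer that has to be unboxed again.
    static byte[] serializeReflective(Point p)
            throws IOException, ReflectiveOperationException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String name : new String[] {"x", "y"}) {
            Field field = Point.class.getDeclaredField(name);
            Object boxed = field.get(p);
            out.writeInt((Integer) boxed);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // What type-specific generated code could look like: plain field reads,
    // no reflection, no boxing, no per-field virtual dispatch.
    static byte[] serializeDirect(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(p.x);
        out.writeInt(p.y);
        out.flush();
        return bytes.toByteArray();
    }
}
```

Both variants write the same eight bytes; the difference the proposal targets is only in how the field values are obtained.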

[jira] [Created] (FLINK-3772) Graph algorithms for vertex and edge degree

2016-04-16 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3772:
-

 Summary: Graph algorithms for vertex and edge degree
 Key: FLINK-3772
 URL: https://issues.apache.org/jira/browse/FLINK-3772
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan
 Fix For: 1.1.0


Many graph algorithms require vertices or edges to be marked with the degree. 
This ticket provides algorithms for annotating

* vertex degree for undirected graphs
* vertex out-, in-, and out- and in-degree for directed graphs
* edge source, target, and source and target degree for undirected graphs
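
As a rough illustration of what annotating vertices with their degree means, here is a sketch in plain Java over an in-memory edge list. It does not use the Gelly API; the class DegreeSketch and its method names are hypothetical, and edges are assumed to be given as {source, target} pairs.

```java
import java.util.HashMap;
import java.util.Map;

class DegreeSketch {
    // Undirected graph: each edge contributes one to the degree of both
    // endpoints (a self-loop therefore counts twice for its vertex).
    static Map<Integer, Integer> vertexDegrees(int[][] edges) {
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
            degree.merge(e[1], 1, Integer::sum);
        }
        return degree;
    }

    // Directed graph: out-degree counts edges leaving a vertex.
    static Map<Integer, Integer> outDegrees(int[][] edges) {
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
        }
        return degree;
    }

    // Directed graph: in-degree counts edges arriving at a vertex.
    static Map<Integer, Integer> inDegrees(int[][] edges) {
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] e : edges) {
            degree.merge(e[1], 1, Integer::sum);
        }
        return degree;
    }
}
```

Vertices with no matching edges are simply absent from the returned maps in this sketch; a real implementation would have to decide whether to emit them with degree zero.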





Re: Flink optimizer optimizations

2016-04-16 Thread Matthias J. Sax
Sure. WITHOUT.

Thanks. Good catch :)

On 04/16/2016 01:18 PM, Ufuk Celebi wrote:
> On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax  wrote:
>> (with the need to sort the data, because both
>> datasets will be sorted on A already). Thus, the overhead of sorting in
>> the group might pay off in the join.
>
> I think you meant to write withOUT the need to sort the data, right?
> 





Re: Flink optimizer optimizations

2016-04-16 Thread Ufuk Celebi
On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax  wrote:
> (with the need to sort the data, because both
> datasets will be sorted on A already). Thus, the overhead of sorting in
> the group might pay off in the join.

I think you meant to write withOUT the need to sort the data, right?


Re: Flink optimizer optimizations

2016-04-16 Thread Matthias J. Sax
Assume you have a groupBy followed by a join.

DataSet1 (not sorted) -> groupBy(A) --> join(1.A == 2.A)
                                          ^
DataSet2 (sorted on A) -------------------+

For groupBy(A) of DataSet1 the optimizer can pick hash-grouping or the
more expensive sort-based grouping. If the optimizer picks
sort-based grouping, the join becomes super cheap because it can just
perform a merge-join (with the need to sort the data, because both
datasets will be sorted on A already). Thus, the overhead of sorting in
the group might pay off in the join.
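
To illustrate why a join over two inputs that are both sorted on A is cheap, here is a minimal merge-join sketch in plain Java. It is not Flink's implementation; the class MergeJoinSketch and the representation of the inputs as sorted int key arrays are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class MergeJoinSketch {
    // Joins two key arrays that are already sorted ascending. Each input is
    // scanned once; no hash table and no extra sort is needed.
    // Returns the matches as {leftIndex, rightIndex} pairs.
    static List<int[]> mergeJoin(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++;
            } else if (left[i] > right[j]) {
                j++;
            } else {
                // Equal keys: emit the cross product of both runs of this key.
                int key = left[i];
                int runStart = j;
                while (i < left.length && left[i] == key) {
                    for (j = runStart; j < right.length && right[j] == key; j++) {
                        out.add(new int[] {i, j});
                    }
                    i++;
                }
                // j now points just past the run of `key` on the right side.
            }
        }
        return out;
    }
}
```

If one input were unsorted, it would first have to be sorted (or the join would need a hash table), which is the cost a sort-based groupBy on the same key has already paid.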

-Matthias

On 04/15/2016 10:50 PM, CPC wrote:
> Hi
> 
> When I looked into what kind of optimizations Flink does, I found
> https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals . Is
> it up to date? Also, I couldn't understand:
> 
> "Reusing of partitionings and sort orders across operators. If one operator
> leaves the data in partitioned fashion (and or sorted order), the next
> operator will automatically try and reuse these characteristics. The
> planning for this is done holistically and can cause earlier operators to
> pick more expensive algorithms, if they allow for better reusing of
> sort-order and partitioning."
> 
> Can you give an example for "earlier operators to pick more expensive
> algorithms"?
> 
> Regards
> 





[jira] [Created] (FLINK-3771) Methods for translating Graphs

2016-04-16 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3771:
-

 Summary: Methods for translating Graphs
 Key: FLINK-3771
 URL: https://issues.apache.org/jira/browse/FLINK-3771
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan
 Fix For: 1.1.0


Provide methods for translation of the type or value of graph labels, vertex 
values, and edge values.


