Re: Adding non-core API features to Flink

2015-01-24 Thread Fabian Hueske
. There are other projects that have a contrib package, such as Akka and Django. Regards, Chiwan Park (Sent with iPhone) On Jan 24, 2015, at 7:15 PM, Fabian Hueske fhue...@gmail.com wrote: Hi all, we got a few contribution requests lately to add cool but non-core features to our API. In previous

YARN ITCases fail, master broken?

2015-01-23 Thread Fabian Hueske
Hi all, I tried to build the current master (mvn clean install) and some tests in the flink-yarn-tests module fail: Failed tests: YARNSessionCapacitySchedulerITCase.testClientStartup:50-YarnTestBase.runWithArgs:314 During the timeout period of 60 seconds the expected string did not show up

Naming of semantic annotations

2015-01-23 Thread Fabian Hueske
Hi all, I have a pending pull request (#311) to fix and enable semantic information for functions with nested and Pojo types. Semantic information is used to tell the optimizer about the behavior of user-defined functions. The optimizer can use this information to generate more efficient

Re: Sorting of fields

2015-02-04 Thread Fabian Hueske
I just merged support for local output sorting yesterday :-) This allows sorting the data before it is given to the OutputFormat. It is done like this: myData.write(myOF).sortLocalOutput(1, Order.ASCENDING); See the programming guide for details (only in master, not online). Full sorting can be

Re: Google Summer of Code 2015 is coming

2015-02-08 Thread Fabian Hueske
I think it would be good to participate in GSoC and would be available as a mentor this year as well. The following projects from our project wiki page could serve as nice GSoC projects, IMO: - Improving monitoring (I hope we make some progress in that direction until GSoC starts, but there will

Re: [jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-02-08 Thread Fabian Hueske
Timo, thanks for picking up this very cool feature! I think as well that an integrated approach would be the better solution, if it can be done with reasonable effort. +1 implementing a prototype using ASM. Let me know, if I can help somehow. Cheers, Fabian 2015-02-05 14:31 GMT+01:00 Timo

Re: Task manager memory configuration with intermediate results

2015-02-03 Thread Fabian Hueske
Yes, I would really like to get rid of the distinction between operator and network buffers. Having all buffers taken from the same pool is a good step towards that goal. Until the assignment is dynamic, I prefer to have a config option for the network / operator ratio. +1 for the proposal

Re: Very strange behaviour of groupBy() - sort() - first()

2015-01-21 Thread Fabian Hueske
Chesnay is right. Right now, it is not possible to do what you want in a straightforward way because Flink does not support fully sorting a data set (there are several related issues in JIRA). A workaround would be to attach a constant value to each tuple, group on that (all tuples are sent to
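
The message above is cut off, so the following is only a minimal sketch of the part it does state: attach a constant value to every record, group on it, and sort inside the single resulting group. It uses plain Java collections rather than the Flink API, and the names ConstantKeySortSketch, CONSTANT_KEY and firstNSorted are hypothetical, chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the constant-key workaround (no Flink dependency).
final class ConstantKeySortSketch {
    // Assumption: any fixed value works, as long as every record gets the same one.
    private static final int CONSTANT_KEY = 0;

    static List<Integer> firstNSorted(List<Integer> input, int n) {
        // Step 1: attach the same constant key to every record.
        List<Map.Entry<Integer, Integer>> keyed = new ArrayList<>();
        for (Integer value : input) {
            keyed.add(new SimpleEntry<>(CONSTANT_KEY, value));
        }
        // Step 2: group on the key; with a constant key there is exactly one group.
        Map<Integer, List<Integer>> groups = new HashMap<>();
        for (Map.Entry<Integer, Integer> record : keyed) {
            groups.computeIfAbsent(record.getKey(), k -> new ArrayList<>()).add(record.getValue());
        }
        // Step 3: sort inside that single group and keep the first n records.
        List<Integer> group = groups.getOrDefault(CONSTANT_KEY, new ArrayList<>());
        Collections.sort(group);
        return new ArrayList<>(group.subList(0, Math.min(n, group.size())));
    }
}
```

Because every record carries the same key, the whole data set lands in one group, so the sort in step 3 is effectively a full sort handled by a single group.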

Re: [ANNOUNCE] Apache Flink 0.8.0 released

2015-01-22 Thread Fabian Hueske
Awesome! Thank you very much Marton and Robert! Cheers, Fabian 2015-01-22 9:04 GMT+01:00 Robert Metzger rmetz...@apache.org: The Apache Flink team is proud to announce the next version of Apache Flink. Find the blogpost with the change log here:

Re: How to use org.apache.hadoop.mapreduce.lib.input.MultipleInputs in Flink

2015-01-17 Thread Fabian Hueske
Why don't you just create two data sources that each wrap the ParquetFormat using a HadoopInputFormat and join them as for example done in the TPCH Q3 example [1] I always found the MultipleInputFormat to be an ugly workaround for Hadoop's deficiency to read data from multiple sources. AFAIK,

Re: Merge guidelines / policies

2015-02-11 Thread Fabian Hueske
Hi Vasia, AFAIK, there is no merging guide for committers. I am doing this as follows: - I am merging only code for components that I know well or where I am sure to know the implications. - If in doubt, I wait until another committer gives a +1 - I am merging my own code only if another

Re: Merge guidelines / policies

2015-02-11 Thread Fabian Hueske
/Apache+Flink+development+guidelines ) I do it pretty much the same way as Fabian... On Wed, Feb 11, 2015 at 5:06 PM, Fabian Hueske fhue...@gmail.com wrote: Hi Vasia, AFAIK, there is no merging guide for committers. I am doing this as follows: - I am merging

Re: [VOTE] Release Apache Flink 0.8.1 (RC2)

2015-02-16 Thread Fabian Hueske
all in all :-) On Mon, Feb 16, 2015 at 4:38 PM, Fabian Hueske fhue...@gmail.com wrote: - checked all checksums and signatures - checked running examples with built-in data on local setup on Windows 8.1 (hadoop1.tgz, hadoop2.tgz) 2015-02-16 15:54 GMT+01:00 Robert Metzger rmetz

Re: Question about Commit Policy

2015-01-27 Thread Fabian Hueske
: But thats very long, and together with the issue tag I almost always have I lose a lot of my precious 80 characters. On Tue, Jan 27, 2015 at 1:17 PM, Fabian Hueske fhue...@gmail.com wrote: I know I argued against enforcing commit tags, but how about we make two tags mandatory

Re: YARN ITCases fail, master broken?

2015-01-24 Thread Fabian Hueske
currently running the tests on my machine as well, just to make sure. I haven't run the tests on OS X, maybe that's causing the issues. Can you send me (privately) the full output of the tests? Best, Robert On Sat, Jan 24, 2015 at 11:00 AM, Fabian Hueske fhue...@gmail.com wrote

Re: YARN ITCases fail, master broken?

2015-01-24 Thread Fabian Hueske
, Skipped: 0 - Henry On Fri, Jan 23, 2015 at 2:16 PM, Fabian Hueske fhue...@gmail.com wrote: Hi all, I tried to build the current master (mvn clean install) and some tests in the flink-yarn-tests module fail: Failed tests

Re: Tweets Custom Input Format

2015-01-24 Thread Fabian Hueske
Hi Mustafa, that would be a nice contribution! We are currently discussing how to add non-core API features into Flink [1]. I will move this discussion onto the mailing list to decide where to add cool add-ons like yours. Cheers, Fabian [1] https://issues.apache.org/jira/browse/FLINK-1398

Re: [Gelly]Distributed Minimum Spanning Tree Example

2015-02-14 Thread Fabian Hueske
Hi Andra, I haven't had a detailed look at Gelly and its functions, but Flink has only a few operators which can cause nondeterministic behavior. In general, user code should be implemented without side effects, i.e., the result of each function call may only depend on its arguments. This principle
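
As a small illustration of the rule stated above (the result of a function call may only depend on its arguments), the sketch below contrasts a side-effect-free function with one whose result depends on hidden state. It is plain Java, not Flink code, and the names SideEffectSketch, scale and CountingScaler are hypothetical.

```java
// Plain-Java illustration of side-effect-free versus stateful user code.
final class SideEffectSketch {
    // Side-effect free: the result depends only on the argument.
    static int scale(int value) {
        return value * 2;
    }

    // Has a side effect: the result also depends on how many calls came before,
    // so re-running or re-ordering calls can change the output.
    static final class CountingScaler {
        private int calls = 0;

        int scale(int value) {
            calls++;
            return value * 2 + calls;
        }
    }
}
```

Calling SideEffectSketch.scale(21) returns the same value every time, while two calls to CountingScaler.scale(21) on the same instance return different values. The second kind of function can produce different results when calls are repeated or reordered.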

Re: Gelly is in!

2015-02-11 Thread Fabian Hueske
Indeed! Thanks for this fantastic contribution! Cheers, Fabian 2015-02-11 11:53 GMT+01:00 Stephan Ewen se...@apache.org: Hi everyone! I am happy to say that the graph library Gelly is finally in the code :-) Thanks Vasia, Daniel, Andra, and Carsten for the great work! Greetings, Stephan

Re: Queries regarding RDFs with Flink

2015-03-22 Thread Fabian Hueske
Hi Flavio, also, Gelly is a superset of Spargel. It provides the same features and much more. Since RDF is graph-structured, Gelly might be a good fit for your use case. Cheers, Fabian

Re: [DISCUSS] Name of Expression API and DataSet abstraction

2015-03-16 Thread Fabian Hueske
I am also more in favor of Rel and Relation, but DataTable nicely follows the terms DataSet and DataStream. On Mar 16, 2015 4:58 PM, Aljoscha Krettek aljos...@apache.org wrote: I like Relation or Rel, is shorter. On Mar 16, 2015 4:52 PM, Hermann Gábor reckone...@gmail.com wrote: +1 for

Re: Restructuring the maven projects

2015-03-17 Thread Fabian Hueske
I agree that it's a good idea to move the APIs into one module. But why should we merge client and compiler (optimizer) and the examples into one module? I think modules with clearly separated responsibilities can also help new contributors to navigate the code. 2015-03-17 16:16 GMT+01:00

Re: [DISCUSS] Issues with heterogeneity of the code

2015-03-17 Thread Fabian Hueske
Touching every file of the code would also be a good opportunity to switch from tab to space indentation. So if we enforce a strict style, we could also address this issue which causes discussions every now and then. 2015-03-16 21:53 GMT+01:00 Aljoscha Krettek aljos...@apache.org: No, but I don't

Re: [DISCUSS] Name of Expression API and DataSet abstraction

2015-03-21 Thread Fabian Hueske
wrote: I like the Relation or Relational. So maybe we could use DataRelation as the abstraction? - Henry On Mon, Mar 16, 2015 at 9:30 AM, Fabian Hueske fhue...@gmail.com wrote: I am also more in favor of Rel and Relation, but DataTable nicely follows the terms DataSet and DataStream

Re: Semantic Properties and Functions with Iterables

2015-03-06 Thread Fabian Hueske
Hi Timo, there are several restrictions for forwarded fields of operators with iterator input. 1) forwarded fields must be emitted in the order in which they are received through the iterator 2) all forwarded fields of a record must stick together, i.e., if your function builds record from field

Re: Building Flink takes long time now =(

2015-03-13 Thread Fabian Hueske
Just ran mvn clean install -DskipTests in 6:46min on my MBP. So 8mins on Linux sounds more reasonable than 40mins. Without skipping tests it should be around 18min. 2015-03-13 19:03 GMT+01:00 Henry Saputra henry.sapu...@gmail.com: In my MBP (OSX) is about 40mins to do mvn clean install

Re: [jira] [Commented] (FLINK-1106) Deprecate old Record API

2015-03-10 Thread Fabian Hueske
Yeah, I spotted a good amount of optimizer tests that depend on the Record API. I implemented the last optimizer tests with the new API and would volunteer to port the other optimizer tests. 2015-03-10 16:32 GMT+01:00 Stephan Ewen (JIRA) j...@apache.org: [

Re: Inconsistent git master

2015-03-11 Thread Fabian Hueske
Apparently Github sync was/is down https://issues.apache.org/jira/browse/INFRA-9259 On Mar 11, 2015 7:18 AM, Ufuk Celebi u...@apache.org wrote: Hey Gyula, Syncing between the two sometimes takes time. :( I don't think that anything is broken. Let's wait a little longer. – Ufuk On

Re: [jira] [Commented] (FLINK-1659) Rename classes and packages that contains Pact

2015-03-13 Thread Fabian Hueske
+1 for Optimizer 2015-03-13 16:01 GMT+01:00 Henry Saputra (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/FLINK-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360462#comment-14360462 ] Henry Saputra commented on FLINK-1659:

Re: Semantic Properties and Functions with Iterables

2015-03-08 Thread Fabian Hueske
, Fabian Hueske fhue...@gmail.com wrote: Hi Timo, there are several restrictions for forwarded fields of operators with iterator input. 1) forwarded fields must be emitted in the order in which they are received through the iterator 2) all forwarded fields of a record must stick

Re: [jira] [Commented] (FLINK-1106) Deprecate old Record API

2015-03-12 Thread Fabian Hueske
And I'm +1 for removing the old API with the next release. 2015-03-10 17:38 GMT+01:00 Fabian Hueske fhue...@gmail.com: Yeah, I spotted a good amount of optimizer tests that depend on the Record API. I implemented the last optimizer tests with the new API and would volunteer to port the other

Re: ApacheCon 2015 is coming to Austin, Texas, USA

2015-03-25 Thread Fabian Hueske
Thanks Henry for sharing! I will be in Austin and give a talk on Flink [1]. Just ping me if you'd like to meet and chat :-) Cheers, Fabian [1] http://sched.co/2P9s 2015-03-25 1:11 GMT+01:00 Henry Saputra henry.sapu...@gmail.com: Dear Apache Flink enthusiast, In just a few weeks, we'll be

Re: [VOTE] Name of Expression API Representation

2015-03-25 Thread Fabian Hueske
+Relation 2015-03-25 17:52 GMT+01:00 Aljoscha Krettek aljos...@apache.org: Please vote on the new name of the equivalent to DataSet and DataStream in the new expression-based API. From the previous discussion thread three names emerged: Relation, Table and DataTable. The vote is open for

Re: Travis-CI builds queuing up

2015-03-26 Thread Fabian Hueske
Great! Thanks Robert for sharing the good news :-) 2015-03-26 9:08 GMT+01:00 Robert Metzger rmetz...@apache.org: Travis replied me with very good news: Somebody from INFRA was asking the same question around the same time as I did and Travis is working on adding more build capacity for the

Re: Memory segment error

2015-03-30 Thread Fabian Hueske
list does not support attachments :) https://gist.github.com/andralungu/fba36d77f79189daa183 On Fri, Mar 27, 2015 at 12:02 AM, Andra Lungu lungu.an...@gmail.com wrote: Hi Fabian, I uploaded a file with my execution plan. On Thu, Mar 26, 2015 at 11:50 PM, Fabian Hueske fhue

Re: Memory segment error

2015-03-30 Thread Fabian Hueske
to alphaSplit branch 2). Run CounDegreeITCase.java Hope we can get to the bottom of this! If you need something, just ask. On Mon, Mar 30, 2015 at 10:54 AM, Fabian Hueske fhue...@gmail.com wrote: Hmm, that is really weird. Can you point me to a branch in your repository and the test case

Re: Memory segment error

2015-03-30 Thread Fabian Hueske
computations :) And I have an even bigger one for which the test also passed... On Mon, Mar 30, 2015 at 2:31 PM, Fabian Hueske fhue...@gmail.com wrote: Hi Andra, I found the cause for the exception. Your test case is simply too complex for our testing environment. We restrict the TM

Re: A small Project I've been working on

2015-04-01 Thread Fabian Hueske
:-D This is awesome! Do you have some performance numbers? On Apr 1, 2015 8:43 AM, Aljoscha Krettek aljos...@apache.org wrote: Hi, I've been working on a little side project in my free time: Ruby on Flink (RoF). This should finally allow us to tap into the whole web developer ruby world. The

Re: HBase TableOutputFormat fix (Flink 0.8.1)

2015-04-01 Thread Fabian Hueske
Whatever works best for you. We can easily backport or forwardport the patch. 2015-04-01 14:12 GMT+02:00 Flavio Pompermaier pomperma...@okkam.it: Ok..I'd like to have this fix in the next release. Should I branch Flink 0.8.1 or 0.9 or which version? On Wed, Apr 1, 2015 at 2:04 PM,

Re: HBase TableOutputFormat fix (Flink 0.8.1)

2015-04-01 Thread Fabian Hueske
As I said before, I think the configure() method of the original HadoopOutputFormat should be called in the configure() method of the Flink HadoopOutputFormatBase. Flink calls configure() before open() and finalizeOnMaster(), so that should work. Have you checked if that fixes your problem? If

Re: Make docs searchable

2015-04-01 Thread Fabian Hueske
+1! Would also be good if we could make the documentation less monolithic. There are some really large pages which would benefit from an in-page search ;-) 2015-04-01 13:57 GMT+02:00 Maximilian Michels m...@apache.org: +1 Nice idea! We would just have to filter the /api/java and /api/scala

Re: Reply: [VOTE] Name of Expression API Representation

2015-03-29 Thread Fabian Hueske
is more the pragmatic developer term. (As a reason for my choice) On 25.03.2015 at 20:37, Fabian Hueske [hidden email] wrote: I think the voting scheme is clear. The mail that started the thread says: The name

Re: Problem mvn install

2015-03-02 Thread Fabian Hueske
Hi Matthias, I just checked and could not reproduce the error. The files that Maven RAT complained about do not exist in Flink's master branch. I don't think they are put there as part of the build process. Best, Fabian 2015-03-02 15:09 GMT+01:00 Matthias J. Sax

Re: Flink Master broken...

2015-02-24 Thread Fabian Hueske
The master builds for me as well. Can you try to clone a new copy and do a mvn -DskipTests clean install? 2015-02-24 18:26 GMT+01:00 Matthias J. Sax mj...@informatik.hu-berlin.de: Hi, I build on command line: mjsax@T420s-dbis-mjsax:~/workspace_flink/flink$ git pull flink master From

Re: gelli graph algorithm

2015-02-26 Thread Fabian Hueske
Hi Martin, as a start, there is a PR with Gelly documentation: https://github.com/vasia/flink/blob/gelly-guide/docs/gelly_guide.md Cheers, Fabian 2015-02-26 17:12 GMT+01:00 Martin Neumann mneum...@spotify.com: Hej, I was busy with other stuff for a while but I hope I will have more time to

Re: [DISCUSS] Distributed TPC-H DataGenerator for flink-contrib

2015-03-23 Thread Fabian Hueske
we have to respect. On Wed, Feb 11, 2015 at 2:16 PM, Fabian Hueske fhue...@gmail.com wrote: +1 for reaching out to the TPC. It might also be that it is OK to add the code but not under the name TPC-H. 2015-02-11 13:55 GMT+01:00

Re: Release 0.9.0-milestone-1 preview

2015-04-03 Thread Fabian Hueske
Thanks Robert for pushing this forward. I'd like to have the following issues fixed in the release: - FLINK[1656] by PR #525 - FLINK[1776] by PR #532 - FLINK[1664] by PR #541 - FLINK[1817] by PR #565 - Failed tests on Windows by PR #491 Especially the first two fixes are crucial. They address

Re: NullPointerException in DeltaIteration when no ForwardedFileds annotation

2015-04-03 Thread Fabian Hueske
That looks pretty much like a bug. As you said, fwd fields annotations are optional and may improve the performance of a program, but never change its semantics (if set correctly). I'll have a look at it later. Would be great if you could provide some data to reproduce the bug. On Apr 3, 2015

Re: Storm compatibility layer for Flink (first beta available)

2015-04-02 Thread Fabian Hueske
Hi Matthias, this is really cool! I especially like that you can use Storm code within a Flink streaming program :-) One thing that might be good to do rather soon is to collect all your commits and put them on top of a fresh forked Flink master branch. When merging we cannot change the history

Re: Hadoop ETLing with Flink

2015-04-20 Thread Fabian Hueske
integration option. Although, I think this has not been tried before. So it would be nice to know whether it actually works or not. 2015-04-20 16:44 GMT+02:00 Fabian Hueske fhue...@gmail.com: I agree, that looks very much like a common use case. Right now, there is only support to read from HCatalog

Re: Hadoop ETLing with Flink

2015-04-20 Thread Fabian Hueske
I agree, that looks very much like a common use case. Right now, there is only support to read from HCatalog tables, but not to write data to existing tables or create new ones. Would be a very nice feature to add, IMO. My guess (without having closely looked at the Hadoop HCatOutputFormat) is

Re: Periodic full stream aggregations

2015-04-21 Thread Fabian Hueske
Is it possible to switch the order of the statements, i.e., dataStream.every(Time.of(4,sec)).reduce(...) instead of dataStream.reduce(...).every(Time.of(4,sec)) I think that would be more consistent with the structure of the remaining API. Cheers, Fabian 2015-04-21 10:57 GMT+02:00 Gyula Fóra

Re: [DISCUSS] Flink and Ignite integration

2015-04-29 Thread Fabian Hueske
would be lost. So some kind of disk persistence would be good for certain use cases. 2015-04-29 1:28 GMT+02:00 Dmitriy Setrakyan dsetrak...@apache.org: On Tue, Apr 28, 2015 at 5:55 PM, Fabian Hueske fhue...@gmail.com wrote: Thanks Cos for starting this discussion, hi to the Ignite community

Re: Flink's multi-user support

2015-04-29 Thread Fabian Hueske
components that we are changing right now. 2015-04-29 18:11 GMT+02:00 Stephan Ewen se...@apache.org: Tough question. I'd actually rather go for single user and multi user through YARN, than a not really thought through multi-user version. On Wed, Apr 29, 2015 at 5:51 PM, Fabian Hueske fhue

Re: Flink's multi-user support

2015-04-29 Thread Fabian Hueske
I agree that Flink's multi-user support is not very good at the moment. However, dropping it completely instead of improving it would make Flink setups on dedicated clusters quite useless, right? 2015-04-29 17:33 GMT+02:00 Maximilian Michels m...@apache.org: Hi everyone, Currently Flink

Re: Migrating our website from SVN to Git

2015-04-30 Thread Fabian Hueske
excellent! :-) 2015-04-30 11:47 GMT+02:00 Stephan Ewen se...@apache.org: git for the win! On Thu, Apr 30, 2015 at 11:39 AM, Robert Metzger rmetz...@apache.org wrote: Great, thank you for taking care of this. On Thu, Apr 30, 2015 at 11:29 AM, Maximilian Michels m...@apache.org wrote:

Re: Adding a new operator

2015-04-27 Thread Fabian Hueske
not be a good idea to add it as a Flink operator and we will need to evaluate that (as part of the thesis), so we don't have a JIRA for this :-) -Vasia. On 27 April 2015 at 10:20, Fabian Hueske fhue...@gmail.com wrote: Hi Andra, is there a JIRA for the new runtime

Re: Adding a new operator

2015-04-27 Thread Fabian Hueske
Hi Andra, is there a JIRA for the new runtime operator? Adding a new operator is a lot of work and touches many core parts of the system. It would be good to start a discussion about that early in the process to make sure that the design is aligned with the system. Otherwise, duplicated work

Re: NullPointerException in DeltaIteration when no ForwardedFileds annotation

2015-04-27 Thread Fabian Hueske
2015 at 14:44, Vasiliki Kalavri vasilikikala...@gmail.com wrote: Hi Fabian, thanks for looking into this. Let me know if there's anything I can do to help! Cheers, V. On 3 April 2015 at 22:31, Fabian Hueske fhue...@gmail.com wrote: Thanks for the nice setup! I

Re: New project website

2015-05-11 Thread Fabian Hueske
Hi, I like the new website a lot. Great Job Ufuk! Here are some things that I noticed while checking it out: General: - I find the text area a bit too wide for comfortable reading, especially for long texts such as blog posts or the how to contribute guide. Front page: - Stack Figure: Don't

Re: Migrating our website from SVN to Git

2015-05-12 Thread Fabian Hueske
Thanks Max! Happy to have the website on Git :-) 2015-05-11 18:56 GMT+02:00 Maximilian Michels m...@apache.org: We're now on Git for our website! Instructions for changing the website have been updated in the How to contribute guide:

Re: New project website

2015-05-15 Thread Fabian Hueske
+1 one minor thing (could also be fixed later): The head line says only batch and stream processing. It might be good to add data and scalable or large-scale. 2015-05-15 13:56 GMT+02:00 Kostas Tzoumas ktzou...@apache.org: +1 On Fri, May 15, 2015 at 11:49 AM, Vasiliki Kalavri

Re: Gelly Roadmap

2015-05-18 Thread Fabian Hueske
Integration with Apache TinkerPop3 could also be interesting. TinkerPop3 is an API for transactional and analytical graph processing and supported by several Graph engines/databases. It might be interesting to see if/how Gelly's and TinkerPop's concepts match and think about whether it makes

Re: About Operator and OperatorBase

2015-04-16 Thread Fabian Hueske
Renaming the core operators is fine with me, but I would not touch API facing classes. A big +1 for Timo's suggestion. 2015-04-16 6:30 GMT-05:00 Timo Walther twal...@apache.org: I share Stephans opinion. By the way, we could also find a common name for operators with two inputs. Sometimes

Re: [RESULT] [VOTE] Release Apache Flink 0.9.0-milestone-1 (RC1)

2015-04-12 Thread Fabian Hueske
. I hope that is okay for you. On Thu, Apr 9, 2015 at 9:55 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Do you think it could be possible to include the Hadoop outputFormat fix (FLINK-1828)? On Thu, Apr 9, 2015 at 9:42 AM, Fabian Hueske fhue...@gmail.com

Re: [VOTE] Release Apache Flink 0.9.0-milestone-1 (RC1)

2015-04-09 Thread Fabian Hueske
+1 I ran the following tests. 1. Cygwin/Windows: - start/stop local - run all examples with built-in data from ./bin/flink - run wordcount with built-in data from webclient - run wordcount with external data - start JM + 2TMs, run wordcount from ./bin/flink 2. Windows native (.bat

Re: [DISCUSS] Create a Flink 0.8.2 release

2015-04-12 Thread Fabian Hueske
We should also get the HadoopOF fix in. On Apr 12, 2015 10:14 AM, Robert Metzger rmetz...@apache.org wrote: Hi, in this thread [1] we started a discussion whether we should cut a 0.8.2 release. We have 7 fixes for 0.8.2:

Re: [QUESTION] Sort Key Types

2015-04-07 Thread Fabian Hueske
unmaintainable. On Tue, Apr 7, 2015 at 10:01 AM, Fabian Hueske fhue...@gmail.com wrote: Regular keys differ from sort keys in that they can be (somehow) sorted, but their order is not necessarily intuitive. So regular keys are sufficient for sort-based grouping, but not for explicit sorting

Re: [QUESTION] Sort Key Types

2015-04-07 Thread Fabian Hueske
Regular keys differ from sort keys in that they can be (somehow) sorted, but their order is not necessarily intuitive. So regular keys are sufficient for sort-based grouping, but not for explicit sorting (groupSort, partitionSort, outputSort). Right now, this difference is only relevant for

Re: Parquet Article / Tutorial

2015-04-07 Thread Fabian Hueske
Very nice article! How about adding the full article to the wiki and having a shorter version as a blog post (with a link to the wiki)? Adding the code to contrib would be great! 2015-04-07 12:45 GMT+02:00 Kostas Tzoumas ktzou...@apache.org: Looks very nice! Would love to see a blog post on

Re: Storm compatibility layer for Flink (first beta available)

2015-04-03 Thread Fabian Hueske
, Thanks, this is a really nice contribution. I just scrolled through the code, but I really like it and big thanks for the tests for the examples. The rebase Fabian suggested would help a lot when merging. On Thu, Apr 2, 2015 at 9:19 PM, Fabian Hueske fhue...@gmail.com

Re: NullPointerException in DeltaIteration when no ForwardedFileds annotation

2015-04-03 Thread Fabian Hueske
it separately. The annotation that creates the error is in line #172. Thanks a lot :)) -Vasia. On 3 April 2015 at 13:09, Fabian Hueske fhue...@gmail.com wrote: That looks pretty much like a bug. As you said, fwd fields annotations are optional and may improve the performance of a program

Re: Test sources in wrong folder

2015-04-04 Thread Fabian Hueske
Thanks! 2015-04-04 12:54 GMT+02:00 Flavio Pompermaier pomperma...@okkam.it: I opened a JIRA for this problem https://issues.apache.org/jira/browse/FLINK-1827. Obviously it's an improvement with minor priority but I think this will be a nice fix for users who want to compile java sources

Re: HBase TableOutputFormat fix (Flink 0.8.1)

2015-04-04 Thread Fabian Hueske
User functions are still serialized using Java serialization, not Kryo. Kryo is only used for data exchange at runtime between tasks. If a function such as your MapFunction has a non-serializable member variable, you need to declare it as transient and initialize it before it is executed, e.g.,
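
The example in the message above is cut off after "e.g.,", so the following is only a sketch of the pattern it describes: declare the non-serializable member as transient and initialize it before the function runs. It uses plain Java serialization without Flink. The classes NonSerializableHelper and UpperCaseFunction and the open() hook are hypothetical stand-ins, not the code from the original thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;

// Plain-Java illustration of a transient member that is re-created after shipping.
final class TransientMemberSketch {
    // Stand-in for a helper that does not implement Serializable (hypothetical).
    static final class NonSerializableHelper {
        String shout(String s) {
            return s.toUpperCase(Locale.ROOT);
        }
    }

    // Stand-in for a user function; a real Flink function would extend a Flink class.
    static final class UpperCaseFunction implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: skipped by Java serialization, so it is null after deserialization.
        private transient NonSerializableHelper helper;

        // Assumed lifecycle hook, called once before the first map() call.
        void open() {
            helper = new NonSerializableHelper();
        }

        String map(String value) {
            return helper.shout(value);
        }
    }

    // Serializes and deserializes the function, similar to shipping it to a worker.
    static UpperCaseFunction roundTrip(UpperCaseFunction function) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(function);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (UpperCaseFunction) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    // Ships the function, initializes the transient member, then applies it.
    static String demo(String input) {
        UpperCaseFunction shipped = roundTrip(new UpperCaseFunction());
        shipped.open();
        return shipped.map(input);
    }
}
```

Without the transient modifier, writeObject would fail with a NotSerializableException as soon as the helper field is set; with it, the field has to be re-created on the receiving side before use, which is what the open() call in demo does.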

Fwd: [GitHub] incubator-zeppelin pull request: ZEPPELIN-44 Interpreter for Apach...

2015-05-21 Thread Fabian Hueske
Hi Flink folks, the Flink interpreter PR for Apache Zeppelin is blocked by a failing test case (see below). Does anybody have an idea what is going on and can maybe help to resolve the problem? Thanks, Fabian -- Forwarded message -- From: Leemoonsoo g...@git.apache.org Date:

Re: How do we want to maintain our documentation?

2015-06-03 Thread Fabian Hueske
+1 for Robert's suggestion. In fact, I thought this was already our practice. Also I would not allow exceptions from that rule in the stable codebase. Writing documentation and describing how stuff should be used lets you think about it in a different way and can help to make the feature better.

[DISCUSS] TableAPI renaming toTable

2015-06-05 Thread Fabian Hueske
Hi folks, I thought about renaming the TableEnvironment.toTable() method to TableEnvironment.fromDataSet(). This would be closer to SQL FROM and allow adding other methods like fromCSV(), fromHCat(), fromParquet(), fromORC(), etc. If we decide for the renaming, we should do it before the

Re: ALS implementation

2015-06-05 Thread Fabian Hueske
Hi, the problem with the maximum number of recursions is the distribution of join keys. If a partition does not fit into memory, HybridHashJoin tries to solve this problem by recursively partitioning the partition using a different hash function. If join keys are heavily skewed, this strategy
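
The message above is cut off after "this strategy", but the mechanism it names can be sketched: a partition that does not fit is split again with a different hash function, up to a maximum number of recursions. The code below is a simplified plain-Java model, not Flink's HybridHashJoin. FAN_OUT, the bit-slicing bucket function and the capacity parameter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of recursive re-partitioning with a level-dependent hash.
final class RecursivePartitionSketch {
    // Assumed number of sub-partitions created per recursion level.
    static final int FAN_OUT = 4;

    // Level-dependent "hash": each level looks at a different pair of key bits.
    static int bucket(int key, int level) {
        return (key >>> (2 * level)) & (FAN_OUT - 1);
    }

    // Returns the deepest recursion level needed until every partition holds at
    // most `capacity` keys, or -1 if `maxLevels` is reached first.
    static int levelsNeeded(List<Integer> keys, int capacity, int level, int maxLevels) {
        if (keys.size() <= capacity) {
            return level; // partition fits, no further splitting needed
        }
        if (level >= maxLevels) {
            return -1; // recursion limit hit while the partition is still too large
        }
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < FAN_OUT; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(bucket(key, level)).add(key);
        }
        int deepest = level + 1;
        for (List<Integer> sub : buckets) {
            int depth = levelsNeeded(sub, capacity, level + 1, maxLevels);
            if (depth < 0) {
                return -1;
            }
            deepest = Math.max(deepest, depth);
        }
        return deepest;
    }
}
```

With sixteen distinct keys 0 to 15 and a capacity of 4, one level of partitioning is enough in this model. With ten copies of the same key, every level sends all records to the same bucket, so no choice of hash function can split them and the recursion limit is reached.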

Re: [jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-06-05 Thread Fabian Hueske
The owner of the repository can trigger as many builds on Travis as required including rerunning failed builds. The Apache repository is controlled by the ASF infra team, so we (the Flink community) do not have the rights to retrigger builds. To trigger an initial build on your repository, you

Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Fabian Hueske
Adding one more thing to the list: The code contains a misplaced class (mea culpa) in flink-java, org.apache.flink.api.java.SortPartitionOperator which is API facing and should be moved to the operators package. If we do that after the release, it will break binary compatibility. I created

Re: Build works locally but fails on travis (Storm compatibility)

2015-06-10 Thread Fabian Hueske
Travis caches Maven dependencies and sometimes fails to update them. Try to clear your Travis cache via Settings (top right) - Caches Cheers, Fabian 2015-06-10 14:22 GMT+02:00 Matthias J. Sax mj...@informatik.hu-berlin.de: Hi, the current PR of storm compatibility layer builds successfully on

Re: The correct location for zipWithIndex and zipWithUniqueId

2015-06-10 Thread Fabian Hueske
As Andra said, I would not add it to the API at this point. However, I don't think it should go into a separate Maven module (flink-contrib) that needs to be added as dependency but rather into some DataSetUtils class in flink-java. We can easily add it to the API later, if necessary. We should

Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Fabian Hueske
Without going into the details, how well tested is this feature? The PR only extends one test by a few lines. Is that really enough to ensure that 1) the change does not cause trouble 2) is working as expected If this feature should go into the release, it must be thoroughly checked and we must

Re: Failing tests policy

2015-06-04 Thread Fabian Hueske
I think the problem is less with bugs being introduced by new commits but rather bugs which are already in the code base. 2015-06-04 11:52 GMT+02:00 Matthias J. Sax mj...@informatik.hu-berlin.de: I have another idea: the problem is, that some commit might de-stabilize a former stable test.

Re: Quickstart POMs

2015-06-18 Thread Fabian Hueske
Why? mvn package builds the program correctly, no? On Jun 18, 2015 16:53, Ufuk Celebi u...@apache.org wrote: On 18 Jun 2015, at 16:49, Fabian Hueske fhue...@gmail.com wrote: I don't think that many users care about the internals of the quickstart pom file and are just happy if it works

Re: Flink Runtime Exception

2015-06-19 Thread Fabian Hueske
Whoops, sorry! Whenever I read the word deadlock I get a bit nervous and distracted ;-) 2015-06-19 15:21 GMT+02:00 Till Rohrmann trohrm...@apache.org: I think Andra wrote that there is *no deadlock*. On Fri, Jun 19, 2015 at 3:18 PM Fabian Hueske fhue...@gmail.com http://mailto:fhue

Re: Reduce combiner not chained

2015-06-19 Thread Fabian Hueske
This is not a bug. Chained combiners are not supported for ReduceFunctions yet. :-( I updated the JIRA accordingly. 2015-06-19 13:04 GMT+02:00 Ufuk Celebi u...@apache.org: Hey all, on the current master running the WordCount example with a text file input/output results and a manual reduce

Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Fabian Hueske
Yes, that needs to be fixed IMO 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org: Yes since it is clearly a deadlock in the scheduler, the current version shouldn't be released. On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote: On 10 Jun 2015, at 16:18,

Re: Testing Apache Flink 0.9.0-rc1

2015-06-11 Thread Fabian Hueske
a new release candidate later on. I think we have gotten the most critical issues out of the way. Would that be ok for you? On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com wrote: Yes, that needs to be fixed IMO 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org

Re: Run scala.App on Cluster

2015-06-10 Thread Fabian Hueske
Hi, use ./bin/flink run -c your.MainClass yourJar to specify the Main class. Check the documentation of the CLI client for details. Cheers, Fabian On Jun 10, 2015 22:24, Felix Neutatz neut...@googlemail.com wrote: Hi, I try to run this Scala program:

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Fabian Hueske
I have another fix, but this is just a documentation update (FLINK-2207) and will be done soon. 2015-06-12 10:02 GMT+02:00 Maximilian Michels m...@apache.org: We should have a nightly cluster test for every library. Let's keep that in mind for the future. Very nice find, Till! Since there

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Fabian Hueske
+1 for b) I'm organizing + merging the commits that need to go the new candidate right now. Will let you know, when I am done. 2015-06-12 14:03 GMT+02:00 Till Rohrmann till.rohrm...@gmail.com: I'm in favour of option b) as well. On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org

Re: The correct location for zipWithIndex and zipWithUniqueId

2015-06-12 Thread Fabian Hueske
datasets. On Wed, Jun 10, 2015 at 10:56 AM, Fabian Hueske fhue...@gmail.com wrote: As Andra said, I would not add it to the API at this point. However, I don't think it should go into a separate Maven module (flink-contrib) that needs to be added as dependency but rather

Re: [DISCUSS] Consolidate method naming between the batch and streaming API

2015-06-01 Thread Fabian Hueske
Thanks for bringing up this point! +1 for the renaming. @Marton: Is this a complete list, i.e., did you go through both APIs or might there be more methods that are semantically identical but named differently? 2015-06-01 17:31 GMT+02:00 Gyula Fóra gyf...@apache.org: +1 for the changes

Re: SQL on Flink

2015-05-27 Thread Fabian Hueske
to Flink if Flink doesn't provide enough typing meta-data to do traditional SQL. On Tue, May 26, 2015 at 12:52 PM, Fabian Hueske fhue...@gmail.com wrote: Hi, Flink's Table API is pretty close to what SQL provides. IMO, the best approach would be to leverage

Re: SQL on Flink

2015-05-27 Thread Fabian Hueske
PM, Fabian Hueske fhue...@gmail.com wrote: Hi, Flink's Table API is pretty close to what SQL provides. IMO, the best approach would be to leverage that and build a SQL parser (maybe together with a logical optimizer) on top of the Table API. Parser

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
+1 for both. printLocal() might not be the best name, because local is not well defined and could also be understood as the local machine of the user. How about naming the method completely different (writeToWorkerStdOut()?) to make sure users are not confused with eager and lazy execution?

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
printOnTaskManager() ? On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote: +1 for both. printLocal() might not be the best name, because local is not well defined and could also be understood as the local machine of the user. How about naming the method completely

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
) which still goes to the sysout of where the job is executed. Let's give that one the name printOnTaskManager() and then we should have it... On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske fhue...@gmail.com wrote: I would avoid calling it printXYZ, since print()'s behavior changed

Re: Monitoring a Flink Job

2015-06-30 Thread Fabian Hueske
the computation... On Mon, Jun 29, 2015 at 1:58 PM, Fabian Hueske fhue...@gmail.com wrote: Have you tried to use a custom accumulator that just appends to a list? 2015-06-29 12:59 GMT+02:00 Andra Lungu lungu.an...@gmail.com: Hey Fabian, I am aware of the way open, preSuperstep
