Re: ALS implementation

2015-06-05 Thread Stephan Ewen
There are two different issues here: 1) Flink does figure out how much memory a join gets, but that memory may be too little for the join to accept it. Flink plans very conservatively right now, often too conservatively, which is something we have on the immediate roadmap to fix. 2) The Hash Join

Re: QA-Bot

2015-06-05 Thread Robert Metzger
It is getting a -1 because the QA bot is broken. It's just a minor issue, I guess. On Fri, Jun 5, 2015 at 10:46 AM, Ufuk Celebi u...@apache.org wrote: I didn't see these emails before. I think it needs more love at the moment. Your simple docs change (https://github.com/apache/flink/pull/786)

Re: Planning the 0.9 Release

2015-06-05 Thread Robert Metzger
I'll address the remaining documentation issues today. What about - Sync Streaming Java/Scala API - Consolidate names across batch/streaming (discussion) - Merge static code analysis and the gelly TODOs - FLINK-1522 Add tests for the library methods and examples -

[DISCUSS] TableAPI renaming toTable

2015-06-05 Thread Fabian Hueske
Hi folks, I thought about renaming the TableEnvironment.toTable() method to TableEnvironment.fromDataSet(). This would be closer to SQL FROM and allow to add other methods like fromCSV(), fromHCat(), fromParquet(), fromORC(), etc. If we decide for the renaming, we should do it before the

Re: [DISCUSS] TableAPI renaming toTable

2015-06-05 Thread Stephan Ewen
Great! On Fri, Jun 5, 2015 at 10:24 AM, Aljoscha Krettek aljos...@apache.org wrote: Of course, everything that doesn't need windows is supported. On Fri, Jun 5, 2015 at 10:21 AM, Robert Metzger rmetz...@apache.org wrote: +1 I didn't know that the Table API supports DataStreams as well.

Re: pull request for FLINK-2155 documentation

2015-06-05 Thread Lokesh Rajaram
Thanks Chiwan. Fixed it. On Thu, Jun 4, 2015 at 10:48 PM, Chiwan Park chiwanp...@icloud.com wrote: Hi. You should send your PR to the apache/flink-web repository, not your flink-web repository. Regards, Chiwan Park On Jun 5, 2015, at 2:46 PM, Lokesh Rajaram rajaram.lok...@gmail.com wrote:

[jira] [Created] (FLINK-2164) Document batch and streaming startup modes

2015-06-05 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2164: Summary: Document batch and streaming startup modes Key: FLINK-2164 URL: https://issues.apache.org/jira/browse/FLINK-2164 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-2165) Rename Table conversion methods in TableEnvironment

2015-06-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2165: Summary: Rename Table conversion methods in TableEnvironment Key: FLINK-2165 URL: https://issues.apache.org/jira/browse/FLINK-2165 Project: Flink Issue

Re: ALS implementation

2015-06-05 Thread Felix Neutatz
Shouldn't Flink figure out on its own how much memory there is for the join? The detailed trace for the NullPointerException can be found here: https://github.com/FelixNeutatz/IMPRO-3.SS15/blob/8b679f1c2808a2c6d6900824409fbd47e8bed826/NullPointerException.txt Best regards, Felix 2015-06-04

Re: [DISCUSS] TableAPI renaming toTable

2015-06-05 Thread Aljoscha Krettek
Of course, everything that doesn't need windows is supported. On Fri, Jun 5, 2015 at 10:21 AM, Robert Metzger rmetz...@apache.org wrote: +1 I didn't know that the Table API supports DataStreams as well. On Fri, Jun 5, 2015 at 10:18 AM, Aljoscha Krettek aljos...@apache.org wrote: +1, then

QA-Bot

2015-06-05 Thread Robert Metzger
Did they really come to the dev@ list? The QA Bot is automatically testing pull requests for stuff we cannot cover with Maven. The checks the QA bot is performing are in tools/qa-check.sh. Currently, we have checks for: - javadoc errors - compiler warnings - files in lib/ - @author tag I think

Re: ALS implementation

2015-06-05 Thread Fabian Hueske
Hi, the problem with the maximum number of recursions is the distribution of join keys. If a partition does not fit into memory, HybridHashJoin tries to solve this problem by recursively partitioning the partition using a different hash function. If join keys are heavily skewed, this strategy
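
The recursive partitioning described in this message can be illustrated with a small sketch. This is not Flink's HybridHashJoin; the function names, the fan-out, the recursion limit and the use of a level-salted BLAKE2 hash are all assumptions made for illustration. It only shows why re-hashing with a different function cannot help when many records share the same join key.

```python
import hashlib


class RecursionLimitExceeded(RuntimeError):
    """Raised when a partition still does not fit after max_level re-partitionings."""


def bucket_of(key, level, fanout):
    # Salting the hash with the recursion level yields a different
    # hash function at every level (illustrative choice, not Flink's).
    digest = hashlib.blake2b(
        repr(key).encode(), digest_size=8, salt=level.to_bytes(2, "big")
    ).digest()
    return int.from_bytes(digest, "big") % fanout


def partition_recursively(keys, memory_budget, fanout=4, level=0, max_level=3):
    """Split keys until every partition holds at most memory_budget entries."""
    if len(keys) <= memory_budget:
        return [keys]
    if level >= max_level:
        raise RecursionLimitExceeded(
            f"partition of {len(keys)} keys still too large at level {level}"
        )
    buckets = [[] for _ in range(fanout)]
    for key in keys:
        buckets[bucket_of(key, level, fanout)].append(key)
    result = []
    for bucket in buckets:
        if bucket:
            result.extend(
                partition_recursively(bucket, memory_budget, fanout, level + 1, max_level)
            )
    return result
```

With distinct keys, each level spreads the records over several buckets, so the recursion ends quickly. With a heavily skewed input such as a thousand copies of the same key, every level puts all records into one bucket again, and the sketch ends in RecursionLimitExceeded.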

[jira] [Created] (FLINK-2167) Add fromHCat() to TableEnvironment

2015-06-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2167: Summary: Add fromHCat() to TableEnvironment Key: FLINK-2167 URL: https://issues.apache.org/jira/browse/FLINK-2167 Project: Flink Issue Type: New Feature

[jira] [Created] (FLINK-2170) Add fromOrcFile() to TableEnvironment

2015-06-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2170: Summary: Add fromOrcFile() to TableEnvironment Key: FLINK-2170 URL: https://issues.apache.org/jira/browse/FLINK-2170 Project: Flink Issue Type: New Feature

[jira] [Created] (FLINK-2169) Add fromParquet() to TableEnvironment

2015-06-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2169: Summary: Add fromParquet() to TableEnvironment Key: FLINK-2169 URL: https://issues.apache.org/jira/browse/FLINK-2169 Project: Flink Issue Type: New Feature

Re: Planning the 0.9 Release

2015-06-05 Thread Stephan Ewen
Thanks Vasia! We should clearly label Gelly as Work in Progress and at Beta status, then it should be okay. This is very fair, it is the first version, people understand that. In that sense, let us not call Spargel deprecated in favor of Gelly yet, but do that for the next release when Gelly is

Re: Planning the 0.9 Release

2015-06-05 Thread Andra Lungu
The Pregel-like, vertex-centric part of Gelly is as stable as it will ever be. I vote for deprecating Spargel in this release, but keep in mind that this is just an opinion :) On Fri, Jun 5, 2015 at 1:50 PM, Stephan Ewen se...@apache.org wrote: Okay, I was not aware it is only two missing tests.

Re: Planning the 0.9 Release

2015-06-05 Thread Vasiliki Kalavri
Hi, let me clarify: FLINK-1252 is missing a test for PageRank (which might not even be needed, since the implementation is basically identical to the existing Spargel one) and a test for MusicProfiles, which is basically using LabelPropagation (and we have a separate test for this). FLINK-1943 is

Re: QA-Bot

2015-06-05 Thread Matthias J. Sax
Robert, you are right. It is not the dev list, but the issues list. (It says reply-to dev -- I mixed it up). Thanks for the explanation. :) On 06/05/2015 10:50 AM, Robert Metzger wrote: It is getting a -1 because the QA bot is broken. It's just a minor issue, I guess. On Fri, Jun 5, 2015 at 10:46

Re: Planning the 0.9 Release

2015-06-05 Thread Andra Lungu
Hi Stephan, I don't know if I have a say in this, but I will give it a go :) The two unsolved issues don't affect the functionality at all. Gelly can, at the moment, support anything Spargel could. There is a guide in the documentation explaining how to migrate Spargel code to Gelly. I don't

Re: Local Python Test Execution Problem [Bug in Python Layer?]

2015-06-05 Thread Matthias J. Sax
I just figured out that the missing file is placed at a different location: There is: /tmp/flink_data/output But python looks in /tmp/users/1000/flink_data/output (1000 is my Linux user-id) I guess, the test expects that the file is created from the test base class and python looks for the

Re: [jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-06-05 Thread Fabian Hueske
The owner of the repository can trigger as many builds on Travis as required including rerunning failed builds. The Apache repository is controlled by the ASF infra team, so we (the Flink community) do not have the rights to retrigger builds. To trigger an initial build on your repository, you

Re: Planning the 0.9 Release

2015-06-05 Thread Stephan Ewen
I will address the ExecutionGraphDeadlock today... On Fri, Jun 5, 2015 at 1:40 PM, Stephan Ewen se...@apache.org wrote: Thanks Vasia! We should clearly label Gelly as Work in Progress and at Beta status, then it should be okay. This is very fair, it is the first version, people understand

Re: Planning the 0.9 Release

2015-06-05 Thread Stephan Ewen
Okay, I was not aware it is only two missing tests. That is not that big a deal. I am not very attached to the Spargel Stuff, I just want to make sure we do not deprecate something that works well for something that is still work in progress. On Fri, Jun 5, 2015 at 1:46 PM, Andra Lungu

Local Python Test Execution Problem

2015-06-05 Thread Matthias J. Sax
Hi, I have a local setup problem on my Linux machine that makes the Python tests fail. For some reason, it cannot write to /tmp/ file... (see error message below). I can resolve the issue with sudo rm -rf /tmp/*, but this raises other problems on my system. Furthermore, the testing problem is back, after

Re: Commment slaves

2015-06-05 Thread Maximilian Michels
+1 Would be a useful feature! On Fri, Jun 5, 2015 at 3:48 PM, Stephan Ewen se...@apache.org wrote: Definitely. The slaves file is only evaluated in the start-cluster.sh bash script. If you want, you can try and add code to respect comments there. On Fri, Jun 5, 2015 at 3:34 PM, Flavio

Re: Planning the 0.9 Release

2015-06-05 Thread Stephan Ewen
Fair enough about including the issues into 0.9.1 Concerning Gelly, would you recommend people to use that in production today? If not, it would be nice to have some non-deprecated code where we are confident about that. On Fri, Jun 5, 2015 at 2:08 PM, Vasiliki Kalavri vasilikikala...@gmail.com

Re: Commment slaves

2015-06-05 Thread Stephan Ewen
Definitely. The slaves file is only evaluated in the start-cluster.sh bash script. If you want, you can try and add code to respect comments there. On Fri, Jun 5, 2015 at 3:34 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi flinkers, at the moment it's not possible to comment slaves in

Commment slaves

2015-06-05 Thread Flavio Pompermaier
Hi flinkers, at the moment it's not possible to comment out slaves in the slaves config file (e.g. #myserver.org). Don't you think it could be a useful feature? Best, Flavio
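
The requested behaviour can be sketched as a parsing rule. The thread notes that the slaves file is read by the start-cluster.sh bash script, so the Python below is only an illustration of the rule, not a patch for that script. The function name and the handling of trailing inline comments are assumptions.

```python
def parse_slaves(text):
    """Return host names from slaves-style text, ignoring comments and blank lines."""
    hosts = []
    for raw_line in text.splitlines():
        # Drop everything from the first '#' onwards, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts
```

A line such as #myserver.org is skipped entirely, while a line like "host2  # spare" still yields host2.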

Re: Planning the 0.9 Release

2015-06-05 Thread Vasiliki Kalavri
If you want my personal opinion, I'd say yes. I am currently using Gelly for all my projects. For one of them, we have been running experiments over the last 4 months and we'll be deploying it in production very soon :) Gelly did not change any internals or runtime features; it simply builds on

Re: Planning the 0.9 Release

2015-06-05 Thread Stephan Ewen
Okay, then I agree with you! Thanks for clarifying that :-) On Fri, Jun 5, 2015 at 4:13 PM, Vasiliki Kalavri vasilikikala...@gmail.com wrote: If you want my personal opinion, I'd say yes. I am currently using Gelly for all my projects. For one of them, we have been running experiments over

Re: Planning the 0.9 Release

2015-06-05 Thread Maximilian Michels
Hi everyone, I'm excited about the upcoming release. I think a few issues still need to be addressed. At least for me, those were - fixing error messages on builds with JDK 8 - removing Apache Thrift dependencies as of https://issues.apache.org/jira/browse/FLINK-1635 - Possibly fix an issue

[jira] [Created] (FLINK-2175) Allow multiple jobs in single jar file

2015-06-05 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2175: Summary: Allow multiple jobs in single jar file Key: FLINK-2175 URL: https://issues.apache.org/jira/browse/FLINK-2175 Project: Flink Issue Type:

[jira] [Created] (FLINK-2176) Add support for ProgramDesctiption interface in clients

2015-06-05 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2176: Summary: Add support for ProgramDesctiption interface in clients Key: FLINK-2176 URL: https://issues.apache.org/jira/browse/FLINK-2176 Project: Flink

Re: Closing JIRA issues

2015-06-05 Thread Lokesh Rajaram
Thanks Marton. On Fri, Jun 5, 2015 at 8:29 AM, Márton Balassi balassi.mar...@gmail.com wrote: Hey Lokesh, The implicit practice is that the committer merging your PR closes the JIRA. Please do not close the JIRA before your patch is merged. If the committer forgets to close it after your

[jira] [Created] (FLINK-2173) Python used diffente tmp file than Flink

2015-06-05 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2173: Summary: Python used diffente tmp file than Flink Key: FLINK-2173 URL: https://issues.apache.org/jira/browse/FLINK-2173 Project: Flink Issue Type: Bug

Re: Closing JIRA issues

2015-06-05 Thread Maximilian Michels
Hi Lokesh, It depends. If you created the issue and submitted the pull request, then you are free to close the issue. A reviewer or another committer might actually mark the issue as resolved beforehand because he thinks that everything is fixed. However, only the original reporter should close

Re: Closing JIRA issues

2015-06-05 Thread Lokesh Rajaram
Got it. Thanks for the detailed explanation. On Fri, Jun 5, 2015 at 8:34 AM, Maximilian Michels m...@apache.org wrote: Hi Lokesh, It depends. If you created the issue and submitted the pull request, then you are free to close the issue. A reviewer or another committer might actually mark the

[jira] [Created] (FLINK-2172) Stabilize SocketOutputFormatTest

2015-06-05 Thread Márton Balassi (JIRA)
Márton Balassi created FLINK-2172: Summary: Stabilize SocketOutputFormatTest Key: FLINK-2172 URL: https://issues.apache.org/jira/browse/FLINK-2172 Project: Flink Issue Type: Test

[jira] [Created] (FLINK-2174) Allow comments in 'slaves' file

2015-06-05 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2174: Summary: Allow comments in 'slaves' file Key: FLINK-2174 URL: https://issues.apache.org/jira/browse/FLINK-2174 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-2168) Add fromHBase() to TableEnvironment

2015-06-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2168: Summary: Add fromHBase() to TableEnvironment Key: FLINK-2168 URL: https://issues.apache.org/jira/browse/FLINK-2168 Project: Flink Issue Type: New Feature

Re: Planning the 0.9 Release

2015-06-05 Thread Vasiliki Kalavri
Hi all, regarding the 2 gelly issues, I'm sorry but I haven't had time to work on these. And most certainly I won't be able to work on these today :S In any case, I wouldn't consider them blocker issues, so if you agree, please go ahead with the release candidate. -Vasia. On 5 June 2015 at

[jira] [Created] (FLINK-2171) Add instruction to build Flink with Scala 2.11

2015-06-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2171: Summary: Add instruction to build Flink with Scala 2.11 Key: FLINK-2171 URL: https://issues.apache.org/jira/browse/FLINK-2171 Project: Flink Issue Type:

Closing JIRA issues

2015-06-05 Thread Lokesh Rajaram
Hello, Typically when a pull request is accepted for a JIRA issue, can I close the JIRA, or should I wait for the pull request reviewer to close the issue? It's done differently in various teams/projects. I would just like to know how it's done here in the Flink project so that I can follow it for future pull

Re: Local Python Test Execution Problem [Bug in Python Layer?]

2015-06-05 Thread Maximilian Michels
Hi Matthias, Exactly. The Java program gets the temp file path using System.getProperty("java.io.tmpdir") while Python uses the tempfile.gettempdir() method. Turns out, they are semantically different in your case. This is one of the problems with bootstrapping a memory-mapped file communication.
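
The Python side of this difference can be checked with a short sketch. It assumes a setup where the TMPDIR environment variable is set: tempfile.gettempdir() consults TMPDIR, TEMP and TMP before falling back to platform defaults, whereas java.io.tmpdir on Linux usually stays at /tmp unless it is overridden with -Djava.io.tmpdir. The helper name is hypothetical and not part of Flink.

```python
import os
import tempfile


def temp_dir_with_tmpdir(tmpdir_value):
    """Return what tempfile.gettempdir() reports when TMPDIR is set to tmpdir_value."""
    old_env = os.environ.get("TMPDIR")
    old_cached = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = tmpdir_value
        # gettempdir() caches its answer in tempfile.tempdir; clear it to force a new lookup.
        tempfile.tempdir = None
        return tempfile.gettempdir()
    finally:
        # Restore the environment and the cached value so callers are unaffected.
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env
        tempfile.tempdir = old_cached
```

If the reported directory differs from the value of java.io.tmpdir, the two processes look for the shared files in different places, which would match the /tmp/flink_data versus /tmp/users/1000/flink_data paths mentioned earlier in the thread.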

Re: Closing JIRA issues

2015-06-05 Thread Márton Balassi
Hey Lokesh, The implicit practice is that the committer merging your PR closes the JIRA. Please do not close the JIRA before your patch is merged. If the committer forgets to close it after your code is in, you are very welcome to close it yourself. Best, Marton On Fri, Jun 5, 2015 at 5:24 PM,

[jira] [Created] (FLINK-2177) NillPointer in task resource release

2015-06-05 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2177: Summary: NillPointer in task resource release Key: FLINK-2177 URL: https://issues.apache.org/jira/browse/FLINK-2177 Project: Flink Issue Type: Bug

Re: ALS implementation

2015-06-05 Thread Till Rohrmann
I'll look into it to find the responsible join operation. On Jun 5, 2015 10:50 AM, Stephan Ewen se...@apache.org wrote: There are two different issues here: 1) Flink does figure out how much memory a join gets, but that memory may be too little for the join to accept it. Flink plans highly

Travis build issue

2015-06-05 Thread Sachin Goel
The Travis build is failing on SimpleRecoveryITCase. Randomly again, I think. Also on the KafkaITCase. Are these also known issues? Regards Sachin Goel