Hi. You should send your PR to the apache/flink-web repository, not your own
flink-web fork.
Regards,
Chiwan Park
On Jun 5, 2015, at 2:46 PM, Lokesh Rajaram rajaram.lok...@gmail.com wrote:
Hello,
For JIRA FLINK-2155 updated the document and created a pull request with
flink-web project as
I think that the NPE in the second condition is a bug in HashTable.
I just found that ConnectedComponents with small memory segments causes the same
error. (I thought I had fixed the bug, but it is still alive.)
Regards,
Chiwan Park
On Jun 5, 2015, at 2:35 AM, Felix Neutatz neut...@googlemail.com wrote:
Hi,
I have the following use case: I want to do regression for a time series
dataset like:
id, x1, x2, ..., xn, y
id = point in time
x = features
y = target value
In the Flink framework I would map this to a LabeledVector (y,
DenseVector(x)). (I don't want to use the id as a feature.)
When I
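The mapping described above (drop the id, keep x1…xn as features and y as the label) can be sketched as follows. `Labeled` is a hypothetical stand-in for FlinkML's LabeledVector, not the real class:

```java
import java.util.Arrays;

public class ParseRow {
    // Hypothetical stand-in for FlinkML's LabeledVector.
    public static class Labeled {
        public final double label;
        public final double[] features;
        public Labeled(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    // Parse a CSV row "id, x1, x2, ..., xn, y": drop the id (point in time),
    // keep x1..xn as the feature vector and y as the target label.
    public static Labeled parseRow(String line) {
        String[] parts = line.split(",");
        double[] values = Arrays.stream(parts)
                .mapToDouble(s -> Double.parseDouble(s.trim()))
                .toArray();
        double label = values[values.length - 1];
        double[] features = Arrays.copyOfRange(values, 1, values.length - 1);
        return new Labeled(label, features);
    }
}
```

In a Flink job the same logic would sit inside a map over the text input, producing one labeled vector per line.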
Hi Felix,
Passing a JoinHint to your function should help.
see:
http://mail-archives.apache.org/mod_mbox/flink-user/201504.mbox/%3ccanc1h_vffbqyyiktzcdpihn09r4he4oluiursjnci_rwc+c...@mail.gmail.com%3E
Cheers,
Andra
On Thu, Jun 4, 2015 at 7:07 PM, Felix Neutatz neut...@googlemail.com
wrote:
I think it is not a problem of join hints, but rather of too little memory
for the join operator. If you set the temporary directory, then the job
will be split into smaller parts and thus each operator gets more memory.
Alternatively, you can increase the memory you give to the Task Managers.
The
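For reference, the two knobs mentioned above would be set in flink-conf.yaml roughly like this (key names as used around Flink 0.9; check the configuration reference for your version):

```yaml
# Heap memory per TaskManager in megabytes: more memory for each operator.
taskmanager.heap.mb: 4096
# Temporary directories used for spilling intermediate results to disk.
taskmanager.tmp.dirs: /tmp/flink-tmp
```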
Wouldn't this kind of cross-task communication break the whole dataflow
abstraction? How can recovery be implemented if we allowed something like
this?
On Thu, Jun 4, 2015 at 5:14 PM, Stephan Ewen se...@apache.org wrote:
That is not what Ufuk said. You can use a singleton auxiliary task that
For linear regression, the main tasks are computing the covariance
matrix and X * y, which can both be parallelized well, and then you
need to solve a linear equation whose dimension is the number
of features. So if the number of features is small, it actually makes
sense to do the setup in
I agree that given a small data set it's probably better to solve the
linear regression problem directly. However, I'm not so sure how well this
performs if the data gets really big (more in terms of number of data
points). But maybe we can find something like a sweet spot when to switch
between
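For reference, the direct (closed-form) solve discussed above is the normal-equations solution. With n data points and d features, forming the Gram matrix costs roughly O(n d^2) and solving it O(d^3), which is why it only pays off when d is small:

```latex
\hat{w} = (X^{\top} X)^{-1} X^{\top} y
```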
On 04 Jun 2015, at 17:02, Maximilian Michels m...@apache.org wrote:
I think ResultPartition is a pretty accurate description of what it is: a
partition of the result of an operator. ResultStream on the other hand,
seems very generic to me. Just because we like to think of Flink nowadays
as a
Aljoscha Krettek created FLINK-2163:
---
Summary: VertexCentricConfigurationITCase sometimes fails on Travis
Key: FLINK-2163
URL: https://issues.apache.org/jira/browse/FLINK-2163
Project: Flink
I think ResultPartition is a pretty accurate description of what it is: a
partition of the result of an operator. ResultStream on the other hand,
seems very generic to me. Just because we like to think of Flink nowadays
as a streaming data flow engine, we don't have to change the core
classes'
That is not what Ufuk said. You can use a singleton auxiliary task that
communicates in both directions with the vertices and acts as a coordinator
between vertices on the same level.
On Thu, Jun 4, 2015 at 2:55 PM, Gyula Fóra gyula.f...@gmail.com wrote:
Thank you!
I was aware of the
I am using Eclipse Kepler.
I tried to replicate the same problem in another workspace.
When I try to test the plugin using a JUnit Plug-in Test, it throws a
ClassNotFoundException.
However, when I try to test it as a plain JUnit test, it works fine.
Am I missing something here?
+1 :-)
On Wed, Jun 3, 2015 at 4:53 PM, Vasiliki Kalavri vasilikikala...@gmail.com
wrote:
Hi Sachin,
great idea to keep a blog! Thanks a lot for sharing :))
-V.
On 3 June 2015 at 16:41, Sachin Goel sachingoel0...@gmail.com wrote:
Hi everyone
I'm maintaining a blog detailing my work
I tend to agree with Ufuk, although it would be nice to fix them very quickly.
On Thu, Jun 4, 2015 at 1:26 AM, Stephan Ewen se...@apache.org wrote:
@matthias: That is the implicit policy right now. Seems not to work...
On Thu, Jun 4, 2015 at 12:40 AM, Matthias J. Sax
I think people should be forced to fix failing tests asap. One way to
go could be to lock the master branch until the test is fixed. If
nobody can push to master, pressure is very high for the responsible
developer to get it done asap. Not sure if this is Apache-compatible.
Just a thought
Hi Admin
Do we have insert, update and remove operations on Apache Flink?
For example: I have 10 million records in my test file. I want to add one
record, update one record and remove one record from this test file.
How to implement it by Flink?
Thanks.
Best regards
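Flink's DataSet is immutable, so there are no in-place insert/update/remove operations; the usual pattern is to derive a new dataset with filter, map, and union. A plain-Java sketch of that pattern (the same operators exist on DataSet; method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class DataSetOps {
    // "Remove": keep everything except the matching record (a filter).
    public static List<String> remove(List<String> records, String victim) {
        List<String> out = new ArrayList<>();
        for (String r : records) if (!r.equals(victim)) out.add(r);
        return out;
    }

    // "Update": rewrite the matching record, keep the rest unchanged (a map).
    public static List<String> update(List<String> records, String old, String replacement) {
        List<String> out = new ArrayList<>();
        for (String r : records) out.add(r.equals(old) ? replacement : r);
        return out;
    }

    // "Insert": union with a one-record dataset.
    public static List<String> insert(List<String> records, String newRecord) {
        List<String> out = new ArrayList<>(records);
        out.add(newRecord);
        return out;
    }
}
```

The transformed dataset is then written out as a new file; the original input file is never modified.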
The tests that Ufuk is referring to are not failing deterministically. This is
about hard-to-debug and hard-to-fix tests where it is not clear who broke them.
Fixing such a test can take several days or even more… So locking the master
branch is not an option IMO.
Deactivating the tests
I'm also in favour of quickly fixing the failing test cases but I think
that blocking the master is a kind of drastic measure. IMO this creates a
culture of blaming someone whereas I would prefer a more proactive
approach. When you see a failing test case and know that someone recently
worked on
Yes, this is indeed a big change, but it was openly discussed multiple
times here on the mailing list and in a number of PRs. I am pretty sure
that we do not want to break the source interface any more, but there is
still some open discussion on it. Let us keep an eye on PR 742 where it is
I am simply thinking about the best way to send data to different subtasks
of the same operator.
Can we go back to the original question? :D
Stephan Ewen se...@apache.org wrote (on Wed, 3 Jun 2015, at 23:45):
I think that it may be a bit pre-mature to invest heavily into the parallel
I have another idea: the problem is that some commit might destabilize
a formerly stable test. This is not detected, because the build was
(accidentally) green and the code is merged.
We could reduce the probability that this happens if a pull request
must pass the test run multiple times (maybe
Thanks for your feedback. I am neither running IPSec nor the aesni-intel module.
So far, I could not reproduce the reordering issue. I have also found that
my code might have created String objects with invalid UTF-16 content in exactly
those jobs that suffered from the reordering. I wanted
I agree. It does not help with the current unstable tests. However, it
can help to prevent running into instability issues in the future.
On 06/04/2015 11:58 AM, Fabian Hueske wrote:
I think the problem is less with bugs being introduced by new commits but
rather bugs which are already in the
There is no lateral communication right now. Typical pattern is to break
it up in two operators that communicate in an all-to-all fashion.
On Thu, Jun 4, 2015 at 11:52 AM, Gyula Fóra gyula.f...@gmail.com wrote:
I am simply thinking about the best way to send data to different subtasks
of the
+1 for your proposed changes, Robert. I would argue that it is even more
crucial that big pull requests contain documentation, because a lot of the time
only the contributor can create this documentation. Additionally,
documentation makes reviewing a pull request much easier.
Fragmented documentation is
Rename what to streams? Do you mean ResultPartition = StreamPartition?
I'm not sure if that makes it easier to understand what the classes do.
On Mon, Jun 1, 2015 at 10:11 AM, Aljoscha Krettek aljos...@apache.org
wrote:
+1
I like it. We are a streaming system underneath after all.
On Jun 1,
Hi,
I played a bit with the ALS recommender algorithm. I used the movielens
dataset: http://files.grouplens.org/datasets/movielens/ml-latest-README.html
The rating matrix has 21,063,128 entries (ratings).
I ran the algorithm with 3 configurations:
1. standard jvm heap space:
val als = ALS()
I think both are bugs. They are triggered by the different memory
configurations.
@chiwan: is the 2nd error fixed by your recent change?
@felix: if yes, can you try the 2nd run again with the changes?
On Thursday, June 4, 2015, Felix Neutatz neut...@googlemail.com wrote:
Hi,
I played a bit
Hi. The second bug is fixed by the recent change in PR.
But there is just no test case for first bug.
Regards,
Chiwan Park
On Jun 4, 2015, at 5:09 PM, Ufuk Celebi u...@apache.org wrote:
I think both are bugs. They are triggered by the different memory
configurations.
@chiwan: is the 2nd
The back-and-forth on the Source interface was unfortunate, yes.
In general, I think that we should not doctor around on other
people's pull requests in semi-secrecy. Some small cosmetic fixes or
rewordings of the commit message are OK. But if the PR needs rework,
then this should be voiced in
Till Rohrmann created FLINK-2156:
Summary: Scala modules cannot create logging file
Key: FLINK-2156
URL: https://issues.apache.org/jira/browse/FLINK-2156
Project: Flink
Issue Type: Bug
If the first error is not fixed by Chiwan's PR, then we should create a JIRA
issue for it so that we do not forget it.
@Felix: Chiwan's PR is here [1]. Could you try to run ALS again with this
version?
Cheers,
Till
[1] https://github.com/apache/flink/pull/751
On Thu, Jun 4, 2015 at 10:10 AM, Chiwan Park
Robert Metzger created FLINK-2158:
-
Summary: NullPointerException in DateSerializer.
Key: FLINK-2158
URL: https://issues.apache.org/jira/browse/FLINK-2158
Project: Flink
Issue Type: Bug
Aljoscha Krettek created FLINK-2160:
---
Summary: Change Streaming Source Interface to run(Context)/cancel()
Key: FLINK-2160
URL: https://issues.apache.org/jira/browse/FLINK-2160
Project: Flink
I think the problem is less with bugs being introduced by new commits but
rather bugs which are already in the code base.
2015-06-04 11:52 GMT+02:00 Matthias J. Sax mj...@informatik.hu-berlin.de:
I have another idea: the problem is, that some commit might de-stabilize
a former stable test.
At the moment the current SGD implementation works like this (modulo
regularization): newWeights = oldWeights - adaptedStepsize *
sumOfGradients / numberOfGradients, where adaptedStepsize =
initialStepsize / sqrt(iterationNumber) and sumOfGradients is the simple sum
of the gradients for all points in the
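The update rule described above, as a runnable plain-Java sketch (names are illustrative, not FlinkML's actual API):

```java
public class SgdStep {
    // One SGD step as described: average the gradients of all points and
    // decay the step size with 1 / sqrt(iterationNumber).
    public static double[] step(double[] oldWeights, double[][] gradients,
                                double initialStepsize, int iterationNumber) {
        double adaptedStepsize = initialStepsize / Math.sqrt(iterationNumber);
        double[] newWeights = oldWeights.clone();
        for (int j = 0; j < oldWeights.length; j++) {
            double sum = 0.0;                         // sumOfGradients
            for (double[] g : gradients) sum += g[j];
            newWeights[j] -= adaptedStepsize * sum / gradients.length;
        }
        return newWeights;
    }
}
```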
Big +1 :)
On 06/04/2015 01:33 PM, Robert Metzger wrote:
I would also say that in particular big changes should include an update to
the documentation as well!
I'll add a rule to the guidelines and I'll start annoying you to write
documentation in pull requests.
On Thu, Jun 4, 2015 at
Thanks for helping us debug this.
You can start many taskmanagers in one JVM, by using the LocalMiniCluster.
Have a look at this (manually triggered) test, which runs 100 TaskManagers
in one JVM:
Resolved in https://issues.apache.org/jira/browse/FLINK-2070.
I'll update the documentation.
On Thu, Jun 4, 2015 at 12:22 AM, Stephan Ewen se...@apache.org wrote:
I'll prepare a fix...
On Wed, Jun 3, 2015 at 10:24 PM, Stephan Ewen se...@apache.org wrote:
+1 for printOnTaskManager(prefix)
Till Rohrmann created FLINK-2162:
Summary: Implement adaptive learning rate strategies for SGD
Key: FLINK-2162
URL: https://issues.apache.org/jira/browse/FLINK-2162
Project: Flink
Issue
On 04 Jun 2015, at 13:10, Maximilian Michels m...@apache.org wrote:
Rename what to streams? Do you mean ResultPartition = StreamPartition?
Exactly along those lines, but maybe ResultStream.
I'm not sure if that makes it easier to understand what the classes do.
It fits better into the
On 04 Jun 2015, at 12:46, Stephan Ewen se...@apache.org wrote:
There is no lateral communication right now. Typical pattern is to break
it up in two operators that communicate in an all-to-all fashion.
You can look at the iteration tasks: the iteration sync task is communicating
with the
I would also say that in particular big changes should include an update to
the documentation as well!
I'll add a rule to the guidelines and I'll start annoying you to write
documentation in pull requests.
On Thu, Jun 4, 2015 at 1:06 PM, Maximilian Michels m...@apache.org wrote:
+1 for your
On 03 Jun 2015, at 17:00, Robert Metzger rmetz...@apache.org wrote:
What is the status of the 0.9 release planning?
It seems like many of the open issues from the document have been closed.
When do you think we will be able to fork off the release-0.9 branch and
create the first RC?
It would
Thanks Stephan for clarifying :)
@kostas: I am just playing around with some ideas. Only in my head so far,
so let's not worry about these things
On Thu, Jun 4, 2015 at 6:33 PM Kostas Tzoumas ktzou...@apache.org wrote:
Wouldn't this kind of cross-task communication break the whole dataflow
It's true that we can and should look into methods to make sgd more
resilient, however, especially for linear regression, which even has a
closed form solution, all this seems too excessive.
I mean in the end, if the number of features is small (let's say fewer
than 2000), the best way is to
On Thu, Jun 4, 2015 at 1:26 PM, Till Rohrmann trohrm...@apache.org wrote:
Maybe also the default learning rate of 0.1 is set too high.
Could be.
But grid search on learning rate is pretty standard practice. Running
multiple learning engines at the same time with different learning rates is
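Grid search over the learning rate, as mentioned, just tries each candidate and keeps the one with the lowest loss. A toy sketch on a simple quadratic objective (the objective and the grid values are made up for illustration):

```java
public class GridSearch {
    // Final loss after 50 gradient-descent steps with the given step size on
    // the toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    public static double finalLoss(double stepsize) {
        double w = 0.0;
        for (int i = 0; i < 50; i++) w -= stepsize * 2.0 * (w - 3.0);
        return (w - 3.0) * (w - 3.0);
    }

    // Try each candidate step size, keep the one with the lowest final loss.
    public static double bestStepsize(double[] grid) {
        double best = grid[0];
        for (double s : grid) if (finalLoss(s) < finalLoss(best)) best = s;
        return best;
    }
}
```

Too small a step barely moves, too large a step oscillates; the grid picks the middle ground automatically.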
Thank you!
I was aware of the iterations as a possibility, but I was wondering if we
might have lateral communications.
Ufuk Celebi u...@apache.org wrote (on Thu, 4 Jun 2015, at 13:29):
On 04 Jun 2015, at 12:46, Stephan Ewen se...@apache.org wrote:
There is no lateral communication
+1 for simple learning for simple cases.
Where normal equations have a reasonable condition number, using them is
good.
For large sparse systems, SGD with Adagrad will crush direct solutions,
however, even for linear problems.
On Thu, Jun 4, 2015 at 2:38 PM, Mikio Braun
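Adagrad, as mentioned above, keeps a per-coordinate accumulator of squared gradients and shrinks the effective step size where gradients have been large. A minimal sketch (not tied to any FlinkML API):

```java
public class Adagrad {
    // One Adagrad step: accumulate the squared gradient per coordinate and
    // divide the learning rate by the square root of that accumulator.
    public static void step(double[] weights, double[] cache,
                            double[] gradient, double eta) {
        double eps = 1e-8;  // avoids division by zero on the first step
        for (int i = 0; i < weights.length; i++) {
            cache[i] += gradient[i] * gradient[i];
            weights[i] -= eta * gradient[i] / (Math.sqrt(cache[i]) + eps);
        }
    }
}
```

Coordinates with rare but informative gradients keep a large effective step size, which is why it helps on large sparse problems.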