You are right, to have consistent results we would need to persist the
records.
But since we cannot do that right now, we can still checkpoint all operator
states and understand that inflight records in the loop are lost on failure.
This is acceptable for most of the use-cases that we have.
Fabian Hueske created FLINK-2198:
Summary: Build fails on Windows
Key: FLINK-2198
URL: https://issues.apache.org/jira/browse/FLINK-2198
Project: Flink
Issue Type: Bug
Components:
Regarding the iteration partitioning feature, since I use it of course I
find it very useful, but it is true that it needs to be tested more
extensively and also be discussed by the community before it is added in a
release.
Moreover, given the fact that I can still use it for research purposes (I
I agree with Theo. I think it’s a nice feature to have as part of the
standard API because only few users will be aware of something like
DataSetUtils. However, as a first version we can make it part of
DataSetUtils.
Cheers,
Till
On Wed, Jun 10, 2015 at 11:52 AM Theodore Vasiloudis
I don't understand why having the state inside an iteration but not
the elements that correspond to this state or created this state is
desirable. Maybe an example could help understand this better?
On Wed, Jun 10, 2015 at 11:27 AM, Gyula Fóra gyula.f...@gmail.com wrote:
The other tests verify
Here is an example for you:
Parallel streaming kmeans, the state we keep is the current cluster
centers, and we use iterations to sync the centers across parallel
instances.
We can afford to lose model updates in the loop, but we need to checkpoint the
models.
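The center-sync step Gyula describes could be sketched roughly like this (plain Java, illustrative only; the class name and the averaging strategy are assumptions, not the actual Flink job):

```java
import java.util.Arrays;

// Hypothetical sketch (not the actual streaming k-means job): the
// checkpointed state is the cluster centers; the loop syncs locally
// updated centers across parallel instances, here by averaging them.
public class CenterSync {

    // Average the locally updated centers from each parallel instance.
    static double[][] merge(double[][][] localCenters) {
        int k = localCenters[0].length;
        int dim = localCenters[0][0].length;
        double[][] merged = new double[k][dim];
        for (double[][] local : localCenters) {
            for (int c = 0; c < k; c++) {
                for (int d = 0; d < dim; d++) {
                    merged[c][d] += local[c][d] / localCenters.length;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two parallel instances, two centers each.
        double[][][] locals = {
            {{0.0, 0.0}, {4.0, 4.0}},
            {{2.0, 2.0}, {6.0, 6.0}}
        };
        System.out.println(Arrays.deepToString(merge(locals)));
        // merged centers: [[1.0, 1.0], [5.0, 5.0]]
    }
}
```

If a merged update traveling through the loop is lost on failure, the job falls back to the last checkpointed centers, which is exactly the trade-off described above.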
I don't understand the question; I vote for checkpointing all state in the
job, even inside iterations (it's more of a loop).
Aljoscha Krettek aljos...@apache.org wrote (on 10 Jun 2015, Wed, 12:34):
I don't understand why having the state inside an iteration but not
the elements
Till Rohrmann created FLINK-2194:
Summary: Type extractor does not support Writable type
Key: FLINK-2194
URL: https://issues.apache.org/jira/browse/FLINK-2194
Project: Flink
Issue Type: Bug
Also, I would like to remind everyone that any fault tolerance we provide is
only as good as the fault tolerance of the master node, which is non-existent
at the moment.
So I don't see a reason why a user should not be able to choose whether he
wants state checkpoints for iterations as well.
Adding one more thing to the list:
The code contains a misplaced class (mea culpa) in flink-java,
org.apache.flink.api.java.SortPartitionOperator which is API facing and
should be moved to the operators package. If we do that after the release,
it will break binary compatibility. I created
I disagree. Not having checkpointed operators inside the iteration still
breaks the guarantees.
It is not about the states it is about the loop itself.
On Wed, Jun 10, 2015 at 10:12 AM Aljoscha Krettek aljos...@apache.org
wrote:
This is the answer I gave on the PR (we should have one place for
To continue Gyula's point, for consistent snapshots we need to persist the
records in transit within the loop and also slightly change the current
protocol, since it works only for DAGs. Before going in that direction,
though, I would propose we first see whether there is a nice way to make
Hey Gyula, Max,
On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com wrote:
This feature needs to be included in the release, it has been tested and
used extensively. And many applications depend on it.
It would be nice to announce/discuss this before just cherry-picking it into
the
Hello,
I would like to ask if it would be OK if I added flink-scala as a
dependency to flink-streaming-core. An alternative would be to move
the Scala typeutils to flink-core (to where the Java typeutils are).
Why I need this:
While I am implementing the fast median calculation for windows as
I added a section at the top of the release testing document to keep
track of commits that we might want to cherry-pick to the release. I
included the YARNSessionFIFOITCase fix and the optional stream
iteration partitioning (both already on release branch).
On Wed, Jun 10, 2015 at 12:51 PM,
This seems like a version mismatch. For example,
DataStream.distribute() was changed to DataStream.rebalance()
recently. Maybe your build is getting some outdated jars from the Travis
cache.
On Wed, Jun 10, 2015 at 2:22 PM, Matthias J. Sax
mj...@informatik.hu-berlin.de wrote:
Hi,
the current PR
On 10 Jun 2015, at 14:29, Gyula Fóra gyula.f...@gmail.com wrote:
Max suggested that I add this feature slightly hidden to the execution
config instance.
The problem then is that I either make a public field in the config or once
again add a method.
Any ideas?
I thought about this as
Hey,
As the storm-compatibility-core build goes fine this is a dependency issue
with storm-compatibility-examples. As a first try replace:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-core</artifactId>
    <version>${project.version}</version>
    <scope>test</scope>
</dependency>
Travis caches Maven dependencies and sometimes fails to update them.
Try to clear your Travis cache via Settings (top right) -> Caches.
Cheers, Fabian
2015-06-10 14:22 GMT+02:00 Matthias J. Sax mj...@informatik.hu-berlin.de:
Hi,
the current PR of storm compatibility layer builds successfully on
Then I suggest we leave it in the environment along with the other
checkpointing methods.
I updated my PR so it includes hints how to force enable checkpoints (and
the reduced guarantees) when an error is thrown for iterative jobs.
On Wed, Jun 10, 2015 at 2:46 PM, Aljoscha Krettek
We could add a method on the ExecutionConfig but mark it as deprecated
and explain that it will go away once the interplay of iterations,
state and so on is properly figured out.
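The deprecation approach could look roughly like this (class and method names are illustrative assumptions, not the real ExecutionConfig API):

```java
// Sketch of the suggestion above: expose the switch, but mark it
// deprecated so users know it may go away. Names are hypothetical.
public class IterationConfig {

    private boolean forceCheckpointing = false;

    /**
     * Force-enables checkpoints for iterative jobs despite the reduced
     * guarantees (in-flight records in the loop are lost on failure).
     *
     * @deprecated Will be removed once the interplay of iterations,
     *             state and checkpointing is properly figured out.
     */
    @Deprecated
    public IterationConfig forceCheckpointing(boolean force) {
        this.forceCheckpointing = force;
        return this;
    }

    public boolean isForceCheckpointingEnabled() {
        return forceCheckpointing;
    }
}
```

The @Deprecated annotation makes the compiler warn any caller, which documents the reduced guarantees without blocking the users who need the feature now.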
On Wed, Jun 10, 2015 at 2:36 PM, Ufuk Celebi u...@apache.org wrote:
On 10 Jun 2015, at 14:29, Gyula Fóra
This is the answer I gave on the PR (we should have one place for
discussing this, though):
I would be against merging this in the current form. What I propose is
to analyse the topology to verify that there are no checkpointed
operators inside iterations. Operators before and after iterations
Ufuk Celebi created FLINK-2195:
--
Summary: Set Configuration for Configurable Hadoop InputFormats
Key: FLINK-2195
URL: https://issues.apache.org/jira/browse/FLINK-2195
Project: Flink
Issue Type:
With all the issues discovered, it looks like we'll have another release
candidate. Right now, we have discovered the following problems:
1 YARN ITCase fails [fixed via 2eb5cfe]
2 No Jar for SessionWindowing example [fixed in #809]
3 Wrong description of the input format for the graph examples
I have run mvn clean verify five times now and every time I'm getting
these failed tests:
BlobUtilsTest.before:45 null
BlobUtilsTest.before:45 null
BlobServerDeleteTest.testDeleteFails:291 null
BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not
remove write permissions from
Sebastian Kruse created FLINK-2193:
--
Summary: Partial shuffling
Key: FLINK-2193
URL: https://issues.apache.org/jira/browse/FLINK-2193
Project: Flink
Issue Type: Improvement
I agree that for the sake of the above-mentioned use cases it is reasonable
to add this to the release with the right documentation; for machine
learning, potentially losing one round of feedback data should not matter.
Let us not block prominent users on this until the next release.
On Wed, Jun
Hey everyone,
We needed to assign unique labels as vertex values in Gelly at some point.
We got a nice suggestion on how to do that in parallel (Implemented in
https://github.com/apache/flink/pull/801#issuecomment-110654447).
Now the question is where should these two functions go? Should they
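The parallel-assignment idea from the linked PR discussion can be sketched as follows (illustrative, not the actual Gelly/DataSetUtils code): each parallel instance i out of p emits ids i, i + p, i + 2p, ..., so the ids are unique across instances without any coordination.

```java
import java.util.Arrays;

// Hedged sketch of the unique-label scheme discussed in the thread:
// instance i with parallelism p assigns i, i + p, i + 2p, ... to its
// local elements, guaranteeing global uniqueness without coordination.
public class UniqueLabels {

    static long[] idsFor(int instance, int parallelism, int count) {
        long[] ids = new long[count];
        for (int k = 0; k < count; k++) {
            ids[k] = instance + (long) k * parallelism;
        }
        return ids;
    }

    public static void main(String[] args) {
        // Instance 1 of 4, three elements: ids 1, 5, 9.
        System.out.println(Arrays.toString(idsFor(1, 4, 3)));
    }
}
```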
I agree with Gyula regarding the iteration partitioning.
I have also been using this feature for developing machine learning
algorithms. And I think SAMOA also needs this feature.
Faye
2015-06-10 10:54 GMT+02:00 Gyula Fóra gyula.f...@gmail.com:
This feature needs to be included in the release,
I'm not against including the feature but I'd like to discuss it first. I
believe that only very carefully selected commits should be added to
release-0.9. If that feature happens to be tested extensively and is very
important for user satisfaction, then we might include it.
On Wed, Jun 10, 2015
As Andra said, I would not add it to the API at this point.
However, I don't think it should go into a separate Maven module
(flink-contrib) that needs to be added as dependency but rather into some
DataSetUtils class in flink-java.
We can easily add it to the API later, if necessary. We should
Without going into the details, how well tested is this feature? The PR
only extends one test by a few lines.
Is that really enough to ensure that
1) the change does not cause trouble, and
2) it works as expected?
If this feature should go into the release, it must be thoroughly checked
and we must
Let's mark the method of the environment as deprecated like Aljoscha
suggested. Then I think we could merge it.
On Wed, Jun 10, 2015 at 2:50 PM, Gyula Fóra gyula.f...@gmail.com wrote:
Then I suggest we leave it in the environment along with the other
checkpointing methods.
I updated my PR so
I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its
cause but still need to find out how to fix it.
On Wed, Jun 10, 2015 at 2:25 PM, Aljoscha Krettek aljos...@apache.org
wrote:
I added a section at the top of the release testing document to keep
track of commits that we
If they can be easily converted to Java code, that would be a good solution.
On Wed, 10 Jun 2015 at 15:56 Gábor Gévay gga...@gmail.com wrote:
it does not feel right to add an API package to a core package
Yes, that makes sense. I just tried removing the flink-java dependency
from
Btw: I noticed that all streaming modules depend on flink-core,
flink-runtime, flink-clients and flink-java. Is there a particular reason
why the streaming connectors depend on flink-clients and flink-java?
On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann trohrm...@apache.org wrote:
I see the
Philipp Götze created FLINK-2200:
Summary: Flink API with Scala 2.11 - Maven Repository
Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Done, I will merge it after Travis passes.
Maximilian Michels m...@apache.org wrote (on 10 Jun 2015, Wed, 15:25):
Let's mark the method of the environment as deprecated like Aljoscha
suggested. Then I think we could merge it.
On Wed, Jun 10, 2015 at 2:50 PM, Gyula Fóra
it does not feel right to add an API package to a core package
Yes, that makes sense. I just tried removing the flink-java dependency
from flink-streaming to see what needs it, and it builds fine without
it :)
What do you think about the second option? (to move the Scala
typeutils (or just
I see four options to solve this without adding the dependency:
1. Move CaseClassTypeInfo and CaseClassComparator to flink-core. Till
said that we want to avoid mixed Scala and Java modules, which rules
this out.
2. Create a new toplevel maven project scala-core, and move things there.
3. Hacky
As for people currently suffering from it:
An application King is developing requires iterations, and they need
checkpoints. Practically all SAMOA programs would need this.
It is very likely that the state interfaces will be changed after the
release, so this is not something that we can just
This doesn't look good, yes.
On Wed, Jun 10, 2015 at 1:32 AM, Ufuk Celebi u...@apache.org wrote:
While looking into FLINK-2188 (HBase input) I've discovered that Hadoop
input formats implementing Configurable (like mapreduce.TableInputFormat)
don't have the Hadoop configuration set via
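The fix under discussion amounts to checking for the Configurable interface and passing the configuration explicitly. A self-contained sketch with stand-in interfaces (the real types are org.apache.hadoop.conf.Configurable and Configuration):

```java
// Hedged sketch of the FLINK-2195 fix idea: Hadoop input formats that
// implement Configurable (like mapreduce.TableInputFormat) expect the
// Configuration to be handed to them via setConf(). Stand-in interfaces
// keep this snippet self-contained.
public class ConfigureFormats {

    interface Configuration {}

    interface Configurable {
        void setConf(Configuration conf);
    }

    // Pass the Hadoop configuration to formats that expect it; other
    // formats are left untouched.
    static void configureIfNeeded(Object inputFormat, Configuration conf) {
        if (inputFormat instanceof Configurable) {
            ((Configurable) inputFormat).setConf(conf);
        }
    }

    public static void main(String[] args) {
        class TableFormat implements Configurable {
            Configuration conf;
            public void setConf(Configuration conf) { this.conf = conf; }
        }
        TableFormat format = new TableFormat();
        Configuration conf = new Configuration() {};
        configureIfNeeded(format, conf);
        System.out.println(format.conf == conf); // true
    }
}
```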
Yes, that needs to be fixed IMO
2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org:
Yes since it is clearly a deadlock in the scheduler, the current version
shouldn't be released.
On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote:
On 10 Jun 2015, at 16:18,
Thanks for spotting the documentation issues. I'm fixing them quickly for
the release then.
The RebalancePartitioner indeed is obfuscated, let me see what can be done
there.
On Wed, Jun 10, 2015 at 6:34 PM, Matthias J. Sax
mj...@informatik.hu-berlin.de wrote:
Thanks!
About shuffle() vs
Thanks!
About shuffle() vs rebalance(): I would suggest to explain the
difference (random vs round-robin) in the JavaDoc of DataStream.
Furthermore, I was wondering if the JavaDoc for @return is correct for
forward(), rebalance(), and global(). They all state
@return The DataStream with
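The random-vs-round-robin distinction Matthias asks about could be illustrated like this (a sketch, not Flink's actual partitioner code):

```java
import java.util.Random;

// Illustrative sketch of the difference discussed above: shuffle()
// picks a random output channel per record, while rebalance() cycles
// through the channels round-robin.
public class ChannelSelection {

    // Round-robin: deterministic cycle over the output channels.
    static int nextRoundRobin(int lastChannel, int numChannels) {
        return (lastChannel + 1) % numChannels;
    }

    // Random: uniformly chosen channel, no ordering guarantee.
    static int nextRandom(Random rnd, int numChannels) {
        return rnd.nextInt(numChannels);
    }

    public static void main(String[] args) {
        int channel = -1;
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            channel = nextRoundRobin(channel, 4);
            order.append(channel).append(' ');
        }
        System.out.println(order); // 0 1 2 3 0 1
    }
}
```

Both spread records evenly in expectation, but only round-robin gives an exactly balanced distribution per cycle, which is why documenting the difference in the DataStream JavaDoc makes sense.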
Hi,
I try to run this Scala program:
https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/main/scala/io/sanfran/wikiTrends/extraction/flink/DownloadTopKPages.scala
on a cluster.
I tried this command:
/share/flink/flink-0.9-SNAPSHOT/bin/flink run
Hi,
use ./bin/flink run -c your.MainClass yourJar to specify the Main class.
Check the documentation of the CLI client for details.
Cheers, Fabian
On Jun 10, 2015 22:24, Felix Neutatz neut...@googlemail.com wrote:
Hi,
I try to run this Scala program:
Thanks :) works like a charm.
2015-06-10 22:28 GMT+02:00 Fabian Hueske fhue...@gmail.com:
Hi,
use ./bin/flink run -c your.MainClass yourJar to specify the Main class.
Check the documentation of the CLI client for details.
Cheers, Fabian
On Jun 10, 2015 22:24, Felix Neutatz
It doesn't evaluate the member forward, but it calls the super constructor
with a partitioning strategy that depends on the forward parameter.
That's how it works.
On Wed, 10 Jun 2015 at 18:51 Márton Balassi balassi.mar...@gmail.com
wrote:
Thanks for spotting the documentation issues. I'm
I am not sure about this... You are right about the super constructor,
however, selectChannels(...) does not call super.getStrategy(), which is
the only way to get back the value set in the super class (i.e.,
StreamPartitioner.strategy).
selectChannels() computes the return value independently from
Yes since it is clearly a deadlock in the scheduler, the current version
shouldn't be released.
On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote:
On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote:
I'm debugging the TaskManagerFailsWithSlotSharingITCase.
In StreamingJobGraphGenerator.connect(headOfChain, edge) it checks the
strategy. If it is FORWARD it only does a POINTWISE connection to the
low-level downstream vertex. I know, this is all very unclear... :D
On Thu, 11 Jun 2015 at 00:13 Matthias J. Sax mj...@informatik.hu-berlin.de
wrote:
I am
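The wiring difference described in that explanation can be sketched by counting channels (illustrative only, not the StreamingJobGraphGenerator code): POINTWISE connects each upstream subtask to one downstream subtask, while partitioning strategies connect every upstream subtask to every downstream one.

```java
// Rough sketch of POINTWISE vs. all-to-all wiring between two vertices:
// FORWARD gets a pointwise pattern (one channel per subtask pair),
// partitioning strategies get the full bipartite connection.
public class ConnectionPatterns {

    // Number of channels with pointwise wiring (equal parallelism).
    static int pointwiseEdges(int parallelism) {
        return parallelism;
    }

    // Number of channels with all-to-all wiring.
    static int allToAllEdges(int upstream, int downstream) {
        return upstream * downstream;
    }

    public static void main(String[] args) {
        System.out.println(pointwiseEdges(4));   // 4
        System.out.println(allToAllEdges(4, 4)); // 16
    }
}
```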
Márton Balassi created FLINK-2201:
-
Summary: Inconsistent use of ClosureCleaner in streaming window
helpers
Key: FLINK-2201
URL: https://issues.apache.org/jira/browse/FLINK-2201
Project: Flink