[jira] [Resolved] (FLINK-1942) Add configuration options to Gelly-GSA

2015-05-17 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-1942.
--
   Resolution: Fixed
Fix Version/s: 0.9

 Add configuration options to Gelly-GSA
 --

 Key: FLINK-1942
 URL: https://issues.apache.org/jira/browse/FLINK-1942
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Andra Lungu
 Fix For: 0.9


 Currently, it is not possible to configure a GSA iteration. Similarly to 
 vertex-centric, we should allow setting the iteration name and degree of 
 parallelism, aggregators, broadcast variables and whether the solution set is 
 kept in unmanaged memory.
 The docs should be updated accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1942) Add configuration options to Gelly-GSA

2015-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547236#comment-14547236
 ] 

ASF GitHub Bot commented on FLINK-1942:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/635


 Add configuration options to Gelly-GSA
 --

 Key: FLINK-1942
 URL: https://issues.apache.org/jira/browse/FLINK-1942
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Andra Lungu

 Currently, it is not possible to configure a GSA iteration. Similarly to 
 vertex-centric, we should allow setting the iteration name and degree of 
 parallelism, aggregators, broadcast variables and whether the solution set is 
 kept in unmanaged memory.
 The docs should be updated accordingly.





[GitHub] flink pull request: [FLINK-1942][gelly] GSA Iteration Configuratio...

2015-05-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/635


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [streaming] Fast calculation of medians of win...

2015-05-17 Thread ggevay
GitHub user ggevay opened a pull request:

https://github.com/apache/flink/pull/684

[streaming] Fast calculation of medians of windows [WIP]

I started working on my Google Summer of Code project. I decided to start 
with the fast calculation of medians of windows. The MedianPreReducer keeps 
track of the median at every window change by keeping the lower and upper 
halves of the window in two multisets. Each update takes time logarithmic in 
the current window size. (Adding an element to or removing an element from 
the window involves moving at most one of the other elements between the lower 
and upper halves.)
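
The two-multiset idea described above can be sketched roughly as follows. This
is only an illustration, not the MedianPreReducer from the pull request: the
class name SlidingMedian, its method names, and the choice to return the mean
of the two middle elements for an even-sized window are all assumptions. Each
multiset is modelled as a java.util.TreeMap from value to multiplicity.

```java
import java.util.TreeMap;

/** Hypothetical sketch of a two-multiset median; not the code from the pull request. */
final class SlidingMedian {
    // Each TreeMap maps a value to its multiplicity, so it acts as a sorted multiset.
    private final TreeMap<Double, Integer> lower = new TreeMap<>(); // smaller half
    private final TreeMap<Double, Integer> upper = new TreeMap<>(); // larger half
    private int lowerSize = 0;
    private int upperSize = 0;

    /** Adds one element to the window in O(log n). */
    void add(double x) {
        if (lowerSize == 0 || x <= lower.lastKey()) {
            lower.merge(x, 1, Integer::sum);
            lowerSize++;
        } else {
            upper.merge(x, 1, Integer::sum);
            upperSize++;
        }
        rebalance();
    }

    /** Removes one occurrence of x; returns false if x is not in the window. */
    boolean remove(double x) {
        if (lower.containsKey(x)) {
            decrement(lower, x);
            lowerSize--;
        } else if (upper.containsKey(x)) {
            decrement(upper, x);
            upperSize--;
        } else {
            return false;
        }
        rebalance();
        return true;
    }

    /** Assumption: for an even-sized window, return the mean of the two middle elements. */
    double median() {
        if (lowerSize == 0) {
            throw new IllegalStateException("empty window");
        }
        if (lowerSize > upperSize) {
            return lower.lastKey();
        }
        return (lower.lastKey() + upper.firstKey()) / 2.0;
    }

    // Restores lowerSize == upperSize or lowerSize == upperSize + 1.
    // After a single add or remove, at most one element has to move.
    private void rebalance() {
        if (lowerSize > upperSize + 1) {
            double moved = lower.lastKey();
            decrement(lower, moved);
            lowerSize--;
            upper.merge(moved, 1, Integer::sum);
            upperSize++;
        } else if (upperSize > lowerSize) {
            double moved = upper.firstKey();
            decrement(upper, moved);
            upperSize--;
            lower.merge(moved, 1, Integer::sum);
            lowerSize++;
        }
    }

    private static void decrement(TreeMap<Double, Integer> set, double key) {
        // Drop the entry entirely when its count reaches zero.
        set.computeIfPresent(key, (k, count) -> count == 1 ? null : count - 1);
    }
}
```

Every operation touches only the largest key of the lower half or the smallest
key of the upper half, which is where the logarithmic update cost comes from.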

The parameters of WindowedDataStream.median are the same as for the sum, 
min, max, etc. methods. (But here, the specified field can only be of Double 
type.) I implemented a few helper classes for accessing the field that is 
specified by these parameters. I think that the other methods with similar 
field selection (sum, min, max, etc.) could also be refactored to use these 
helper classes, so that the code duplication in SumAggregator and 
ComparableAggregator could be eliminated (the duplication of both the field 
access logic, and the logic of the byAggregate comparisons). Should I do this 
refactoring?

Currently, the implementation only handles non-grouped data streams, but I 
plan to create a GroupedMedianPreReducer (hence the [WIP] flag).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ggevay/flink median

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #684


commit 7131289e9f7ab56ec82f12c3a25f146a17354f61
Author: Gabor Gevay gga...@gmail.com
Date:   2015-05-17T23:59:31Z

[streaming] Fast calculation of medians of windows






[GitHub] flink pull request: [Flink-1985] Streaming does not correctly forw...

2015-05-17 Thread mbalassi
Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/682#issuecomment-102773476
  
Thanks, nice fix. I made one small amendment: I used `TestStreamEnvironment` 
instead of the local environment. The former lets you force a given 
parallelism regardless of the machine. It is not really relevant in this case, 
but it is good general practice to always run the environment-related tests in 
parallel.




[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes

2015-05-17 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547105#comment-14547105
 ] 

Alexander Alexandrov commented on FLINK-2023:
-

I suggest adding a thin Scala layer over the Gelly Java API that implicitly 
provides the TypeInformations (similar to what [~aljoscha] has done with the 
DataSet Scala API).

 TypeExtractor does not work for (some) Scala Classes
 

 Key: FLINK-2023
 URL: https://issues.apache.org/jira/browse/FLINK-2023
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Reporter: Aljoscha Krettek

 [~vanaepi] discovered some problems while working on the Scala Gelly API 
 where, for example, a Scala MapFunction can not be correctly analyzed by the 
 type extractor. For example, generic types will not be correctly detected.





[jira] [Resolved] (FLINK-1260) Add custom partitioning to documentation

2015-05-17 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1260.
---
   Resolution: Fixed
Fix Version/s: 0.9
 Assignee: Robert Metzger

Resolved as part of http://git-wip-us.apache.org/repos/asf/flink/commit/b335f587

 Add custom partitioning to documentation
 

 Key: FLINK-1260
 URL: https://issues.apache.org/jira/browse/FLINK-1260
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Java API, Scala API
Affects Versions: 0.7.0-incubating
Reporter: Fabian Hueske
Assignee: Robert Metzger
Priority: Minor
 Fix For: 0.9


 The APIs allow defining a custom partitioner to manually fix data skew.
 This feature is not documented.
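
 As a rough illustration of what a skew-fixing partitioner might look like, here
 is a self-contained sketch. The KeyPartitioner interface below is a
 hypothetical stand-in defined only for this example (Flink's own partitioner
 interface is not reproduced here), and HotKeyPartitioner and its pinned-key
 map are likewise assumptions. The idea shown is to pin known heavy keys to
 chosen partitions and hash all other keys.

```java
import java.util.Map;

/** Hypothetical stand-in interface for this sketch; not Flink's actual API. */
interface KeyPartitioner<K> {
    int partition(K key, int numPartitions);
}

/** Pins known heavy keys to fixed partitions and hashes all other keys. */
final class HotKeyPartitioner implements KeyPartitioner<String> {
    private final Map<String, Integer> pinned;

    HotKeyPartitioner(Map<String, Integer> pinned) {
        this.pinned = pinned;
    }

    @Override
    public int partition(String key, int numPartitions) {
        Integer fixed = pinned.get(key);
        if (fixed != null) {
            // Keep the pinned target within the valid partition range.
            return Math.floorMod(fixed, numPartitions);
        }
        // floorMod avoids negative results for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

 With such a partitioner, two heavy keys that would otherwise hash to the same
 partition can be assigned to different ones by hand.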





[GitHub] flink pull request: [Flink-1985] Streaming does not correctly forw...

2015-05-17 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/682#issuecomment-102776567
  
Thank you for the contribution.
I would like to wait for some more feedback from other committers before we 
merge this.




[GitHub] flink pull request: [Flink-1985] Streaming does not correctly forw...

2015-05-17 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/682#discussion_r30469078
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java ---
@@ -543,6 +542,144 @@ public void disableAutoTypeRegistration() {
this.disableAutoTypeRegistration = false;
}
 
+   @Override
+   public int hashCode() {
+   final int prime = 31;
+   int result = 1;
+   result = prime
+   * result
+   + ((defaultKryoSerializerClasses == null) ? 0
+   : 
defaultKryoSerializerClasses.hashCode());
+   result = prime
+   * result
+   + ((defaultKryoSerializers == null) ? 0
+   : 
defaultKryoSerializers.hashCode());
+   result = prime * result + (disableAutoTypeRegistration ? 1231 : 
1237);
+   result = prime * result
+   + ((executionMode == null) ? 0 : 
executionMode.hashCode());
+   result = prime * result + (forceAvro ? 1231 : 1237);
+   result = prime * result + (forceKryo ? 1231 : 1237);
+   result = prime
+   * result
+   + ((globalJobParameters == null) ? 0 : 
globalJobParameters
+   .hashCode());
+   result = prime * result + numberOfExecutionRetries;
+   result = prime * result + (objectReuse ? 1231 : 1237);
+   result = prime * result + parallelism;
+   result = prime * result + (printProgressDuringExecution ? 1231 
: 1237);
+   result = prime
+   * result
+   + ((registeredKryoTypes == null) ? 0 : 
registeredKryoTypes
+   .hashCode());
+   result = prime
+   * result
+   + ((registeredPojoTypes == null) ? 0 : 
registeredPojoTypes
+   .hashCode());
+   result = prime
+   * result
+   + ((registeredTypesWithKryoSerializerClasses == 
null) ? 0
+   : 
registeredTypesWithKryoSerializerClasses.hashCode());
+   result = prime
+   * result
+   + ((registeredTypesWithKryoSerializers == null) 
? 0
+   : 
registeredTypesWithKryoSerializers.hashCode());
+   result = prime * result + (useClosureCleaner ? 1231 : 1237);
+   return result;
+   }
+
+   @Override
+   public boolean equals(Object obj) {
+   if (this == obj) {
+   return true;
+   }
+   if (obj == null) {
+   return false;
+   }
+   if (getClass() != obj.getClass()) {
+   return false;
+   }
+   ExecutionConfig other = (ExecutionConfig) obj;
+   if (defaultKryoSerializerClasses == null) {
+   if (other.defaultKryoSerializerClasses != null) {
+   return false;
+   }
+   } else if (!defaultKryoSerializerClasses
+   .equals(other.defaultKryoSerializerClasses)) {
+   return false;
+   }
+   if (defaultKryoSerializers == null) {
+   if (other.defaultKryoSerializers != null) {
+   return false;
+   }
+   } else if 
(!defaultKryoSerializers.equals(other.defaultKryoSerializers)) {
+   return false;
+   }
+   if (disableAutoTypeRegistration != 
other.disableAutoTypeRegistration) {
+   return false;
+   }
+   if (executionMode != other.executionMode) {
+   return false;
+   }
+   if (forceAvro != other.forceAvro) {
+   return false;
+   }
+   if (forceKryo != other.forceKryo) {
+   return false;
+   }
+   if (globalJobParameters == null) {
+   if (other.globalJobParameters != null) {
+   return false;
+   }
+   } else if 
(!globalJobParameters.equals(other.globalJobParameters)) {
+   return false;
+   }
+   if (numberOfExecutionRetries != other.numberOfExecutionRetries) 
{
+   

[GitHub] flink pull request: [Flink-1985] Streaming does not correctly forw...

2015-05-17 Thread mbalassi
Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/682#issuecomment-102779406
  
Ok, I am not pushing until we get some more feedback then.




[GitHub] flink pull request: [Flink-1985] Streaming does not correctly forw...

2015-05-17 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/682#issuecomment-102780139
  
Changed LocalStreamEnvironment to TestStreamEnvironment.




[jira] [Commented] (FLINK-1985) Streaming does not correctly forward ExecutionConfig to runtime

2015-05-17 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547107#comment-14547107
 ] 

Robert Metzger commented on FLINK-1985:
---

[~mjsax]: The issue is assigned to [~aljoscha]. I'm not 100% sure, but I think I 
know from an offline discussion with him that he already has a fix for this in 
one of his open pull requests.
Let's wait for his comment on this.

I don't see why the variable is unused? The 
{{config.isAutoTypeRegistrationDisabled()}} is called in 
{{ExecutionEnvironment}}. So there is at least one piece of code which is using 
the variable.


 Streaming does not correctly forward ExecutionConfig to runtime
 ---

 Key: FLINK-1985
 URL: https://issues.apache.org/jira/browse/FLINK-1985
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker

 When running streaming jobs you see this log entry:
 Environment did not contain an ExecutionConfig - using a default config.
 Some parts of the code use an ExecutionConfig at runtime. This will be a 
 default config without registered serializers and other user settings.





[jira] [Commented] (FLINK-2031) Missing images in blog posts

2015-05-17 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547110#comment-14547110
 ] 

Robert Metzger commented on FLINK-2031:
---

A user also reported numerous broken links in the blog posts due to the 
reorganization of the documentation.
I think somebody has to go over all the posts and fix them.

 Missing images in blog posts
 

 Key: FLINK-2031
 URL: https://issues.apache.org/jira/browse/FLINK-2031
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Fabian Hueske
  Labels: starter

 Images are missing in a couple of blog posts after the migration:
 - Introducing Flink Streaming
 - Peeking into Flink's Engine Room
 - Hadoop Compatibility in Flink
 - maybe others as well





[jira] [Updated] (FLINK-2031) Missing images in blog posts

2015-05-17 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-2031:
--
Labels: starter  (was: )

 Missing images in blog posts
 

 Key: FLINK-2031
 URL: https://issues.apache.org/jira/browse/FLINK-2031
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Fabian Hueske
  Labels: starter

 Images are missing in a couple of blog posts after the migration:
 - Introducing Flink Streaming
 - Peeking into Flink's Engine Room
 - Hadoop Compatibility in Flink
 - maybe others as well





[jira] [Commented] (FLINK-1985) Streaming does not correctly forward ExecutionConfig to runtime

2015-05-17 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547111#comment-14547111
 ] 

Matthias J. Sax commented on FLINK-1985:


Well, the default value of the variable is false and there is no way to set 
it to true. There is no method enableAutoTypeRegistration().
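
The point can be illustrated with a minimal stand-in class. This is a
hypothetical sketch, not the real ExecutionConfig: the class name
AutoTypeRegistrationFlag is invented, and only the field, the
disableAutoTypeRegistration() body shown in the diff context of pull request
682, and the isAutoTypeRegistrationDisabled() getter mentioned earlier in this
thread are mirrored.

```java
/** Hypothetical stand-in for the relevant part of ExecutionConfig. */
final class AutoTypeRegistrationFlag {
    // Default value is false, as described in the comment above.
    private boolean disableAutoTypeRegistration = false;

    // Mirrors the diff context: the "disable" method assigns false,
    // so calling it leaves the flag at its default value.
    void disableAutoTypeRegistration() {
        this.disableAutoTypeRegistration = false;
    }

    boolean isAutoTypeRegistrationDisabled() {
        return disableAutoTypeRegistration;
    }
}
```

In this sketch the getter returns false both before and after the call, so the
flag is read but can never change. If the method is meant to switch
registration off, assigning true would presumably be the fix; that is an
assumption, not something confirmed in this thread.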

 Streaming does not correctly forward ExecutionConfig to runtime
 ---

 Key: FLINK-1985
 URL: https://issues.apache.org/jira/browse/FLINK-1985
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker

 When running streaming jobs you see this log entry:
 Environment did not contain an ExecutionConfig - using a default config.
 Some parts of the code use an ExecutionConfig at runtime. This will be a 
 default config without registered serializers and other user settings.





[GitHub] flink pull request: [Flink-1985] Streaming does not correctly forw...

2015-05-17 Thread mjsax
Github user mjsax commented on a diff in the pull request:

https://github.com/apache/flink/pull/682#discussion_r30469207
  

[jira] [Comment Edited] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-17 Thread Hae Joon Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547149#comment-14547149
 ] 

Hae Joon Lee edited comment on FLINK-1731 at 5/17/15 12:10 PM:
---

Hi, I am testing k-means right now.
I ran into the error "could not find implicit value for parameter".
I have solved everything else except this one.

* Error:(142, 53) could not find implicit value for parameter op: 
breeze.linalg.operators.OpDiv.Impl2[breeze.linalg.Vector[Double],Long,That]
.map(x => LabeledVector(x._1, x._2.asBreeze / 1L)).withForwardedFields("_1->id")

{code:title=KMeans.scala|borderStyle=solid}
val finalCentroids = centroids.iterate(numIterations) { currentCentroids =>
  val newCentroids: DataSet[LabeledVector] = input
    .map(new SelectNearestCenterMapper).withBroadcastSet(currentCentroids, CENTROIDS)
    .map(x => (x.label, x.vector, 1L)).withForwardedFields("_1; _2")
    .groupBy(x => x._1)
    .reduce((p1, p2) => (p1._1, (p1._2.asBreeze + p2._2.asBreeze).fromBreeze, p1._3 + p2._3)).withForwardedFields("_1")
    .map(x => LabeledVector(x._1, x._2.asBreeze :/ x._3)).withForwardedFields("_1->id")
  newCentroids
}
{code}

As far as I know, the error "could not find implicit value for parameter" can 
be solved by adding the exact import.
 
I think the error comes from the :/ operator, so I also added 'import 
breeze.linalg.operators._' to the imports, but it does not work. 
Have you ever seen this kind of error before?




was (Author: philjjoon):
Hi, I am testing k-means right now.
I ran into the error "could not find implicit value for parameter".
I have solved everything else except this one.

* Error:(142, 53) could not find implicit value for parameter op: 
breeze.linalg.operators.OpDiv.Impl2[breeze.linalg.Vector[Double],Long,That]
.map(x => LabeledVector(x._1, x._2.asBreeze / 1L)).withForwardedFields("_1->id")

{code:title=KMeans.scala|borderStyle=solid}
val finalCentroids = centroids.iterate(numIterations) { currentCentroids =>
  val newCentroids: DataSet[LabeledVector] = input
    .map(new SelectNearestCenterMapper).withBroadcastSet(currentCentroids, CENTROIDS)
    .map(x => (x.label, x.vector, 1L)).withForwardedFields("_1; _2")
    .groupBy(x => x._1)
    .reduce((p1, p2) => (p1._1, (p1._2.asBreeze + p2._2.asBreeze).fromBreeze, p1._3 + p2._3)).withForwardedFields("_1")
    .map(x => LabeledVector(x._1, x._2.asBreeze :/ x._3)).withForwardedFields("_1->id")
  newCentroids
}
{code}

As far as I know, the error "could not find implicit value for parameter" can 
be solved by adding the exact import.

I also added 'import breeze.linalg.operators._' to the imports, 
but it does not work.
Have you ever seen this kind of error before?



 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf





[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-17 Thread Hae Joon Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547149#comment-14547149
 ] 

Hae Joon Lee commented on FLINK-1731:
-

Hi, I am testing k-means right now.
I ran into the error "could not find implicit value for parameter".
I have solved everything else except this one.

* Error:(142, 53) could not find implicit value for parameter op: 
breeze.linalg.operators.OpDiv.Impl2[breeze.linalg.Vector[Double],Long,That]
.map(x => LabeledVector(x._1, x._2.asBreeze / 1L)).withForwardedFields("_1->id")

{code:title=KMeans.scala|borderStyle=solid}
val finalCentroids = centroids.iterate(numIterations) { currentCentroids =>
  val newCentroids: DataSet[LabeledVector] = input
    .map(new SelectNearestCenterMapper).withBroadcastSet(currentCentroids, CENTROIDS)
    .map(x => (x.label, x.vector, 1L)).withForwardedFields("_1; _2")
    .groupBy(x => x._1)
    .reduce((p1, p2) => (p1._1, (p1._2.asBreeze + p2._2.asBreeze).fromBreeze, p1._3 + p2._3)).withForwardedFields("_1")
    .map(x => LabeledVector(x._1, x._2.asBreeze :/ x._3)).withForwardedFields("_1->id")
  newCentroids
}
{code}

As far as I know, the error "could not find implicit value for parameter" can 
be solved by adding the exact import.

I also added 'import breeze.linalg.operators._' to the imports, 
but it does not work.
Have you ever seen this kind of error before?



 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf





[jira] [Created] (FLINK-2031) Missing images in blog posts

2015-05-17 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2031:


 Summary: Missing images in blog posts
 Key: FLINK-2031
 URL: https://issues.apache.org/jira/browse/FLINK-2031
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Fabian Hueske


Images are missing in a couple of blog posts after the migration:

- Introducing Flink Streaming
- Peeking into Flink's Engine Room
- Hadoop Compatibility in Flink
- maybe others as well





[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes

2015-05-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547064#comment-14547064
 ] 

Aljoscha Krettek commented on FLINK-2023:
-

Yes, this is what I observed as well. Unfortunately, this is quite bad for 
things like the Scala Gelly API on which [~vanaepi] is working.

 TypeExtractor does not work for (some) Scala Classes
 

 Key: FLINK-2023
 URL: https://issues.apache.org/jira/browse/FLINK-2023
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Reporter: Aljoscha Krettek

 [~vanaepi] discovered some problems while working on the Scala Gelly API 
 where, for example, a Scala MapFunction can not be correctly analyzed by the 
 type extractor. For example, generic types will not be correctly detected.





[jira] [Commented] (FLINK-1985) Streaming does not correctly forward ExecutionConfig to runtime

2015-05-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547175#comment-14547175
 ] 

Aljoscha Krettek commented on FLINK-1985:
-

[~rmetzger] Yes, I have a fix for this in my open pull request. But we can also 
merge Matthias' work, since it also contains a unit test.

 Streaming does not correctly forward ExecutionConfig to runtime
 ---

 Key: FLINK-1985
 URL: https://issues.apache.org/jira/browse/FLINK-1985
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker

 When running streaming jobs you see this log entry:
 Environment did not contain an ExecutionConfig - using a default config.
 Some parts of the code use an ExecutionConfig at runtime. This will be a 
 default config without registered serializers and other user settings.


