[jira] [Commented] (FLINK-2399) Fail when actor versions don't match

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031083#comment-15031083
 ] 

ASF GitHub Bot commented on FLINK-2399:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/945#issuecomment-160431771
  
Un-assigning myself from the issue for now.


> Fail when actor versions don't match
>
> Key: FLINK-2399
> URL: https://issues.apache.org/jira/browse/FLINK-2399
> Project: Flink
> Issue Type: Improvement
> Components: JobManager, TaskManager
> Affects Versions: 0.9, 0.10.0
> Reporter: Ufuk Celebi
> Assignee: Sachin Goel
> Priority: Minor
> Fix For: 1.0.0
>
> Problem: there can be subtle errors when actors from different Flink versions 
> communicate with each other, for example when an old client (e.g. Flink 0.9) 
> communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT).
> We can check that the versions match on first communication between the 
> actors and fail if they don't match.
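
A rough sketch of the check described above, assuming the first message exchanged between two actors carries the sender's version string. The names `VersionHandshakeSketch`, `Hello`, `checkVersion`, `accepts` and `VersionMismatchException` are made up for illustration and are not Flink's actual actor messages or API.

```java
/** Sketch of a fail-fast version check on first contact; all names are hypothetical. */
final class VersionHandshakeSketch {

    /** Raised when the two sides report different versions. */
    static final class VersionMismatchException extends RuntimeException {
        VersionMismatchException(String local, String remote) {
            super("Version mismatch: local=" + local + ", remote=" + remote);
        }
    }

    /** Hypothetical first message sent by a client or TaskManager. */
    static final class Hello {
        final String senderVersion;

        Hello(String senderVersion) {
            this.senderVersion = senderVersion;
        }
    }

    /** Rejects the handshake unless both version strings are equal. */
    static void checkVersion(String localVersion, Hello hello) {
        if (!localVersion.equals(hello.senderVersion)) {
            throw new VersionMismatchException(localVersion, hello.senderVersion);
        }
    }

    /** Convenience wrapper: true if the handshake would be accepted. */
    static boolean accepts(String localVersion, String remoteVersion) {
        try {
            checkVersion(localVersion, new Hello(remoteVersion));
            return true;
        } catch (VersionMismatchException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("0.10-SNAPSHOT", "0.10-SNAPSHOT")); // prints true
        System.out.println(accepts("0.10-SNAPSHOT", "0.9"));           // prints false
    }
}
```

Comparing the full version strings for equality is the strictest possible policy; a real check might instead compare only a protocol or major version, which the issue does not specify.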



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2399) Fail when actor versions don't match

2015-11-29 Thread Sachin Goel (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goel updated FLINK-2399:
---
Assignee: (was: Sachin Goel)

> Fail when actor versions don't match
>
> Key: FLINK-2399
> URL: https://issues.apache.org/jira/browse/FLINK-2399
> Project: Flink
> Issue Type: Improvement
> Components: JobManager, TaskManager
> Affects Versions: 0.9, 0.10.0
> Reporter: Ufuk Celebi
> Priority: Minor
> Fix For: 1.0.0
>
> Problem: there can be subtle errors when actors from different Flink versions 
> communicate with each other, for example when an old client (e.g. Flink 0.9) 
> communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT).
> We can check that the versions match on first communication between the 
> actors and fail if they don't match.





[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...

2015-11-29 Thread sachingoel0101
Github user sachingoel0101 closed the pull request at:

https://github.com/apache/flink/pull/945


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2399) Fail when actor versions don't match

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031084#comment-15031084
 ] 

ASF GitHub Bot commented on FLINK-2399:
---

Github user sachingoel0101 closed the pull request at:

https://github.com/apache/flink/pull/945







[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...

2015-11-29 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/945#issuecomment-160431771
  
Un-assigning myself from the issue for now.




[jira] [Commented] (FLINK-3075) Rename Either creation methods to avoid name clash with projection methods

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031179#comment-15031179
 ] 

ASF GitHub Bot commented on FLINK-3075:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1402


> Rename Either creation methods to avoid name clash with projection methods
> --
>
> Key: FLINK-3075
> URL: https://issues.apache.org/jira/browse/FLINK-3075
> Project: Flink
> Issue Type: Improvement
> Components: Java API
> Reporter: Gyula Fora
> Assignee: Gyula Fora
> Priority: Minor
>
> Currently the creation methods for Either values, such as 
> `Either.left(left)`, and the projection methods, such as `either.left()`, 
> share a name and differ only in their parameters. 
> This makes them awkward to use with lambdas and method references such as: 
> `eitherStream.filter(Either::isLeft).map(Either::left)`
> The above code is currently impossible to write.
> I suggest changing the creation methods to `Either.createLeft(left)` and 
> `Either.createRight(right)`, and also exposing the Left and Right 
> classes directly.
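
To illustrate the naming proposed above, here is a small self-contained sketch. `MiniEither` is a stand-in written for this example, not Flink's `Either` class; only the method names `createLeft`, `createRight`, `isLeft`, `left` and `right` follow the issue text, and everything else (`EitherNamingSketch`, `leftValues`, `demo`) is hypothetical. With distinct names for creation and projection, the method references resolve without ambiguity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch of the proposed naming; MiniEither is a stand-in, not Flink's Either class. */
final class EitherNamingSketch {

    static final class MiniEither<L, R> {
        private final L leftValue;
        private final R rightValue;
        private final boolean leftSide;

        private MiniEither(L leftValue, R rightValue, boolean leftSide) {
            this.leftValue = leftValue;
            this.rightValue = rightValue;
            this.leftSide = leftSide;
        }

        // Creation methods use the names suggested in the issue.
        static <L, R> MiniEither<L, R> createLeft(L value) {
            return new MiniEither<>(value, null, true);
        }

        static <L, R> MiniEither<L, R> createRight(R value) {
            return new MiniEither<>(null, value, false);
        }

        boolean isLeft() {
            return leftSide;
        }

        // Projection methods keep the short names.
        L left() {
            if (!leftSide) {
                throw new IllegalStateException("not a Left value");
            }
            return leftValue;
        }

        R right() {
            if (leftSide) {
                throw new IllegalStateException("not a Right value");
            }
            return rightValue;
        }
    }

    /** Collects the Left payloads using plain method references. */
    static List<String> leftValues(List<MiniEither<String, Integer>> values) {
        return values.stream()
                .filter(MiniEither::isLeft)
                .map(MiniEither::left)
                .collect(Collectors.toList());
    }

    /** Builds a small mixed list and extracts its Left payloads. */
    static List<String> demo() {
        MiniEither<String, Integer> a = MiniEither.createLeft("a");
        MiniEither<String, Integer> b = MiniEither.createRight(1);
        MiniEither<String, Integer> c = MiniEither.createLeft("c");
        return leftValues(Arrays.asList(a, b, c));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, c]
    }
}
```

If the static creator were also called `left`, the reference `MiniEither::left` would match both the one-argument static method and the zero-argument instance method, which is the clash the issue describes.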





[jira] [Resolved] (FLINK-3075) Rename Either creation methods to avoid name clash with projection methods

2015-11-29 Thread Gyula Fora (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora resolved FLINK-3075.
---
Resolution: Fixed






[GitHub] flink pull request: [FLINK-3075] Change Either creation method nam...

2015-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1402




[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092390
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
--- End diff --

Please remove the empty line between the Scaladoc comment and the function definition.




[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092387
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
--- End diff --

We can simplify this function:

```scala
def isNear(queryPoint: Vector, radius: Double): Boolean = {
  minDist(queryPoint) < radius
}
```




[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030953#comment-15030953
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092387
  

We can simplify this function:

```scala
def isNear(queryPoint: Vector, radius: Double): Boolean = {
  minDist(queryPoint) < radius
}
```


> Add exact k-nearest-neighbours algorithm to machine learning library
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Daniel Blazevski
> Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092399
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
--- End diff --

To avoid using `var`, we should rewrite this method as follows:

```scala
```




[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030954#comment-15030954
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092390
  

Please remove the empty line between the Scaladoc comment and the function definition.


> Add exact k-nearest-neighbours algorithm to machine learning library
> 
>
> Key: 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030955#comment-15030955
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092399
  

To avoid using `var`, we should rewrite this method as follows:

```scala
```


> Add exact k-nearest-neighbours algorithm to machine learning library
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Daniel Blazevski
> Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]





[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030959#comment-15030959
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092526
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092526
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092532
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030960#comment-15030960
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092532
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092558
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030961#comment-15030961
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092558
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-3056][web-dashboard] Represent bytes in...

2015-11-29 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1419#issuecomment-160419611
  
+1 to merge

Validated the change on a cluster and it worked nicely: 

![image](https://cloud.githubusercontent.com/assets/89049/11457652/e02cff9e-96ae-11e5-806a-48b55ff17b1a.png)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3056) Show bytes sent/received as MBs/GB and so on in web interface

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030966#comment-15030966
 ] 

ASF GitHub Bot commented on FLINK-3056:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1419#issuecomment-160419611
  
+1 to merge

Validated the change on a cluster and it worked nicely: 

![image](https://cloud.githubusercontent.com/assets/89049/11457652/e02cff9e-96ae-11e5-806a-48b55ff17b1a.png)



> Show bytes sent/received as MBs/GB and so on in web interface
> -
>
> Key: FLINK-3056
> URL: https://issues.apache.org/jira/browse/FLINK-3056
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Robert Metzger
>Assignee: Sachin Goel
>
> It would be great if the web interface showed the bytes rounded to an 
> appropriate (human-readable) unit.
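
As an illustration of the requested behaviour, here is a minimal sketch of byte formatting. It is not the code from the pull request; the `ByteFormat` object, the 1024-based units, and the two-decimal rounding are all assumptions made for the example.

```scala
import java.util.Locale

// Hypothetical helper; names, unit base, and rounding are assumptions.
object ByteFormat {
  private val units = Vector("B", "KB", "MB", "GB", "TB")

  // Divide by 1024 until the value drops below 1024 or the largest unit is reached.
  @scala.annotation.tailrec
  private def scale(value: Double, unit: Int): (Double, Int) =
    if (value >= 1024 && unit < units.size - 1) scale(value / 1024, unit + 1)
    else (value, unit)

  def humanReadable(bytes: Long): String = {
    require(bytes >= 0, "byte count must be non-negative")
    val (value, unit) = scale(bytes.toDouble, 0)
    if (unit == 0) s"$bytes B" // plain byte counts need no decimals
    else "%.2f %s".formatLocal(Locale.ROOT, value, units(unit))
  }
}
```

For example, `ByteFormat.humanReadable(1536)` yields `1.50 KB` under these assumptions.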





[GitHub] flink pull request: [FLINK-3083] [docs] Add docs on how to configu...

2015-11-29 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1413#issuecomment-160419805
  
+1 to merge.




[jira] [Commented] (FLINK-3083) Add docs how to configure streaming fault tolerance

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030968#comment-15030968
 ] 

ASF GitHub Bot commented on FLINK-3083:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1413#issuecomment-160419805
  
+1 to merge.


> Add docs how to configure streaming fault tolerance
> ---
>
> Key: FLINK-3083
> URL: https://issues.apache.org/jira/browse/FLINK-3083
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 0.10.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.0.0
>
>






[GitHub] flink pull request: [FLINK-3077] Add functions to access Flink Ver...

2015-11-29 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1418#discussion_r46092993
  
--- Diff: flink-core/src/main/java/org/apache/flink/util/VersionUtils.java 
---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.util;
+
+import java.io.IOException;
+import java.net.URL;
+import java.util.Properties;
+import java.util.jar.Attributes;
+import java.util.jar.Manifest;
+
+/**
+ * Utility class which provides various methods for accessing version 
information.
+ */
+public final class VersionUtils {
+
+   private static final VersionUtils INSTANCE = new VersionUtils();
+
+   /**
+* Private constructor used to overwrite public one.
+*/
+   private VersionUtils() {}
+
+   /**
+* Returns the version of Flink.
+*/
+   public static String getFlinkVersion() {
+   // a version can only be provided when running from a Jar.
+   URL manifestUrl = 
INSTANCE.getClass().getClassLoader().getResource("META-INF/MANIFEST.MF");
+   if (manifestUrl != null) {
+   try {
+   Attributes attr = new 
Manifest(manifestUrl.openStream()).getMainAttributes();
+   return attr.getValue("Implementation-Version");
+   } catch (IOException e) {
+   //
+   }
+   }
+   return null;
+   }
+
+   /**
+* Returns the commit id of the source from which the flink jar is 
built.
+*/
+   public static String getCommitId() {
--- End diff --

As far as I know, this is a re-implementation of 
`EnvironmentInformation.getRevisionInformation()`. The "getFlinkVersion" 
method is also already implemented in Flink.
Can you reuse the methods from `EnvironmentInformation` for the CLI 
frontend and the web interface?




[GitHub] flink pull request: [FLINK-3077] Add functions to access Flink Ver...

2015-11-29 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1418#discussion_r46093042
  
--- Diff: flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java ---
@@ -1000,11 +1001,17 @@ else if (action.equals("-h") || action.equals("--help")) {
CliFrontendParser.printHelp();
return 0;
}
+   else if (action.equals("-v") || action.equals("--version")) {
+   System.out.println("Version: " + VersionUtils.getFlinkVersion() + ", Commit ID: " + VersionUtils.getCommitId());
+   return 0;
+   }
else {
System.out.printf("\"%s\" is not a valid action.\n", action);
System.out.println();
System.out.println("Valid actions are \"run\", \"list\", \"info\", or \"cancel\".");
System.out.println();
+   System.out.println("Specify the version option (-v or --version) to print flink version.");
--- End diff --

Can you write "Flink" uppercase here?
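
A rough sketch of how the version option in this diff could be handled, under stated assumptions: the class and method names (`VersionActionSketch`, `isVersionAction`, `versionLine`, `respond`) are hypothetical and not taken from `CliFrontend`. Since `getFlinkVersion()` in the patch returns `null` when no manifest is found, plain string concatenation would print "Version: null"; the sketch substitutes a placeholder instead.

```java
/** Hypothetical helper, not part of Flink. */
final class VersionActionSketch {

    private VersionActionSketch() {}

    /** True for the two spellings of the version option used in the diff. */
    static boolean isVersionAction(String action) {
        return "-v".equals(action) || "--version".equals(action);
    }

    /** Builds the output line; null values are shown as "<unknown>". */
    static String versionLine(String version, String commitId) {
        String v = (version == null) ? "<unknown>" : version;
        String c = (commitId == null) ? "<unknown>" : commitId;
        return "Version: " + v + ", Commit ID: " + c;
    }

    /** Returns the text to print for an action: the version line or a hint. */
    static String respond(String action, String version, String commitId) {
        if (isVersionAction(action)) {
            return versionLine(version, commitId);
        }
        return "Specify the version option (-v or --version) to print Flink version.";
    }
}
```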




[jira] [Commented] (FLINK-3077) Add "version" command to CliFrontend for showing the version of the installation

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030970#comment-15030970
 ] 

ASF GitHub Bot commented on FLINK-3077:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1418#discussion_r46092993
  
--- Diff: flink-core/src/main/java/org/apache/flink/util/VersionUtils.java ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.util;
+
+import java.io.IOException;
+import java.net.URL;
+import java.util.Properties;
+import java.util.jar.Attributes;
+import java.util.jar.Manifest;
+
+/**
+ * Utility class which provides various methods for accessing version information.
+ */
+public final class VersionUtils {
+
+   private static final VersionUtils INSTANCE = new VersionUtils();
+
+   /**
+* Private constructor used to overwrite public one.
+*/
+   private VersionUtils() {}
+
+   /**
+* Returns the version of Flink.
+*/
+   public static String getFlinkVersion() {
+   // a version can only be provided when running from a Jar.
+   URL manifestUrl = INSTANCE.getClass().getClassLoader().getResource("META-INF/MANIFEST.MF");
+   if (manifestUrl != null) {
+   try {
+   Attributes attr = new Manifest(manifestUrl.openStream()).getMainAttributes();
+   return attr.getValue("Implementation-Version");
+   } catch (IOException e) {
+   //
+   }
+   }
+   return null;
+   }
+
+   /**
+* Returns the commit id of the source from which the flink jar is built.
+*/
+   public static String getCommitId() {
--- End diff --

Afaik this is a re-implementation of 
`EnvironmentInformation.getRevisionInformation()`. Also the "getFlinkVersion" 
method is already implemented in Flink.
Can you re-use the method from the EnvironmentInformation for the CLI 
frontend and the web interface?


> Add "version" command to CliFrontend for showing the version of the 
> installation
> 
>
> Key: FLINK-3077
> URL: https://issues.apache.org/jira/browse/FLINK-3077
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Reporter: Robert Metzger
>Assignee: Sachin Goel
> Fix For: 1.0.0
>
>
> I have the bin directory of Flink in my $PATH variable, so I can just do 
> "flink run" on the command line for executing stuff.
> However, I have multiple Flink versions locally and it's hard to find out 
> which installation bash is picking up in the end.
> Adding a simple "version" command will resolve that issue and I consider it 
> helpful in general.





[jira] [Commented] (FLINK-3077) Add "version" command to CliFrontend for showing the version of the installation

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030971#comment-15030971
 ] 

ASF GitHub Bot commented on FLINK-3077:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1418#discussion_r46093042
  
--- Diff: flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java ---
@@ -1000,11 +1001,17 @@ else if (action.equals("-h") || action.equals("--help")) {
CliFrontendParser.printHelp();
return 0;
}
+   else if (action.equals("-v") || action.equals("--version")) {
+   System.out.println("Version: " + VersionUtils.getFlinkVersion() + ", Commit ID: " + VersionUtils.getCommitId());
+   return 0;
+   }
else {
System.out.printf("\"%s\" is not a valid action.\n", action);
System.out.println();
System.out.println("Valid actions are \"run\", \"list\", \"info\", or \"cancel\".");
System.out.println();
+   System.out.println("Specify the version option (-v or --version) to print flink version.");
--- End diff --

Can you write "Flink" uppercase here?







[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093222
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

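The per-dimension accumulation in `minDist` above can be sketched in Java as follows. This is a minimal illustration, not the code under review: the class name `MinDistSketch` and the use of plain `double[]` arrays instead of Flink's `Vector` are assumptions, and the sketch returns the squared distance only, because the part of the diff that follows the loop is not shown here.

```java
/** Hypothetical helper, not part of Flink. */
final class MinDistSketch {

    private MinDistSketch() {}

    /**
     * Squared Euclidean distance from point to the nearest point of the
     * axis-aligned box given by center and width; 0 if the point is inside.
     */
    static double minDistSquared(double[] point, double[] center, double[] width) {
        double sum = 0.0;
        for (int i = 0; i < point.length; i++) {
            double lower = center[i] - width[i] / 2;
            double upper = center[i] + width[i] / 2;
            double gap = 0.0;
            if (point[i] < lower) {
                gap = lower - point[i];   // point lies below the box in dimension i
            } else if (point[i] > upper) {
                gap = point[i] - upper;   // point lies above the box in dimension i
            }
            sum += gap * gap;             // dimensions inside the box contribute 0
        }
        return sum;
    }
}
```

For a box centred at (0, 0) with width (2, 2), the point (3, 4) is 2 away in the first dimension and 3 away in the second, so the squared minimum distance is 4 + 9 = 13.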
[jira] [Commented] (FLINK-3077) Add "version" command to CliFrontend for showing the version of the installation

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030973#comment-15030973
 ] 

ASF GitHub Bot commented on FLINK-3077:
---

Github user sachingoel0101 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1418#discussion_r46093211
  
--- Diff: flink-core/src/main/java/org/apache/flink/util/VersionUtils.java ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.util;
+
+import java.io.IOException;
+import java.net.URL;
+import java.util.Properties;
+import java.util.jar.Attributes;
+import java.util.jar.Manifest;
+
+/**
+ * Utility class which provides various methods for accessing version information.
+ */
+public final class VersionUtils {
+
+   private static final VersionUtils INSTANCE = new VersionUtils();
+
+   /**
+* Private constructor used to overwrite public one.
+*/
+   private VersionUtils() {}
+
+   /**
+* Returns the version of Flink.
+*/
+   public static String getFlinkVersion() {
+   // a version can only be provided when running from a Jar.
+   URL manifestUrl = INSTANCE.getClass().getClassLoader().getResource("META-INF/MANIFEST.MF");
+   if (manifestUrl != null) {
+   try {
+   Attributes attr = new Manifest(manifestUrl.openStream()).getMainAttributes();
+   return attr.getValue("Implementation-Version");
+   } catch (IOException e) {
+   //
+   }
+   }
+   return null;
+   }
+
+   /**
+* Returns the commit id of the source from which the flink jar is built.
+*/
+   public static String getCommitId() {
--- End diff --

Ah. I was entirely unaware of this class. 
There's also a method already for accessing the revision id, albeit 
abbreviated. Is the short rev id okay or should I just place a full rev id 
string in the same class?
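
To make the short-versus-full question concrete: an abbreviated revision id is a prefix of the full 40-character hex commit id, commonly 7 characters long. A minimal sketch, with a hypothetical class name (`RevisionIdSketch`) and a made-up commit id in the usage note:

```java
/** Hypothetical helper, not part of Flink. */
final class RevisionIdSketch {

    private RevisionIdSketch() {}

    /** Returns the first length characters of commitId, or commitId itself if it is shorter. */
    static String abbreviate(String commitId, int length) {
        if (commitId == null) {
            return null;
        }
        return commitId.length() <= length ? commitId : commitId.substring(0, length);
    }
}
```

For example, `RevisionIdSketch.abbreviate("0123456789abcdef0123456789abcdef01234567", 7)` gives `"0123456"`.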







[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030974#comment-15030974
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093216
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030972#comment-15030972
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093209
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-3077] Add functions to access Flink Ver...

2015-11-29 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1418#discussion_r46093228
  
--- Diff: flink-core/src/main/java/org/apache/flink/util/VersionUtils.java ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.util;
+
+import java.io.IOException;
+import java.net.URL;
+import java.util.Properties;
+import java.util.jar.Attributes;
+import java.util.jar.Manifest;
+
+/**
+ * Utility class which provides various methods for accessing version information.
+ */
+public final class VersionUtils {
+
+   private static final VersionUtils INSTANCE = new VersionUtils();
+
+   /**
+* Private constructor used to overwrite public one.
+*/
+   private VersionUtils() {}
+
+   /**
+* Returns the version of Flink.
+*/
+   public static String getFlinkVersion() {
+   // a version can only be provided when running from a Jar.
+   URL manifestUrl = INSTANCE.getClass().getClassLoader().getResource("META-INF/MANIFEST.MF");
+   if (manifestUrl != null) {
+   try {
+   Attributes attr = new Manifest(manifestUrl.openStream()).getMainAttributes();
+   return attr.getValue("Implementation-Version");
+   } catch (IOException e) {
+   //
+   }
+   }
+   return null;
+   }
+
+   /**
+* Returns the commit id of the source from which the flink jar is built.
+*/
+   public static String getCommitId() {
--- End diff --

The short rev is okay.




[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093216
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093209
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with smallest coordinates
+ * @param distMetric metric, must be Euclidean or squareEuclidean
+ * @param maxPerBox threshold for number of points in each box before slitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 
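
The quoted diff is cut off here, so the rest of minDist is not visible. As a minimal sketch of the part that is visible, the box tests can be written over plain arrays. This is a Java translation for illustration only, not the PR's Scala code: the class name BoxDistanceSketch, the double[] parameters, and the early-return form of overlap are assumptions. It returns the accumulated squared distance, since the quoted code stops before any further processing of that sum.

```java
final class BoxDistanceSketch {

    /**
     * Squared distance from queryPoint to the closest point of the
     * axis-aligned box given by center and width (0.0 if inside the box).
     */
    static double minDistSquared(double[] queryPoint, double[] center, double[] width) {
        double sum = 0.0;
        for (int i = 0; i < queryPoint.length; i++) {
            double lower = center[i] - width[i] / 2;
            double upper = center[i] + width[i] / 2;
            if (queryPoint[i] < lower) {
                double d = lower - queryPoint[i];
                sum += d * d;
            } else if (queryPoint[i] > upper) {
                double d = queryPoint[i] - upper;
                sum += d * d;
            }
            // Coordinates inside [lower, upper] contribute nothing.
        }
        return sum;
    }

    /**
     * True if, in every dimension, the interval
     * [queryPoint - radius, queryPoint + radius] strictly overlaps the box.
     */
    static boolean overlap(double[] queryPoint, double radius, double[] center, double[] width) {
        for (int i = 0; i < queryPoint.length; i++) {
            boolean inRange = queryPoint[i] - radius < center[i] + width[i] / 2
                && queryPoint[i] + radius > center[i] - width[i] / 2;
            if (!inRange) {
                return false; // one failing dimension is enough
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] center = {0.0, 0.0};
        double[] width = {2.0, 2.0};
        System.out.println(minDistSquared(new double[] {3.0, -4.0}, center, width));
        System.out.println(overlap(new double[] {3.0, 0.0}, 2.5, center, width));
    }
}
```

Because the comparisons are strict, calling overlap with radius 0.0 (which is what contains does in the diff) reports a point lying exactly on a box face as not contained.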

[GitHub] flink pull request: [FLINK-3077] Add functions to access Flink Ver...

2015-11-29 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1418#discussion_r46093211
  
--- Diff: flink-core/src/main/java/org/apache/flink/util/VersionUtils.java 
---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.util;
+
+import java.io.IOException;
+import java.net.URL;
+import java.util.Properties;
+import java.util.jar.Attributes;
+import java.util.jar.Manifest;
+
+/**
+ * Utility class which provides various methods for accessing version information.
+ */
+public final class VersionUtils {
+
+   private static final VersionUtils INSTANCE = new VersionUtils();
+
+   /**
+* Private constructor used to overwrite public one.
+*/
+   private VersionUtils() {}
+
+   /**
+* Returns the version of Flink.
+*/
+   public static String getFlinkVersion() {
+   // a version can only be provided when running from a Jar.
+   URL manifestUrl = INSTANCE.getClass().getClassLoader().getResource("META-INF/MANIFEST.MF");
+   if (manifestUrl != null) {
+   try {
+   Attributes attr = new Manifest(manifestUrl.openStream()).getMainAttributes();
+   return attr.getValue("Implementation-Version");
+   } catch (IOException e) {
+   //
+   }
+   }
+   return null;
+   }
+
+   /**
+* Returns the commit id of the source from which the flink jar is built.
+*/
+   public static String getCommitId() {
--- End diff --

Ah. I was entirely unaware of this class. 
There's also a method already for accessing the revision id, albeit 
abbreviated. Is the short rev id okay or should I just place a full rev id 
string in the same class?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093220
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[GitHub] flink pull request: [FLINK-3081] Properly stop periodic Kafka comm...

2015-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1410




[jira] [Commented] (FLINK-3081) Kafka Periodic Offset Committer does not properly terminate on canceling

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030975#comment-15030975
 ] 

ASF GitHub Bot commented on FLINK-3081:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1410


> Kafka Periodic Offset Committer does not properly terminate on canceling
> 
>
> Key: FLINK-3081
> URL: https://issues.apache.org/jira/browse/FLINK-3081
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Robert Metzger
>Priority: Blocker
> Fix For: 1.0.0, 0.10.2
>
>
> The committer is only stopped at the end of the run method. Any termination 
> of the run method via an exception keeps the periodic committer thread 
> running.
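
The failure mode described above can be illustrated with a try/finally around the body of the run method. This is a minimal sketch under stated assumptions, not the Flink Kafka connector's code: PeriodicCommitterSketch, Committer, and runWithCommitter are hypothetical names, and the sleep stands in for the periodic offset commit.

```java
final class PeriodicCommitterSketch {

    /** Hypothetical stand-in for a periodic offset committer thread. */
    static final class Committer extends Thread {
        private volatile boolean running = true;

        @Override
        public void run() {
            while (running) {
                try {
                    Thread.sleep(10); // stand-in for "commit offsets, then wait"
                } catch (InterruptedException e) {
                    return; // interrupted by close(): stop committing
                }
            }
        }

        /** Signals the thread to stop and wakes it if it is sleeping. */
        void close() {
            running = false;
            interrupt();
        }
    }

    /**
     * Starts the committer, runs the work, and stops the committer on
     * every exit path, including when the work throws.
     */
    static void runWithCommitter(Committer committer, Runnable work) {
        committer.setDaemon(true);
        committer.start();
        try {
            work.run();
        } finally {
            committer.close();
            try {
                committer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
    }
}
```

If close() were called only as the last statement of the work, an exception thrown earlier would skip it and leave the committer thread running. The finally block removes that dependency on normal completion.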





[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030978#comment-15030978
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093222
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-3077) Add "version" command to CliFrontend for showing the version of the installation

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030979#comment-15030979
 ] 

ASF GitHub Bot commented on FLINK-3077:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1418#discussion_r46093228
  
--- Diff: flink-core/src/main/java/org/apache/flink/util/VersionUtils.java 
---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.util;
+
+import java.io.IOException;
+import java.net.URL;
+import java.util.Properties;
+import java.util.jar.Attributes;
+import java.util.jar.Manifest;
+
+/**
+ * Utility class which provides various methods for accessing version information.
+ */
+public final class VersionUtils {
+
+   private static final VersionUtils INSTANCE = new VersionUtils();
+
+   /**
+* Private constructor used to overwrite public one.
+*/
+   private VersionUtils() {}
+
+   /**
+* Returns the version of Flink.
+*/
+   public static String getFlinkVersion() {
+   // a version can only be provided when running from a Jar.
+   URL manifestUrl = INSTANCE.getClass().getClassLoader().getResource("META-INF/MANIFEST.MF");
+   if (manifestUrl != null) {
+   try {
+   Attributes attr = new Manifest(manifestUrl.openStream()).getMainAttributes();
+   return attr.getValue("Implementation-Version");
+   } catch (IOException e) {
+   //
+   }
+   }
+   return null;
+   }
+
+   /**
+* Returns the commit id of the source from which the flink jar is built.
+*/
+   public static String getCommitId() {
--- End diff --

The short rev is okay.


> Add "version" command to CliFrontend for showing the version of the 
> installation
> 
>
> Key: FLINK-3077
> URL: https://issues.apache.org/jira/browse/FLINK-3077
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Reporter: Robert Metzger
>Assignee: Sachin Goel
> Fix For: 1.0.0
>
>
> I have the bin directory of Flink in my $PATH variable, so I can just do 
> "flink run" on the command line for executing stuff.
> However, I have multiple Flink versions locally and it's hard to find out 
> which installation the shell picks in the end.
> Adding a simple "version" command will resolve that issue and I consider it 
> helpful in general.
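
A rough sketch of the lookup such a "version" command could rely on, assuming the version is stored as Implementation-Version in the jar manifest (the attribute the VersionUtils diff reads). This is not the PR's code: VersionSketch and versionOf are hypothetical names, and the sketch swaps in the standard Package.getImplementationVersion() call instead of opening META-INF/MANIFEST.MF through the class loader.

```java
final class VersionSketch {

    /**
     * Returns the Implementation-Version recorded for the package of the
     * given class, or "<unknown>" when none is available (for example
     * when running from class files instead of a jar).
     */
    static String versionOf(Class<?> clazz) {
        Package pkg = clazz.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "<unknown>" : version;
    }

    public static void main(String[] args) {
        // A real "version" command would pass a class from the installation's jar.
        System.out.println("Version: " + versionOf(VersionSketch.class));
    }
}
```

The design difference is worth noting: looking up META-INF/MANIFEST.MF through the class loader returns whichever manifest comes first on the classpath, while the package-based lookup is tied to the jar the given class was loaded from. Either way a fallback is needed when the code does not run from a jar.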





[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093234
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030982#comment-15030982
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093240
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093242
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093248
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 
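
The quoted overlap method counts the dimensions in which the interval around the query point intersects the box's extent, then compares that count with the dimension. The same test can be written as a single forall over the dimensions. This is a minimal sketch, assuming Scala's immutable Vector[Double] rather than Flink's Vector, with AxisBox as a hypothetical name.

```scala
// Hypothetical sketch: AxisBox is an illustrative stand-in for the Node class
// in the quoted diff and uses Scala's immutable Vector, not Flink's Vector.
final case class AxisBox(center: Vector[Double], width: Vector[Double]) {

  /** True when, in every dimension, [q - radius, q + radius] intersects the box. */
  def overlap(queryPoint: Vector[Double], radius: Double): Boolean =
    queryPoint.indices.forall { i =>
      queryPoint(i) - radius < center(i) + width(i) / 2 &&
      queryPoint(i) + radius > center(i) - width(i) / 2
    }

  /** A point is contained when it overlaps the box with zero radius. */
  def contains(queryPoint: Vector[Double]): Boolean = overlap(queryPoint, 0.0)
}
```

Because the comparisons are strict, as in the quoted code, a point lying exactly on a face of the box is not reported as contained. The test also treats the neighbourhood of the query point as an axis-aligned cube of half-width radius rather than a ball.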

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030976#comment-15030976
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093220
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030980#comment-15030980
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093234
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093243
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093240
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093251
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030987#comment-15030987
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093251
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030986#comment-15030986
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093248
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is a lower bound: every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030985#comment-15030985
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093243
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is a lower bound: every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030984#comment-15030984
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093242
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is a lower bound: every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030990#comment-15030990
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093270
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is a lower bound: every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030988#comment-15030988
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093265
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is a lower bound: every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093280
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is a lower bound: every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 
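
For readers following the quoted minDist loop, here is a minimal standalone sketch of the same lower-bound computation. It is not code from the pull request: the object name MinDistSketch and both method names are hypothetical, and Array[Double] stands in for Flink's Vector so the snippet runs without Flink on the classpath. The quoted diff is cut off before its final branch, so offering a squared and a Euclidean variant is an assumption based on the distMetric parameter described in the Scaladoc.

```scala
object MinDistSketch {

  /** Squared distance from `query` to the nearest point of an axis-aligned box.
    *
    * `center` and `width` mirror the Node fields in the quoted diff;
    * Array[Double] replaces Flink's Vector (assumption, for a standalone example).
    */
  def minDistSquared(
      query: Array[Double],
      center: Array[Double],
      width: Array[Double]): Double = {
    require(query.length == center.length && center.length == width.length,
      "all vectors must have the same dimension")
    var sum = 0.0
    for (i <- query.indices) {
      val lower = center(i) - width(i) / 2
      val upper = center(i) + width(i) / 2
      // Only coordinates outside the box's extent contribute to the bound.
      val gap =
        if (query(i) < lower) lower - query(i)
        else if (query(i) > upper) query(i) - upper
        else 0.0
      sum += gap * gap
    }
    sum
  }

  /** Euclidean variant: square root of the squared lower bound. */
  def minDistEuclidean(
      query: Array[Double],
      center: Array[Double],
      width: Array[Double]): Double =
    math.sqrt(minDistSquared(query, center, width))
}
```

A query point inside the box yields 0.0, so no box containing the query can be pruned, while a box whose bound exceeds the current k-th nearest distance can be skipped.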

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093270
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is a lower bound: every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 
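
The per-dimension test in the quoted overlap method can also be written without a counter. The sketch below is illustrative only and is not a change proposed in this thread: the object name OverlapSketch is hypothetical, and Array[Double] stands in for Flink's Vector so the snippet runs on its own. It keeps the strict inequalities of the quoted code, so a point lying exactly on a box face counts as outside.

```scala
object OverlapSketch {

  /** True if the cube of half-width `radius` around `query` intersects the box
    * given by `center` and `width` in every dimension.
    */
  def overlap(
      query: Array[Double],
      center: Array[Double],
      width: Array[Double],
      radius: Double): Boolean =
    query.indices.forall { i =>
      // Same two strict comparisons as the quoted diff, checked per dimension.
      query(i) - radius < center(i) + width(i) / 2 &&
      query(i) + radius > center(i) - width(i) / 2
    }

  /** A point is contained when it overlaps the box with zero radius. */
  def contains(
      query: Array[Double],
      center: Array[Double],
      width: Array[Double]): Boolean =
    overlap(query, center, width, 0.0)
}
```

Using forall stops at the first dimension that fails, whereas the counting loop always visits every dimension; the Boolean result is the same.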

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030989#comment-15030989
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093267
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is a lower bound: every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093279
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int) {
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 
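
The quoted diff is cut off at this point. As a rough illustration of the minDist idea described in the doc comment, here is a minimal, self-contained sketch. It is not the PR's code: `Box`, `minDistSquared`, and the use of plain `Array[Double]` in place of Flink's `Vector` are assumptions made only for this example, and the metric check that the truncated `if` presumably performs is left out.

```scala
// Hypothetical stand-in for the PR's Node: an axis-aligned box described by
// its center and its full side widths, using plain arrays instead of Flink vectors.
final case class Box(center: Array[Double], width: Array[Double]) {

  // Squared Euclidean distance from queryPoint to the closest point of the box.
  // A dimension in which the query lies within the box's extent contributes 0.
  def minDistSquared(queryPoint: Array[Double]): Double = {
    require(queryPoint.length == center.length, "dimension mismatch")
    var sum = 0.0
    for (i <- queryPoint.indices) {
      val lo = center(i) - width(i) / 2
      val hi = center(i) + width(i) / 2
      val gap =
        if (queryPoint(i) < lo) lo - queryPoint(i)
        else if (queryPoint(i) > hi) queryPoint(i) - hi
        else 0.0
      sum += gap * gap
    }
    sum
  }

  // Euclidean variant: the square root of the squared distance.
  def minDist(queryPoint: Array[Double]): Double =
    math.sqrt(minDistSquared(queryPoint))
}
```

For a box centered at the origin with width 2 in each dimension, a query at (4, 5) is 3 and 4 units outside the box along the two axes, so the squared value is 9 + 16 = 25 and the Euclidean value is 5. A query inside the box yields 0, which is why the bound should be read as "greater than or equal to".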

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093265
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int) {
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 
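
The `overlap` method in the diff above counts the dimensions in which the query interval intersects the box and then compares the count with the dimension. A compact way to express the same per-dimension test might look like the sketch below. The name `overlaps` and the plain `Array[Double]` parameters are assumptions for illustration and do not appear in the PR.

```scala
// Hypothetical helper, not part of the PR: plain arrays replace Flink vectors.
// Returns true when, in every dimension, the interval
// [queryPoint(i) - radius, queryPoint(i) + radius] intersects the open extent
// of the box (center(i) - width(i) / 2, center(i) + width(i) / 2).
def overlaps(center: Array[Double], width: Array[Double],
             queryPoint: Array[Double], radius: Double): Boolean =
  queryPoint.indices.forall { i =>
    queryPoint(i) - radius < center(i) + width(i) / 2 &&
    queryPoint(i) + radius > center(i) - width(i) / 2
  }
```

Because the comparisons are strict, a point lying exactly on a face of the box is not reported as contained when the radius is 0. The test also treats the radius as a cube around the query point rather than a ball, so it can return true for boxes that a Euclidean ball of the same radius does not reach.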

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093267
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int) {
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030992#comment-15030992
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093279
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int) {
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030993#comment-15030993
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093280
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int) {
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093319
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int) {
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093306
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int) {
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than or equal to minDist
+ * (minDist adopted from "Nearest Neighbor Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030994#comment-15030994
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093306
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int) {
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint of at least minDist
+ * (minDist adapted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030996#comment-15030996
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093319
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint of at least minDist
+ * (minDist adapted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Resolved] (FLINK-3081) Kafka Periodic Offset Committer does not properly terminate on canceling

2015-11-29 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-3081.
---
Resolution: Fixed

Fixed for 1.0 in http://git-wip-us.apache.org/repos/asf/flink/commit/a997dd61
Fixed for 0.10.2 in http://git-wip-us.apache.org/repos/asf/flink/commit/961adea5

> Kafka Periodic Offset Committer does not properly terminate on canceling
> 
>
> Key: FLINK-3081
> URL: https://issues.apache.org/jira/browse/FLINK-3081
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Robert Metzger
>Priority: Blocker
> Fix For: 1.0.0, 0.10.2
>
>
> The committer is only stopped at the end of the run method. Any termination 
> of the run method via an exception keeps the periodic committer thread 
> running.
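
The failure mode described above (a helper thread that is only stopped on the normal exit path) is usually avoided by stopping the thread in a `finally` block. The following is a minimal sketch of that pattern only; `PeriodicCommitter` and `SourceRunner.runWithCommitter` are hypothetical names chosen for illustration and are not Flink's actual classes. The real fix is in the commits linked above.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a periodic committer thread (not Flink's class).
class PeriodicCommitter(intervalMs: Long) extends Thread {
  private val running = new AtomicBoolean(true)

  override def run(): Unit = {
    while (running.get()) {
      try {
        // A real committer would commit offsets here before sleeping.
        Thread.sleep(intervalMs)
      } catch {
        // Woken up by shutdown(); the loop condition is re-checked.
        case _: InterruptedException => ()
      }
    }
  }

  /** Asks the thread to stop and wakes it up if it is sleeping. */
  def shutdown(): Unit = {
    running.set(false)
    interrupt()
  }
}

object SourceRunner {
  /** Runs `body` while the committer is active; the committer is stopped
    * on every exit path, including when `body` throws.
    */
  def runWithCommitter(committer: PeriodicCommitter)(body: => Unit): Unit = {
    committer.setDaemon(true)
    committer.start()
    try {
      body
    } finally {
      committer.shutdown()
      committer.join(1000L)
    }
  }
}
```

With this shape, an exception thrown inside `body` still propagates to the caller, but the committer thread is no longer left running.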





[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031001#comment-15031001
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093466
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
--- End diff --

How about checking type of `DistanceMetric` in `QuadTree` constructor?
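
One way the suggested check could look is sketched below. This is only an illustration under assumptions: the metric types are local stand-ins that mirror the names in the diff rather than the Flink ML classes, `CheckedQuadTree` is a hypothetical name, and `Seq[Double]` replaces the Flink `Vector` type to keep the example self-contained.

```scala
// Local stand-ins for the metric types; only the names mirror the diff.
trait DistanceMetric
case class EuclideanDistanceMetric() extends DistanceMetric
case class SquaredEuclideanDistanceMetric() extends DistanceMetric
case class ManhattanDistanceMetric() extends DistanceMetric

// Hypothetical tree skeleton that validates its arguments at construction time.
class CheckedQuadTree(
    minVec: Seq[Double],
    maxVec: Seq[Double],
    distMetric: DistanceMetric,
    maxPerBox: Int) {

  require(minVec.size == maxVec.size, "minVec and maxVec must have the same dimension.")
  require(maxPerBox > 0, "maxPerBox must be positive.")

  // Fail fast on unsupported metrics instead of failing later during a query.
  distMetric match {
    case _: EuclideanDistanceMetric | _: SquaredEuclideanDistanceMetric => ()
    case other =>
      throw new IllegalArgumentException(
        s"Unsupported metric ${other.getClass.getSimpleName}: " +
          "only Euclidean and SquaredEuclidean are supported.")
  }

  val dimension: Int = minVec.size
}
```

Checking in the constructor means an unsupported metric is rejected when the tree is built, so a separate exception type inside `minDist` would no longer be needed for this case.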


> Add exact k-nearest-neighbours algorithm to machine learning library
> 
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Blazevski
>  Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]





[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093466
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
--- End diff --

How about checking type of `DistanceMetric` in `QuadTree` constructor?




[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092369
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for 
each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two 
points. If no metric is
+  * specified, then 
[[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k > 0, "K must be positive.")
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n > 0, "Number of blocks must be positive.")
+parameters.add(Blocks, n)
+this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or 
not
+*/
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+parameters.add(UseQuadTreeParam, UseQuadTree)
+this
+  }
+
+}
+
+object KNN {
+
+  case object K extends Parameter[Int] {
+val defaultValue: Option[Int] = Some(5)
+  }
+
+  case object DistanceMetric extends Parameter[DistanceMetric] {
+val defaultValue: 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030951#comment-15030951
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092377
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for 
each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two 
points. If no metric is
+  * specified, then 
[[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k > 0, "K must be positive.")
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n > 0, "Number of blocks must be positive.")
+parameters.add(Blocks, n)
+this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or 
not
+*/
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+parameters.add(UseQuadTreeParam, UseQuadTree)
+this
+  }
+
+}
+
+object 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092377
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for 
each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two 
points. If no metric is
+  * specified, then 
[[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k > 0, "K must be positive.")
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n > 0, "Number of blocks must be positive.")
+parameters.add(Blocks, n)
+this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or 
not
+*/
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+parameters.add(UseQuadTreeParam, UseQuadTree)
+this
+  }
+
+}
+
+object KNN {
+
+  case object K extends Parameter[Int] {
+val defaultValue: Option[Int] = Some(5)
+  }
+
+  case object DistanceMetric extends Parameter[DistanceMetric] {
+val defaultValue: 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030949#comment-15030949
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092369
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for 
each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two 
points. If no metric is
+  * specified, then 
[[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k > 0, "K must be positive.")
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n > 0, "Number of blocks must be positive.")
+parameters.add(Blocks, n)
+this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or 
not
+*/
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+parameters.add(UseQuadTreeParam, UseQuadTree)
+this
+  }
+
+}
+
+object 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092375
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+  DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. (Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. This number should be set
+  * at least to the degree of parallelism. If no value is specified, then the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: '''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two points. If no metric is
+  * specified, then [[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+    * @param k the number of selected points as neighbors
+    */
+  def setK(k: Int): KNN = {
+    require(k > 0, "K must be positive.")
+    parameters.add(K, k)
+    this
+  }
+
+  /** Sets the distance metric
+    * @param metric the distance metric to calculate distance between two points
+    */
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+    parameters.add(DistanceMetric, metric)
+    this
+  }
+
+  /** Sets the number of data blocks/partitions
+    * @param n the number of data blocks
+    */
+  def setBlocks(n: Int): KNN = {
+    require(n > 0, "Number of blocks must be positive.")
+    parameters.add(Blocks, n)
+    this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or not
+   */
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+    parameters.add(UseQuadTreeParam, UseQuadTree)
+    this
+  }
+
+}
+
+object KNN {
+
+  case object K extends Parameter[Int] {
+    val defaultValue: Option[Int] = Some(5)
+  }
+
+  case object DistanceMetric extends Parameter[DistanceMetric] {
+    val defaultValue: 
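
The k-nearest-neighbor join described in the Scaladoc of this diff can be illustrated on plain local collections. The following is only a minimal brute-force sketch, not the code from the pull request: it does not use Flink, blocks, or the QuadTree, and the names `KnnJoinSketch`, `Point`, `squaredDistance`, and `knnJoin` are hypothetical.

```scala
// Hypothetical illustration only; none of these names come from the PR.
object KnnJoinSketch {
  type Point = Vector[Double]

  /** Squared Euclidean distance. Sorting by it gives the same order as
    * sorting by the Euclidean distance, without computing a square root.
    */
  def squaredDistance(a: Point, b: Point): Double = {
    require(a.length == b.length, "Points must have the same dimension.")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  /** For each test point, returns the k closest training points,
    * nearest first. Every training point is compared with every test point.
    */
  def knnJoin(training: Seq[Point], test: Seq[Point], k: Int): Seq[(Point, Seq[Point])] = {
    require(k > 0, "K must be positive.")
    test.map { t =>
      (t, training.sortBy(p => squaredDistance(t, p)).take(k))
    }
  }
}
```

For example, `KnnJoinSketch.knnJoin(Seq(Vector(0.0, 0.0), Vector(1.0, 1.0), Vector(5.0, 5.0)), Seq(Vector(0.9, 0.9)), 2)` pairs the test point with `Vector(1.0, 1.0)` and `Vector(0.0, 0.0)`, in that order.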

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030950#comment-15030950
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46092375
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@

[jira] [Commented] (FLINK-3023) Show Flink version + commit id for -SNAPSHOT versions in web frontend

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031011#comment-15031011
 ] 

ASF GitHub Bot commented on FLINK-3023:
---

GitHub user sachingoel0101 opened a pull request:

https://github.com/apache/flink/pull/1422

[FLINK-3023][web-dashboard] Display version and commit information on 
Overview Page.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachingoel0101/flink 3023-web-version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1422.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1422


commit ccd05eacc4fe78bf199a7195ded15ba75e0951a5
Author: Sachin Goel 
Date:   2015-11-28T09:26:16Z

[FLINK-3023][web-dashboard] Display version and commit information on 
Overview Page.




> Show Flink version + commit id for -SNAPSHOT versions in web frontend
> -
>
> Key: FLINK-3023
> URL: https://issues.apache.org/jira/browse/FLINK-3023
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Robert Metzger
>Assignee: Sachin Goel
>
> The old frontend was showing the Flink version and the commit id for SNAPSHOT 
> builds.
> This is a helpful feature to quickly see which Flink version is running.
> It would be nice to add this again to the web interface.





[GitHub] flink pull request: [FLINK-3023][web-dashboard] Display version an...

2015-11-29 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request:

https://github.com/apache/flink/pull/1422

[FLINK-3023][web-dashboard] Display version and commit information on 
Overview Page.






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3077) Add "version" command to CliFrontend for showing the version of the installation

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031012#comment-15031012
 ] 

ASF GitHub Bot commented on FLINK-3077:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1418#issuecomment-160422689
  
@rmetzger I have split this PR and moved the web dashboard-specific work 
into a separate PR, #1422.


> Add "version" command to CliFrontend for showing the version of the 
> installation
> 
>
> Key: FLINK-3077
> URL: https://issues.apache.org/jira/browse/FLINK-3077
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Reporter: Robert Metzger
>Assignee: Sachin Goel
> Fix For: 1.0.0
>
>
> I have the bin directory of Flink in my $PATH variable, so I can just run 
> "flink run" on the command line to execute jobs.
> However, I have multiple Flink versions installed locally, and it's hard to 
> find out which installation bash picks in the end.
> Adding a simple "version" command would resolve that issue, and I consider 
> it helpful in general.
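
A "version" action of the kind requested above could look roughly like the following. This is only a sketch under stated assumptions, not the code from the pull request: `VersionCli`, `installedVersion`, and `run` are hypothetical names, and the version is assumed to be stored as `Implementation-Version` in the JAR manifest, which the JDK exposes through `Package.getImplementationVersion`.

```scala
// Hypothetical illustration only; this is not Flink's CliFrontend.
object VersionCli {

  /** Reads the Implementation-Version of the JAR that contains `cls`.
    * Falls back to a placeholder when no manifest entry is available,
    * for example when running from a class directory.
    */
  def installedVersion(cls: Class[_]): String =
    Option(cls.getPackage)
      .flatMap(p => Option(p.getImplementationVersion))
      .getOrElse("<unknown>")

  /** Dispatches on the first argument and returns the text to print. */
  def run(args: Array[String]): String = args.headOption match {
    case Some("version") => s"Version: ${installedVersion(getClass)}"
    case Some(other)     => s"Unknown action: $other"
    case None            => "Usage: <action> [options]"
  }

  def main(args: Array[String]): Unit = println(run(args))
}
```

Because the answer comes from whichever installation the shell resolves first on `$PATH`, running the action shows which of several local installations is actually being used.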





[GitHub] flink pull request: [FLINK-3077][cli] Add version option to Cli.

2015-11-29 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1418#issuecomment-160422689
  
@rmetzger I have split this PR and moved the web dashboard-specific work 
into a separate PR, #1422.




[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093688
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093714
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031014#comment-15031014
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093688
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@

[jira] [Commented] (FLINK-2988) Cannot load DataSet[Row] from CSV file

2015-11-29 Thread Timo Walther (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031016#comment-15031016
 ] 

Timo Walther commented on FLINK-2988:
-

Yes, that is a good example of a {{TableSource}} in {{TableEnvironment}}. But 
maybe it would also make sense to move Row to flink-core and provide an easy 
way of reading nullable, variable-length CSV files in the DataSet API as well. 
Tuples and POJOs are sometimes simply too static. I also had a use case with 
more than 25 columns. Defining a POJO for so many columns is quite cumbersome.

> Cannot load DataSet[Row] from CSV file
> --
>
> Key: FLINK-2988
> URL: https://issues.apache.org/jira/browse/FLINK-2988
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API, Table API
>Affects Versions: 0.10.0
>Reporter: Johann Kovacs
>Priority: Minor
>
> Tuple classes (Java/Scala both) only have arity up to 25, meaning I cannot 
> load a CSV file with more than 25 columns directly as a 
> DataSet\[TupleX\[...\]\].
> An alternative to using Tuples is using the Table API's Row class, which 
> allows for arbitrary-length, arbitrary-type, runtime-supplied schemata (using 
> RowTypeInfo) and index-based access.
> However, trying to load a CSV file as a DataSet\[Row\] yields an exception:
> {code}
> val env = ExecutionEnvironment.createLocalEnvironment()
> val filePath = "../someCsv.csv"
> val typeInfo = new RowTypeInfo(Seq(BasicTypeInfo.STRING_TYPE_INFO, 
> BasicTypeInfo.INT_TYPE_INFO), Seq("word", "number"))
> val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
> println(source.collect())
> {code}
> with someCsv.csv containing:
> {code}
> one,1
> two,2
> {code}
> yields
> {code}
> Exception in thread "main" java.lang.ClassCastException: org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
>   at org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46)
>   at org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
> {code}
> As a user I would like to be able to load a CSV file into a DataSet\[Row\], 
> preferably having a convenience method to specify the schema (RowTypeInfo), 
> without having to use the "explicit implicit parameters" syntax and 
> specifying the ClassTag.
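
The behaviour requested above, rows with a runtime-supplied schema, index-based access, and nullable or missing fields, can be sketched in plain Scala without Flink. This is an illustration under stated assumptions rather than a proposed API: `FieldType`, `StringField`, `IntField`, and `CsvRows.parseRow` are hypothetical names, and the parser assumes comma-separated fields without quoting.

```scala
// Hypothetical illustration only; none of these names are Flink API.
sealed trait FieldType
case object StringField extends FieldType
case object IntField extends FieldType

object CsvRows {

  /** Parses one CSV line against a schema supplied at runtime.
    * Empty fields, and fields missing because the line is too short,
    * become None. No quoting or escaping is handled.
    */
  def parseRow(line: String, schema: Seq[FieldType]): Vector[Option[Any]] = {
    // A negative limit keeps trailing empty fields such as "two,".
    val raw = line.split(",", -1)
    schema.zipWithIndex.map { case (fieldType, i) =>
      val cell = if (i < raw.length) raw(i).trim else ""
      val value: Option[Any] =
        if (cell.isEmpty) None
        else fieldType match {
          case StringField => Some(cell)
          case IntField    => Some(cell.toInt)
        }
      value
    }.toVector
  }
}
```

With the schema `Seq(StringField, IntField)`, the sample line `one,1` from the report parses to `Vector(Some("one"), Some(1))`, while a short line such as `two` parses to `Vector(Some("two"), None)`. Because the schema is an ordinary sequence, it is not limited to 25 columns.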





[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031015#comment-15031015
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093714
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093728
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for 
each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two 
points. If no metric is
+  * specified, then 
[[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k > 0, "K must be positive.")
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n > 0, "Number of blocks must be positive.")
+parameters.add(Blocks, n)
+this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or 
not
+*/
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+parameters.add(UseQuadTreeParam, UseQuadTree)
+this
+  }
+
+}
+
+object KNN {
+
+  case object K extends Parameter[Int] {
+val defaultValue: Option[Int] = Some(5)
+  }
+
+  case object DistanceMetric extends Parameter[DistanceMetric] {
+val defaultValue: 
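
The setters in the diff above follow a fluent pattern: validate the argument, store it in a parameter map, and return `this`. Below is a minimal, self-contained sketch of that pattern. `Param`, `ParamMap` and `SketchKNN` are hypothetical stand-ins written for illustration, not the flink-ml `Parameter` and `ParameterMap` classes; only the default values (`K` = 5, `Blocks` = None) are taken from the Scaladoc quoted above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a typed parameter key with an optional default.
trait Param[T] { def defaultValue: Option[T] }

// Hypothetical stand-in for a parameter map; not the flink-ml class.
class ParamMap {
  private val values = mutable.Map.empty[Param[_], Any]

  def add[T](p: Param[T], v: T): ParamMap = { values(p) = v; this }

  // An explicitly set value wins; otherwise fall back to the parameter's default.
  def get[T](p: Param[T]): Option[T] =
    values.get(p).map(_.asInstanceOf[T]).orElse(p.defaultValue)
}

object SketchKNN {
  case object K extends Param[Int] { val defaultValue: Option[Int] = Some(5) }
  case object Blocks extends Param[Int] { val defaultValue: Option[Int] = None }
}

class SketchKNN {
  import SketchKNN._

  val parameters = new ParamMap

  // Validate first, then store, then return `this` so calls can be chained.
  def setK(k: Int): SketchKNN = {
    require(k > 0, "K must be positive.")
    parameters.add(K, k)
    this
  }

  def setBlocks(n: Int): SketchKNN = {
    require(n > 0, "Number of blocks must be positive.")
    parameters.add(Blocks, n)
    this
  }
}
```

With this sketch, `new SketchKNN().setK(10).setBlocks(5)` chains as in the Scaladoc example, and a parameter that was never set resolves to its default.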

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031018#comment-15031018
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093728
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for 
each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two 
points. If no metric is
+  * specified, then 
[[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k > 0, "K must be positive.")
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n > 0, "Number of blocks must be positive.")
+parameters.add(Blocks, n)
+this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or 
not
+*/
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+parameters.add(UseQuadTreeParam, UseQuadTree)
+this
+  }
+
+}
+
+object 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031017#comment-15031017
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093717
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for 
each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two 
points. If no metric is
+  * specified, then 
[[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k > 0, "K must be positive.")
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n > 0, "Number of blocks must be positive.")
+parameters.add(Blocks, n)
+this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or 
not
+*/
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+parameters.add(UseQuadTreeParam, UseQuadTree)
+this
+  }
+
+}
+
+object 
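
For reference, the k-nearest-neighbor join described in the Scaladoc above ("the `k` nearest neighbor points in the training set for each point in the test set") can be sketched without Flink as a brute-force search. This is only an illustration over plain `Array[Double]` points; `BruteForceKnnSketch` and its methods are hypothetical names, and the code is not the blocked, distributed implementation from the pull request.

```scala
object BruteForceKnnSketch {
  type Point = Array[Double]

  // Squared Euclidean distance; it ranks neighbors in the same order as the
  // Euclidean distance because the square root is monotone.
  def squaredEuclidean(a: Point, b: Point): Double = {
    require(a.length == b.length, "dimension mismatch")
    var sum = 0.0
    for (i <- a.indices) {
      val d = a(i) - b(i)
      sum += d * d
    }
    sum
  }

  // For every test point, return the k closest training points, nearest first.
  def knnJoin(training: Seq[Point], testing: Seq[Point], k: Int): Seq[(Point, Seq[Point])] = {
    require(k > 0, "K must be positive.")
    testing.map { q =>
      (q, training.sortBy(t => squaredEuclidean(q, t)).take(k))
    }
  }
}
```

Sorting the whole training set for every query costs O(n log n) per test point, which is the cost a spatial index such as the QuadTree in this pull request is meant to reduce.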

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093717
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/KNN.scala ---
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.utils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.{Vector => FlinkVector, DenseVector}
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import org.apache.flink.ml.nn.util.QuadTree
+import scala.collection.mutable.ListBuffer
+
+import scala.collection.immutable.Vector
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * Calculates the `k` nearest neighbor points in the training set for 
each point in the test set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.nn.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''5''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.nn.KNN.DistanceMetric]]
+  * Sets the distance metric we use to calculate the distance between two 
points. If no metric is
+  * specified, then 
[[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used.
+  * (Default value: '''EuclideanDistanceMetric()''')
+  *
+  */
+
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[FlinkVector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k > 0, "K must be positive.")
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n > 0, "Number of blocks must be positive.")
+parameters.add(Blocks, n)
+this
+  }
+
+  /**
+   * Sets the Boolean variable that decides whether to use the QuadTree or 
not
+*/
+  def setUseQuadTree(UseQuadTree: Boolean): KNN = {
+parameters.add(UseQuadTreeParam, UseQuadTree)
+this
+  }
+
+}
+
+object KNN {
+
+  case object K extends Parameter[Int] {
+val defaultValue: Option[Int] = Some(5)
+  }
+
+  case object DistanceMetric extends Parameter[DistanceMetric] {
+val defaultValue: 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093744
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with largest 
coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 
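
The loop in `minDist` above accumulates, per dimension, the squared gap between the query point and the nearest face of the box, and adds nothing when the coordinate lies inside the box's extent. A standalone sketch of that accumulation over plain arrays follows; `MinDistSketch` and `squaredMinDist` are hypothetical names, `width` is assumed to hold full side lengths as in the diff, and the metric-dependent branch that is cut off in the quoted diff is deliberately not reproduced.

```scala
object MinDistSketch {
  /** Squared distance from `query` to the closest point of the axis-aligned box
    * given by `center` and `width` (full side lengths). Zero if `query` is inside.
    */
  def squaredMinDist(query: Array[Double],
                     center: Array[Double],
                     width: Array[Double]): Double = {
    require(query.length == center.length && center.length == width.length,
      "dimension mismatch")
    var sum = 0.0
    for (i <- query.indices) {
      val lo = center(i) - width(i) / 2 // lower face in dimension i
      val hi = center(i) + width(i) / 2 // upper face in dimension i
      if (query(i) < lo) {
        sum += math.pow(lo - query(i), 2)
      } else if (query(i) > hi) {
        sum += math.pow(query(i) - hi, 2)
      }
      // Otherwise the coordinate is within [lo, hi] and contributes nothing.
    }
    sum
  }
}
```

For the box with center (0, 0) and width (2, 2), the query (3, 5) is 2 away from the box in the first dimension and 4 away in the second, so the squared minimum distance is 4 + 16 = 20.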

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093740
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with largest 
coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+} else if (queryPoint(i) > center(i) + width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+}
+  }
+
+  if 
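
The `overlap` method above counts the dimensions in which the interval `[queryPoint(i) - radius, queryPoint(i) + radius]` intersects the box's extent and returns true only if all of them do. The same check can be written with `forall`, as in the sketch below. `OverlapSketch` is a hypothetical name and the code works on plain arrays rather than flink-ml vectors. Note that this tests the axis-aligned cube around the query point, not a Euclidean ball, and that the strict inequalities exclude points lying exactly on a face.

```scala
object OverlapSketch {
  // True if, in every dimension, the query interval of half-width `radius`
  // strictly intersects the box extent [center - width / 2, center + width / 2].
  def overlap(query: Array[Double],
              radius: Double,
              center: Array[Double],
              width: Array[Double]): Boolean = {
    require(query.length == center.length && center.length == width.length,
      "dimension mismatch")
    query.indices.forall { i =>
      query(i) - radius < center(i) + width(i) / 2 &&
        query(i) + radius > center(i) - width(i) / 2
    }
  }

  // Containment is the zero-radius case, mirroring `contains` in the diff.
  def contains(query: Array[Double],
               center: Array[Double],
               width: Array[Double]): Boolean =
    overlap(query, 0.0, center, width)
}
```

Using `forall` also stops at the first dimension that fails, whereas the counting loop always visits every dimension.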

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031020#comment-15031020
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093744
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import 
org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * 
http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest 
coordinates
+ * @param maxVec vector of the corner of the bounding box with largest 
coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before 
splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, 
maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+val nodeElements = new ListBuffer[Vector]
+
+/** for testing purposes only; used in QuadTreeSuite.scala
+  *
+  * @return center and width of the box
+  */
+def getCenterWidth(): (Vector, Vector) = {
+  (center, width)
+}
+
+def contains(queryPoint: Vector): Boolean = {
+  overlap(queryPoint, 0.0)
+}
+
+/** Tests if queryPoint is within a radius of the node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def overlap(queryPoint: Vector, radius: Double): Boolean = {
+  var count = 0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+  queryPoint(i) + radius > center(i) - width(i) / 2) {
+  count += 1
+}
+  }
+
+  if (count == queryPoint.size) {
+true
+  } else {
+false
+  }
+}
+
+/** Tests if queryPoint is near a node
+  *
+  * @param queryPoint
+  * @param radius
+  * @return
+  */
+def isNear(queryPoint: Vector, radius: Double): Boolean = {
+  if (minDist(queryPoint) < radius) {
+true
+  } else {
+false
+  }
+}
+
+/**
+ * used in error handling when computing minDist to make sure
+ * distMetric is Euclidean or SquaredEuclidean
+ * @param message
+ */
+case class metricException(message: String) extends Exception(message)
+
+/**
+ * minDist is defined so that every point in the box
+ * has distance to queryPoint greater than minDist
+ * (minDist adopted from "Nearest Neighbors Queries" by N. 
Roussopoulos et al.)
+ *
+ * @param queryPoint
+ * @return
+ */
+
+def minDist(queryPoint: Vector): Double = {
+  var minDist = 0.0
+  for (i <- 0 to queryPoint.size - 1) {
+if (queryPoint(i) < center(i) - width(i) / 2) {
+  minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031019#comment-15031019
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093740
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+    val nodeElements = new ListBuffer[Vector]
+
+    /** for testing purposes only; used in QuadTreeSuite.scala
+      *
+      * @return center and width of the box
+      */
+    def getCenterWidth(): (Vector, Vector) = {
+      (center, width)
+    }
+
+    def contains(queryPoint: Vector): Boolean = {
+      overlap(queryPoint, 0.0)
+    }
+
+    /** Tests if queryPoint is within a radius of the node
+      *
+      * @param queryPoint
+      * @param radius
+      * @return
+      */
+    def overlap(queryPoint: Vector, radius: Double): Boolean = {
+      var count = 0
+      for (i <- 0 to queryPoint.size - 1) {
+        if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+          queryPoint(i) + radius > center(i) - width(i) / 2) {
+          count += 1
+        }
+      }
+
+      if (count == queryPoint.size) {
+        true
+      } else {
+        false
+      }
+    }
+
+    /** Tests if queryPoint is near a node
+      *
+      * @param queryPoint
+      * @param radius
+      * @return
+      */
+    def isNear(queryPoint: Vector, radius: Double): Boolean = {
+      if (minDist(queryPoint) < radius) {
+        true
+      } else {
+        false
+      }
+    }
+
+    /**
+     * used in error handling when computing minDist to make sure
+     * distMetric is Euclidean or SquaredEuclidean
+     * @param message
+     */
+    case class metricException(message: String) extends Exception(message)
+
+    /**
+     * minDist is defined so that every point in the box
+     * has distance to queryPoint greater than minDist
+     * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+     *
+     * @param queryPoint
+     * @return
+     */
+
+    def minDist(queryPoint: Vector): Double = {
+      var minDist = 0.0
+      for (i <- 0 to queryPoint.size - 1) {
+        if (queryPoint(i) < center(i) - width(i) / 2) {
+          minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2015-11-29 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1220#discussion_r46093852
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn/QuadTree.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.nn.util
+
+import org.apache.flink.ml.math.{Breeze, Vector}
+import Breeze._
+
+import org.apache.flink.ml.metrics.distances.{SquaredEuclideanDistanceMetric,
+EuclideanDistanceMetric, DistanceMetric}
+
+import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.PriorityQueue
+
+/**
+ * n-dimensional QuadTree data structure; partitions
+ * spatial data for faster queries (e.g. KNN query)
+ * The skeleton of the data structure was initially
+ * based off of the 2D Quadtree found here:
+ * http://www.cs.trinity.edu/~mlewis/CSCI1321-F11/Code/src/util/Quadtree.scala
+ *
+ * Many additional methods were added to the class both for
+ * efficient KNN queries and generalizing to n-dim.
+ *
+ * @param minVec vector of the corner of the bounding box with smallest coordinates
+ * @param maxVec vector of the corner of the bounding box with largest coordinates
+ * @param distMetric metric, must be Euclidean or SquaredEuclidean
+ * @param maxPerBox threshold for number of points in each box before splitting a box
+ */
+class QuadTree(minVec: Vector, maxVec: Vector, distMetric: DistanceMetric, maxPerBox: Int){
+
+  class Node(center: Vector, width: Vector, var children: Seq[Node]) {
+
+    val nodeElements = new ListBuffer[Vector]
+
+    /** for testing purposes only; used in QuadTreeSuite.scala
+      *
+      * @return center and width of the box
+      */
+    def getCenterWidth(): (Vector, Vector) = {
+      (center, width)
+    }
+
+    def contains(queryPoint: Vector): Boolean = {
+      overlap(queryPoint, 0.0)
+    }
+
+    /** Tests if queryPoint is within a radius of the node
+      *
+      * @param queryPoint
+      * @param radius
+      * @return
+      */
+    def overlap(queryPoint: Vector, radius: Double): Boolean = {
+      var count = 0
+      for (i <- 0 to queryPoint.size - 1) {
+        if (queryPoint(i) - radius < center(i) + width(i) / 2 &&
+          queryPoint(i) + radius > center(i) - width(i) / 2) {
+          count += 1
+        }
+      }
+
+      if (count == queryPoint.size) {
+        true
+      } else {
+        false
+      }
+    }
+
+    /** Tests if queryPoint is near a node
+      *
+      * @param queryPoint
+      * @param radius
+      * @return
+      */
+    def isNear(queryPoint: Vector, radius: Double): Boolean = {
+      if (minDist(queryPoint) < radius) {
+        true
+      } else {
+        false
+      }
+    }
+
+    /**
+     * used in error handling when computing minDist to make sure
+     * distMetric is Euclidean or SquaredEuclidean
+     * @param message
+     */
+    case class metricException(message: String) extends Exception(message)
+
+    /**
+     * minDist is defined so that every point in the box
+     * has distance to queryPoint greater than minDist
+     * (minDist adopted from "Nearest Neighbors Queries" by N. Roussopoulos et al.)
+     *
+     * @param queryPoint
+     * @return
+     */
+
+    def minDist(queryPoint: Vector): Double = {
+      var minDist = 0.0
+      for (i <- 0 to queryPoint.size - 1) {
+        if (queryPoint(i) < center(i) - width(i) / 2) {
+          minDist += math.pow(queryPoint(i) - center(i) + width(i) / 2, 2)
+        } else if (queryPoint(i) > center(i) + width(i) / 2) {
+          minDist += math.pow(queryPoint(i) - center(i) - width(i) / 2, 2)
+        }
+      }
+
+      if 
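
The loop quoted above, which accumulates the squared distance from a query point to an axis-aligned box, can be illustrated in isolation. The following is only a sketch under stated assumptions, not the pull request's code: it uses `Array[Double]` instead of Flink's `Vector`, the names `MinDistSketch` and `minDistSquared` are hypothetical, and it stops at the squared sum because the remainder of the quoted method is cut off in this message.

```scala
// Hypothetical stand-alone sketch; plain arrays replace Flink's Vector type.
object MinDistSketch {

  /** Squared Euclidean distance from `q` to the nearest point of the box
    * described by `center` and `width` (full side lengths per dimension).
    * Returns 0.0 when `q` lies inside the box.
    */
  def minDistSquared(q: Array[Double], center: Array[Double],
      width: Array[Double]): Double = {
    require(q.length == center.length && q.length == width.length, "dimension mismatch")
    q.indices.map { i =>
      val lo = center(i) - width(i) / 2
      val hi = center(i) + width(i) / 2
      // Gap along dimension i; zero when q(i) already lies within [lo, hi].
      val gap = if (q(i) < lo) lo - q(i) else if (q(i) > hi) q(i) - hi else 0.0
      gap * gap
    }.sum
  }
}
```

For a box centred at the origin with side length 2 in each dimension, the point (3, 5) lies 2 units beyond the box in the first dimension and 4 units beyond it in the second, so the squared distance is 4 + 16 = 20.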

[jira] [Resolved] (FLINK-2961) Add support for basic type Date in Table API

2015-11-29 Thread Timo Walther (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther resolved FLINK-2961.
-
   Resolution: Fixed
Fix Version/s: 1.0.0

Fixed in 31a2de86d9034def9c58ef7f3d3dcae3d4dafd6f.

> Add support for basic type Date in Table API
> 
>
> Key: FLINK-2961
> URL: https://issues.apache.org/jira/browse/FLINK-2961
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Minor
> Fix For: 1.0.0
>
>
> Currently, the basic type {{Date}} is not implemented in the Table API. In 
> order to have a mapping of the most important ANSI SQL types for FLINK-2099, 
> it makes sense to add support for {{Date}} to represent dates, times, and 
> timestamps with millisecond precision.
> Only the types `LONG` and `STRING` can be cast to `DATE` and vice versa. A 
> `LONG` cast to `DATE` must be a millisecond timestamp. A `STRING` cast 
> to `DATE` must have the format "`yyyy-MM-dd HH:mm:ss.SSS`", "`yyyy-MM-dd`", 
> "`HH:mm:ss`", or be a millisecond timestamp. All timestamps are in 
> milliseconds since January 1, 1970, 00:00:00 UTC.
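
The `STRING` to `DATE` rules in the description above can be sketched with the JDK's `SimpleDateFormat`. This is an illustration only and not the Table API implementation: `DateCastSketch` and `castToDate` are hypothetical names, a four-digit `yyyy` year prefix on the date patterns is an assumption, and trying the patterns in order before falling back to a millisecond timestamp is a design choice made for this sketch.

```scala
import java.text.{ParsePosition, SimpleDateFormat}
import java.util.{Date, TimeZone}
import scala.util.Try

// Hypothetical sketch; not the Table API's actual casting code.
object DateCastSketch {
  // Patterns from the issue text, with an assumed four-digit year prefix.
  private val patterns = Seq("yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd", "HH:mm:ss")

  private def parseWith(pattern: String, s: String): Option[Date] = {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC")) // all values are interpreted as UTC
    fmt.setLenient(false)                        // reject out-of-range fields
    val pos = new ParsePosition(0)
    val parsed = fmt.parse(s, pos)               // null when parsing fails
    // Accept only when the whole string was consumed by the pattern.
    if (parsed != null && pos.getIndex == s.length) Some(parsed) else None
  }

  /** Tries each pattern in turn, then falls back to a millisecond timestamp. */
  def castToDate(s: String): Option[Date] =
    patterns.iterator.map(parseWith(_, s)).collectFirst { case Some(d) => d }
      .orElse(Try(new Date(s.trim.toLong)).toOption)
}
```

A time-only string such as `10:30:00` resolves to that time on January 1, 1970 (UTC), because `SimpleDateFormat` fills the unspecified date fields with the epoch defaults.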





[jira] [Commented] (FLINK-2961) Add support for basic type Date in Table API

2015-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031024#comment-15031024
 ] 

ASF GitHub Bot commented on FLINK-2961:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1322


> Add support for basic type Date in Table API
> 
>
> Key: FLINK-2961
> URL: https://issues.apache.org/jira/browse/FLINK-2961
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Minor
>
> Currently, the basic type {{Date}} is not implemented in the Table API. In 
> order to have a mapping of the most important ANSI SQL types for FLINK-2099, 
> it makes sense to add support for {{Date}} to represent dates, times, and 
> timestamps with millisecond precision.
> Only the types `LONG` and `STRING` can be cast to `DATE` and vice versa. A 
> `LONG` cast to `DATE` must be a millisecond timestamp. A `STRING` cast 
> to `DATE` must have the format "`yyyy-MM-dd HH:mm:ss.SSS`", "`yyyy-MM-dd`", 
> "`HH:mm:ss`", or be a millisecond timestamp. All timestamps are in 
> milliseconds since January 1, 1970, 00:00:00 UTC.





[GitHub] flink pull request: [FLINK-2961] [table] Add support for basic typ...

2015-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1322




[jira] [Commented] (FLINK-2505) HDFSTest failure: VM crash

2015-11-29 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031028#comment-15031028
 ] 

Sachin Goel commented on FLINK-2505:


Haven't observed this in a long while. Closing.

> HDFSTest failure: VM crash
> --
>
> Key: FLINK-2505
> URL: https://issues.apache.org/jira/browse/FLINK-2505
> Project: Flink
>  Issue Type: Bug
>Reporter: Sachin Goel
>
> I observed a VM crash on one of my builds. Here's the build log:
> https://travis-ci.org/apache/flink/jobs/74988472
> {{Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
> project flink-fs-tests: ExecutionException: java.lang.RuntimeException: The 
> forked VM terminated without properly saying goodbye. VM crash or System.exit 
> called?}}





[jira] [Closed] (FLINK-2505) HDFSTest failure: VM crash

2015-11-29 Thread Sachin Goel (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goel closed FLINK-2505.
--
Resolution: Cannot Reproduce

> HDFSTest failure: VM crash
> --
>
> Key: FLINK-2505
> URL: https://issues.apache.org/jira/browse/FLINK-2505
> Project: Flink
>  Issue Type: Bug
>Reporter: Sachin Goel
>
> I observed a VM crash on one of my builds. Here's the build log:
> https://travis-ci.org/apache/flink/jobs/74988472
> {{Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
> project flink-fs-tests: ExecutionException: java.lang.RuntimeException: The 
> forked VM terminated without properly saying goodbye. VM crash or System.exit 
> called?}}




