[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...

2016-05-25 Thread burness
Github user burness commented on the pull request: https://github.com/apache/spark/pull/13116#issuecomment-221513002 @hvanhovell yeah, I agree with your opinion! Thanks

2016-05-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r64160441 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -51,6 +49,7 @@ import org.apache.spark.sql.execution.python.EvaluatePython

2016-05-22 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/13116#issuecomment-220857549 @burness that is no problem. I have taken a look at the current implementation of the `RDD`'s `takeSample`; it is very similar to your implementation. I do still

2016-05-22 Thread burness
Github user burness commented on the pull request: https://github.com/apache/spark/pull/13116#issuecomment-220838511 @hvanhovell Thank you for the code review. In my project, I want to sample a specified number of rows, but DataFrame and Dataset only support sampling by fraction. And I
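For context, the gap described above is that `Dataset.sample(withReplacement, fraction)` keeps each row independently, so the number of rows returned varies from run to run, whereas `RDD.takeSample(withReplacement, num, seed)` returns exactly `num` elements. The plain-Scala sketch below (no Spark; `TakeSampleSketch` and `takeSample` here are hypothetical names, and this is not the PR's code) illustrates the general oversample-then-trim idea for sampling without replacement: inflate the fraction by a Chernoff-style margin so that one Bernoulli pass very likely keeps at least `num` elements, retry in the rare case it does not, then shuffle and keep `num`. The failure probability `delta = 1e-4` is an assumption, not a claim about the constants used in the PR or in Spark.

```scala
import scala.util.Random

object TakeSampleSketch {
  // Hypothetical helper: return exactly `num` distinct elements of `data` (without replacement).
  def takeSample[T](data: Seq[T], num: Int, seed: Long): Seq[T] = {
    require(num >= 0, "num must be non-negative")
    val rng = new Random(seed)
    val total = data.size
    if (num == 0 || total == 0) Seq.empty
    else if (num >= total) rng.shuffle(data) // asking for everything: just permute the input
    else {
      // Inflate the target fraction p = num / total so that a Bernoulli(fraction) pass
      // keeps fewer than `num` elements with probability at most delta (assumed 1e-4).
      val p = num.toDouble / total
      val gamma = -math.log(1e-4) / total
      val fraction = math.min(1.0, p + gamma + math.sqrt(gamma * gamma + 2 * gamma * p))
      // One Bernoulli pass: keep each element independently with probability `fraction`.
      var sample = data.filter(_ => rng.nextDouble() < fraction)
      // Rare case: too few elements survived, so draw a fresh Bernoulli sample.
      while (sample.size < num) {
        sample = data.filter(_ => rng.nextDouble() < fraction)
      }
      // Shuffle the surviving elements and trim to exactly `num`.
      rng.shuffle(sample).take(num)
    }
  }
}
```

Because a Bernoulli sample of a given size is a uniformly random subset of that size, shuffling it and keeping the first `num` elements still gives a uniform sample of exactly `num` elements; the cost is an extra pass over the data whenever the first draw falls short.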

2016-05-16 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/13116#issuecomment-219546005 @burness thanks for working on this. The PR in its current state has some serious potential memory and performance problems (see the comments). What is the use case
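The specific memory and performance problems are described in the linked review comments, which are truncated here. As one illustration of a memory-bounded way to pick exactly `num` items, the sketch below uses reservoir sampling (Algorithm R), which is a different technique from the one in the PR: it streams through an iterator once and never holds more than `num` elements. `ReservoirSketch` and `reservoirSample` are hypothetical names, and this is plain Scala rather than Spark code; applying it per partition would still require a weighted merge of the partition reservoirs, which this sketch omits.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

object ReservoirSketch {
  // Algorithm R: a uniform sample of up to `num` items from `it` in a single pass,
  // holding at most `num` items in memory at any time.
  def reservoirSample[T](it: Iterator[T], num: Int, seed: Long): Vector[T] = {
    require(num >= 0, "num must be non-negative")
    val rng = new Random(seed)
    val reservoir = ArrayBuffer.empty[T]
    var seen = 0L // number of items consumed so far
    it.foreach { x =>
      seen += 1
      if (reservoir.size < num) {
        // Fill phase: the first `num` items go straight into the reservoir.
        reservoir += x
      } else {
        // Pick j roughly uniformly in [0, seen); replace slot j only if j < num,
        // i.e. keep the new item with probability num / seen.
        val j = (rng.nextDouble() * seen).toLong
        if (j < num) reservoir(j.toInt) = x
      }
    }
    reservoir.toVector
  }
}
```

If the input has fewer than `num` items, the whole input is returned, which matches what a caller asking for more rows than exist would expect.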

2016-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63423262 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1494,6 +1493,56 @@ class Dataset[T] private[sql]( } /**

2016-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63423222 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1494,6 +1493,56 @@ class Dataset[T] private[sql]( } /**

2016-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63422967 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1494,6 +1493,56 @@ class Dataset[T] private[sql]( } /**

2016-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63421995 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1494,6 +1493,56 @@ class Dataset[T] private[sql]( } /**

2016-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63421664 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1494,6 +1493,56 @@ class Dataset[T] private[sql]( } /**

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274249 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest with

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274238 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest with

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274228 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -18,6 +18,9 @@ package org.apache.spark.sql import

2016-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13116#issuecomment-219213307 Can one of the admins verify this patch?

2016-05-14 Thread burness
GitHub user burness opened a pull request: https://github.com/apache/spark/pull/13116 [SPARK-15324] [SQL] Add the takeSample function to the Dataset ## What changes were proposed in this pull request? In this PR, I add the takeSample function to the Dataset, which is to