Github user burness commented on the pull request:
https://github.com/apache/spark/pull/13116#issuecomment-221513002
@hvanhovell yeah, I agree with your opinion. Thanks
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r64160441
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -51,6 +49,7 @@ import org.apache.spark.sql.execution.python.EvaluatePython
Github user hvanhovell commented on the pull request:
https://github.com/apache/spark/pull/13116#issuecomment-220857549
@burness that is no problem. I have taken a look at the current
implementation of the `RDD`'s `takeSample`; it is very similar to your
implementation. I do still
Github user burness commented on the pull request:
https://github.com/apache/spark/pull/13116#issuecomment-220838511
@hvanhovell Thank you for the code review. In my project, I want to sample a
specified number of rows, but DataFrame and Dataset only offer sampling by
fraction. And I
Github user hvanhovell commented on the pull request:
https://github.com/apache/spark/pull/13116#issuecomment-219546005
@burness thanks for working on this. The PR in its current state has some
serious potential memory and performance problems (see the comments). What is
the use case
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r63423262
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1494,6 +1493,56 @@ class Dataset[T] private[sql](
}
/**
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r63423222
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1494,6 +1493,56 @@ class Dataset[T] private[sql](
}
/**
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r63422967
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1494,6 +1493,56 @@ class Dataset[T] private[sql](
}
/**
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r63421995
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1494,6 +1493,56 @@ class Dataset[T] private[sql](
}
/**
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r63421664
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1494,6 +1493,56 @@ class Dataset[T] private[sql](
}
/**
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r63274249
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest with
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r63274238
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest with
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/13116#discussion_r63274228
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -18,6 +18,9 @@
package org.apache.spark.sql
import
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13116#issuecomment-219213307
Can one of the admins verify this patch?
GitHub user burness opened a pull request:
https://github.com/apache/spark/pull/13116
[SPARK-15324] [SQL] Add the takeSample function to the Dataset
## What changes were proposed in this pull request?
In this PR, I add the takeSample function to the Dataset, which is to