svn commit: r32060 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_20_19_45-123adbd-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 21 04:01:51 2019
New Revision: 32060

Log:
Apache Spark 2.4.1-SNAPSHOT-2019_01_20_19_45-123adbd docs

[This commit notification would consist of 1476 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r32059 - in /dev/spark/2.3.4-SNAPSHOT-2019_01_20_19_45-ae64e5b-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 21 04:00:00 2019
New Revision: 32059

Log:
Apache Spark 2.3.4-SNAPSHOT-2019_01_20_19_45-ae64e5b docs

[This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r32058 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_20_17_27-9a30e23-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 21 01:39:45 2019
New Revision: 32058

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_20_17_27-9a30e23 docs

[This commit notification would consist of 1778 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[spark] branch branch-2.3 updated: [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.3 by this push:
     new ae64e5b  [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics

ae64e5b is described below

commit ae64e5b578ac40746588a46aef5e16ec7858f259
Author: Shahid
AuthorDate: Sun Jan 20 18:11:14 2019 -0600

    [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics

    ## What changes were proposed in this pull request?
    Currently, there are some minor inconsistencies in doc compared to the code. In this PR, I am correcting those inconsistencies.
    1) Links related to the evaluation metrics in the docs are not working
    2) Minor correction in the evaluation metrics formulas in docs.

    ## How was this patch tested?
    NA

    Closes #23589 from shahidki31/docCorrection.

    Authored-by: Shahid
    Signed-off-by: Sean Owen
    (cherry picked from commit 9a30e23211e165a44acc0dbe19693950f7a7cc73)
    Signed-off-by: Sean Owen
---
 docs/mllib-evaluation-metrics.md                   | 22 +++---
 .../spark/mllib/evaluation/RankingMetrics.scala    |  2 ++
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index ac398fb..8afea2c 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -413,13 +413,13 @@ A ranking system usually deals with a set of $M$ users

 $$U = \left\{u_0, u_1, ..., u_{M-1}\right\}$$

-Each user ($u_i$) having a set of $N$ ground truth relevant documents
+Each user ($u_i$) having a set of $N_i$ ground truth relevant documents

-$$D_i = \left\{d_0, d_1, ..., d_{N-1}\right\}$$
+$$D_i = \left\{d_0, d_1, ..., d_{N_i-1}\right\}$$

-And a list of $Q$ recommended documents, in order of decreasing relevance
+And a list of $Q_i$ recommended documents, in order of decreasing relevance

-$$R_i = \left[r_0, r_1, ..., r_{Q-1}\right]$$
+$$R_i = \left[r_0, r_1, ..., r_{Q_i-1}\right]$$

 The goal of the ranking system is to produce the most relevant set of documents for each user. The relevance of
 the sets and the effectiveness of the algorithms can be measured using the metrics listed below.

@@ -439,10 +439,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Precision at k

-        $p(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{k} \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} rel_{D_i}(R_i(j))}$
+        $p(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{k} \sum_{j=0}^{\text{min}(Q_i, k) - 1} rel_{D_i}(R_i(j))}$

-        <a href="https://en.wikipedia.org/wiki/Information_retrieval#Precision_at_K">Precision at k</a> is a measure of
+        <a href="https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Precision_at_K">Precision at k</a> is a measure of
         how many of the first k recommended documents are in the set of true relevant documents averaged across all
         users. In this metric, the order of the recommendations is not taken into account.

@@ -450,10 +450,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Mean Average Precision

-        $MAP=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{\left|D_i\right|} \sum_{j=0}^{Q-1} \frac{rel_{D_i}(R_i(j))}{j + 1}}$
+        $MAP=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{N_i} \sum_{j=0}^{Q_i-1} \frac{rel_{D_i}(R_i(j))}{j + 1}}$

-        <a href="https://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision">MAP</a> is a measure of how
+        <a href="https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision">MAP</a> is a measure of how
         many of the recommended documents are in the set of true relevant documents, where the
         order of the recommendations is taken into account (i.e. penalty for highly relevant documents is higher).

@@ -462,10 +462,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Normalized Discounted Cumulative Gain

       $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
-  \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+2)}} \\
+  \frac{rel_{D_i}(R_i(j))}{\text{log}(j+2)}} \\
 \text{Where} \\
-\hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
+\hspace{5 mm} n = \text{min}\left(\text{max}\left(Q_i, N_i\right),k\right) \\
-\hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+2)}$
+\hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{log}(j+2)}$
[spark] branch branch-2.4 updated: [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 123adbd  [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics

123adbd is described below

commit 123adbdbadedd0f77ac3cde0a1bb55c1b7c42b09
Author: Shahid
AuthorDate: Sun Jan 20 18:11:14 2019 -0600

    [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics

    ## What changes were proposed in this pull request?
    Currently, there are some minor inconsistencies in doc compared to the code. In this PR, I am correcting those inconsistencies.
    1) Links related to the evaluation metrics in the docs are not working
    2) Minor correction in the evaluation metrics formulas in docs.

    ## How was this patch tested?
    NA

    Closes #23589 from shahidki31/docCorrection.

    Authored-by: Shahid
    Signed-off-by: Sean Owen
    (cherry picked from commit 9a30e23211e165a44acc0dbe19693950f7a7cc73)
    Signed-off-by: Sean Owen
---
 docs/mllib-evaluation-metrics.md                   | 22 +++---
 .../spark/mllib/evaluation/RankingMetrics.scala    |  2 ++
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index c65ecdc..896d95b 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -413,13 +413,13 @@ A ranking system usually deals with a set of $M$ users

 $$U = \left\{u_0, u_1, ..., u_{M-1}\right\}$$

-Each user ($u_i$) having a set of $N$ ground truth relevant documents
+Each user ($u_i$) having a set of $N_i$ ground truth relevant documents

-$$D_i = \left\{d_0, d_1, ..., d_{N-1}\right\}$$
+$$D_i = \left\{d_0, d_1, ..., d_{N_i-1}\right\}$$

-And a list of $Q$ recommended documents, in order of decreasing relevance
+And a list of $Q_i$ recommended documents, in order of decreasing relevance

-$$R_i = \left[r_0, r_1, ..., r_{Q-1}\right]$$
+$$R_i = \left[r_0, r_1, ..., r_{Q_i-1}\right]$$

 The goal of the ranking system is to produce the most relevant set of documents for each user. The relevance of
 the sets and the effectiveness of the algorithms can be measured using the metrics listed below.

@@ -439,10 +439,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Precision at k

-        $p(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{k} \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} rel_{D_i}(R_i(j))}$
+        $p(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{k} \sum_{j=0}^{\text{min}(Q_i, k) - 1} rel_{D_i}(R_i(j))}$

-        <a href="https://en.wikipedia.org/wiki/Information_retrieval#Precision_at_K">Precision at k</a> is a measure of
+        <a href="https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Precision_at_K">Precision at k</a> is a measure of
         how many of the first k recommended documents are in the set of true relevant documents averaged across all
         users. In this metric, the order of the recommendations is not taken into account.

@@ -450,10 +450,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Mean Average Precision

-        $MAP=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{\left|D_i\right|} \sum_{j=0}^{Q-1} \frac{rel_{D_i}(R_i(j))}{j + 1}}$
+        $MAP=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{N_i} \sum_{j=0}^{Q_i-1} \frac{rel_{D_i}(R_i(j))}{j + 1}}$

-        <a href="https://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision">MAP</a> is a measure of how
+        <a href="https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision">MAP</a> is a measure of how
         many of the recommended documents are in the set of true relevant documents, where the
         order of the recommendations is taken into account (i.e. penalty for highly relevant documents is higher).

@@ -462,10 +462,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Normalized Discounted Cumulative Gain

       $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
-  \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+2)}} \\
+  \frac{rel_{D_i}(R_i(j))}{\text{log}(j+2)}} \\
 \text{Where} \\
-\hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
+\hspace{5 mm} n = \text{min}\left(\text{max}\left(Q_i, N_i\right),k\right) \\
-\hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+2)}$
+\hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{log}(j+2)}$
[spark] branch master updated: [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9a30e23  [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics

9a30e23 is described below

commit 9a30e23211e165a44acc0dbe19693950f7a7cc73
Author: Shahid
AuthorDate: Sun Jan 20 18:11:14 2019 -0600

    [SPARK-26351][MLLIB] Update doc and minor correction in the mllib evaluation metrics

    ## What changes were proposed in this pull request?
    Currently, there are some minor inconsistencies in doc compared to the code. In this PR, I am correcting those inconsistencies.
    1) Links related to the evaluation metrics in the docs are not working
    2) Minor correction in the evaluation metrics formulas in docs.

    ## How was this patch tested?
    NA

    Closes #23589 from shahidki31/docCorrection.

    Authored-by: Shahid
    Signed-off-by: Sean Owen
---
 docs/mllib-evaluation-metrics.md                   | 22 +++---
 .../spark/mllib/evaluation/RankingMetrics.scala    |  2 ++
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index c65ecdc..896d95b 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -413,13 +413,13 @@ A ranking system usually deals with a set of $M$ users

 $$U = \left\{u_0, u_1, ..., u_{M-1}\right\}$$

-Each user ($u_i$) having a set of $N$ ground truth relevant documents
+Each user ($u_i$) having a set of $N_i$ ground truth relevant documents

-$$D_i = \left\{d_0, d_1, ..., d_{N-1}\right\}$$
+$$D_i = \left\{d_0, d_1, ..., d_{N_i-1}\right\}$$

-And a list of $Q$ recommended documents, in order of decreasing relevance
+And a list of $Q_i$ recommended documents, in order of decreasing relevance

-$$R_i = \left[r_0, r_1, ..., r_{Q-1}\right]$$
+$$R_i = \left[r_0, r_1, ..., r_{Q_i-1}\right]$$

 The goal of the ranking system is to produce the most relevant set of documents for each user. The relevance of
 the sets and the effectiveness of the algorithms can be measured using the metrics listed below.

@@ -439,10 +439,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Precision at k

-        $p(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{k} \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} rel_{D_i}(R_i(j))}$
+        $p(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{k} \sum_{j=0}^{\text{min}(Q_i, k) - 1} rel_{D_i}(R_i(j))}$

-        <a href="https://en.wikipedia.org/wiki/Information_retrieval#Precision_at_K">Precision at k</a> is a measure of
+        <a href="https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Precision_at_K">Precision at k</a> is a measure of
         how many of the first k recommended documents are in the set of true relevant documents averaged across all
         users. In this metric, the order of the recommendations is not taken into account.

@@ -450,10 +450,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Mean Average Precision

-        $MAP=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{\left|D_i\right|} \sum_{j=0}^{Q-1} \frac{rel_{D_i}(R_i(j))}{j + 1}}$
+        $MAP=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{N_i} \sum_{j=0}^{Q_i-1} \frac{rel_{D_i}(R_i(j))}{j + 1}}$

-        <a href="https://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision">MAP</a> is a measure of how
+        <a href="https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision">MAP</a> is a measure of how
         many of the recommended documents are in the set of true relevant documents, where the
         order of the recommendations is taken into account (i.e. penalty for highly relevant documents is higher).

@@ -462,10 +462,10 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       Normalized Discounted Cumulative Gain

       $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
-  \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+2)}} \\
+  \frac{rel_{D_i}(R_i(j))}{\text{log}(j+2)}} \\
 \text{Where} \\
-\hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
+\hspace{5 mm} n = \text{min}\left(\text{max}\left(Q_i, N_i\right),k\right) \\
-\hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+2)}$
+\hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{log}(j+2)}$

 <a href="https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG">NDCG at k</a> is a
diff --git
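The corrected per-user formulas in this patch can be checked with a small, Spark-independent Python sketch. This implements the formulas exactly as written in the patched doc (binary relevance, `N_i` relevant documents, `Q_i` recommendations), not Spark's `RankingMetrics` implementation itself; the natural log is used for the discount, and since the same log appears in both DCG and IDCG the base cancels in the ratio anyway.

```python
import math

def rel(d, D):
    # Binary relevance indicator rel_D(r): 1 if r is in the relevant set D.
    return 1.0 if d in D else 0.0

def precision_at_k(R, D, k):
    # p(k) for one user: (1/k) * sum_{j=0}^{min(Q,k)-1} rel_D(R[j])
    return sum(rel(R[j], D) for j in range(min(len(R), k))) / k

def avg_precision(R, D):
    # Per-user term of MAP as literally written in the patched doc:
    # (1/N) * sum_{j=0}^{Q-1} rel_D(R[j]) / (j+1)
    return sum(rel(R[j], D) / (j + 1) for j in range(len(R))) / len(D)

def ndcg_at_k(R, D, k):
    # NDCG(k) for one user, with n = min(max(Q, N), k).
    # n may exceed Q when N > Q; rel of a missing recommendation is 0,
    # so the sum is clamped to the recommendations that actually exist.
    n = min(max(len(R), len(D)), k)
    dcg = sum(rel(R[j], D) / math.log(j + 2) for j in range(min(n, len(R))))
    idcg = sum(1.0 / math.log(j + 2) for j in range(min(len(D), k)))
    return dcg / idcg

# Toy example: one user, 3 relevant docs, 5 recommendations.
D = {1, 2, 3}
R = [1, 4, 2, 5, 3]
print(precision_at_k(R, D, 3))  # 2 hits in the top 3 -> 2/3
print(avg_precision(R, D))      # (1/1 + 1/3 + 1/5) / 3
print(ndcg_at_k(R, D, 5))
```

Averaging these per-user values over all $M$ users gives the dataset-level metrics in the doc.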
svn commit: r32056 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_20_12_55-6c18d8d-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Sun Jan 20 21:07:23 2019
New Revision: 32056

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_20_12_55-6c18d8d docs

[This commit notification would consist of 1778 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[spark] branch master updated: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S.
This is an automated email from the ASF dual-hosted git repository.

felixcheung pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6c18d8d  [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S.

6c18d8d is described below

commit 6c18d8d8079ac4d2d6dc7539601ab83fc5b51760
Author: Luca Canali
AuthorDate: Sun Jan 20 12:43:34 2019 -0800

    [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S.

    ## What changes were proposed in this pull request?
    This PR proposes to extend the spark-submit option --num-executors to be applicable to Spark on K8S too. It is motivated by convenience, for example when migrating jobs written for YARN to run on K8S.

    ## How was this patch tested?
    Manually tested on a K8S cluster.

    Author: Luca Canali

    Closes #23573 from LucaCanali/addNumExecutorsToK8s.
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala     | 4 ++--
 .../main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index b403cc4..d5e17ff 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -537,14 +537,14 @@ private[spark] class SparkSubmit extends Logging {
       // Yarn only
       OptionAssigner(args.queue, YARN, ALL_DEPLOY_MODES, confKey = "spark.yarn.queue"),
-      OptionAssigner(args.numExecutors, YARN, ALL_DEPLOY_MODES,
-        confKey = EXECUTOR_INSTANCES.key),
       OptionAssigner(args.pyFiles, YARN, ALL_DEPLOY_MODES, confKey = "spark.yarn.dist.pyFiles"),
       OptionAssigner(args.jars, YARN, ALL_DEPLOY_MODES, confKey = "spark.yarn.dist.jars"),
       OptionAssigner(args.files, YARN, ALL_DEPLOY_MODES, confKey = "spark.yarn.dist.files"),
       OptionAssigner(args.archives, YARN, ALL_DEPLOY_MODES, confKey = "spark.yarn.dist.archives"),

       // Other options
+      OptionAssigner(args.numExecutors, YARN | KUBERNETES, ALL_DEPLOY_MODES,
+        confKey = EXECUTOR_INSTANCES.key),
       OptionAssigner(args.executorCores, STANDALONE | YARN | KUBERNETES, ALL_DEPLOY_MODES,
         confKey = EXECUTOR_CORES.key),
       OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN | KUBERNETES, ALL_DEPLOY_MODES,

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
index f5e4c4a..9692d2a 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@@ -585,15 +585,15 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
         |   in standalone mode).
         |
         | Spark on YARN and Kubernetes only:
+        |  --num-executors NUM         Number of executors to launch (Default: 2).
+        |                              If dynamic allocation is enabled, the initial number of
+        |                              executors will be at least NUM.
         |  --principal PRINCIPAL       Principal to be used to login to KDC.
         |  --keytab KEYTAB             The full path to the file that contains the keytab for the
         |                              principal specified above.
         |
         | Spark on YARN only:
         |  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
-        |  --num-executors NUM         Number of executors to launch (Default: 2).
-        |                              If dynamic allocation is enabled, the initial number of
-        |                              executors will be at least NUM.
         |  --archives ARCHIVES         Comma separated list of archives to be extracted into the
         |                              working directory of each executor.
        """.stripMargin
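With this change, `--num-executors` can be passed directly when submitting against a Kubernetes master. A sketch of such an invocation follows; the API server URL, container image, and example jar path are placeholders, not values from the commit. The command is built into a string and echoed so the sketch can be inspected without a live cluster.

```shell
# Hypothetical spark-submit invocation for Spark on K8S; every
# cluster-specific value below is a placeholder. Internally,
# --num-executors maps to spark.executor.instances (EXECUTOR_INSTANCES.key).
cmd="spark-submit \
  --master k8s://https://k8s-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --conf spark.kubernetes.container.image=spark:2.4.0 \
  --num-executors 4 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
echo "$cmd"
```

Before this patch the same effect required `--conf spark.executor.instances=4` on K8S, since `--num-executors` was wired up for YARN only.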
svn commit: r32054 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_20_03_50-6d9c54b-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Sun Jan 20 12:02:41 2019
New Revision: 32054

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_20_03_50-6d9c54b docs

[This commit notification would consist of 1778 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[spark] branch master updated: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6d9c54b  [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype

6d9c54b is described below

commit 6d9c54b62cee6fdf396f507caf7eb7f2e3f35b0a
Author: Marco Gaido
AuthorDate: Sun Jan 20 17:43:50 2019 +0800

    [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype

    ## What changes were proposed in this pull request?
    When parsing datatypes from the json internal representation, PySpark doesn't support decimals with negative scales. Since they are allowed and can actually happen, PySpark should be able to successfully parse them.

    ## How was this patch tested?
    added test

    Closes #23575 from mgaido91/SPARK-26645.

    Authored-by: Marco Gaido
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/tests/test_types.py | 8 +++-
 python/pyspark/sql/types.py            | 4 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/tests/test_types.py b/python/pyspark/sql/tests/test_types.py
index fb673f2..3afb88c 100644
--- a/python/pyspark/sql/tests/test_types.py
+++ b/python/pyspark/sql/tests/test_types.py
@@ -24,7 +24,7 @@ import sys
 import unittest

 from pyspark.sql import Row
-from pyspark.sql.functions import UserDefinedFunction
+from pyspark.sql.functions import col, UserDefinedFunction
 from pyspark.sql.types import *
 from pyspark.sql.types import _array_signed_int_typecode_ctype_mappings, _array_type_mappings, \
     _array_unsigned_int_typecode_ctype_mappings, _infer_type, _make_type_verifier, _merge_type
@@ -202,6 +202,12 @@ class TypesTests(ReusedSQLTestCase):
         df = self.spark.createDataFrame([{'a': 1}], ["b"])
         self.assertEqual(df.columns, ['b'])

+    def test_negative_decimal(self):
+        df = self.spark.createDataFrame([(1, ), (11, )], ["value"])
+        ret = df.select(col("value").cast(DecimalType(1, -1))).collect()
+        actual = list(map(lambda r: int(r.value), ret))
+        self.assertEqual(actual, [0, 10])
+
     def test_create_dataframe_from_objects(self):
         data = [MyObject(1, "1"), MyObject(2, "2")]
         df = self.spark.createDataFrame(data)

diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 22ee5d3..00e90fc 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -752,7 +752,7 @@ _all_complex_types = dict((v.typeName(), v)
                           for v in [ArrayType, MapType, StructType])

-_FIXED_DECIMAL = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")
+_FIXED_DECIMAL = re.compile(r"decimal\(\s*(\d+)\s*,\s*(-?\d+)\s*\)")

 def _parse_datatype_string(s):
@@ -865,6 +865,8 @@ def _parse_datatype_json_string(json_string):
     >>> complex_maptype = MapType(complex_structtype,
     ...                           complex_arraytype, False)
     >>> check_datatype(complex_maptype)
+    >>> # Decimal with negative scale.
+    >>> check_datatype(DecimalType(1,-1))
     """
     return _parse_datatype_json_value(json.loads(json_string))