spark git commit: Fix reference to external metrics documentation
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 7d63c36e1 -> ccb53a20e

Fix reference to external metrics documentation

Author: Ben McCann

Closes #12833 from benmccann/patch-1.

(cherry picked from commit 214d1be4fd4a34399b6a2adb2618784de459a48d)
Signed-off-by: Reynold Xin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ccb53a20
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ccb53a20
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ccb53a20

Branch: refs/heads/branch-2.0
Commit: ccb53a20e4c3bff02a17542cad13a1fe36d7a7ea
Parents: 7d63c36
Author: Ben McCann
Authored: Sun May 1 22:43:28 2016 -0700
Committer: Reynold Xin
Committed: Sun May 1 22:45:21 2016 -0700

--
 docs/monitoring.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/ccb53a20/docs/monitoring.md
--
diff --git a/docs/monitoring.md b/docs/monitoring.md
index 9912cde..88002eb 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -341,7 +341,7 @@ keep the paths consistent in both modes.
 # Metrics
 Spark has a configurable metrics system based on the
-[Coda Hale Metrics Library](http://metrics.codahale.com/).
+[Dropwizard Metrics Library](http://metrics.dropwizard.io/).
 This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV
 files. The metrics system is configured via a configuration file that Spark expects to be present
 at `$SPARK_HOME/conf/metrics.properties`. A custom file location can be specified via the
spark git commit: Fix reference to external metrics documentation
Repository: spark
Updated Branches:
  refs/heads/master 44da8d8ea -> 214d1be4f

Fix reference to external metrics documentation

Author: Ben McCann

Closes #12833 from benmccann/patch-1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/214d1be4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/214d1be4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/214d1be4

Branch: refs/heads/master
Commit: 214d1be4fd4a34399b6a2adb2618784de459a48d
Parents: 44da8d8
Author: Ben McCann
Authored: Sun May 1 22:43:28 2016 -0700
Committer: Reynold Xin
Committed: Sun May 1 22:43:28 2016 -0700

--
 docs/monitoring.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/214d1be4/docs/monitoring.md
--
diff --git a/docs/monitoring.md b/docs/monitoring.md
index 9912cde..88002eb 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -341,7 +341,7 @@ keep the paths consistent in both modes.
 # Metrics
 Spark has a configurable metrics system based on the
-[Coda Hale Metrics Library](http://metrics.codahale.com/).
+[Dropwizard Metrics Library](http://metrics.dropwizard.io/).
 This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV
 files. The metrics system is configured via a configuration file that Spark expects to be present
 at `$SPARK_HOME/conf/metrics.properties`. A custom file location can be specified via the
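For context, a minimal sketch of how the metrics configuration described in the hunk above is typically wired up. The `spark.metrics.conf` property and the CSV sink class name come from docs/monitoring.md; the file path, sink period, output directory, and application code are illustrative assumptions, not part of this commit.

```scala
// Sketch: point Spark at a custom Dropwizard-based metrics configuration file.
import org.apache.spark.{SparkConf, SparkContext}

object MetricsConfigExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical contents of /opt/spark/conf/custom-metrics.properties:
    //   *.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
    //   *.sink.csv.period=10
    //   *.sink.csv.directory=/tmp/spark-metrics
    val conf = new SparkConf()
      .setAppName("metrics-example")
      .setMaster("local[*]")
      // Use a metrics file outside the default $SPARK_HOME/conf/metrics.properties.
      .set("spark.metrics.conf", "/opt/spark/conf/custom-metrics.properties")

    val sc = new SparkContext(conf)
    sc.parallelize(1 to 100).count()  // a little activity so the sinks have something to report
    sc.stop()
  }
}
```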
spark git commit: [SPARK-15049] Rename NewAccumulator to AccumulatorV2
Repository: spark Updated Branches: refs/heads/branch-2.0 705172202 -> 7d63c36e1 [SPARK-15049] Rename NewAccumulator to AccumulatorV2 ## What changes were proposed in this pull request? NewAccumulator isn't the best name if we ever come up with v3 of the API. ## How was this patch tested? Updated tests to reflect the change. Author: Reynold Xin Closes #12827 from rxin/SPARK-15049. (cherry picked from commit 44da8d8eabeccc12bfed0d43b37d54e0da845c66) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7d63c36e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7d63c36e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7d63c36e Branch: refs/heads/branch-2.0 Commit: 7d63c36e1efe8baec96cdc16a997249728e204fd Parents: 7051722 Author: Reynold Xin Authored: Sun May 1 20:21:02 2016 -0700 Committer: Reynold Xin Committed: Sun May 1 20:21:11 2016 -0700 -- .../scala/org/apache/spark/AccumulatorV2.scala | 394 +++ .../scala/org/apache/spark/ContextCleaner.scala | 2 +- .../org/apache/spark/HeartbeatReceiver.scala| 2 +- .../scala/org/apache/spark/NewAccumulator.scala | 393 -- .../scala/org/apache/spark/SparkContext.scala | 4 +- .../scala/org/apache/spark/TaskContext.scala| 2 +- .../org/apache/spark/TaskContextImpl.scala | 2 +- .../scala/org/apache/spark/TaskEndReason.scala | 4 +- .../org/apache/spark/executor/Executor.scala| 4 +- .../org/apache/spark/executor/TaskMetrics.scala | 18 +- .../apache/spark/scheduler/DAGScheduler.scala | 10 +- .../spark/scheduler/DAGSchedulerEvent.scala | 2 +- .../scala/org/apache/spark/scheduler/Task.scala | 2 +- .../org/apache/spark/scheduler/TaskResult.scala | 8 +- .../apache/spark/scheduler/TaskScheduler.scala | 4 +- .../spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../apache/spark/scheduler/TaskSetManager.scala | 2 +- .../org/apache/spark/AccumulatorSuite.scala | 2 +- .../apache/spark/InternalAccumulatorSuite.scala | 2 +- .../spark/executor/TaskMetricsSuite.scala | 6 +- .../spark/scheduler/DAGSchedulerSuite.scala | 6 +- .../scheduler/ExternalClusterManagerSuite.scala | 4 +- .../spark/scheduler/TaskSetManagerSuite.scala | 6 +- .../spark/sql/execution/metric/SQLMetrics.scala | 6 +- 24 files changed, 444 insertions(+), 443 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7d63c36e/core/src/main/scala/org/apache/spark/AccumulatorV2.scala -- diff --git a/core/src/main/scala/org/apache/spark/AccumulatorV2.scala b/core/src/main/scala/org/apache/spark/AccumulatorV2.scala new file mode 100644 index 000..c65108a --- /dev/null +++ b/core/src/main/scala/org/apache/spark/AccumulatorV2.scala @@ -0,0 +1,394 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark + +import java.{lang => jl} +import java.io.ObjectInputStream +import java.util.concurrent.ConcurrentHashMap +import java.util.concurrent.atomic.AtomicLong + +import org.apache.spark.scheduler.AccumulableInfo +import org.apache.spark.util.Utils + + +private[spark] case class AccumulatorMetadata( +id: Long, +name: Option[String], +countFailedValues: Boolean) extends Serializable + + +/** + * The base class for accumulators, that can accumulate inputs of type `IN`, and produce output of + * type `OUT`. + */ +abstract class AccumulatorV2[IN, OUT] extends Serializable { + private[spark] var metadata: AccumulatorMetadata = _ + private[this] var atDriverSide = true + + private[spark] def register( + sc: SparkContext, + name: Option[String] = None, + countFailedValues: Boolean = false): Unit = { +if (this.metadata != null) { + throw new IllegalStateException("Cannot register an Accumulator twice.") +} +this.metadata = AccumulatorMetadata(AccumulatorContext.newId(), name, countFailedValues) +AccumulatorContext.register(this) +sc.clean
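As a quick illustration of the renamed API from the driver's point of view, the sketch below uses the built-in Long accumulator, which after this patch is an `AccumulatorV2` subclass. The `longAccumulator` convenience factory is the one in the released Spark 2.0 API and may differ slightly from this in-flight commit; names and numbers are illustrative only.

```scala
// Sketch: driver-side use of a built-in accumulator after the AccumulatorV2 rename.
import org.apache.spark.{SparkConf, SparkContext}

object LongAccumulatorExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("acc-v2").setMaster("local[*]"))

    // Named accumulators show up in the web UI; the factory registers it for us.
    val evenCount = sc.longAccumulator("evenCount")

    sc.parallelize(1 to 100).foreach { n =>
      if (n % 2 == 0) evenCount.add(1L)  // task-side updates are merged back on the driver
    }

    println(evenCount.value)             // 50
    sc.stop()
  }
}
```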
spark git commit: [SPARK-15049] Rename NewAccumulator to AccumulatorV2
Repository: spark Updated Branches: refs/heads/master a832cef11 -> 44da8d8ea [SPARK-15049] Rename NewAccumulator to AccumulatorV2 ## What changes were proposed in this pull request? NewAccumulator isn't the best name if we ever come up with v3 of the API. ## How was this patch tested? Updated tests to reflect the change. Author: Reynold Xin Closes #12827 from rxin/SPARK-15049. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/44da8d8e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/44da8d8e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/44da8d8e Branch: refs/heads/master Commit: 44da8d8eabeccc12bfed0d43b37d54e0da845c66 Parents: a832cef Author: Reynold Xin Authored: Sun May 1 20:21:02 2016 -0700 Committer: Reynold Xin Committed: Sun May 1 20:21:02 2016 -0700 -- .../scala/org/apache/spark/AccumulatorV2.scala | 394 +++ .../scala/org/apache/spark/ContextCleaner.scala | 2 +- .../org/apache/spark/HeartbeatReceiver.scala| 2 +- .../scala/org/apache/spark/NewAccumulator.scala | 393 -- .../scala/org/apache/spark/SparkContext.scala | 4 +- .../scala/org/apache/spark/TaskContext.scala| 2 +- .../org/apache/spark/TaskContextImpl.scala | 2 +- .../scala/org/apache/spark/TaskEndReason.scala | 4 +- .../org/apache/spark/executor/Executor.scala| 4 +- .../org/apache/spark/executor/TaskMetrics.scala | 18 +- .../apache/spark/scheduler/DAGScheduler.scala | 10 +- .../spark/scheduler/DAGSchedulerEvent.scala | 2 +- .../scala/org/apache/spark/scheduler/Task.scala | 2 +- .../org/apache/spark/scheduler/TaskResult.scala | 8 +- .../apache/spark/scheduler/TaskScheduler.scala | 4 +- .../spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../apache/spark/scheduler/TaskSetManager.scala | 2 +- .../org/apache/spark/AccumulatorSuite.scala | 2 +- .../apache/spark/InternalAccumulatorSuite.scala | 2 +- .../spark/executor/TaskMetricsSuite.scala | 6 +- .../spark/scheduler/DAGSchedulerSuite.scala | 6 +- .../scheduler/ExternalClusterManagerSuite.scala | 4 +- .../spark/scheduler/TaskSetManagerSuite.scala | 6 +- .../spark/sql/execution/metric/SQLMetrics.scala | 6 +- 24 files changed, 444 insertions(+), 443 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/44da8d8e/core/src/main/scala/org/apache/spark/AccumulatorV2.scala -- diff --git a/core/src/main/scala/org/apache/spark/AccumulatorV2.scala b/core/src/main/scala/org/apache/spark/AccumulatorV2.scala new file mode 100644 index 000..c65108a --- /dev/null +++ b/core/src/main/scala/org/apache/spark/AccumulatorV2.scala @@ -0,0 +1,394 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark + +import java.{lang => jl} +import java.io.ObjectInputStream +import java.util.concurrent.ConcurrentHashMap +import java.util.concurrent.atomic.AtomicLong + +import org.apache.spark.scheduler.AccumulableInfo +import org.apache.spark.util.Utils + + +private[spark] case class AccumulatorMetadata( +id: Long, +name: Option[String], +countFailedValues: Boolean) extends Serializable + + +/** + * The base class for accumulators, that can accumulate inputs of type `IN`, and produce output of + * type `OUT`. + */ +abstract class AccumulatorV2[IN, OUT] extends Serializable { + private[spark] var metadata: AccumulatorMetadata = _ + private[this] var atDriverSide = true + + private[spark] def register( + sc: SparkContext, + name: Option[String] = None, + countFailedValues: Boolean = false): Unit = { +if (this.metadata != null) { + throw new IllegalStateException("Cannot register an Accumulator twice.") +} +this.metadata = AccumulatorMetadata(AccumulatorContext.newId(), name, countFailedValues) +AccumulatorContext.register(this) +sc.cleaner.foreach(_.registerAccumulatorForCleanup(this)) + } + + /** + * Returns true if this accumulator has
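For the user-facing side of the rename, the following sketch defines a custom accumulator on top of `AccumulatorV2`. The abstract members used here (`isZero`, `copy`, `reset`, `add`, `merge`, `value`) reflect the API as it stabilized for the Spark 2.0 release, where the class is importable from `org.apache.spark.util`; at the time of this commit the class still lived in package `org.apache.spark`, as the diff above shows. Treat this as an illustration rather than code taken from the patch.

```scala
// Sketch: a user-defined accumulator that tracks the maximum Long seen across tasks.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.AccumulatorV2

class MaxAccumulator extends AccumulatorV2[Long, Long] {
  private var _max: Long = Long.MinValue

  override def isZero: Boolean = _max == Long.MinValue
  override def copy(): MaxAccumulator = {
    val acc = new MaxAccumulator
    acc._max = _max
    acc
  }
  override def reset(): Unit = { _max = Long.MinValue }
  override def add(v: Long): Unit = { _max = math.max(_max, v) }
  override def merge(other: AccumulatorV2[Long, Long]): Unit = { _max = math.max(_max, other.value) }
  override def value: Long = _max
}

object MaxAccumulatorExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("max-acc").setMaster("local[*]"))

    val maxSeen = new MaxAccumulator
    sc.register(maxSeen, "maxSeen")  // registration attaches the AccumulatorMetadata shown above

    sc.parallelize(Seq(3L, 42L, 7L)).foreach(maxSeen.add)
    println(maxSeen.value)           // 42
    sc.stop()
  }
}
```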
spark git commit: [SPARK-13425][SQL] Documentation for CSV datasource options
Repository: spark Updated Branches: refs/heads/branch-2.0 a6428292f -> 705172202 [SPARK-13425][SQL] Documentation for CSV datasource options ## What changes were proposed in this pull request? This PR adds the explanation and documentation for CSV options for reading and writing. ## How was this patch tested? Style tests with `./dev/run_tests` for documentation style. Author: hyukjinkwon Author: Hyukjin Kwon Closes #12817 from HyukjinKwon/SPARK-13425. (cherry picked from commit a832cef11233c6357c7ba7ede387b432e6b0ed71) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/70517220 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/70517220 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/70517220 Branch: refs/heads/branch-2.0 Commit: 7051722023b98f1720142c7b3b41948d275ea455 Parents: a642829 Author: hyukjinkwon Authored: Sun May 1 19:05:20 2016 -0700 Committer: Reynold Xin Committed: Sun May 1 19:05:32 2016 -0700 -- python/pyspark/sql/readwriter.py| 52 .../org/apache/spark/sql/DataFrameReader.scala | 47 -- .../org/apache/spark/sql/DataFrameWriter.scala | 8 +++ 3 files changed, 103 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/70517220/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index ed9e716..cc5e93d 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -282,6 +282,45 @@ class DataFrameReader(object): :param paths: string, or list of strings, for input path(s). +You can set the following CSV-specific options to deal with CSV files: +* ``sep`` (default ``,``): sets the single character as a separator \ +for each field and value. +* ``charset`` (default ``UTF-8``): decodes the CSV files by the given \ +encoding type. +* ``quote`` (default ``"``): sets the single character used for escaping \ +quoted values where the separator can be part of the value. +* ``escape`` (default ``\``): sets the single character used for escaping quotes \ +inside an already quoted value. +* ``comment`` (default empty string): sets the single character used for skipping \ +lines beginning with this character. By default, it is disabled. +* ``header`` (default ``false``): uses the first line as names of columns. +* ``ignoreLeadingWhiteSpace`` (default ``false``): defines whether or not leading \ +whitespaces from values being read should be skipped. +* ``ignoreTrailingWhiteSpace`` (default ``false``): defines whether or not trailing \ +whitespaces from values being read should be skipped. +* ``nullValue`` (default empty string): sets the string representation of a null value. +* ``nanValue`` (default ``NaN``): sets the string representation of a non-number \ +value. +* ``positiveInf`` (default ``Inf``): sets the string representation of a positive \ +infinity value. +* ``negativeInf`` (default ``-Inf``): sets the string representation of a negative \ +infinity value. +* ``dateFormat`` (default ``None``): sets the string that indicates a date format. \ +Custom date formats follow the formats at ``java.text.SimpleDateFormat``. This \ +applies to both date type and timestamp type. By default, it is None which means \ +trying to parse times and date by ``java.sql.Timestamp.valueOf()`` and \ +``java.sql.Date.valueOf()``. +* ``maxColumns`` (default ``20480``): defines a hard limit of how many columns \ +a record can have. 
+* ``maxCharsPerColumn`` (default ``100``): defines the maximum number of \ +characters allowed for any given value being read. +* ``mode`` (default ``PERMISSIVE``): allows a mode for dealing with corrupt records \ +during parsing. +* ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted record. \ +When a schema is set by user, it sets ``null`` for extra fields. +* ``DROPMALFORMED`` : ignores the whole corrupted records. +* ``FAILFAST`` : throws an exception when it meets corrupted records. + >>> df = sqlContext.read.csv('python/test_support/sql/ages.csv') >>> df.dtypes [('C0', 'str
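A hedged Scala counterpart to the Python docstring above: the option keys (`sep`, `header`, `nullValue`, `dateFormat`, `mode`) are the ones documented by this patch, while the session setup and input path are illustrative assumptions.

```scala
// Sketch: reading CSV with a few of the documented options from the Scala side.
import org.apache.spark.sql.SparkSession

object CsvReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-options")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      .option("sep", ";")                  // field delimiter instead of the default ","
      .option("header", "true")            // first line holds column names
      .option("nullValue", "NA")           // treat "NA" as null
      .option("dateFormat", "yyyy-MM-dd")  // SimpleDateFormat pattern for date columns
      .option("mode", "DROPMALFORMED")     // silently drop corrupt records
      .csv("/path/to/people.csv")          // hypothetical input file

    df.printSchema()
    df.show()
    spark.stop()
  }
}
```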
spark git commit: [SPARK-13425][SQL] Documentation for CSV datasource options
Repository: spark Updated Branches: refs/heads/master a6428292f -> a832cef11 [SPARK-13425][SQL] Documentation for CSV datasource options ## What changes were proposed in this pull request? This PR adds the explanation and documentation for CSV options for reading and writing. ## How was this patch tested? Style tests with `./dev/run_tests` for documentation style. Author: hyukjinkwon Author: Hyukjin Kwon Closes #12817 from HyukjinKwon/SPARK-13425. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a832cef1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a832cef1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a832cef1 Branch: refs/heads/master Commit: a832cef11233c6357c7ba7ede387b432e6b0ed71 Parents: a642829 Author: hyukjinkwon Authored: Sun May 1 19:05:20 2016 -0700 Committer: Reynold Xin Committed: Sun May 1 19:05:20 2016 -0700 -- python/pyspark/sql/readwriter.py| 52 .../org/apache/spark/sql/DataFrameReader.scala | 47 -- .../org/apache/spark/sql/DataFrameWriter.scala | 8 +++ 3 files changed, 103 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a832cef1/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index ed9e716..cc5e93d 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -282,6 +282,45 @@ class DataFrameReader(object): :param paths: string, or list of strings, for input path(s). +You can set the following CSV-specific options to deal with CSV files: +* ``sep`` (default ``,``): sets the single character as a separator \ +for each field and value. +* ``charset`` (default ``UTF-8``): decodes the CSV files by the given \ +encoding type. +* ``quote`` (default ``"``): sets the single character used for escaping \ +quoted values where the separator can be part of the value. +* ``escape`` (default ``\``): sets the single character used for escaping quotes \ +inside an already quoted value. +* ``comment`` (default empty string): sets the single character used for skipping \ +lines beginning with this character. By default, it is disabled. +* ``header`` (default ``false``): uses the first line as names of columns. +* ``ignoreLeadingWhiteSpace`` (default ``false``): defines whether or not leading \ +whitespaces from values being read should be skipped. +* ``ignoreTrailingWhiteSpace`` (default ``false``): defines whether or not trailing \ +whitespaces from values being read should be skipped. +* ``nullValue`` (default empty string): sets the string representation of a null value. +* ``nanValue`` (default ``NaN``): sets the string representation of a non-number \ +value. +* ``positiveInf`` (default ``Inf``): sets the string representation of a positive \ +infinity value. +* ``negativeInf`` (default ``-Inf``): sets the string representation of a negative \ +infinity value. +* ``dateFormat`` (default ``None``): sets the string that indicates a date format. \ +Custom date formats follow the formats at ``java.text.SimpleDateFormat``. This \ +applies to both date type and timestamp type. By default, it is None which means \ +trying to parse times and date by ``java.sql.Timestamp.valueOf()`` and \ +``java.sql.Date.valueOf()``. +* ``maxColumns`` (default ``20480``): defines a hard limit of how many columns \ +a record can have. +* ``maxCharsPerColumn`` (default ``100``): defines the maximum number of \ +characters allowed for any given value being read. 
+* ``mode`` (default ``PERMISSIVE``): allows a mode for dealing with corrupt records \ +during parsing. +* ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted record. \ +When a schema is set by user, it sets ``null`` for extra fields. +* ``DROPMALFORMED`` : ignores the whole corrupted records. +* ``FAILFAST`` : throws an exception when it meets corrupted records. + >>> df = sqlContext.read.csv('python/test_support/sql/ages.csv') >>> df.dtypes [('C0', 'string'), ('C1', 'string')] @@ -663,6 +702,19 @@ class DataFrameWriter(object):
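The same patch also documents writer-side CSV options on `DataFrameWriter`. Below is a small, hedged sketch of writing CSV with a couple of the documented options; the toy data and output path are assumptions, and other reader options apply on the writer side only where the documentation says so.

```scala
// Sketch: writing CSV output with a subset of the documented options.
import org.apache.spark.sql.SparkSession

object CsvWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-write").master("local[*]").getOrCreate()

    val df = spark.range(0, 5).selectExpr("id", "id * id AS square")

    df.write
      .option("header", "true")  // emit a header row with column names
      .option("sep", "\t")       // tab-separated output instead of commas
      .mode("overwrite")
      .csv("/tmp/squares-csv")   // hypothetical output directory

    spark.stop()
  }
}
```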
[spark] Git Push Summary
Repository: spark Updated Branches: refs/heads/branch-2.0 [created] a6428292f
spark git commit: [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update
Repository: spark Updated Branches: refs/heads/master cdf9e9753 -> a6428292f [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update ## What changes were proposed in this pull request? This PR is an update for [https://github.com/apache/spark/pull/12738] which: * Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults in the Scala side * Various fixes for bugs found * This includes changing classes taking weightCol to treat unset and empty String Param values the same way. Defaults changed: * Scala * LogisticRegression: weightCol defaults to not set (instead of empty string) * StringIndexer: labels default to not set (instead of empty array) * GeneralizedLinearRegression: * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver) * weightCol defaults to not set (instead of empty string) * LinearRegression: weightCol defaults to not set (instead of empty string) * Python * MultilayerPerceptron: layers default to not set (instead of [1,1]) * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set) ## How was this patch tested? Generic unit test. Manually tested that unit test by changing defaults and verifying that broke the test. Author: Joseph K. Bradley Author: yinxusen Closes #12816 from jkbradley/yinxusen-SPARK-14931. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6428292 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6428292 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6428292 Branch: refs/heads/master Commit: a6428292f78fd594f41a4a7bf254d40268f46305 Parents: cdf9e97 Author: Xusen Yin Authored: Sun May 1 12:29:01 2016 -0700 Committer: Joseph K. Bradley Committed: Sun May 1 12:29:01 2016 -0700 -- .../ml/classification/LogisticRegression.scala | 7 ++- .../apache/spark/ml/feature/StringIndexer.scala | 5 +- .../GeneralizedLinearRegression.scala | 31 +++-- .../spark/ml/regression/LinearRegression.scala | 15 +++--- .../LogisticRegressionSuite.scala | 2 +- .../GeneralizedLinearRegressionSuite.scala | 2 +- python/pyspark/ml/classification.py | 13 ++ python/pyspark/ml/feature.py| 1 + python/pyspark/ml/regression.py | 9 ++-- python/pyspark/ml/tests.py | 48 python/pyspark/ml/wrapper.py| 3 +- 11 files changed, 96 insertions(+), 40 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a6428292/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala b/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala index 717e93c..d2d4e24 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala @@ -235,13 +235,12 @@ class LogisticRegression @Since("1.2.0") ( /** * Whether to over-/under-sample training instances according to the given weights in weightCol. - * If empty, all instances are treated equally (weight 1.0). - * Default is empty, so all instances have weight one. + * If not set or empty String, all instances are treated equally (weight 1.0). + * Default is not set, so all instances have weight one. 
* @group setParam */ @Since("1.6.0") def setWeightCol(value: String): this.type = set(weightCol, value) - setDefault(weightCol -> "") @Since("1.5.0") override def setThresholds(value: Array[Double]): this.type = super.setThresholds(value) @@ -264,7 +263,7 @@ class LogisticRegression @Since("1.2.0") ( protected[spark] def train(dataset: Dataset[_], handlePersistence: Boolean): LogisticRegressionModel = { -val w = if ($(weightCol).isEmpty) lit(1.0) else col($(weightCol)) +val w = if (!isDefined(weightCol) || $(weightCol).isEmpty) lit(1.0) else col($(weightCol)) val instances: RDD[Instance] = dataset.select(col($(labelCol)).cast(DoubleType), w, col($(featuresCol))).rdd.map { case Row(label: Double, weight: Double, features: Vector) => http://git-wip-us.apache.org/repos/asf/spark/blob/a6428292/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala index 7e0d374..cc0571f 100644 --- a/m
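To make the behavioral change concrete, here is a hedged sketch of `weightCol` under the new default: leaving it unset (or set to the empty string) gives every instance weight 1.0, while setting it points the estimator at a weight column. The toy data, column names, and imports follow the released Spark 2.0 ML API and are not taken from the patch.

```scala
// Sketch: LogisticRegression with and without a weight column after SPARK-14931.
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object WeightColExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("weight-col").master("local[*]").getOrCreate()

    val training = spark.createDataFrame(Seq(
      (1.0, 2.0, Vectors.dense(0.0, 1.1)),
      (0.0, 1.0, Vectors.dense(2.0, 1.0)),
      (1.0, 0.5, Vectors.dense(0.5, 0.3))
    )).toDF("label", "weight", "features")

    // weightCol left unset: every row is treated as having weight 1.0.
    val unweighted = new LogisticRegression().setMaxIter(10).fit(training)

    // weightCol set: the "weight" column scales each instance's contribution.
    val weighted = new LogisticRegression().setMaxIter(10).setWeightCol("weight").fit(training)

    println(unweighted.coefficients)
    println(weighted.coefficients)
    spark.stop()
  }
}
```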
spark git commit: [SPARK-14505][CORE] Fix bug: when creating two SparkContext objects in the same JVM, the first one cannot run any task
Repository: spark Updated Branches: refs/heads/master 90787de86 -> cdf9e9753 [SPARK-14505][CORE] Fix bug: when creating two SparkContext objects in the same JVM, the first one cannot run any task. After creating two SparkContext objects in the same JVM (the second one cannot be created successfully), using the first one to run a job will throw an exception like the one below: ![image](https://cloud.githubusercontent.com/assets/7162889/14402832/0c8da2a6-fe73-11e5-8aba-68ee3ddaf605.png) Author: Allen Closes #12273 from the-sea/context-create-bug. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cdf9e975 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cdf9e975 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cdf9e975 Branch: refs/heads/master Commit: cdf9e9753df4e7f2fa4e972d1bfded4e22943c27 Parents: 90787de Author: Allen Authored: Sun May 1 15:39:14 2016 +0100 Committer: Sean Owen Committed: Sun May 1 15:39:14 2016 +0100 -- .../scala/org/apache/spark/SparkContext.scala | 27 +--- .../org/apache/spark/SparkContextSuite.scala| 4 +++ 2 files changed, 16 insertions(+), 15 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cdf9e975/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index ed4408c..2cb3ed0 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2216,21 +2216,7 @@ object SparkContext extends Logging { sc: SparkContext, allowMultipleContexts: Boolean): Unit = { SPARK_CONTEXT_CONSTRUCTOR_LOCK.synchronized { - contextBeingConstructed.foreach { otherContext => -if (otherContext ne sc) { // checks for reference equality - // Since otherContext might point to a partially-constructed context, guard against - // its creationSite field being null: - val otherContextCreationSite = - Option(otherContext.creationSite).map(_.longForm).getOrElse("unknown location") - val warnMsg = "Another SparkContext is being constructed (or threw an exception in its" + -" constructor). This may indicate an error, since only one SparkContext may be" + -" running in this JVM (see SPARK-2243)." + -s" The other SparkContext was created at:\n$otherContextCreationSite" - logWarning(warnMsg) -} - -if (activeContext.get() != null) { - val ctx = activeContext.get() + Option(activeContext.get()).filter(_ ne sc).foreach { ctx => val errMsg = "Only one SparkContext may be running in this JVM (see SPARK-2243)." + " To ignore this error, set spark.driver.allowMultipleContexts = true. " + s"The currently running SparkContext was created at:\n${ctx.creationSite.longForm}" @@ -2241,6 +2227,17 @@ object SparkContext extends Logging { throw exception } } + + contextBeingConstructed.filter(_ ne sc).foreach { otherContext => +// Since otherContext might point to a partially-constructed context, guard against +// its creationSite field being null: +val otherContextCreationSite = + Option(otherContext.creationSite).map(_.longForm).getOrElse("unknown location") +val warnMsg = "Another SparkContext is being constructed (or threw an exception in its" + + " constructor). This may indicate an error, since only one SparkContext may be" + + " running in this JVM (see SPARK-2243)." 
+ + s" The other SparkContext was created at:\n$otherContextCreationSite" +logWarning(warnMsg) } } } http://git-wip-us.apache.org/repos/asf/spark/blob/cdf9e975/core/src/test/scala/org/apache/spark/SparkContextSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala index 841fd02..a759f36 100644 --- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala @@ -39,8 +39,12 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext { val conf = new SparkConf().setAppName("test").setMaster("local") .set("spark.driver.allowMultipleContexts", "false") sc = new SparkContext(conf) +val envBefore = SparkEnv.get // A SparkContext is already running, so we shouldn't be able to create a second one intercept[SparkException] { new SparkContext(conf