Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8081#discussion_r36703012
--- Diff: docs/ml-guide.md ---
@@ -355,6 +355,75 @@ jsc.stop();
{% endhighlight %}
</div>
+<div data-lang="python">
+{% highlight python %}
+from pyspark import SparkContext
+from pyspark.mllib.regression import LabeledPoint
+from pyspark.ml.classification import LogisticRegression
+from pyspark.ml.param import Param, Params
+from pyspark.sql import Row, SQLContext
+
+sc = SparkContext(appName="SimpleParamsExample")
+sqlContext = SQLContext(sc)
+
+# Prepare training data.
+# We use LabeledPoint.
+training = sc.parallelize(
+[LabeledPoint(1.0, [0.0, 1.1, 0.1]),
+LabeledPoint(0.0, [2.0, 1.0, -1.0]),
+LabeledPoint(0.0, [2.0, 1.3, 1.0]),
+LabeledPoint(1.0, [0.0, 1.2, -0.5])])
+
+# Create a LogisticRegression instance. This instance is an Estimator.
+lr = LogisticRegression(maxIter=10, regParam=0.01)
+# Print out the parameters, documentation, and any default values.
+print "LogisticRegression parameters:\n" + lr.explainParams() + "\n"
+
+# Learn a LogisticRegression model. This uses the parameters stored in lr.
+model1 = lr.fit(training.toDF())
+
+# Since model1 is a Model (i.e., a transformer produced by an Estimator),
+# we can view the parameters it used during fit().
+# This prints the parameter (name: value) pairs, where names are unique
IDs for this
+# LogisticRegression instance.
+print "Model 1 was fit using parameters: "
+print model1.extractParamMap()
+
+# We may alternatively specify parameters using a Python dictionary as a
paramMap
+paramMap = {lr.maxIter: 20}
+paramMap[lr.maxIter] = 30 # Specify 1 Param, overwriting the original
maxIter.
+paramMap.update({lr.regParam: 0.1, lr.threshold: 0.55}) # Specify multiple
Params.
+
+# You can combine paramMaps, which are python dictionaries.
+paramMap2 = { lr.probabilityCol: "myProbability" } # Change output column
name
+paramMapCombined = paramMap.copy()
+paramMapCombined.update(paramMap2)
+
+# Now learn a new model using the paramMapCombined parameters.
+# paramMapCombined overrides all paramters set earlier via lr.set* methods.
+model2 = lr.fit(training.toDF(), paramMapCombined)
+print "Model 2 was fit using parameters: "
+print model2.extractParamMap()
+
+# Prepare test data
+test = sc.parallelize(
+[LabeledPoint(1.0, [-1.0, 1.5, 1.3]),
+LabeledPoint(0.0, [ 3.0, 2.0, -0.1]),
+LabeledPoint(1.0, [ 0.0, 2.2, -1.5])])
+
+# Make predictions on test data using the Transformer.transform() method.
+# LogisticRegression.transform will only use the 'features' column.
+# Note that model2.transform() outputs a "myProbability" column instead of
the usual
+# 'probability' column since we renamed the lr.probabilityCol parameter
previously.
+prediction = model2.transform(test.toDF())
+selected = prediction.select("features", "label", "myProbability",
"prediction")
+for row in selected.collect():
+ print row
--- End diff --
fix indentation
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]