[GitHub] spark pull request #19347: Branch 2.2 Spark MLlib's output of many algorithms ...

2017-09-25 Thread ithjz
GitHub user ithjz opened a pull request:

https://github.com/apache/spark/pull/19347

Branch 2.2: Spark MLlib's output of many algorithms is not clear



What's the use of these **results**?


JavaGradientBoostingRegressionExample 



Test Mean Squared Error: 0.12503
Learned regression GBT model:
TreeEnsembleModel regressor with 3 trees

  Tree 0:
If (feature 351 <= 15.0)
 Predict: 0.0
Else (feature 351 > 15.0)
 Predict: 1.0
  Tree 1:
Predict: 0.0
  Tree 2:
Predict: 0.0
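
For anyone puzzling over what those results are for: the MSE is the model's
squared error on the held-out test split (lower is better), and `toDebugString`
dumps each boosted tree's split rules; trees 1 and 2 predicting a constant 0.0
just means the later boosting iterations found little residual error left to
correct on this toy dataset. A minimal Scala sketch of the same example,
adapted from the standard GradientBoostedTrees regression example in the
Spark docs:

    import org.apache.spark.mllib.tree.GradientBoostedTrees
    import org.apache.spark.mllib.tree.configuration.BoostingStrategy
    import org.apache.spark.mllib.util.MLUtils

    // Load the bundled LibSVM sample data and split it into train/test sets.
    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

    // 3 boosting iterations yield the "3 trees" shown in the output above.
    val boostingStrategy = BoostingStrategy.defaultParams("Regression")
    boostingStrategy.numIterations = 3
    boostingStrategy.treeStrategy.maxDepth = 5

    val model = GradientBoostedTrees.train(trainingData, boostingStrategy)

    // Mean squared error over the held-out test set.
    val testMSE = testData.map { point =>
      val err = point.label - model.predict(point.features)
      err * err
    }.mean()
    println(s"Test Mean Squared Error: $testMSE")
    println(s"Learned regression GBT model:\n${model.toDebugString}")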

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19347


commit e936a96badfeeb2051ee35dc4b0fbecefa9bf4cb
Author: Peng 
Date:   2017-05-24T11:54:17Z

[SPARK-20764][ML][PYSPARK][FOLLOWUP] Fix visibility discrepancy with 
numInstances and degreesOfFreedom in LR and GLR - Python version

## What changes were proposed in this pull request?
Add test cases for PR-18062

## How was this patch tested?
The existing unit tests.

Author: Peng 

Closes #18068 from mpjlu/moreTest.

(cherry picked from commit 9afcf127d31b5477a539dde6e5f01861532a1c4c)
Signed-off-by: Yanbo Liang 

commit 1d107242f8ec842c009e0b427f6e4a8313d99aa2
Author: zero323 
Date:   2017-05-24T11:57:44Z

[SPARK-20631][FOLLOW-UP] Fix incorrect tests.

## What changes were proposed in this pull request?

- Fix incorrect tests for `_check_thresholds`.
- Move test to `ParamTests`.

## How was this patch tested?

Unit tests.

Author: zero323 

Closes #18085 from zero323/SPARK-20631-FOLLOW-UP.

(cherry picked from commit 1816eb3bef930407dc9e083de08f5105725c55d1)
Signed-off-by: Yanbo Liang 

commit 83aeac9e0590e99010d0af8e067822d0ed0971fe
Author: Bago Amirbekian 
Date:   2017-05-24T14:55:38Z

[SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in 
LogisticRegressionModel

## What changes were proposed in this pull request?

Fixed a TypeError with Python 3 and NumPy 1.12.1. NumPy's `reshape` no longer 
accepts floats as arguments as of 1.12. Also, Python 3 uses true (float) 
division for `/` (e.g. `5 / 2 == 2.5`, whereas `5 // 2 == 2`), so we should be 
using `//` to ensure that `_dataWithBiasSize` doesn't get set to a float.

## How was this patch tested?

Existing tests run using python3 and numpy 1.12.

Author: Bago Amirbekian 

Closes #18081 from MrBago/BF-py3floatbug.

(cherry picked from commit bc66a77bbe2120cc21bd8da25194efca4cde13c3)
Signed-off-by: Yanbo Liang 

commit c59ad420b5fda29567f4a06b5f71df76e70e269a
Author: Liang-Chi Hsieh 
Date:   2017-05-24T16:35:40Z

[SPARK-20848][SQL] Shutdown the pool after reading parquet files

## What changes were proposed in this pull request?

From JIRA: on each call to `spark.read.parquet`, a new ForkJoinPool is 
created. One of the threads in the pool is kept in the WAITING state and never 
stopped, which leads to unbounded growth in the number of threads.

We should shut down the pool after reading parquet files.
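
The shape of the fix, as a minimal sketch (the names below are illustrative, 
not the actual Spark internals): tie the pool's lifetime to the read with 
try/finally.

    import java.util.concurrent.ForkJoinPool

    // Hypothetical stand-in for Spark's parallel parquet footer reading.
    def readFootersInParallel(pool: ForkJoinPool): Unit = ()

    val pool = new ForkJoinPool(8)
    try {
      readFootersInParallel(pool)
    } finally {
      pool.shutdown()  // without this, an idle worker thread leaks on every call
    }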

## How was this patch tested?

Added a test to ParquetFileFormatSuite.

Author: Liang-Chi Hsieh 

Closes #18073 from viirya/SPARK-20848.

(cherry picked from commit f72ad303f05a6d99513ea3b121375726b177199c)
Signed-off-by: Wenchen Fan 

commit b7a2a16b1e01375292938fc48b0a333ec4e7cd30
Author: Reynold Xin 
Date:   2017-05-24T20:57:19Z

[SPARK-20867][SQL] Move hints from Statistics into HintInfo class

## What changes were proposed in this pull request?
This is a follow-up to SPARK-20857 to move the broadcast hint from 
Statistics into a new HintInfo class, so we can be more flexible in adding new 
hints in the future.
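
Sketched, the change looks roughly like this (simplified; the real Catalyst 
classes carry more fields): hint flags get their own case class, so future 
hints extend HintInfo instead of widening Statistics.

    // Simplified sketch, not the exact Catalyst source.
    case class HintInfo(broadcast: Boolean = false)

    case class Statistics(
        sizeInBytes: BigInt,
        hints: HintInfo = HintInfo())

    // A broadcast hint now just swaps out the hints field:
    val stats  = Statistics(sizeInBytes = 1024)
    val hinted = stats.copy(hints = stats.hints.copy(broadcast = true))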

## How was this patch tested?
Updated test cases to reflect the change.

Author: Reynold Xin 

Closes #18087 from rxin/SPARK-20867.

(cherry picked from commit a64746677bf09ef67e3fd538355a6ee9b5ce8cf4)
Signed-off-by: Xiao Li 

commit 2405afce4e87c0486f2aef1d068f17aea2480b17
Author: Kris Mok 
Date:   2017-05-25T00:19:35Z

[SPARK-20872][SQL] ShuffleExchange.nodeName should handle null coordinator

## What changes were proposed in this pull request?

A one-liner change in `ShuffleExchange.nodeName` to cover the case when 
`coordinator` is `null`, so that the match expression is exhaustive.
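
In Scala, a value typed `Option[T]` can still be `null`, so matching only 
`Some`/`None` can throw a `MatchError`. A sketch of the pattern (illustrative 
types, not the actual `ShuffleExchange` source):

    // The real field is an Option[ExchangeCoordinator]; String stands in here.
    def nodeName(coordinator: Option[String]): String = coordinator match {
      case Some(c) => s"Exchange(coordinator: $c)"
      case None    => "Exchange"
      case _       => "Exchange"  // covers coordinator == null, making the match exhaustive
    }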

Please refer to 
[SPARK-20872](https://issues.apache.org/ji

[GitHub] spark issue #18416: [SPARK-21204][SQL] Add support for Scala Set collection ...

2017-07-05 Thread ithjz
Github user ithjz commented on the issue:

https://github.com/apache/spark/pull/18416
  
I ran the examples provided on the official website and got errors about 
missing required packages. I hope someone can help me.



[hadoop@hadoop01 bin]$ sh spark-shell --master local[9]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
[2017-07-06 11:59:23,252] WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (org.apache.hadoop.util.NativeCodeLoader:62)
[2017-07-06 11:59:23,356] WARN
SPARK_CLASSPATH was detected (set to '/data/spark/jars/mysql-connector-java-5.1.40-bin.jar:').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
 (org.apache.spark.SparkConf:66)
[2017-07-06 11:59:23,357] WARN Setting 'spark.executor.extraClassPath' to '/data/spark/jars/mysql-connector-java-5.1.40-bin.jar:' as a work-around. (org.apache.spark.SparkConf:66)
[2017-07-06 11:59:23,357] WARN Setting 'spark.driver.extraClassPath' to '/data/spark/jars/mysql-connector-java-5.1.40-bin.jar:' as a work-around. (org.apache.spark.SparkConf:66)
Spark context Web UI available at http://192.168.8.29:4040
Spark context available as 'sc' (master = local[9], app id = local-1499313564077).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val ds1 = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "host1:port1,host2:port2").option("subscribe", "topic1").load()
java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:197)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:124)
  ... 48 elided
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:554)
  ... 55 more
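
For what it's worth, this error just means the Kafka source is not on the 
classpath: the Structured Streaming Kafka connector ships as a separate 
artifact, not with the Spark distribution. The usual fix (assuming Spark 2.1.0 
built for Scala 2.11) is to launch the shell with the connector package:

    $ spark-shell --master local[9] --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0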


