[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:46 AM:
--

When I simply run ./bin/spark-shell in our environment, the spark-shell 
fails with the error above.

I can fix it by passing -usejavacp directly to the spark-shell, e.g. running 
./bin/spark-shell -usejavacp

My environment is JDK 1.8.0_91 and we do not have Scala installed; the OS is 
Debian 4.6.4.
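
For reference, the programmatic equivalent of -usejavacp suggested by the 
error message looks roughly like this (a minimal sketch against the standard 
scala-compiler REPL API, not Spark's actual shell startup code):

{code}
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.ILoop

object JavaCpRepl {
  def main(args: Array[String]): Unit = {
    val settings = new Settings
    // Equivalent of passing -usejavacp on the command line: reuse the
    // JVM's java.class.path instead of looking for a Scala installation.
    settings.usejavacp.value = true
    new ILoop().process(settings)
  }
}
{code}

Passing the flag on the command line, as above, should have the same effect 
as setting it on the Settings object.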


was (Author: djvulee):
When I simply run ./bin/spark-shell in our environment, the spark-shell 
fails with the error above.

I can fix it by passing -usejavacp directly to the spark-shell, e.g. running 
./bin/spark-shell -usejavacp

My environment is JDK 1.8.0_91 and we do not have Scala installed.

The -Dscala.usejavacp=true setting in bin/spark-shell does not seem to work.

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731455#comment-15731455
 ] 

DjvuLee commented on SPARK-18778:
-

[~srowen] [~andrewor14] could you take a look?

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Updated] (SPARK-3359) `sbt/sbt unidoc` doesn't work with Java 8

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-3359:
---
Fix Version/s: (was: 2.1.1)
   2.1.0

> `sbt/sbt unidoc` doesn't work with Java 8
> -
>
> Key: SPARK-3359
> URL: https://issues.apache.org/jira/browse/SPARK-3359
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.1.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: errors.txt
>
>
> It seems that Java 8 is stricter on JavaDoc. I got many error messages like
> {code}
> [error] 
> /Users/meng/src/spark-mengxr/core/target/java/org/apache/hadoop/mapred/SparkHadoopMapRedUtil.java:2:
>  error: modifier private not allowed here
> [error] private abstract interface SparkHadoopMapRedUtil {
> [error]  ^
> {code}
> This is minor because we can always use Java 6/7 to generate the doc.






[jira] [Updated] (SPARK-18615) Switch to multi-line doc to avoid a genjavadoc bug for backticks

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18615:

Fix Version/s: (was: 2.1.1)
   2.1.0

> Switch to multi-line doc to avoid a genjavadoc bug for backticks
> 
>
> Key: SPARK-18615
> URL: https://issues.apache.org/jira/browse/SPARK-18615
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.1.0
>
>
> I suspect this is related to SPARK-16153 and the genjavadoc issue at 
> https://github.com/typesafehub/genjavadoc/issues/85, but I am not sure.
> Currently, backticks in a single-line doc comment are not converted into 
> code markup but are printed as-is. For example, the line below:
> {code}
> /** Return an RDD with the pairs from `this` whose keys are not in `other`. */
> {code}
> We can work around this by switching to a multi-line doc comment:
> {code}
> /**
>  * Return an RDD with the pairs from `this` whose keys are not in `other`.
>  */
> {code}
> Please refer to the image in the pull request.






[jira] [Updated] (SPARK-18685) Fix all tests in ExecutorClassLoaderSuite to pass on Windows

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18685:

Fix Version/s: (was: 2.1.1)
   2.1.0

> Fix all tests in ExecutorClassLoaderSuite to pass on Windows
> 
>
> Key: SPARK-18685
> URL: https://issues.apache.org/jira/browse/SPARK-18685
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Shell, Tests
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.3, 2.1.0
>
>
> There are two problems, as shown below: the tests in 
> {{ExecutorClassLoaderSuite}} should build a correct URI, and the 
> {{BufferedSource}} from {{Source.fromInputStream}} should be closed after 
> use. Currently, these problems lead to the test failures on Windows shown 
> below (a sketch of both fixes follows the logs).
> {code}
> ExecutorClassLoaderSuite:
> [info] - child first *** FAILED *** (78 milliseconds)
> [info]   java.net.URISyntaxException: Illegal character in authority at index 
> 7: 
> file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
> [info]   at java.net.URI$Parser.fail(URI.java:2848)
> [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
> ...
> [info] - parent first *** FAILED *** (15 milliseconds)
> [info]   java.net.URISyntaxException: Illegal character in authority at index 
> 7: 
> file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
> [info]   at java.net.URI$Parser.fail(URI.java:2848)
> [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
> ...
> [info] - child first can fall back *** FAILED *** (0 milliseconds)
> [info]   java.net.URISyntaxException: Illegal character in authority at index 
> 7: 
> file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
> [info]   at java.net.URI$Parser.fail(URI.java:2848)
> [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
> ...
> [info] - child first can fail *** FAILED *** (0 milliseconds)
> [info]   java.net.URISyntaxException: Illegal character in authority at index 
> 7: 
> file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
> [info]   at java.net.URI$Parser.fail(URI.java:2848)
> [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
> ...
> [info] - resource from parent *** FAILED *** (0 milliseconds)
> [info]   java.net.URISyntaxException: Illegal character in authority at index 
> 7: 
> file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
> [info]   at java.net.URI$Parser.fail(URI.java:2848)
> [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
> ...
> [info] - resources from parent *** FAILED *** (0 milliseconds)
> [info]   java.net.URISyntaxException: Illegal character in authority at index 
> 7: 
> file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
> [info]   at java.net.URI$Parser.fail(URI.java:2848)
> [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
> {code}
> {code}
> [info] Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.repl.ExecutorClassLoaderSuite *** ABORTED *** (7 seconds, 
> 333 milliseconds)
> [info]   java.io.IOException: Failed to delete: 
> C:\projects\spark\target\tmp\spark-77b2f37b-6405-47c4-af1c-4a6a206511f2
> [info]   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
> [info]   at 
> org.apache.spark.repl.ExecutorClassLoaderSuite.afterAll(ExecutorClassLoaderSuite.scala:76)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.afterAll(BeforeAndAfterAll.scala:213)
> ...
> {code}
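> A sketch of both fixes (hypothetical test code, not the actual patch): 
> build the URI with {{File.toURI}} so Windows paths are escaped properly, 
> and close the {{BufferedSource}} so the open handle does not block 
> deleting the temp directory:
> {code}
> import java.io.File
> import scala.io.Source
>
> val tempDir = new File(System.getProperty("java.io.tmpdir"))
>
> // Broken on Windows: "file://" + tempDir.getAbsolutePath yields
> // file://C:\..., and java.net.URI rejects "C:" as an authority.
> // File.toURI escapes the path into a valid file:/C:/... URI instead.
> val url = tempDir.toURI.toURL
>
> // Close the BufferedSource after use; an open handle would keep
> // Utils.deleteRecursively from removing the directory on Windows.
> val source = Source.fromInputStream(url.openStream())
> try source.mkString finally source.close()
> {code}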






[jira] [Updated] (SPARK-18645) spark-daemon.sh arguments error lead to throws Unrecognized option

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18645:

Fix Version/s: (was: 2.1.1)
   2.1.0

> spark-daemon.sh arguments error lead to throws Unrecognized option
> --
>
> Key: SPARK-18645
> URL: https://issues.apache.org/jira/browse/SPARK-18645
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Fix For: 2.1.0
>
>
> {{start-thriftserver.sh}} can reproduce this:
> {noformat}
> [root@dev spark]# ./sbin/start-thriftserver.sh --conf 
> 'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/tmp' 
> starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
> /tmp/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-dev.out
> failed to launch nice -n 0 bash 
> /opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit
>  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name 
> Thrift JDBC/ODBC Server --conf spark.driver.extraJavaOptions=-XX:+UseG1GC 
> -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp:
>   Error starting HiveServer2 with given arguments: 
>   Unrecognized option: -XX:-HeapDumpOnOutOfMemoryError
> full log in 
> /tmp/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-dev.out
> {noformat}






[jira] [Updated] (SPARK-18762) Web UI should be http:4040 instead of https:4040

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18762:

Fix Version/s: (was: 2.1.1)
   2.1.0

> Web UI should be http:4040 instead of https:4040
> 
>
> Key: SPARK-18762
> URL: https://issues.apache.org/jira/browse/SPARK-18762
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Web UI
>Affects Versions: 2.1.0
>Reporter: Xiangrui Meng
>Assignee: Kousuke Saruta
>Priority: Blocker
> Fix For: 2.0.3, 2.1.0
>
>
> When SSL is enabled, the Spark shell shows:
> {code}
> Spark context Web UI available at https://192.168.99.1:4040
> {code}
> This is wrong because port 4040 serves http, not https; it only redirects 
> to the https port.
> More importantly, this introduces several broken links in the UI. For 
> example, in the master UI, the worker link is https:8081 instead of http:8081 
> or https:8481.






[jira] [Updated] (SPARK-18546) UnsafeShuffleWriter corrupts encrypted shuffle files when merging

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18546:

Fix Version/s: (was: 2.1.1)
   2.1.0

> UnsafeShuffleWriter corrupts encrypted shuffle files when merging
> -
>
> Key: SPARK-18546
> URL: https://issues.apache.org/jira/browse/SPARK-18546
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Critical
> Fix For: 2.1.0
>
>
> The merging algorithm in {{UnsafeShuffleWriter}} does not account for 
> encryption, and when it merges encrypted spill files the resulting data 
> cannot be read, since data encrypted with different initialization vectors 
> (IVs) is interleaved in the same partition data. This leads to exceptions 
> when reading the files during shuffle:
> {noformat}
> com.esotericsoftware.kryo.KryoException: com.ning.compress.lzf.LZFException: 
> Corrupt input data, block did not start with 2 byte signature ('ZV') followed 
> by type byte, 2-byte length)
>   at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
>   at com.esotericsoftware.kryo.io.Input.require(Input.java:155)
>   at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
>   at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
>   at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>   at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)
>   at 
> org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:169)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.readNextItem(ExternalAppendOnlyMap.scala:512)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.hasNext(ExternalAppendOnlyMap.scala:533)
> ...
> {noformat}
> (This trace is from our internal branch, so line numbers may not match 
> exactly.)
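> A self-contained illustration of the failure mode (hypothetical code using 
> plain JCE, not Spark's actual crypto classes): two chunks encrypted under 
> different IVs cannot be byte-concatenated and decrypted as one stream:
> {code}
> import javax.crypto.Cipher
> import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}
>
> val key = new SecretKeySpec(Array.fill[Byte](16)(1), "AES")
> def encrypt(iv: Array[Byte], data: Array[Byte]): Array[Byte] = {
>   val c = Cipher.getInstance("AES/CTR/NoPadding")
>   c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv))
>   c.doFinal(data)
> }
>
> // Two spill files, each encrypted under its own IV, naively concatenated.
> val merged = encrypt(Array.fill[Byte](16)(2), "partition-1 ".getBytes) ++
>              encrypt(Array.fill[Byte](16)(3), "partition-2".getBytes)
>
> // Decrypting the merged bytes as a single stream (one IV) recovers the
> // first chunk but produces garbage for the second, whose IV differs.
> val d = Cipher.getInstance("AES/CTR/NoPadding")
> d.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(Array.fill[Byte](16)(2)))
> println(new String(d.doFinal(merged)))  // "partition-1 " then gibberish
> {code}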






[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:38 AM:
--

I posted a fix in https://github.com/apache/spark/pull/16210.

This seems a little weird, since -Dscala.usejavacp=true was intended to fix 
this: [https://issues.apache.org/jira/browse/SPARK-4161]


was (Author: djvulee):
I posted a fix in https://github.com/apache/spark/pull/16210.

This seems a little weird, since -Dscala.usejavacp=true was intended to fix this.

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:38 AM:
--

I posted a fix in https://github.com/apache/spark/pull/16210.

This seems a little weird, since -Dscala.usejavacp=true was intended to fix 
this: [https://issues.apache.org/jira/browse/SPARK-4161]


was (Author: djvulee):
I posted a fix in https://github.com/apache/spark/pull/16210.

This seems a little weird, since -Dscala.usejavacp=true was intended to fix 
this: [https://issues.apache.org/jira/browse/SPARK-4161]

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:35 AM:
--

I posted a fix in https://github.com/apache/spark/pull/16210.

This seems a little weird, since -Dscala.usejavacp=true was intended to fix this.


was (Author: djvulee):
I posted a fix in https://github.com/apache/spark/pull/16210

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431
 ] 

DjvuLee commented on SPARK-18778:
-

I posted a fix in https://github.com/apache/spark/pull/16210

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Assigned] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18778:


Assignee: Apache Spark

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>Assignee: Apache Spark
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731432#comment-15731432
 ] 

Apache Spark commented on SPARK-18778:
--

User 'djvulee' has created a pull request for this issue:
https://github.com/apache/spark/pull/16210

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Assigned] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18778:


Assignee: (was: Apache Spark)

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Commented] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731424#comment-15731424
 ] 

Nick Pentreath commented on SPARK-18633:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>Assignee: Miao Wang
>Priority: Minor
> Fix For: 2.1.0
>
>
> The logistic regression summary has been added to the Python API. We need 
> to add an example and documentation for it.






[jira] [Updated] (SPARK-18081) Locality Sensitive Hashing (LSH) User Guide

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18081:
---
Fix Version/s: (was: 2.2.0)

> Locality Sensitive Hashing (LSH) User Guide
> ---
>
> Key: SPARK-18081
> URL: https://issues.apache.org/jira/browse/SPARK-18081
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Yun Ni
> Fix For: 2.1.0
>
>







[jira] [Updated] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18633:
---
Fix Version/s: (was: 2.1.1)
   (was: 2.2.0)
   2.1.0

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>Assignee: Miao Wang
>Priority: Minor
> Fix For: 2.1.0
>
>
> Logistic Regression summary is added in Python API. We need to add example 
> and document for summary.






[jira] [Commented] (SPARK-18081) Locality Sensitive Hashing (LSH) User Guide

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731419#comment-15731419
 ] 

Nick Pentreath commented on SPARK-18081:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> Locality Sensitive Hashing (LSH) User Guide
> ---
>
> Key: SPARK-18081
> URL: https://issues.apache.org/jira/browse/SPARK-18081
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Yun Ni
> Fix For: 2.1.0, 2.2.0
>
>







[jira] [Commented] (SPARK-15819) Add KMeanSummary in KMeans of PySpark

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731421#comment-15731421
 ] 

Nick Pentreath commented on SPARK-15819:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> Add KMeanSummary in KMeans of PySpark
> -
>
> Key: SPARK-15819
> URL: https://issues.apache.org/jira/browse/SPARK-15819
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 2.1.0
>
>
> There is no corresponding Python API for KMeansSummary; it would be nice 
> to have one.






[jira] [Commented] (SPARK-18274) Memory leak in PySpark StringIndexer

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731418#comment-15731418
 ] 

Nick Pentreath commented on SPARK-18274:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> Memory leak in PySpark StringIndexer
> 
>
> Key: SPARK-18274
> URL: https://issues.apache.org/jira/browse/SPARK-18274
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.5.2, 1.6.3, 2.0.1, 2.0.2, 2.1.0
>Reporter: Jonas Amrich
>Assignee: Sandeep Singh
>Priority: Critical
> Fix For: 2.0.3, 2.1.0, 2.2.0
>
>
> StringIndexerModel won't get collected by the Java GC even when deleted in 
> Python. It can be reproduced by this code, which fails after a couple of 
> iterations (around 7 if you set driver memory to 600MB):
> {code}
> import random, string
> from pyspark.ml.feature import StringIndexer
> l = [(''.join(random.choice(string.ascii_uppercase) for _ in range(10)), )
>      for _ in range(int(7e5))]  # 700,000 random strings of 10 characters
> df = spark.createDataFrame(l, ['string'])
> for i in range(50):
>     indexer = StringIndexer(inputCol='string', outputCol='index')
>     indexer.fit(df)
> {code}
> An explicit call to the Python GC fixes the issue; the following code runs 
> fine:
> {code}
> import gc
> for i in range(50):
>     indexer = StringIndexer(inputCol='string', outputCol='index')
>     indexer.fit(df)
>     gc.collect()
> {code}
> The issue is similar to SPARK-6194 and can probably be fixed by calling JVM 
> detach in the model's destructor. This is implemented in 
> pyspark.mllib.common.JavaModelWrapper but missing in 
> pyspark.ml.wrapper.JavaWrapper. Other models in the ml package may also be 
> affected by this memory leak.






[jira] [Updated] (SPARK-15819) Add KMeanSummary in KMeans of PySpark

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-15819:
---
Fix Version/s: (was: 2.1.1)
   (was: 2.2.0)
   2.1.0

> Add KMeanSummary in KMeans of PySpark
> -
>
> Key: SPARK-15819
> URL: https://issues.apache.org/jira/browse/SPARK-15819
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 2.1.0
>
>
> There is no corresponding Python API for KMeansSummary; it would be nice 
> to have one.






[jira] [Updated] (SPARK-18081) Locality Sensitive Hashing (LSH) User Guide

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18081:
---
Fix Version/s: (was: 2.1.1)
   2.1.0

> Locality Sensitive Hashing (LSH) User Guide
> ---
>
> Key: SPARK-18081
> URL: https://issues.apache.org/jira/browse/SPARK-18081
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Yun Ni
> Fix For: 2.1.0, 2.2.0
>
>







[jira] [Updated] (SPARK-18274) Memory leak in PySpark StringIndexer

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18274:
---
Fix Version/s: (was: 2.1.1)
   2.1.0

> Memory leak in PySpark StringIndexer
> 
>
> Key: SPARK-18274
> URL: https://issues.apache.org/jira/browse/SPARK-18274
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.5.2, 1.6.3, 2.0.1, 2.0.2, 2.1.0
>Reporter: Jonas Amrich
>Assignee: Sandeep Singh
>Priority: Critical
> Fix For: 2.0.3, 2.1.0, 2.2.0
>
>
> StringIndexerModel won't get collected by the Java GC even when deleted in 
> Python. It can be reproduced by this code, which fails after a couple of 
> iterations (around 7 if you set driver memory to 600MB):
> {code}
> import random, string
> from pyspark.ml.feature import StringIndexer
> l = [(''.join(random.choice(string.ascii_uppercase) for _ in range(10)), )
>      for _ in range(int(7e5))]  # 700,000 random strings of 10 characters
> df = spark.createDataFrame(l, ['string'])
> for i in range(50):
>     indexer = StringIndexer(inputCol='string', outputCol='index')
>     indexer.fit(df)
> {code}
> An explicit call to the Python GC fixes the issue; the following code runs 
> fine:
> {code}
> import gc
> for i in range(50):
>     indexer = StringIndexer(inputCol='string', outputCol='index')
>     indexer.fit(df)
>     gc.collect()
> {code}
> The issue is similar to SPARK-6194 and can probably be fixed by calling JVM 
> detach in the model's destructor. This is implemented in 
> pyspark.mllib.common.JavaModelWrapper but missing in 
> pyspark.ml.wrapper.JavaWrapper. Other models in the ml package may also be 
> affected by this memory leak.






[jira] [Commented] (SPARK-18318) ML, Graph 2.1 QA: API: New Scala APIs, docs

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731415#comment-15731415
 ] 

Nick Pentreath commented on SPARK-18318:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> ML, Graph 2.1 QA: API: New Scala APIs, docs
> ---
>
> Key: SPARK-18318
> URL: https://issues.apache.org/jira/browse/SPARK-18318
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
> Fix For: 2.1.0
>
>
> Audit new public Scala APIs added to MLlib & GraphX.  Take note of:
> * Protected/public classes or methods.  If access can be more private, then 
> it should be.
> * Also look for non-sealed traits.
> * Documentation: Missing?  Bad links or formatting?
> *Make sure to check the object doc!*
> As you find issues, please create JIRAs and link them to this issue.






[jira] [Commented] (SPARK-18319) ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731413#comment-15731413
 ] 

Nick Pentreath commented on SPARK-18319:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit
> --
>
> Key: SPARK-18319
> URL: https://issues.apache.org/jira/browse/SPARK-18319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: yuhao yang
>Priority: Blocker
> Fix For: 2.1.0
>
>
> We should make a pass through the items marked as Experimental or 
> DeveloperApi and see if any are stable enough to be unmarked.
> We should also check for items marked final or sealed to see if they are 
> stable enough to be opened up as APIs.






[jira] [Updated] (SPARK-18319) ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18319:
---
Fix Version/s: (was: 2.2.0)
   2.1.0

> ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit
> --
>
> Key: SPARK-18319
> URL: https://issues.apache.org/jira/browse/SPARK-18319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: yuhao yang
>Priority: Blocker
> Fix For: 2.1.0
>
>
> We should make a pass through the items marked as Experimental or 
> DeveloperApi and see if any are stable enough to be unmarked.
> We should also check for items marked final or sealed to see if they are 
> stable enough to be opened up as APIs.






[jira] [Updated] (SPARK-18319) ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18319:
---
Fix Version/s: (was: 2.1.1)

> ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit
> --
>
> Key: SPARK-18319
> URL: https://issues.apache.org/jira/browse/SPARK-18319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: yuhao yang
>Priority: Blocker
> Fix For: 2.1.0
>
>
> We should make a pass through the items marked as Experimental or 
> DeveloperApi and see if any are stable enough to be unmarked.
> We should also check for items marked final or sealed to see if they are 
> stable enough to be opened up as APIs.






[jira] [Updated] (SPARK-18318) ML, Graph 2.1 QA: API: New Scala APIs, docs

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18318:
---
Fix Version/s: (was: 2.1.1)
   (was: 2.2.0)
   2.1.0

> ML, Graph 2.1 QA: API: New Scala APIs, docs
> ---
>
> Key: SPARK-18318
> URL: https://issues.apache.org/jira/browse/SPARK-18318
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
> Fix For: 2.1.0
>
>
> Audit new public Scala APIs added to MLlib & GraphX.  Take note of:
> * Protected/public classes or methods.  If access can be more private, then 
> it should be.
> * Also look for non-sealed traits.
> * Documentation: Missing?  Bad links or formatting?
> *Make sure to check the object doc!*
> As you find issues, please create JIRAs and link them to this issue.






[jira] [Updated] (SPARK-18592) Move DT/RF/GBT Param setter methods to subclasses

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18592:
---
Fix Version/s: (was: 2.2.0)

> Move DT/RF/GBT Param setter methods to subclasses
> -
>
> Key: SPARK-18592
> URL: https://issues.apache.org/jira/browse/SPARK-18592
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
> Fix For: 2.1.0
>
>
> Move DT/RF/GBT Param setter methods to subclasses and deprecate these methods 
> in the Model classes to make them more Java-friendly.
> See discussion at 
> https://github.com/apache/spark/pull/15913#discussion_r89662469 .
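> For context, a minimal sketch of the pattern (hypothetical class names, not 
> the actual spark.ml hierarchy): declaring the setter on the concrete class 
> lets it return the concrete type, so Java callers can chain calls without 
> casting:
> {code}
> object SetterDemo {
>   abstract class TreeEstimator { protected var maxDepth: Int = 5 }
>
>   class DecisionTreeClassifier extends TreeEstimator {
>     // Declared on the concrete class so the return type is the concrete
>     // class itself; a setter declared on the abstract parent would return
>     // TreeEstimator, forcing Java callers to cast when chaining.
>     def setMaxDepth(value: Int): DecisionTreeClassifier = {
>       maxDepth = value
>       this
>     }
>   }
>
>   def main(args: Array[String]): Unit = {
>     val dt = new DecisionTreeClassifier().setMaxDepth(8)  // no cast needed
>     println(dt.toString)
>   }
> }
> {code}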






[jira] [Commented] (SPARK-18320) ML 2.1 QA: API: Python API coverage

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731411#comment-15731411
 ] 

Nick Pentreath commented on SPARK-18320:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> ML 2.1 QA: API: Python API coverage
> ---
>
> Key: SPARK-18320
> URL: https://issues.apache.org/jira/browse/SPARK-18320
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Seth Hendrickson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> For new public APIs added to MLlib ({{spark.ml}} only), we need to check the 
> generated HTML doc and compare the Scala & Python versions.
> * *GOAL*: Audit and create JIRAs to fix in the next release.
> * *NON-GOAL*: This JIRA is _not_ for fixing the API parity issues.
> We need to track:
> * Inconsistency: Do class/method/parameter names match?
> * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
> be as complete as the Scala doc.
> * API breaking changes: These should be very rare but are occasionally either 
> necessary (intentional) or accidental.  These must be recorded and added in 
> the Migration Guide for this release.
> ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
> component, please note that as well.
> * Missing classes/methods/parameters: We should create to-do JIRAs for 
> functionality missing from Python, to be added in the next release cycle.  
> *Please use a _separate_ JIRA (linked below as "requires") for this list of 
> to-do items.*






[jira] [Updated] (SPARK-18324) ML, Graph 2.1 QA: Programming guide update and migration guide

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18324:
---
Fix Version/s: (was: 2.2.0)

> ML, Graph 2.1 QA: Programming guide update and migration guide
> --
>
> Key: SPARK-18324
> URL: https://issues.apache.org/jira/browse/SPARK-18324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
> Fix For: 2.1.0
>
>
> Before the release, we need to update the MLlib and GraphX Programming 
> Guides.  Updates will include:
> * Add migration guide subsection.
> ** Use the results of the QA audit JIRAs and [SPARK-17692].
> * Check phrasing, especially in main sections (for outdated items such as "In 
> this release, ...")






[jira] [Updated] (SPARK-18408) API Improvements for LSH

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18408:
---
Fix Version/s: (was: 2.2.0)

> API Improvements for LSH
> 
>
> Key: SPARK-18408
> URL: https://issues.apache.org/jira/browse/SPARK-18408
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yun Ni
>Assignee: Yun Ni
> Fix For: 2.1.0
>
>
> As the first improvements to the current LSH implementations, we are planning 
> to do the following (a usage sketch follows this list):
>  - Change output schema to {{Array of Vector}} instead of {{Vectors}}
>  - Use {{numHashTables}} as the dimension of {{Array}} and 
> {{numHashFunctions}} as the dimension of {{Vector}}
>  - Rename {{RandomProjection}} to {{BucketedRandomProjectionLSH}}, 
> {{MinHash}} to {{MinHashLSH}}
>  - Make randUnitVectors/randCoefficients private
>  - Make Multi-Probe NN Search and {{hashDistance}} private for future 
> discussion
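For readers unfamiliar with the renamed classes, a minimal usage sketch
against the resulting {{spark.ml}} API (the data and column names are made up,
and an active SparkSession {{spark}} is assumed):

{noformat}
import org.apache.spark.ml.feature.MinHashLSH
import org.apache.spark.ml.linalg.Vectors

val df = spark.createDataFrame(Seq(
  (0, Vectors.sparse(6, Seq((0, 1.0), (1, 1.0)))),
  (1, Vectors.sparse(6, Seq((2, 1.0), (3, 1.0))))
)).toDF("id", "features")

val mh = new MinHashLSH()   // formerly MinHash
  .setNumHashTables(3)      // the length of the output Array
  .setInputCol("features")
  .setOutputCol("hashes")   // output column type: Array of Vector

val model = mh.fit(df)
model.transform(df).show(false)
{noformat}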






[jira] [Updated] (SPARK-18320) ML 2.1 QA: API: Python API coverage

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18320:
---
Fix Version/s: (was: 2.1.1)
   (was: 2.2.0)
   2.1.0

> ML 2.1 QA: API: Python API coverage
> ---
>
> Key: SPARK-18320
> URL: https://issues.apache.org/jira/browse/SPARK-18320
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Seth Hendrickson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> For new public APIs added to MLlib ({{spark.ml}} only), we need to check the 
> generated HTML doc and compare the Scala & Python versions.
> * *GOAL*: Audit and create JIRAs to fix in the next release.
> * *NON-GOAL*: This JIRA is _not_ for fixing the API parity issues.
> We need to track:
> * Inconsistency: Do class/method/parameter names match?
> * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
> be as complete as the Scala doc.
> * API breaking changes: These should be very rare but are occasionally either 
> necessary (intentional) or accidental.  These must be recorded and added in 
> the Migration Guide for this release.
> ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
> component, please note that as well.
> * Missing classes/methods/parameters: We should create to-do JIRAs for 
> functionality missing from Python, to be added in the next release cycle.  
> *Please use a _separate_ JIRA (linked below as "requires") for this list of 
> to-do items.*






[jira] [Commented] (SPARK-18408) API Improvements for LSH

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731407#comment-15731407
 ] 

Nick Pentreath commented on SPARK-18408:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> API Improvements for LSH
> 
>
> Key: SPARK-18408
> URL: https://issues.apache.org/jira/browse/SPARK-18408
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yun Ni
>Assignee: Yun Ni
> Fix For: 2.1.0, 2.2.0
>
>
> As the first improvements to the current LSH implementations, we are planning 
> to do the following:
>  - Change output schema to {{Array of Vector}} instead of {{Vectors}}
>  - Use {{numHashTables}} as the dimension of {{Array}} and 
> {{numHashFunctions}} as the dimension of {{Vector}}
>  - Rename {{RandomProjection}} to {{BucketedRandomProjectionLSH}}, 
> {{MinHash}} to {{MinHashLSH}}
>  - Make randUnitVectors/randCoefficients private
>  - Make Multi-Probe NN Search and {{hashDistance}} private for future 
> discussion






[jira] [Updated] (SPARK-18324) ML, Graph 2.1 QA: Programming guide update and migration guide

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18324:
---
Fix Version/s: (was: 2.1.1)
   2.1.0

> ML, Graph 2.1 QA: Programming guide update and migration guide
> --
>
> Key: SPARK-18324
> URL: https://issues.apache.org/jira/browse/SPARK-18324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
> Fix For: 2.1.0, 2.2.0
>
>
> Before the release, we need to update the MLlib and GraphX Programming 
> Guides.  Updates will include:
> * Add migration guide subsection.
> ** Use the results of the QA audit JIRAs and [SPARK-17692].
> * Check phrasing, especially in main sections (for outdated items such as "In 
> this release, ...")






[jira] [Commented] (SPARK-18366) Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731408#comment-15731408
 ] 

Nick Pentreath commented on SPARK-18366:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer
> ---
>
> Key: SPARK-18366
> URL: https://issues.apache.org/jira/browse/SPARK-18366
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Seth Hendrickson
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 2.1.0
>
>
> We should add the new {{handleInvalid}} param for these transformers to 
> Python to maintain API parity.
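For context, a minimal sketch of the existing Scala param that the Python API
should mirror (toy data; an active SparkSession {{spark}} is assumed):

{noformat}
import org.apache.spark.ml.feature.Bucketizer

// One invalid (NaN) value to exercise handleInvalid.
val df = spark.createDataFrame(Seq(
  Tuple1(-0.5), Tuple1(0.3), Tuple1(Double.NaN)
)).toDF("raw")

val bucketizer = new Bucketizer()
  .setInputCol("raw")
  .setOutputCol("bucketed")
  .setSplits(Array(Double.NegativeInfinity, 0.0, Double.PositiveInfinity))
  .setHandleInvalid("skip")   // filter out rows with NaN instead of throwing

bucketizer.transform(df).show()
{noformat}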






[jira] [Commented] (SPARK-18324) ML, Graph 2.1 QA: Programming guide update and migration guide

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731410#comment-15731410
 ] 

Nick Pentreath commented on SPARK-18324:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> ML, Graph 2.1 QA: Programming guide update and migration guide
> --
>
> Key: SPARK-18324
> URL: https://issues.apache.org/jira/browse/SPARK-18324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
> Fix For: 2.1.0, 2.2.0
>
>
> Before the release, we need to update the MLlib and GraphX Programming 
> Guides.  Updates will include:
> * Add migration guide subsection.
> ** Use the results of the QA audit JIRAs and [SPARK-17692].
> * Check phrasing, especially in main sections (for outdated items such as "In 
> this release, ...")






[jira] [Commented] (SPARK-18612) Leaked broadcasted variable Mllib

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731402#comment-15731402
 ] 

Nick Pentreath commented on SPARK-18612:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> Leaked broadcasted variable Mllib
> -
>
> Key: SPARK-18612
> URL: https://issues.apache.org/jira/browse/SPARK-18612
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Anthony Truchet
>Assignee: Anthony Truchet
>Priority: Trivial
> Fix For: 2.1.0
>
>
> Fix broadcasted variable leaks in MLlib.
> For example, `bcW` in the L-BFGS CostFun.
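A minimal sketch of the fix pattern, not the actual CostFun code (the RDD,
weights, and loss function here are hypothetical):

{noformat}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Broadcast the weights once, then destroy the broadcast when done so the
// copies on the driver and executors are released instead of leaking.
def totalLoss(sc: SparkContext,
              data: RDD[Array[Double]],
              weights: Array[Double],
              loss: (Array[Double], Array[Double]) => Double): Double = {
  val bcW = sc.broadcast(weights)
  try {
    data.map(point => loss(bcW.value, point)).sum()
  } finally {
    bcW.destroy()
  }
}
{noformat}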






[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403
 ] 

DjvuLee commented on SPARK-18778:
-

When I simply run ./bin/spark-shell in our environment, the spark-shell 
fails with the above error.

I can fix it by passing -usejavacp directly to the spark-shell, i.e. running 
./bin/spark-shell -usejavacp

My environment is JDK 1.8.0_91, and we have not installed Scala.

The -Dscala.usejavacp=true setting in bin/spark-shell does not seem to work.
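The error message itself points at the programmatic equivalent of this flag; a
minimal sketch, assuming code that constructs the REPL's compiler settings
directly:

{noformat}
import scala.tools.nsc.Settings

// Force the interpreter to pick up the JVM's classpath, the programmatic
// counterpart of passing -usejavacp on the command line.
val settings = new Settings
settings.usejavacp.value = true
{noformat}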

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Updated] (SPARK-18592) Move DT/RF/GBT Param setter methods to subclasses

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18592:
---
Fix Version/s: (was: 2.1.1)
   2.1.0

> Move DT/RF/GBT Param setter methods to subclasses
> -
>
> Key: SPARK-18592
> URL: https://issues.apache.org/jira/browse/SPARK-18592
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
> Fix For: 2.1.0, 2.2.0
>
>
> Move DT/RF/GBT Param setter methods to subclasses and deprecate these methods 
> in the Model classes to make them more Java-friendly.
> See discussion at 
> https://github.com/apache/spark/pull/15913#discussion_r89662469 .






[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:25 AM:
--

When I simply run ./bin/spark-shell in our environment, the spark-shell 
fails with the above error.

I can fix it by passing -usejavacp directly to the spark-shell, i.e. running 
./bin/spark-shell -usejavacp

My environment is JDK 1.8.0_91, and we have not installed Scala.

The -Dscala.usejavacp=true setting in bin/spark-shell does not seem to work.


was (Author: djvulee):
When I just run the ./bin/spark-shell under our environment, the spark-shell 
occurs the above error.

I can fix it by pass the -usejavacp directly to the spark-shell, like running 
./bin/spark-sell -usejavacp

My environment is jdk1.8.0_91, and we do not install the scala.

the -Dscala.usejavacp=true in bin/spark-shell seems  not work.

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Commented] (SPARK-18592) Move DT/RF/GBT Param setter methods to subclasses

2016-12-07 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731406#comment-15731406
 ] 

Nick Pentreath commented on SPARK-18592:


Went ahead and re-marked fix version to {{2.1.0}} since RC2 has been cut.

> Move DT/RF/GBT Param setter methods to subclasses
> -
>
> Key: SPARK-18592
> URL: https://issues.apache.org/jira/browse/SPARK-18592
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
> Fix For: 2.1.0, 2.2.0
>
>
> Move DT/RF/GBT Param setter methods to subclasses and deprecate these methods 
> in the Model classes to make them more Java-friendly.
> See discussion at 
> https://github.com/apache/spark/pull/15913#discussion_r89662469 .






[jira] [Updated] (SPARK-18612) Leaked broadcasted variable Mllib

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18612:
---
Fix Version/s: (was: 2.1.1)
   2.1.0

> Leaked broadcasted variable Mllib
> -
>
> Key: SPARK-18612
> URL: https://issues.apache.org/jira/browse/SPARK-18612
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Anthony Truchet
>Assignee: Anthony Truchet
>Priority: Trivial
> Fix For: 2.1.0
>
>
> Fix broadcasted variable leaks in MLlib.
> For example, `bcW` in the L-BFGS CostFun.






[jira] [Updated] (SPARK-18408) API Improvements for LSH

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18408:
---
Fix Version/s: (was: 2.1.1)
   2.1.0

> API Improvements for LSH
> 
>
> Key: SPARK-18408
> URL: https://issues.apache.org/jira/browse/SPARK-18408
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yun Ni
>Assignee: Yun Ni
> Fix For: 2.1.0, 2.2.0
>
>
> As the first improvements to the current LSH implementations, we are planning 
> to do the following:
>  - Change output schema to {{Array of Vector}} instead of {{Vectors}}
>  - Use {{numHashTables}} as the dimension of {{Array}} and 
> {{numHashFunctions}} as the dimension of {{Vector}}
>  - Rename {{RandomProjection}} to {{BucketedRandomProjectionLSH}}, 
> {{MinHash}} to {{MinHashLSH}}
>  - Make randUnitVectors/randCoefficients private
>  - Make Multi-Probe NN Search and {{hashDistance}} private for future 
> discussion






[jira] [Updated] (SPARK-18366) Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-12-07 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-18366:
---
Fix Version/s: (was: 2.1.1)
   2.1.0

> Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer
> ---
>
> Key: SPARK-18366
> URL: https://issues.apache.org/jira/browse/SPARK-18366
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Seth Hendrickson
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 2.1.0
>
>
> We should add the new {{handleInvalid}} param for these transformers to 
> Python to maintain API parity.






[jira] [Updated] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-18778:

Affects Version/s: 1.6.1
   2.0.2

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Updated] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-18778:

Description: 
Failed to initialize compiler: object scala.runtime in compiler mirror not 
found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.AssertionError: assertion failed: null
at scala.Predef$.assert(Predef.scala:179)
at 
org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Created] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)
DjvuLee created SPARK-18778:
---

 Summary: Fix the Scala classpath in the spark-shell
 Key: SPARK-18778
 URL: https://issues.apache.org/jira/browse/SPARK-18778
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: DjvuLee









[jira] [Created] (SPARK-18777) Return UDF objects when registering from Python

2016-12-07 Thread holdenk (JIRA)
holdenk created SPARK-18777:
---

 Summary: Return UDF objects when registering from Python
 Key: SPARK-18777
 URL: https://issues.apache.org/jira/browse/SPARK-18777
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Reporter: holdenk


In Scala, registering a UDF gives you back a UDF object that you can use in 
the Dataset/DataFrame API as well as in SQL expressions. We can do the same in 
Python, for both Python UDFs and Java UDFs registered from Python.
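A minimal sketch of the existing Scala behaviour being referenced (assumes an
active SparkSession {{spark}} and a DataFrame {{df}} with an integer column
{{x}} registered as the temp view {{t}}):

{noformat}
import org.apache.spark.sql.functions.col

// register returns a UserDefinedFunction usable in both APIs.
val plusOne = spark.udf.register("plusOne", (x: Int) => x + 1)

df.select(plusOne(col("x"))).show()           // Dataset/DataFrame API
spark.sql("SELECT plusOne(x) FROM t").show()  // SQL expression
{noformat}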






[jira] [Commented] (SPARK-10849) Allow user to specify database column type for data frame fields when writing data to jdbc data sources.

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731347#comment-15731347
 ] 

Apache Spark commented on SPARK-10849:
--

User 'sureshthalamati' has created a pull request for this issue:
https://github.com/apache/spark/pull/16209

> Allow user to specify database column type for data frame fields when writing 
> data to jdbc data sources. 
> -
>
> Key: SPARK-10849
> URL: https://issues.apache.org/jira/browse/SPARK-10849
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Suresh Thalamati
>Priority: Minor
>
> Mapping data frame field types to database column types is addressed to a 
> large extent by adding dialects, and by the maxlength option added in 
> SPARK-10101 to set the VARCHAR length. 
> In some cases it is hard to determine the maximum supported VARCHAR size. For 
> example, the DB2 z/OS VARCHAR size depends on the page size, and some 
> databases also have ROW SIZE limits for VARCHAR. Specifying CLOB by default 
> for all String columns will likely make reads and writes slow. 
> Allowing users to specify the database type for a data frame field is useful 
> when a user wants to fine-tune the mapping for one or two fields and is fine 
> with the defaults for all other fields. 
> I propose to make the following two properties available for users to set in 
> the data frame metadata when writing to JDBC data sources.
> database.column.type  --  column type to use for create table.
> jdbc.column.type  --  JDBC type to use for setting null values. 
> Example:
>   val secdf = sc.parallelize(Array(("Apple", "Revenue ..."), 
> ("Google", "Income:123213"))).toDF("name", "report")
>   val metadataBuilder = new MetadataBuilder()
>   metadataBuilder.putString("database.column.type", "CLOB(100K)")
>   metadataBuilder.putLong("jdbc.type", java.sql.Types.CLOB)
>   val metadata = metadataBuilder.build()
>   val secReportDF = secdf.withColumn("report", col("report").as("report", 
> metadata))
>   secReportDF.write.jdbc("jdbc:mysql:///secdata", "reports", mysqlProps)






[jira] [Comment Edited] (SPARK-6417) Add Linear Programming algorithm

2016-12-07 Thread Ehsan Mohyedin Kermani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731333#comment-15731333
 ] 

Ehsan Mohyedin Kermani edited comment on SPARK-6417 at 12/8/16 6:54 AM:


Here's my implementation, inspired by the Spark-TFOCS design: 
https://github.com/ehsanmok/spark-lp
Empirical results will be added very soon.


was (Author: ehsan mohyedin kermani):
Here's my implementation inspired by Spark-TFOCS design: 
https://github.com/ehsanmok/spark-lp

> Add Linear Programming algorithm 
> -
>
> Key: SPARK-6417
> URL: https://issues.apache.org/jira/browse/SPARK-6417
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Fan Jiang
>  Labels: features
>
> Linear programming is the problem of finding a vector x that minimizes a 
> linear function f^T x subject to linear constraints:
> min_x f^T x
> such that one or more of the following hold: A·x ≤ b, Aeq·x = beq, l ≤ x ≤ u.
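Written out, this is just the quoted formulation in standard notation:

{noformat}
\min_{x} f^{T} x
\quad \text{subject to one or more of} \quad
A x \le b, \qquad A_{eq} x = b_{eq}, \qquad l \le x \le u
{noformat}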






[jira] [Commented] (SPARK-6417) Add Linear Programming algorithm

2016-12-07 Thread Ehsan Mohyedin Kermani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731333#comment-15731333
 ] 

Ehsan Mohyedin Kermani commented on SPARK-6417:
---

Here's my implementation, inspired by the Spark-TFOCS design: 
https://github.com/ehsanmok/spark-lp

> Add Linear Programming algorithm 
> -
>
> Key: SPARK-6417
> URL: https://issues.apache.org/jira/browse/SPARK-6417
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Fan Jiang
>  Labels: features
>
> Linear programming is the problem of finding a vector x that minimizes a 
> linear function f^T x subject to linear constraints:
> min_x f^T x
> such that one or more of the following hold: A·x ≤ b, Aeq·x = beq, l ≤ x ≤ u.






[jira] [Commented] (SPARK-10849) Allow user to specify database column type for data frame fields when writing data to jdbc data sources.

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731326#comment-15731326
 ] 

Apache Spark commented on SPARK-10849:
--

User 'sureshthalamati' has created a pull request for this issue:
https://github.com/apache/spark/pull/16208

> Allow user to specify database column type for data frame fields when writing 
> data to jdbc data sources. 
> -
>
> Key: SPARK-10849
> URL: https://issues.apache.org/jira/browse/SPARK-10849
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Suresh Thalamati
>Priority: Minor
>
> Mapping data frame field types to database column types is addressed to a 
> large extent by adding dialects, and by the maxlength option added in 
> SPARK-10101 to set the VARCHAR length. 
> In some cases it is hard to determine the maximum supported VARCHAR size. For 
> example, the DB2 z/OS VARCHAR size depends on the page size, and some 
> databases also have ROW SIZE limits for VARCHAR. Specifying CLOB by default 
> for all String columns will likely make reads and writes slow. 
> Allowing users to specify the database type for a data frame field is useful 
> when a user wants to fine-tune the mapping for one or two fields and is fine 
> with the defaults for all other fields. 
> I propose to make the following two properties available for users to set in 
> the data frame metadata when writing to JDBC data sources.
> database.column.type  --  column type to use for create table.
> jdbc.column.type  --  JDBC type to use for setting null values. 
> Example:
>   val secdf = sc.parallelize(Array(("Apple", "Revenue ..."), 
> ("Google", "Income:123213"))).toDF("name", "report")
>   val metadataBuilder = new MetadataBuilder()
>   metadataBuilder.putString("database.column.type", "CLOB(100K)")
>   metadataBuilder.putLong("jdbc.type", java.sql.Types.CLOB)
>   val metadata = metadataBuilder.build()
>   val secReportDF = secdf.withColumn("report", col("report").as("report", 
> metadata))
>   secReportDF.write.jdbc("jdbc:mysql:///secdata", "reports", mysqlProps)






[jira] [Updated] (SPARK-18774) Ignore non-existing files when ignoreCorruptFiles is enabled

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18774:

Fix Version/s: (was: 2.1.1)
   2.2.0

> Ignore non-existing files when ignoreCorruptFiles is enabled
> 
>
> Key: SPARK-18774
> URL: https://issues.apache.org/jira/browse/SPARK-18774
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.2.0
>
>







[jira] [Resolved] (SPARK-18774) Ignore non-existing files when ignoreCorruptFiles is enabled

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-18774.
-
   Resolution: Fixed
Fix Version/s: 2.1.1

> Ignore non-existing files when ignoreCorruptFiles is enabled
> 
>
> Key: SPARK-18774
> URL: https://issues.apache.org/jira/browse/SPARK-18774
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.1.1
>
>







[jira] [Updated] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18745:

Target Version/s:   (was: 2.1.0)

> java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)
> -
>
> Key: SPARK-18745
> URL: https://issues.apache.org/jira/browse/SPARK-18745
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0
>Reporter: JESSE CHEN
>Assignee: Kazuaki Ishizaki
>Priority: Critical
> Fix For: 2.1.0
>
>
> Running query 68 with decreased executor memory (using 12GB executors instead 
> of 24GB) on a 100TB parquet database, using the Spark master dated 11/04, gave 
> an IndexOutOfBoundsException.
> The query is as follows:
> {noformat}
> [select  c_last_name
>,c_first_name
>,ca_city
>,bought_city
>,ss_ticket_number
>,extended_price
>,extended_tax
>,list_price
>  from (select ss_ticket_number
>  ,ss_customer_sk
>  ,ca_city bought_city
>  ,sum(ss_ext_sales_price) extended_price 
>  ,sum(ss_ext_list_price) list_price
>  ,sum(ss_ext_tax) extended_tax 
>from store_sales
>,date_dim
>,store
>,household_demographics
>,customer_address 
>where store_sales.ss_sold_date_sk = date_dim.d_date_sk
>  and store_sales.ss_store_sk = store.s_store_sk  
> and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
> and store_sales.ss_addr_sk = customer_address.ca_address_sk
> and date_dim.d_dom between 1 and 2 
> and (household_demographics.hd_dep_count = 8 or
>  household_demographics.hd_vehicle_count= -1)
> and date_dim.d_year in (2000,2000+1,2000+2)
> and store.s_city in ('Plainview','Rogers')
>group by ss_ticket_number
>,ss_customer_sk
>,ss_addr_sk,ca_city) dn
>   ,customer
>   ,customer_address current_addr
>  where ss_customer_sk = c_customer_sk
>and customer.c_current_addr_sk = current_addr.ca_address_sk
>and current_addr.ca_city <> bought_city
>  order by c_last_name
>  ,ss_ticket_number
>   limit 100]
> {noformat}
> Spark output that showed the exception:
> {noformat}
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at 
> org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:215)
>   at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:124)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:124)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:123)
>   at 
> org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doExecuteBroadcast(Exchange.scala:61)
>   at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:231)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:124)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:124)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:123)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
>   at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
>   at 
> org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:36)
>   at 
> org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:68)

[jira] [Commented] (SPARK-18748) UDF multiple evaluations causes very poor performance

2016-12-07 Thread Ohad Raviv (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731203#comment-15731203
 ] 

Ohad Raviv commented on SPARK-18748:


Accidentally. I already closed the other ticket as a duplicate.

> UDF multiple evaluations causes very poor performance
> -
>
> Key: SPARK-18748
> URL: https://issues.apache.org/jira/browse/SPARK-18748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Ohad Raviv
>
> We have a use case where we have a relatively expensive UDF that needs to be 
> calculated. The problem is that instead of being calculated once, it gets 
> calculated over and over again.
> for example:
> {quote}
> def veryExpensiveCalc(str:String) = \{println("blahblah1"); "nothing"\}
> hiveContext.udf.register("veryExpensiveCalc", veryExpensiveCalc _)
> hiveContext.sql("select * from (select veryExpensiveCalc('a') c)z where c is 
> not null and c<>''").show
> {quote}
> with the output:
> {quote}
> blahblah1
> blahblah1
> blahblah1
> +---+
> |  c|
> +---+
> |nothing|
> +---+
> {quote}
> You can see that for each reference to column "c" you get the println.
> That causes very poor performance in our real use case.
> This also came out on StackOverflow:
> http://stackoverflow.com/questions/40320563/spark-udf-called-more-than-once-per-record-when-df-has-too-many-columns
> http://stackoverflow.com/questions/34587596/trying-to-turn-a-blob-into-multiple-columns-in-spark/
> with two problematic work-arounds:
> 1. cache() after the first time. e.g.
> {quote}
> hiveContext.sql("select veryExpensiveCalc('a') as c").cache().where("c is not 
> null and c<>''").show
> {quote}
> While it works, in our case we can't do that because the table is too big to 
> cache.
> 2. move back and forth to rdd:
> {quote}
> val df = hiveContext.sql("select veryExpensiveCalc('a') as c")
> hiveContext.createDataFrame(df.rdd, df.schema).where("c is not null and 
> c<>''").show
> {quote}
> This works, but then we lose some of the optimizations, like predicate 
> pushdown, and it is very ugly.
> Any ideas on how we can make the UDF get calculated just once in a reasonable 
> way?






[jira] [Commented] (SPARK-18750) spark should be able to control the number of executors and should not throw stack overflow

2016-12-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731079#comment-15731079
 ] 

Sean Owen commented on SPARK-18750:
---

Yeah, I guess I should have realized that by nature it won't show the source of 
the exception, because the stack is too deep. It's not clear where the 
exception is coming from -- do you know and is it Spark?

> spark should be able to control the number of executors and should not throw 
> stack overflow
> --
>
> Key: SPARK-18750
> URL: https://issues.apache.org/jira/browse/SPARK-18750
> Project: Spark
>  Issue Type: Bug
>Reporter: Neerja Khattar
>
> When running SQL queries on large datasets, the job fails with a stack 
> overflow warning, and it shows it is requesting lots of executors.
> It looks like there is no limit on the number of executors, not even an 
> upper bound based on the available YARN resources.
> 16/11/29 15:47:47 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> bdtcstr61n5.svr.us.jpmchase.net:8041
> 16/11/29 15:47:47 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> bdtcstr61n8.svr.us.jpmchase.net:8041
> 16/11/29 15:47:47 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> bdtcstr61n2.svr.us.jpmchase.net:8041
> 16/11/29 15:47:47 INFO yarn.YarnAllocator: Driver requested a total number of 
> 32770 executor(s).
> 16/11/29 15:47:47 INFO yarn.YarnAllocator: Will request 24576 executor 
> containers, each with 1 cores and 6758 MB memory including 614 MB overhead
> 16/11/29 15:49:11 INFO yarn.YarnAllocator: Driver requested a total number of 
> 52902 executor(s).
> 16/11/29 15:49:11 WARN yarn.ApplicationMaster: Reporter thread fails 1 
> time(s) in a row.
> java.lang.StackOverflowError
>   at scala.collection.immutable.HashMap.$plus(HashMap.scala:57)
>   at scala.collection.immutable.HashMap.$plus(HashMap.scala:36)
>   at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
>   at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
>   at 
> scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:48)
>   at 
> scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:48)
>   at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at 
> scala.collection.mutable.MapBuilder.$plus$plus$eq(MapBuilder.scala:24)
>   at 
> scala.collection.TraversableLike$class.$plus$plus(TraversableLike.scala:156)
>   at 
> scala.collection.AbstractTraversable.$plus$plus(Traversable.scala:105)
>   at scala.collection.immutable.HashMap.$plus(HashMap.scala:60)
>   at scala.collection.immutable.Map$Map4.updated(Map.scala:172)
>   at scala.collection.immutable.Map$Map4.$plus(Map.scala:173)
>   at scala.collection.immutable.Map$Map4.$plus(Map.scala:158)
>   at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
>   at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
>   at 
> scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

[jira] [Resolved] (SPARK-18326) SparkR 2.1 QA: New R APIs and API docs

2016-12-07 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang resolved SPARK-18326.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

> SparkR 2.1 QA: New R APIs and API docs
> --
>
> Key: SPARK-18326
> URL: https://issues.apache.org/jira/browse/SPARK-18326
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SparkR
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
> Fix For: 2.1.0
>
>
> Audit new public R APIs.  Take note of:
> * Correctness and uniformity of API
> * Documentation: Missing?  Bad links or formatting?
> ** Check both the generated docs linked from the user guide and the R command 
> line docs `?read.df`. These are generated using roxygen.
> As you find issues, please create JIRAs and link them to this issue.






[jira] [Assigned] (SPARK-18326) SparkR 2.1 QA: New R APIs and API docs

2016-12-07 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang reassigned SPARK-18326:
---

Assignee: Yanbo Liang

> SparkR 2.1 QA: New R APIs and API docs
> --
>
> Key: SPARK-18326
> URL: https://issues.apache.org/jira/browse/SPARK-18326
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SparkR
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
>
> Audit new public R APIs.  Take note of:
> * Correctness and uniformity of API
> * Documentation: Missing?  Bad links or formatting?
> ** Check both the generated docs linked from the user guide and the R command 
> line docs `?read.df`. These are generated using roxygen.
> As you find issues, please create JIRAs and link them to this issue.






[jira] [Commented] (SPARK-18776) Offset for FileStreamSource is not json formatted

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731003#comment-15731003
 ] 

Apache Spark commented on SPARK-18776:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/16205

> Offset for FileStreamSource is not json formatted
> -
>
> Key: SPARK-18776
> URL: https://issues.apache.org/jira/browse/SPARK-18776
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
>
> All source offsets must be JSON formatted. 






[jira] [Assigned] (SPARK-18776) Offset for FileStreamSource is not json formatted

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18776:


Assignee: Tathagata Das  (was: Apache Spark)

> Offset for FileStreamSource is not json formatted
> -
>
> Key: SPARK-18776
> URL: https://issues.apache.org/jira/browse/SPARK-18776
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
>
> All source offsets must be JSON formatted. 






[jira] [Assigned] (SPARK-18776) Offset for FileStreamSource is not json formatted

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18776:


Assignee: Apache Spark  (was: Tathagata Das)

> Offset for FileStreamSource is not json formatted
> -
>
> Key: SPARK-18776
> URL: https://issues.apache.org/jira/browse/SPARK-18776
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tathagata Das
>Assignee: Apache Spark
>Priority: Critical
>
> All source offsets must be JSON formatted. 






[jira] [Resolved] (SPARK-18705) Docs for one-pass solver for linear regression with L1 and elastic-net penalties

2016-12-07 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang resolved SPARK-18705.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

> Docs for one-pass solver for linear regression with L1 and elastic-net 
> penalties
> 
>
> Key: SPARK-18705
> URL: https://issues.apache.org/jira/browse/SPARK-18705
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Yanbo Liang
>Assignee: Seth Hendrickson
>Priority: Minor
> Fix For: 2.1.0
>
>
> Add documentation for SPARK-17748 in the [{{Normal equation solver for weighted 
> least squares}}|http://spark.apache.org/docs/latest/ml-advanced.html#normal-equation-solver-for-weighted-least-squares]
> section.






[jira] [Created] (SPARK-18776) Offset for FileStreamSource is not json formatted

2016-12-07 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-18776:
-

 Summary: Offset for FileStreamSource is not json formatted
 Key: SPARK-18776
 URL: https://issues.apache.org/jira/browse/SPARK-18776
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.0.2
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Critical


All source offsets must be JSON formatted. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18758) StreamingQueryListener events from a StreamingQuery should be sent only to the listeners in the same session as the query

2016-12-07 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-18758.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 16186
[https://github.com/apache/spark/pull/16186]

> StreamingQueryListener events from a StreamingQuery should be sent only to 
> the listeners in the same session as the query
> -
>
> Key: SPARK-18758
> URL: https://issues.apache.org/jira/browse/SPARK-18758
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tathagata Das
>Priority: Critical
> Fix For: 2.1.0
>
>
> Listeners registered with `sparkSession.streams.addListener(l)` belong to a 
> specific SparkSession, so only events from queries in that same session should 
> be posted to the listener.
> Currently, all events get routed through Spark's main listener bus, and 
> therefore all StreamingQueryListener events get posted to 
> StreamingQueryListeners in all sessions. This is wrong.
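
A minimal sketch of the intended behavior (API names as in the 2.x public API;
the listener body is illustrative):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder.appName("listener-scope-demo").getOrCreate()

// Registered on this session only; after the fix, queries started from a
// different SparkSession should not reach this listener.
spark.streams.addListener(new StreamingQueryListener {
  def onQueryStarted(event: QueryStartedEvent): Unit = println(s"started: ${event.id}")
  def onQueryProgress(event: QueryProgressEvent): Unit = ()
  def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
})
{code}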



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-18633.
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   2.1.1

Issue resolved by pull request 16064
[https://github.com/apache/spark/pull/16064]

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>Assignee: Miao Wang
>Priority: Minor
> Fix For: 2.1.1, 2.2.0
>
>
> The logistic regression summary has been added to the Python API. We need to 
> add an example and documentation for it.
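
For reference, a minimal sketch of the summary being documented, in Scala (the
Python API mirrors it; the training DataFrame is assumed):

{code}
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression().setMaxIter(10)
// val model = lr.fit(training)  // training: DataFrame of (label, features)
// val summary = model.summary   // training summary attached to the model
// summary.objectiveHistory.foreach(println)  // loss at each iteration
{code}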



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18633:
--
Assignee: Miao Wang

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>Assignee: Miao Wang
>Priority: Minor
>
> The logistic regression summary has been added to the Python API. We need to 
> add an example and documentation for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18633:
--
Shepherd: Joseph K. Bradley
Assignee: (was: Miao Wang)

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>
> The logistic regression summary has been added to the Python API. We need to 
> add an example and documentation for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18633:
--
Priority: Minor  (was: Major)

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>Priority: Minor
>
> The logistic regression summary has been added to the Python API. We need to 
> add an example and documentation for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18633:
--
Assignee: Miao Wang

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>Assignee: Miao Wang
>
> The logistic regression summary has been added to the Python API. We need to 
> add an example and documentation for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18750) spark should be able to control the number of executors and should not throw stack overflow

2016-12-07 Thread Neerja Khattar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730717#comment-15730717
 ] 

Neerja Khattar commented on SPARK-18750:


[~srowen] I added the full stack trace.

> spark should be able to control the number of executors and should not throw 
> stack overflow
> --
>
> Key: SPARK-18750
> URL: https://issues.apache.org/jira/browse/SPARK-18750
> Project: Spark
>  Issue Type: Bug
>Reporter: Neerja Khattar
>
> When running SQL queries on large datasets, the job fails with a stack 
> overflow warning and shows that it is requesting lots of executors.
> It looks like there is no limit on the number of executors, nor even an upper 
> bound based on the available YARN resources.
> 16/11/29 15:47:47 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> bdtcstr61n5.svr.us.jpmchase.net:8041 
> 16/11/29 15:47:47 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> bdtcstr61n8.svr.us.jpmchase.net:8041 
> 16/11/29 15:47:47 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> bdtcstr61n2.svr.us.jpmchase.net:8041 
> 16/11/29 15:47:47 INFO yarn.YarnAllocator: Driver requested a total number of 
> 32770 executor(s). 
> 16/11/29 15:47:47 INFO yarn.YarnAllocator: Will request 24576 executor 
> containers, each with 1 cores and 6758 MB memory including 614 MB overhead 
> 16/11/29 15:49:11 INFO yarn.YarnAllocator: Driver requested a total number of 
> 52902 executor(s). 
> 16/11/29 15:49:11 WARN yarn.ApplicationMaster: Reporter thread fails 1 
> time(s) in a row.
> java.lang.StackOverflowError
>   at scala.collection.immutable.HashMap.$plus(HashMap.scala:57)
>   at scala.collection.immutable.HashMap.$plus(HashMap.scala:36)
>   at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
>   at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
>   at 
> scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:48)
>   at 
> scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:48)
>   at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at 
> scala.collection.mutable.MapBuilder.$plus$plus$eq(MapBuilder.scala:24)
>   at 
> scala.collection.TraversableLike$class.$plus$plus(TraversableLike.scala:156)
>   at 
> scala.collection.AbstractTraversable.$plus$plus(Traversable.scala:105)
>   at scala.collection.immutable.HashMap.$plus(HashMap.scala:60)
>   at scala.collection.immutable.Map$Map4.updated(Map.scala:172)
>   at scala.collection.immutable.Map$Map4.$plus(Map.scala:173)
>   at scala.collection.immutable.Map$Map4.$plus(Map.scala:158)
>   at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
>   at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
>   at 
> scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>   at 
> 
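
Until such a limit exists in Spark itself, a hedged workaround sketch: cap
dynamic allocation so the driver cannot request executors without bound
(standard Spark configuration keys; the cap value is arbitrary):

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.maxExecutors", "200") // hard upper bound on requests
  .set("spark.shuffle.service.enabled", "true")       // required by dynamic allocation
{code}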

[jira] [Resolved] (SPARK-18654) JacksonParser.makeRootConverter has effectively unreachable code

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-18654.
-
   Resolution: Fixed
 Assignee: Nathan Howell
Fix Version/s: 2.2.0

> JacksonParser.makeRootConverter has effectively unreachable code
> 
>
> Key: SPARK-18654
> URL: https://issues.apache.org/jira/browse/SPARK-18654
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Assignee: Nathan Howell
>Priority: Minor
> Fix For: 2.2.0
>
>
> {{JacksonParser.makeRootConverter}} currently takes a {{DataType}} but is 
> only called with a {{StructType}}. Revising the method to only accept a 
> {{StructType}} allows us to remove some pattern matches.
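
Schematically (method and type names are the JIRA's; the bodies are
placeholders, not Spark's actual code), the change looks like this:

{code}
import org.apache.spark.sql.types.{DataType, StructType}

// Before: accepting DataType forces pattern matches that are unreachable,
// because the only call site always passes a StructType.
def makeRootConverterBefore(dataType: DataType): String = dataType match {
  case st: StructType => s"root converter for $st"
  case other          => sys.error(s"unreachable: $other")
}

// After: the narrower signature encodes the invariant and the match disappears.
def makeRootConverterAfter(st: StructType): String = s"root converter for $st"
{code}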



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18775) Limit the max number of records written per file

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730476#comment-15730476
 ] 

Apache Spark commented on SPARK-18775:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16204
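
A hedged sketch of how such an option could be used from the DataFrame writer
(the option name maxRecordsPerFile is an assumption based on this proposal;
the output path is hypothetical):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("max-records-demo").getOrCreate()
val df = spark.range(5000000L).toDF("id")

df.write
  .option("maxRecordsPerFile", 1000000L) // assumed option: roll to a new file at 1M records
  .mode("overwrite")
  .parquet("/tmp/max-records-demo")
{code}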

> Limit the max number of records written per file
> 
>
> Key: SPARK-18775
> URL: https://issues.apache.org/jira/browse/SPARK-18775
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Currently, Spark writes a single file out per task, sometimes leading to very 
> large files. It would be great to have an option to limit the max number of 
> records written per file in a task, to avoid humongous files.
> This was initially suggested by [~simeons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18775) Limit the max number of records written per file

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18775:


Assignee: Reynold Xin  (was: Apache Spark)

> Limit the max number of records written per file
> 
>
> Key: SPARK-18775
> URL: https://issues.apache.org/jira/browse/SPARK-18775
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Currently, Spark writes a single file out per task, sometimes leading to very 
> large files. It would be great to have an option to limit the max number of 
> records written per file in a task, to avoid humongous files.
> This was initially suggested by [~simeons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18775) Limit the max number of records written per file

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18775:


Assignee: Apache Spark  (was: Reynold Xin)

> Limit the max number of records written per file
> 
>
> Key: SPARK-18775
> URL: https://issues.apache.org/jira/browse/SPARK-18775
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> Currently, Spark writes a single file out per task, sometimes leading to very 
> large files. It would be great to have an option to limit the max number of 
> records written per file in a task, to avoid humongous files.
> This was initially suggested by [~simeons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18774) Ignore non-existing files when ignoreCorruptFiles is enabled

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18774:


Assignee: Apache Spark  (was: Shixiong Zhu)

> Ignore non-existing files when ignoreCorruptFiles is enabled
> 
>
> Key: SPARK-18774
> URL: https://issues.apache.org/jira/browse/SPARK-18774
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18774) Ignore non-existing files when ignoreCorruptFiles is enabled

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730465#comment-15730465
 ] 

Apache Spark commented on SPARK-18774:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/16203
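
A minimal usage sketch under the proposed behavior (the config key is the
existing SQL-side one; the input path is hypothetical):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("ignore-missing-demo").getOrCreate()
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

// With the flag on, files that disappear between listing and reading
// should be skipped instead of failing the whole job.
// val df = spark.read.parquet("/data/input")
{code}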

> Ignore non-existing files when ignoreCorruptFiles is enabled
> 
>
> Key: SPARK-18774
> URL: https://issues.apache.org/jira/browse/SPARK-18774
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18774) Ignore non-existing files when ignoreCorruptFiles is enabled

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18774:


Assignee: Shixiong Zhu  (was: Apache Spark)

> Ignore non-existing files when ignoreCorruptFiles is enabled
> 
>
> Key: SPARK-18774
> URL: https://issues.apache.org/jira/browse/SPARK-18774
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18775) Limit the max number of records written per file

2016-12-07 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-18775:
---

 Summary: Limit the max number of records written per file
 Key: SPARK-18775
 URL: https://issues.apache.org/jira/browse/SPARK-18775
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


Currently, Spark writes a single file out per task, sometimes leading to very 
large files. It would be great to have an option to limit the max number of 
records written per file in a task, to avoid humongous files.

This was initially suggested by [~simeons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18774) Ignore non-existing files when ignoreCorruptFiles is enabled

2016-12-07 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-18774:


 Summary: Ignore non-existing files when ignoreCorruptFiles is 
enabled
 Key: SPARK-18774
 URL: https://issues.apache.org/jira/browse/SPARK-18774
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 2.1.0
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18662) Move cluster managers into their own sub-directory

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730428#comment-15730428
 ] 

Apache Spark commented on SPARK-18662:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16202

> Move cluster managers into their own sub-directory
> --
>
> Key: SPARK-18662
> URL: https://issues.apache.org/jira/browse/SPARK-18662
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Minor
> Fix For: 2.2.0
>
>
> As we move to support Kubernetes in addition to Yarn and Mesos 
> (https://issues.apache.org/jira/browse/SPARK-18278), we should move all the 
> cluster managers into a "resource-managers/" sub-directory. This is simply a 
> reorganization.
> Ref: https://github.com/apache/spark/pull/16061#issuecomment-263649340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18322) ML, Graph 2.1 QA: Update user guide for new features & APIs

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-18322.
---
   Resolution: Done
Fix Version/s: 2.1.0

Thanks [~yanboliang]!

> ML, Graph 2.1 QA: Update user guide for new features & APIs
> ---
>
> Key: SPARK-18322
> URL: https://issues.apache.org/jira/browse/SPARK-18322
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
> Fix For: 2.1.0
>
>
> Check the user guide vs. a list of new APIs (classes, methods, data members) 
> to see what items require updates to the user guide.
> For each feature missing user guide doc:
> * Create a JIRA for that feature, and assign it to the author of the feature
> * Link it to (a) the original JIRA which introduced that feature ("related 
> to") and (b) to this JIRA ("requires").
> For MLlib:
> * This task does not include major reorganizations for the programming guide.
> * We should now begin copying algorithm details from the spark.mllib guide to 
> spark.ml as needed, rather than just linking back to the corresponding 
> algorithms in the spark.mllib user guide.
> If you would like to work on this task, please comment, and we can create & 
> link JIRAs for parts of this work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18705) Docs for one-pass solver for linear regression with L1 and elastic-net penalties

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18705:
--
Target Version/s: 2.1.0

> Docs for one-pass solver for linear regression with L1 and elastic-net 
> penalties
> 
>
> Key: SPARK-18705
> URL: https://issues.apache.org/jira/browse/SPARK-18705
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Yanbo Liang
>Assignee: Seth Hendrickson
>Priority: Minor
>
> Add documentation for SPARK-17748 in the [{{Normal equation solver for weighted 
> least squares}}|http://spark.apache.org/docs/latest/ml-advanced.html#normal-equation-solver-for-weighted-least-squares]
> section.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18705) Docs for one-pass solver for linear regression with L1 and elastic-net penalties

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18705:
--
Issue Type: Documentation  (was: Improvement)

> Docs for one-pass solver for linear regression with L1 and elastic-net 
> penalties
> 
>
> Key: SPARK-18705
> URL: https://issues.apache.org/jira/browse/SPARK-18705
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Yanbo Liang
>Assignee: Seth Hendrickson
>Priority: Minor
>
> Add documentation for SPARK-17748 in the [{{Normal equation solver for weighted 
> least squares}}|http://spark.apache.org/docs/latest/ml-advanced.html#normal-equation-solver-for-weighted-least-squares]
> section.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18322) ML, Graph 2.1 QA: Update user guide for new features & APIs

2016-12-07 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730403#comment-15730403
 ] 

Joseph K. Bradley commented on SPARK-18322:
---

I'd like the docs for 2.1 features to get into the 2.1 branch. It's tolerable 
if they land after the release, since we can update the published docs 
afterwards. I'll mark this task resolved but will target those JIRAs for 2.1.

> ML, Graph 2.1 QA: Update user guide for new features & APIs
> ---
>
> Key: SPARK-18322
> URL: https://issues.apache.org/jira/browse/SPARK-18322
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
>
> Check the user guide vs. a list of new APIs (classes, methods, data members) 
> to see what items require updates to the user guide.
> For each feature missing user guide doc:
> * Create a JIRA for that feature, and assign it to the author of the feature
> * Link it to (a) the original JIRA which introduced that feature ("related 
> to") and (b) to this JIRA ("requires").
> For MLlib:
> * This task does not include major reorganizations for the programming guide.
> * We should now begin copying algorithm details from the spark.mllib guide to 
> spark.ml as needed, rather than just linking back to the corresponding 
> algorithms in the spark.mllib user guide.
> If you would like to work on this task, please comment, and we can create & 
> link JIRAs for parts of this work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18633:
--
Issue Type: Documentation  (was: Improvement)

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>
> The logistic regression summary has been added to the Python API. We need to 
> add an example and documentation for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18633) Add multiclass logistic regression summary python example and document

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18633:
--
Target Version/s: 2.1.0

> Add multiclass logistic regression summary python example and document
> --
>
> Key: SPARK-18633
> URL: https://issues.apache.org/jira/browse/SPARK-18633
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Miao Wang
>
> The logistic regression summary has been added to the Python API. We need to 
> add an example and documentation for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18080) Locality Sensitive Hashing (LSH) Python API

2016-12-07 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18080:
--
Target Version/s: 2.2.0

> Locality Sensitive Hashing (LSH) Python API
> ---
>
> Key: SPARK-18080
> URL: https://issues.apache.org/jira/browse/SPARK-18080
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3359) `sbt/sbt unidoc` doesn't work with Java 8

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730387#comment-15730387
 ] 

Apache Spark commented on SPARK-3359:
-

User 'michalsenkyr' has created a pull request for this issue:
https://github.com/apache/spark/pull/16201

> `sbt/sbt unidoc` doesn't work with Java 8
> -
>
> Key: SPARK-3359
> URL: https://issues.apache.org/jira/browse/SPARK-3359
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.1.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: errors.txt
>
>
> It seems that Java 8 is stricter on JavaDoc. I got many error messages like
> {code}
> [error] 
> /Users/meng/src/spark-mengxr/core/target/java/org/apache/hadoop/mapred/SparkHadoopMapRedUtil.java:2:
>  error: modifier private not allowed here
> [error] private abstract interface SparkHadoopMapRedUtil {
> [error]  ^
> {code}
> This is minor because we can always use Java 6/7 to generate the doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18754) Rename recentProgresses to recentProgress

2016-12-07 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-18754.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 16182
[https://github.com/apache/spark/pull/16182]

> Rename recentProgresses to recentProgress
> -
>
> Key: SPARK-18754
> URL: https://issues.apache.org/jira/browse/SPARK-18754
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
> Fix For: 2.1.0
>
>
> An informal poll of a bunch of users found this name to be more clear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18773) Make translation of Spark configs to commons-crypto configs consistent

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18773:


Assignee: Apache Spark

> Make translation of Spark configs to commons-crypto configs consistent
> --
>
> Key: SPARK-18773
> URL: https://issues.apache.org/jira/browse/SPARK-18773
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Minor
>
> Recent changes to introduce AES encryption to the network layer added some 
> duplication to the code that translates between Spark configuration and 
> commons-crypto configuration.
> Moreover, the duplication is not consistent: the code in the network-common 
> module does not translate all configs.
> We should centralize that code and make all the code paths that use AES 
> encryption support the same options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18773) Make translation of Spark configs to commons-crypto configs consistent

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18773:


Assignee: (was: Apache Spark)

> Make translation of Spark configs to commons-crypto configs consistent
> --
>
> Key: SPARK-18773
> URL: https://issues.apache.org/jira/browse/SPARK-18773
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> Recent changes to introduce AES encryption to the network layer added some 
> duplication to the code that translates between Spark configuration and 
> commons-crypto configuration.
> Moreover, the duplication is not consistent: the code in the network-common 
> module does not translate all configs.
> We should centralize that code and make all the code paths that use AES 
> encryption support the same options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18773) Make translation of Spark configs to commons-crypto configs consistent

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730348#comment-15730348
 ] 

Apache Spark commented on SPARK-18773:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16200
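
An illustrative sketch of the centralization (the prefix handling and the
helper name are assumptions for illustration, not Spark's actual identifiers):

{code}
import java.util.Properties

// Translate every Spark-side key under `prefix` into a commons-crypto
// property, in one shared helper instead of per-module copies.
def toCommonsCryptoConf(prefix: String, conf: Map[String, String]): Properties = {
  val props = new Properties()
  conf.foreach { case (key, value) =>
    if (key.startsWith(prefix)) {
      // e.g. <prefix>cipher.transformation -> commons.crypto.cipher.transformation
      props.setProperty("commons.crypto." + key.stripPrefix(prefix), value)
    }
  }
  props
}
{code}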

> Make translation of Spark configs to commons-crypto configs consistent
> --
>
> Key: SPARK-18773
> URL: https://issues.apache.org/jira/browse/SPARK-18773
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> Recent changes to introduce AES encryption to the network layer added some 
> duplication to the code that translates between Spark configuration and 
> commons-crypto configuration.
> Moreover, the duplication is not consistent: the code in the network-common 
> module does not translate all configs.
> We should centralize that code and make all the code paths that use AES 
> encryption support the same options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3359) `sbt/sbt unidoc` doesn't work with Java 8

2016-12-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730338#comment-15730338
 ] 

Michal Šenkýř commented on SPARK-3359:
--

The build passed. I can put it into a PR if you'd like.
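
For illustration, the kind of escaping involved (a hypothetical scaladoc
comment; Java 8's javadoc rejects raw angle brackets that older versions
tolerated):

{code}
/** Keeps elements where size &gt; threshold (entity-escaped for Java 8 javadoc). */
def keepLarge(xs: Seq[Int], threshold: Int): Seq[Int] = xs.filter(_ > threshold)
{code}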

> `sbt/sbt unidoc` doesn't work with Java 8
> -
>
> Key: SPARK-3359
> URL: https://issues.apache.org/jira/browse/SPARK-3359
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.1.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: errors.txt
>
>
> It seems that Java 8 is stricter on JavaDoc. I got many error messages like
> {code}
> [error] 
> /Users/meng/src/spark-mengxr/core/target/java/org/apache/hadoop/mapred/SparkHadoopMapRedUtil.java:2:
>  error: modifier private not allowed here
> [error] private abstract interface SparkHadoopMapRedUtil {
> [error]  ^
> {code}
> This is minor because we can always use Java 6/7 to generate the doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18772) Parsing JSON with some NaN and Infinity values throws NumberFormatException

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18772:


Assignee: (was: Apache Spark)

> Parsing JSON with some NaN and Infinity values throws NumberFormatException
> ---
>
> Key: SPARK-18772
> URL: https://issues.apache.org/jira/browse/SPARK-18772
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Priority: Minor
>
> JacksonParser tests for infinite and NaN values in a way that is not 
> supported by the underlying float/double parser. For example, the input 
> string is always lowercased to check for {{-Infinity}} but the parser only 
> supports titlecased values. So a {{-infinitY}} will pass the test but fail 
> with a {{NumberFormatException}} when parsing. This exception is not caught 
> anywhere and the task ends up failing.
> A related issue is that the code checks for {{Inf}} but the parser only 
> supports the long form of {{Infinity}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18772) Parsing JSON with some NaN and Infinity values throws NumberFormatException

2016-12-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18772:


Assignee: Apache Spark

> Parsing JSON with some NaN and Infinity values throws NumberFormatException
> ---
>
> Key: SPARK-18772
> URL: https://issues.apache.org/jira/browse/SPARK-18772
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Assignee: Apache Spark
>Priority: Minor
>
> JacksonParser tests for infinite and NaN values in a way that is not 
> supported by the underlying float/double parser. For example, the input 
> string is always lowercased to check for {{-Infinity}} but the parser only 
> supports titlecased values. So a {{-infinitY}} will pass the test but fail 
> with a {{NumberFormatException}} when parsing. This exception is not caught 
> anywhere and the task ends up failing.
> A related issue is that the code checks for {{Inf}} but the parser only 
> supports the long form of {{Infinity}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18772) Parsing JSON with some NaN and Infinity values throws NumberFormatException

2016-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730323#comment-15730323
 ] 

Apache Spark commented on SPARK-18772:
--

User 'NathanHowell' has created a pull request for this issue:
https://github.com/apache/spark/pull/16199
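
The mismatch is easy to reproduce in a Scala REPL, since `toDouble` delegates
to the same titlecase-only `java.lang.Double.parseDouble`:

{code}
"-Infinity".toDouble    // OK: Double.NegativeInfinity
// "-infinitY".toDouble // throws java.lang.NumberFormatException
// "Inf".toDouble       // also throws: only the long form "Infinity" parses
{code}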

> Parsing JSON with some NaN and Infinity values throws NumberFormatException
> ---
>
> Key: SPARK-18772
> URL: https://issues.apache.org/jira/browse/SPARK-18772
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Priority: Minor
>
> JacksonParser tests for infinite and NaN values in a way that is not 
> supported by the underlying float/double parser. For example, the input 
> string is always lowercased to check for {{-Infinity}} but the parser only 
> supports titlecased values. So a {{-infinitY}} will pass the test but fail 
> with a {{NumberFormatException}} when parsing. This exception is not caught 
> anywhere and the task ends up failing.
> A related issue is that the code checks for {{Inf}} but the parser only 
> supports the long form of {{Infinity}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3359) `sbt/sbt unidoc` doesn't work with Java 8

2016-12-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730311#comment-15730311
 ] 

Michal Šenkýř edited comment on SPARK-3359 at 12/7/16 11:40 PM:


Yes, I did do a grep on `[error]`. I am trying to build the docs now with those 
`>` replaced by `&gt;` to see if it completes.

I believe it would be worth it to fix. As a newcomer to the Spark community, I 
had no idea it was Java 8 related.


was (Author: michalsenkyr):
Yes, I did do a grep on `[error]`. I am trying to build the docs now with those 
`>` replaced by `>` to see if it completes.

I believe it would be worth it to fix. As a newcomer to the Spark community, I 
had no idea it was Java 8 related.

> `sbt/sbt unidoc` doesn't work with Java 8
> -
>
> Key: SPARK-3359
> URL: https://issues.apache.org/jira/browse/SPARK-3359
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.1.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: errors.txt
>
>
> It seems that Java 8 is stricter on JavaDoc. I got many error messages like
> {code}
> [error] 
> /Users/meng/src/spark-mengxr/core/target/java/org/apache/hadoop/mapred/SparkHadoopMapRedUtil.java:2:
>  error: modifier private not allowed here
> [error] private abstract interface SparkHadoopMapRedUtil {
> [error]  ^
> {code}
> This is minor because we can always use Java 6/7 to generate the doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


