[GitHub] spark pull request #18872: [SPARK-21723][ML] Fix writing LibSVM (key not fou...

2017-08-15 Thread ProtD
Github user ProtD commented on a diff in the pull request:

https://github.com/apache/spark/pull/18872#discussion_r133150156
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala ---
@@ -109,14 +112,15 @@ class LibSVMRelationSuite extends SparkFunSuite with MLlibTestSparkContext {
   test("write libsvm data and read it again") {
     val df = spark.read.format("libsvm").load(path)
     val tempDir2 = new File(tempDir, "read_write_test")
--- End diff --

`Utils.createTempDir` seems like a nicer way to do this. The directory is
deleted automatically when the JVM shuts down, so I believe no manual cleanup
(cf. the comment further down) is needed.
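
For illustration, the idea of a temporary directory that cleans itself up at JVM shutdown can be sketched without Spark. This is only a sketch under assumptions: `TempDirSketch`, its `createTempDir`, and `deleteRecursively` are hypothetical stand-ins written for this example, not Spark's `Utils` implementation.

```scala
import java.io.File
import java.nio.file.Files

object TempDirSketch {
  // Recursively delete a file or a whole directory tree.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      // listFiles() may return null on I/O errors, so wrap it in Option.
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()
  }

  // Create a temp directory and register a JVM shutdown hook that removes it,
  // so a test using it needs no explicit cleanup step.
  def createTempDir(prefix: String = "spark-test"): File = {
    val dir = Files.createTempDirectory(prefix).toFile
    sys.addShutdownHook(deleteRecursively(dir))
    dir
  }
}
```

A test could then write to `new File(TempDirSketch.createTempDir(), "read_write_test")` and leave removal to the shutdown hook.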


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18872: [SPARK-21723][ML] Fix writing LibSVM (key not found: num...

2017-08-14 Thread ProtD
Github user ProtD commented on the issue:

https://github.com/apache/spark/pull/18872
  
@srowen Ok, I created and linked a JIRA.





[GitHub] spark pull request #18872: [MLlib] Fix writing LibSVM (key not found: numFea...

2017-08-11 Thread ProtD
Github user ProtD commented on a diff in the pull request:

https://github.com/apache/spark/pull/18872#discussion_r132671477
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala ---
@@ -126,6 +130,29 @@ class LibSVMRelationSuite extends SparkFunSuite with MLlibTestSparkContext {
     }
   }
 
+  test("write libsvm data from scratch and read it again") {
+    val rawData = new java.util.ArrayList[Row]()
+    rawData.add(Row(1.0, Vectors.sparse(3, Seq((0, 2.0), (1, 3.0)))))
+    rawData.add(Row(4.0, Vectors.sparse(3, Seq((0, 5.0), (2, 6.0)))))
+
--- End diff --

Fixed.





[GitHub] spark issue #18872: [MLlib] Fix writing LibSVM (key not found: numFeatures)

2017-08-10 Thread ProtD
Github user ProtD commented on the issue:

https://github.com/apache/spark/pull/18872
  
I added the unit test; please review.





[GitHub] spark issue #18872: [MLlib] Fix writing LibSVM

2017-08-10 Thread ProtD
Github user ProtD commented on the issue:

https://github.com/apache/spark/pull/18872
  
To reproduce the bug on v2.2 and v2.3:
```scala
// Run in spark-shell, where `spark` and its implicits (needed for toDF) are in scope.
import org.apache.spark.ml.linalg.Vectors
val rawData = Seq((1.0, Vectors.sparse(3, Seq((0, 2.0), (1, 3.0)))),
  (4.0, Vectors.sparse(3, Seq((0, 5.0), (2, 6.0)))))
val dfTemp = spark.sparkContext.parallelize(rawData).toDF("label", "features")
// The write below is the step that fails.
dfTemp.coalesce(1).write.format("libsvm").save("...filename...")
```
This causes `java.util.NoSuchElementException: key not found: numFeatures`.
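
The wording of that message is what Scala's `Map.apply` produces for a missing key. The sketch below reproduces it in isolation, assuming a plain key/value lookup; `MissingKeySketch` and `meta` are hypothetical and do not represent the actual Spark code path.

```scala
object MissingKeySketch {
  // Hypothetical stand-in for key/value data that lacks "numFeatures".
  val meta: Map[String, String] = Map.empty

  // Map.apply throws NoSuchElementException when the key is absent.
  def strictLookup(): Int = meta("numFeatures").toInt

  // Map.get returns an Option, so a missing key can be handled explicitly.
  def safeLookup(): Option[Int] = meta.get("numFeatures").map(_.toInt)
}
```

Calling `MissingKeySketch.strictLookup()` throws, while `safeLookup()` returns `None`.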





[GitHub] spark issue #18872: [MLlib] Fix writing LibSVM

2017-08-10 Thread ProtD
Github user ProtD commented on the issue:

https://github.com/apache/spark/pull/18872
  
@srowen It worked in v2.0 but was probably broken in v2.2.0 by commit
b3d39620c563e5f6a32a4082aa3908e1009c17d2. The current unit tests check writing
only for DataFrames that were previously read from LibSVM format, not for
general ones. (And I guess people don't write LibSVM files very often, which
may be why nobody has reported it.)

@WeichenXu123 Yes, good idea, will do it!





[GitHub] spark pull request #18872: [MLlib] Fix writing LibSVM

2017-08-07 Thread ProtD
GitHub user ProtD opened a pull request:

https://github.com/apache/spark/pull/18872

[MLlib] Fix writing LibSVM

## What changes were proposed in this pull request?

Check the option "numFeatures" only when reading LibSVM, not when writing. 
When writing, Spark was raising an exception. After the change it will ignore 
the option completely. @liancheng @HyukjinKwon

(Maybe using the option when writing should be forbidden in a future major
version?)
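
As a rough illustration of the behavior described above (validate on read, ignore on write), a minimal sketch follows. The names `NumFeaturesCheckSketch`, `resolveNumFeatures`, and `forWriting` are hypothetical and chosen for this example; this is not the code from the patch.

```scala
object NumFeaturesCheckSketch {
  // Hypothetical helper: consult "numFeatures" only on the read path.
  def resolveNumFeatures(options: Map[String, String], forWriting: Boolean): Option[Int] =
    if (forWriting) {
      // When writing, the option is ignored completely, present or not.
      None
    } else {
      // When reading, the option is optional but must be positive if given.
      options.get("numFeatures").map { s =>
        val n = s.toInt
        require(n > 0, s"numFeatures must be positive, got $n")
        n
      }
    }
}
```

With this shape, a write with empty options no longer touches the missing key, and a read still validates a supplied value.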

## How was this patch tested?

Manually tested that loading and writing LibSVM files works, both with and
without the numFeatures option.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ProtD/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18872.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18872


commit 3b43de07ea43b341aa782d629dff1e5da970916f
Author: Jan Vrsovsky <jan.vrsov...@firma.seznam.cz>
Date:   2017-08-07T16:24:11Z

check numFeatures only when reading LibSVM -- not when writing



