[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-04-15 Thread pravingadakh
Github user pravingadakh closed the pull request at:

https://github.com/apache/spark/pull/11575


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-08 Thread pravingadakh
Github user pravingadakh commented on a diff in the pull request:

https://github.com/apache/spark/pull/11575#discussion_r55349423
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -178,6 +178,20 @@ class StringIndexerSuite
 }
   }
 
+  test("StringIndexer on column with empty string values") {
+    val data = sc.parallelize(Seq((0, "a"), (1, ""), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
+    val df = sqlContext.createDataFrame(data).toDF("id", "label")
--- End diff --

@jaceklaskowski Yes, you are right. I'm using the IntelliJ IDE, which requires 
adding `import sqlContext.implicits._` in order to use the syntax you 
mentioned, so I was avoiding it.





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-08 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11575#discussion_r55345145
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -178,6 +178,20 @@ class StringIndexerSuite
 }
   }
 
+  test("StringIndexer on column with empty string values") {
+    val data = sc.parallelize(Seq((0, "a"), (1, ""), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
+    val df = sqlContext.createDataFrame(data).toDF("id", "label")
--- End diff --

These two lines could be written as:

```
// `toDF` on a local Seq needs `import sqlContext.implicits._` in scope.
Seq((0, "a"), (1, ""), (2, "c"), (3, "a"), (4, "a"), (5, "c")).toDF("id", "label")
```





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11575#issuecomment-193685010
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-08 Thread pravingadakh
Github user pravingadakh commented on the pull request:

https://github.com/apache/spark/pull/9522#issuecomment-193683770
  
@thunterdb Please refer to the following PR: 
[PR](https://github.com/apache/spark/pull/11575)





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-08 Thread pravingadakh
GitHub user pravingadakh opened a pull request:

https://github.com/apache/spark/pull/11575

[SPARK-11535][ML] handling empty string in StringIndexer

## What changes were proposed in this pull request?

This PR replaces "" (the empty string, not null) with the string 
"EMPTY_STRING" in StringIndexer. Another approach is to use "0" (or the next 
available integer), but that may cause performance issues when the input 
column already holds integer values, say 0 to 10. We can use a different 
replacement string if "EMPTY_STRING" is commonly used as a real value.
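
As a rough illustration of the replacement idea described above (not the code in this patch), the sketch below works in plain Scala without Spark. The object name `EmptyStringIndexSketch` and the helpers `replaceEmpty` and `fitIndex` are hypothetical, and the alphabetical tie-break is an assumption made only to keep the output deterministic. It assumes the input contains no nulls.

```scala
// Hypothetical sketch: names below are illustrative and not part of Spark.
object EmptyStringIndexSketch {
  // Placeholder string taken from the PR description.
  val Placeholder = "EMPTY_STRING"

  // Map "" to the placeholder; every other label passes through unchanged.
  // Assumes `label` is never null.
  def replaceEmpty(label: String): String =
    if (label.isEmpty) Placeholder else label

  // Build a label -> index map ordered by descending frequency.
  // Ties are broken alphabetically (an assumption for determinism).
  def fitIndex(labels: Seq[String]): Map[String, Double] = {
    val counts: Map[String, Int] =
      labels.map(replaceEmpty).groupBy(identity).map { case (k, v) => (k, v.size) }
    counts.toSeq
      .sortBy { case (label, count) => (-count, label) }
      .map(_._1)
      .zipWithIndex
      .map { case (label, idx) => (label, idx.toDouble) }
      .toMap
  }
}
```

With the labels from the new unit test, `EmptyStringIndexSketch.fitIndex(Seq("a", "", "c", "a", "a", "c"))` maps "a" to 0.0, "c" to 1.0 and "EMPTY_STRING" to 2.0, so the empty string never appears as a label on its own.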


## How was this patch tested?

unit tests




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pravingadakh/spark SPARK-11535

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11575.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11575


commit fc86cf4bf8d541fc946c5055a583545aa96ac437
Author: Pravin Gadakh 
Date:   2016-03-08T09:14:36Z

handling empty string in StringIndexer







[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-08 Thread pravingadakh
Github user pravingadakh closed the pull request at:

https://github.com/apache/spark/pull/9522





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-08 Thread pravingadakh
Github user pravingadakh commented on the pull request:

https://github.com/apache/spark/pull/9522#issuecomment-193663512
  
@thunterdb Hi, I think I messed up while merging, so I am closing this pull 
request and creating a new one for the same change.





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-07 Thread thunterdb
Github user thunterdb commented on the pull request:

https://github.com/apache/spark/pull/9522#issuecomment-193542005
  
@pravingadakh sorry for the delay. Would you mind resolving the conflicts?





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2015-11-16 Thread pravingadakh
Github user pravingadakh commented on the pull request:

https://github.com/apache/spark/pull/9522#issuecomment-157131275
  
@jkbradley Any updates on this?





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2015-11-16 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/9522#issuecomment-157204594
  
@pravingadakh Apologies, but I may need to hold off on review for a bit as 
we prepare the 1.6 release.  Once that's out of the way, I'll be able to resume 
regular code review.  Thanks for being patient!





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2015-11-06 Thread pravingadakh
GitHub user pravingadakh opened a pull request:

https://github.com/apache/spark/pull/9522

[SPARK-11535][ML] handling empty string in StringIndexer

This PR replaces "" (the empty string, not null) with the string 
"EMPTY_STRING" in StringIndexer. Another approach is to use "0" (or the next 
available integer), but that may cause performance issues when the input 
column already holds integer values, say 0 to 10. We can use a different 
replacement string if "EMPTY_STRING" is commonly used as a real value.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pravingadakh/spark SPARK-11535

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9522.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9522


commit 66eba8e322e170e6570900cbf0b2802947d95781
Author: Pravin Gadakh 
Date:   2015-11-06T11:35:48Z

handling empty string in StringIndexer







[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9522#issuecomment-154390716
  
Can one of the admins verify this patch?

