[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315919#comment-14315919
 ] 

Apache Spark commented on SPARK-5566:
-

User 'aborsu985' has created a pull request for this issue:
https://github.com/apache/spark/pull/4504

 Tokenizer for mllib package
 ---

 Key: SPARK-5566
 URL: https://issues.apache.org/jira/browse/SPARK-5566
 Project: Spark
  Issue Type: New Feature
  Components: ML, MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley

 There exist tokenizer classes in the spark.ml.feature package and in the 
 LDAExample in the spark.examples.mllib package.  The Tokenizer in the 
 LDAExample is more advanced and should be made into a full-fledged public 
 class in spark.mllib.feature.  The spark.ml.feature.Tokenizer class should 
 become a wrapper around the new Tokenizer.
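
A minimal sketch of the wrapper arrangement proposed above, written in plain Python rather than against the Spark API. The names SimpleTokenizer, TokenizerWrapper, tokenize and transform are hypothetical, and the stop-word and minimum-length options are only an assumption about what "more advanced" could cover.

```python
import re
from typing import Iterable, List


class SimpleTokenizer:
    """Hypothetical core tokenizer: lowercase, split on a regex, filter tokens."""

    def __init__(self, pattern: str = r"\s+",
                 stop_words: Iterable[str] = (),
                 min_length: int = 1):
        self.pattern = re.compile(pattern)
        self.stop_words = frozenset(w.lower() for w in stop_words)
        self.min_length = min_length

    def tokenize(self, text: str) -> List[str]:
        # Split the lowercased text, then drop short tokens and stop words.
        tokens = self.pattern.split(text.lower())
        return [t for t in tokens
                if len(t) >= self.min_length and t not in self.stop_words]


class TokenizerWrapper:
    """Hypothetical pipeline-facing class that only delegates to the core one."""

    def __init__(self, core: SimpleTokenizer):
        self.core = core

    def transform(self, rows: Iterable[str]) -> List[List[str]]:
        # No tokenization logic lives here; it all stays in the core class.
        return [self.core.tokenize(row) for row in rows]
```

With this split, the tokenization rules live in one place and the wrapper carries only the interface that a pipeline expects.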



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-10 Thread Augustin Borsu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313996#comment-14313996
 ] 

Augustin Borsu commented on SPARK-5566:
---

We could use a tokenizer like this, but we would need to add regex and 
Array[String] parameter types to be able to change those parameters in a 
cross-validation.
https://github.com/apache/spark/pull/4504
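
To illustrate why the regex and the string-array settings need to be parameters, here is a rough plain-Python sketch of a parameter grid over both. It does not use the Spark API. The functions tokenize and build_param_grid and the keys "pattern" and "stop_words" are hypothetical names chosen for this example.

```python
import re
from itertools import product
from typing import Dict, List, Sequence


def tokenize(text: str, pattern: str, stop_words: Sequence[str]) -> List[str]:
    """Split lowercased text on `pattern`, dropping empty tokens and stop words."""
    stop = set(stop_words)
    return [t for t in re.split(pattern, text.lower()) if t and t not in stop]


def build_param_grid(patterns: Sequence[str],
                     stop_word_lists: Sequence[Sequence[str]]) -> List[Dict[str, object]]:
    """Return every (pattern, stop-word list) combination as one parameter setting."""
    return [{"pattern": p, "stop_words": list(s)}
            for p, s in product(patterns, stop_word_lists)]


# Each entry is one candidate configuration that a cross-validation loop would try.
grid = build_param_grid([r"\s+", r"\W+"], [[], ["is", "the"]])
for params in grid:
    tokens = tokenize("Spark is fast.", params["pattern"], params["stop_words"])
    print(params, tokens)
```

A cross-validation would score each of these settings on held-out data and keep the best one, which is only possible if the pattern and the word list can be set from outside the tokenizer.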




[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-05 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308733#comment-14308733
 ] 

yuhao yang commented on SPARK-5566:
---

I mean only the underlying implementation. 




[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-04 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305588#comment-14305588
 ] 

Joseph K. Bradley commented on SPARK-5566:
--

Do you mean to share the underlying implementation or the public API?
It would be good if we could share some underlying code, but those various 
featurization methods are quite different and probably belong in different 
classes.  The APIs can be similar to the extent that all feature transformers 
should be similar.




[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-04 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305172#comment-14305172
 ] 

yuhao yang commented on SPARK-5566:
---

Actually, I believe much of the current code, such as Word2Vec and HashingTF, 
shares a similar data flow, and it would be best to take the common 
requirements into consideration. 
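
One way to picture the shared data flow is that both kinds of feature code start from documents that are already sequences of tokens. The sketch below is plain Python and not the Spark implementation. The function names are hypothetical, and zlib.crc32 is used only as a stand-in for a stable hash function.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List


def hashing_tf(tokens: Iterable[str], num_features: int = 16) -> Dict[int, int]:
    """Map each token to a bucket index and count how many tokens land in each bucket."""
    return dict(Counter(zlib.crc32(t.encode("utf-8")) % num_features
                        for t in tokens))


def vocabulary_counts(docs: Iterable[List[str]]) -> Dict[str, int]:
    """Count words across all documents, as an embedding trainer would do first."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc)
    return dict(counts)


# Both consumers take the same input shape: one list of tokens per document.
docs = [["spark", "is", "fast"], ["spark", "scales"]]
print([hashing_tf(doc) for doc in docs])
print(vocabulary_counts(docs))
```

Because both functions accept the same token sequences, a single tokenizer can sit in front of either of them.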
