[jira] [Created] (SPARK-13961) spark.ml ChiSqSelector should support other numeric types for label

2016-03-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13961: -- Summary: spark.ml ChiSqSelector should support other numeric types for label Key: SPARK-13961 URL: https://issues.apache.org/jira/browse/SPARK-13961 Project

[jira] [Commented] (SPARK-13968) Use MurmurHash3 for hashing String features

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200254#comment-15200254 ] Nick Pentreath commented on SPARK-13968: Sure, I will assign to you. But

[jira] [Created] (SPARK-13967) Add binary toggle Param to PySpark CountVectorizer

2016-03-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13967: -- Summary: Add binary toggle Param to PySpark CountVectorizer Key: SPARK-13967 URL: https://issues.apache.org/jira/browse/SPARK-13967 Project: Spark Issue

[jira] [Commented] (SPARK-13967) Add binary toggle Param to PySpark CountVectorizer

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201339#comment-15201339 ] Nick Pentreath commented on SPARK-13967: [~yuhaoyan] or [~bryanc] would you

[jira] [Commented] (SPARK-13998) HashingTF should extend UnaryTransformer

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201236#comment-15201236 ] Nick Pentreath commented on SPARK-13998: [~jlaskowski] I've moved this

[jira] [Updated] (SPARK-13968) Use MurmurHash3 for hashing String features

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13968: --- Summary: Use MurmurHash3 for hashing String features (was: User MurmurHash3 for hashing

[jira] [Created] (SPARK-13962) spark.ml Evaluators should support other numeric types for label

2016-03-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13962: -- Summary: spark.ml Evaluators should support other numeric types for label Key: SPARK-13962 URL: https://issues.apache.org/jira/browse/SPARK-13962 Project: Spark

[jira] [Updated] (SPARK-13964) Feature hashing improvements

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13964: --- Description: Investigate improvements to Spark ML feature hashing (see e.g. http://scikit

[jira] [Updated] (SPARK-13963) Add binary toggle Param to ml.HashingTF

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13963: --- Description: It would be handy to add a binary toggle Param to {{HashingTF}, as in the

Re: Spark ML - Scaling logistic regression for many features

2016-03-19 Thread Nick Pentreath
ou give me the issue key(s)? If not, would you like me to create these > tickets? > > I'm going to look into this some more and see if I can figure out how to > implement these fixes. > > ~Daniel Siegmann > > On Sat, Mar 12, 2016 at 5:53 AM, Nick Pentreath > wrot

[jira] [Commented] (SPARK-13952) spark.ml GBT algs need to use random seed

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198859#comment-15198859 ] Nick Pentreath commented on SPARK-13952: [~josephkb] As far as I can see,

[jira] [Comment Edited] (SPARK-13969) Extend input format that feature hashing can handle

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201257#comment-15201257 ] Nick Pentreath edited comment on SPARK-13969 at 3/18/16 10:0

[jira] [Updated] (SPARK-13968) User MurmurHash for feature hashing

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13968: --- Summary: User MurmurHash for feature hashing (was: User MurmurHash in for feature hashing

[jira] [Created] (SPARK-13964) Feature hashing improvements

2016-03-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13964: -- Summary: Feature hashing improvements Key: SPARK-13964 URL: https://issues.apache.org/jira/browse/SPARK-13964 Project: Spark Issue Type: Umbrella

[jira] [Updated] (SPARK-7425) spark.ml Predictor should support other numeric types for label

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-7425: -- Assignee: Benjamin Fradet > spark.ml Predictor should support other numeric types for la

[jira] [Updated] (SPARK-13963) Add binary toggle Param to ml.HashingTF

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13963: --- Description: It would be handy to add a binary toggle Param to {{HashingTF}}, as in the

[jira] [Updated] (SPARK-13961) spark.ml ChiSqSelector and RFormula should support other numeric types for label

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13961: --- Summary: spark.ml ChiSqSelector and RFormula should support other numeric types for label

[jira] [Updated] (SPARK-13968) User MurmurHash3 for hashing String features

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13968: --- Summary: User MurmurHash3 for hashing String features (was: User MurmurHash for feature

[jira] [Created] (SPARK-13969) Extend input format that feature hashing can handle

2016-03-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13969: -- Summary: Extend input format that feature hashing can handle Key: SPARK-13969 URL: https://issues.apache.org/jira/browse/SPARK-13969 Project: Spark

[jira] [Updated] (SPARK-13963) Add binary toggle Param to ml.HashingTF

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13963: --- Issue Type: Sub-task (was: New Feature) Parent: SPARK-13964 > Add binary tog

[jira] [Updated] (SPARK-13969) Extend input format that feature hashing can handle

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13969: --- Description: Currently {{HashingTF}} works like {{CountVectorizer}} (the equivalent in

[jira] [Commented] (SPARK-13963) Add binary toggle Param to ml.HashingTF

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200047#comment-15200047 ] Nick Pentreath commented on SPARK-13963: Sure, assigned to you. > Add

[jira] [Commented] (SPARK-13968) Use MurmurHash3 for hashing String features

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201378#comment-15201378 ] Nick Pentreath commented on SPARK-13968: [~yanboliang] Actually I think this

[jira] [Commented] (SPARK-13963) Add binary toggle Param to ml.HashingTF

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200048#comment-15200048 ] Nick Pentreath commented on SPARK-13963: Sure, assigned to you. > Add

[jira] [Created] (SPARK-13963) Add binary toggle Param to ml.HashingTF

2016-03-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13963: -- Summary: Add binary toggle Param to ml.HashingTF Key: SPARK-13963 URL: https://issues.apache.org/jira/browse/SPARK-13963 Project: Spark Issue Type: New

[jira] [Commented] (SPARK-13969) Extend input format that feature hashing can handle

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201257#comment-15201257 ] Nick Pentreath commented on SPARK-13969: What I have in mind is something

[jira] [Created] (SPARK-13968) User MurmurHash in for feature hashing

2016-03-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13968: -- Summary: User MurmurHash in for feature hashing Key: SPARK-13968 URL: https://issues.apache.org/jira/browse/SPARK-13968 Project: Spark Issue Type: Sub

[jira] [Updated] (SPARK-13998) HashingTF should extend UnaryTransformer

2016-03-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13998: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-13964 > HashingTF sho

[jira] [Commented] (SPARK-13968) Use MurmurHash3 for hashing String features

2016-03-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202590#comment-15202590 ] Nick Pentreath commented on SPARK-13968: Ah I didn't pick up the o

[jira] [Updated] (SPARK-13968) Use MurmurHash3 for hashing String features

2016-03-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13968: --- Assignee: Yanbo Liang > Use MurmurHash3 for hashing String featu

[jira] [Resolved] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13629. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11536 [https

[jira] [Updated] (SPARK-7425) spark.ml Predictor should support other numeric types for label

2016-03-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-7425: -- Shepherd: Nick Pentreath > spark.ml Predictor should support other numeric types for la

[jira] [Updated] (SPARK-8971) Support balanced class labels when splitting train/cross validation sets

2016-03-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-8971: -- Shepherd: Nick Pentreath Target Version/s: (was: ) > Support balanced cl

[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195696#comment-15195696 ] Nick Pentreath edited comment on SPARK-13857 at 3/16/16 6:4

[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195702#comment-15195702 ] Nick Pentreath edited comment on SPARK-13857 at 3/16/16 6:4

[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195696#comment-15195696 ] Nick Pentreath edited comment on SPARK-13857 at 3/16/16 6:4

[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195696#comment-15195696 ] Nick Pentreath edited comment on SPARK-13857 at 3/16/16 6:4

[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195696#comment-15195696 ] Nick Pentreath edited comment on SPARK-13857 at 3/16/16 6:3

[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195696#comment-15195696 ] Nick Pentreath edited comment on SPARK-13857 at 3/16/16 6:3

[jira] [Commented] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195702#comment-15195702 ] Nick Pentreath commented on SPARK-13857: Also, what's nice in the ML AP

[jira] [Commented] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195696#comment-15195696 ] Nick Pentreath commented on SPARK-13857: There are two broad options for ad

[jira] [Resolved] (SPARK-12379) Copy GBT implementation to spark.ml

2016-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12379. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10607 [https

Re: is there any way to make WEB UI auto-refresh?

2016-03-15 Thread Nick Pentreath
You may want to check out https://github.com/hammerlab/spree On Tue, 15 Mar 2016 at 10:43 charles li wrote: > every time I can only get the latest info by refreshing the page, that's a > little boring. > > so is there any way to make the WEB UI auto-refreshing ? > > > great thanks > > > > -- > *

Re: [MLlib - ALS] Merging two Models?

2016-03-15 Thread Nick Pentreath
By the way, I created a JIRA for supporting initial model for warm start ALS here: https://issues.apache.org/jira/browse/SPARK-13856 On Fri, 11 Mar 2016 at 09:14, Nick Pentreath wrote: > Sean's old Myrrix slides contain an overview of the fold-in math: > http://www.slideshare.net

[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193916#comment-15193916 ] Nick Pentreath commented on SPARK-11136: I would say the initial model pa

[jira] [Commented] (SPARK-6717) Clear shuffle files after checkpointing in ALS

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193268#comment-15193268 ] Nick Pentreath commented on SPARK-6717: --- [~antonymayi] is this still an issu

[jira] [Closed] (SPARK-13066) Specify types for per-model/estimator params in ML to allow automatic type conversion

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-13066. -- Resolution: Won't Fix > Specify types for per-model/estimator params in ML to allow a

[jira] [Commented] (SPARK-13066) Specify types for per-model/estimator params in ML to allow automatic type conversion

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193258#comment-15193258 ] Nick Pentreath commented on SPARK-13066: I think this is now conta

[jira] [Closed] (SPARK-7376) Python: Add validation functionality to individual Param

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-7376. - Resolution: Won't Fix > Python: Add validation functionality to individu

[jira] [Commented] (SPARK-7376) Python: Add validation functionality to individual Param

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193252#comment-15193252 ] Nick Pentreath commented on SPARK-7376: --- I'm going to close this in favour

[jira] [Updated] (SPARK-13068) Extend pyspark ml paramtype conversion to support lists

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13068: --- Assignee: Seth Hendrickson > Extend pyspark ml paramtype conversion to support li

[jira] [Created] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-03-14 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13857: -- Summary: Feature parity for ALS ML with MLLIB Key: SPARK-13857 URL: https://issues.apache.org/jira/browse/SPARK-13857 Project: Spark Issue Type

[jira] [Closed] (SPARK-8491) DAISY Feature Transformer

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-8491. - Resolution: Won't Fix > DAISY Feature Tra

[jira] [Closed] (SPARK-8493) Fisher Vector Estimator

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-8493. - Resolution: Won't Fix > Fisher Vector Estimator > --- > >

[jira] [Closed] (SPARK-8486) SIFT Feature Transformer

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-8486. - Resolution: Won't Fix > SIFT Feature Tra

[jira] [Closed] (SPARK-8488) HOG Feature Transformer

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-8488. - Resolution: Won't Fix > HOG Feature Transformer > --- > >

[jira] [Commented] (SPARK-8485) Feature transformers for image processing

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193039#comment-15193039 ] Nick Pentreath commented on SPARK-8485: --- I agree this should start life

[jira] [Closed] (SPARK-8485) Feature transformers for image processing

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-8485. - Resolution: Won't Fix > Feature transformers for image pr

[jira] [Commented] (SPARK-8490) SURF Feature Transformer

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193029#comment-15193029 ] Nick Pentreath commented on SPARK-8490: --- I think if there's interest in

[jira] [Closed] (SPARK-8490) SURF Feature Transformer

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-8490. - Resolution: Won't Fix > SURF Feature Tra

[jira] [Updated] (SPARK-13856) Support initialModel in ALS

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13856: --- Description: Once SPARK-10780 is completed and the initial model API for Estimators is

[jira] [Created] (SPARK-13856) Support initialModel in ALS

2016-03-14 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13856: -- Summary: Support initialModel in ALS Key: SPARK-13856 URL: https://issues.apache.org/jira/browse/SPARK-13856 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2016-03-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192975#comment-15192975 ] Nick Pentreath commented on SPARK-11136: A question about the API design

Re: Spark ML - Scaling logistic regression for many features

2016-03-12 Thread Nick Pentreath
Also adding dev list in case anyone else has ideas / views. On Sat, 12 Mar 2016 at 12:52, Nick Pentreath wrote: > Thanks for the feedback. > > I think Spark can certainly meet your use case when your data size scales > up, as the actual model dimension is very small - you will

Re: Spark ML - Scaling logistic regression for many features

2016-03-12 Thread Nick Pentreath
h Spark. > > > > On Fri, Mar 11, 2016 at 12:45 PM, Nick Pentreath > wrote: > >> Ok, I think I understand things better now. >> >> For Spark's current implementation, you would need to map those features >> as you mention. You could also use say StringIn

Re: Spark ML - Scaling logistic regression for many features

2016-03-11 Thread Nick Pentreath
s currently. There are potential solutions to these but they haven't been implemented as yet. On Fri, 11 Mar 2016 at 18:35 Daniel Siegmann wrote: > On Fri, Mar 11, 2016 at 5:29 AM, Nick Pentreath > wrote: > >> Would you mind letting us know the # training examples in the dataset

Re: ALS update without re-computing everything

2016-03-11 Thread Nick Pentreath
b, and with the new state management it could work much better. On Fri, 11 Mar 2016 at 14:21 Sean Owen wrote: > On Fri, Mar 11, 2016 at 12:18 PM, Nick Pentreath > wrote: > > In general, for serving situations MF models are stored in some other > > serving system, so that syst

Re: ALS update without re-computing everything

2016-03-11 Thread Nick Pentreath
Currently this is not supported. If you want to do incremental fold-in of new data you would need to do it outside of Spark (e.g. see this discussion: https://mail-archives.apache.org/mod_mbox/spark-user/201603.mbox/browser, which also mentions a streaming on-line MF implementation with SGD). In g

Re: Spark ML - Scaling logistic regression for many features

2016-03-11 Thread Nick Pentreath
meanwhile > has fewer arrays, but if you try to pass coefficients as anything other > than a dense vector it actually throws an error! Any idea why? Anyone know > a reason these aggregators *must* store their data densely, or is just an > implementation choice? Perhaps refactoring

Re: Running ALS on comparitively large RDD

2016-03-11 Thread Nick Pentreath
ically have around a million ratings > 2. Spark 1.6 on Amazon EMR > > On Fri, Mar 11, 2016 at 12:46 PM, Nick Pentreath > wrote: > >> Could you provide more details about: >> 1. Data set size (# ratings, # users and # products) >> 2. Spark cluster set up and version >

[jira] [Updated] (SPARK-13787) Feature importances for decision trees in Python

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13787: --- Priority: Minor (was: Major) > Feature importances for decision trees in Pyt

[jira] [Resolved] (SPARK-13787) Feature importances for decision trees in Python

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13787. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11622 [https

[jira] [Updated] (SPARK-13787) Feature importances for decision trees in Python

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13787: --- Assignee: Seth Hendrickson > Feature importances for decision trees in Pyt

[jira] [Resolved] (SPARK-13512) Add example and doc for ml.feature.MaxAbsScaler

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13512. Resolution: Fixed Fix Version/s: 2.0.0 > Add example and doc

[jira] [Resolved] (SPARK-13672) Add python examples of BisectingKMeans in ML and MLLIB

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13672. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11515 [https

Re: Running ALS on comparitively large RDD

2016-03-10 Thread Nick Pentreath
Could you provide more details about: 1. Data set size (# ratings, # users and # products) 2. Spark cluster set up and version Thanks On Fri, 11 Mar 2016 at 05:53 Deepak Gopalakrishnan wrote: > Hello All, > > I've been running Spark's ALS on a dataset of users and rated items. I > first encode

Re: [MLlib - ALS] Merging two Models?

2016-03-10 Thread Nick Pentreath
Sean's old Myrrix slides contain an overview of the fold-in math: http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares/14?src=clipshare I never quite got around to actually incorporating it into my own ALS-based systems, because in the end I just re-compute

Re: Can we use spark inside a web service?

2016-03-10 Thread Nick Pentreath
Yes, really interesting discussion. It would be really interesting to compare the performance of alternative architectures. Specifically, I've found that Elasticsearch is a great option for analytic workloads - it doesn't support SQL (joins in particular), but its aggregation and arbitrary filteri

[jira] [Updated] (SPARK-13340) [ML] PolynomialExpansion and Normalizer should validate input type

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13340: --- Shepherd: Nick Pentreath > [ML] PolynomialExpansion and Normalizer should validate in

[jira] [Updated] (SPARK-13512) Add example and doc for ml.feature.MaxAbsScaler

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13512: --- Shepherd: Nick Pentreath Assignee: yuhao yang > Add example and doc

[jira] [Resolved] (SPARK-11108) OneHotEncoder should support other numeric input types

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-11108. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 9777 [https

[jira] [Updated] (SPARK-11108) OneHotEncoder should support other numeric input types

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-11108: --- Shepherd: Nick Pentreath > OneHotEncoder should support other numeric input ty

[jira] [Updated] (SPARK-11108) OneHotEncoder should support other numeric input types

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-11108: --- Assignee: Seth Hendrickson > OneHotEncoder should support other numeric input ty

[jira] [Updated] (SPARK-13600) Use approxQuantile from DataFrame stats in QuantileDiscretizer

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13600: --- Shepherd: Nick Pentreath > Use approxQuantile from DataFrame stats in QuantileDiscreti

[jira] [Updated] (SPARK-13672) Add python examples of BisectingKMeans in ML and MLLIB

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13672: --- Shepherd: Nick Pentreath Assignee: zhengruifeng > Add python examples of BisectingKMe

[jira] [Updated] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13629: --- Shepherd: Nick Pentreath > Add binary toggle Param to CountVectori

[jira] [Updated] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13629: --- Assignee: yuhao yang > Add binary toggle Param to CountVectori

[jira] [Resolved] (SPARK-13706) Python Example for Train Validation Split Missing

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13706. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11547 [https

[jira] [Updated] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13430: --- Assignee: Bryan Cutler > Expose ml summary function in PySpark for classification

[jira] [Commented] (SPARK-12626) MLlib 2.0 Roadmap

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188774#comment-15188774 ] Nick Pentreath commented on SPARK-12626: [~dbtsai] ok thanks - would lik

Re: Spark ML - Scaling logistic regression for many features

2016-03-09 Thread Nick Pentreath
Hi Daniel The bottleneck in Spark ML is most likely (a) the fact that the weight vector itself is dense, and (b) the related communication via the driver. A tree aggregation mechanism is used for computing gradient sums (see https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apac
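
To make the tree aggregation idea in the message above concrete, here is a minimal pure-Python sketch. It is not Spark's implementation: the helper names (tree_aggregate, add_vectors), the fan_in parameter and the sample gradients are all hypothetical and chosen only for illustration. Spark exposes the real operation as treeAggregate on RDDs; this sketch only shows why combining dense per-partition gradient sums in levels means no single node has to merge every partial result at once.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def tree_aggregate(partials: Sequence[T],
                   combine: Callable[[T, T], T],
                   fan_in: int = 2) -> T:
    """Combine partial results level by level, fan_in at a time.

    Hypothetical helper: each pass merges groups of `fan_in` partials,
    so the final merge sees only a handful of inputs rather than one
    per partition.
    """
    if not partials:
        raise ValueError("need at least one partial result")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level: List[T] = list(partials)
    while len(level) > 1:
        # Merge each group of `fan_in` neighbours into a single partial.
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]


def add_vectors(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum of two dense vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


# Made-up dense gradient sums, one per partition. Each has the same
# length as the weight vector, which is why a dense, high-dimensional
# model makes every merge step expensive.
partition_gradients = [
    [1.0, 0.0, 2.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 1.0],
    [2.0, 0.0, 0.0],
    [0.5, 0.5, 0.5],
]

total_gradient = tree_aggregate(partition_gradients, add_vectors)
print(total_gradient)
```

With five partitions and a fan-in of 2, the sketch merges 5 partials into 3, then 2, then 1. The cost of each merge still grows with the vector length, which matches point (a) in the message: a dense weight vector stays expensive to move around however the merges are arranged.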

[jira] [Commented] (SPARK-12626) MLlib 2.0 Roadmap

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186985#comment-15186985 ] Nick Pentreath commented on SPARK-12626: [~mengxr] [~josephkb] I see

[jira] [Updated] (SPARK-13706) Python Example for Train Validation Split Missing

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13706: --- Assignee: Jeremy > Python Example for Train Validation Split Miss

[jira] [Updated] (SPARK-13706) Python Example for Train Validation Split Missing

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13706: --- Issue Type: Improvement (was: Bug) > Python Example for Train Validation Split Miss

[jira] [Commented] (SPARK-13600) Use approxQuantile from DataFrame stats in QuantileDiscretizer

2016-03-08 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186648#comment-15186648 ] Nick Pentreath commented on SPARK-13600: Thanks, that's fi

Re: Spark ML Interaction

2016-03-08 Thread Nick Pentreath
Could you create a JIRA to add an example and documentation? Thanks On Tue, 8 Mar 2016 at 16:18, amarouni wrote: > Hi, > > Did anyone here manage to write an example of the following ML feature > transformer > > http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/Interactio

Re: how to implement ALS with csv file? getting error while calling Rating class

2016-03-08 Thread Nick Pentreath
able in mllib.ALS On Mon, 7 Mar 2016 at 21:25 Shishir Anshuman wrote: > Hello Nick, > > I used *ml *instead of *mllib* for ALS and Rating. But now It gives me > error while using *predict()* from > *org.apache.spark.mllib.recommendation.MatrixFactorizationModel.* > > I have attached the code and the err
