[jira] [Commented] (SPARK-30670) Pipes for PySpark

2020-01-30 Thread Vincent (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027242#comment-17027242 ] Vincent commented on SPARK-30670: - I just had a look, but transform does not allow for `*args` and

[jira] [Commented] (SPARK-26449) Missing Dataframe.transform API in Python API

2020-01-30 Thread Vincent (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027241#comment-17027241 ] Vincent commented on SPARK-26449: - Is there a reason why transform does not accept `*args` and

[jira] [Created] (SPARK-30670) Pipes for PySpark

2020-01-29 Thread Vincent (Jira)
Vincent created SPARK-30670: --- Summary: Pipes for PySpark Key: SPARK-30670 URL: https://issues.apache.org/jira/browse/SPARK-30670 Project: Spark Issue Type: New Feature Components: SQL

[jira] [Created] (SPARK-27087) Inability to access to column alias in pyspark

2019-03-07 Thread Vincent (JIRA)
Vincent created SPARK-27087: --- Summary: Inability to access to column alias in pyspark Key: SPARK-27087 URL: https://issues.apache.org/jira/browse/SPARK-27087 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-25412) FeatureHasher would change the value of output feature

2018-09-13 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613181#comment-16613181 ] Vincent commented on SPARK-25412: - Thanks, Nick, for the reply. So, the tradeoff is between highly

[jira] [Commented] (SPARK-25412) FeatureHasher would change the value of output feature

2018-09-11 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611628#comment-16611628 ] Vincent commented on SPARK-25412: - [~nick.pentre...@gmail.com] thanks. > FeatureHasher would change the

[jira] [Created] (SPARK-25412) FeatureHasher would change the value of output feature

2018-09-11 Thread Vincent (JIRA)
Vincent created SPARK-25412: --- Summary: FeatureHasher would change the value of output feature Key: SPARK-25412 URL: https://issues.apache.org/jira/browse/SPARK-25412 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-25364) a better way to handle vector index and sparsity in FeatureHasher implementation ?

2018-09-09 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608612#comment-16608612 ] Vincent commented on SPARK-25364: - duplication. close this Jira. > a better way to handle vector index

[jira] [Resolved] (SPARK-25364) a better way to handle vector index and sparsity in FeatureHasher implementation ?

2018-09-09 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent resolved SPARK-25364. - Resolution: Duplicate > a better way to handle vector index and sparsity in FeatureHasher >

[jira] [Commented] (SPARK-25365) a better way to handle vector index and sparsity in FeatureHasher implementation ?

2018-09-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606746#comment-16606746 ] Vincent commented on SPARK-25365: - [~nick.pentre...@gmail.com] Thanks. > a better way to handle vector

[jira] [Created] (SPARK-25364) a better way to handle vector index and sparsity in FeatureHasher implementation ?

2018-09-07 Thread Vincent (JIRA)
Vincent created SPARK-25364: --- Summary: a better way to handle vector index and sparsity in FeatureHasher implementation ? Key: SPARK-25364 URL: https://issues.apache.org/jira/browse/SPARK-25364 Project:

[jira] [Created] (SPARK-25365) a better way to handle vector index and sparsity in FeatureHasher implementation ?

2018-09-07 Thread Vincent (JIRA)
Vincent created SPARK-25365: --- Summary: a better way to handle vector index and sparsity in FeatureHasher implementation ? Key: SPARK-25365 URL: https://issues.apache.org/jira/browse/SPARK-25365 Project:

[jira] [Created] (SPARK-25034) possible triple memory consumption in fetchBlockSync()

2018-08-06 Thread Vincent (JIRA)
Vincent created SPARK-25034: --- Summary: possible triple memory consumption in fetchBlockSync() Key: SPARK-25034 URL: https://issues.apache.org/jira/browse/SPARK-25034 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-24968) Configurable Chunksize in ChunkedByteBufferOutputStream

2018-07-31 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent resolved SPARK-24968. - Resolution: Fixed > Configurable Chunksize in ChunkedByteBufferOutputStream >

[jira] [Commented] (SPARK-24968) Configurable Chunksize in ChunkedByteBufferOutputStream

2018-07-31 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563467#comment-16563467 ] Vincent commented on SPARK-24968: - indeed they are closely related I'll close this ticket >

[jira] [Created] (SPARK-24968) Configurable Chunksize in ChunkedByteBufferOutputStream

2018-07-30 Thread Vincent (JIRA)
Vincent created SPARK-24968: --- Summary: Configurable Chunksize in ChunkedByteBufferOutputStream Key: SPARK-24968 URL: https://issues.apache.org/jira/browse/SPARK-24968 Project: Spark Issue Type:

[jira] [Updated] (SPARK-24917) Sending a partition over netty results in 2x memory usage

2018-07-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-24917: Description: Hello while investigating some OOM errors in Spark 2.2 [(here's my call

[jira] [Created] (SPARK-24917) Sending a partition over netty results in 2x memory usage

2018-07-25 Thread Vincent (JIRA)
Vincent created SPARK-24917: --- Summary: Sending a partition over netty results in 2x memory usage Key: SPARK-24917 URL: https://issues.apache.org/jira/browse/SPARK-24917 Project: Spark Issue Type:

[jira] [Updated] (SPARK-22096) use aggregateByKeyLocally to save one stage in calculating ItemFrequency in NaiveBayes

2017-09-21 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-22096: Attachment: performance data for NB.png > use aggregateByKeyLocally to save one stage in calculating

[jira] [Updated] (SPARK-22096) use aggregateByKeyLocally to save one stage in calculating ItemFrequency in NaiveBayes

2017-09-21 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-22096: Description: NaiveBayes currently takes aggregateByKey followed by a collect to calculate frequency for

[jira] [Created] (SPARK-22098) Add aggregateByKeyLocally in RDD

2017-09-21 Thread Vincent (JIRA)
Vincent created SPARK-22098: --- Summary: Add aggregateByKeyLocally in RDD Key: SPARK-22098 URL: https://issues.apache.org/jira/browse/SPARK-22098 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-22096) use aggregateByKeyLocally to save one stage in calculating ItemFrequency in NaiveBayes

2017-09-21 Thread Vincent (JIRA)
Vincent created SPARK-22096: --- Summary: use aggregateByKeyLocally to save one stage in calculating ItemFrequency in NaiveBayes Key: SPARK-22096 URL: https://issues.apache.org/jira/browse/SPARK-22096

[jira] [Commented] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-14 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125353#comment-16125353 ] Vincent commented on SPARK-21688: - Sorry for the late reply. Yes, it's simple and easy to check the env

[jira] [Commented] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121624#comment-16121624 ] Vincent commented on SPARK-21688: - Okay. Yes, true. It can still run without issue but we are just

[jira] [Commented] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121589#comment-16121589 ] Vincent commented on SPARK-21688: - [~srowen] Thanks for your comments. I think if user decides to use

[jira] [Commented] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121254#comment-16121254 ] Vincent commented on SPARK-21688: - and if native blas is left with default multi-threading setting, it

[jira] [Updated] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-21688: Attachment: native-trywait.png > performance improvement in mllib SVM with native BLAS >

[jira] [Updated] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-21688: Attachment: (was: uni-test on ddot.png) > performance improvement in mllib SVM with native BLAS >

[jira] [Updated] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-21688: Attachment: ddot unitest.png > performance improvement in mllib SVM with native BLAS >

[jira] [Commented] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121236#comment-16121236 ] Vincent commented on SPARK-21688: - upload a data we collected before, uni-test on ddot, we can see for

[jira] [Updated] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-21688: Attachment: uni-test on ddot.png > performance improvement in mllib SVM with native BLAS >

[jira] [Commented] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121209#comment-16121209 ] Vincent commented on SPARK-21688: - currently, there are certain places in ML/MLLib, such as in mllib/SVM,

[jira] [Comment Edited] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121113#comment-16121113 ] Vincent edited comment on SPARK-21688 at 8/10/17 6:13 AM: -- attach svm profiling

[jira] [Updated] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-21688: Attachment: svm1.png svm2.png svm-mkl-1.png svm-mkl-2.png

[jira] [Updated] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-09 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-21688: Attachment: mllib svm training.png > performance improvement in mllib SVM with native BLAS >

[jira] [Created] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-09 Thread Vincent (JIRA)
Vincent created SPARK-21688: --- Summary: performance improvement in mllib SVM with native BLAS Key: SPARK-21688 URL: https://issues.apache.org/jira/browse/SPARK-21688 Project: Spark Issue Type:

[jira] [Commented] (SPARK-20988) Convert logistic regression to new aggregator framework

2017-06-14 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048769#comment-16048769 ] Vincent commented on SPARK-20988: - okay, no problem :) > Convert logistic regression to new aggregator

[jira] [Commented] (SPARK-20988) Convert logistic regression to new aggregator framework

2017-06-13 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048593#comment-16048593 ] Vincent commented on SPARK-20988: - Oops. I have finished the conversion part, but there are still other

[jira] [Commented] (SPARK-20988) Convert logistic regression to new aggregator framework

2017-06-12 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047370#comment-16047370 ] Vincent commented on SPARK-20988: - I can work on this if no one is working on it now :) > Convert

[jira] [Created] (SPARK-21058) potential SVD optimization

2017-06-11 Thread Vincent (JIRA)
Vincent created SPARK-21058: --- Summary: potential SVD optimization Key: SPARK-21058 URL: https://issues.apache.org/jira/browse/SPARK-21058 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-21049) why do we need computeGramianMatrix when computing SVD

2017-06-10 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045536#comment-16045536 ] Vincent commented on SPARK-21049: - [~srowen] Thanks, that's right. But we found quite often that the

[jira] [Created] (SPARK-21049) why do we need computeGramianMatrix when computing SVD

2017-06-10 Thread Vincent (JIRA)
Vincent created SPARK-21049: --- Summary: why do we need computeGramianMatrix when computing SVD Key: SPARK-21049 URL: https://issues.apache.org/jira/browse/SPARK-21049 Project: Spark Issue Type:

[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2017-05-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000176#comment-16000176 ] Vincent commented on SPARK-17134: - I will submit a PR for this issue soon. > Use level 2 BLAS operations

[jira] [Commented] (SPARK-19852) StringIndexer.setHandleInvalid should have another option 'new': Python API and docs

2017-03-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900463#comment-15900463 ] Vincent commented on SPARK-19852: - I can work on this issue, since it is related to SPARK-17498 >

[jira] [Commented] (SPARK-7132) Add fit with validation set to spark.ml GBT

2017-02-22 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877781#comment-15877781 ] Vincent commented on SPARK-7132: Hi All, any update on this issue? > Add fit with validation set to

[jira] [Commented] (SPARK-14682) Provide evaluateEachIteration method or equivalent for spark.ml GBTs

2017-02-21 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875618#comment-15875618 ] Vincent commented on SPARK-14682: - any update? > Provide evaluateEachIteration method or equivalent for

[jira] [Created] (SPARK-19590) Update the document for QuantileDiscretizer in pyspark

2017-02-13 Thread Vincent (JIRA)
Vincent created SPARK-19590: --- Summary: Update the document for QuantileDiscretizer in pyspark Key: SPARK-19590 URL: https://issues.apache.org/jira/browse/SPARK-19590 Project: Spark Issue Type:

[jira] [Commented] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'

2017-02-08 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857600#comment-15857600 ] Vincent commented on SPARK-17498: - I can take the issue and make a PR > StringIndexer.setHandleInvalid

[jira] [Commented] (SPARK-18023) Adam optimizer

2016-11-20 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682491#comment-15682491 ] Vincent commented on SPARK-18023: - thanks [~mlnick] that's really what we need. when I wrote the code for

[jira] [Commented] (SPARK-17055) add groupKFold to CrossValidator

2016-11-02 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631286#comment-15631286 ] Vincent commented on SPARK-17055: - [~srowen] No offense. Maybe we can invite more people to have a look at

[jira] [Updated] (SPARK-17055) add groupKFold to CrossValidator

2016-11-01 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-17055: Summary: add groupKFold to CrossValidator (was: add labelKFold to CrossValidator) > add groupKFold to

[jira] [Commented] (SPARK-17055) add labelKFold to CrossValidator

2016-11-01 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625108#comment-15625108 ] Vincent commented on SPARK-17055: - [~sowen] Okay. hmm, I guess we have some misunderstanding here.

[jira] [Commented] (SPARK-17055) add labelKFold to CrossValidator

2016-10-31 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15624114#comment-15624114 ] Vincent commented on SPARK-17055: - [~srowen] May I ask the reason why we close this issue? It'd be

[jira] [Commented] (SPARK-18023) Adam optimizer

2016-10-20 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590951#comment-15590951 ] Vincent commented on SPARK-18023: - I can start with ADAM, then maybe other Ada methods after that > Adam

[jira] [Created] (SPARK-18023) Adam optimizer

2016-10-20 Thread Vincent (JIRA)
Vincent created SPARK-18023: --- Summary: Adam optimizer Key: SPARK-18023 URL: https://issues.apache.org/jira/browse/SPARK-18023 Project: Spark Issue Type: New Feature Components: ML, MLlib

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-09 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559597#comment-15559597 ] Vincent commented on SPARK-17219: - No problem. I will try to submit another PR based on above

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557034#comment-15557034 ] Vincent commented on SPARK-17219: - [~josephkb] [~srowen] [~timhunter] let me know what I can do to help

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557021#comment-15557021 ] Vincent commented on SPARK-17219: - in this PR(https://github.com/apache/spark/pull/14858) NaN values are

[jira] [Comment Edited] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'

2016-09-12 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483514#comment-15483514 ] Vincent edited comment on SPARK-17498 at 9/12/16 8:55 AM: -- Here is how we cc

[jira] [Comment Edited] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'

2016-09-12 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483514#comment-15483514 ] Vincent edited comment on SPARK-17498 at 9/12/16 8:43 AM: -- Here is what we cc

[jira] [Commented] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'

2016-09-12 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483514#comment-15483514 ] Vincent commented on SPARK-17498: - Here is what we cc [~qhuang] see about this issue and correct me if

[jira] [Commented] (SPARK-6680) Be able to specify IP for spark-shell (spark driver) blocker for Docker integration

2016-09-09 Thread YSMAL Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476947#comment-15476947 ] YSMAL Vincent commented on SPARK-6680: -- Hi, using Docker you can get rid of this alias on hostname,

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445873#comment-15445873 ] Vincent commented on SPARK-17219: - Cool. I will refine the patch. thanks [~srowen] :) >

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445858#comment-15445858 ] Vincent commented on SPARK-17219: - yes, discretizer can do it easily, especially if only

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445808#comment-15445808 ] Vincent commented on SPARK-17219: - then we have to shift this work to user, who needs to filter out the

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445768#comment-15445768 ] Vincent commented on SPARK-17219: - [~srowen] Hi all, per discussion, I thought we are going to handle NaN

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436873#comment-15436873 ] Vincent commented on SPARK-17219: - Okay, thanks. So, meaning we will have no options for users actually.

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436870#comment-15436870 ] Vincent commented on SPARK-17219: - yes, if we wanna make this scenario more general to all bucketizer

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436856#comment-15436856 ] Vincent commented on SPARK-17219: - [~srowen] Sorry Owen, by saying 'keep it to one behavior', do you mean

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436804#comment-15436804 ] Vincent commented on SPARK-17219: - if so, we have to add this option within Bucketizer, right? >

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436553#comment-15436553 ] Vincent commented on SPARK-17219: - I can work on this issue if no one else is on it :) >

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436143#comment-15436143 ] Vincent commented on SPARK-17219: - for this scenario, we can add a new parameter for QuantileDiscretizer,

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436136#comment-15436136 ] Vincent commented on SPARK-17219: - for cases where only null and non-null buckets are needed, I guess we

[jira] [Comment Edited] (SPARK-17055) add labelKFold to CrossValidator

2016-08-23 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432463#comment-15432463 ] Vincent edited comment on SPARK-17055 at 8/23/16 10:34 AM: --- sorry for late

[jira] [Comment Edited] (SPARK-17055) add labelKFold to CrossValidator

2016-08-23 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432463#comment-15432463 ] Vincent edited comment on SPARK-17055 at 8/23/16 9:18 AM: -- Sorry for the late reply.

[jira] [Commented] (SPARK-17055) add labelKFold to CrossValidator

2016-08-23 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432463#comment-15432463 ] Vincent commented on SPARK-17055: - Sorry for the late reply. Yes, I just learned they intend to rename it to

[jira] [Comment Edited] (SPARK-17055) add labelKFold to CrossValidator

2016-08-22 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430381#comment-15430381 ] Vincent edited comment on SPARK-17055 at 8/22/16 9:14 AM: -- well, a better model

[jira] [Commented] (SPARK-17055) add labelKFold to CrossValidator

2016-08-22 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430381#comment-15430381 ] Vincent commented on SPARK-17055: - well, a better model will have a better cv performance on data with

[jira] [Commented] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-18 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426276#comment-15426276 ] Vincent commented on SPARK-17086: - Agree! [~srowen] > QuantileDiscretizer throws

[jira] [Comment Edited] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-18 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426261#comment-15426261 ] Vincent edited comment on SPARK-17086 at 8/18/16 10:57 AM: --- [~srowen] in the

[jira] [Commented] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-18 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426261#comment-15426261 ] Vincent commented on SPARK-17086: - [~srowen] in the example you just took, yes, it will return

[jira] [Commented] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-18 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426246#comment-15426246 ] Vincent commented on SPARK-17086: - [~yanboliang] yes, actually that case was handled on spark-1.6.2

[jira] [Commented] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-17 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424076#comment-15424076 ] Vincent commented on SPARK-17086: - Confirmed: the issue doesn't exist on Spark-1.6.2. I will work on this

[jira] [Commented] (SPARK-17055) add labelKFold to CrossValidator

2016-08-15 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420867#comment-15420867 ] Vincent commented on SPARK-17055: - one of the most common tasks is to fit a "model" to a set of training

[jira] [Updated] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-17055: Description: Current CrossValidator only supports k-fold, which randomly divides all the samples in k

[jira] [Updated] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-17055: Affects Version/s: (was: 2.0.0) > add labelKFold to CrossValidator >

[jira] [Created] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Vincent (JIRA)
Vincent created SPARK-17055: --- Summary: add labelKFold to CrossValidator Key: SPARK-17055 URL: https://issues.apache.org/jira/browse/SPARK-17055 Project: Spark Issue Type: New Feature