[
https://issues.apache.org/jira/browse/SPARK-30670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027242#comment-17027242
]
Vincent commented on SPARK-30670:
-
I just had a look, but transform does not allow for `*args` and
[
https://issues.apache.org/jira/browse/SPARK-26449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027241#comment-17027241
]
Vincent commented on SPARK-26449:
-
Is there a reason why transform does not accept `*args` and
Vincent created SPARK-30670:
---
Summary: Pipes for PySpark
Key: SPARK-30670
URL: https://issues.apache.org/jira/browse/SPARK-30670
Project: Spark
Issue Type: New Feature
Components: SQL
Vincent created SPARK-27087:
---
Summary: Inability to access to column alias in pyspark
Key: SPARK-27087
URL: https://issues.apache.org/jira/browse/SPARK-27087
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613181#comment-16613181
]
Vincent commented on SPARK-25412:
-
Thanks, Nick, for the reply.
so, the tradeoff is between highly
[
https://issues.apache.org/jira/browse/SPARK-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611628#comment-16611628
]
Vincent commented on SPARK-25412:
-
[~nick.pentre...@gmail.com] thanks.
> FeatureHasher would change the
Vincent created SPARK-25412:
---
Summary: FeatureHasher would change the value of output feature
Key: SPARK-25412
URL: https://issues.apache.org/jira/browse/SPARK-25412
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-25364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608612#comment-16608612
]
Vincent commented on SPARK-25364:
-
Duplicate. Closing this Jira.
> a better way to handle vector index
[
https://issues.apache.org/jira/browse/SPARK-25364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent resolved SPARK-25364.
-
Resolution: Duplicate
> a better way to handle vector index and sparsity in FeatureHasher
>
[
https://issues.apache.org/jira/browse/SPARK-25365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606746#comment-16606746
]
Vincent commented on SPARK-25365:
-
[~nick.pentre...@gmail.com] Thanks.
> a better way to handle vector
Vincent created SPARK-25364:
---
Summary: a better way to handle vector index and sparsity in
FeatureHasher implementation ?
Key: SPARK-25364
URL: https://issues.apache.org/jira/browse/SPARK-25364
Project:
Vincent created SPARK-25365:
---
Summary: a better way to handle vector index and sparsity in
FeatureHasher implementation ?
Key: SPARK-25365
URL: https://issues.apache.org/jira/browse/SPARK-25365
Project:
Vincent created SPARK-25034:
---
Summary: possible triple memory consumption in fetchBlockSync()
Key: SPARK-25034
URL: https://issues.apache.org/jira/browse/SPARK-25034
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-24968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent resolved SPARK-24968.
-
Resolution: Fixed
> Configurable Chunksize in ChunkedByteBufferOutputStream
>
[
https://issues.apache.org/jira/browse/SPARK-24968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563467#comment-16563467
]
Vincent commented on SPARK-24968:
-
indeed they are closely related
I'll close this ticket
>
Vincent created SPARK-24968:
---
Summary: Configurable Chunksize in ChunkedByteBufferOutputStream
Key: SPARK-24968
URL: https://issues.apache.org/jira/browse/SPARK-24968
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-24917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-24917:
Description:
Hello
while investigating some OOM errors in Spark 2.2 [(here's my call
[
https://issues.apache.org/jira/browse/SPARK-24917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-24917:
Description:
Hello
while investigating some OOM errors in Spark 2.2 [(here's my call
Vincent created SPARK-24917:
---
Summary: Sending a partition over netty results in 2x memory usage
Key: SPARK-24917
URL: https://issues.apache.org/jira/browse/SPARK-24917
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-22096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-22096:
Attachment: performance data for NB.png
> use aggregateByKeyLocally to save one stage in calculating
[
https://issues.apache.org/jira/browse/SPARK-22096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-22096:
Description:
NaiveBayes currently takes aggregateByKey followed by a collect to calculate
frequency for
[
https://issues.apache.org/jira/browse/SPARK-22096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-22096:
Description:
NaiveBayes currently takes aggregateByKey followed by a collect to calculate
frequency for
Vincent created SPARK-22098:
---
Summary: Add aggregateByKeyLocally in RDD
Key: SPARK-22098
URL: https://issues.apache.org/jira/browse/SPARK-22098
Project: Spark
Issue Type: Sub-task
Vincent created SPARK-22096:
---
Summary: use aggregateByKeyLocally to save one stage in
calculating ItemFrequency in NaiveBayes
Key: SPARK-22096
URL: https://issues.apache.org/jira/browse/SPARK-22096
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125353#comment-16125353
]
Vincent commented on SPARK-21688:
-
Sorry for the late reply.
Yes, it's simple and easy to check the env
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121624#comment-16121624
]
Vincent commented on SPARK-21688:
-
Okay. Yes, true. It can still run without issue but we are just
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121589#comment-16121589
]
Vincent commented on SPARK-21688:
-
[~srowen] Thanks for your comments. I think if the user decides to use
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121254#comment-16121254
]
Vincent commented on SPARK-21688:
-
and if native blas is left with default multi-threading setting, it
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-21688:
Attachment: native-trywait.png
> performance improvement in mllib SVM with native BLAS
>
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-21688:
Attachment: (was: uni-test on ddot.png)
> performance improvement in mllib SVM with native BLAS
>
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-21688:
Attachment: ddot unitest.png
> performance improvement in mllib SVM with native BLAS
>
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121236#comment-16121236
]
Vincent commented on SPARK-21688:
-
Uploaded data we collected before (a unit test on ddot); we can see for
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-21688:
Attachment: uni-test on ddot.png
> performance improvement in mllib SVM with native BLAS
>
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121209#comment-16121209
]
Vincent commented on SPARK-21688:
-
currently, there are certain places in ML/MLLib, such as in mllib/SVM,
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121113#comment-16121113
]
Vincent edited comment on SPARK-21688 at 8/10/17 6:13 AM:
--
attach svm profiling
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-21688:
Attachment: svm1.png
svm2.png
svm-mkl-1.png
svm-mkl-2.png
[
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-21688:
Attachment: mllib svm training.png
> performance improvement in mllib SVM with native BLAS
>
Vincent created SPARK-21688:
---
Summary: performance improvement in mllib SVM with native BLAS
Key: SPARK-21688
URL: https://issues.apache.org/jira/browse/SPARK-21688
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-20988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048769#comment-16048769
]
Vincent commented on SPARK-20988:
-
okay, no problem :)
> Convert logistic regression to new aggregator
[
https://issues.apache.org/jira/browse/SPARK-20988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048593#comment-16048593
]
Vincent commented on SPARK-20988:
-
Oops. I have finished the conversion part, but there are still other
[
https://issues.apache.org/jira/browse/SPARK-20988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047370#comment-16047370
]
Vincent commented on SPARK-20988:
-
I can work on this if no one is working on it now :)
> Convert
Vincent created SPARK-21058:
---
Summary: potential SVD optimization
Key: SPARK-21058
URL: https://issues.apache.org/jira/browse/SPARK-21058
Project: Spark
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/SPARK-21049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045536#comment-16045536
]
Vincent commented on SPARK-21049:
-
[~srowen] Thanks, that's right. But we quite often found that the
Vincent created SPARK-21049:
---
Summary: why do we need computeGramianMatrix when computing SVD
Key: SPARK-21049
URL: https://issues.apache.org/jira/browse/SPARK-21049
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000176#comment-16000176
]
Vincent commented on SPARK-17134:
-
I will submit a PR for this issue soon.
> Use level 2 BLAS operations
[
https://issues.apache.org/jira/browse/SPARK-19852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900463#comment-15900463
]
Vincent commented on SPARK-19852:
-
I can work on this issue, since it is related to SPARK-17498
>
[
https://issues.apache.org/jira/browse/SPARK-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877781#comment-15877781
]
Vincent commented on SPARK-7132:
Hi All, any update on this issue?
> Add fit with validation set to
[
https://issues.apache.org/jira/browse/SPARK-14682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875618#comment-15875618
]
Vincent commented on SPARK-14682:
-
any update?
> Provide evaluateEachIteration method or equivalent for
Vincent created SPARK-19590:
---
Summary: Update the document for QuantileDiscretizer in pyspark
Key: SPARK-19590
URL: https://issues.apache.org/jira/browse/SPARK-19590
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857600#comment-15857600
]
Vincent commented on SPARK-17498:
-
I can take the issue and make a PR
> StringIndexer.setHandleInvalid
[
https://issues.apache.org/jira/browse/SPARK-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682491#comment-15682491
]
Vincent commented on SPARK-18023:
-
thanks [~mlnick]
that's really what we need. when I wrote the code for
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631286#comment-15631286
]
Vincent commented on SPARK-17055:
-
[~srowen] No offense. Maybe we can invite more people to have a look at
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-17055:
Summary: add groupKFold to CrossValidator (was: add labelKFold to
CrossValidator)
> add groupKFold to
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625108#comment-15625108
]
Vincent commented on SPARK-17055:
-
[~sowen] Okay. hmm, I guess we have some misunderstanding here.
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15624114#comment-15624114
]
Vincent commented on SPARK-17055:
-
[~srowen] May I ask the reason why we close this issue? It'd be
[
https://issues.apache.org/jira/browse/SPARK-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590951#comment-15590951
]
Vincent commented on SPARK-18023:
-
I can start with ADAM, then maybe other Ada methods after that
> Adam
Vincent created SPARK-18023:
---
Summary: Adam optimizer
Key: SPARK-18023
URL: https://issues.apache.org/jira/browse/SPARK-18023
Project: Spark
Issue Type: New Feature
Components: ML, MLlib
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559597#comment-15559597
]
Vincent commented on SPARK-17219:
-
No problem. I will try to submit another PR based on above
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557034#comment-15557034
]
Vincent commented on SPARK-17219:
-
[~josephkb] [~srowen] [~timhunter] let me know what I can do to help
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557021#comment-15557021
]
Vincent commented on SPARK-17219:
-
in this PR(https://github.com/apache/spark/pull/14858) NaN values are
[
https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483514#comment-15483514
]
Vincent edited comment on SPARK-17498 at 9/12/16 8:55 AM:
--
Here is how we cc
[
https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483514#comment-15483514
]
Vincent edited comment on SPARK-17498 at 9/12/16 8:43 AM:
--
Here is what we cc
[
https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483514#comment-15483514
]
Vincent commented on SPARK-17498:
-
Here is what we cc [~qhuang] see about this issue
and correct me if
[
https://issues.apache.org/jira/browse/SPARK-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476947#comment-15476947
]
YSMAL Vincent commented on SPARK-6680:
--
Hi, using Docker you can get rid of this alias on hostname,
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445873#comment-15445873
]
Vincent commented on SPARK-17219:
-
Cool. I will refine the patch. thanks [~srowen] :)
>
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445858#comment-15445858
]
Vincent commented on SPARK-17219:
-
yes, discretizer can do it easily, especially if only
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445808#comment-15445808
]
Vincent commented on SPARK-17219:
-
Then we have to shift this work to the user, who needs to filter out the
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445768#comment-15445768
]
Vincent commented on SPARK-17219:
-
[~srowen] Hi all, per discussion, I thought we are going to handle NaN
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436873#comment-15436873
]
Vincent commented on SPARK-17219:
-
Okay, thanks.
So, meaning we will have no options for users actually.
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436870#comment-15436870
]
Vincent commented on SPARK-17219:
-
Yes, if we want to make this scenario more general to all bucketizer
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436856#comment-15436856
]
Vincent commented on SPARK-17219:
-
[~srowen] Sorry, Owen, by saying 'keep it to one behavior', do you mean
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436804#comment-15436804
]
Vincent commented on SPARK-17219:
-
if so, we have to add this option within Bucketizer, right?
>
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436553#comment-15436553
]
Vincent commented on SPARK-17219:
-
I can work on this issue if no one else is on it :)
>
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436143#comment-15436143
]
Vincent commented on SPARK-17219:
-
for this scenario, we can add a new parameter for QuantileDiscretizer,
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436136#comment-15436136
]
Vincent commented on SPARK-17219:
-
for cases where only null and non-null buckets are needed, I guess we
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432463#comment-15432463
]
Vincent edited comment on SPARK-17055 at 8/23/16 10:34 AM:
---
Sorry for the late
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432463#comment-15432463
]
Vincent edited comment on SPARK-17055 at 8/23/16 9:18 AM:
--
Sorry for the late reply.
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432463#comment-15432463
]
Vincent commented on SPARK-17055:
-
Sorry for the late reply. Yes, I just learned they intend to rename it to
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430381#comment-15430381
]
Vincent edited comment on SPARK-17055 at 8/22/16 9:14 AM:
--
well, a better model
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430381#comment-15430381
]
Vincent commented on SPARK-17055:
-
well, a better model will have a better cv performance on data with
[
https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426276#comment-15426276
]
Vincent commented on SPARK-17086:
-
Agree! [~srowen]
> QuantileDiscretizer throws
[
https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426261#comment-15426261
]
Vincent edited comment on SPARK-17086 at 8/18/16 10:57 AM:
---
[~srowen] in the
[
https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426261#comment-15426261
]
Vincent commented on SPARK-17086:
-
[~srowen] in the example you just took, yes, it will return
[
https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426246#comment-15426246
]
Vincent commented on SPARK-17086:
-
[~yanboliang] yes, actually that case was handled on spark-1.6.2
[
https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424076#comment-15424076
]
Vincent commented on SPARK-17086:
-
Confirmed: the issue doesn't exist on Spark 1.6.2.
I will work on this
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420867#comment-15420867
]
Vincent commented on SPARK-17055:
-
one of the most common tasks is to fit a "model" to a set of training
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-17055:
Description:
Current CrossValidator only supports k-fold, which randomly divides all the
samples in k
[
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent updated SPARK-17055:
Affects Version/s: (was: 2.0.0)
> add labelKFold to CrossValidator
>
Vincent created SPARK-17055:
---
Summary: add labelKFold to CrossValidator
Key: SPARK-17055
URL: https://issues.apache.org/jira/browse/SPARK-17055
Project: Spark
Issue Type: New Feature