Re: I Want to Help with MLlib Migration

2018-02-16 Thread Yacine Mazari
Thanks for the clarification and suggestion @weichen. I will try to benchmark it and share the results for discussion. Regards, Yacine.

Re: I Want to Help with MLlib Migration

2018-02-15 Thread Yacine Mazari
Thanks for the reply @srowen. >>I don't think you can move or alter the class APIs. Agreed. That's not my intention at all. >>There also isn't much value in copying the code. Maybe there are opportunities for moving some internal code. There will probably be some copying and moving internal

I Want to Help with MLlib Migration

2018-02-15 Thread Yacine Mazari
Hi, I see that many classes under "org.apache.spark.ml" still refer to the "org.apache.spark.mllib" implementation. While there is still time until the deprecation deadline of version 3.0, these dependencies make it difficult or impossible to improve these classes.
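For illustration only, here is a minimal, self-contained sketch of the dependency pattern being described; the class names below are made up for this example and are not actual Spark classes:

    // Stand-in for an implementation class under org.apache.spark.mllib.*
    class OldMllibScaler(factor: Double) {
      def transform(v: Double): Double = v * factor
    }

    // Stand-in for the corresponding class under org.apache.spark.ml.*
    class NewMlScaler(factor: Double) {
      // The "new" API class still delegates its real work to the "old" one.
      private val delegate = new OldMllibScaler(factor)

      // Improving this method means touching, or working around, the mllib code.
      def transform(v: Double): Double = delegate.transform(v)
    }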

Re: [SQL] [Suggestion] Add top() to Dataset

2018-02-02 Thread Yacine Mazari
I see, thanks a lot for the clarifications.

Re: [SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Yacine Mazari
Thanks for the quick reply and explanation @rxin. So if one does not want to collect()/take() but wants the top k as a Dataset to do further transformations, there is no optimized API; that's why I am suggesting adding this "top()" as a public method. If that sounds like a good idea, I will open a
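For context, a minimal spark-shell sketch of the kind of chain being discussed, using a toy Dataset ordered by its second column (all names here are placeholders for this example):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("top-k-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val ds = Seq(("a", 3.0), ("b", 9.0), ("c", 5.0)).toDS()

    // Order descending by the value column and keep the first k rows;
    // the result is still a Dataset, so further transformations can be chained.
    val topK = ds.orderBy(col("_2").desc).limit(2)
    topK.show()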

[SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Yacine Mazari
Hi All, Would it make sense to add a "top()" method to the Dataset API? This method would return a Dataset containing the top k elements; the caller may then do further processing on the Dataset or call collect(). This is in contrast with RDD's top(), which returns a collected array. In terms of
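For comparison, a small spark-shell sketch of the existing RDD.top() behaviour; the Dataset-level top() mentioned in the comment is the proposal, not an existing method:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("rdd-top-sketch").master("local[*]").getOrCreate()

    // RDD.top(k) exists today: it collects the k largest elements to the driver
    // and returns a local Array rather than a distributed collection.
    val arr: Array[Int] = spark.sparkContext.parallelize(Seq(3, 9, 5, 1)).top(2)  // Array(9, 5)

    // The suggestion is a Dataset-level counterpart that would instead return a
    // Dataset, so the caller could keep transforming it, e.g.
    //   val topDs = someDs.top(2)   // hypothetical; not an existing API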

Re: Failing Spark Unit Tests

2018-01-23 Thread Yacine Mazari
Got it, I opened a PR.

Failing Spark Unit Tests

2018-01-23 Thread Yacine Mazari
Hi All, I am currently working on SPARK-23166, but after running "./dev/run-tests", the Python unit tests (supposedly unrelated to my change) are failing for the following reason: