Re: [SQL] [Suggestion] Add top() to Dataset

2018-02-02 Thread Yacine Mazari
I see, thanks a lot for the clarifications.

Re: [SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Wenchen Fan
You can use `Dataset.limit`, which returns a new `Dataset` instead of an Array. You can then transform it further and still get the top-k optimization from Spark.

On Wed, Jan 31, 2018 at 3:39 PM, Yacine Mazari wrote:
> Thanks for the quick reply and explanation @rxin.
>
> So if one

Re: [SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Yacine Mazari
Thanks for the quick reply and explanation @rxin. So if one does not want to collect()/take() but wants the top k as a Dataset for further transformations, there is no optimized API. That is why I am suggesting adding this "top()" as a public method. If that sounds like a good idea, I will open a

Re: [SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Reynold Xin
For the DataFrame/Dataset API, the optimizer actually rewrites an orderBy followed by a take into a priority-queue-based top-k implementation.

On Tue, Jan 30, 2018 at 11:10 PM, Yacine Mazari wrote:
> Hi All,
>
> Would it make sense to add a "top()" method to the Dataset API?
>
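
For readers unfamiliar with the idea, here is a minimal sketch of what a priority-queue-based top-k looks like in general. It is plain Python using the standard `heapq` module, not Spark's actual code; the function name `top_k` and the sample rows are hypothetical and only illustrate why a full sort is not needed when just k rows are wanted.

```python
import heapq


def top_k(rows, k, key):
    """Return the k rows with the largest key, in descending key order.

    Keeps at most k rows in memory by using a bounded min-heap,
    instead of sorting the whole input.
    """
    if k <= 0:
        return []
    heap = []  # min-heap of (key value, arrival index, row)
    for seq, row in enumerate(rows):
        # The arrival index breaks ties so rows themselves are never compared.
        entry = (key(row), seq, row)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # New row beats the smallest row kept so far: swap it in.
            heapq.heapreplace(heap, entry)
    # Only the k survivors are sorted at the end.
    ordered = sorted(heap, key=lambda e: e[0], reverse=True)
    return [row for _, _, row in ordered]


# Hypothetical sample data: (name, score) pairs.
rows = [("a", 3), ("b", 9), ("c", 1), ("d", 7)]
best_two = top_k(rows, 2, key=lambda r: r[1])
```

With n input rows this does O(n log k) work and holds only k rows at a time, which is the benefit such a rewrite gives over a full sort followed by taking the first k rows.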