Re: redundant decision tree model

2018-02-16 Thread Alessandro Solimando
Hello, a small recap for anyone interested. There was already a ticket covering this case, which I failed to find when I checked. As a result, the other one has been correctly marked as a duplicate: https://issues.apache.org/jira/browse/SPARK-3159 I have created a PR for this that you can check here

Re: I Want to Help with MLlib Migration

2018-02-16 Thread Yacine Mazari
Thanks for the clarification and suggestion @weichen. I will try to benchmark it and share the results for discussion. Regards, Yacine.

Re: A new external catalog

2018-02-16 Thread Steve Loughran
On 14 Feb 2018, at 19:56, Tayyebi, Ameen wrote: Newbie question: I want to add system/integration tests for the new functionality. There is a set of existing tests around Spark Catalog that I can leverage. Great. The provider I’m writing is

Re: A new external catalog

2018-02-16 Thread Steve Loughran
On 14 Feb 2018, at 13:51, Tayyebi, Ameen wrote: Thanks a lot Steve. I’ll go through the JIRAs you linked in detail. I took a quick look and am sufficiently scared for now. I had run into that warning from the S3 stream before. Sigh. things

Re: I Want to Help with MLlib Migration

2018-02-16 Thread Weichen Xu
>>The goal is to have these algorithms implemented using the Dataset API. Currently, the implementation of these classes/algorithms uses RDDs by wrapping the old (mllib) classes, which will eventually be deprecated (and deleted). Each algorithm needs discussion and testing before doing that.