Re: FP Growth saveAsTextFile

2015-05-20 Thread Xiangrui Meng
Could you post the stack trace? If you are using Spark 1.3 or 1.4, it would be easier to save freq itemsets as a Parquet file. -Xiangrui On Wed, May 20, 2015 at 12:16 PM, Eric Tanner eric.tan...@justenough.com wrote: I am having trouble with saving an FP-Growth model as a text file. I can
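For reference, a minimal sketch of the Parquet route (assuming Spark 1.4, a fitted FPGrowthModel `model`, and a SQLContext in scope; column names and path are illustrative):

    import sqlContext.implicits._

    // FreqItemset is not a case class, so map to a tuple before toDF
    val freqDF = model.freqItemsets
      .map(fi => (fi.items.mkString(","), fi.freq))
      .toDF("items", "freq")
    freqDF.write.parquet("hdfs:///models/fpgrowth/freq-itemsets")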

Re: User Defined Type (UDT)

2015-05-20 Thread Xiangrui Meng
using them to support java 8's ZonedDateTime. On Tue, May 19, 2015 at 3:14 PM Xiangrui Meng men...@gmail.com wrote: (Note that UDT is not a public API yet.) On Thu, May 7, 2015 at 7:11 AM, wjur wojtek.jurc...@gmail.com wrote: Hi all! I'm using Spark 1.3.0 and I'm struggling

Re: FP Growth saveAsTextFile

2015-05-21 Thread Xiangrui Meng
) at java.lang.Thread.run(Thread.java:745) On Wed, May 20, 2015 at 2:05 PM, Xiangrui Meng men...@gmail.com wrote: Could you post the stack trace? If you are using Spark 1.3 or 1.4, it would be easier to save freq itemsets as a Parquet file. -Xiangrui On Wed, May 20, 2015 at 12:16 PM, Eric Tanner

Re: NaiveBayes for MLPipeline is absent

2015-06-19 Thread Xiangrui Meng
Hi Justin, We plan to add it in 1.5, along with some other estimators. We are now preparing a list of JIRAs, but feel free to create a JIRA for this and submit a PR:) Best, Xiangrui On Thu, Jun 18, 2015 at 6:35 PM, Justin Yip yipjus...@prediction.io wrote: Hello, Currently, there is no

Re: NaiveBayes for MLPipeline is absent

2015-06-25 Thread Xiangrui Meng
FYI, I made a JIRA for this: https://issues.apache.org/jira/browse/SPARK-8600. -Xiangrui On Fri, Jun 19, 2015 at 3:01 PM, Xiangrui Meng men...@gmail.com wrote: Hi Justin, We plan to add it in 1.5, along with some other estimators. We are now preparing a list of JIRAs, but feel free to create

Re: Issue with PySpark UDF on a column of Vectors

2015-06-18 Thread Xiangrui Meng
This is a known issue. See https://issues.apache.org/jira/browse/SPARK-7902 -Xiangrui On Thu, Jun 18, 2015 at 6:41 AM, calstad colin.als...@gmail.com wrote: I am having trouble using a UDF on a column of Vectors in PySpark which can be illustrated here: from pyspark import SparkContext from

Re: Does MLLib has attribute importance?

2015-06-18 Thread Xiangrui Meng
, Jun 17, 2015 at 5:50 PM, Xiangrui Meng men...@gmail.com wrote: We don't have it in MLlib. The closest would be the ChiSqSelector, which works for categorical data. -Xiangrui On Thu, Jun 11, 2015 at 4:33 PM, Ruslan Dautkhanov dautkha...@gmail.com wrote: What would be closest equivalent in MLLib

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Xiangrui Meng
:20 PM, Xiangrui Meng men...@gmail.com wrote: It shouldn't be hard to handle 1 billion ratings in 1.3. Just need more information to guess what happened: 1. Could you share the ALS settings, e.g., number of blocks, rank and number of iterations, as well as number of users/items in your

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Xiangrui Meng
On Jun 23, 2015, at 6:20 PM, Xiangrui Meng men...@gmail.com wrote: It shouldn't be hard to handle 1 billion ratings in 1.3. Just need more information to guess what happened: 1. Could you share the ALS settings, e.g., number of blocks, rank and number of iterations, as well as number

Re: Settings for K-Means Clustering in Mlib for large data set

2015-06-18 Thread Xiangrui Meng
With 80,000 features and 1000 clusters, you need 80,000,000 doubles to store the cluster centers. That is ~600MB. If there are 10 partitions, you might need 6GB on the driver to collect updates from workers. I guess the driver died. Did you specify driver memory with spark-submit? -Xiangrui On
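The arithmetic behind this estimate, spelled out (a back-of-the-envelope sketch; the factor of 10 assumes the driver buffers one update per partition):

    val numFeatures = 80000
    val k = 1000
    val centerBytes = numFeatures.toLong * k * 8   // 80M doubles, ~640 MB
    val numPartitions = 10
    val driverBytes = centerBytes * numPartitions  // ~6.4 GB collected on the driver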

Re: Settings for K-Means Clustering in Mlib for large data set

2015-06-23 Thread Xiangrui Meng
should be adding another extra argument --conf spark.driver.memory=15g . Is that correct? Regards, Rogers Jeffrey L On Thu, Jun 18, 2015 at 7:50 PM, Xiangrui Meng men...@gmail.com wrote: With 80,000 features and 1000 clusters, you need 80,000,000 doubles to store the cluster centers

Re: Spark FP-Growth algorithm for frequent sequential patterns

2015-06-23 Thread Xiangrui Meng
This is on the wish list for Spark 1.5. Assuming that the items within the same transaction are distinct, we can still follow FP-Growth's steps: 1. find frequent items 2. filter transactions and keep only frequent items 3. do NOT order by frequency 4. use suffix to partition the transactions
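This wish-list item later shipped as PrefixSpan in Spark 1.5; a usage sketch against that API, with toy sequences (each sequence is an array of itemsets):

    import org.apache.spark.mllib.fpm.PrefixSpan

    val sequences = sc.parallelize(Seq(
      Array(Array(1, 2), Array(3)),
      Array(Array(1), Array(3, 2), Array(1, 2)),
      Array(Array(6))
    ))
    val model = new PrefixSpan()
      .setMinSupport(0.5)
      .setMaxPatternLength(5)
      .run(sequences)
    model.freqSequences.collect().foreach { fs =>
      println(fs.sequence.map(_.mkString("[", ",", "]")).mkString("<", ",", ">") + ", " + fs.freq)
    }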

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-23 Thread Xiangrui Meng
It shouldn't be hard to handle 1 billion ratings in 1.3. Just need more information to guess what happened: 1. Could you share the ALS settings, e.g., number of blocks, rank and number of iterations, as well as number of users/items in your dataset? 2. If you monitor the progress in the WebUI,

Re: How could output the StreamingLinearRegressionWithSGD prediction result?

2015-06-23 Thread Xiangrui Meng
Please check the input path to your test data, and call `.count()` to see whether there are records in it. -Xiangrui On Sat, Jun 20, 2015 at 9:23 PM, Gavin Yue yue.yuany...@gmail.com wrote: Hey, I am testing the StreamingLinearRegressionWithSGD following the tutorial. It works, but I could

Re: which mllib algorithm for large multi-class classification?

2015-06-23 Thread Xiangrui Meng
We have multinomial logistic regression implemented. For your case, the model size is 500 * 300,000 = 150,000,000. MLlib's implementation might not be able to handle it efficiently; we plan to have a more scalable implementation in 1.5. However, it shouldn't give you an array larger than MaxInt
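For reference, a sketch of the multinomial API mentioned above (assuming `training: RDD[LabeledPoint]`; 500 is the number of classes from the question):

    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

    val model = new LogisticRegressionWithLBFGS()
      .setNumClasses(500)
      .run(training)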

Re: mutable vs. pure functional implementation - StatCounter

2015-06-23 Thread Xiangrui Meng
Creating millions of temporary (immutable) objects is bad for performance. It should be simple to do a micro-benchmark locally. -Xiangrui On Mon, Jun 22, 2015 at 7:25 PM, mzeltser mzelt...@gmail.com wrote: Using StatCounter as an example, I'd like to understand if pure functional implementation
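A rough local micro-benchmark sketch of the two styles using Spark's StatCounter (loop bound and timing style are illustrative):

    import org.apache.spark.util.StatCounter

    val n = 1000000
    var t0 = System.nanoTime()
    val mutableStats = new StatCounter()
    (0 until n).foreach(i => mutableStats.merge(i.toDouble))   // update in place
    println(s"mutable: ${(System.nanoTime() - t0) / 1e6} ms")

    t0 = System.nanoTime()
    var pureStats = new StatCounter()
    (0 until n).foreach(i => pureStats = pureStats.copy().merge(i.toDouble))   // fresh object per update
    println(s"copying: ${(System.nanoTime() - t0) / 1e6} ms")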

Re: MLLib: instance weight

2015-06-17 Thread Xiangrui Meng
Hi Gajan, Please subscribe to our user mailing list, which is the best place to get your questions answered. We don't have weighted instance support, but it should be easy to add, and we plan to do it in the next release (1.5). Thanks for asking! Best, Xiangrui On Wed, Jun 17, 2015 at 2:33 PM,

Re: spark mlib variance analysis

2015-06-17 Thread Xiangrui Meng
We don't have R-like model summaries in MLlib, but we plan to add some in 1.5. Please watch https://issues.apache.org/jira/browse/SPARK-7674. -Xiangrui On Thu, May 28, 2015 at 3:47 PM, rafac rafaelme...@hotmail.com wrote: I have a simple problem: I got the mean number of people at one place by

Re: Why the default Params.copy doesn't work for Model.copy?

2015-06-17 Thread Xiangrui Meng
That is a bug, which will be fixed in https://github.com/apache/spark/pull/6622. I disabled Model.copy because models usually don't have a default constructor, and hence the default Params.copy implementation won't work. Unfortunately, due to insufficient test coverage, StringIndexerModel.copy is

Re: Optimization module in Python mllib

2015-06-17 Thread Xiangrui Meng
There is no plan at this time. We haven't reached 100% coverage of the user-facing API in PySpark yet, which has higher priority. -Xiangrui On Sun, Jun 7, 2015 at 1:42 AM, martingoodson martingood...@gmail.com wrote: Am I right in thinking that Python mllib does not contain the optimization

Re: Does MLLib has attribute importance?

2015-06-17 Thread Xiangrui Meng
We don't have it in MLlib. The closest would be the ChiSqSelector, which works for categorical data. -Xiangrui On Thu, Jun 11, 2015 at 4:33 PM, Ruslan Dautkhanov dautkha...@gmail.com wrote: What would be closest equivalent in MLLib to Oracle Data Miner's Attribute Importance mining function?
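A minimal ChiSqSelector sketch (assuming `data: RDD[LabeledPoint]` with categorical features; keeping 50 features is an arbitrary choice):

    import org.apache.spark.mllib.feature.ChiSqSelector
    import org.apache.spark.mllib.regression.LabeledPoint

    val selector = new ChiSqSelector(50)   // keep the 50 top-ranked features
    val model = selector.fit(data)
    val reduced = data.map(lp => LabeledPoint(lp.label, model.transform(lp.features)))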

Re: *Metrics API is odd in MLLib

2015-06-17 Thread Xiangrui Meng
LabeledPoint was used for both classification and regression, where the label type is Double for simplicity. So in BinaryClassificationMetrics, we still use Double for labels. We compute the confusion matrix at each threshold internally, but this is not exposed to users (
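For context, the user-facing side of that API (assuming `scoreAndLabels: RDD[(Double, Double)]` of (score, 0.0/1.0 label) pairs):

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    val auROC = metrics.areaUnderROC()   // scalar summary
    val prCurve = metrics.pr()           // (recall, precision) pairs across thresholds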

Re: Parallel parameter tuning: distributed execution of MLlib algorithms

2015-06-17 Thread Xiangrui Meng
On Fri, May 22, 2015 at 6:15 AM, Hugo Ferreira h...@inesctec.pt wrote: Hi, I am currently experimenting with linear regression (SGD) (Spark + MLlib, ver. 1.2). At this point in time I need to fine-tune the hyper-parameters. I do this (for now) by an exhaustive grid search of the step size and

Re: Collabrative Filtering

2015-06-17 Thread Xiangrui Meng
Please follow the code examples from the user guide: http://spark.apache.org/docs/latest/programming-guide.html#passing-functions-to-spark. -Xiangrui On Tue, May 26, 2015 at 12:34 AM, Yasemin Kaya godo...@gmail.com wrote: Hi, In CF String path = data/mllib/als/test.data; JavaRDDString

Re: How to use Apache spark mllib Model output in C++ component

2015-06-17 Thread Xiangrui Meng
In 1.3, we added some model save/load support in Parquet format. You can use Parquet's C++ library (https://github.com/Parquet/parquet-cpp) to load the data back. -Xiangrui On Wed, Jun 10, 2015 at 12:15 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Hope Swig and JNA might help for accessing
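A sketch of the save side, using ALS's MatrixFactorizationModel as an example (the data files land under `<path>/data/` in Parquet, with JSON metadata alongside; the path is illustrative):

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    model.save(sc, "hdfs:///models/als")   // Parquet data + JSON metadata
    val sameModel = MatrixFactorizationModel.load(sc, "hdfs:///models/als")
    // a C++ consumer can read <path>/data/ with parquet-cpp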

Re: RandomForest - subsamplingRate parameter

2015-06-17 Thread Xiangrui Meng
Because we don't have random access to the records, sampling still needs to go through the records sequentially. It does save some computation, which is perhaps noticeable only if you have the data cached in memory. Different random seeds are used for different trees. -Xiangrui On Wed, Jun 3, 2015 at 4:40 PM,

Re: redshift spark

2015-06-17 Thread Xiangrui Meng
Hi Hafiz, As Ewan mentioned, the path is the path to the S3 files unloaded from Redshift. This is a more scalable way to get a large amount of data from Redshift than via JDBC. I'd recommend using the SQL API instead of the Hadoop API (https://github.com/databricks/spark-redshift). Best,

Re: k-means for text mining in a streaming context

2015-06-17 Thread Xiangrui Meng
Yes. You can apply HashingTF on your input stream and then use StreamingKMeans for training and prediction. -Xiangrui On Mon, Jun 8, 2015 at 11:05 AM, Ruslan Dautkhanov dautkha...@gmail.com wrote: Hello, https://spark.apache.org/docs/latest/mllib-feature-extraction.html would Feature
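A sketch of that pipeline (assuming `trainingLines: DStream[String]`; k, dimension, and decay are illustrative):

    import org.apache.spark.mllib.clustering.StreamingKMeans
    import org.apache.spark.mllib.feature.HashingTF

    val tf = new HashingTF(1000)   // hash terms into 1000-dim count vectors
    val trainingVecs = trainingLines.map(line => tf.transform(line.split(" ").toSeq))

    val model = new StreamingKMeans()
      .setK(10)
      .setDecayFactor(1.0)
      .setRandomCenters(1000, 0.0)
    model.trainOn(trainingVecs)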

Re: What is the right algorithm to do cluster analysis with mixed numeric, categorical, and string value attributes?

2015-06-17 Thread Xiangrui Meng
You can try hashing to control the feature dimension. MLlib's k-means implementation can handle sparse data efficiently if the number of features is not huge. -Xiangrui On Tue, Jun 16, 2015 at 2:44 PM, Rex X dnsr...@gmail.com wrote: Hi Sujit, That's a good point. But 1-hot encoding will make

Re: Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0

2015-06-17 Thread Xiangrui Meng
That sounds like a bug. Could you create a JIRA and ping Yin Huai (cc'ed). -Xiangrui On Wed, May 27, 2015 at 12:57 AM, Justin Yip yipjus...@prediction.io wrote: Hello, I am trying out 1.4.0 and notice there are some differences in behavior with Timestamp between 1.3.1 and 1.4.0. In 1.3.1, I

Re: Efficient way to get top K values per key in (key, value) RDD?

2015-06-17 Thread Xiangrui Meng
This is implemented in MLlib: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala#L41. -Xiangrui On Wed, Jun 10, 2015 at 1:53 PM, erisa erisa...@gmail.com wrote: Hi, I am a Spark newbie, and trying to solve the same problem, and
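Usage is a one-liner once the implicit conversion is imported (toy data shown):

    import org.apache.spark.mllib.rdd.MLPairRDDFunctions._

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 5), ("a", 3), ("b", 2)))
    val top2 = pairs.topByKey(2)   // RDD[(String, Array[Int])], values in descending order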

Re: Not able to run FP-growth Example

2015-06-17 Thread Xiangrui Meng
You should add spark-mllib_2.10 as a dependency instead of declaring it as the artifactId. And always use the same version for spark-core and spark-mllib. I saw you used 1.3.0 for spark-core but 1.4.0 for spark-mllib, which is not guaranteed to work. If you set the scope to provided, mllib jar
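The thread concerns a Maven POM, but the same fix expressed as an sbt build definition (a sketch; pin both artifacts to one version, and use the provided scope only if the cluster supplies the jars):

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "1.4.0" % "provided",
      "org.apache.spark" %% "spark-mllib" % "1.4.0" % "provided"
    )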

Re: Pandas timezone problems

2015-05-21 Thread Xiangrui Meng
These are relevant: JIRA: https://issues.apache.org/jira/browse/SPARK-6411 PR: https://github.com/apache/spark/pull/6250 On Thu, May 21, 2015 at 3:16 PM, Def_Os njde...@gmail.com wrote: After deserialization, something seems to be wrong with my pandas DataFrames. It looks like the timezone

Re: MLlib libsvm isssues with data

2015-05-19 Thread Xiangrui Meng
The index should start from 1 for LIBSVM format, as defined in the README of LIBSVM (https://github.com/cjlin1/libsvm/blob/master/README#L64). The only exception is the precomputed kernel, which MLlib doesn't support. -Xiangrui On Wed, May 6, 2015 at 1:42 AM, doyere doy...@doyere.cn wrote: Hi
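For reference, a couple of 1-based LIBSVM lines and the standard loader (the path is illustrative):

    import org.apache.spark.mllib.util.MLUtils

    // file contents: "<label> <index>:<value> ...", with indices starting at 1, e.g.
    //   1.0 1:0.5 3:1.2
    //   0.0 2:0.8
    val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")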

Re: RandomSplit with Spark-ML and Dataframe

2015-05-19 Thread Xiangrui Meng
In 1.4, we added RAND as a DataFrame expression, which can be used for random split. Please check the example here: https://github.com/apache/spark/blob/master/python/pyspark/ml/tuning.py#L214. -Xiangrui On Thu, May 7, 2015 at 8:39 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Hi,
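Two equivalent sketches against a DataFrame `df` on 1.4 (the seed and weights are illustrative):

    import org.apache.spark.sql.functions.rand

    // built-in split by weights
    val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed = 42L)

    // or filter on the rand() expression directly
    val sample = df.where(rand(42) < 0.8)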

Re: User Defined Type (UDT)

2015-05-19 Thread Xiangrui Meng
(Note that UDT is not a public API yet.) On Thu, May 7, 2015 at 7:11 AM, wjur wojtek.jurc...@gmail.com wrote: Hi all! I'm using Spark 1.3.0 and I'm struggling with a definition of a new type for a project I'm working on. I've created a case class Person(name: String) and now I'm trying to

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-05-19 Thread Xiangrui Meng
:59 PM, Xiangrui Meng men...@gmail.com wrote: Ravi, we just merged https://issues.apache.org/jira/browse/SPARK-6642 and used the same lambda scaling as in 1.2. The change will be included in Spark 1.3.1, which will be released soon. Thanks for reporting this issue! -Xiangrui On Tue, Mar 31

Re: Spark FP-Growth algorithm for frequent sequential patterns

2015-06-28 Thread Xiangrui Meng
for this which I just submitted a PR for :) On Tue, Jun 23, 2015 at 6:09 PM, Xiangrui Meng men...@gmail.com wrote: This is on the wish list for Spark 1.5. Assuming that the items from the same transaction are distinct. We can still follow FP-Growth's steps: 1. find frequent items 2. filter

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-07-28 Thread Xiangrui Meng
Hi Stahlman, finalRDDStorageLevel is the storage level for the final user/item factors. It is not common to set it to StorageLevel.NONE, unless you want to save the factors directly to disk. So if it is NONE, we cannot unpersist the intermediate RDDs (in/out blocks) because the final user/item
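The relevant knobs on the RDD-based ALS, as a sketch (the levels shown are the usual defaults; rank and iterations are placeholders):

    import org.apache.spark.mllib.recommendation.ALS
    import org.apache.spark.storage.StorageLevel

    val als = new ALS()
      .setRank(10)
      .setIterations(10)
      .setIntermediateRDDStorageLevel(StorageLevel.MEMORY_AND_DISK)
      .setFinalRDDStorageLevel(StorageLevel.MEMORY_AND_DISK)  // NONE only if you persist the factors yourself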

Re: Proper saving/loading of MatrixFactorizationModel

2015-07-28 Thread Xiangrui Meng
The partitioner is not saved with the RDD. So when you load the model back, we lose the partitioner information. You can call repartition on the user/product factors and then create a new MatrixFactorizationModel object using the repartitioned RDDs. It would be useful to create a utility method
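A sketch of the manual workaround described above (assuming a loaded `model` and a target `numPartitions`):

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    val repartitioned = new MatrixFactorizationModel(
      model.rank,
      model.userFeatures.repartition(numPartitions).cache(),
      model.productFeatures.repartition(numPartitions).cache())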

Re: MovieALS Implicit Error

2015-07-28 Thread Xiangrui Meng
Hi Benedict, Did you set lambda to zero? -Xiangrui On Mon, Jul 13, 2015 at 4:18 AM, Benedict Liang bli...@thecarousell.com wrote: Hi Sean, This user dataset is organic. What do you think is a good ratings threshold then? I am only encountering this with the implicit type though. The explicit

Re: Out of Memory Errors on less number of cores in proportion to Partitions in Data

2015-07-28 Thread Xiangrui Meng
Hi Aniruddh, Increasing the number of partitions doesn't always help in ALS due to the communication/computation trade-off. What rank did you set? If the rank is not large, I'd recommend a small number of partitions. There are some other numbers to watch. Do you have super popular items/users in your

Re: Cluster sizing for recommendations

2015-07-28 Thread Xiangrui Meng
Hi Danny, You might need to reduce the number of partitions (or set userBlocks and productBlocks directly in ALS). Using a large number of partitions increases shuffle size and memory requirements. If you have 16 x 16 = 256 cores, I would recommend 64 or 128 partitions instead of 2048.
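Setting the block counts directly, per the advice above (a sketch; rank and iterations are placeholders):

    import org.apache.spark.mllib.recommendation.ALS

    val als = new ALS()
      .setRank(10)
      .setIterations(10)
      .setUserBlocks(64)
      .setProductBlocks(64)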

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Too many values to unpack

2015-07-28 Thread Xiangrui Meng
. Any idea what my errors mean, and why increasing memory causes them to go away? Thanks. On Fri, Jun 26, 2015 at 11:26 AM, Xiangrui Meng men...@gmail.com wrote: Please see my comments inline. It would be helpful if you can attach the full stack trace. -Xiangrui On Fri, Jun 26, 2015 at 7:18

Re: Why transformer from ml.Pipeline transform only a DataFrame ?

2015-08-28 Thread Xiangrui Meng
Yes, we will open up the APIs in the next release. There was some discussion about the APIs. One approach is to have multiple methods for different outputs, like predicted class and probabilities. -Xiangrui On Aug 28, 2015 6:39 AM, Jaonary Rabarisoa jaon...@gmail.com wrote: Hi there, The actual API of

Re: AFTSurvivalRegression Prediction and QuantilProbabilities

2016-02-17 Thread Xiangrui Meng
You can get the response, and quantiles if you enable quantilesCol. You can change the quantile probabilities as well. There is example code in the user guide: http://spark.apache.org/docs/latest/ml-classification-regression.html#survival-regression. -Xiangrui On Mon, Feb 1, 2016 at 9:09 AM
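The user-guide pattern looks roughly like this (a sketch assuming a `training` DataFrame with "censor", "label", and "features" columns):

    import org.apache.spark.ml.regression.AFTSurvivalRegression

    val aft = new AFTSurvivalRegression()
      .setQuantileProbabilities(Array(0.3, 0.6))
      .setQuantilesCol("quantiles")
    val model = aft.fit(training)
    model.transform(training).select("prediction", "quantiles").show(false)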

Re: How to train and predict in parallel via Spark MLlib?

2016-02-18 Thread Xiangrui Meng
If you have a big cluster, you can trigger training jobs in different threads on the driver. Putting RDDs inside an RDD won't work. -Xiangrui On Thu, Feb 18, 2016, 4:28 AM Igor L. wrote: > Good day, Spark team! > I have to solve a regression problem for different restrictions.
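A sketch of the threaded approach (the `datasets: Seq[RDD[LabeledPoint]]` collection and the model choice are hypothetical; each Future submits an independent Spark job from the driver):

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration
    import org.apache.spark.mllib.regression.LinearRegressionWithSGD

    val futures = datasets.map { data =>
      Future { LinearRegressionWithSGD.train(data, 100) }   // one training job per restriction
    }
    val models = Await.result(Future.sequence(futures), Duration.Inf)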

Re: Why no computations run on workers/slaves in cluster mode?

2016-02-18 Thread Xiangrui Meng
Did the test program finish, and did you see any error messages from the console? -Xiangrui On Wed, Feb 17, 2016, 1:49 PM Junjie Qian wrote: > Hi all, > > I am new to Spark, and have one problem: no computations run on > workers/slave_servers in the standalone

Re: How to train and predict in parallel via Spark MLlib?

2016-02-19 Thread Xiangrui Meng
u show me anyone. > > Thanks a lot for your participation! > --Igor > > 2016-02-18 17:24 GMT+03:00 Xiangrui Meng <men...@gmail.com>: > >> If you have a big cluster, you can trigger training jobs in different >> threads on the driver. Putting RDDs inside an RDD w

Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Xiangrui Meng
Hi all, More than a year ago, in Spark 1.2 we introduced the ML pipeline API built on top of Spark SQL’s DataFrames. Since then the new DataFrame-based API has been developed under the spark.ml package, while the old RDD-based API has been developed in parallel under the spark.mllib package.

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Xiangrui Meng
ace in the 2.x series ? > > Thanks > Shivaram > > On Tue, Apr 5, 2016 at 11:48 AM, Sean Owen <so...@cloudera.com> wrote: > > FWIW, all of that sounds like a good plan to me. Developing one API is > > certainly better than two. > > > > On Tue, Apr 5, 20

Re: Is JavaSparkContext.wholeTextFiles distributed?

2016-04-28 Thread Xiangrui Meng
It implements CombineInputFormat from Hadoop. isSplittable=false means each individual file cannot be split. If you only see one partition even with a large minPartitions, perhaps the total size of the files is not big enough. The split sizes are configurable in the Hadoop conf. -Xiangrui On Tue, Apr 26, 2016,
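The minPartitions hint, for reference (a sketch; whether it takes effect depends on the total input size, as noted above):

    // each record is (path, fileContents); minPartitions is only a hint
    val files = sc.wholeTextFiles("hdfs:///data/small-files", minPartitions = 100)
    println(files.partitions.length)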

Re: dataframe stat corr for multiple columns

2016-05-19 Thread Xiangrui Meng
This is nice to have. Please create a JIRA for it. Right now, you can merge all columns into a vector column using RFormula or VectorAssembler, then convert it into an RDD and call corr from MLlib. On Tue, May 17, 2016, 7:09 AM Ankur Jain wrote: > Hello Team, > > > > In my
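A sketch of the suggested workaround (column names are illustrative; on the 1.x line VectorAssembler produces the mllib Vector type used below):

    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.stat.Statistics

    val assembled = new VectorAssembler()
      .setInputCols(Array("x1", "x2", "x3"))
      .setOutputCol("features")
      .transform(df)
    val vecRDD = assembled.select("features").rdd.map(_.getAs[Vector](0))
    val corrMatrix = Statistics.corr(vecRDD, "pearson")   // pairwise correlation matrix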

Re: Custom Spark Error on Hadoop Cluster

2016-07-18 Thread Xiangrui Meng
>> Thanks for the reply, however I couldn't locate the MLlib jar. What I >> have is a fat 'spark-assembly-1.5.1-hadoop2.6.0.jar'. >> >> There's an error on me copying user@spark.apache.org. The message

Re: Custom Spark Error on Hadoop Cluster

2016-07-11 Thread Xiangrui Meng
not on the classpath? > We replaced the entire contents of the SPARK_HOME directory; the newly > built customized Spark is what SPARK_HOME now contains. > > Thanks, > > Alger > > On Fri, Jul 8, 2016 at 1:32 PM, Xiangrui Meng

Re: Custom Spark Error on Hadoop Cluster

2016-07-07 Thread Xiangrui Meng
This seems like a deployment or dependency issue. Please check the following: 1. The unmodified Spark jars were not on the classpath (already existed on the cluster or pulled in by other packages). 2. The modified jars were indeed deployed to both master and slave nodes. On Tue, Jul 5, 2016 at

SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-01-15 Thread Xiangrui Meng
Hi all, I want to re-send the previous SPIP on introducing a DataFrame-based graph component to collect more feedback. It supports property graphs, Cypher graph queries, and graph algorithms built on top of the DataFrame API. If you are a GraphX user or your workload is essentially graph queries,

Re: Spark ML with null labels

2019-01-10 Thread Xiangrui Meng
In your custom transformer that produces labels, can you filter out null labels? A transformer doesn't always need to do a 1:1 mapping. On Thu, Jan 10, 2019, 7:53 AM Patrick McCarthy wrote: > I'm trying to implement an algorithm on the MNIST digits that runs like so: > > >- for every pair of digits (0,1),
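The filtering step itself is one line; a sketch to call at the end of the transformer's transform() (the "label" column name is an assumption):

    import org.apache.spark.sql.DataFrame

    // drop rows whose label is null; a Transformer may emit fewer rows than it receives
    def dropNullLabels(labeled: DataFrame): DataFrame =
      labeled.na.drop(Seq("label"))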

Re: question about barrier execution mode in Spark 2.4.0

2018-12-19 Thread Xiangrui Meng
On Mon, Nov 12, 2018 at 7:33 AM Joe wrote: > Hello, > I was reading Spark 2.4.0 release docs and I'd like to find out more > about barrier execution mode. > In particular I'd like to know what happens when number of partitions > exceeds number of nodes (which I think is allowed, Spark tuning doc

Re: Should python-2 be supported in Spark 3.0?

2019-05-29 Thread Xiangrui Meng
Hi all, I want to revive this old thread since no action was taken so far. If we plan to mark Python 2 as deprecated in Spark 3.0, we should do it as early as possible and let users know ahead. PySpark depends on Python, numpy, pandas, and pyarrow, all of which are sunsetting Python 2 support by

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Xiangrui Meng
From: Reynold Xin > Sent: Thursday, May 30, 2019 12:59:14 AM > To: shane knapp > Cc: Erik Erlandson; Mark Hamstra; Matei Zaharia; Sean Owen; Wenchen > Fen; Xiangrui Meng; dev; user > Subject: Re: Should python-2 be supported in Spark 3.0? > > +1 on Xiangrui’s plan.

Re: Should python-2 be supported in Spark 3.0?

2019-06-03 Thread Xiangrui Meng
-- > From: shane knapp > Sent: Friday, May 31, 2019 7:38:10 PM > To: Denny Lee > Cc: Holden Karau; Bryan Cutler; Erik Erlandson; Felix Cheung; Mark > Hamstra; Matei Zaharia; Reynold Xin; Sean Owen; Wenchen Fen; Xiangrui Meng; > dev; user > Subj

[ANNOUNCEMENT] Plan for dropping Python 2 support

2019-06-03 Thread Xiangrui Meng
Hi all, Today we announced the plan for dropping Python 2 support [1] in Apache Spark: As many of you already know, the Python core development team and many widely used Python packages like Pandas and NumPy will drop Python 2

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Xiangrui Meng
, I'm going to upload it to Spark website and announce it here. Let me know if you think we should do a VOTE instead. On Thu, May 30, 2019 at 9:21 AM Xiangrui Meng wrote: > I created https://issues.apache.org/jira/browse/SPARK-27884 to track the > work. > > On Thu, May 30, 2019 at 2
