Re: [New Project] sparksql-ml : Distributed Machine Learning using SparkSQL.

Russell Jurney Mon, 27 Feb 2023 16:21:32 -0800

I think it is awesome. Brilliant interface that is missing from Spark.
Would you integrate with something like MLflow?


Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com Book a time on Calendly
<https://calendly.com/rjurney_personal/30min>


On Mon, Feb 27, 2023 at 10:16 AM Chitral Verma <chitralve...@gmail.com>
wrote:

> Hi All,
> I worked on this idea a few years back as a pet project to bridge
> *SparkSQL* and *SparkML* and empower anyone with SQL skills to implement
> production-grade, distributed machine learning over Apache Spark.
>
> In principle the idea works like Google's BigQuery ML, but with a much
> wider scope and no vendor lock-in: it runs on basically every source that
> Spark supports, in the cloud or on-prem.
>
> *Training* an ML model can look like:
>
> FIT 'LogisticRegression' ESTIMATOR WITH PARAMS(maxIter = 3) TO (
> SELECT * FROM mlDataset) AND OVERWRITE AT LOCATION '/path/to/lr-model';
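
To make the shape of the FIT statement above concrete, here is a minimal sketch of how it could be split into its parts (estimator name, params, source query, save mode, location). The thread does not show the project's actual grammar or parser, so the regex and the names `FIT_RE`, `parse_params` and `parse_fit` are assumptions inferred only from the single example above, written in plain Python rather than as Spark code.

```python
import re

# Hypothetical pattern, inferred only from the FIT example in this thread.
# The real sparksql-ml grammar may accept more (or different) syntax.
FIT_RE = re.compile(
    r"FIT\s+'(?P<estimator>\w+)'\s+ESTIMATOR"
    r"(?:\s+WITH\s+PARAMS\s*\((?P<params>[^)]*)\))?"
    r"\s+TO\s*\((?P<query>.+)\)"
    r"\s+AND\s+(?P<mode>OVERWRITE|WRITE)\s+AT\s+LOCATION\s+'(?P<location>[^']+)'"
    r"\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_params(text):
    """Turn "a = 1, b='x'" into {"a": "1", "b": "x"}.

    Values stay as strings; the comma split is naive and would break on
    values that themselves contain commas.
    """
    params = {}
    for pair in (p.strip() for p in (text or "").split(",")):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params[key.strip()] = value.strip().strip("'\"")
    return params


def parse_fit(statement):
    """Split a FIT statement into its components, or raise ValueError."""
    match = FIT_RE.match(statement.strip())
    if match is None:
        raise ValueError("not a FIT statement in the assumed form")
    return {
        "estimator": match.group("estimator"),
        "params": parse_params(match.group("params")),
        "query": match.group("query").strip(),
        "overwrite": match.group("mode").upper() == "OVERWRITE",
        "location": match.group("location"),
    }


if __name__ == "__main__":
    example = (
        "FIT 'LogisticRegression' ESTIMATOR WITH PARAMS(maxIter = 3) TO (\n"
        "SELECT * FROM mlDataset) AND OVERWRITE AT LOCATION '/path/to/lr-model';"
    )
    print(parse_fit(example))
```

Read this way, the statement carries what a Spark ML training job needs: which estimator to build, how to configure it, which DataFrame to fit it on, and where to persist the fitted model.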
>
> *Prediction* with an ML model can look like:
>
> PREDICT FOR (SELECT * FROM mlTestDataset) USING MODEL STORED AT LOCATION 
> '/path/to/lr-model'
>
> *Feature Preprocessing* can look like:
>
> TRANSFORM (SELECT * FROM dataset) USING 'StopWordsRemover' TRANSFORMER WITH
> PARAMS (inputCol='raw', outputCol='filtered') AND WRITE AT LOCATION
> '/path/to/test-transformer'
>
>
> But a lot more can be done with this library.
>
> I was wondering whether any of you find this interesting and would like to
> contribute to the project here:
>
> https://github.com/chitralverma/sparksql-ml
>
>
> Regards,
> Chitral Verma
