I think it is awesome. Brilliant interface that is missing from Spark. Would you integrate with something like MLflow?
Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney> russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB <http://facebook.com/jurney> datasyndrome.com
Book a time on Calendly <https://calendly.com/rjurney_personal/30min>

On Mon, Feb 27, 2023 at 10:16 AM Chitral Verma <chitralve...@gmail.com> wrote:

> Hi All,
>
> I worked on this idea a few years back as a pet project to bridge
> *SparkSQL* and *SparkML* and empower anyone to implement
> production-grade, distributed machine learning over Apache Spark as long
> as they have SQL skills.
>
> In principle the idea works exactly like Google's BigQueryML but at a much
> wider scope with no vendor lock-in on basically every source that's
> supported by Spark in cloud or on-prem.
>
> *Training* an ML model can look like,
>
> FIT 'LogisticRegression' ESTIMATOR WITH PARAMS(maxIter = 3) TO (
> SELECT * FROM mlDataset) AND OVERWRITE AT LOCATION '/path/to/lr-model';
>
> *Prediction* with an ML model can look like,
>
> PREDICT FOR (SELECT * FROM mlTestDataset) USING MODEL STORED AT LOCATION
> '/path/to/lr-model'
>
> *Feature Preprocessing* can look like,
>
> TRANSFORM (SELECT * FROM dataset) using 'StopWordsRemover' TRANSFORMER WITH
> PARAMS (inputCol='raw', outputCol='filtered') AND WRITE AT LOCATION
> '/path/to/test-transformer'
>
> But a lot more can be done with this library.
>
> I was wondering if any of you find this interesting and would like to
> contribute to the project here,
>
> https://github.com/chitralverma/sparksql-ml
>
> Regards,
> Chitral Verma
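
To make the shape of the FIT statement quoted above concrete, here is a minimal Python sketch that splits such a statement into its estimator name, parameters, source query, and model location. It is an illustration only: the regular expression, the `parse_fit` helper, and the returned dictionary layout are assumptions made for this example and are not taken from sparksql-ml, whose actual grammar may well differ.

```python
import re

# Hypothetical pattern for the FIT statement shape quoted above. This is an
# illustration only and is NOT the grammar used by sparksql-ml.
FIT_PATTERN = re.compile(
    r"""FIT\s+'(?P<estimator>\w+)'\s+ESTIMATOR\s+
        WITH\s+PARAMS\s*\((?P<params>[^)]*)\)\s+
        TO\s*\((?P<query>.*)\)\s+
        AND\s+OVERWRITE\s+AT\s+LOCATION\s+'(?P<location>[^']+)'\s*;?\s*$""",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def parse_fit(statement):
    """Split a FIT statement into estimator, params, query, and location."""
    match = FIT_PATTERN.match(statement.strip())
    if match is None:
        raise ValueError("not a FIT statement of the expected shape")

    # Parameters are written as comma-separated key = value pairs.
    params = {}
    for pair in match.group("params").split(","):
        if pair.strip():
            key, value = pair.split("=", 1)
            params[key.strip()] = value.strip()

    return {
        "estimator": match.group("estimator"),
        "params": params,
        "query": match.group("query").strip(),
        "location": match.group("location"),
    }


parsed = parse_fit(
    "FIT 'LogisticRegression' ESTIMATOR WITH PARAMS(maxIter = 3) TO ("
    "SELECT * FROM mlDataset) AND OVERWRITE AT LOCATION '/path/to/lr-model';"
)
print(parsed)
```

Parameter values are kept as strings in this sketch; a real implementation would presumably need to convert them to the types the chosen estimator expects.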