Stefan Goldener created MAHOUT-2101:
---------------------------------------

             Summary: Mahout local file distribution
                 Key: MAHOUT-2101
                 URL: https://issues.apache.org/jira/browse/MAHOUT-2101
             Project: Mahout
          Issue Type: Improvement
            Reporter: Stefan Goldener


At the moment Mahout depends heavily on HDFS. Although MAHOUT_LOCAL makes 
Mahout use the local file system, it is not possible to combine 
MAHOUT_LOCAL=true with a Spark-only cluster (one without HDFS).

My suggestion is to improve the Mahout code so that it can read local files 
and distribute them via Spark. There are multiple options for that, e.g. 
Spark SQL, DataFrames, Datasets or RDDs.
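
To illustrate the RDD option, here is a minimal sketch, not a proposal for the 
actual Mahout change. It assumes PySpark purely for brevity (Mahout itself is 
JVM-based), and the helper names read_local_lines, distribute_lines and 
run_sketch are hypothetical. The idea is that the driver reads the file from its 
own local file system and hands the contents to Spark, so the executors never 
need to see the same path and no HDFS is involved.

```python
from pathlib import Path


def read_local_lines(path):
    """Read a driver-local text file into a list of lines (no HDFS access)."""
    with Path(path).open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def distribute_lines(spark, lines, num_slices=4):
    """Ship lines held in driver memory to the executors as an RDD.

    `spark` is expected to be a SparkSession; SparkContext.parallelize
    splits the collection into `num_slices` partitions.
    """
    return spark.sparkContext.parallelize(lines, num_slices)


def run_sketch(path):
    """Hypothetical entry point: count the lines of a local file via Spark."""
    # Imported here so the helpers above stay usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("local-file-sketch").getOrCreate()
    try:
        return distribute_lines(spark, read_local_lines(path)).count()
    finally:
        spark.stop()
```

This only works when the file fits in driver memory. For larger inputs a 
different mechanism would be needed, since Spark's own file readers expect a 
local path to be accessible on every worker node.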

This would also allow Mahout to use the new Spark Kubernetes features and 
hence be highly scalable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
