I know that there has been some discussion around improving reflection in Swift, and I wanted to add to it with some of the work I have been trying to do using the Swift language. I have been investigating using Swift to create a framework that provides a programming API to process data and execute functions in parallel on a cluster. The framework needs to be able to instantiate these functions on the cluster workers and have the data processed by the functions. The plan is to use one of the existing cluster managers, such as Spark or Storm. As of today, I have been looking at using Spark. There would be a predefined set of supported functions, such as map, filter, join, etc., as defined by the cluster manager.
In my experiments, I have run into a number of issues which I haven't been able to solve due to the limited support for reflection in Swift. In describing the issues, I'm going to use APIs based on Spark, since that is the cluster manager I have been working with.
Parameter and Return types
The following is an example of a Swift class that maps to the RDD class in Spark.
public class RDD<T> {
    public func collect() throws -> [T] { ... }
}
T could be anything from a basic type to a class. Even if the types are limited to basic types and known Spark types, the list of possibilities is large. In one of the Spark examples, T would be
  Tuple2<Int32, Tuple3<Int32, Int32, Double>>
The number of possible type combinations is too large to hard-code, given that Spark supports tuples with up to 22 element types. I can get the type of T as a string, but I haven't found a way to instantiate the type from that string. Is there some way around this problem?
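To make the gap concrete, here is a minimal sketch of what I can and cannot do. Tuple2 and Tuple3 are hypothetical stand-ins for the Spark tuple types, not a real binding, and the registry dictionary is only an illustration of the hand-maintained table I would otherwise need. It is not a Swift or Spark API.

```swift
// Hypothetical stand-ins for Spark's tuple types (assumption, for illustration only).
struct Tuple2<A, B> { let first: A; let second: B }
struct Tuple3<A, B, C> { let first: A; let second: B; let third: C }

typealias Element = Tuple2<Int32, Tuple3<Int32, Int32, Double>>

// Going from the type to a string is easy.
let typeName = String(describing: Element.self)
print(typeName)

// Going from the string back to the type is the missing piece. The only
// approach I have found is a table that has to be filled in by hand,
// one entry per concrete type, which does not scale to every combination.
var registry: [String: Any.Type] = [:]
registry[typeName] = Element.self

let recovered = registry[typeName]
let unknown = registry["Tuple2<Int32, Double>"]  // never registered, so nil
```

Every concrete combination would need its own entry, which is exactly the hard-coding I am trying to avoid.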

User-Defined Functions
A programmer would define functions that will be executed on a cluster to process data. The programmer should not need to do any special packaging of functions that run on a cluster, and would code a filter function against the cluster the same way as a filter function for a Swift array. For instance, for a filter method such as the following:
let result = RDD.filter({ (value) -> Bool in
    return value > 15
})
The framework would need to reflect on the function to get the information needed to instantiate and call it on the cluster workers. The following is some of the information needed:
  Module name
  Class/Struct name
  Function name
  Parameter names and type information
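As a rough sketch of that information in a form that could be shipped to a worker, something like the following would do. FunctionDescriptor and all of its field names, as well as the sample values "MyApp", "Filters" and "isLarge", are hypothetical and invented for illustration. The point is that today I would have to fill this in by hand rather than obtain it through reflection.

```swift
import Foundation

// Hypothetical description of a user-defined function (assumption:
// none of these names come from Swift or Spark).
struct FunctionDescriptor: Codable, Equatable {
    struct Parameter: Codable, Equatable {
        let name: String
        let typeName: String
    }
    let moduleName: String
    let typeName: String        // enclosing class or struct
    let functionName: String
    let parameters: [Parameter]
    let returnTypeName: String
}

// Filled in by hand here; ideally reflection would produce this.
let descriptor = FunctionDescriptor(
    moduleName: "MyApp",
    typeName: "Filters",
    functionName: "isLarge",
    parameters: [.init(name: "value", typeName: "Int")],
    returnTypeName: "Bool")

// Serialize on the driver, deserialize on the worker.
let payload = try JSONEncoder().encode(descriptor)
let decoded = try JSONDecoder().decode(FunctionDescriptor.self, from: payload)
```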
Once on the cluster the framework would need to do the following:
  1. Instantiate the parameters. Again, a parameter could be anything from a basic type to a class.
  2. Dynamically load/import the module containing the function.
  3. Find the function in the module that matches the signature.
  4. Call the function.
  5. Handle the return type.
With the existing Swift support for reflection, I couldn't get all of the information that is needed, and what information I could get wasn't in a very convenient form. In some cases, I needed to parse a string to get the different parameter types. Even if I had the information, I didn't see a way to use it to load the module and execute the function. My plan is to require the programmer to pass the location of the modules and dependencies that need to be deployed to the cluster workers on application startup. Given the limitations of reflection in Swift, I don't see how this framework could be implemented. Since this needs to run on Linux, I want to avoid any solution that uses Objective-C.
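For reference, this is roughly what the string parsing looks like. It is a minimal sketch using only Mirror, type(of:), and String(describing:), and the " -> " split is a fragile assumption that only holds for simple, non-nested function types.

```swift
import Foundation

// A filter predicate like the one in the example above.
let predicate: (Int) -> Bool = { value in value > 15 }

// Mirror exposes nothing about the closure itself.
let mirror = Mirror(reflecting: predicate)
print(mirror.children.count)

// The only handle on the signature is the printed form of the type,
// which then has to be taken apart as a string.
let signature = String(describing: type(of: predicate))
print(signature)

// Naive split into parameter list and return type (assumption: the
// signature contains a single top-level " -> ").
let parts = signature.components(separatedBy: " -> ")
let parameterList = parts.first ?? ""
let returnTypeName = parts.last ?? ""
```

Nothing here yields the module name, the function name, or the parameter names, and nothing lets me turn the recovered type names back into types.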

Robert Goodman

swift-evolution mailing list
