I once worked on a named row feature but haven’t had time to finish
it. It looks like this:
|sql("...").named.map { row: NamedRow =>
  row[Int]('key) -> row[String]('value)
}
|
Basically, the |named| method generates a map from field name to ordinal
for each RDD partition. This map is then shared by all |NamedRow|
instances within that partition. It's not exactly what you want, but it
might be helpful.
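To make the idea concrete, here is a minimal sketch with no Spark dependency. Everything in it is a stand-in I made up for illustration (|SimpleRow|, this |NamedRow| class, the |named| helper that takes the field names explicitly), so treat it as an approximation of the approach rather than the unfinished implementation itself:

```scala
// Hypothetical stand-in for a Spark SQL Row: positional values only.
final case class SimpleRow(values: IndexedSeq[Any])

// Hypothetical NamedRow: wraps a row and a shared name -> ordinal map.
final class NamedRow(row: SimpleRow, ordinals: Map[Symbol, Int]) {
  // Look up the ordinal by name, then cast the value to the requested type.
  def apply[T](field: Symbol): T =
    row.values(ordinals(field)).asInstanceOf[T]
}

object NamedRowSketch {
  // Build the map once per partition; every NamedRow produced from this
  // iterator holds a reference to the same map instance.
  def named(fieldNames: Seq[Symbol],
            partition: Iterator[SimpleRow]): Iterator[NamedRow] = {
    val ordinals = fieldNames.zipWithIndex.toMap
    partition.map(row => new NamedRow(row, ordinals))
  }

  // Usage: access fields by name instead of by position.
  def example(): List[(Int, String)] = {
    val rows = Iterator(SimpleRow(Vector(1, "a")), SimpleRow(Vector(2, "b")))
    named(Seq(Symbol("key"), Symbol("value")), rows)
      .map(row => row[Int](Symbol("key")) -> row[String](Symbol("value")))
      .toList
  }
}
```

In a real RDD this would presumably sit inside something like |mapPartitions|, so the map is built once per partition rather than once per row.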
Cheng
On 1/20/15 3:39 AM, Night Wolf wrote:
In Spark SQL we have |Row| objects which contain a list of fields that
make up a row. A |Row| has ordinal accessors such
as |.getInt(0)| or |.getString(2)|.
Say ordinal 0 = ID and ordinal 1 = Name. It becomes hard to remember
what ordinal is what, making the code confusing.
Say, for example, I have the following code:
|def doStuff(row: Row) = {
  // extract some items from the row into a tuple
  (row.getInt(0), row.getString(1)) // tuple of ID, Name
}|
The question becomes: how could I create aliases for these fields in a
Row object?
I was thinking I could create methods which take an implicit Row object:
|def id(implicit row: Row) = row.getInt(0)
def name(implicit row: Row) = row.getString(1)|
I could then rewrite the above as:
|def doStuff(implicit row: Row) = {
  // extract some items from the row into a tuple
  (id, name) // tuple of ID, Name
}|
Is there a better/neater approach?
Cheers,
~NW