Hi,
I have a program that loads a single avro file using spark SQL, queries it,
transforms it and then outputs the data. The file is loaded with:
val records = sqlContext.avroFile(filePath)
records.registerTempTable("data")
...
Now I want to run it over tens of thousands of Avro files,
like this:
person.map(r => (r.getInt(2), r)).take(4)
Is there any way to specify the column name ("user_id") instead of needing
to know/calculate the offset?
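One way to sidestep hard-coding the offset is to resolve it from the schema's field names once and reuse it. This is only a sketch: `fieldNames` below is a hypothetical stand-in for the field names you would read from the loaded Avro schema (or the SchemaRDD's schema), and the column layout is invented for illustration:

```scala
// Hypothetical field names standing in for the real Avro schema;
// with Spark you would obtain these from the loaded SchemaRDD's schema.
val fieldNames = Seq("name", "email", "user_id", "created_at")

// Resolve the column's offset by name instead of hard-coding getInt(2).
val userIdIdx = fieldNames.indexOf("user_id")
require(userIdIdx >= 0, s"user_id not found in schema: $fieldNames")

// With the real RDD this would become:
//   person.map(r => (r.getInt(userIdIdx), r)).take(4)
println(userIdIdx) // prints 2 for this invented layout
```

Alternatively, since the table is registered as "data", a plain SQL query selects columns by name, e.g. `sqlContext.sql("SELECT user_id FROM data")`.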
Thanks again
On Fri, Nov 21, 2014 at 11:48 AM, thomas j wrote:
Thanks for the pointer Michael.
I've downloaded spark 1.2.0 from
https://people.apache.org/~pwendell/spark-1.2.0-snapshot1/ and cloned and
built the spark-avro repo you linked to.
When I run it against the example avro file linked to in the documentation
it works. However, when I try to load my av