Denny, I am not sure what exception you're observing, but I've had luck with two things:
val table = sc.textFile("hdfs://....")

You can try calling table.first here and you'll see the first line of the file. You can also do

val debug = table.first.split("\t")

which gives you an array, and you can verify that the array contains what you want at positions 167, 110, and 200.

In the case of large files with a random bad line, I find wrapping the call inside the map in try/catch very valuable -- you can dump out the whole line in the catch block.

Lastly, I would guess that you're getting a compile error and not a runtime error -- c is an array of values, so I think you want

tabs.map(c => (c(167), c(110), c(200)))

instead of

tabs.map(c => (c._(167), c._(110), c._(200)))

A sketch pulling all of this together follows after the quoted thread below.

On Sun, Dec 14, 2014 at 3:12 PM, Denny Lee <denny.g....@gmail.com> wrote:
>
> Yes - that works great! Sorry for implying I couldn't. Was just more
> flummoxed that I couldn't make the Scala call work on its own. Will
> continue to debug ;-)
>
> On Sun, Dec 14, 2014 at 11:39 Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> BTW, I cannot use SparkSQL / case right now because my table has 200
>>> columns (and I'm on Scala 2.10.3)
>>>
>>
>> You can still apply the schema programmatically:
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
>>
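For reference, here is a minimal sketch of the above, written for the spark-shell (where sc is predefined). The HDFS path is hypothetical, as is the assumption that the file is tab-delimited with 200+ columns; the column indices 167, 110, and 200 come from the thread.

// Minimal debugging sketch for spark-shell; the path below is hypothetical.
val table = sc.textFile("hdfs://namenode/path/to/table.tsv")

// Inspect the first line and its split result before mapping over the whole file.
val debug = table.first.split("\t")
println(debug.length)  // confirm the row really has 200+ columns
println((debug(167), debug(110), debug(200)))

// Wrap the per-line work in try/catch so one malformed line is
// reported instead of failing the whole job.
val tabs = table.map(_.split("\t"))
val projected = tabs.map { c =>
  try {
    (c(167), c(110), c(200))  // plain apply on the array, not c._(...)
  } catch {
    case e: ArrayIndexOutOfBoundsException =>
      println("Bad line with " + c.length + " columns: " + c.mkString("\t"))
      ("", "", "")  // placeholder tuple so the map still type-checks
  }
}
projected.take(5).foreach(println)

One caveat: on a real cluster the println in the catch block runs on the executors, so the dumped lines end up in the worker logs rather than the driver console.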