Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3121#issuecomment-61935501
To be more concrete, I'm suggesting something like this:
```scala
import scala.collection.mutable

import org.apache.spark.sql.{SchemaRDD, SQLContext}

/** Row type backing the `testData` table (top-level, as in the existing TestData.scala). */
case class TestData(key: Int, value: String)

object TestData {

  /**
   * Initialize TestData using the given SQLContext. This will re-create
   * all SchemaRDDs and tables using that context.
   */
  def init(sqlContext: SQLContext) {
    initMethods.foreach(m => m(sqlContext))
  }

  /** A sequence of functions that are invoked when `init()` is called. */
  private val initMethods = mutable.Buffer[SQLContext => Unit]()

  /**
   * Register a block of code to be called when TestData is initialized
   * with a new SQLContext.
   */
  private def onInit(block: SQLContext => Unit) {
    initMethods += block
  }

  def testData = _testData
  private var _testData: SchemaRDD = null
  onInit { sqlContext =>
    // Bring the implicit RDD-to-SchemaRDD conversion into scope.
    import sqlContext.createSchemaRDD
    _testData = sqlContext.sparkContext.parallelize(
      (1 to 100).map(i => TestData(i, i.toString))).toSchemaRDD
    testData.registerTempTable("testData")
  }

  case class LargeAndSmallInts(a: Int, b: Int)
  def largeAndSmallInts = _largeAndSmallInts
  private var _largeAndSmallInts: SchemaRDD = null
  onInit { sqlContext =>
    ...
  }

  [...]
}
```
This whole `onInit` thing is just a way to co-locate each field, its case class,
and its initialization code. From a client's perspective, the public `val`s
become getter `def`s, but everything else stays the same.
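
For example, a test suite could then rebuild the shared tables against its own context before running queries. A minimal sketch, assuming ScalaTest's `FunSuite` with `BeforeAndAfterAll` and a local `SparkContext` (the suite name and assertions are hypothetical, just to show the calling convention):
```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Hypothetical suite, for illustration only.
class ExampleQuerySuite extends FunSuite with BeforeAndAfterAll {
  private val sqlContext = new SQLContext(new SparkContext("local", "ExampleQuerySuite"))

  override def beforeAll() {
    // Runs every registered onInit block: re-creates all SchemaRDDs
    // and re-registers all temp tables against this context.
    TestData.init(sqlContext)
  }

  test("testData is registered for this context") {
    assert(TestData.testData.count() === 100)
    assert(sqlContext.sql("SELECT count(*) FROM testData").collect().head.getLong(0) === 100)
  }
}
```
Since `init()` replays every registered block, the getters always return SchemaRDDs created with the most recently supplied context.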