I think this is expected behavior, though not behavior that is reasonable in
the long term. To my knowledge, this is how the v1 sources behave, and v2
just reuses the same mechanism to instantiate sources while adding a new
interface for the v2 features.
I think that the right approach is to use catalogs.
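As a rough sketch of what that would look like (hypothetical names: MyCatalog is assumed to be a class implementing org.apache.spark.sql.connector.catalog.TableCatalog, and the catalog/table names are made up), the table would then be resolved through the catalog instead of format(...).load():

// Register a hypothetical custom catalog under the name "my_catalog".
spark.conf.set("spark.sql.catalog.my_catalog", classOf[MyCatalog].getCanonicalName)
// The catalog is now responsible for producing the Table instance,
// rather than Spark instantiating the source class via format(...).load().
spark.table("my_catalog.ns.tbl").collect()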
I took a look at the code.
val source = classOf[MyDataSource].getCanonicalName
spark.read.format(source).load().collect()
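For context, MyDataSource is not defined above; a minimal v2 source along these lines (a hypothetical sketch against the Spark 3.x connector API, not the actual source under discussion) that logs every instantiation makes the double construction visible:

import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.read.{Batch, InputPartition, PartitionReader, PartitionReaderFactory, Scan, ScanBuilder}
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Logs each time Spark constructs the provider class.
class MyDataSource extends TableProvider {
  println(s"MyDataSource instantiated: ${System.identityHashCode(this)}")

  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    new StructType().add("i", "int")

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table = new MyTable(schema)
}

class MyTable(tableSchema: StructType) extends Table with SupportsRead {
  override def name(): String = "my_table"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
  override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
    new MyScanBuilder(tableSchema)
}

// Single class playing ScanBuilder/Scan/Batch just to keep the sketch short.
class MyScanBuilder(schema: StructType) extends ScanBuilder with Scan with Batch {
  override def build(): Scan = this
  override def readSchema(): StructType = schema
  override def toBatch(): Batch = this
  override def planInputPartitions(): Array[InputPartition] = Array(new InputPartition {})
  override def createReaderFactory(): PartitionReaderFactory = new MyReaderFactory
}

class MyReaderFactory extends PartitionReaderFactory {
  override def createReader(partition: InputPartition): PartitionReader[InternalRow] =
    new PartitionReader[InternalRow] {
      private var emitted = false
      override def next(): Boolean = { val hasNext = !emitted; emitted = true; hasNext }
      override def get(): InternalRow = InternalRow(1)
      override def close(): Unit = {}
    }
}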
It looks like it is indeed instantiated twice.
First of all: it looks like Spark creates it once to read the schema for the logical plan