I think this is expected behavior, though not behavior that is reasonable in
the long term. To my knowledge, this is how the v1 sources behave, and v2
just reuses the same mechanism to instantiate sources while adding a new
interface for the v2 features.
I think that the right approach is to use catalogs.
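As a rough sketch of what that would look like (hypothetical names: MyCatalog is assumed to be a class implementing org.apache.spark.sql.connector.catalog.TableCatalog, and the catalog/table names are made up), the table would then be resolved through the catalog instead of format(...).load():

// Register a hypothetical custom catalog under the name "my_catalog".
spark.conf.set("spark.sql.catalog.my_catalog", classOf[MyCatalog].getCanonicalName)
// The catalog is now responsible for producing the Table instance,
// rather than Spark instantiating the source class via format(...).load().
spark.table("my_catalog.ns.tbl").collect()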
I took a look at the code.
val source = classOf[MyDataSource].getCanonicalName
spark.read.format(source).load().collect()
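For context, MyDataSource is not defined above; a minimal v2 source along these lines (a hypothetical sketch against the Spark 3.x connector API, not the actual source under discussion) that logs every instantiation makes the double construction visible:

import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.read.{Batch, InputPartition, PartitionReader, PartitionReaderFactory, Scan, ScanBuilder}
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Logs each time Spark constructs the provider class.
class MyDataSource extends TableProvider {
  println(s"MyDataSource instantiated: ${System.identityHashCode(this)}")

  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    new StructType().add("i", "int")

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table = new MyTable(schema)
}

class MyTable(tableSchema: StructType) extends Table with SupportsRead {
  override def name(): String = "my_table"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
  override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
    new MyScanBuilder(tableSchema)
}

// Single class playing ScanBuilder/Scan/Batch just to keep the sketch short.
class MyScanBuilder(schema: StructType) extends ScanBuilder with Scan with Batch {
  override def build(): Scan = this
  override def readSchema(): StructType = schema
  override def toBatch(): Batch = this
  override def planInputPartitions(): Array[InputPartition] = Array(new InputPartition {})
  override def createReaderFactory(): PartitionReaderFactory = new MyReaderFactory
}

class MyReaderFactory extends PartitionReaderFactory {
  override def createReader(partition: InputPartition): PartitionReader[InternalRow] =
    new PartitionReader[InternalRow] {
      private var emitted = false
      override def next(): Boolean = { val hasNext = !emitted; emitted = true; hasNext }
      override def get(): InternalRow = InternalRow(1)
      override def close(): Unit = {}
    }
}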
It looks like it is indeed instantiated twice.
First of all: it looks like Spark creates it once to read the schema for the logical plan