Hi everyone,
I ran into a tricky problem recently. I am trying to read some file paths
generated by another method. The paths contain wildcards and come in a
list, like ['/data/*/12', '/data/*/13'].
In practice, if a wildcard does not match any existing path, Spark throws
an exception: "pyspark.sql.utils.AnalysisException: 'Path does not
exist: ...'", and the program stops there.
What I actually want is for Spark to simply ignore and skip the
nonexistent paths and keep running. I have tried the Python HDFSCli API to
check whether each path exists, but HDFSCli does not support wildcards.
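This is the kind of check I attempted with HDFSCli (sketch only; the
namenode address below is made up):

    from hdfs import InsecureClient

    # Hypothetical namenode address, just for illustration.
    client = InsecureClient('http://namenode:9870', user='hdfs')

    # status() with strict=False returns None for a missing path, but it
    # takes a literal path; a pattern like '/data/*/12' is not expanded,
    # so this check does not help with wildcards.
    exists = client.status('/data/*/12', strict=False) is not None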

Any good ideas on how to solve this? Thanks~

Regards,
Junfeng Chen
