I have a DataFrame created by reading a CSV file, as below:

val df = HiveContext.read.format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("/data/stg/table2")

 

val a = df.map(x => (
  x.getString(0),
  x.getString(1),
  x.getString(2).substring(1).replace(",", "").toDouble,
  x.getString(3).substring(1).replace(",", "").toDouble,
  x.getString(4).substring(1).replace(",", "").toDouble))

 

 

The mapping above works fine for most rows read from the CSV file.
However, at the bottom of the file there are a few rows with empty columns,
as below:

 

[421,02/10/2015,?1,187.50,?237.50,?1,425.00]

[,,,,]

[Net income,,?182,531.25,?14,606.25,?197,137.50]

[,,,,]

[year 2014,,?113,500.00,?0.00,?113,500.00]

[Year 2015,,?69,031.25,?14,606.25,?83,637.50]

 

When I run the following, I get an error:

a.collect.foreach(println)

16/02/20 08:31:53 ERROR Executor: Exception in task 0.0 in stage 123.0 (TID 161)

java.lang.StringIndexOutOfBoundsException: String index out of range: -1

 

I suspect the cause is the substring operation, say
x.getString(2).substring(1), being applied to empty values, which from what
I have read on the web will throw this type of error.
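
The suspicion can be checked in plain Scala, without Spark. The snippet below is a minimal sketch that assumes the empty CSV fields arrive as empty strings rather than nulls; the value names are made up for illustration.

```scala
import scala.util.Try

// An empty string has length 0, so a begin index of 1 is out of range.
val onEmpty = Try("".substring(1))

// A one-character string is fine: substring(1) returns the empty string.
val onSingle = Try("?".substring(1))

println(onEmpty.isFailure)  // true
println(onSingle.isSuccess) // true
```

Note that a one-character value would get past substring(1) but then fail in toDouble, since an empty string cannot be parsed as a number.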

 

 

The easiest solution seems to be to check that the value returned by
x.getString(n) is not null or empty before doing the substring operation.
Can this be done without using a UDF?
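
One way this check might be done without a UDF is to wrap the parsing in a plain Scala function and call it inside the existing map. The sketch below is only an illustration: parseAmount is a hypothetical helper name, not part of Spark or spark-csv, and it assumes every amount carries exactly one leading currency character, as in the sample rows.

```scala
import scala.util.Try

// Hypothetical helper (an assumption, not part of Spark or spark-csv).
// Drops the first character (assumed to be a currency symbol), removes
// thousands separators, and returns None for null, empty, or
// unparseable input instead of throwing.
def parseAmount(s: String): Option[Double] =
  Option(s)                 // null becomes None
    .map(_.trim)
    .filter(_.length > 1)   // need the symbol plus at least one digit
    .flatMap(v => Try(v.substring(1).replace(",", "").toDouble).toOption)

// Plain Scala stand-ins for the values x.getString(n) might return.
val samples = Seq("?1,187.50", "?0.00", "", null)
val parsed = samples.map(parseAmount)
parsed.foreach(println)

// Inside the original map the call could look like this (untested sketch):
//   df.map(x => (x.getString(0), x.getString(1),
//     parseAmount(x.getString(2)), parseAmount(x.getString(3)),
//     parseAmount(x.getString(4))))
```

Returning Option[Double] keeps the blank rows visible as None. Using getOrElse(0.0), or filtering the blank rows out before the map, would be alternative choices, depending on whether the summary rows at the bottom of the file are wanted at all.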

 

Thanks

 

Dr Mich Talebzadeh

 

LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 


 

 
