Looks like you’re using substring just to get rid of the ‘?’. Why not use replace for that as well? Then you wouldn’t run into the index-out-of-bounds issue.
val a = "?1,187.50"
val b = ""
println(a.substring(1).replace(",", ""))            // -> 1187.50
println(a.replace("?", "").replace(",", ""))        // -> 1187.50
println(b.replace("?", "").replace(",", ""))        // -> no error; prints an empty line, since neither ‘?’ nor ‘,’ is present

> On Feb 20, 2016, at 8:24 AM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>
> I have a DF like below, reading a csv file:
>
> val df = HiveContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("/data/stg/table2")
>
> val a = df.map(x => (x.getString(0), x.getString(1),
>   x.getString(2).substring(1).replace(",", "").toDouble,
>   x.getString(3).substring(1).replace(",", "").toDouble,
>   x.getString(4).substring(1).replace(",", "").toDouble))
>
> For most rows I am reading from the csv file, the above mapping works fine. However, at the bottom of the csv there are a couple of empty columns, as below:
>
> [421,02/10/2015,?1,187.50,?237.50,?1,425.00]
> [,,,,]
> [Net income,,?182,531.25,?14,606.25,?197,137.50]
> [,,,,]
> [year 2014,,?113,500.00,?0.00,?113,500.00]
> [Year 2015,,?69,031.25,?14,606.25,?83,637.50]
>
> However, I get:
>
> a.collect.foreach(println)
> 16/02/20 08:31:53 ERROR Executor: Exception in task 0.0 in stage 123.0 (TID 161)
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>
> I suspect the cause is the substring operation, say x.getString(2).substring(1), on empty values, which according to the web will throw this type of error.
>
> The easiest solution seems to be to check whether x above is not null and only then do the substring operation. Can this be done without using a UDF?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> NOTE: The information in this email is proprietary and confidential.
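One caveat on the replace-only version: it avoids the StringIndexOutOfBoundsException, but the original map still calls .toDouble, and "".toDouble throws a NumberFormatException, so a blank cell needs a guard before conversion. Below is a minimal sketch of doing that in plain Scala inside the map itself, with no UDF. It assumes the amount columns are strings like "?1,187.50" and that blank cells arrive as either "" or null (which of the two spark-csv produces here isn't confirmed in the thread). `AmountParsing`, `parseAmount` and `parseRow` are hypothetical names, and a row is modelled as a Seq[String] so the sketch runs without Spark.

```scala
import scala.util.Try

object AmountParsing {
  // Strip every '?' (as it appears in the thread's csv) and the thousands
  // separators using literal (non-regex) String.replace, then convert.
  // null, blank, or non-numeric cells give None instead of throwing.
  def parseAmount(raw: String): Option[Double] =
    Option(raw)                                  // null cell -> None
      .map(_.replace("?", "").replace(",", "").trim)
      .filter(_.nonEmpty)                        // "" -> None ("".toDouble would throw)
      .flatMap(s => Try(s.toDouble).toOption)    // non-numeric text -> None

  // Hypothetical row-level helper: `cells` stands in for Spark's Row.
  // Returns None when any of the three amount columns (2, 3, 4) is missing,
  // so that blank separator rows can be dropped.
  def parseRow(cells: Seq[String]): Option[(String, String, Double, Double, Double)] =
    for {
      a <- parseAmount(cells(2))
      b <- parseAmount(cells(3))
      c <- parseAmount(cells(4))
    } yield (cells(0), cells(1), a, b, c)
}
```

In the Spark 1.x job this might be wired in with something along the lines of `df.flatMap(x => AmountParsing.parseRow(Seq(x.getString(0), x.getString(1), x.getString(2), x.getString(3), x.getString(4))).toList)` (untested against Spark). Using flatMap keeps only rows whose three amounts all parse: the blank `[,,,,]` rows are dropped, while the "Net income" and "Year" summary rows still parse and are kept.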