My CSV:

name,checked-in,booking_cost
AC,true,1200
BK,false,0
DDC,true,1200

I have done:

 val textFile = sc.textFile("/home/user/sampleCSV.txt")
 val schemaString = "name,checked-in,booking_cost"
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{StructType, StructField, StringType}

 val schema =
   StructType(
     schemaString.split(",").map(fieldName =>
       StructField(fieldName, StringType, nullable = true)))


 val rowRDD = textFile.map(_.split(",")).map(p =>
   Row(p(0).trim.substring(1), p(1).trim, p(2)))
 val dataFrame = sqlContext.createDataFrame(rowRDD, schema)
 dataFrame.show

+----+----------+------------+
|name|checked-in|booking_cost|
+----+----------+------------+
|   C|      true|        1200|
|   K|     false|           0|
|  DC|      true|        1200|
+----+----------+------------+

The substring(1) approach works only when every column value is prefixed with exactly one character such as '?'. Otherwise you can strip the character explicitly:

 val rowRDD = textFile.map(_.split(",")).map(p =>
   Row(p(0).trim.replace("?", ""), p(1).trim, p(2)))

(Note: replace("?", "") with String arguments — the character overload replace('?', '') does not compile, since '' is not a valid character literal.)
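One caveat for the original line in this thread: a plain split(",") also splits inside the quoted currency fields, because "?2,500.00" itself contains a comma. A minimal plain-Scala sketch of one workaround — split only on commas that fall outside double quotes, then strip everything except digits and the decimal point, which has the same effect as the REGEXP_REPLACE(col, '[^\\d\\.]', '') from the Hive query quoted below. The regexes here are illustrative, not the only option:

```scala
// Sample line from the original question.
val line = "360,10/02/2014,\"?2,500.00\",?0.00,\"?2,500.00\""

// A comma is a field separator only if an even number of quote
// characters follows it, i.e. it is outside any quoted field.
val fields = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1)

// Keep only digits and the decimal point, like the Hive
// REGEXP_REPLACE(col, '[^\\d\\.]', '') in the quoted message.
val cleaned = fields.map(_.replaceAll("[^\\d.]", ""))

println(fields.mkString("|"))   // 360|10/02/2014|"?2,500.00"|?0.00|"?2,500.00"
println(cleaned.mkString("|"))  // 360|10022014|2500.00|0.00|2500.00
```

Note that the same character class also strips the '/' from the date field, so in practice you would apply the replaceAll only to the currency columns.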



On Fri, Feb 19, 2016 at 2:36 PM, Mich Talebzadeh <m...@peridale.co.uk>
wrote:

> Ok
>
>
>
> I have created a one liner csv file as follows:
>
>
>
> cat testme.csv
>
> 360,10/02/2014,"?2,500.00",?0.00,"?2,500.00"
>
>
>
> I use the following in Spark to split it
>
>
>
> csv=sc.textFile("/data/incoming/testme.csv")
>
> csv.map(_.split(",")).first
>
> res159: Array[String] = Array(360, 10/02/2014, "?2, 500.00", ?0.00, "?2,
> 500.00")
>
>
>
> That comes back with an array
>
>
>
> Now all I want is to get rid of “?” and “,” in above. The problem is I
> have a currency field “?2,500.00” that has got an additional “,” as well
> that messes up things
>
>
>
> replaceAll() does not work
>
>
>
> Any other alternatives?
>
>
>
> Thanks,
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Technology Ltd, its subsidiaries nor their
> employees accept any responsibility.
>
>
>
>
>
> *From:* Andrew Ehrlich [mailto:and...@aehrlich.com]
> *Sent:* 19 February 2016 01:22
> *To:* Mich Talebzadeh <m...@peridale.co.uk>
> *Cc:* User <user@spark.apache.org>
> *Subject:* Re: Hive REGEXP_REPLACE use or equivalent in Spark
>
>
>
> Use the scala method .split(",") to split the string into a collection of
> strings, and try using .replaceAll() on the field with the "?" to remove it.
>
>
>
> On Thu, Feb 18, 2016 at 2:09 PM, Mich Talebzadeh <m...@peridale.co.uk>
> wrote:
>
> Hi,
>
> What is the equivalent of this Hive statement in Spark
>
>
>
> select "?2,500.00", REGEXP_REPLACE("?2,500.00",'[^\\d\\.]','');
> +------------+----------+--+
> |    _c0     |   _c1    |
> +------------+----------+--+
> | ?2,500.00  | 2500.00  |
> +------------+----------+--+
>
> Basically I want to get rid of "?" and "," in the csv file
>
>
>
> The full csv line is
>
>
>
> scala> csv2.first
> res94: String = 360,10/02/2014,"?2,500.00",?0.00,"?2,500.00"
>
> I want to transform that string into 5 columns and use "," as the split
>
> Thanks,
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
>
>
>
>
>
>
