OK,
I have created a one-line CSV file as follows:
cat testme.csv
360,10/02/2014,"?2,500.00",?0.00,"?2,500.00"
I use the following in Spark to split it:
val csv = sc.textFile("/data/incoming/testme.csv")
csv.map(_.split(",")).first
res159: Array[String] = Array(360, 10/02/2014, "?2, 500.00", ?0.00, "?2, 500.00")
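That comes back with an array of seven elements instead of five, because
split(",") also breaks on the comma inside each quoted currency field.
All I want now is to get rid of the "?" and "," in the above. The problem is
that the currency field "?2,500.00" contains an additional ",", which messes
things up.
replaceAll() does not work.
Are there any other alternatives?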
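One possible alternative, given as a minimal sketch rather than a tested Spark job: split only on commas that sit outside double quotes, then clean the currency fields with a plain character filter instead of a regex. The object name CsvFields and its two methods are hypothetical names made up for this illustration, and the split pattern assumes quotes are always balanced within a line.

```scala
object CsvFields {
  // A comma counts as a delimiter only when an even number of double quotes
  // follows it, i.e. when it sits outside a quoted field.
  // Assumption: quotes are balanced within each line.
  private val commaOutsideQuotes = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"

  // The -1 limit keeps trailing empty fields instead of dropping them.
  def splitLine(line: String): Array[String] =
    line.split(commaOutsideQuotes, -1)

  // Regex-free clean-up: keep digits and the decimal point, drop the rest
  // (quotes, "?", thousands separators).
  def cleanAmount(field: String): String =
    field.filter(c => c.isDigit || c == '.')
}

// Possible use in spark-shell (not run here):
//   csv.map(CsvFields.splitLine)
//      .map(f => f.take(2) ++ f.drop(2).map(CsvFields.cleanAmount))
```

The clean-up is applied only to the last three fields, because running it over the date field would also strip the "/" separators.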
Thanks,
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this message
shall not be understood as given or endorsed by Peridale Technology Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Technology Ltd, its subsidiaries nor their employees
accept any responsibility.
From: Andrew Ehrlich [mailto:[email protected]]
Sent: 19 February 2016 01:22
To: Mich Talebzadeh <[email protected]>
Cc: User <[email protected]>
Subject: Re: Hive REGEXP_REPLACE use or equivalent in Spark
Use the Scala method .split(",") to split the string into a collection of
strings, and try using .replaceAll() on the field with the "?" to remove it.
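A small sketch of the replaceAll() suggestion on a single field. One thing to watch: replaceAll() treats its first argument as a regular expression, and a bare "?" is not a valid pattern, so it has to be escaped; replace() takes the text literally. The object name ReplaceDemo is hypothetical, and whether this is the failure seen in the thread is only a guess.

```scala
import scala.util.Try

object ReplaceDemo {
  val raw = "?2,500.00"

  // A bare "?" is a dangling regex quantifier, so this call throws
  // java.util.regex.PatternSyntaxException; Try captures the failure.
  val unescaped: Try[String] = Try(raw.replaceAll("?", ""))

  // Escaping the metacharacter removes the "?" but leaves the comma.
  val escaped: String = raw.replaceAll("\\?", "")

  // replace() matches literal text, so no escaping is needed.
  val literal: String = raw.replace("?", "").replace(",", "")
}
```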
On Thu, Feb 18, 2016 at 2:09 PM, Mich Talebzadeh <[email protected]> wrote:
Hi,
What is the equivalent of this Hive statement in Spark?
select "?2,500.00", REGEXP_REPLACE("?2,500.00",'[^\\d\\.]','');
+------------+----------+--+
| _c0 | _c1 |
+------------+----------+--+
| ?2,500.00 | 2500.00 |
+------------+----------+--+
Basically, I want to get rid of "?" and "," in the CSV file.
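For a single value, the Hive call above maps fairly directly onto String.replaceAll in plain Scala, since both take a Java-style regular expression. The sketch below is an illustration only; the names HiveEquivalent, stripNonNumeric and toAmount are hypothetical. For DataFrame columns, Spark also ships a regexp_replace function in org.apache.spark.sql.functions, which is not exercised here.

```scala
object HiveEquivalent {
  // Same pattern as the Hive REGEXP_REPLACE call:
  // remove every character that is not a digit or a dot.
  def stripNonNumeric(s: String): String =
    s.replaceAll("[^\\d\\.]", "")

  // Optional follow-up: turn the cleaned text into a numeric value.
  // Throws NumberFormatException if nothing numeric is left.
  def toAmount(s: String): BigDecimal =
    BigDecimal(stripNonNumeric(s))
}
```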
The full CSV line is:
scala> csv2.first
res94: String = 360,10/02/2014,"?2,500.00",?0.00,"?2,500.00"
I want to transform that string into 5 columns, using "," as the delimiter.
Thanks,
Dr Mich Talebzadeh