Hi Mich,
Try using a regexp to parse your string instead of split.
Thanks, Alex.
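
A minimal sketch of the regexp approach in plain Scala follows. The thread does not show the actual data, so the pattern is an assumption: fields are comma-separated, and a field that itself contains commas is wrapped in double quotes. The names CsvRegex and parseLine are hypothetical.

```scala
object CsvRegex {
  // Matches either a double-quoted field (which may contain commas)
  // or an unquoted run of characters up to the next comma.
  private val Field = """"([^"]*)"|([^,]+)""".r

  // Returns the fields of one line with the surrounding quotes removed.
  // Limitations of this sketch: empty fields are skipped and escaped
  // quotes inside a quoted field are not handled.
  def parseLine(line: String): List[String] =
    Field.findAllMatchIn(line).map { m =>
      if (m.group(1) != null) m.group(1) else m.group(2)
    }.toList
}
```

Unlike a plain split on ",", this keeps a quoted value such as "1,000.00" together as one field. On an RDD of lines it could be applied with something like csv.map(CsvRegex.parseLine).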
On Thu, Feb 18, 2016 at 6:35 PM, Mich Talebzadeh <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
> thanks,
>
> I have an issue here.
>
> Define an RDD to read the CSV file
>
> scala> var csv = sc.textFile("/data/stg/table2")
> csv: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[69] at textFile at <console>:27
>
> I then get rid of the header
>
> scala> val csv2 = csv.mapPartitionsWith
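
The quoted command above is cut off, so what follows is only an assumption about the intent. A common way to drop a header from an RDD is to remove the first element of the first partition via mapPartitionsWithIndex. The helper below is a plain-Scala sketch; the name dropHeader is hypothetical, and it assumes the header is the first line of partition 0, which may not hold when the directory contains several files that each carry their own header.

```scala
object HeaderDrop {
  // Drops the first element of partition 0 and leaves other partitions untouched.
  // Assumption: the header is the first line of partition 0.
  def dropHeader(partitionIndex: Int, lines: Iterator[String]): Iterator[String] =
    if (partitionIndex == 0) lines.drop(1) else lines
}

// With Spark on the classpath this could be used on the RDD, for example:
//   val csv2 = csv.mapPartitionsWithIndex((i, it) => HeaderDrop.dropHeader(i, it))
```

Keeping the logic in a small pure function makes it easy to check without a cluster.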
Hi Mich,
You can use data frames (
http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes)
to achieve that.
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val rdd = sc.textFile("/data/stg/table2")
//...
// perform your business logic, cleanups, etc.
//...
sqlContext.createDataFrame(resu
Hi,
We put CSV files that are compressed using bzip into a staging area on HDFS.
In Hive an external table is created as below:
DROP TABLE IF EXISTS stg_t2;
CREATE EXTERNAL TABLE stg_t2 (
INVOICENUMBER string
,PAYMENTDATE string
,NET string
,VAT string
,TOTAL string
)
COMMENT 'from csv file fro