Hi All,

In the above scenario, if the field delimiter is Hive's default then Spark
parses the data as expected, hence I believe this is a bug.
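In the meantime, a possible workaround sketch (not a fix): assuming the part files under the partition path are plain comma-delimited text, as in the HDFS listing from my earlier mail, we can read them with sc.textFile and split on the table's delimiter ourselves, bypassing the table read path that ignores the custom field delimiter:

```scala
// Workaround sketch, not a fix: parse the comma-delimited part files
// directly instead of reading through the table. The path and column
// types match the part_table definition from my earlier mail.
import sqlContext.implicits._

val raw = sc.textFile("/user/hdfs/hive/part_table/event_date=2016-03-25")
val parsed = raw
  .map(_.split(","))                          // split on the table's delimiter
  .map(f => (f(0), f(1).toInt, f(2).toLong))  // a String, b int, c bigint
  .toDF("a", "b", "c")
parsed.show()
```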

Regards,
Shiva Achari


On Tue, Apr 5, 2016 at 8:15 PM, Shiva Achari <[email protected]> wrote:

> Hi,
>
> I have created a Hive external table, stored as textfile, partitioned by
> event_date (Date).
>
> How do we specify a particular CSV format when reading a Hive table in
> Spark?
>
> The environment is
>
>      1. Spark 1.5.0 - CDH 5.5.1, using Scala version 2.10.4 (Java
> HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
>      2. Hive 1.1, CDH 5.5.1
>
> scala script
>
>         sqlContext.setConf("hive.exec.dynamic.partition", "true")
>         sqlContext.setConf("hive.exec.dynamic.partition.mode",
> "nonstrict")
>
>         val distData = sc.parallelize(Array((1, 1, 1), (2, 2, 2), (3, 3,
> 3))).toDF
>         val distData_1 = distData.withColumn("event_date", current_date())
>         distData_1: org.apache.spark.sql.DataFrame = [_1: int, _2: int,
> _3: int, event_date: date]
>
>         scala> distData_1.show
>         +---+---+---+----------+
>         | _1| _2| _3|event_date|
>         +---+---+---+----------+
>         |  1|  1|  1|2016-03-25|
>         |  2|  2|  2|2016-03-25|
>         |  3|  3|  3|2016-03-25|
>         +---+---+---+----------+
>
> distData_1.write.mode("append").partitionBy("event_date").saveAsTable("part_table")
>
>
>         scala> sqlContext.sql("select * from part_table").show
>         +-----+----+----+----------+
>         |    a|   b|   c|event_date|
>         +-----+----+----+----------+
>         |1,1,1|null|null|2016-03-25|
>         |2,2,2|null|null|2016-03-25|
>         |3,3,3|null|null|2016-03-25|
>         +-----+----+----+----------+
>
>
>
> Hive table
>
>         create external table part_table (a String, b int, c bigint)
>         partitioned by (event_date Date)
>         row format delimited fields terminated by ','
>         stored as textfile  LOCATION "/user/hdfs/hive/part_table";
>
>         select * from part_table shows
>         |part_table.a|part_table.b|part_table.c|part_table.event_date|
>         |1           |1           |1           |2016-03-25           |
>         |2           |2           |2           |2016-03-25           |
>         |3           |3           |3           |2016-03-25           |
>
>
> Looking at the hdfs
>
>
>         The path has 2 part files
> /user/hdfs/hive/part_table/event_date=2016-03-25
>         part-00000
>         part-00001
>
>           part-00000 content
>             1,1,1
>           part-00001 content
>             2,2,2
>             3,3,3
>
>
> P.S. If we store the table as ORC, Spark writes and reads the data as
> expected.
>
>
