Hi All,

In the above scenario, if the field delimiter is Hive's default, then Spark is able to parse the data as expected; hence I believe this is a bug.
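In the meantime, the read-side workaround I am using is to bypass the table's SerDe and parse the comma-delimited partition files directly. This is only a sketch, assuming the table location and the ',' delimiter from the DDL quoted below:

    import sqlContext.implicits._
    import org.apache.spark.sql.functions.lit

    // Read the comma-delimited part files under the partition directory and
    // split the fields manually, since the ',' delimiter declared in the Hive
    // DDL is not applied when Spark reads the text table.
    val eventDate = "2016-03-25"
    val raw = sc.textFile(s"/user/hdfs/hive/part_table/event_date=$eventDate")

    val parsed = raw.map(_.split(","))
      .map { case Array(a, b, c) => (a, b.toInt, c.toLong) }
      .toDF("a", "b", "c")
      .withColumn("event_date", lit(eventDate).cast("date"))

    parsed.show()

This returns the same three rows that Hive does, at the cost of hard-coding the layout outside the metastore.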
Regards,
Shiva Achari

On Tue, Apr 5, 2016 at 8:15 PM, Shiva Achari <[email protected]> wrote:

> Hi,
>
> I have created a Hive external table, stored as textfile and partitioned
> by event_date Date.
>
> How do we specify a particular CSV format when reading from the Hive
> table in Spark?
>
> The environment is
>
>    1. Spark 1.5.0 - CDH 5.5.1, using Scala version 2.10.4 (Java
>       HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
>    2. Hive 1.1, CDH 5.5.1
>
> Scala script:
>
>     sqlContext.setConf("hive.exec.dynamic.partition", "true")
>     sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
>
>     val distData = sc.parallelize(Array((1, 1, 1), (2, 2, 2), (3, 3, 3))).toDF
>     val distData_1 = distData.withColumn("event_date", current_date())
>     distData_1: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int, event_date: date]
>
>     scala> distData_1.show
>     +---+---+---+----------+
>     | _1| _2| _3|event_date|
>     +---+---+---+----------+
>     |  1|  1|  1|2016-03-25|
>     |  2|  2|  2|2016-03-25|
>     |  3|  3|  3|2016-03-25|
>     +---+---+---+----------+
>
>     distData_1.write.mode("append").partitionBy("event_date").saveAsTable("part_table")
>
>     scala> sqlContext.sql("select * from part_table").show
>     +-----+----+----+----------+
>     |    a|   b|   c|event_date|
>     +-----+----+----+----------+
>     |1,1,1|null|null|2016-03-25|
>     |2,2,2|null|null|2016-03-25|
>     |3,3,3|null|null|2016-03-25|
>     +-----+----+----+----------+
>
> Hive table:
>
>     create external table part_table (a String, b int, c bigint)
>     partitioned by (event_date Date)
>     row format delimited fields terminated by ','
>     stored as textfile LOCATION "/user/hdfs/hive/part_table";
>
> select * from part_table shows:
>
>     | part_table.a | part_table.b | part_table.c | part_table.event_date |
>     | 1            | 1            | 1            | 2016-03-25            |
>     | 2            | 2            | 2            | 2016-03-25            |
>     | 3            | 3            | 3            | 2016-03-25            |
>
> Looking at HDFS, the partition path has 2 part files:
>
>     /user/hdfs/hive/part_table/event_date=2016-03-25
>         part-00000
>         part-00001
>
>     part-00000 content:
>     1,1,1
>
>     part-00001 content:
>     2,2,2
>     3,3,3
>
> P.S. If we store the table as ORC, it writes and reads the data as
> expected.
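A write-side alternative I have not yet verified on CDH 5.5.1 is to use insertInto instead of saveAsTable, on the assumption that insertInto writes through the table's own storage definition and therefore keeps the ',' delimiter from the DDL quoted above:

    import sqlContext.implicits._
    import org.apache.spark.sql.functions.current_date

    sqlContext.setConf("hive.exec.dynamic.partition", "true")
    sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

    // Column order must match the Hive table (a String, b int, c bigint),
    // with the partition column last.
    val distData_1 = sc.parallelize(Array(("1", 1, 1L), ("2", 2, 2L), ("3", 3, 3L)))
      .toDF("a", "b", "c")
      .withColumn("event_date", current_date())

    // insertInto should route the write through the existing table's
    // definition instead of Spark's default text layout.
    distData_1.write.mode("append").insertInto("part_table")

If that assumption holds, both Hive and Spark should read back the same three columns from the resulting files.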
