
I am using Spark 1.4 and HiveContext to append data to a partitioned Hive table. The data inserted into the table is correct, but the partition (folder) that gets created is completely wrong.
Below is my code snippet>>

import org.apache.spark.sql.types._

val schemaString = "zone z year month date hh x y height u v w ph phb t p pb qvapor qgraup qnice qnrain tke_pbl el_pbl"
// The first eight fields are integers; all remaining fields are floats.
val intFields = Set("zone", "z", "year", "month", "date", "hh", "x", "y")
val schema =
    StructType(
        schemaString.split(" ").map(fieldName =>
            if (intFields.contains(fieldName)) StructField(fieldName, IntegerType, true)
            else StructField(fieldName, FloatType, true)))

val pairVarRDD =

val partitionedTestDF2 = sqlContext.createDataFrame(pairVarRDD, schema)



The table contains 23 columns (more than the maximum Tuple length of 22), so I use Row objects to store the raw data, not Tuples.
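
For illustration, each record could be turned into a Row roughly like this. This is only a minimal sketch: the helper name toRow and the whitespace-separated input line are assumptions made for the example, not the actual code that builds pairVarRDD.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper: parse one whitespace-separated record into a Row.
// The first 8 fields (zone .. y) are integers, the remaining 15 are floats,
// matching the order of the fields in schemaString.
def toRow(line: String): Row = {
  val fields = line.trim.split("\\s+")
  require(fields.length == 23, s"expected 23 fields, got ${fields.length}")
  val ints: Seq[Any] = fields.take(8).toSeq.map(_.toInt)
  val floats: Seq[Any] = fields.drop(8).toSeq.map(_.toFloat)
  // Row.fromSeq has no 22-element limit, unlike Scala tuples.
  Row.fromSeq(ints ++ floats)
}
```

An RDD[String] of such lines could then be mapped with toRow before being passed to createDataFrame together with the schema.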
Here are some of the log messages from Spark when it saved the data>>

15/06/16 10:39:22 INFO metadata.Hive: Renaming src:hdfs://service-10-0.local:8020/tmp/hive-patcharee/hive_2015-06-16_10-39-21_205_8768669104487548472-1/-ext-10000/zone=13195/z=0/year=0/month=0/part-00001;dest: hdfs://service-10-0.local:8020/apps/hive/warehouse/test4dimBySpark/zone=13195/z=0/year=0/month=0/part-00001;Status:true
15/06/16 10:39:22 INFO metadata.Hive: New loading path = hdfs://service-10-0.local:8020/tmp/hive-patcharee/hive_2015-06-16_10-39-21_205_8768669104487548472-1/-ext-10000/zone=13195/z=0/year=0/month=0 with partSpec {zone=13195, z=0, year=0, month=0}

In the raw data (pairVarRDD), zone = 2, z = 42, year = 2009 and month = 3, but Spark created the partition {zone=13195, z=0, year=0, month=0}.

When I queried the table from Hive>>

hive> select * from test4dimBySpark;
2 42 2009 3 1.0 0.0 218.0 365.0 9989.497 29.627113 19.071793 0.11982734 -3174.6812 97735.2 16.389032 -96.62891 25135.365 2.6476808E-5 0.0 13195 0 0 0
hive> select zone, z, year, month from test4dimBySpark;
13195    0    0    0
hive> dfs -ls /apps/hive/warehouse/test4dimBySpark/*/*/*/*;
Found 2 items
-rw-r--r-- 3 patcharee hdfs 1411 2015-06-16 10:39 /apps/hive/warehouse/test4dimBySpark/zone=13195/z=0/year=0/month=0/part-00001

The data stored in the table is correct (zone = 2, z = 42, year = 2009, month = 3), but the partition that was created is wrong: "zone=13195/z=0/year=0/month=0".

Is this a bug, or what could be wrong? Any suggestions are appreciated.

