Ravikumar Ramasamy created SPARK-23621:
------------------------------------------

             Summary: DataFrame.insertInto() is not persisting all columns for mixed structured data-type
                 Key: SPARK-23621
                 URL: https://issues.apache.org/jira/browse/SPARK-23621
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Ravikumar Ramasamy


The configuration data is stored in Cassandra. It is unstructured data 
containing several string columns and one string column that holds a JSON 
structure. In this case, DataFrame saveAsTable persists all the column values 
properly, but insertInto does not store all the columns: the JSON data is 
truncated and the subsequent column is not stored. 

To reproduce the issue, I stored the data in a Hive table and read it back 
from there.

 
{code:java}
CREATE TABLE zone_status (
  siteid string,
  orgid string,
  parkinggroupid string,
  parkingzoneid string,
  parkingtype string,
  capacity int,
  config string,
  ts bigint)
STORED AS TEXTFILE;
{code}
{code:java}
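import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, lit, unix_timestamp}

// Hive-enabled session with dynamic partitioning allowed
val spark = SparkSession.builder()
  .appName("Spark SQL Test")
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .enableHiveSupport()
  .getOrCreate()

val zoneStatus = spark.table("zone_status")

// saveAsTable: all columns, including the full JSON string, are persisted
zoneStatus
  .select(col("siteid"), col("orgid"), col("parkinggroupid"),
    col("parkingzoneid"), col("parkingtype"), lit(0), col("config"),
    unix_timestamp().alias("ts"))
  .write.mode(SaveMode.Overwrite).saveAsTable("dwh_zone_status_save")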
{code}
Records in the dwh_zone_status_save table:
{noformat}
a8f11f90-20c9-11e8-b93e-2fc569d27605    efe5bdb3-baac-5d8e-6cae57771c13 Unknown 
E657F298-2D96-4C7D-8516-E228153FE010    NonDemarcated   0       
{"orgid":"efe5bdb3-baac-5d8e-6cae57771c13","nodeid":"N02c00056","parkingzoneid":"E657F298-2D96-4C7D-8516-E228153FE010","siteid":"a8f11f90-20c9-11e8-b93e-2fc569d27605","channel":1,"type":"NonDemarcatedParkingConfig","active":true,"tag":"","configured_date":"2017-10-23 23:29:11.20","roi":{"roiid":"7854D5F1-9ECD-4E02-8364-7BFB15C2A01C","name":"Parking_Area_1","image_bounding_box":[[{"x":0.5083333253860474,"y":0.25468748807907104},{"x":0.6277777552604675,"y":0.45781248807907104},{"x":0.855555534362793,"y":0.42656248807907104},{"x":0.7138888835906982,"y":0.17656250298023224}]],"world_bounding_box":[[{"latitude":41.88759132852836,"longitude":-87.62231239554004},{"latitude":41.887652271934634,"longitude":-87.62230098708424},{"latitude":41.88765219325104,"longitude":-87.62227158629935},{"latitude":41.88757153728604,"longitude":-87.62227165116063}]],"vs":[5.0,1.7999999523162842,1.5]}}
        1520453589{noformat}
 
{code:java}
// insertInto: same projection, but the JSON string is truncated and ts is lost
zoneStatus
  .select(col("siteid"), col("orgid"), col("parkinggroupid"),
    col("parkingzoneid"), col("parkingtype"), lit(0), col("config"),
    unix_timestamp().alias("ts"))
  .write.mode(SaveMode.Overwrite).insertInto("dwh_zone_status_insert")
{code}
Records in the dwh_zone_status_insert table:
{noformat}
985feb70-18f4-11e8-9912-e9bbd4db7f62 efe5bdb3-baac-5d8e-6cae57771c13 Unknown 
04ABD29C-FA0F-4E4D-BFF2-4EC290DC29AE Demarcated 0 {"description":"" 
NULL{noformat}
 

The JSON string column does not contain its entire content, and the values of 
the subsequent columns are not stored in the table either. The target table is 
defined in TEXT format only.
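
One possible explanation for these symptoms, which nothing in this report 
confirms, is a line break embedded in the JSON string: a line-delimited text 
format would end the row at that point, leaving the JSON cut short and the 
trailing column empty. The sketch below is plain Scala with no Spark 
dependency. The delimiters are an assumption (Hive's default TEXTFILE layout), 
and TextRowDemo, writeRow, readRows and stripLineBreaks are hypothetical names 
used only for illustration.

{code:java}
object TextRowDemo {
  // Assumption: fields are separated by \u0001 (Ctrl-A) and rows by a
  // line feed, as in Hive's default TEXTFILE layout.
  val FieldSep = "\u0001"
  val RowSep = "\n"

  // Write one record the way a line-delimited text format would.
  def writeRow(fields: Seq[String]): String = fields.mkString(FieldSep)

  // Read rows back; a column missing from a line comes back as None (NULL).
  def readRows(data: String, numCols: Int): Seq[Seq[Option[String]]] =
    data.split(RowSep).toSeq.map { line =>
      val parts = line.split(FieldSep, -1).toSeq
      (0 until numCols).map(i => parts.lift(i))
    }

  // Hypothetical helper: collapse line breaks so a record stays on one line.
  def stripLineBreaks(s: String): String = s.replaceAll("[\\r\\n]+", " ")

  def main(args: Array[String]): Unit = {
    // Made-up JSON value with an embedded line feed after the first field.
    val config = "{\"description\":\"\"\n,\"channel\":1}"
    val ts = "1520453589"

    // The line feed splits the record: the first row ends inside the JSON
    // and its ts column is None.
    val broken = readRows(writeRow(Seq("Demarcated", "0", config, ts)), 4)
    println(broken.head)

    // With line breaks removed, the record reads back as one complete row.
    val cleaned = Seq("Demarcated", "0", stripLineBreaks(config), ts)
    println(readRows(writeRow(cleaned), 4).head)
  }
}
{code}

In Spark, a comparable cleanup could be applied to the config column with 
regexp_replace from org.apache.spark.sql.functions before calling insertInto. 
Whether that resolves this issue has not been tested here.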

Our environment is:

Scala 2.11.8
Spark 2.2.0
Hive
EMR