Github user RaghavendraS commented on the issue:

    https://github.com/apache/spark/pull/13779
  
    Thanks @AmplabJenkins, @hvanhovell, @marmbrus, @akatz
    
    **In my case:** When we fetch incremental data from MongoDB and store it
in a Parquet file, we get a NullType error, because Parquet has no NullType
data type. So I came up with the solutions below.
    
    **Case-1: Convert NullType to StringType.**
    This lets us union the last n days of incremental Parquet data without
any error. We only need to compare the schemas from bottom to top, change
the data types accordingly, apply the adjusted schema to the data frames,
and union them from bottom to top. A rough sketch is shown below.
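
    For what it's worth, here is a rough sketch of Case-1 in Scala. The
helper name `nullTypeToString` and the variable `df` are only illustrative,
not part of this PR:

    ```scala
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.{NullType, StringType}

    // Cast every NullType column to StringType so the frame can be written
    // to Parquet, which has no NullType. `df` is one incremental batch.
    def nullTypeToString(df: DataFrame): DataFrame = {
      df.schema.fields.foldLeft(df) { (acc, field) =>
        if (field.dataType == NullType)
          acc.withColumn(field.name, col(field.name).cast(StringType))
        else
          acc
      }
    }
    ```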
    
    **Case-2: Drop the NullType fields.**
    In this case we need to transform each RDD according to the final
schema. In Case-1 we only transform the schema, not the RDD itself, so
Case-1 is better than Case-2. A sketch of Case-2 follows below as well.
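
    A rough sketch of Case-2, dropping the NullType columns (again,
`dropNullTypeColumns` is only an illustrative name, and this assumes a
Spark version whose `DataFrame.drop` accepts multiple column names):

    ```scala
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.NullType

    // Drop every NullType column so the remaining columns can be written
    // to Parquet and later aligned with the final schema.
    def dropNullTypeColumns(df: DataFrame): DataFrame = {
      val nullCols = df.schema.fields.filter(_.dataType == NullType).map(_.name)
      df.drop(nullCols: _*)
    }
    ```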
    
     @hvanhovell  Please let me know if you know any other solution.

