GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18590#discussion_r126597140
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -806,43 +786,43 @@ def _parse_datatype_string(s):
         >>> _parse_datatype_string("blabla") # doctest: 
+IGNORE_EXCEPTION_DETAIL
         Traceback (most recent call last):
             ...
    -    ValueError:...
    +    ParseException:...
         >>> _parse_datatype_string("a: int,") # doctest: 
+IGNORE_EXCEPTION_DETAIL
         Traceback (most recent call last):
             ...
    -    ValueError:...
    +    ParseException:...
         >>> _parse_datatype_string("array<int") # doctest: 
+IGNORE_EXCEPTION_DETAIL
         Traceback (most recent call last):
             ...
    -    ValueError:...
    +    ParseException:...
         >>> _parse_datatype_string("map<int, boolean>>") # doctest: 
+IGNORE_EXCEPTION_DETAIL
         Traceback (most recent call last):
             ...
    -    ValueError:...
    +    ParseException:...
         """
    -    s = s.strip()
    -    if s.startswith("array<"):
    -        if s[-1] != ">":
    -            raise ValueError("'>' should be the last char, but got: %s" % s)
    -        return ArrayType(_parse_datatype_string(s[6:-1]))
    -    elif s.startswith("map<"):
    -        if s[-1] != ">":
    -            raise ValueError("'>' should be the last char, but got: %s" % s)
    -        parts = _ignore_brackets_split(s[4:-1], ",")
    -        if len(parts) != 2:
    -            raise ValueError("The map type string format is: 
'map<key_type,value_type>', " +
    -                             "but got: %s" % s)
    -        kt = _parse_datatype_string(parts[0])
    -        vt = _parse_datatype_string(parts[1])
    -        return MapType(kt, vt)
    -    elif s.startswith("struct<"):
    -        if s[-1] != ">":
    -            raise ValueError("'>' should be the last char, but got: %s" % s)
    -        return _parse_struct_fields_string(s[7:-1])
    -    elif ":" in s:
    -        return _parse_struct_fields_string(s)
    -    else:
    -        return _parse_basic_datatype_string(s)
    +    sc = SparkContext._active_spark_context
    +
    +    def from_ddl_schema(type_str):
    +        return _parse_datatype_json_string(
    +            sc._jvm.org.apache.spark.sql.types.StructType.fromDDL(type_str).json())
    +
    +    def from_ddl_datatype(type_str):
    +        return _parse_datatype_json_string(
    +            sc._jvm.org.apache.spark.sql.api.python.PythonSQLUtils.parseDataType(type_str).json())
    +
    +    try:
    +        # DDL format, "fieldname datatype, fieldname datatype".
    +        return from_ddl_schema(s)
    +    except Exception as e:
    +        try:
    +            # For backwards compatibility, "integer", "struct<fieldname: datatype>" and etc.
    +            return from_ddl_datatype(s)
    +        except:
    +            try:
    +                # For backwards compatibility, "fieldname: datatype, fieldname: datatype" case.
    --- End diff ---
    
    won't `fieldname: datatype, fieldname: datatype` already be parsed as a DDL schema by the first `from_ddl_schema` branch?
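
    As an aside, here is a minimal sketch of how this could be checked,
    assuming a local SparkSession and the `_parse_datatype_string` from this
    diff (the outcome of the second call below is exactly the open question,
    not a confirmed result):

        # Illustrative only: probe which inputs the new fallback chain accepts.
        from pyspark.sql import SparkSession
        from pyspark.sql.types import _parse_datatype_string

        # An active SparkContext is required, since the new implementation
        # delegates to the JVM via SparkContext._active_spark_context.
        spark = SparkSession.builder.master("local[1]").getOrCreate()

        # Plain DDL schema, "fieldname datatype": handled by from_ddl_schema.
        print(_parse_datatype_string("a int, b string"))

        # The case in question: if StructType.fromDDL also accepts the
        # colon-separated form, the third fallback (wrapping the input in
        # "struct<...>") is never reached for it.
        print(_parse_datatype_string("a: int, b: string"))

    If the second call succeeds in `from_ddl_schema`, the third `except`
    branch is dead code for that input; if `fromDDL` rejects it, the
    `struct<...>` wrapping fallback is still needed.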

