[jira] [Commented] (SPARK-5722) Infer_schema_type incorrect for Integers in pyspark

2015-02-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325209#comment-14325209
 ] 

Apache Spark commented on SPARK-5722:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/4666

 Infer_schema_type incorrect for Integers in pyspark
 ---

 Key: SPARK-5722
 URL: https://issues.apache.org/jira/browse/SPARK-5722
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
Reporter: Don Drake

 The Integers datatype in Python does not match what a Scala/Java integer is 
 defined as.   This causes inference of data types and schemas to fail when 
 data is larger than 2^32 and it is inferred incorrectly as an Integer.
 Since the range of valid Python integers is wider than Java Integers, this 
 causes problems when inferring Integer vs. Long datatypes.  This will cause 
 problems when attempting to save SchemaRDD as Parquet or JSON.
 Here's an example:
 {code}
  sqlCtx = SQLContext(sc)
  from pyspark.sql import Row
  rdd = sc.parallelize([Row(f1='a', f2=100)])
  srdd = sqlCtx.inferSchema(rdd)
  srdd.schema()
 StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true)))
 {code}
 That number is a LongType in Java, but an Integer in python.  We need to 
 check the value to see if it should really by a LongType when a IntegerType 
 is initially inferred.
 More tests:
 {code}
  from pyspark.sql import _infer_type
 # OK
  print _infer_type(1)
 IntegerType
 # OK
  print _infer_type(2**31-1)
 IntegerType
 #WRONG
  print _infer_type(2**31)
 #WRONG
 IntegerType
  print _infer_type(2**61 )
 #OK
 IntegerType
  print _infer_type(2**71 )
 LongType
 {code}
 Java Primitive Types defined:
 http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
 Python Built-in Types:
 https://docs.python.org/2/library/stdtypes.html#typesnumeric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5722) Infer_schema_type incorrect for Integers in pyspark

2015-02-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323641#comment-14323641
 ] 

Apache Spark commented on SPARK-5722:
-

User 'dondrake' has created a pull request for this issue:
https://github.com/apache/spark/pull/4641

 Infer_schema_type incorrect for Integers in pyspark
 ---

 Key: SPARK-5722
 URL: https://issues.apache.org/jira/browse/SPARK-5722
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
Reporter: Don Drake

 The Integers datatype in Python does not match what a Scala/Java integer is 
 defined as.   This causes inference of data types and schemas to fail when 
 data is larger than 2^32 and it is inferred incorrectly as an Integer.
 Since the range of valid Python integers is wider than Java Integers, this 
 causes problems when inferring Integer vs. Long datatypes.  This will cause 
 problems when attempting to save SchemaRDD as Parquet or JSON.
 Here's an example:
 {code}
  sqlCtx = SQLContext(sc)
  from pyspark.sql import Row
  rdd = sc.parallelize([Row(f1='a', f2=100)])
  srdd = sqlCtx.inferSchema(rdd)
  srdd.schema()
 StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true)))
 {code}
 That number is a LongType in Java, but an Integer in python.  We need to 
 check the value to see if it should really by a LongType when a IntegerType 
 is initially inferred.
 More tests:
 {code}
  from pyspark.sql import _infer_type
 # OK
  print _infer_type(1)
 IntegerType
 # OK
  print _infer_type(2**31-1)
 IntegerType
 #WRONG
  print _infer_type(2**31)
 #WRONG
 IntegerType
  print _infer_type(2**61 )
 #OK
 IntegerType
  print _infer_type(2**71 )
 LongType
 {code}
 Java Primitive Types defined:
 http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
 Python Built-in Types:
 https://docs.python.org/2/library/stdtypes.html#typesnumeric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5722) Infer_schema_type incorrect for Integers in pyspark

2015-02-11 Thread Don Drake (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316940#comment-14316940
 ] 

Don Drake commented on SPARK-5722:
--

Hi, I've submitted 2 pull requests for branch-1.2 and branch-1.3.

Please approve.

 Infer_schema_type incorrect for Integers in pyspark
 ---

 Key: SPARK-5722
 URL: https://issues.apache.org/jira/browse/SPARK-5722
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
Reporter: Don Drake

 The Integers datatype in Python does not match what a Scala/Java integer is 
 defined as.   This causes inference of data types and schemas to fail when 
 data is larger than 2^32 and it is inferred incorrectly as an Integer.
 Since the range of valid Python integers is wider than Java Integers, this 
 causes problems when inferring Integer vs. Long datatypes.  This will cause 
 problems when attempting to save SchemaRDD as Parquet or JSON.
 Here's an example:
 {code}
  sqlCtx = SQLContext(sc)
  from pyspark.sql import Row
  rdd = sc.parallelize([Row(f1='a', f2=100)])
  srdd = sqlCtx.inferSchema(rdd)
  srdd.schema()
 StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true)))
 {code}
 That number is a LongType in Java, but an Integer in python.  We need to 
 check the value to see if it should really by a LongType when a IntegerType 
 is initially inferred.
 More tests:
 {code}
  from pyspark.sql import _infer_type
 # OK
  print _infer_type(1)
 IntegerType
 # OK
  print _infer_type(2**31-1)
 IntegerType
 #WRONG
  print _infer_type(2**31)
 #WRONG
 IntegerType
  print _infer_type(2**61 )
 #OK
 IntegerType
  print _infer_type(2**71 )
 LongType
 {code}
 Java Primitive Types defined:
 http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
 Python Built-in Types:
 https://docs.python.org/2/library/stdtypes.html#typesnumeric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5722) Infer_schema_type incorrect for Integers in pyspark

2015-02-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316936#comment-14316936
 ] 

Apache Spark commented on SPARK-5722:
-

User 'dondrake' has created a pull request for this issue:
https://github.com/apache/spark/pull/4538

 Infer_schema_type incorrect for Integers in pyspark
 ---

 Key: SPARK-5722
 URL: https://issues.apache.org/jira/browse/SPARK-5722
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
Reporter: Don Drake

 The Integers datatype in Python does not match what a Scala/Java integer is 
 defined as.   This causes inference of data types and schemas to fail when 
 data is larger than 2^32 and it is inferred incorrectly as an Integer.
 Since the range of valid Python integers is wider than Java Integers, this 
 causes problems when inferring Integer vs. Long datatypes.  This will cause 
 problems when attempting to save SchemaRDD as Parquet or JSON.
 Here's an example:
 {code}
  sqlCtx = SQLContext(sc)
  from pyspark.sql import Row
  rdd = sc.parallelize([Row(f1='a', f2=100)])
  srdd = sqlCtx.inferSchema(rdd)
  srdd.schema()
 StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true)))
 {code}
 That number is a LongType in Java, but an Integer in python.  We need to 
 check the value to see if it should really by a LongType when a IntegerType 
 is initially inferred.
 More tests:
 {code}
  from pyspark.sql import _infer_type
 # OK
  print _infer_type(1)
 IntegerType
 # OK
  print _infer_type(2**31-1)
 IntegerType
 #WRONG
  print _infer_type(2**31)
 #WRONG
 IntegerType
  print _infer_type(2**61 )
 #OK
 IntegerType
  print _infer_type(2**71 )
 LongType
 {code}
 Java Primitive Types defined:
 http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
 Python Built-in Types:
 https://docs.python.org/2/library/stdtypes.html#typesnumeric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5722) Infer_schema_type incorrect for Integers in pyspark

2015-02-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315569#comment-14315569
 ] 

Apache Spark commented on SPARK-5722:
-

User 'dondrake' has created a pull request for this issue:
https://github.com/apache/spark/pull/4521

 Infer_schema_type incorrect for Integers in pyspark
 ---

 Key: SPARK-5722
 URL: https://issues.apache.org/jira/browse/SPARK-5722
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
Reporter: Don Drake

 The Integers datatype in Python does not match what a Scala/Java integer is 
 defined as.   This causes inference of data types and schemas to fail when 
 data is larger than 2^32 and it is inferred incorrectly as an Integer.
 Since the range of valid Python integers is wider than Java Integers, this 
 causes problems when inferring Integer vs. Long datatypes.  This will cause 
 problems when attempting to save SchemaRDD as Parquet or JSON.
 Here's an example:
 {code}
  sqlCtx = SQLContext(sc)
  from pyspark.sql import Row
  rdd = sc.parallelize([Row(f1='a', f2=100)])
  srdd = sqlCtx.inferSchema(rdd)
  srdd.schema()
 StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true)))
 {code}
 That number is a LongType in Java, but an Integer in python.  We need to 
 check the value to see if it should really by a LongType when a IntegerType 
 is initially inferred.
 More tests:
 {code}
  from pyspark.sql import _infer_type
 # OK
  print _infer_type(1)
 IntegerType
 # OK
  print _infer_type(2**31-1)
 IntegerType
 #WRONG
  print _infer_type(2**31)
 #WRONG
 IntegerType
  print _infer_type(2**61 )
 #OK
 IntegerType
  print _infer_type(2**71 )
 LongType
 {code}
 Java Primitive Types defined:
 http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
 Python Built-in Types:
 https://docs.python.org/2/library/stdtypes.html#typesnumeric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org