[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-19 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
fixed





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-19 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
We just change the dirty hack to something like:
```
if python 2.7 and not PyPy:
    do the dirty hack
    fool the unit test
```
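
A minimal sketch of what that guard might look like (the names and placement here are my assumption, not the actual patch):
```python
import platform
import sys

# Hypothetical guard for the SPARK-21465 workaround: apply it only on
# CPython 2, not on PyPy (where 'L' was never supported to begin with).
is_cpython2 = (sys.version_info[0] == 2
               and platform.python_implementation() != "PyPy")

if is_cpython2:
    # ... apply the dirty hack and relax the corresponding test expectation ...
    pass
```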






[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-19 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
@ueshin Maybe we don't have to do the same thing. If this is the problem, then I think 'L' was never supported in the PyPy environment in the first place, and there is no need to support it now.






[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-19 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
Hmm, how was 'L' supported on master with Python 2? Why was there no such error there?






[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-19 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
Looks to be an unrelated error.






[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
@HyukjinKwon Take a look at my newest commit. I think I found a better way to solve the problem: it keeps all the workaround code for `SPARK-21465` in a single place, making it easier to remove in the future. What do you think?





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128142511
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

https://issues.apache.org/jira/browse/SPARK-21465





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128141018
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

How about doing the following?
1. Make the changes to the tests as you suggest, and clearly document in comments why 'L' is an exception for Python 2 (a sketch follows below).
2. Open a new JIRA explaining the issue with 'L' support.
3. Finish this PR.
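
A rough sketch of what the test tweak in step 1 could look like (my assumption, not the committed code):
```python
import array
import sys

# Illustrative list of typecodes the test treats as supported (assumed here).
supported_types = ['b', 'B', 'h', 'H', 'i', 'I', 'l', 'f', 'd', 'u']

unsupported_types = set(array.typecodes) - set(supported_types)
if sys.version_info[0] < 3:
    # SPARK-21465: 'L' happens to round-trip on CPython 2, so leave its
    # status undefined here instead of asserting that it raises TypeError.
    unsupported_types.discard('L')

print(sorted(unsupported_types))
```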





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128141384
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

OK. Let me do this. If possible, I would like to finish this PR before I travel.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128138629
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

If we fall back to `_infer_type`, then some dirty changes to the test cases are needed to make them pass. Consider the following questions:
1. Should we add 'L' as an exception for Python 2 in the unsupported-types tests, or should we remove the unsupported-types tests entirely?
2. Should we test 'L' for Python 2?

I really like how the tests are organized now, and the changes above would make them very messy. My opinion is that we are not changing the status of 'L' from "supported" to "unsupported", but from "undefined support status" to "unsupported". If changing from "undefined support status" to "unsupported" sounds bad, then instead of making these changes to the test cases I would rather solve this problem now and keep the test cases clean.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128132778
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

@HyukjinKwon what do you think?





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128132584
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

Actually, I don't think 'L' should be supported, because there is no type in Scala that can hold a 64-bit unsigned integer...
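
For concreteness, a quick check of the ranges involved (assuming an LP64 platform where C `unsigned long` is 64 bits):
```python
import ctypes

bits = ctypes.sizeof(ctypes.c_ulong) * 8   # 64 on LP64 platforms (e.g. 64-bit Linux)
max_unsigned = 2 ** bits - 1               # 18446744073709551615 for 64 bits
scala_long_max = 2 ** 63 - 1               # largest value a Scala Long can hold
print(max_unsigned > scala_long_max)       # True on LP64: 'L' cannot map to LongType safely
```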





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128132120
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

@HyukjinKwon So the conclusion is that we still support everything we supported before.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128131993
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

The test result for Python 2 is here:
```python
Python 2.7.13 (default, Jul  2 2017, 22:24:59) 
[GCC 7.1.1 20170621] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
17/07/18 20:59:26 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
17/07/18 20:59:26 WARN Utils: Your hostname, archlinux resolves to a 
loopback address: 127.0.0.1; using 192.168.88.2 instead (on interface eno1)
17/07/18 20:59:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
another address
17/07/18 20:59:29 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Python version 2.7.13 (default, Jul  2 2017 22:24:59)
SparkSession available as 'spark'.
>>> import array,sys
>>> from pyspark import *
>>> from pyspark.sql import *
>>> 
>>> def assertCollectSuccess(typecode, value):
...  row = Row(myarray=array.array(typecode, [value]))
...  df = spark.createDataFrame([row])
...  print(typecode)
...  df.show()
... 
>>> assertCollectSuccess('u',u'a')
u
+---+
|myarray|
+---+
|[a]|
+---+

>>> assertCollectSuccess('f',1.2)
f
+---+
|myarray|
+---+
| [null]|
+---+

>>> assertCollectSuccess('d',1.2)
d
+---+
|myarray|
+---+
|  [1.2]|
+---+

>>> assertCollectSuccess('b',1)
b
+---+
|myarray|
+---+
| [null]|
+---+

>>> assertCollectSuccess('B',1)
B
+---+
|myarray|
+---+
| [null]|
+---+

>>> assertCollectSuccess('h',1)
h
+---+
|myarray|
+---+
| [null]|
+---+

>>> assertCollectSuccess('H',1)
H
+---+
|myarray|
+---+
|[1]|
+---+

>>> assertCollectSuccess('i',1)
i
+---+
|myarray|
+---+
|[1]|
+---+

>>> assertCollectSuccess('I',1)
I
+---+
|myarray|
+---+
|[1]|
+---+

>>> assertCollectSuccess('l',1)
l
+---+
|myarray|
+---+
|[1]|
+---+

>>> assertCollectSuccess('L',1)
L
+---+
|myarray|
+---+
|[1]|
+---+

>>> 
>>> 
>>> if sys.version_info[0] > 2:
...     assertCollectSuccess('q',1)
...     assertCollectSuccess('Q',1)
... 
>>> 
>>> if sys.version_info[0] < 3:
...     assertCollectSuccess('c','a')
... 
c
+---+
|myarray|
+---+
|[a]|
+---+

>>>
```





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128131853
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

The test result for Python 3 & Spark 2.2.0 is here. I will check Python 2 in a few minutes. I pasted the whole command-line output; it's a bit lengthy. @HyukjinKwon could you please double-check that my test is complete?

```python
>>> import array,sys
>>> from pyspark import *
>>> from pyspark.sql import *
>>> 
>>> def assertCollectSuccess(typecode, value):
...  row = Row(myarray=array.array(typecode, [value]))
...  df = spark.createDataFrame([row])
...  print(typecode)
...  df.show()
... 
>>> assertCollectSuccess('u',u'a')
u
+---+
|myarray|
+---+
|[a]|
+---+

>>> assertCollectSuccess('f',1.2)
f
+---+
|myarray|
+---+
| [null]|
+---+

>>> assertCollectSuccess('d',1.2)
d
+---+
|myarray|
+---+
|  [1.2]|
+---+

>>> assertCollectSuccess('b',1)
b
+---+
|myarray|
+---+
| [null]|
+---+

>>> assertCollectSuccess('B',1)
B
+---+
|myarray|
+---+
| [null]|
+---+

>>> assertCollectSuccess('h',1)
h
+---+
|myarray|
+---+
| [null]|
+---+

>>> assertCollectSuccess('H',1)
H
+---+
|myarray|
+---+
|[1]|
+---+

>>> assertCollectSuccess('i',1)
i
+---+
|myarray|
+---+
|[1]|
+---+

>>> assertCollectSuccess('I',1)
I
+---+
|myarray|
+---+
|[1]|
+---+

>>> assertCollectSuccess('l',1)
l
+---+
|myarray|
+---+
|[1]|
+---+

>>> assertCollectSuccess('L',1)
L
17/07/18 20:55:55 ERROR Executor: Exception in task 14.0 in stage 119.0 
(TID 799)
net.razorvine.pickle.PickleException: unsupported datatype: 64-bits 
unsigned long
at 
net.razorvine.pickle.objects.ArrayConstructor.constructLongArrayFromUInt64(ArrayConstructor.java:302)
at 
net.razorvine.pickle.objects.ArrayConstructor.construct(ArrayConstructor.java:240)
at 
net.razorvine.pickle.objects.ArrayConstructor.construct(ArrayConstructor.java:26)
at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:152)
at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:151)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.s
```

[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128130893
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala 
---
@@ -55,13 +55,12 @@ private[spark] object SerDeUtil extends Logging {
 //{'d', sizeof(double), d_getitem, d_setitem},
 //{'\0', 0, 0, 0} /* Sentinel */
 //  };
-// TODO: support Py_UNICODE with 2 bytes
 val machineCodes: Map[Char, Int] = if 
(ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN)) {
-  Map('c' -> 1, 'B' -> 0, 'b' -> 1, 'H' -> 3, 'h' -> 5, 'I' -> 7, 'i' 
-> 9,
+  Map('B' -> 0, 'b' -> 1, 'H' -> 3, 'h' -> 5, 'I' -> 7, 'i' -> 9,
 'L' -> 11, 'l' -> 13, 'f' -> 15, 'd' -> 17, 'u' -> 21
   )
 } else {
-  Map('c' -> 1, 'B' -> 0, 'b' -> 1, 'H' -> 2, 'h' -> 4, 'I' -> 6, 'i' 
-> 8,
+  Map('B' -> 0, 'b' -> 1, 'H' -> 2, 'h' -> 4, 'I' -> 6, 'i' -> 8,
--- End diff --

Yes





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
By the way, I'm traveling tomorrow and will be back next Tuesday. While traveling, I may not be able to respond to comments, questions, etc.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128129716
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

Yes, this is worth a double check. Let me run some tests on an older Spark.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128129280
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

If we fall back to `_infer_type` here, then `array('q')` will be inferred as something like an array of Long, and the user will then see an error from `net.razorvine.pickle` saying bad array typecode `q` (due to `SPARK-21420`), which seems to me more confusing than failing explicitly here.
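
For illustration, this is roughly the behavior being argued for, using the mapping from this PR (a hypothetical Python 3 shell session; `spark` is the session created by the PySpark shell):
```python
from array import array
from pyspark.sql import Row

# 'q' is absent from _array_type_mappings, so inference fails fast with a clear
# message instead of a pickle error deep inside the JVM.
try:
    spark.createDataFrame([Row(myarray=array('q', [1]))])
except TypeError as exc:
    print(exc)  # e.g. "not supported type: array(q)"
```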





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128128561
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala 
---
@@ -55,13 +55,12 @@ private[spark] object SerDeUtil extends Logging {
 //{'d', sizeof(double), d_getitem, d_setitem},
 //{'\0', 0, 0, 0} /* Sentinel */
 //  };
-// TODO: support Py_UNICODE with 2 bytes
 val machineCodes: Map[Char, Int] = if 
(ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN)) {
-  Map('c' -> 1, 'B' -> 0, 'b' -> 1, 'H' -> 3, 'h' -> 5, 'I' -> 7, 'i' 
-> 9,
+  Map('B' -> 0, 'b' -> 1, 'H' -> 3, 'h' -> 5, 'I' -> 7, 'i' -> 9,
 'L' -> 11, 'l' -> 13, 'f' -> 15, 'd' -> 17, 'u' -> 21
   )
 } else {
-  Map('c' -> 1, 'B' -> 0, 'b' -> 1, 'H' -> 2, 'h' -> 4, 'I' -> 6, 'i' 
-> 8,
+  Map('B' -> 0, 'b' -> 1, 'H' -> 2, 'h' -> 4, 'I' -> 6, 'i' -> 8,
--- End diff --

`c` was supported by spark 2.2.0:
```python
>>> row = Row(myarray=array('c', ["a"]))
>>> df = spark.createDataFrame([row])
>>> df.first()["myarray"][0]
u'a'
```

I think this support existed because arrays were inferred the same way as lists. After we change the type inference for arrays, we have to change this code accordingly to keep it working.
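
To illustrate the point, a minimal check of the element-driven inference the old list branch used (`_infer_type` is a private helper, imported here only for demonstration):
```python
from pyspark.sql.types import _infer_type

# The old code sent arrays through the same branch as lists, so the first
# element (a Python 2 str for array('c', ['a'])) decided the element type.
print(_infer_type(['a']))   # an ArrayType whose element type is StringType
```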





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r128127304
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1016,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), False)
+else:
+raise TypeError("not supported type: array(%s)" % obj.typecode)
--- End diff --

Is there any reason to do so? I don't think an array with an unsupported typecode will be correctly serialized or deserialized. In this case, it would be better to raise a TypeError and let the user pick another type.





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-18 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
@HyukjinKwon all done





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-17 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127814477
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1023,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
--- End diff --

I don't think `i in my_dict` and `i in my_dict.keys()` differ much in performance. A simple test
```python
from random import random, shuffle
from time import time

N = 1000
d = {}
for i in range(N):
    d[i] = random()

tests = list(range(2*N))
shuffle(tests)

time1 = time()

dummy = 0
for i in tests:
    if i in d:
        dummy += 1

time2 = time()

dummy = 0
for i in tests:
    if i in d.keys():
        dummy += 1

time3 = time()

print(time2-time1)
print(time3-time2)
```
gives
```
11.389298915863037
12.40167999268
```
on my MacBook Pro
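
For what it's worth, the same comparison can also be run with the standard `timeit` module (not from the thread, just a convenience):
```python
import timeit

setup = "from random import random; d = {i: random() for i in range(1000)}"
print(timeit.timeit("500 in d", setup=setup))         # membership test on the dict
print(timeit.timeit("500 in d.keys()", setup=setup))  # membership test on d.keys()
```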





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-14 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127555013
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala 
---
@@ -57,11 +57,11 @@ private[spark] object SerDeUtil extends Logging {
 //  };
 // TODO: support Py_UNICODE with 2 bytes
--- End diff --

Since `array('u')` passes the tests, should this TODO be removed?





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-14 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
I updated my code according to @HyukjinKwon's suggestion





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-14 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127521525
  
--- Diff: python/pyspark/sql/types.py ---
@@ -915,6 +916,90 @@ def _parse_datatype_json_value(json_value):
 long: LongType,
 })
 
+# Mapping Python array types to Spark SQL DataType
+# We should be careful here. The size of these types in python depends on C
+# implementation. We need to make sure that this conversion does not lose 
any
+# precision.
+#
+# Reference for C integer size, see:
+# ISO/IEC 9899:201x specification, chapter 5.2.4.2.1 Sizes of integer 
types .
+# Reference for python array typecode, see:
+# https://docs.python.org/2/library/array.html
+# https://docs.python.org/3.6/library/array.html
+
+_array_int_typecode_ctype_mappings = {
+'b': ctypes.c_byte,
+'B': ctypes.c_ubyte,
+'h': ctypes.c_short,
+'H': ctypes.c_ushort,
+'i': ctypes.c_int,
+'I': ctypes.c_uint,
+'l': ctypes.c_long,
+'L': ctypes.c_ulong
+}
+
+# TODO: Uncomment this when 'q' and 'Q' are supported by 
net.razorvine.pickle
+# Type code 'q' and 'Q' are not available at python 2
+# if sys.version_info[0] >= 3:
+# _array_int_typecode_ctype_mappings.update({
+# 'q': ctypes.c_longlong,
+# 'Q': ctypes.c_ulonglong
+# })
+
+
+def _int_size_to_type(size):
+"""
+Return the Scala type from the size of integers.
+"""
+if size <= 8:
+return ByteType
+if size <= 16:
+return ShortType
+if size <= 32:
+return IntegerType
+if size <= 64:
+return LongType
+raise TypeError("not supported type: integer size too large.")
+
+# The list of all supported array typecodes is stored here
+_array_type_mappings = {
+# Warning: Actual properties for float and double in C is not 
specified in C.
+# On almost every system supported by both python and JVM, they are 
IEEE 754
+# single-precision binary floating-point format and IEEE 754 
double-precision
+# binary floating-point format. And we do assume the same thing here 
for now.
+'f': FloatType,
+'d': DoubleType
+}
+
+# compute array typecode mappings for integer types
+for _typecode in _array_int_typecode_ctype_mappings.keys():
+size = ctypes.sizeof(_array_int_typecode_ctype_mappings[_typecode]) * 8
+if _typecode.isupper():  # 1 extra bit is required to store unsigned 
types
--- End diff --

This is actually a good suggestion!
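
For reference, a standalone sketch of the sizing rule quoted in the diff above (this mirrors the mapping rather than importing the actual pyspark module):
```python
import ctypes

def spark_type_name_for(typecode, ctype):
    """Pick the smallest Spark SQL integral type that can hold the C type."""
    bits = ctypes.sizeof(ctype) * 8
    if typecode.isupper():          # unsigned: one extra bit to keep the full range
        bits += 1
    for name, limit in [("ByteType", 8), ("ShortType", 16),
                        ("IntegerType", 32), ("LongType", 64)]:
        if bits <= limit:
            return name
    raise TypeError("not supported type: integer size too large.")

print(spark_type_name_for('i', ctypes.c_int))   # IntegerType on common platforms
print(spark_type_name_for('I', ctypes.c_uint))  # LongType: 32 unsigned bits need 33
# spark_type_name_for('L', ctypes.c_ulong) raises on LP64 platforms (65 bits needed)
```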





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-14 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127513810
  
--- Diff: python/pyspark/sql/types.py ---
@@ -935,6 +936,90 @@ def _parse_datatype_json_value(json_value):
 long: LongType,
 })
 
+# Mapping Python array types to Spark SQL DataType
+# We should be careful here. The size of these types in python depends on C
+# implementation. We need to make sure that this conversion does not lose 
any
+# precision.
+#
+# Reference for C integer size, see:
+# ISO/IEC 9899:201x specification, chapter 5.2.4.2.1 Sizes of integer 
types .
+# Reference for python array typecode, see:
+# https://docs.python.org/2/library/array.html
+# https://docs.python.org/3.6/library/array.html
+
+_array_int_typecode_ctype_mappings = {
+'b': ctypes.c_byte,
+'B': ctypes.c_ubyte,
+'h': ctypes.c_short,
+'H': ctypes.c_ushort,
+'i': ctypes.c_int,
+'I': ctypes.c_uint,
+'l': ctypes.c_long,
+'L': ctypes.c_ulong
+}
+
+# TODO: Uncomment this when 'q' and 'Q' are supported by 
net.razorvine.pickle
--- End diff --

I created a JIRA at https://issues.apache.org/jira/browse/SPARK-21420 and updated the comment here to add a reference to it. I didn't remove this TODO because I think it might help with a future fix. Do you prefer to remove it?





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-13 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
Finally, all tests pass...
I'm happy that my test case helped find two more bugs, and I thank @ueshin for fixing them :)





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-11 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r126809517
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala 
---
@@ -72,7 +72,11 @@ private[spark] object SerDeUtil extends Logging {
 val typecode = args(0).asInstanceOf[String].charAt(0)
 // This must be ISO 8859-1 / Latin 1, not UTF-8, to interoperate 
correctly
 val data = 
args(1).asInstanceOf[String].getBytes(StandardCharsets.ISO_8859_1)
--- End diff --

Can anyone explain why `ISO_8859_1` is used here instead of UTF-16 or UTF-32?





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-11 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r126806942
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala 
---
@@ -72,7 +72,11 @@ private[spark] object SerDeUtil extends Logging {
 val typecode = args(0).asInstanceOf[String].charAt(0)
 // This must be ISO 8859-1 / Latin 1, not UTF-8, to interoperate 
correctly
 val data = 
args(1).asInstanceOf[String].getBytes(StandardCharsets.ISO_8859_1)
-construct(typecode, machineCodes(typecode), data)
+val machine_code = machineCodes(typecode)
+// fix data alignment
+val unit_length = if (machine_code==18 || machine_code==19) 2 else 
4
+val aligned_data = data ++ Array.fill[Byte](unit_length - 
data.length % unit_length)(0)
--- End diff --

Not done yet. I think this will only work on little-endian systems.
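
A small illustration of why the padding direction matters (this is my reading of the issue; the actual alignment bug may be different):
```python
import struct

# If the final 16-bit code unit arrives short of its high-order byte, appending
# a zero byte reconstructs the right value only under little-endian layout.
low_byte = b"\x41"                                  # 'A' is U+0041; low byte only
print(struct.unpack("<H", low_byte + b"\x00")[0])   # 65    -> correct (little-endian)
print(struct.unpack(">H", low_byte + b"\x00")[0])   # 16640 -> wrong   (big-endian)
```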





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-11 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
For some reason, I cannot reproduce the error on my machine. I ran the test using the following command:
```bash
PYSPARK_PYTHON=$(which python2) ./bin/spark-submit 
python/pyspark/sql/tests.py SQLTests.test_array_types 2>/dev/null
```
and I always get a pass...

So I have to commit and push to let Jenkins run the test to see if it will 
pass...





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-07 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
```
net.razorvine.pickle.PickleException: for c/u type must be 18/19/20/21
at 
net.razorvine.pickle.objects.ArrayConstructor.construct(ArrayConstructor.java:154)
at 
org.apache.spark.api.python.SerDeUtil$ArrayConstructor.construct(SerDeUtil.scala:75)
at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:162)
```
strange





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-06 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r125910692
  
--- Diff: python/pyspark/sql/types.py ---
@@ -958,12 +1043,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), True)
--- End diff --

Yes, you are right
```python
>>> from array import *
>>> for i in set(typecodes):
...     try:
...         array(i,[None])
...         print(i,'success')
...     except:
...         print(i,'fail')
... 
L fail
I fail
b fail
B fail
i fail
H fail
q fail
Q fail
f fail
l fail
u fail
h fail
d fail
```





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-05 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
Maybe the bug is in py4j





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-05 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
There seems to be another bug somewhere in Spark:
```python
from array import array

# sc is the SparkContext provided by the PySpark shell
rdd = sc.parallelize([array('l', [9223372036854775807])])
print(rdd._to_java_object_rdd().collect()[0][0])
```
The above code gives -1 on Python 2.





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-05 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
It is strange...
The test passes on Python 3, but on Python 2 with typecode `l`, `9223372036854775807` becomes `-1` after creating the `DataFrame` and 





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-05 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r125642042
  
--- Diff: python/pyspark/sql/types.py ---
@@ -935,6 +936,86 @@ def _parse_datatype_json_value(json_value):
 long: LongType,
 })
 
+# Mapping Python array types to Spark SQL DataType
+# We should be careful here. The size of these types in python depends on C
+# implementation. We need to make sure that this conversion does not lose 
any
+# precision.
+#
+# Reference for C integer size, see:
+# ISO/IEC 9899:201x specification, § 5.2.4.2.1 Sizes of integer types 
.
+# Reference for python array typecode, see:
+# https://docs.python.org/2/library/array.html
+# https://docs.python.org/3.6/library/array.html
+
+_array_int_typecode_ctype_mappings = {
+'b': ctypes.c_byte,
+'B': ctypes.c_ubyte,
+'h': ctypes.c_short,
+'H': ctypes.c_ushort,
+'i': ctypes.c_int,
+'I': ctypes.c_uint,
+'l': ctypes.c_long,
+'L': ctypes.c_ulong
+}
+
+# TODO: Uncomment this when 'q' and 'Q' are supported by 
net.razorvine.pickle
+# Type code 'q' and 'Q' are not available at python 2
+# if sys.version > "2":
+# _array_int_typecode_ctype_mappings.update({
+# 'q': ctypes.c_longlong,
+# 'Q': ctypes.c_ulonglong
+# })
+
+
+def _int_size_to_type(size):
+"""
+Return the Scala type from the size of integers.
+"""
+if size <= 8:
+return ByteType
+if size <= 16:
+return ShortType
+if size <= 32:
+return IntegerType
+if size <= 64:
+return LongType
+raise TypeError("not supported type: integer size too large.")
--- End diff --

fixed





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-03 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
The test fails due to a `§` character in a comment...
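
My guess at the mechanism (not confirmed in the thread): Python 2 refuses to compile source containing non-ASCII bytes unless a coding declaration is present.
```python
# Run under Python 2 to reproduce; Python 3 assumes UTF-8 source and compiles
# the same string without complaint.
src = "# Sizes of integer types, see \xc2\xa7 5.2.4.2.1\n"
try:
    compile(src, "<types.py>", "exec")
    print("compiled fine")
except SyntaxError as exc:
    print(exc)  # Python 2: Non-ASCII character '\xc2' ... but no encoding declared
```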





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
retest this please





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
I'm not sure if Jenkins would listen to me :(





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
Jenkins, retest this please





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r125173526
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2250,6 +2256,67 @@ def test_BinaryType_serialization(self):
 df = self.spark.createDataFrame(data, schema=schema)
 df.collect()
 
+# test for SPARK-16542
+def test_array_types(self):
+# This test need to make sure that the Scala type selected is at 
least
+# as large as the python's types. This is necessary because 
python's
+# array types depend on C implementation on the machine. Therefore 
there
+# is no machine independent correspondence between python's array 
types
+# and Scala types.
+# See: https://docs.python.org/2/library/array.html
+
+def assertCollectSuccess(typecode, value):
+a = array.array(typecode, [value])
+row = Row(myarray=a)
+df = self.spark.createDataFrame([row])
+self.assertEqual(df.collect()[0]["myarray"][0], value)
+
+supported_types = []
+
+# test string types
+if sys.version < "4":
--- End diff --

Yes, this looks better I think. I will change that.
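
Presumably the change will look something like this (my assumption; the reviewer's exact suggestion isn't quoted here):
```python
import sys

# Compare the major version as an integer instead of comparing version strings.
if sys.version_info[0] < 4:      # the 'u' typecode exists on both Python 2 and 3
    supported_types += ['u']
    assertCollectSuccess('u', 'a')
if sys.version_info[0] < 3:      # the 'c' typecode exists only on Python 2
    supported_types += ['c']
    assertCollectSuccess('c', 'a')
```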





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r125173517
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2250,6 +2256,67 @@ def test_BinaryType_serialization(self):
 df = self.spark.createDataFrame(data, schema=schema)
 df.collect()
 
+# test for SPARK-16542
+def test_array_types(self):
+# This test need to make sure that the Scala type selected is at least
+# as large as the python's types. This is necessary because python's
+# array types depend on C implementation on the machine. Therefore there
+# is no machine independent correspondence between python's array types
+# and Scala types.
+# See: https://docs.python.org/2/library/array.html
+
+def assertCollectSuccess(typecode, value):
+a = array.array(typecode, [value])
+row = Row(myarray=a)
+df = self.spark.createDataFrame([row])
+self.assertEqual(df.collect()[0]["myarray"][0], value)
+
+supported_types = []
+
+# test string types
+if sys.version < "4":
+supported_types += ['u']
+assertCollectSuccess('u', "a")
+if sys.version < "3":
+supported_types += ['c']
+assertCollectSuccess('c', "a")
+
+# test float and double, assuming IEEE 754 floating-point format
+supported_types += ['f', 'd']
+assertCollectSuccess('f', ctypes.c_float(1e+38).value)
+assertCollectSuccess('f', ctypes.c_float(1e-38).value)
+assertCollectSuccess('f', ctypes.c_float(1.123456).value)
+assertCollectSuccess('d', ctypes.c_double(1e+308).value)
+assertCollectSuccess('d', ctypes.c_double(1e+308).value)
+assertCollectSuccess('d', ctypes.c_double(1.123456789012345).value)
+
+# test int types
+supported_int = list(set(_array_int_typecode_ctype_mappings.keys()).
+ intersection(set(_array_type_mappings.keys())))
+supported_types += supported_int
+for i in supported_int:
+ctype = _array_int_typecode_ctype_mappings[i]
+if i.isupper():
--- End diff --

Yes, good idea. I will change that.
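The idea, roughly (a sketch, not necessarily the exact code I will push): an unsigned C integer of n bits needs n+1 bits once it is stored in a signed Spark SQL type, and the unsigned typecodes are exactly the uppercase ones:
```python
import ctypes

# same signed/unsigned pairs as in the mapping quoted above
_array_int_typecode_ctype_mappings = {
    'b': ctypes.c_byte, 'B': ctypes.c_ubyte,
    'h': ctypes.c_short, 'H': ctypes.c_ushort,
    'i': ctypes.c_int, 'I': ctypes.c_uint,
    'l': ctypes.c_long, 'L': ctypes.c_ulong,
}

def required_bits(typecode):
    size = ctypes.sizeof(_array_int_typecode_ctype_mappings[typecode]) * 8
    # unsigned typecodes are uppercase and need one extra bit in a signed type
    return size + 1 if typecode.isupper() else size

# e.g. required_bits('I') is 33 on a platform with a 32-bit unsigned int,
# so 'I' has to map to LongType rather than IntegerType.
```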





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r125173508
  
--- Diff: python/pyspark/sql/types.py ---
@@ -935,6 +936,86 @@ def _parse_datatype_json_value(json_value):
 long: LongType,
 })
 
+# Mapping Python array types to Spark SQL DataType
+# We should be careful here. The size of these types in python depends on C
+# implementation. We need to make sure that this conversion does not lose any
+# precision.
+#
+# Reference for C integer size, see:
+# ISO/IEC 9899:201x specification, § 5.2.4.2.1 Sizes of integer types.
+# Reference for python array typecode, see:
+# https://docs.python.org/2/library/array.html
+# https://docs.python.org/3.6/library/array.html
+
+_array_int_typecode_ctype_mappings = {
+'b': ctypes.c_byte,
+'B': ctypes.c_ubyte,
+'h': ctypes.c_short,
+'H': ctypes.c_ushort,
+'i': ctypes.c_int,
+'I': ctypes.c_uint,
+'l': ctypes.c_long,
+'L': ctypes.c_ulong
+}
+
+# TODO: Uncomment this when 'q' and 'Q' are supported by net.razorvine.pickle
+# Type code 'q' and 'Q' are not available at python 2
+# if sys.version > "2":
+# _array_int_typecode_ctype_mappings.update({
+# 'q': ctypes.c_longlong,
+# 'Q': ctypes.c_ulonglong
+# })
+
+
+def _int_size_to_type(size):
+"""
+Return the Scala type from the size of integers.
+"""
+if size <= 8:
+return ByteType
+if size <= 16:
+return ShortType
+if size <= 32:
+return IntegerType
+if size <= 64:
+return LongType
+raise TypeError("not supported type: integer size too large.")
--- End diff --

But I think I would add a simple line of comment on this to make the code 
more readable.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r125173496
  
--- Diff: python/pyspark/sql/types.py ---
@@ -935,6 +936,86 @@ def _parse_datatype_json_value(json_value):
 long: LongType,
 })
 
+# Mapping Python array types to Spark SQL DataType
+# We should be careful here. The size of these types in python depends on C
+# implementation. We need to make sure that this conversion does not lose any
+# precision.
+#
+# Reference for C integer size, see:
+# ISO/IEC 9899:201x specification, § 5.2.4.2.1 Sizes of integer types.
+# Reference for python array typecode, see:
+# https://docs.python.org/2/library/array.html
+# https://docs.python.org/3.6/library/array.html
+
+_array_int_typecode_ctype_mappings = {
+'b': ctypes.c_byte,
+'B': ctypes.c_ubyte,
+'h': ctypes.c_short,
+'H': ctypes.c_ushort,
+'i': ctypes.c_int,
+'I': ctypes.c_uint,
+'l': ctypes.c_long,
+'L': ctypes.c_ulong
+}
+
+# TODO: Uncomment this when 'q' and 'Q' are supported by net.razorvine.pickle
+# Type code 'q' and 'Q' are not available at python 2
+# if sys.version > "2":
+# _array_int_typecode_ctype_mappings.update({
+# 'q': ctypes.c_longlong,
+# 'Q': ctypes.c_ulonglong
+# })
+
+
+def _int_size_to_type(size):
+"""
+Return the Scala type from the size of integers.
+"""
+if size <= 8:
+return ByteType
+if size <= 16:
+return ShortType
+if size <= 32:
+return IntegerType
+if size <= 64:
+return LongType
+raise TypeError("not supported type: integer size too large.")
--- End diff --

I don't think we should log this. This is just a helper function that helps 
to construct `_array_type_mappings`, which is a complete list of all supported 
type codes. Being filtered out here is not an error; it's by design. If users 
try to use an unsupported typecode, they will see another `TypeError`, raised at 
line 1052:
```python
.
elif isinstance(obj, array):
if obj.typecode in _array_type_mappings:
return ArrayType(_array_type_mappings[obj.typecode](), True)
else:
raise TypeError("not supported type: array(%s)" % obj.typecode)
.
```
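To illustrate that error path (a hypothetical session; the message comes from the `raise` above):
```python
from array import array
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext()
sqlContext = SQLContext(sc)
try:
    # 'q' (Python 3 only) is left out of _array_type_mappings for now
    sqlContext.createDataFrame([Row(myarray=array('q', [1]))])
except TypeError as e:
    print(e)  # expected: not supported type: array(q)
```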





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r125173449
  
--- Diff: python/pyspark/sql/types.py ---
@@ -935,6 +936,86 @@ def _parse_datatype_json_value(json_value):
 long: LongType,
 })
 
+# Mapping Python array types to Spark SQL DataType
+# We should be careful here. The size of these types in python depends on C
+# implementation. We need to make sure that this conversion does not lose any
+# precision.
+#
+# Reference for C integer size, see:
+# ISO/IEC 9899:201x specification, § 5.2.4.2.1 Sizes of integer types.
+# Reference for python array typecode, see:
+# https://docs.python.org/2/library/array.html
+# https://docs.python.org/3.6/library/array.html
+
+_array_int_typecode_ctype_mappings = {
+'b': ctypes.c_byte,
+'B': ctypes.c_ubyte,
+'h': ctypes.c_short,
+'H': ctypes.c_ushort,
+'i': ctypes.c_int,
+'I': ctypes.c_uint,
+'l': ctypes.c_long,
+'L': ctypes.c_ulong
+}
+
+# TODO: Uncomment this when 'q' and 'Q' are supported by net.razorvine.pickle
+# Type code 'q' and 'Q' are not available at python 2
+# if sys.version > "2":
+# _array_int_typecode_ctype_mappings.update({
+# 'q': ctypes.c_longlong,
+# 'Q': ctypes.c_ulonglong
+# })
+
+
+def _int_size_to_type(size):
+"""
+Return the Scala type from the size of integers.
+"""
+if size <= 8:
+return ByteType
+if size <= 16:
+return ShortType
+if size <= 32:
+return IntegerType
+if size <= 64:
+return LongType
+raise TypeError("not supported type: integer size too large.")
+
+_array_type_mappings = {
+# Warning: Actual properties for float and double in C is not unspecified.
--- End diff --

Yes, you are correct. Thanks for figuring this out. 

I just checked: `sys.float_info` describes the C double type, not the C float type:
```python
>>> import sys
>>> sys.float_info.max
1.7976931348623157e+308
>>> sys.float_info.dig
15
```
So we cannot use it to check the range of a C float. But this is not the main 
reason I'm not using it.

The main reason is:
Although C does not require floating point types to be IEEE-754, every C platform 
I have ever seen uses IEEE-754. (There is also a StackOverflow question about this: 
https://stackoverflow.com/questions/31967040/is-it-safe-to-assume-floating-point-is-represented-using-ieee754-floats-in-c
 ) I don't even know whether there exists a platform in the world where Python is 
supported, the JVM is supported, and the C floating point types do not use 
IEEE-754. So I think it is OK to assume these types are IEEE-754 for now to make 
the code cleaner. It is not worth any effort to support something that might not 
even exist. But I'm not an expert on this either, so if you know someone who might 
know more about this, please ping them to double-check. On the other hand, if 
there do exist users on such a platform who find that this is a wrong assumption, 
they can report a new bug to fix it. But yes, my comment seems to be confusing and 
I will see if I can make it clearer.

Also, thank you for pointing out `sys.float_info`; although I don't think I need 
it here, it will be very useful in the test cases. I will change part of my test 
cases to use it to make the code more readable.
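For example, the boundary values in the tests could be derived instead of hard-coded (a sketch assuming IEEE-754, as discussed above; `sys.float_info` gives the C double bounds and `ctypes.c_float` round-trips a value through a C float):
```python
import ctypes
import sys

# largest finite single-precision value, round-tripped through a C float
max_float = ctypes.c_float(3.4028235e+38).value
# sys.float_info describes the C double, so it covers the DoubleType side
max_double = sys.float_info.max  # 1.7976931348623157e+308

print(max_float, max_double)
```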





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
jenkins retest this please





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-01 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
@ueshin @HyukjinKwon I think I'm done now. There are still failures in the tests, 
but they don't look related to my change.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-06-28 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r124572922
  
--- Diff: python/pyspark/sql/types.py ---
@@ -958,12 +968,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
+return ArrayType(_array_type_mappings[obj.typecode](), True)
--- End diff --

@HyukjinKwon Could you send me the link to "the Pandas type related PR we 
helped review before"?





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-06-28 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
@HyukjinKwon Thanks for figuring that out. I will fix those issues.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-06-28 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r124536696
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2259,6 +2261,69 @@ def test_BinaryType_serialization(self):
 df = self.spark.createDataFrame(data, schema=schema)
 df.collect()
 
+# test for SPARK-16542
+def test_array_types(self):
+# This test need to make sure that the Scala type selected is at least
+# as large as the python's types. This is necessary because python's
+# array types depend on C implementation on the machine. Therefore there
+# is no machine independent correspondence between python's array types
+# and Scala types.
+# See: https://docs.python.org/2/library/array.html
--- End diff --

@ueshin The answer to your question is explained here, in the couple of lines of 
comments I just added.
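To illustrate the point (a hypothetical snippet, not part of the patch): the width behind an array typecode is whatever the platform C compiler chose, so the Scala type has to be picked from the actual size rather than from the typecode letter:
```python
from array import array

for typecode in ('b', 'h', 'i', 'l'):
    print(typecode, array(typecode).itemsize, 'bytes')
# typically 1, 2, 4, 8 bytes on 64-bit Linux, but 'l' is 4 bytes on Windows,
# which is why the mapping is computed from ctypes sizes at import time
```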





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-06-28 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r124531036
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2259,6 +2261,60 @@ def test_BinaryType_serialization(self):
 df = self.spark.createDataFrame(data, schema=schema)
 df.collect()
 
+# test for SPARK-16542
+def test_array_types(self):
+int_types = set(['b', 'h', 'i', 'l'])
+float_types = set(['f', 'd'])
+unsupported_types = set(array.typecodes) - int_types - float_types
+
+def collected(a):
+row = Row(myarray=a)
+rdd = self.sc.parallelize([row])
+df = self.spark.createDataFrame(rdd)
+return df.collect()[0]["myarray"][0]
+# test whether pyspark can correctly handle int types
+for t in int_types:
+# test positive numbers
+a = array.array(t, [1])
+while True:
--- End diff --

Let me figure that out. This patch was written more than a year ago and I forget 
what was happening.





[GitHub] spark issue #14198: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-06-27 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14198
  
reopened at https://github.com/apache/spark/pull/18444





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-06-27 Thread zasdfgbnm
GitHub user zasdfgbnm opened a pull request:

https://github.com/apache/spark/pull/18444

[SPARK-16542][SQL][PYSPARK] Fix bugs about types that result an array of 
null when creating DataFrame using python

## What changes were proposed in this pull request?
This is a reopen of https://github.com/apache/spark/pull/14198, with the merge 
conflicts resolved.

@ueshin Could you please take a look at my code?

This fixes bugs about types that result in an array of nulls when creating a 
DataFrame using Python.

Python's array.array has richer types than Python itself, e.g. we can have 
`array('f',[1,2,3])` and `array('d',[1,2,3])`. The code in spark-sql and pyspark 
didn't take this into consideration, which can cause a problem where you get an 
array of null values when you have an `array('f')` in your rows.

A simple code to reproduce this bug is:

```
from pyspark import SparkContext
from pyspark.sql import SQLContext,Row,DataFrame
from array import array

sc = SparkContext()
sqlContext = SQLContext(sc)

row1 = Row(floatarray=array('f',[1,2,3]), doublearray=array('d',[1,2,3]))
rows = sc.parallelize([ row1 ])
df = sqlContext.createDataFrame(rows)
df.show()
```

which produces the output

```
+---------------+------------------+
|    doublearray|        floatarray|
+---------------+------------------+
|[1.0, 2.0, 3.0]|[null, null, null]|
+---------------+------------------+
```
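Continuing the snippet above, the inferred schema can be inspected directly; as noted in the first commit message below, the float column is inferred as an array of doubles before this change, which is what produces the nulls:
```python
df.printSchema()
# before this change 'floatarray' is inferred as array<double>,
# after this change it is inferred as array<float>
```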

## How was this patch tested?

New test case added


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zasdfgbnm/spark fix_array_infer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18444.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18444


commit a127486d59528eae452dcbcc2ccfb68fdd7769b7
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-09T00:58:14Z

use array.typecode to infer type

Python's array has more type than python it self, for example
python only has float while array support 'f' (float) and 'd' (double)
Switching to array.typecode helps spark make a better inference

For example, for the code:

from pyspark.sql.types import _infer_type
from array import array
a = array('f',[1,2,3,4,5,6])
_infer_type(a)

We will get ArrayType(DoubleType,true) before change,
but ArrayType(FloatType,true) after change

commit 70131f3b81575edf9073d5be72553730d6316bd6
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-09T06:21:31Z

Merge branch 'master' into fix_array_infer

commit 505e819f415c2f754b5147908516ace6f6ddfe78
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-13T12:53:18Z

sync with upstream

commit 05979ca6eabf723cf3849ec2bf6f6e9de26cb138
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-14T08:07:12Z

add case (c: Float, FloatType) to fromJava

commit 5cd817a4e7ec68a693ee2a878a2e36b09b1965b6
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-14T08:09:25Z

sync with upstream

commit cd2ec6bc707fb6e7255b3a6a6822c3667866c63c
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-10-17T02:44:48Z

add test for array in dataframe

commit 527d969067e447f8bff6004570c27130346cdf76
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-10-17T03:13:47Z

merge with upstream/master

commit 82223c02082793b899c7eeca70f7bbfcea516c28
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-10-17T03:35:47Z

set unsigned types and Py_UNICODE as unsupported

commit 0a967e280b3250bf7217e61905ad28f010c4ed40
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-10-17T17:46:35Z

fix code style

commit 2059435b45ed1f6337a4f935adcd029084cfec91
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-10-18T00:11:05Z

fix the same problem for byte and short

commit 58b120c4d207d9332e6dcde20109651ad8e17190
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2017-06-28T01:28:03Z

sync with upstream







[GitHub] spark issue #14198: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-06-27 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14198
  
@ueshin @gatorsmile I'm happy to resolve the conflicts IF AND ONLY IF a developer 
will actually work on the code review for this. This PR was opened more than a 
year ago and I have been waiting for a review for a year. If it is guaranteed that 
a reviewer will be assigned to this soon, I will resolve the conflicts. Otherwise, 
I don't want to maintain a PR forever just to wait for a review.





[GitHub] spark issue #14198: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2016-10-17 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14198
  
Hi @holdenk , I think I'm done. I created a test for this issue, and from that 
test I found that Spark has the same issue not only for float but also for byte 
and short. After several commits, `./python/run-tests --modules=pyspark-sql` 
passes on my computer.

To be clear, only arrays with typecodes `b,h,i,l,f,d` are supported. Arrays with 
typecode `u` are not supported because it "corresponds to Python's obsolete 
unicode character", arrays with typecodes `B,H,I,L` are not supported because 
there are no unsigned types on the JVM, and arrays with typecodes `q,Q` are not 
supported because they "are available only if the platform C compiler used to 
build Python supports C long long", which makes supporting them complicated. For 
the unsupported typecodes, a TypeError will be raised if the user tries to create 
a DataFrame from them.

Would you, or any other developer, review my code and get it merged?
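For example, a quick (hypothetical) way to see which typecodes survive schema inference on a given interpreter:
```python
import array
from pyspark.sql.types import _infer_type

supported = []
for t in array.typecodes:
    try:
        _infer_type(array.array(t))
        supported.append(t)
    except TypeError:
        pass
print(sorted(supported))  # e.g. ['b', 'd', 'f', 'h', 'i', 'l'] with this patch
```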





[GitHub] spark issue #14198: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2016-10-07 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14198
  
Something to mention: there is still one problem that I'm not sure I solved 
correctly. In Python's array, unsigned types are supported, but unsigned types are 
not supported on the JVM. The solution in this PR is to convert unsigned types to 
a larger type, e.g. unsigned int -> long. I'm not sure whether it would be better 
to reject the unsigned types on the Python side and throw an exception.





[GitHub] spark issue #14198: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2016-10-07 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14198
  
I'd love to help





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-08-06 Thread zasdfgbnm
Github user zasdfgbnm closed the pull request at:

https://github.com/apache/spark/pull/14231





[GitHub] spark issue #14231: [SPARK-16586] Change the way the exit code of launcher i...

2016-07-30 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14231
  
Yes this patch looks clearer





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-29 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72803075
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

I cannot find a neat solution to deal with the `\0`...
Another option is to replace the `\0` with `\n` and store the result in a 
variable. But this will be a problem if the user's Spark path contains a `\n`...





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-27 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72562015
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

This won't work. `build_command` is executed in a subshell. The `exit $?`  
will only terminate the subshell.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-27 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72460695
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

Things can also be implemented like
```
build_command() {
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main 
"$@"
  LAUNCHER_EXIT_CODE=$?
  if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then printf "\0"; fi
  printf "%d\0" $LAUNCHER_EXIT_CODE
}
```
Which will minimize the change to the original code





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-27 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72458729
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

If you further add a line like `echo ${CMD[@]}` after `if [ 
"$LAUNCHER_EXIT_CODE" != 0 ]; then`, it will output something like `There is an 
error1`, which reports the error that happened, but the trailing error code `1` 
looks ugly.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-27 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72457989
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

The original implementation actually fails here at line 83:
`if [ $LAUNCHER_EXIT_CODE != 0 ]; then`
For example, if the launcher fails with the output "There is an error" and exit 
status 1, then this line becomes
`if [ There is an error1 != 0 ]; then`, which triggers `[: too many arguments`.
The easiest modification is to add `""` around `$LAUNCHER_EXIT_CODE`, which makes 
the script fail silently without any output.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-27 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72410689
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

Yes, this is what I'm doing in this PR:
```
run launcher, encode its output and get its exit status
if (exit status is fail)
    decode the launcher's output and print it
else  # exit code is success
    process as before
```





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-26 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72364403
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

The launcher doesn't actually launch anything; instead, it just outputs the 
command that should be used to launch the desired class, separated by `\0`.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-26 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72364159
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

If the launcher fails, it is sufficient to terminate the script and exit 
with the nonzero `$?`. But if it succeeds, the output, which contains `\0`, 
should be used to start a new command. That's why, when we execute `"$RUNNER" 
-Xmx128m ...`, we should try to store both the exit code and the output.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-26 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72276770
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

Since bash doesn't allow us to store \0 in a variable, we need to find some 
way to deal with it. One way is to run the launcher in a subshell and pipe the 
output of the subshell to the current shell using bash's "read" builtin (see the 
original code, lines 76-78). In this way, $? will be the exit status of some 
command in the while loop of this shell, not the exit code of the launcher in the 
subshell. That's why, in the original implementation, the exit code is appended at 
the end of the output. Another way to deal with \0, which is what I have proposed 
in this PR, is to make a hexdump of the output containing '\0', store the hexdump 
in a variable, and decode it whenever we need the output with '\0'. In this way, 
$? will be the exit status of "tr" rather than of the launcher.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-25 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72167441
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

If the output of the launcher didn't contain '\0', then the code would be easy:
```bash
output=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")
LAUNCHER_EXIT_CODE=$?
```





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-25 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72167136
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

The biggest problem with achieving this is: on success, the exit status of the 
process will be 0 and the output of the process will be fields separated by '\0'. 
Bash doesn't allow us to store '\0' in a bash variable, so we have to find some 
alternative way to store it, which makes $? no longer the exit status of the 
launcher.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-16 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r71063901
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

By the way, the previous code assumes that the output of the launcher ends with a 
'\0'; otherwise it will crash due to the same problem. I read the code in 
`org.apache.spark.launcher`. It seems that the launcher's main rarely outputs 
something and then exits non-zero. So the only way this becomes a problem is if 
there are uncaught exceptions or something is wrong with Java.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-16 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r71063792
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the seperator NULL character can not be replace with space or '\n' so that the command
+# won't fail if some path of the user contain special characher such as '\n' or space
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

Yes, you are right. I found this problem when I tried to run Spark for testing 
on a login node of an HPC cluster, which limits the amount of memory applications 
can use. This is not a big problem, so it's OK to mark this as "won't fix", but it 
might be confusing to users who don't understand shell scripts. Actually, this 
patch is a small change and it reduces the total lines of code (but I agree that 
this line is a little bit tricky and harder to understand). That's why I added 
some comments to explain what's happening.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-16 Thread zasdfgbnm
GitHub user zasdfgbnm opened a pull request:

https://github.com/apache/spark/pull/14231

[SPARK-16586] Change the way the exit code of the launcher is handled to avoid 
a problem when the launcher fails

## What changes were proposed in this pull request?

In the spark-class shell script, the exit code of the launcher is appended 
to the end of the output. However, when the launcher fails, there might not be 
a '\0' at the end of the launcher's output, which makes the test code `[ 
$LAUNCHER_EXIT_CODE != 0 ]` abort with an error `[: too many arguments`.

This patch fixes this bug by changing the way the exit code of the launcher 
is passed.

## How was this patch tested?

manually tested

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zasdfgbnm/spark fix_spark-class

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14231


commit 626be035b656750cc10394a1ed8bbfa021e53aa9
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-16T09:00:22Z

[SPARK-16586]
change the way the exit code of the launcher is handled so that there won't be a
problem when the launcher fails




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14198: Fix bugs about types that result an array of null...

2016-07-14 Thread zasdfgbnm
GitHub user zasdfgbnm opened a pull request:

https://github.com/apache/spark/pull/14198

Fix bugs about types that result in an array of nulls when creating a DataFrame 
using Python

## What changes were proposed in this pull request?

Fix bugs about types that result in an array of nulls when creating a DataFrame 
using Python.
Python's array.array has a richer set of element types than Python itself; e.g., 
we can have array('f',[1,2,3]) and array('d',[1,2,3]). The code in spark-sql 
didn't take this into consideration, which can leave you with an array of null 
values when a row contains an array('f').

A simple piece of code to reproduce this is:

```
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row, DataFrame
from array import array

sc = SparkContext()
sqlContext = SQLContext(sc)

row1 = Row(floatarray=array('f',[1,2,3]), doublearray=array('d',[1,2,3]))
rows = sc.parallelize([ row1 ])
df = sqlContext.createDataFrame(rows)
df.show()
```

which has the output:

```
+---------------+------------------+
|    doublearray|        floatarray|
+---------------+------------------+
|[1.0, 2.0, 3.0]|[null, null, null]|
+---------------+------------------+
```
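
To illustrate the direction of the fix, here is a small, hypothetical sketch (not the actual patch) of inferring the Spark SQL element type from array.typecode instead of from the Python values; the typecode-to-type mapping shown is only an illustrative subset.

```
# Hypothetical sketch (not the actual patch): infer the Spark SQL element type
# from array.typecode so that array('f', ...) is not mistaken for doubles.
# The mapping below is an illustrative subset only.
from array import array
from pyspark.sql.types import ArrayType, FloatType, DoubleType, IntegerType

_TYPECODE_TO_SPARK = {
    'f': FloatType(),
    'd': DoubleType(),
    'i': IntegerType(),
}

def infer_array_type(a):
    # Use the typecode, not the values, to pick the element type.
    if isinstance(a, array) and a.typecode in _TYPECODE_TO_SPARK:
        return ArrayType(_TYPECODE_TO_SPARK[a.typecode], True)
    raise TypeError("unsupported typecode: %r" % getattr(a, 'typecode', None))

print(infer_array_type(array('f', [1, 2, 3])))  # ArrayType with FloatType elements
print(infer_array_type(array('d', [1, 2, 3])))  # ArrayType with DoubleType elements
```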


## How was this patch tested?
tested manually



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zasdfgbnm/spark fix_array_infer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14198


commit a127486d59528eae452dcbcc2ccfb68fdd7769b7
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-09T00:58:14Z

use array.typecode to infer type

Python's array has more types than Python itself; for example, Python only has 
float, while array supports 'f' (float) and 'd' (double). 
Switching to array.typecode helps Spark make a better inference.

For example, for the code:

from pyspark.sql.types import _infer_type
from array import array
a = array('f',[1,2,3,4,5,6])
_infer_type(a)

We get ArrayType(DoubleType,true) before the change,
but ArrayType(FloatType,true) after the change.

commit 70131f3b81575edf9073d5be72553730d6316bd6
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-09T06:21:31Z

Merge branch 'master' into fix_array_infer

commit 505e819f415c2f754b5147908516ace6f6ddfe78
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-13T12:53:18Z

sync with upstream

commit 05979ca6eabf723cf3849ec2bf6f6e9de26cb138
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-14T08:07:12Z

add case (c: Float, FloatType) to fromJava

commit 5cd817a4e7ec68a693ee2a878a2e36b09b1965b6
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-14T08:09:25Z

sync with upstream




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14027: [SPARK-16352] Spark daemon run.sh

2016-07-02 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14027
  
OK, but will the status of the original JIRA stay "won't fix" if 
someone decides to reopen the discussion on it?
Can anyone comment on whether a run-in-foreground feature is 
welcome in Spark upstream?
Also, if there is already a good enough way to run in the foreground 
(so that my PR would be useless), could anyone send me a link to the 
document describing it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #3881: [SPARK-5964] Allow spark-daemon.sh to support foreground ...

2016-07-02 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/3881
  
Can anyone explain what happened to this PR? Why was it closed 
without adding any support for running in the foreground?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14027: [SPARK-16352] Spark daemon run.sh

2016-07-02 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14027
  
I followed that PR (https://github.com/apache/spark/pull/3881)
"I was under the impression all the shell scripts were getting refactored 
and that this patch had become obsolete. I agree it's best to close this out." 
by @hellertime 

I'm not sure what happened there. It seems that the previous PR was 
closed because the scripts had changed a lot before that PR could be merged. 

I didn't see anything that supports running in the foreground, and nobody 
mentioned in that issue that running in the foreground is useless and will 
never be supported.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14027: [SPARK-16352] Spark daemon run.sh

2016-07-02 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/14027
  
I see a similar pull request here:
https://github.com/apache/spark/pull/3881

But I didn't understand what happened to it or why it was closed without adding 
any support for running in the foreground. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14027: Spark daemon run.sh

2016-07-02 Thread zasdfgbnm
GitHub user zasdfgbnm opened a pull request:

https://github.com/apache/spark/pull/14027

Spark daemon run.sh

## What changes were proposed in this pull request?

I added some shell scripts to support running the master and slaves in the 
foreground, which makes it more convenient to write systemd services and to run 
on HPC clusters.
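
As a purely hypothetical illustration of why a foreground mode helps, a systemd unit for the standalone master could then be as simple as the sketch below; the install path, user, and exact behaviour of run-master.sh are assumptions, not part of this PR's documented interface.

```
# Hypothetical unit sketch only; paths, user, and the semantics of
# run-master.sh are assumptions, not documented by this PR.
[Unit]
Description=Apache Spark standalone master (foreground)
After=network.target

[Service]
Type=simple
User=spark
ExecStart=/opt/spark/sbin/run-master.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```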


## How was this patch tested?

I tested it manually and it works well on my computer.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zasdfgbnm/spark spark-daemon-run.sh

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14027.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14027


commit 7b99b15656daa430081b1952bc090d3824d916ce
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-01T14:22:05Z

split spark-daemon.sh to allow foreground

commit 700247e5cae5bfd33d24a54c133eef2b5a28df96
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-01T15:09:46Z

fix bugs to make it work

commit 6a80b7986f831abaf5cdf23c42528055f4999b86
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-01T15:26:13Z

fixes

commit 3095187162a821292eabedcd44c18932dbf73ca6
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-01T15:27:11Z

starting->running

commit 1954640ccca1a1da7903455d67f1abdc2db0fd73
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-01T15:29:21Z

add run-master.sh

commit b96f4e1c4a28ffc893779221100080c5d31a2508
Author: Xiang Gao <qasdfgtyu...@gmail.com>
Date:   2016-07-01T15:36:26Z

add support run-slave.sh




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org