[GitHub] spark issue #10527: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} UDFs

2018-02-12 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/10527
  
@vectorijk Is this PR dead?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...

2017-11-07 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19234
  
I think that porting the changes from Python 3.6 gives us overly complicated code.
I'm closing it.


---




[GitHub] spark pull request #19234: [SPARK-22010][PySpark] Change fromInternal method...

2017-11-06 Thread maver1ck
Github user maver1ck closed the pull request at:

https://github.com/apache/spark/pull/19234


---




[GitHub] spark issue #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_datatype_...

2017-11-06 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19255
  
OK. Let's close it.


---




[GitHub] spark pull request #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_da...

2017-11-06 Thread maver1ck
Github user maver1ck closed the pull request at:

https://github.com/apache/spark/pull/19255


---




[GitHub] spark issue #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_datatype_...

2017-10-27 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19255
  
Ping @HyukjinKwon


---




[GitHub] spark issue #19566: [SPARK-22341][yarn] Impersonate correct user when prepar...

2017-10-25 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19566
  
@vanzin 
I tested your patch. It worked.


---




[GitHub] spark pull request #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_da...

2017-10-23 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/19255#discussion_r146248695
  
--- Diff: python/pyspark/sql/types.py ---
@@ -24,6 +24,7 @@
 import re
 import base64
 from array import array
+from functools import lru_cache
--- End diff --

I added support for Python < 3.3.
What do you think?


---




[GitHub] spark pull request #18685: [SPARK-21439][PySpark] Support for ABCMeta in PyS...

2017-10-23 Thread maver1ck
Github user maver1ck closed the pull request at:

https://github.com/apache/spark/pull/18685


---




[GitHub] spark issue #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_datatype_...

2017-10-23 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19255
  
Jenkins, retest this please.


---




[GitHub] spark issue #18685: [SPARK-21439][PySpark] Support for ABCMeta in PySpark

2017-10-23 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/18685
  
I realized that the changes here were also added in SPARK-21070.

https://github.com/apache/spark/commit/751f513367ae776c6d6815e1ce138078924872eb

So we can close this PR.


---




[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...

2017-09-21 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19234
  
It was introduced with this PEP.
https://www.python.org/dev/peps/pep-0495/


---




[GitHub] spark issue #19255: [WIP][SPARK-22029][PySpark] Add lru_cache to _parse_data...

2017-09-20 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19255
  
@HyukjinKwon 
I added perf tests.


---




[GitHub] spark issue #19246: [SPARK-22025][PySpark] Speeding up fromInternal for Stru...

2017-09-20 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19246
  
@HyukjinKwon
I created this before https://github.com/apache/spark/pull/19249, which greatly decreases the number of function calls.

I agree we can close it.


---




[GitHub] spark pull request #19246: [SPARK-22025][PySpark] Speeding up fromInternal f...

2017-09-20 Thread maver1ck
Github user maver1ck closed the pull request at:

https://github.com/apache/spark/pull/19246


---




[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...

2017-09-18 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19234
  
OK. It passed all tests, so let's merge it.


---




[GitHub] spark issue #19234: [WIP][SPARK-22010][PySpark] Change fromInternal method o...

2017-09-18 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19234
  
I checked with some samples, and code using float can trigger errors.


---




[GitHub] spark issue #19249: [SPARK-22032][PySpark] Speed up StructType conversion

2017-09-18 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19249
  
@ueshin 
I think that for MapType this is not a solution, because every key/value of a MapType has the same type, so we need conversion either for all entries or for none.
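To illustrate the point in plain Python (a hypothetical `convert_map` helper, not PySpark code): because a map's keys all share one type and its values another, conversion is an all-or-nothing decision per map, unlike a struct where it can vary per field.

```python
def convert_map(d, key_conv=None, value_conv=None):
    # In a MapType every key shares one type and every value another,
    # so a converter applies to all entries or to none of them.
    if key_conv is None and value_conv is None:
        return d  # nothing needs conversion: pass the map through
    return {(key_conv(k) if key_conv else k): (value_conv(v) if value_conv else v)
            for k, v in d.items()}
```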


---




[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...

2017-09-18 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19234
  
I'm asking because such code is 2x faster than my solution.


---




[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...

2017-09-18 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19234
  
Any idea why we're not using
`datetime.datetime.fromtimestamp(ts / 10.)`?
There is a comment about overflow, but does that case actually exist?


---




[GitHub] spark issue #19260: [SPARK-22043][PYTHON] Improves error message for show_pr...

2017-09-17 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19260
  
LGTM


---




[GitHub] spark issue #19249: [SPARK-22032][PySpark] Speed up StructType conversion

2017-09-17 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19249
  
Done.


---




[GitHub] spark pull request #19249: [SPARK-22032][PySpark] Speed up StructType.fromIn...

2017-09-17 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/19249#discussion_r139306166
  
--- Diff: python/pyspark/sql/types.py ---
@@ -619,7 +621,8 @@ def fromInternal(self, obj):
             # it's already converted by pickler
             return obj
         if self._needSerializeAnyField:
-            values = [f.fromInternal(v) for f, v in zip(self.fields, obj)]
+            values = [f.fromInternal(v) if n else v
--- End diff --

Done.


---




[GitHub] spark pull request #19255: [WIP][SPARK-22029][PySpark] Add lru_cache to _par...

2017-09-17 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/19255#discussion_r139305244
  
--- Diff: python/pyspark/sql/types.py ---
@@ -24,6 +24,7 @@
 import re
 import base64
 from array import array
+from functools import lru_cache
--- End diff --

Or use the backported library:
https://pypi.python.org/pypi/functools32
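A minimal sketch of the compatibility shim being discussed, assuming `functools32` is the Python 2.7 backport; the no-op fallback decorator and the toy parse function are illustrative additions, not PySpark code.

```python
try:
    from functools import lru_cache          # Python 3.2+
except ImportError:
    try:
        from functools32 import lru_cache    # backport: pip install functools32
    except ImportError:
        def lru_cache(maxsize=128):
            # last-resort no-op decorator: caching is simply disabled
            def decorator(func):
                return func
            return decorator

calls = {"n": 0}

@lru_cache(maxsize=64)
def parse_datatype_json_string(json_string):
    # toy stand-in for PySpark's _parse_datatype_json_string;
    # counts real invocations so the caching effect is visible
    calls["n"] += 1
    return len(json_string)
```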


---




[GitHub] spark pull request #19249: [SPARK-22032][PySpark] Speed up StructType.fromIn...

2017-09-17 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/19249#discussion_r139303509
  
--- Diff: python/pyspark/sql/types.py ---
@@ -619,7 +621,8 @@ def fromInternal(self, obj):
             # it's already converted by pickler
             return obj
         if self._needSerializeAnyField:
-            values = [f.fromInternal(v) for f, v in zip(self.fields, obj)]
+            values = [f.fromInternal(v) if n else v
--- End diff --

I'll add one more optimization here.
And then I'll do benchmarks.


---




[GitHub] spark pull request #19255: [WIP][SPARK-22029] Add lru_cache to _parse_dataty...

2017-09-16 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/19255#discussion_r139292791
  
--- Diff: python/pyspark/sql/types.py ---
@@ -24,6 +24,7 @@
 import re
 import base64
 from array import array
+from functools import lru_cache
--- End diff --

Any ideas for Python 2.7?


---




[GitHub] spark pull request #19255: [WIP][SPARK-22029] Add lru_cache to _parse_dataty...

2017-09-16 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/19255

[WIP][SPARK-22029] Add lru_cache to _parse_datatype_json_string

## What changes were proposed in this pull request?

_parse_datatype_json_string is called many times for the same datatypes.
By caching its result we can speed up PySpark internals.

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark_22029

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19255


commit c903860ee8d25afda0f969b582bdbdaa0aa8c9fe
Author: Maciej Bryński <maciek-git...@brynski.pl>
Date:   2017-09-16T18:51:49Z

Add lru_cache to _parse_datatype_json_string




---




[GitHub] spark pull request #19246: [SPARK-22025] Speeding up fromInternal for Struct...

2017-09-16 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/19246#discussion_r139291502
  
--- Diff: python/pyspark/sql/types.py ---
@@ -410,6 +410,24 @@ def __init__(self, name, dataType, nullable=True, 
metadata=None):
 self.dataType = dataType
 self.nullable = nullable
 self.metadata = metadata or {}
+self.needConversion = dataType.needConversion
+self.toInternal = dataType.toInternal
+self.fromInternal = dataType.fromInternal
+
+def __getstate__(self):
--- End diff --

We need to handle pickling ourselves because we have fields with function values.
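A minimal sketch of the pattern being described, using a hypothetical `Field` class (not the actual StructField change): attributes that hold bound-method references are dropped in `__getstate__` and rebuilt in `__setstate__`.

```python
import pickle

class Field(object):
    def __init__(self, name):
        self.name = name
        self._rebuild()

    def _rebuild(self):
        # cache a bound-method reference so callers skip attribute lookup
        self.from_internal = self._convert

    def _convert(self, v):
        return (self.name, v)

    def __getstate__(self):
        # bound methods stored as instance state don't pickle portably,
        # so strip them out before pickling
        state = self.__dict__.copy()
        del state["from_internal"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._rebuild()  # restore the cached reference after unpickling

f2 = pickle.loads(pickle.dumps(Field("x")))
```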


---




[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal

2017-09-16 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19249
  
I added a benchmark for this code.


---




[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...

2017-09-16 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/19234#discussion_r139290042
  
--- Diff: python/pyspark/sql/types.py ---
@@ -196,7 +199,9 @@ def toInternal(self, dt):
     def fromInternal(self, ts):
         if ts is not None:
             # using int to avoid precision loss in float
-            return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
+            y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 1000000) if _is_utc
+                                            else time.localtime(ts // 1000000))
+            return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000)
--- End diff --

I added some description and support for leap seconds
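The conversion approach under review can be sketched roughly as follows, assuming `ts` is microseconds since the epoch (PySpark's internal TimestampType representation); the `is_utc` parameter is an illustrative stand-in for the PR's timezone check, not the actual signature.

```python
import time
import datetime

def from_internal(ts, is_utc=False):
    # ts is microseconds since the epoch
    if ts is None:
        return None
    # integer division avoids the precision loss of float timestamps
    y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 1000000) if is_utc
                                    else time.localtime(ts // 1000000))
    ss = min(ss, 59)  # clamp leap seconds: datetime only accepts 0..59
    return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000)
```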


---




[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal

2017-09-16 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19249
  
Yep.


---




[GitHub] spark issue #18685: [SPARK-21439] Support for ABCMeta in PySpark

2017-09-16 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/18685
  
Ping received.
I'll try to add tests and resolve the conflict.


---




[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal

2017-09-16 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19249
  
I checked this with my production code.
It gives me about a 6-7% speed-up and removes 408 million function calls :)

I'll try to create a benchmark for this.


---




[GitHub] spark issue #19246: [SPARK-22025] Speeding up fromInternal for StructField

2017-09-16 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19246
  
@dongjoon-hyun 
I'll do it on Monday.


---




[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...

2017-09-16 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/19234#discussion_r139284824
  
--- Diff: python/pyspark/sql/types.py ---
@@ -196,7 +199,9 @@ def toInternal(self, dt):
     def fromInternal(self, ts):
         if ts is not None:
             # using int to avoid precision loss in float
-            return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
+            y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 1000000) if _is_utc
+                                            else time.localtime(ts // 1000000))
+            return datetime.datetime(y, m, d, hh, mm, ss, ts % 1000000)
--- End diff --

I think the only difference is this `ss = min(ss, 59)`


---




[GitHub] spark pull request #19249: [SPARK-22032] Speed up StructType.fromInternal

2017-09-15 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/19249

[SPARK-22032] Speed up StructType.fromInternal

## What changes were proposed in this pull request?

StructType.fromInternal calls f.fromInternal(v) for every field.
We can use the needConversion method to limit the number of function calls.
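The idea can be sketched in plain Python (toy type classes, not the actual PySpark ones): compute the per-field need flags once, then skip the conversion call entirely for fields that don't need it.

```python
class IntType(object):
    def needConversion(self):
        return False          # plain ints need no conversion

    def fromInternal(self, v):
        return v

class TsType(object):
    def needConversion(self):
        return True           # e.g. timestamps do need conversion

    def fromInternal(self, v):
        return v / 1000000.0  # microseconds -> seconds

fields = [IntType(), TsType()]
need = [f.needConversion() for f in fields]   # computed once, not per row

def from_internal(row):
    # call fromInternal only where needed; pass other values through
    return [f.fromInternal(v) if c else v
            for f, v, c in zip(fields, row, need)]
```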

## How was this patch tested?

Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark_22032

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19249


commit aa69a72d71c55e93b487ac28910b9187c0c71088
Author: Maciej Bryński <maciek-git...@brynski.pl>
Date:   2017-09-15T18:01:40Z

Update types.py




---




[GitHub] spark pull request #19246: [SPARK-22025] Speeding up fromInternal for Struct...

2017-09-15 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/19246

[SPARK-22025] Speeding up fromInternal for StructField

## What changes were proposed in this pull request?

Changing function calls to references can greatly speed up function calling.

## How was this patch tested?

Existing tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark_22025

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19246


commit e3dfd221dbdd1bd3ba0226cc7b9cafe939cd1676
Author: Maciej Bryński <maciek-git...@brynski.pl>
Date:   2017-09-15T13:02:49Z

Change function call to references




---




[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...

2017-09-14 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/19234

[SPARK-22010] Change fromInternal method of TimestampType

## What changes were proposed in this pull request?

This PR changes the way PySpark converts timestamps from the internal format to the Python representation.

**Benchmarks**
Before change:
4.58 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)

After change:
System with UTC timezone
1.49 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

Other timezones:
3.15 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)

## How was this patch tested?

Existing tests.
Performance benchmarks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark_22010

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19234


commit 238b5563e444b6b936f2e2771ec7876f648af1e9
Author: Maciej Bryński <maciek-git...@brynski.pl>
Date:   2017-09-14T14:56:52Z

Change internal Timestamp conversion

commit 0cb2a482a41711531a9367b88bf1558f5c87ac4c
Author: Maciej Bryński <maciek-git...@brynski.pl>
Date:   2017-09-14T14:58:50Z

Typo fix

commit 02301eb4aa8686fcafdeba3b13ec772be8938ed6
Author: Maciej Bryński <maciek-git...@brynski.pl>
Date:   2017-09-14T15:07:22Z

Import fix




---




[GitHub] spark pull request #18685: Add Weakref to cloudpickle

2017-07-19 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/18685

Add Weakref to cloudpickle

https://github.com/cloudpipe/cloudpickle/pull/104/files

## What changes were proposed in this pull request?

Possibility to use ABCMeta with Spark.

## How was this patch tested?

Manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark SPARK-21439

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18685


commit 8f51cfd7ce4d21dfc190298fefc889e260ee3a00
Author: Maciej Bryński <maciek-git...@brynski.pl>
Date:   2017-07-19T20:28:09Z

Add Weakref to cloudpickle 

https://github.com/cloudpipe/cloudpickle/pull/104/files




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #17722: [SPARK-12717][PYSPARK][BRANCH-1.6] Resolving race condit...

2017-07-19 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/17722
  
Hi,
What about this issue?


---



[GitHub] spark pull request #18515: [SPARK-21287] Ability to use Integer.MIN_VALUE as...

2017-07-03 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/18515

[SPARK-21287] Ability to use Integer.MIN_VALUE as a fetchSize

## What changes were proposed in this pull request?

FIX for https://issues.apache.org/jira/browse/SPARK-21287

## How was this patch tested?

Existing automated tests + manual tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark-21287

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18515.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18515


commit 97b6d7bc74b1895db0c772b4c0de726c6be2c3f0
Author: Maciej Bryński <maciek-git...@brynski.pl>
Date:   2017-07-03T12:46:29Z

Update JDBCOptions.scala




---



[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...

2017-04-21 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/17694
  
@vundela 
Great. 
But I'm planning to migrate to 2.1 as soon as 2.1.1 is released.


---



[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...

2017-04-21 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/17694
  
OK. I did additional tests.
The fix works only with Spark 2.1.
I tried to apply it on 2.0.2, and that was the cause of my problem.



---



[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...

2017-04-21 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/17694
  
I checked the pyspark.zip of a running container and everything is in its place.
So I assume there is more than one race condition in this code.

I'll try to prepare an example of the problem.




---



[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...

2017-04-20 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/17694
  
The funny thing is that this code works for me on 4 threads and throws an exception on 10 threads.


---



[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...

2017-04-20 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/17694
  
I tested your patch in our environment.

The problem still exists.
```
Job aborted due to stage failure: Task 0 in stage 22.0 failed 8 times, most recent failure: Lost task 0.7 in stage 22.0 (TID 138, dwh-hn30.adpilot.co): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/worker.py", line 161, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/worker.py", line 54, in read_command
    command = serializer._read_with_length(file)
  File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/serializers.py", line 419, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/broadcast.py", line 39, in _from_id
    raise Exception("Broadcast variable '%s' not loaded!" % bid)
Exception: Broadcast variable '22' not loaded!

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...

2017-03-20 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/17328
  
Looks good :)





[GitHub] spark issue #15599: [SPARK-18022][SQL] java.lang.NullPointerException instea...

2016-10-22 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/15599
  
I can try this fix on Monday.





[GitHub] spark issue #15106: [SPARK-16439] [SQL] bring back the separator in SQL UI

2016-09-19 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/15106
  
LGTM.





[GitHub] spark issue #15106: [SPARK-16439] [SQL] bring back the separator in SQL UI

2016-09-15 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/15106
  
I think this patch could actually work.
Number formatting is executed on the server side.
I did some tests and it looks good.







[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...

2016-09-14 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14340
  
@rxin,
Production streaming jobs can be written in Python; my company and I are an example.
I wrote a bit more in Jira; I think it's a better place for discussion.

https://issues.apache.org/jira/browse/SPARK-16534?focusedCommentId=15491107=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15491107





[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-08-12 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14388
  
@viirya 
I will after the weekend.





[GitHub] spark issue #14465: [SPARK-16321] Fixing performance regression when reading...

2016-08-04 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14465
  
OK.





[GitHub] spark pull request #14465: [SPARK-16321] Fixing performance regression when ...

2016-08-04 Thread maver1ck
Github user maver1ck closed the pull request at:

https://github.com/apache/spark/pull/14465





[GitHub] spark pull request #14390: [SPARK-15541] Casting ConcurrentHashMap to Concur...

2016-08-04 Thread maver1ck
GitHub user maver1ck reopened a pull request:

https://github.com/apache/spark/pull/14390

[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (branch-1.6)

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7

## How was this patch tested?

Compilation. Existing automated tests
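The casting trick described above can be sketched in a short, self-contained Java example. This is a hypothetical illustration (the class and field names below are invented, not Spark's actual Dispatcher.scala/Catalog.scala code): Java 8 covariantly overrides `ConcurrentHashMap.keySet()` to return `ConcurrentHashMap.KeySetView`, so bytecode compiled against a variable typed as `ConcurrentHashMap` references a method that does not exist on a Java 7 runtime and fails with `NoSuchMethodError`. Typing the variable as the `ConcurrentMap` interface instead makes javac bind the call to `Map.keySet()`, which returns plain `Set` on both runtimes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class KeySetCompat {
    // Hypothetical illustration of the SPARK-15541 fix. Declaring this field as
    // the ConcurrentMap interface (rather than ConcurrentHashMap) makes the
    // compiler emit a call to Map.keySet() returning java.util.Set, instead of
    // the Java-8-only covariant override returning ConcurrentHashMap.KeySetView.
    private static final ConcurrentMap<String, Integer> endpoints =
        new ConcurrentHashMap<>();

    static int registeredCount() {
        endpoints.put("endpoint-1", 1);
        endpoints.put("endpoint-2", 2);
        // Resolves against ConcurrentMap/Map, so bytecode built with JDK 8
        // still links on a Java 7 runtime.
        Set<String> names = endpoints.keySet();
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(registeredCount()); // prints 2
    }
}
```

The same interface-typing pattern avoids other cross-JDK covariant-override incompatibilities; the sketch only demonstrates the binding, since the actual failure mode needs a JDK 8 compiler and a Java 7 runtime to reproduce.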




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark-15541

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14390.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14390


commit 921e3c80869d251fd1ecfd78462fa6a2cd0566d5
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date:   2016-07-28T07:49:12Z

[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap

commit 3ffbff134dd72dfaeb890fbb39f9d7b963129c4e
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date:   2016-07-28T07:58:39Z

Fix for style error

commit 7bd4f4487bf9445b9ddeb961ed664bdf30b496c2
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date:   2016-07-28T08:08:26Z

Fix whitespaces

commit ea2810fb0b793588e714e9385d668c1cfe59ca7f
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date:   2016-08-02T18:42:14Z

Remove changes in Dispatcher.scala #14459







[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...

2016-08-04 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14390
  
@srowen 
I missed one change in Catalog.scala





[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-08-03 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14388
  
@viirya 
I tried to test your patch on my production workflow.
Getting:
```
Py4JJavaError: An error occurred while calling o56.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 1.0 failed 1 times, most recent failure: Lost task 20.0 in stage 1.0 (TID 21, 188.165.13.157): java.lang.ArrayIndexOutOfBoundsException: 4096
at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.putIntsLittleEndian(OnHeapColumnVector.java:221)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readIntegers(VectorizedPlainValuesReader.java:68)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedRleValuesReader.readIntegers(VectorizedRleValuesReader.java:189)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readIntBatch(VectorizedColumnReader.java:388)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:247)
at org.apache.spark.sql.execution.vectorized.ColumnVector.readBatch(ColumnVector.java:1094)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:263)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:266)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:251)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:138)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run
```

[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...

2016-08-03 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14445
  
@rxin 
I added some comments to Jira.
I think both problems have solutions right now.






[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...

2016-08-03 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/13701#discussion_r73304180
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -527,4 +536,43 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContext
   assert(df.filter("_1 IS NOT NULL").count() === 4)
 }
   }
+
+  test("Fiters should be pushed down for vectorized Parquet reader at row 
group level") {
--- End diff --

@viirya 
I mean that we can also add a test to check whether we correctly push filters into ParquetRecordReader.
Did you know that you're also resolving SPARK-16321 (https://github.com/apache/spark/pull/14465)?






[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...

2016-08-03 Thread maver1ck
Github user maver1ck commented on a diff in the pull request:

https://github.com/apache/spark/pull/13701#discussion_r73290562
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -527,4 +536,43 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContext
   assert(df.filter("_1 IS NOT NULL").count() === 4)
 }
   }
+
+  test("Fiters should be pushed down for vectorized Parquet reader at row 
group level") {
--- End diff --

What about the non-vectorized reader?





[GitHub] spark pull request #14390: [SPARK-15541] Casting ConcurrentHashMap to Concur...

2016-08-02 Thread maver1ck
Github user maver1ck closed the pull request at:

https://github.com/apache/spark/pull/14390





[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14390
  
Done.
Thank you.





[GitHub] spark issue #14465: [SPARK-16320][SPARK-16321] Fixing performance regression...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14465
  
@davies 
No problem.
I just want to isolate the cause of the performance regression.






[GitHub] spark pull request #14465: [SPARK-16320][SPARK-16321] Fixing performance reg...

2016-08-02 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/14465

[SPARK-16320][SPARK-16321] Fixing performance regression when reading…

## What changes were proposed in this pull request?

This PR adds correct support for predicate pushdown (PPD) when using the non-vectorized Parquet reader.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark-16320

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14465.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14465


commit 652f557c1e4b0d650e2febc0d36c61b506221dfb
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date:   2016-08-02T20:01:12Z

[SPARK-16320][SPARK-16321] Fixing performance regression when reading 
Parquet







[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/13701
  
@gatorsmile 
I added comment in Jira.
"spark.sql.parquet.filterPushdown defaults to true.
The vectorized reader doesn't apply here because I have nested columns (and the vectorized reader works only with atomic types)"





[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/13701
  
I think that this PR also resolves my problem here.

https://issues.apache.org/jira/browse/SPARK-16321?focusedCommentId=15383785=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15383785





[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14390
  
As you merged https://github.com/apache/spark/pull/14459, I removed the changes in Dispatcher.scala.





[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14445
  
@rxin 
I tested this patch.
The results are almost equal to Spark without this patch (the difference is less than 5%).
So maybe it's needed, but it doesn't solve my problem.





[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14390
  
I know that.
And this patch is quite different. On master there are changes only in Dispatcher.scala.
On branch-1.6 we also need changes in Catalog.scala.






[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14390
  
Could you tell me why?
We need different PRs against different branches.





[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14390
  
I added another PR against master, using the following command to find suspicious code.
```
for i in `grep -c -R ConcurrentHashMap | grep -v ':0' | sed -e s/:.*//`; do echo $i; grep keySet $i; done
```

https://github.com/apache/spark/pull/14459





[GitHub] spark pull request #14459: [SPARK-15541] Casting ConcurrentHashMap to Concur...

2016-08-02 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/14459

[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7

## How was this patch tested?

Compilation. Existing automated tests




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark-15541-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14459.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14459


commit 471999a144c965e8e61200a7635898281e567771
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date:   2016-08-02T11:28:31Z

[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap







[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...

2016-08-02 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14445
  
@rxin
I'll test this patch tomorrow.





[GitHub] spark issue #10909: [SPARK-10086] [MLlib] [Streaming] [PySpark] ignore Strea...

2016-07-28 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/10909
  
@jkbradley 
What about merging this to branch-1.6?





[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap

2016-07-28 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14390
  
@jkbradley 
Could you look at it?
I think this is the problem from: https://issues.apache.org/jira/browse/SPARK-10086
Maybe we should merge this PR to branch-1.6 before testing?
https://github.com/apache/spark/pull/10909/files





[GitHub] spark pull request #14390: [SPARK-15541] Casting ConcurrentHashMap to Concur...

2016-07-28 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/14390

[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7

## How was this patch tested?

Compilation. Existing automated tests




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark spark-15541

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14390.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14390


commit 921e3c80869d251fd1ecfd78462fa6a2cd0566d5
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date:   2016-07-28T07:49:12Z

[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap







[GitHub] spark issue #11445: [SPARK-13594][SQL] remove typed operations(e.g. map, fla...

2016-07-19 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/11445
  
@rxin 
As we're not planning to implement Datasets in Python, is there a plan to
revert this JIRA?





[GitHub] spark issue #14142: [SPARK-16439] Fix number formatting in SQL UI

2016-07-13 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14142
  
Merging ?





[GitHub] spark issue #14142: [SPARK-16439] Fix number formatting in SQL UI

2016-07-12 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14142
  
Can we test this ?





[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...

2016-07-12 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/12913
  
Up ?





[GitHub] spark pull request #14142: [SPARK-16439] Fix number formatting in SQL UI

2016-07-11 Thread maver1ck
GitHub user maver1ck opened a pull request:

https://github.com/apache/spark/pull/14142

[SPARK-16439] Fix number formatting in SQL UI

## What changes were proposed in this pull request?

The Spark SQL UI displays numbers greater than 1000 with \u00A0 (no-break
space) as the grouping separator.
The problem occurs when the server locale uses a no-break space as its
grouping separator.
This patch turns off grouping and removes the separator.
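
The locale behaviour can be reproduced with a short standalone snippet (illustrative only, not the patched Spark code; the exact separator character varies with the JDK's locale data, so only the ungrouped result is checked):

```java
import java.text.NumberFormat;
import java.util.Locale;

public class GroupingDemo {
    public static void main(String[] args) {
        // Locales such as fr_FR group digits with a no-break space
        // (U+00A0, or a narrow variant in newer CLDR locale data).
        NumberFormat nf = NumberFormat.getIntegerInstance(Locale.FRANCE);
        System.out.println(nf.format(1234567)); // e.g. "1 234 567" with NBSP

        // Disabling grouping removes the separator entirely, which is what
        // the patch does for the SQL UI numbers.
        nf.setGroupingUsed(false);
        System.out.println(nf.format(1234567)); // prints 1234567
    }
}
```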

## How was this patch tested?

Manual UI tests. Screenshot attached.


![image](https://cloud.githubusercontent.com/assets/4006010/16749556/5cb5a372-47cb-11e6-9a95-67fd3f9d1c71.png)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maver1ck/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14142


commit fef15cef2fb90c5dc06332723a14958bb584ed5c
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date:   2016-07-11T22:53:54Z

[SPARK-16439] Fix number formatting in SQL UI







[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

2016-07-05 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/14054
  
@srowen 
Maybe we can add this as a configuration option ?

I'm not sure how this affects performance.





[GitHub] spark issue #13925: [SPARK-16226][SQL]change the way of JDBC commit

2016-07-01 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/13925
  
@srowen 
Maybe we should change this condition to 
`conn.getMetaData().supportsTransactions()` ?
I can prepare PR.
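
A hedged sketch of the proposed condition (the `Proxy`-based stubs below are hypothetical stand-ins for a real JDBC driver, used only to make the snippet self-contained):

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.DatabaseMetaData;

public class TxCheck {
    // Proposed check: rely on the general supportsTransactions() flag
    // instead of the narrower DML/DDL-specific metadata methods.
    static boolean shouldCommit(Connection conn) throws Exception {
        return conn.getMetaData().supportsTransactions();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stub metadata reporting transaction support.
        DatabaseMetaData md = (DatabaseMetaData) Proxy.newProxyInstance(
            TxCheck.class.getClassLoader(),
            new Class<?>[] {DatabaseMetaData.class},
            (p, m, a) -> m.getName().equals("supportsTransactions")
                ? Boolean.TRUE : null);
        // Hypothetical stub connection returning that metadata.
        Connection conn = (Connection) Proxy.newProxyInstance(
            TxCheck.class.getClassLoader(),
            new Class<?>[] {Connection.class},
            (p, m, a) -> m.getName().equals("getMetaData") ? md : null);

        System.out.println(shouldCommit(conn)); // prints true
    }
}
```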






[GitHub] spark issue #13925: [SPARK-16226][SQL]change the way of JDBC commit

2016-06-30 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/13925
  
@srowen 
Recently I modified the MySQL JDBC driver because both
supportsDataManipulationTransactionsOnly() and
supportsDataDefinitionAndDataManipulationTransactions() return false.

So maybe we can change this condition ?





[GitHub] spark pull request: [SPARK-10605][SQL] Create native collect_list/...

2016-05-12 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/12874#issuecomment-218849616
  
There is one more thing.
We observed that collect_list doesn't work in Spark 2.0
https://issues.apache.org/jira/browse/SPARK-15293





[GitHub] spark pull request: [SPARK-10605][SQL] Create native collect_list/...

2016-05-12 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/12874#issuecomment-218721890
  
Hi,
What about this patch ?





[GitHub] spark pull request: [SPARK-12200][SQL] Add __contains__ implementa...

2016-05-11 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10194#issuecomment-218553241
  
@holdenk 
Thanks :)

@davies 
I think everything is OK. Can we merge it also into 2.0 branch ?





[GitHub] spark pull request: [SPARK-12200][SQL] Add __contains__ implementa...

2016-05-11 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10194#issuecomment-218547285
  
@davies
I fixed whitespaces. Can we test this one more time ?





[GitHub] spark pull request: [SPARK-12200][SQL] Add __contains__ implementa...

2016-05-11 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10194#issuecomment-218384288
  
@davies 
But you mentioned current behaviour.

My patch changes that behaviour, so you can access the column via
`row['col_name']` and `'col_name' in row` will return **True**.





[GitHub] spark pull request: SPARK-13335 Use declarative aggregate for coll...

2016-04-29 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/11688#issuecomment-215676018
  
Hi,
What about this PR ? 
Will be merged into Spark 2.0 ?





[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...

2016-01-19 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10194#issuecomment-172985889
  
@holdenk , @davies 
Can anyone verify this patch ?





[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...

2016-01-06 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10452#issuecomment-169409283
  
For me this is critical security issue.
So I'd like to have it in the 1.6 branch
(I'm sure that 1.6.1 will be released earlier than 2.0.0).





[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...

2016-01-06 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10452#issuecomment-169420636
  
It's not the explain output but the SQL tab on the Spark web console.
As far as I understand, the information there is taken from the same source.
Am I right?
PS. I'm building Spark with this patch to check this out.





[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...

2016-01-06 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10452#issuecomment-169395482
  
@marmbrus 
What about merging it to 1.6 branch ?





[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...

2015-12-29 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10194#issuecomment-167743179
  
@holdenk 
Do you need something more ?





[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...

2015-12-25 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/4027#issuecomment-167260638
  
I agree.
In YARN mode we have configuration per node
```
YARN: The --num-executors option to the Spark YARN client controls how many 
executors it will allocate on the cluster, while --executor-memory and 
--executor-cores control the resources per executor.
```





[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...

2015-12-24 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10194#issuecomment-167114248
  
@holdenk 
Is it OK to merge this patch ?





[GitHub] spark pull request: [SPARK-4226][SQL]Add subquery (not) in/exists ...

2015-12-15 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/9055#issuecomment-164912029
  
So what next ?





[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...

2015-12-12 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10194#issuecomment-164124983
  
Done.





[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...

2015-12-09 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/10194#issuecomment-163521876
  
OK. I will add a few words to the documentation.




