[GitHub] spark issue #10527: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} UDFs
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/10527 @vectorijk Is this PR dead? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19234 I think that porting the changes from Python 3.6 gives us overly complicated code. I'm closing it.
[GitHub] spark pull request #19234: [SPARK-22010][PySpark] Change fromInternal method...
Github user maver1ck closed the pull request at: https://github.com/apache/spark/pull/19234
[GitHub] spark issue #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_datatype_...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19255 OK. Let's close it.
[GitHub] spark pull request #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_da...
Github user maver1ck closed the pull request at: https://github.com/apache/spark/pull/19255
[GitHub] spark issue #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_datatype_...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19255 Ping @HyukjinKwon
[GitHub] spark issue #19566: [SPARK-22341][yarn] Impersonate correct user when prepar...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19566 @vanzin I tested your patch. It worked.
[GitHub] spark pull request #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_da...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19255#discussion_r146248695 --- Diff: python/pyspark/sql/types.py --- @@ -24,6 +24,7 @@ import re import base64 from array import array +from functools import lru_cache --- End diff -- I added support for Python < 3.3. What do you think?
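The Python < 3.3 support discussed in this thread could look like the following sketch (the parser name is hypothetical, not PySpark's actual code): use `functools.lru_cache` where it exists, and fall back to a no-op decorator (or the `functools32` backport mentioned later in the thread) on older interpreters.

```python
# A sketch of Python < 3.3 support for lru_cache (hypothetical names):
# use functools.lru_cache when available, otherwise fall back to a
# no-op decorator so the decorated code still runs, just uncached.
import json

try:
    from functools import lru_cache  # available since Python 3.2
except ImportError:
    def lru_cache(maxsize=128):
        # Fallback for old Pythons: return the function unchanged.
        def decorator(func):
            return func
        return decorator

@lru_cache(maxsize=None)
def parse_datatype_json(json_str):
    # Stand-in for the cached schema parser; str arguments are
    # hashable, which is what makes lru_cache applicable here.
    return json.dumps(json.loads(json_str), sort_keys=True)

print(parse_datatype_json('{"type": "integer"}'))  # {"type": "integer"}
```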
[GitHub] spark pull request #18685: [SPARK-21439][PySpark] Support for ABCMeta in PyS...
Github user maver1ck closed the pull request at: https://github.com/apache/spark/pull/18685
[GitHub] spark issue #19255: [SPARK-22029][PySpark] Add lru_cache to _parse_datatype_...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19255 Jenkins, retest this please.
[GitHub] spark issue #18685: [SPARK-21439][PySpark] Support for ABCMeta in PySpark
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/18685 I realized that the changes here were also added in SPARK-21070. https://github.com/apache/spark/commit/751f513367ae776c6d6815e1ce138078924872eb So we can close this PR.
[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19234 It was introduced with this PEP. https://www.python.org/dev/peps/pep-0495/
[GitHub] spark issue #19255: [WIP][SPARK-22029][PySpark] Add lru_cache to _parse_data...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19255 @HyukjinKwon I added perf tests.
[GitHub] spark issue #19246: [SPARK-22025][PySpark] Speeding up fromInternal for Stru...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19246 @HyukjinKwon I created this before https://github.com/apache/spark/pull/19249, which greatly decreases the number of function calls. I agree we can close it.
[GitHub] spark pull request #19246: [SPARK-22025][PySpark] Speeding up fromInternal f...
Github user maver1ck closed the pull request at: https://github.com/apache/spark/pull/19246
[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19234 OK. It passed all tests, so let's merge it.
[GitHub] spark issue #19234: [WIP][SPARK-22010][PySpark] Change fromInternal method o...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19234 I checked with some samples; code using float can trigger errors.
[GitHub] spark issue #19249: [SPARK-22032][PySpark] Speed up StructType conversion
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19249 @ueshin I think that for MapType this is not a solution, because every key/value of a MapType has the same type, so we need conversion either for all entries or for none.
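The all-or-nothing point can be illustrated with a small sketch (hypothetical converters, not Spark's actual code): because every key of a map shares one type and every value shares another, the "needs conversion?" decision is made once per type, never per entry.

```python
# Illustrative sketch: in a MapType all keys share one type and all
# values share another, so conversion applies to all entries or none.
def convert_map(m, key_conv=None, value_conv=None):
    # key_conv / value_conv are hypothetical per-type converters;
    # None means that side of the map needs no conversion at all.
    if key_conv is None and value_conv is None:
        return m  # neither side needs conversion: skip every entry
    kc = key_conv or (lambda k: k)
    vc = value_conv or (lambda v: v)
    return {kc(k): vc(v) for k, v in m.items()}

same = convert_map({"a": 1, "b": 2})  # returned untouched
scaled = convert_map({"a": 1, "b": 2}, value_conv=lambda v: v * 10)
print(scaled)  # {'a': 10, 'b': 20}
```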
[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19234 I'm asking because such code is 2x faster than my solution.
[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19234 Any idea why we're not using `datetime.datetime.fromtimestamp(ts / 10.)`? There is a comment about overflow, but does that problem actually exist?
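For context on the precision concern (assuming, as in Spark, that `ts` is integer microseconds since the epoch; the variable below is a hypothetical value): float division squeezes the timestamp into a 53-bit double mantissa, which can drop microseconds for large values, while integer `//` and `%` preserve them exactly.

```python
import datetime

# Assumption: ts is integer microseconds since the epoch (Spark's
# internal representation). Integer arithmetic keeps every microsecond;
# a float division may round them away for large timestamps, which is
# the overflow/precision worry behind the code comment.
ts = 1505487112123456  # hypothetical internal value, in microseconds

exact = datetime.datetime.fromtimestamp(ts // 1000000).replace(
    microsecond=ts % 1000000)
print(exact.microsecond)  # 123456
```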
[GitHub] spark issue #19260: [SPARK-22043][PYTHON] Improves error message for show_pr...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19260 LGTM
[GitHub] spark issue #19249: [SPARK-22032][PySpark] Speed up StructType conversion
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19249 Done.
[GitHub] spark pull request #19249: [SPARK-22032][PySpark] Speed up StructType.fromIn...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19249#discussion_r139306166 --- Diff: python/pyspark/sql/types.py --- @@ -619,7 +621,8 @@ def fromInternal(self, obj): # it's already converted by pickler return obj if self._needSerializeAnyField: -values = [f.fromInternal(v) for f, v in zip(self.fields, obj)] +values = [f.fromInternal(v) if n else v --- End diff -- Done.
[GitHub] spark pull request #19255: [WIP][SPARK-22029][PySpark] Add lru_cache to _par...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19255#discussion_r139305244 --- Diff: python/pyspark/sql/types.py --- @@ -24,6 +24,7 @@ import re import base64 from array import array +from functools import lru_cache --- End diff -- Or use the backported library. https://pypi.python.org/pypi/functools32
[GitHub] spark pull request #19249: [SPARK-22032][PySpark] Speed up StructType.fromIn...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19249#discussion_r139303509 --- Diff: python/pyspark/sql/types.py --- @@ -619,7 +621,8 @@ def fromInternal(self, obj): # it's already converted by pickler return obj if self._needSerializeAnyField: -values = [f.fromInternal(v) for f, v in zip(self.fields, obj)] +values = [f.fromInternal(v) if n else v --- End diff -- I'll add one more optimization here. And then I'll do benchmarks.
[GitHub] spark pull request #19255: [WIP][SPARK-22029] Add lru_cache to _parse_dataty...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19255#discussion_r139292791 --- Diff: python/pyspark/sql/types.py --- @@ -24,6 +24,7 @@ import re import base64 from array import array +from functools import lru_cache --- End diff -- Any ideas for Python 2.7?
[GitHub] spark pull request #19255: [WIP][SPARK-22029] Add lru_cache to _parse_dataty...
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/19255 [WIP][SPARK-22029] Add lru_cache to _parse_datatype_json_string ## What changes were proposed in this pull request? _parse_datatype_json_string is called many times for the same datatypes. By caching its result we can speed up PySpark internals. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maver1ck/spark spark_22029 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19255.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19255 commit c903860ee8d25afda0f969b582bdbdaa0aa8c9fe Author: Maciej Bryński <maciek-git...@brynski.pl> Date: 2017-09-16T18:51:49Z Add lru_cache to _parse_datatype_json_string
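The rationale of this PR ("called many times for the same datatypes") can be demonstrated with a minimal sketch; the parser below is a hypothetical stand-in for PySpark's `_parse_datatype_json_string`, instrumented to count how often the expensive parse actually runs.

```python
import json
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=None)
def parse_datatype_json_string(s):
    # Hypothetical stand-in for PySpark's _parse_datatype_json_string;
    # the counter shows how often the body actually executes.
    calls["n"] += 1
    return json.dumps(json.loads(s), sort_keys=True)

schema = '{"fields": [], "type": "struct"}'
for _ in range(1000):
    parse_datatype_json_string(schema)
print(calls["n"])  # 1 — the parse ran once; 999 calls were cache hits
```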
[GitHub] spark pull request #19246: [SPARK-22025] Speeding up fromInternal for Struct...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19246#discussion_r139291502 --- Diff: python/pyspark/sql/types.py --- @@ -410,6 +410,24 @@ def __init__(self, name, dataType, nullable=True, metadata=None): self.dataType = dataType self.nullable = nullable self.metadata = metadata or {} +self.needConversion = dataType.needConversion +self.toInternal = dataType.toInternal +self.fromInternal = dataType.fromInternal + +def __getstate__(self): --- End diff -- We need to handle pickling ourselves because we have fields with function values.
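The pattern described here (a simplified sketch with hypothetical names, not the actual PySpark `StructField`): function-valued attributes such as a lambda are not picklable, so `__getstate__` drops them from the state dict and `__setstate__` recreates them after unpickling.

```python
import pickle

class Field(object):
    # Sketch of handling pickle ourselves when instance attributes
    # hold function values (here a lambda, which pickle rejects).
    def __init__(self, name):
        self.name = name
        self._bind()

    def _bind(self):
        # Store the converter as a plain attribute to skip per-call
        # method lookup; this is the part that breaks default pickling.
        self.fromInternal = lambda v: v

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["fromInternal"]  # lambdas cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._bind()               # re-create the function reference

f = pickle.loads(pickle.dumps(Field("id")))
print(f.fromInternal(42))  # 42
```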
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19249 I added a benchmark for this code.
[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19234#discussion_r139290042 --- Diff: python/pyspark/sql/types.py --- @@ -196,7 +199,9 @@ def toInternal(self, dt): def fromInternal(self, ts): if ts is not None: # using int to avoid precision loss in float -return datetime.datetime.fromtimestamp(ts // 100).replace(microsecond=ts % 100) +y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 100) if _is_utc +else time.localtime(ts // 100)) +return datetime.datetime(y, m, d, hh, mm, ss, ts % 100) --- End diff -- I added a description and support for leap seconds.
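A runnable sketch of the conversion in the diff above, assuming (as in Spark) that `ts` is integer microseconds since the epoch: the struct-time fields are unpacked into a `datetime`, and seconds are clamped because `time.localtime` can report second 60 during a leap second, which `datetime` rejects.

```python
import time
import datetime

def from_internal(ts, use_utc=False):
    # Sketch of the reviewed change; assumes ts is integer microseconds
    # since the epoch. min(ss, 59) handles leap seconds, since
    # datetime.datetime refuses second values of 60 or 61.
    tm = time.gmtime(ts // 1000000) if use_utc else time.localtime(ts // 1000000)
    y, m, d, hh, mm, ss = tm[:6]
    return datetime.datetime(y, m, d, hh, mm, min(ss, 59), ts % 1000000)

print(from_internal(1500000, use_utc=True))  # 1970-01-01 00:00:01.500000
```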
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19249 Yep.
[GitHub] spark issue #18685: [SPARK-21439] Support for ABCMeta in PySpark
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/18685 Ping received. I'll try to add tests and resolve the conflict.
[GitHub] spark issue #19249: [SPARK-22032] Speed up StructType.fromInternal
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19249 I was checking this with my production code. It gives me about a 6-7% speedup and removes 408 million function calls :) I'll try to create a benchmark for this.
[GitHub] spark issue #19246: [SPARK-22025] Speeding up fromInternal for StructField
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19246 @dongjoon-hyun I'll do it on Monday.
[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/19234#discussion_r139284824 --- Diff: python/pyspark/sql/types.py --- @@ -196,7 +199,9 @@ def toInternal(self, dt): def fromInternal(self, ts): if ts is not None: # using int to avoid precision loss in float -return datetime.datetime.fromtimestamp(ts // 100).replace(microsecond=ts % 100) +y, m, d, hh, mm, ss, _, _, _ = (time.gmtime(ts // 100) if _is_utc +else time.localtime(ts // 100)) +return datetime.datetime(y, m, d, hh, mm, ss, ts % 100) --- End diff -- I think the only difference is this `ss = min(ss, 59)`
[GitHub] spark pull request #19249: [SPARK-22032] Speed up StructType.fromInternal
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/19249 [SPARK-22032] Speed up StructType.fromInternal ## What changes were proposed in this pull request? StructType.fromInternal calls f.fromInternal(v) for every field. We can use the needConversion method to limit the number of function calls. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maver1ck/spark spark_22032 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19249 commit aa69a72d71c55e93b487ac28910b9187c0c71088 Author: Maciej Bryński <maciek-git...@brynski.pl> Date: 2017-09-15T18:01:40Z Update types.py
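The optimization this PR describes can be sketched as follows (simplified, hypothetical type classes rather than the real `types.py`): compute which fields need conversion once per schema, then skip `fromInternal` for every field that doesn't.

```python
# Sketch of using needConversion to limit per-row function calls.
class IntType(object):
    # No internal/external mismatch: values pass through untouched.
    def needConversion(self):
        return False
    def fromInternal(self, v):
        return v

class OffsetType(object):
    # Hypothetical type whose internal form must be decoded.
    def needConversion(self):
        return True
    def fromInternal(self, v):
        return v + 1000

fields = [IntType(), OffsetType(), IntType()]
need = [f.needConversion() for f in fields]  # computed once per schema

row = (1, 2, 3)
# Mirrors the reviewed diff: call fromInternal only where needed.
values = [f.fromInternal(v) if n else v
          for f, v, n in zip(fields, row, need)]
print(values)  # [1, 1002, 3]
```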
[GitHub] spark pull request #19246: [SPARK-22025] Speeding up fromInternal for Struct...
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/19246 [SPARK-22025] Speeding up fromInternal for StructField ## What changes were proposed in this pull request? Changing function calls to direct references can greatly speed up function calling. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maver1ck/spark spark_22025 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19246.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19246 commit e3dfd221dbdd1bd3ba0226cc7b9cafe939cd1676 Author: Maciej Bryński <maciek-git...@brynski.pl> Date: 2017-09-15T13:02:49Z Change function call to references
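The "call to reference" idea can be illustrated with a minimal sketch (simplified classes, not the actual PySpark ones): instead of `StructField.fromInternal` delegating to `self.dataType.fromInternal` on every row, the data type's method is bound once in `__init__`, so each call pays one attribute lookup instead of an extra delegating frame.

```python
# Sketch: replace a delegating method with a direct reference to the
# underlying dataType method, bound once at construction time.
class LongType(object):
    def fromInternal(self, v):
        return v

class StructField(object):
    def __init__(self, name, dataType):
        self.name = name
        self.dataType = dataType
        # Direct reference: callers hit dataType.fromInternal without
        # going through an intermediate StructField method.
        self.fromInternal = dataType.fromInternal

f = StructField("id", LongType())
print(f.fromInternal(7))  # 7
```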
[GitHub] spark pull request #19234: [SPARK-22010] Change fromInternal method of Times...
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/19234 [SPARK-22010] Change fromInternal method of TimestampType ## What changes were proposed in this pull request? This PR changes the way PySpark converts the Timestamp format from its internal to its Python representation. **Benchmarks** Before change: 4.58 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 10 loops each) After change: System with UTC timezone: 1.49 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) Other timezones: 3.15 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 10 loops each) ## How was this patch tested? Existing tests. Performance benchmarks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maver1ck/spark spark_22010 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19234.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19234 commit 238b5563e444b6b936f2e2771ec7876f648af1e9 Author: Maciej Bryński <maciek-git...@brynski.pl> Date: 2017-09-14T14:56:52Z Change internal Timestamp conversion commit 0cb2a482a41711531a9367b88bf1558f5c87ac4c Author: Maciej Bryński <maciek-git...@brynski.pl> Date: 2017-09-14T14:58:50Z Typo fix commit 02301eb4aa8686fcafdeba3b13ec772be8938ed6 Author: Maciej Bryński <maciek-git...@brynski.pl> Date: 2017-09-14T15:07:22Z Import fix
[GitHub] spark pull request #18685: Add Weakref to cloudpickle
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/18685 Add Weakref to cloudpickle https://github.com/cloudpipe/cloudpickle/pull/104/files ## What changes were proposed in this pull request? Possibility to use ABCMeta with Spark. ## How was this patch tested? Manual tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maver1ck/spark SPARK-21439 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18685.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18685 commit 8f51cfd7ce4d21dfc190298fefc889e260ee3a00 Author: Maciej Bryński <maciek-git...@brynski.pl> Date: 2017-07-19T20:28:09Z Add Weakref to cloudpickle https://github.com/cloudpipe/cloudpickle/pull/104/files --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #17722: [SPARK-12717][PYSPARK][BRANCH-1.6] Resolving race condit...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/17722 Hi, what about this issue?
[GitHub] spark pull request #18515: [SPARK-21287] Ability to use Integer.MIN_VALUE as...
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/18515 [SPARK-21287] Ability to use Integer.MIN_VALUE as a fetchSize ## What changes were proposed in this pull request? Fix for https://issues.apache.org/jira/browse/SPARK-21287 ## How was this patch tested? Existing automated tests + manual tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maver1ck/spark spark-21287 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18515.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18515 commit 97b6d7bc74b1895db0c772b4c0de726c6be2c3f0 Author: Maciej Bryński <maciek-git...@brynski.pl> Date: 2017-07-03T12:46:29Z Update JDBCOptions.scala
[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/17694 @vundela Great. But I'm planning to migrate to 2.1 as soon as 2.1.1 is released.
[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/17694 OK. I did additional tests. The fix works only with Spark 2.1. I tried to apply it on 2.0.2, and that was the cause of my problem.
[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/17694 I checked the pyspark.zip of the running container and everything is in its place. So I assume there is more than one race condition in this code. I'll try to prepare an example of the problem.
[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/17694 The funny thing is this code works for me on 4 threads and throws an exception on 10 threads.
[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/17694 I tested your patch in our environment. Problem still exists. ``` Job aborted due to stage failure: Task 0 in stage 22.0 failed 8 times, most recent failure: Lost task 0.7 in stage 22.0 (TID 138, dwh-hn30.adpilot.co): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/worker.py", line 161, in main func, profiler, deserializer, serializer = read_command(pickleSer, infile) File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/worker.py", line 54, in read_command command = serializer._read_with_length(file) File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length return self.loads(obj) File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/serializers.py", line 419, in loads return pickle.loads(obj, encoding=encoding) File "/grid/3/hadoop/yarn/log/usercache/bi/appcache/application_1492634694033_0092/container_e538_1492634694033_0092_01_03/pyspark.zip/pyspark/broadcast.py", line 39, in _from_id raise Exception("Broadcast variable '%s' not loaded!" % bid) Exception: Broadcast variable '22' not loaded! 
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193) at org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:234) at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: ```
[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/17328 Looks good :)
[GitHub] spark issue #15599: [SPARK-18022][SQL] java.lang.NullPointerException instea...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/15599 I can try this fix on Monday.
[GitHub] spark issue #15106: [SPARK-16439] [SQL] bring back the separator in SQL UI
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/15106 LGTM.
[GitHub] spark issue #15106: [SPARK-16439] [SQL] bring back the separator in SQL UI
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/15106 I think this patch could actually work. Number formatting is performed on the server side. I did some tests and it looks good.
[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14340 @rxin, production streaming jobs can be written in Python; my company and I are an example. I wrote a bit more in Jira; I think that's a better place for the discussion. https://issues.apache.org/jira/browse/SPARK-16534?focusedCommentId=15491107&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15491107
[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14388 @viirya I will after the weekend.
[GitHub] spark issue #14465: [SPARK-16321] Fixing performance regression when reading...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14465 OK.
[GitHub] spark pull request #14465: [SPARK-16321] Fixing performance regression when ...
Github user maver1ck closed the pull request at: https://github.com/apache/spark/pull/14465
[GitHub] spark pull request #14390: [SPARK-15541] Casting ConcurrentHashMap to Concur...
GitHub user maver1ck reopened a pull request: https://github.com/apache/spark/pull/14390 [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (branch-1.6)

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7.

## How was this patch tested?

Compilation. Existing automated tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maver1ck/spark spark-15541

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14390.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14390

commit 921e3c80869d251fd1ecfd78462fa6a2cd0566d5
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date: 2016-07-28T07:49:12Z
[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap

commit 3ffbff134dd72dfaeb890fbb39f9d7b963129c4e
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date: 2016-07-28T07:58:39Z
Fix for style error

commit 7bd4f4487bf9445b9ddeb961ed664bdf30b496c2
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date: 2016-07-28T08:08:26Z
Fix whitespaces

commit ea2810fb0b793588e714e9385d668c1cfe59ca7f
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date: 2016-08-02T18:42:14Z
Remove changes in Dispatcher.scala #14459
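The compatibility issue behind this cast can be illustrated with a minimal sketch (the class name `ConcurrentMapCompat` and the toy map below are mine, not part of the PR). In Java 8, `ConcurrentHashMap.keySet()` gained the covariant return type `ConcurrentHashMap.KeySetView`, which does not exist on Java 7, so bytecode compiled against JDK 8 with a `ConcurrentHashMap`-typed receiver fails there with `NoSuchMethodError`. Typing the reference as the `ConcurrentMap` interface binds the call to the `Set`-returning signature present on both Java versions:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentMapCompat {
    public static void main(String[] args) {
        // With the receiver typed as ConcurrentHashMap, a JDK 8 compiler would
        // emit a call returning the Java-8-only KeySetView. Typing it as the
        // ConcurrentMap interface keeps the call bound to the method that
        // returns plain java.util.Set, which exists on Java 7 as well.
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        Set<String> keys = map.keySet();
        System.out.println(keys.contains("a")); // true
    }
}
```

The runtime behaviour is identical either way; only the compile-time signature embedded in the bytecode changes.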
[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14390 @srowen I missed one change in Catalog.scala
[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14388 @viirya I tried to test your patch on my production workflow. Getting: ``` Py4JJavaError: An error occurred while calling o56.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 1.0 failed 1 times, most recent failure: Lost task 20.0 in stage 1.0 (TID 21, 188.165.13.157): java.lang.ArrayIndexOutOfBoundsException: 4096 at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.putIntsLittleEndian(OnHeapColumnVector.java:221) at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readIntegers(VectorizedPlainValuesReader.java:68) at org.apache.spark.sql.execution.datasources.parquet.VectorizedRleValuesReader.readIntegers(VectorizedRleValuesReader.java:189) at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readIntBatch(VectorizedColumnReader.java:388) at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:247) at org.apache.spark.sql.execution.vectorized.ColumnVector.readBatch(ColumnVector.java:1094) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:263) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:266) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:251) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:138) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437) at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607) at org.apache.spark.util.EventLoop$$anon$1.run ```
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14445 @rxin I added some comments to Jira. I think both problems have solutions now.
[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73304180 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -527,4 +536,43 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex assert(df.filter("_1 IS NOT NULL").count() === 4) } } + + test("Fiters should be pushed down for vectorized Parquet reader at row group level") { --- End diff -- @viirya I mean that we can also add a test to check whether we correctly push the filter into ParquetRecordReader. Are you aware that you're also resolving SPARK-16321 (https://github.com/apache/spark/pull/14465)?
[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...
Github user maver1ck commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73290562 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -527,4 +536,43 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex assert(df.filter("_1 IS NOT NULL").count() === 4) } } + + test("Fiters should be pushed down for vectorized Parquet reader at row group level") { --- End diff -- What about the non-vectorized reader?
[GitHub] spark pull request #14390: [SPARK-15541] Casting ConcurrentHashMap to Concur...
Github user maver1ck closed the pull request at: https://github.com/apache/spark/pull/14390
[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14390 Done. Thank you.
[GitHub] spark issue #14465: [SPARK-16320][SPARK-16321] Fixing performance regression...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14465 @davies No problem. I just want to isolate the cause of the performance regression.
[GitHub] spark pull request #14465: [SPARK-16320][SPARK-16321] Fixing performance reg...
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/14465 [SPARK-16320][SPARK-16321] Fixing performance regression when reading…

## What changes were proposed in this pull request?

This PR adds correct support for predicate pushdown (PPD) when using the non-vectorized Parquet reader.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maver1ck/spark spark-16320

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14465.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14465

commit 652f557c1e4b0d650e2febc0d36c61b506221dfb
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date: 2016-08-02T20:01:12Z
[SPARK-16320][SPARK-16321] Fixing performance regression when reading Parquet
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/13701 @gatorsmile I added a comment in Jira: "spark.sql.parquet.filterPushdown defaults to true. The vectorized reader isn't a factor here because I have nested columns (and the vectorized reader works only with atomic types)."
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/13701 I think that this PR also resolves my problem here: https://issues.apache.org/jira/browse/SPARK-16321?focusedCommentId=15383785&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15383785
[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14390 As you merged https://github.com/apache/spark/pull/14459, I removed the changes in Dispatcher.scala.
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14445 @rxin I tested this patch. The results are almost equal to Spark without it (the difference is less than 5%), so the patch may still be needed, but it doesn't solve my problem.
[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14390 I know that. And this patch is quite different: on master the changes are only in Dispatcher.scala, while on branch-1.6 we also need changes in Catalog.scala.
[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14390 Could you tell me why? We need different PRs against different branches.
[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14390 I added another PR against master, using the following command to find suspicious code:
```
for i in `grep -c -R ConcurrentHashMap | grep -v ':0' | sed -e s/:.*//`; do
  echo $i
  grep keySet $i
done
```
https://github.com/apache/spark/pull/14459
[GitHub] spark pull request #14459: [SPARK-15541] Casting ConcurrentHashMap to Concur...
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/14459 [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7.

## How was this patch tested?

Compilation. Existing automated tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maver1ck/spark spark-15541-master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14459.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14459

commit 471999a144c965e8e61200a7635898281e567771
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date: 2016-08-02T11:28:31Z
[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14445 @rxin I'll test this patch tomorrow.
[GitHub] spark issue #10909: [SPARK-10086] [MLlib] [Streaming] [PySpark] ignore Strea...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/10909 @jkbradley What about merging this to branch-1.6?
[GitHub] spark issue #14390: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14390 @jkbradley Could you look at it? I think this is the problem from https://issues.apache.org/jira/browse/SPARK-10086. Maybe we should merge this PR to branch-1.6 before testing? https://github.com/apache/spark/pull/10909/files
[GitHub] spark pull request #14390: [SPARK-15541] Casting ConcurrentHashMap to Concur...
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/14390 [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7.

## How was this patch tested?

Compilation. Existing automated tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maver1ck/spark spark-15541

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14390.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14390

commit 921e3c80869d251fd1ecfd78462fa6a2cd0566d5
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date: 2016-07-28T07:49:12Z
[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap
[GitHub] spark issue #11445: [SPARK-13594][SQL] remove typed operations(e.g. map, fla...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/11445 @rxin As we're not planning to implement Datasets in Python, is there a plan to revert this Jira?
[GitHub] spark issue #14142: [SPARK-16439] Fix number formatting in SQL UI
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14142 Merging?
[GitHub] spark issue #14142: [SPARK-16439] Fix number formatting in SQL UI
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14142 Can we test this?
[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/12913 Up?
[GitHub] spark pull request #14142: [SPARK-16439] Fix number formatting in SQL UI
GitHub user maver1ck opened a pull request: https://github.com/apache/spark/pull/14142 [SPARK-16439] Fix number formatting in SQL UI

## What changes were proposed in this pull request?

The Spark SQL UI displays numbers greater than 1000 with U+00A0 (no-break space) as the grouping separator. The problem occurs when the server locale uses a no-break space as the separator. This patch turns off grouping, removing the separator.

## How was this patch tested?

Manual UI tests. Screenshot attached.

![image](https://cloud.githubusercontent.com/assets/4006010/16749556/5cb5a372-47cb-11e6-9a95-67fd3f9d1c71.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maver1ck/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14142.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14142

commit fef15cef2fb90c5dc06332723a14958bb584ed5c
Author: Maciej Brynski <maciej.bryn...@adpilot.pl>
Date: 2016-07-11T22:53:54Z
[SPARK-16439] Fix number formatting in SQL UI
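The separator behaviour this PR works around can be reproduced with plain `java.text.NumberFormat`. The sketch below is illustrative, not the actual patch: the helper name `formatForUi` is mine, and `Locale.FRANCE` stands in for any server locale whose grouping separator is a no-break space.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class SqlUiNumberFormat {
    // In locales such as fr_FR the default integer format groups digits with a
    // no-break space (U+00A0, or U+202F on newer JDKs), which is what showed up
    // in the SQL UI. Disabling grouping removes the separator entirely.
    public static String formatForUi(long value) {
        NumberFormat nf = NumberFormat.getIntegerInstance(Locale.FRANCE);
        nf.setGroupingUsed(false);
        return nf.format(value);
    }

    public static void main(String[] args) {
        NumberFormat grouped = NumberFormat.getIntegerInstance(Locale.FRANCE);
        System.out.println(grouped.format(1000000L)); // digits split by a no-break space
        System.out.println(formatForUi(1000000L));    // prints 1000000
    }
}
```

An alternative to disabling grouping would be forcing a fixed locale such as `Locale.US` on the server, which keeps a plain comma as the separator regardless of the host configuration.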
[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/14054 @srowen Maybe we can add this as a configuration option ? I'm not sure how this affects performance.
[GitHub] spark issue #13925: [SPARK-16226][SQL]change the way of JDBC commit
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/13925 @srowen Maybe we should change this condition to `conn.getMetaData().supportsTransactions()` ? I can prepare PR.
[GitHub] spark issue #13925: [SPARK-16226][SQL]change the way of JDBC commit
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/13925 @srowen Recently I modified the MySQL JDBC driver because both supportsDataManipulationTransactionsOnly() and supportsDataDefinitionAndDataManipulationTransactions() return false. So maybe we can change this condition ?
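The condition change discussed in these two comments can be sketched with a mock (the hypothetical class below stands in for `java.sql.DatabaseMetaData`; the method names are from the real JDBC API): a driver whose two narrower capability methods both return false, as reported for MySQL here, would still pass the broader `supportsTransactions()` check.

```python
# Hypothetical mock of java.sql.DatabaseMetaData for a driver like the
# MySQL one described above -- not real driver code, just the logic.
class MockMySQLMetadata:
    def supportsTransactions(self):
        # Broader check proposed in the comment above.
        return True

    def supportsDataManipulationTransactionsOnly(self):
        return False

    def supportsDataDefinitionAndDataManipulationTransactions(self):
        return False

md = MockMySQLMetadata()

# Narrower condition: false for this driver, so commits would be skipped.
old_condition = (md.supportsDataManipulationTransactionsOnly()
                 or md.supportsDataDefinitionAndDataManipulationTransactions())

# Proposed condition: true, so transactional commit would be used.
new_condition = md.supportsTransactions()

print(old_condition, new_condition)  # False True
```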
[GitHub] spark pull request: [SPARK-10605][SQL] Create native collect_list/...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/12874#issuecomment-218849616 There is one more thing. We observed that collect_list doesn't work in Spark 2.0 https://issues.apache.org/jira/browse/SPARK-15293
[GitHub] spark pull request: [SPARK-10605][SQL] Create native collect_list/...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/12874#issuecomment-218721890 Hi, What about this patch ?
[GitHub] spark pull request: [SPARK-12200][SQL] Add __contains__ implementa...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-218553241 @holdenk Thanks :) @davies I think everything is OK. Can we merge it also into 2.0 branch ?
[GitHub] spark pull request: [SPARK-12200][SQL] Add __contains__ implementa...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-218547285 @davies I fixed whitespaces. Can we test this one more time ?
[GitHub] spark pull request: [SPARK-12200][SQL] Add __contains__ implementa...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-218384288 @davies But you mentioned current behaviour. My patch is to change it, so you could access the column by `row['col_name']` and `'col_name' in row` will return **True**.
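The contract described in this comment can be sketched with a toy class (this is not PySpark's actual Row implementation, just an illustration of the proposed semantics): `__contains__` tests membership against field names, so `'col_name' in row` returns True exactly when `row['col_name']` would succeed.

```python
# Toy stand-in for pyspark.sql.Row, illustrating the __contains__
# semantics proposed in SPARK-12200 -- not the real implementation.
class Row:
    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getitem__(self, name):
        # Access a column value by name, e.g. row['col_name'].
        return self._fields[name]

    def __contains__(self, name):
        # Membership tests field names, not values.
        return name in self._fields

row = Row(col_name=42)
print('col_name' in row)  # True
print(row['col_name'])    # 42
print('missing' in row)   # False
```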
[GitHub] spark pull request: SPARK-13335 Use declarative aggregate for coll...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/11688#issuecomment-215676018 Hi, What about this PR ? Will it be merged into Spark 2.0 ?
[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-172985889 @holdenk , @davies Can anyone verify this patch ?
[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10452#issuecomment-169409283 For me this is a critical security issue. So I'd like to have it in the 1.6 branch (I'm sure that 1.6.1 will be available earlier than 2.0.0)
[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10452#issuecomment-169420636 It's not the explain output but the SQL tab on the Spark web console. As far as I understand, the information there is taken from the same source. Am I right ? PS. I'm building Spark with this patch to check this out.
[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10452#issuecomment-169395482 @marmbrus What about merging it to the 1.6 branch ?
[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-167743179 @holdenk Do you need something more ?
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-167260638 I agree. In YARN mode we have configuration per node ``` YARN: The --num-executors option to the Spark YARN client controls how many executors it will allocate on the cluster, while --executor-memory and --executor-cores control the resources per executor. ```
[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-167114248 @holdenk Is it OK to merge this patch ?
[GitHub] spark pull request: [SPARK-4226][SQL]Add subquery (not) in/exists ...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/9055#issuecomment-164912029 So what next ?
[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-164124983 Done.
[GitHub] spark pull request: SPARK-12200 Add __contains__ implementation to...
Github user maver1ck commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-163521876 OK. I will add a few words to the documentation.