Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/10527
@vectorijk Is this PR dead?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19234
I think that porting the changes from Python 3.6 gives us overly complicated code.
I'm closing it.
Github user maver1ck closed the pull request at:
https://github.com/apache/spark/pull/19234
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19255
OK. Let's close it.
Github user maver1ck closed the pull request at:
https://github.com/apache/spark/pull/19255
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19255
Ping @HyukjinKwon
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19566
@vanzin
I tested your patch. It worked.
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/19255#discussion_r146248695
--- Diff: python/pyspark/sql/types.py ---
@@ -24,6 +24,7 @@
import re
import base64
from array import array
+from functools import lru_cache
Github user maver1ck closed the pull request at:
https://github.com/apache/spark/pull/18685
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19255
Jenkins, retest this please.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/18685
I realized that the changes here were also added in SPARK-21070.
https://github.com/apache/spark/commit/751f513367ae776c6d6815e1ce138078924872eb
So we can close this PR.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19234
It was introduced with this PEP.
https://www.python.org/dev/peps/pep-0495/
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19255
@HyukjinKwon
I added perf tests.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19246
@HyukjinKwon
I created this before https://github.com/apache/spark/pull/19249, which
greatly decreases the number of function calls.
I agree we can close it.
Github user maver1ck closed the pull request at:
https://github.com/apache/spark/pull/19246
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19234
OK. It passed all tests, so let's merge it.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19234
I checked with some samples; code with floats can trigger errors.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19249
@ueshin
I think that for MapType this is not a solution, because every key/value
of a MapType has the same type, so we need conversion either for all entries or for none.
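To illustrate the argument (a minimal sketch; the function name and signature below are hypothetical, not pyspark's actual API): because all keys of a MapType share one type and all values another, the conversion decision can be made once per map rather than per entry.

```python
# Hypothetical helper: decide once per map whether keys/values need
# conversion, then apply the converters to every entry (or skip the
# map entirely on the fast path).
def convert_map(obj, key_needs, value_needs, key_conv, value_conv):
    if not (key_needs or value_needs):
        return obj  # fast path: no conversion needed at all
    kc = key_conv if key_needs else (lambda k: k)
    vc = value_conv if value_needs else (lambda v: v)
    return {kc(k): vc(v) for k, v in obj.items()}

# Values need conversion (e.g. microseconds -> seconds), keys do not.
m = {1: 2000000, 2: 3000000}
out = convert_map(m, False, True, None, lambda us: us // 1000000)
print(out)  # {1: 2, 2: 3}
```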
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19234
I'm asking because such code is 2x faster than my solution.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19234
Any idea why we're not using
`datetime.datetime.fromtimestamp(ts / 10.)` ?
There is a comment about overflow. But if it e
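For context (the message is cut off above, and `ts / 10.` is a truncation in the archive): PySpark's internal timestamp values are microseconds since the epoch, and integer arithmetic avoids routing the value through a 53-bit double. A minimal sketch under that assumption:

```python
import datetime

# Hypothetical internal timestamp value, in microseconds since the epoch.
ts = 1507629696123456

# Integer arithmetic preserves the exact microsecond component.
exact = datetime.datetime.fromtimestamp(ts // 1000000).replace(
    microsecond=ts % 1000000)

# A single float division routes the value through a 53-bit double,
# so the microsecond component may be rounded for large timestamps.
approx = datetime.datetime.fromtimestamp(ts / 1000000.0)

print(exact.microsecond)  # 123456
```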
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19260
LGTM
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19249
Done.
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/19249#discussion_r139306166
--- Diff: python/pyspark/sql/types.py ---
@@ -619,7 +621,8 @@ def fromInternal(self, obj):
# it's already converted by pi
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/19255#discussion_r139305244
--- Diff: python/pyspark/sql/types.py ---
@@ -24,6 +24,7 @@
import re
import base64
from array import array
+from functools import lru_cache
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/19249#discussion_r139303509
--- Diff: python/pyspark/sql/types.py ---
@@ -619,7 +621,8 @@ def fromInternal(self, obj):
# it's already converted by pi
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/19255#discussion_r139292791
--- Diff: python/pyspark/sql/types.py ---
@@ -24,6 +24,7 @@
import re
import base64
from array import array
+from functools import lru_cache
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/19255
[WIP][SPARK-22029] Add lru_cache to _parse_datatype_json_string
## What changes were proposed in this pull request?
_parse_datatype_json_string is called many times for the same datatypes
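A minimal sketch of the technique named in the title; the function below is a simplified stand-in for pyspark's `_parse_datatype_json_string`, not the real implementation:

```python
import json
from functools import lru_cache

# The same schema JSON string is parsed many times, so memoize on the
# (hashable) string argument. Note that callers then share the cached
# object, which is safe only if they treat it as read-only.
@lru_cache(maxsize=128)
def parse_datatype_json_string(json_string):
    return json.loads(json_string)

schema = '{"type": "struct", "fields": []}'
parse_datatype_json_string(schema)
parse_datatype_json_string(schema)  # second call is served from the cache

info = parse_datatype_json_string.cache_info()
print(info.hits, info.misses)  # 1 1
```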
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/19246#discussion_r139291502
--- Diff: python/pyspark/sql/types.py ---
@@ -410,6 +410,24 @@ def __init__(self, name, dataType, nullable=True,
metadata=None
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19249
I added benchmark for this code
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/19234#discussion_r139290042
--- Diff: python/pyspark/sql/types.py ---
@@ -196,7 +199,9 @@ def toInternal(self, dt):
def fromInternal(self, ts):
if ts is not None
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19249
Yep.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/18685
Ping received.
I'll try to add tests and resolve the conflict.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19249
I was checking this with my production code.
This gives me about a 6-7% speedup and removes about 408 million function
calls :)
I'll try to create benchmark for
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/19246
@dongjoon-hyun
I'll do it on Monday.
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/19234#discussion_r139284824
--- Diff: python/pyspark/sql/types.py ---
@@ -196,7 +199,9 @@ def toInternal(self, dt):
def fromInternal(self, ts):
if ts is not None
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/19249
[SPARK-22032] Speed up StructType.fromInternal
## What changes were proposed in this pull request?
StructType.fromInternal is calling f.fromInternal(v) for every field.
We can use
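The description is truncated here; as a hedged sketch of the idea (with simplified stand-in classes, not pyspark's real types), the per-field `fromInternal` lookups can be resolved once and reused for every row:

```python
# Simplified stand-ins for pyspark field types.
class IntField:
    def fromInternal(self, v):
        return v

class TimestampField:
    def fromInternal(self, v):
        return v / 1000000.0  # toy conversion: microseconds -> seconds

fields = [IntField(), TimestampField()]

# Resolve each field's converter once, instead of performing an
# attribute lookup per field for every row.
converters = [f.fromInternal for f in fields]

row = (42, 1500000)
converted = tuple(c(v) for c, v in zip(converters, row))
print(converted)  # (42, 1.5)
```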
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/19246
[SPARK-22025] Speeding up fromInternal for StructField
## What changes were proposed in this pull request?
Changing function calls to references can greatly speed up function calls
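As a sketch of what "changing function calls to references" buys (the counting class below is purely illustrative): hoisting the bound-method lookup out of the loop gives identical results with a single attribute resolution.

```python
class Field:
    lookups = 0  # counts attribute resolutions of fromInternal

    def __getattribute__(self, name):
        if name == "fromInternal":
            Field.lookups += 1
        return object.__getattribute__(self, name)

    def fromInternal(self, v):
        return v + 1

f = Field()
data = range(5)

Field.lookups = 0
slow = [f.fromInternal(v) for v in data]  # one lookup per call
per_call = Field.lookups

Field.lookups = 0
conv = f.fromInternal                     # single lookup, reused
fast = [conv(v) for v in data]
hoisted = Field.lookups

print(slow == fast, per_call, hoisted)  # True 5 1
```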
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/19234
[SPARK-22010] Change fromInternal method of TimestampType
## What changes were proposed in this pull request?
This PR changes the way pySpark converts Timestamp format from internal to
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/18685
Add Weakref to cloudpickle
https://github.com/cloudpipe/cloudpickle/pull/104/files
## What changes were proposed in this pull request?
Possibility to use ABCMeta with Spark
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/17722
Hi,
What about this issue?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/18515
[SPARK-21287] Ability to use Integer.MIN_VALUE as a fetchSize
## What changes were proposed in this pull request?
FIX for https://issues.apache.org/jira/browse/SPARK-21287
## How
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/17694
@vundela
Great.
But I'm planning to migrate to 2.1 as soon as 2.1.1 is released.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/17694
OK. I did additional tests.
The fix works only with Spark 2.1.
I tried to apply it to 2.0.2, and that was the cause of my problem.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/17694
I checked the pyspark.zip of the running container and everything is in its place.
So I assume that there is more than one race condition in this code.
I'll try to prepare example o
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/17694
The funny thing is this code works for me on 4 threads and throws an exception
on 10 threads.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/17694
I tested your patch in our environment.
Problem still exists.
```
Job aborted due to stage failure: Task 0 in stage 22.0 failed 8 times, most
recent failure: Lost task 0.7 in stage
```
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/17328
Looks good :)
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/15599
I can try this fix on Monday.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/15106
LGTM.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/15106
I think this patch could actually work.
Number formatting is executed on the server side.
I did some tests and it looks good.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14340
@rxin,
Production streaming jobs can be written in Python; my company and I
are an example.
I wrote a bit more in Jira; I think it's a better place for discussion.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14388
@viirya
I will after the weekend.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14465
OK.
Github user maver1ck closed the pull request at:
https://github.com/apache/spark/pull/14465
GitHub user maver1ck reopened a pull request:
https://github.com/apache/spark/pull/14390
[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (branch-1.6)
## What changes were proposed in this pull request?
Casting ConcurrentHashMap to ConcurrentMap allows running code
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14390
@srowen
I missed one change in Catalog.scala
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14388
@viirya
I tried to test your patch on my production workflow.
Getting:
```
Py4JJavaError: An error occurred while calling o56.count.
: org.apache.spark.SparkException: Job
```
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14445
@rxin
I added some comments to Jira.
I think both problems have solutions right now.
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/13701#discussion_r73304180
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
---
@@ -527,4 +536,43 @@ class
Github user maver1ck commented on a diff in the pull request:
https://github.com/apache/spark/pull/13701#discussion_r73290562
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
---
@@ -527,4 +536,43 @@ class
Github user maver1ck closed the pull request at:
https://github.com/apache/spark/pull/14390
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14390
Done.
Thank you.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14465
@davies
No problem.
I just want to isolate the cause of the performance regression.
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/14465
[SPARK-16320][SPARK-16321] Fixing performance regression when reading…
## What changes were proposed in this pull request?
This PR adds correct support for PPD when using non-vectorized
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/13701
@gatorsmile
I added a comment in Jira.
"spark.sql.parquet.filterPushdown has true as a default.
Vectorized Reader isn't a case here because I have nested columns (and
Vectori
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/13701
I think that this PR also resolves my problem here.
https://issues.apache.org/jira/browse/SPARK-16321?focusedCommentId=15383785&page=com.atlassian.jira.plugin.system.issuetabpanels:com
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14390
As you merged https://github.com/apache/spark/pull/14459 I removed the changes
in Dispatcher.scala.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14445
@rxin
I tested this patch.
The results are almost equal to Spark without this patch (the difference is
less than 5%).
So maybe it's needed, but it doesn't solve my problem.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14390
I know that.
And this patch is quite different. On master there are changes only in
Dispatcher.scala.
On branch-1.6 we need changes also in Catalog.scala.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14390
Could you tell me why?
We need different PRs against different branches.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14390
I added another PR against master, using the following command to find
suspicious code.
```
for i in `grep -c -R ConcurrentHashMap | grep -v ':0' | sed -e s/:.*//`; do
echo $i; grep
```
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/14459
[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap
## What changes were proposed in this pull request?
Casting ConcurrentHashMap to ConcurrentMap allows running code compiled with
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14445
@rxin
I'll test this patch tomorrow.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/10909
@jkbradley
What about merging this to branch-1.6?
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14390
@jkbradley
Could you look at it?
I think this is the problem from:
https://issues.apache.org/jira/browse/SPARK-10086
Maybe we should merge this PR to branch-1.6 before testing?
https
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/14390
[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap
## What changes were proposed in this pull request?
Casting ConcurrentHashMap to ConcurrentMap allows running code compiled with
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/11445
@rxin
As we're not planning to implement Datasets in Python, is there a plan to
revert this Jira?
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14142
Merging?
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14142
Can we test this?
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/12913
Up?
GitHub user maver1ck opened a pull request:
https://github.com/apache/spark/pull/14142
[SPARK-16439] Fix number formatting in SQL UI
## What changes were proposed in this pull request?
The Spark SQL UI displays numbers greater than 1000 with `\u00A0` (a
non-breaking space) as the grouping separator.
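The fix itself lives in Spark's Scala UI code; this Python snippet (an illustration only, not the actual patch) just shows the character involved, U+00A0, which some locales emit as the thousands separator:

```python
# Format a number, then substitute the no-break space that locales
# such as fr_FR use as a grouping separator.
n = 1234567
grouped = format(n, ",")                     # '1,234,567'
nbsp_style = grouped.replace(",", "\u00a0")  # separators become U+00A0
print(repr(nbsp_style))
```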
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/14054
@srowen
Maybe we can add this as a configuration option?
I'm not sure how this affects performance.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/13925
@srowen
Maybe we should change this condition to
`conn.getMetaData().supportsTransactions()`?
I can prepare a PR.
Github user maver1ck commented on the issue:
https://github.com/apache/spark/pull/13925
@srowen
Recently I modified the MySQL JDBC driver because both
supportsDataManipulationTransactionsOnly() and
supportsDataDefinitionAndDataManipulationTransactions() return false.
So
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/12874#issuecomment-218849616
There is one more thing.
We observed that collect_list doesn't work in Spark 2.0
https://issues.apache.org/jira/browse/SPARK-15293
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/12874#issuecomment-218721890
Hi,
What about this patch?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10194#issuecomment-218553241
@holdenk
Thanks :)
@davies
I think everything is OK. Can we also merge it into the 2.0 branch?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10194#issuecomment-218547285
@davies
I fixed the whitespace. Can we test this one more time?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10194#issuecomment-218384288
@davies
But you mentioned current behaviour.
My patch is to change it, so you could access the column by
`row['col_name']` and `'col_nam
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/11688#issuecomment-215676018
Hi,
What about this PR?
Will it be merged into Spark 2.0?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10194#issuecomment-172985889
@holdenk , @davies
Can anyone verify this patch?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10452#issuecomment-169420636
It's not the explain output but the SQL tab on the Spark web console.
As far as I understand, the information there is taken from the same source.
Am I right?
PS
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10452#issuecomment-169409283
For me this is a critical security issue,
so I'd like to have it in the 1.6 branch.
(I'm sure that 1.6.1 will be available earlier than 2.0.0.)
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10452#issuecomment-169395482
@marmbrus
What about merging it to the 1.6 branch?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10194#issuecomment-167743179
@holdenk
Do you need anything more?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/4027#issuecomment-167260638
I agree.
In YARN mode we have per-node configuration:
```
YARN: The --num-executors option to the Spark YARN client controls how many
executors it will
```
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10194#issuecomment-167114248
@holdenk
Is it OK to merge this patch?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/9055#issuecomment-164912029
So what's next?
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10194#issuecomment-164124983
Done.
Github user maver1ck commented on the pull request:
https://github.com/apache/spark/pull/10194#issuecomment-163521876
OK. I will add a few words to the documentation.