Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 4/25/24 6:21 PM, Reynold Xin wrote: +1 On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale wrote: +1 On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun wrote: FYI, there is a proposal to drop

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-15 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 4/15/24 8:16 PM, Rui Wang wrote: +1, non-binding. Thanks Dongjoon for driving this! -Rui On Mon, Apr 15, 2024 at 10:10 AM Xinrong Meng wrote: +1 Thank you @Dongjoon Hyun <mailto:dongjoo

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 9/26/23 17:12, Michel Miotto Barbosa wrote: +1 A disposição | At your disposal Michel Miotto Barbosa https://www.linkedin.com/in/michelmiottobarbosa/ mmiottobarb...@gmail.com +55 11 984 342 347 On Tue

Re: LLM script for error message improvement

2023-08-04 Thread Maciej
used to create the contribution. This should be included as a token in the source control commit message, for example including the phrase “Generated-by: ”.' and consider adjusting PR template / merge tool accordingly. Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP

Re: LLM script for error message improvement

2023-08-03 Thread Maciej
, with an official opinion from the ASF as the copyright owner. WDYT All? Shall we start a separate discussion? Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 8/3/23 18:33, Haejoon Lee wrote: Additional information: Please check https://issues.apache.org/jira

Re: [VOTE] SPIP: XML data source support

2023-07-29 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 7/29/23 11:28, Mich Talebzadeh wrote: +1 for me. Though Databricks did a good job releasing the code. GitHub - databricks/spark-xml: XML data source for Spark SQL and DataFrames <https://github.

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Maciej
That's a great idea, as long as we can keep additional dependencies under control. Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 7/19/23 18:22, Franco Patano wrote: +1 Many people have struggled with incorporating this separate library into their Spark

Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Maciej
+0 Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 7/6/23 17:41, Xiao Li wrote: +1 Xiao On Wed, Jul 5, 2023 at 17:28, Hyukjin Kwon wrote: +1. See https://youtu.be/yj7XlTB1Jvc?t=604 :-). On Thu, 6 Jul 2023 at 09:15, Allison Wang wrote: Hi all

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-25 Thread Maciej
experience in terms of reliability and execution cost. Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 6/24/23 23:42, Martin Grund wrote: Hey, I would like to express my strong support for Python Data Sources even though they might not be immediately as powerful

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-24 Thread Maciej
sources through 3rd party FDWs? Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 6/20/23 16:23, Wenchen Fan wrote: In an ideal world, every data source you want to connect to already has a Spark data source implementation (either v1 or v2), then this Python API

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Maciej
+1 -- Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 6/21/23 17:35, Holden Karau wrote: A small request, it’s pride weekend in San Francisco where some of the core developers are and right before one of the larger spark related conferences so more folks

Re: [VOTE] Apache Spark PMC asks Databricks to differentiate its Spark version string

2023-06-20 Thread Maciej
fine, as argued at https://lists.apache.org/thread/p15tc772j9qwyvn852sh8ksmzrol9cof - There is no argument any of this has caused a problem for the community anyway; there is just nothing to 'fix' I would again ask we not simply repeat the same thread again. -- Best rega

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-20 Thread Maciej
extensible or customizable sources, in case there is such a need. -- Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 6/20/23 05:19, Hyukjin Kwon wrote: Actually I support this idea in a way that Python developers don't have to learn Scala to write their own source

Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread Maciej
Spark Connect, effectively limiting the target audience for any 3rd party library. > Martin > > > On Fri, May 26, 2023 at 5:39 PM Maciej > wrote: > > It might be a good idea to have a discussion about how new connect > clients fit into the overall process we ha

Re: [CONNECT] New Clients for Go and Rust

2023-05-26 Thread Maciej
connect is, it is not exactly a replacement for many existing deployments. Furthermore, it doesn't make extending Spark much easier and the current ecosystem is, subjectively speaking, a bit brittle. -- Best regards, Maciej On 5/26/23 07:26, Martin Grund wrote: Thanks everyone for your feedbac

Re: [DISCUSS] Add SQL functions into Scala, Python and R API

2023-05-26 Thread Maciej
Weren't some of these functions provided only for compatibility  and intentionally left out of the language APIs? -- Best regards, Maciej On 5/25/23 23:21, Hyukjin Kwon wrote: I don't think it'd be a release blocker .. I think we can implement them across multiple releases. On Fri, May 26

Re: [CONNECT] New Clients for Go and Rust

2023-05-19 Thread Maciej
is particularly active, as far as I'm aware.  Taking responsibility for more clients, without being sure that we have resources to maintain them and there is enough community around them to make such effort worthwhile, doesn't seem like a good idea. -- Best regards, Maciej Szymkiewicz

Re: Slack for Spark Community: Merging various threads

2023-04-08 Thread Maciej
of us. -- Maciej On 4/7/23 21:02, Bjørn Jørgensen wrote: Yes, I have searched for Slack alternatives <https://itsfoss.com/open-source-slack-alternative/>. I feel that we should look into whether there is a better solution than Slack. From what I have found, there a

Re: Slack for Spark Community: Merging various threads

2023-04-06 Thread Maciej
-- Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 4/6/23 17:13, Denny Lee wrote: Thanks Dongjoon, but I don't think this is misleading insofar as this is not a /self-service process/ but an invite process which admittedly I did not state explicitly in my

Re: Slack for Spark Community: Merging various threads

2023-04-06 Thread Maciej
the various aspects of Slack (code of conduct, linen.dev and search/archive process, invite management, etc.). HTH! Denny -- Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC

Re: [DISCUSS] Make release cadence predictable

2023-02-15 Thread Maciej
every year regardless of the actual release date. I believe it both makes the release cadence predictable and eases the burden of making releases. WDYT? -- Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC

Re: How can I get the same spark context in two different python processes

2022-12-13 Thread Maciej
for that matter) so don't do it unless you fully understand the implications (including, but not limited to, risk of leaking the token). Use this approach at your own risk. On 12/13/22 03:52, Kevin Su wrote: Maciej, Thanks for the reply. Could you share an example to achieve it? Maciej

Re: How can I get the same spark context in two different python processes

2022-12-12 Thread Maciej
A. How can I achieve that? I've tried  pyspark.sql.SparkSession.builder.appName("spark").getOrCreate(), but it will create a new spark context. -- Best regards, Maciej Szymkiewicz Web: https://zero323.

Re: Syndicate Apache Spark Twitter to Mastodon?

2022-12-11 Thread Maciej
tps://twitter.com/holdenkarau> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau -- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

Re: [VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Maciej
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org -- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

Re: Is it possible to specify explicitly map() key/value types?

2022-08-27 Thread Maciej
eate a map by specifying the key-value type explicitly? So far, I came up with a workaround using map('', '') to initialise the map for string key-value and using map_filter() to exclude/remove the redundant map('', '') key-value item: val mergeExpr = expr("map_filter(aggregate(data, map

Re: [DISCUSS] [Spark SQL, PySpark] Combining StructTypes into a new StructType

2022-08-14 Thread Maciej
-- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

Re: Welcome Xinrong Meng as a Spark committer

2022-08-10 Thread Maciej
-- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

Re: Welcoming three new PMC members

2022-08-10 Thread Maciej
> > >>>>> The Spark PMC > > > > > > > > > > > > -- > > > Takuya UESHIN > > > > > > > ---

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-17 Thread Maciej
" in > many places when we: import pyspark.pandas as ps. > This is similar to "Structured Streaming" in JIRA, and "SS" in PR title. > > I think it'd be easier to track the changes here with that. > Currently it's a bit difficult to identify it fr

Re: Apache Spark 3.3 Release

2022-04-29 Thread Maciej
>> > >> >> Let me clarify my above suggestion. Maybe > we can wait 3 more days to collect the list of > actively developed PRs that we want to merge to 3.3 > after the branch cut? >

Re: Apache Spark 3.3 Release

2022-03-06 Thread Maciej
here are any remaining works for Spark > 3.3, and switch to QA mode, cut a branch and keep everything on track. I > would like to volunteer to help drive this process. > > Best regards, > Max Gekk -- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

Re: [How To] run test suites for specific module

2022-01-24 Thread Maciej
d really appreciate any suggestion or comment. > > > Best regards, > > Fangjia Shen > > Purdue University > > > -- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Maciej
immediately. Everything else please > retarget to an appropriate release. > == But my bug isn't > fixed? == In order to > make timely releases, we will typically > not hold the release unless the bug in > question is a regression from the > previous release. That being said, if > there is something which is a regression > that has not been correctly targeted > please ping me or a committer to help > target the issue. > > > > -- > Bjørn Jørgensen > Vestre Aspehaug 4, 6010 Ålesund > Norge > > +47 480 94 297 > -- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

Re: PySpark Dynamic DataFrame for easier inheritance

2021-12-29 Thread Maciej
On 12/29/21 16:18, Pablo Alcain wrote: > Hey Maciej! Thanks for your answer and the comments :) > > On Wed, Dec 29, 2021 at 3:06 PM Maciej wrote: > > This seems like a lot of trouble for a not so common use case that has > vi

Re: PySpark Dynamic DataFrame for easier inheritance

2021-12-29 Thread Maciej
and > the expected output. > > I'm sharing this here in case you feel like this approach can be > useful for anyone else. In our case it greatly sped up the > development of abstraction layers and allowed us to write cleaner > code. One of the advantages is that it would simply be a "plugin" > over pyspark that does not modify existing code or > application interfaces in any way. > > If you think that this can be helpful, I can write a PR as a more > refined proof of concept. > > Thanks! > > Pablo > -- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

[R] SparkR on conda-forge

2021-12-19 Thread Maciej
Hi everyone, FYI ‒ thanks to the good folks from conda-forge we now have these: * https://github.com/conda-forge/r-sparkr-feedstock * https://anaconda.org/conda-forge/r-sparkr -- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC

Re: [MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Maciej
> On Wed, Dec 15, 2021, 8:34 AM Maciej wrote: > > Hi All, > > Just wondering ‒ would it make sense to add > .github/FUNDING.yml with custom link pointing to one (or both) > of these: > > * https://www.apache.org/foundation/spon

[MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Maciej
Hi All, Just wondering ‒ would it make sense to add .github/FUNDING.yml with custom link pointing to one (or both) of these: * https://www.apache.org/foundation/sponsorship.html * https://www.apache.org/foundation/contributing.html -- Best regards, Maciej Szymkiewicz Web: https://zero323

Nabble archive is down

2021-08-17 Thread Maciej
archives? -- Best regards, Maciej Szymkiewicz Web: https://zero323.net Keybase: https://keybase.io/zero323 Gigs: https://www.codementor.io/@zero323 PGP: A30CEF0C31A501EC

Re: Time to start publishing Spark Docker Images?

2021-08-17 Thread Maciej
On Mon, 16 Aug 2021 at 18:46, Maciej wrote: > > I have a few concerns regarding PySpark and SparkR images. > >

Re: Time to start publishing Spark Docker Images?

2021-08-16 Thread Maciej
> > Books (Learning Spark, High Performance Spark, > etc.): https://amzn.to/2MaRAG9 > > YouTube Live > Streams: https://www.youtube.com/user/holdenkarau > > > > -- > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, > etc.): https://amzn.to/2MaRAG9 > YouTube Live Streams: https://www.youtube.com/user/holdenkarau > -- Best regards, Maciej Szymkiewicz Web: https://zero323.net Keybase: https://keybase.io/zero323 Gigs: https://www.codementor.io/@zero323 PGP: A30CEF0C31A501EC

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-26 Thread Maciej
ase vote on the SPIP for the next 72 hours: > > [ ] +1: Accept the proposal as an official SPIP > [ ] +0 > [ ] -1: I don’t think this is a good idea because … > > -- Best regards, Maciej Szymkiewicz Web: https://zero323.net Keybase: https://keybase.io/zero323 Gigs: https://www.codementor.io/@zero323 PGP: A30CEF0C31A501EC

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-15 Thread Maciej
logies-battling-head-to-head-a453a1f8cc13>. > >   > > * > > There are many important features missing that are > very common in data science. One of the most > important features

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-30 Thread Maciej
limit on how many functions we can add, and it also makes it > difficult to browse through the docs when there are a lot of > functions. > > > > On Thu, Jan 28, 2021 at 1:09 PM, Maciej > mailto:mszymkiew...@gmail.com

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Maciej
s, without requiring maintainers to write tests for each > language's version of the functions. Would that address the > maintenance burden? With R we don't really test most of the functions beyond the simple "callability". Only the complex ones, that require some non-trivial tran

Re: Broken rlang installation on AppVeyor

2020-10-09 Thread Maciej
". Can you > open a PR to change? > > On Fri, Oct 9, 2020 at 4:36 AM, Maciej wrote: > > Hi Everyone, > > I've been digging into AppVeyor test failures for > https://github.com/apache/spark/pull/29978 > > >

Broken rlang installation on AppVeyor

2020-10-08 Thread Maciej
t is there any reason why we seem to default to i386 (https://github.com/apache/spark/blob/c5f6af9f17498bb0ec393c16616f2d99e5d3ee3d/dev/appveyor-install-dependencies.ps1#L22) for R installation, while RTools are hard coded to x86_64  (https://github.com/apache/spark/blob/c5f6af9f17498bb0ec393c16616f2d99e5d3ee3d/dev/

[DISCUSS][R] Adding magrittr as a dependency for SparkR

2020-09-30 Thread Maciej
otal evidence, most of the SparkR applications I've seen out there, already use magrittr. Non-goals: * Supporting non-standard evaluation. Thanks in advance for your input. -- Best regards, Maciej Szymkiewicz Web: https://zero323.net Keybase: https://keybase.io/zero323 Gigs: https:

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
there are also some upstream changes that haven't been reflected in stubs master... On 8/27/20 10:24 PM, Driesprong, Fokko wrote: > . Any action points that we can define and that I can help on? I'm > fine with taking the route that Hyukjin suggests :) > -- Best regards, Maciej Szymkie

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
see this happen. Any action > points that we can define and that I can help on? I'm fine with taking > the route that Hyukjin suggests :) > > Cheers, Fokko > > On Thu, Aug 27, 2020 at 18:45, Maciej wrote: > > Well, technicall

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
scussion Pandas didn't type check and had no clear timeline for advertising annotations. -- Best regards, Maciej Szymkiewicz Web: https://zero323.net Keybase: https://keybase.io/zero323 Gigs: https://www.codementor.io/@zero323 PGP: A30CEF0C31A501EC

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
so I've noticed that all the methods that aren't in the pyi file > are *unable to be called from other python files*. I was unaware of > this effect of the pyi files. As soon as you create the files, all the > methods are shielded from external access. Feels like going back to > cpp :'(

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Maciej Szymkiewicz
asked. > > >   > ---- > *From:* Maciej Szymkiewicz > *Sent:* Tuesday, August 4, 2020 12:59 PM > *To:* Sean Owen > *Cc:* Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau; > Spark Dev List > *Subject:* Re: [PySpark] Revisiting PySpark type annotations

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Maciej Szymkiewicz
tubs/graphs/contributors) and at least some use cases (https://stackoverflow.com/q/40163106/). So, subjectively speaking, it seems we're already beyond POC. -- Best regards, Maciej Szymkiewicz Web: https://zero323.net Keybase: https://keybase.io/zero323 Gigs: https://www.codementor.io/@zero3

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Maciej Szymkiewicz
separate git repo? >> >> >> From: Hyukjin Kwon >> Sent: Monday, August 3, 2020 1:58:55 AM >> To: Maciej Szymkiewicz >> Cc: Driesprong, Fokko ; Holden Karau >> ; Spark Dev List >> Subject: Re: [PySpark] Revisiting PySpark type annotati

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Maciej Szymkiewicz
nt stubs for different versions of Python? I had to > look up the literals: https://www.python.org/dev/peps/pep-0586/ > I think it is more about portability between Spark versions > > > Cheers, Fokko > > On Wed, Jul 22, 2020 at 09:40, Maciej Szymkiewicz < > mszy

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Maciej Szymkiewicz
e. -- Best regards, Maciej Szymkiewicz Web: https://zero323.net Keybase: https://keybase.io/zero323 Gigs: https://www.codementor.io/@zero323 PGP: A30CEF0C31A501EC

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Maciej Szymkiewicz
- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > <mailto:dev-unsubscr...@spark.apache.org> > > > > -- > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, > et

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Maciej Szymkiewicz

Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

2020-03-18 Thread Maciej Szymkiewicz
treated as private. > > Is this intentional?  If so, what's the rationale?  If not, then it > feels like a bug and DataFrame should have some form of public access > back to the context/session.  I'm happy to log the bug but thought I > would ask here first.  Thanks! -- Best regards, Maciej Szym

Re: Apache Spark Docker image repository

2020-02-06 Thread Maciej Szymkiewicz
Action Jobs and Jenkins K8s > Integration Tests to speed up jobs and to have more stabler > environments) > > > > Bests, > > Dongjoon. > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > <mailto:dev-unsubscr...@

Re: [DISCUSS] PostgreSQL dialect

2019-11-26 Thread Maciej Szymkiewicz
's behavior violates SQL standard. But for others, let's just > update the answer files of PostgreSQL tests. > > Any comments are welcome! > > Thanks, > Wenchen -- Best regards, Maciej

Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-30 Thread Maciej Szymkiewicz
Apache Spark 3.0.0 RC1 will start next January > (https://spark.apache.org/versioning-policy.html), > I'm +1 for the deprecation (Python < 3.6) > at Apache Spark 3.0.0. > > It's just a deprecation to p

[DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-24 Thread Maciej Szymkiewicz
well? -- Best regards, Maciej

Is SPARK-9961 is still relevant?

2019-10-05 Thread Maciej Szymkiewicz
lanned defaultEvaluator was the primary reason to use such annotation there. -- Best regards, Maciej

Re: Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

2019-03-20 Thread Maciej Szymkiewicz
take a look at my > proposal and share their opinion from their own component's perspective. If > we get on the same page I'll eventually open Jiras to cover this > improvement for each mentioned systems. > > Cheers, > Gabor > > > > -- Regards, Maciej

Re: Feature request: split dataset based on condition

2019-02-03 Thread Maciej Szymkiewicz
mobile: +98 912 468 1859 >>>> site: www.moein.xyz >>>> email: moein...@gmail.com >>>> >>>> >> >> -- >> >> Moein Hosseini >> Data Engineer >> mobile: +98 912 468 1859 >> site: www.moein.xyz >> email: moein...@gmail.com >> >> -- Regards, Maciej

[PySpark] Revisiting PySpark type annotations

2019-01-25 Thread Maciej Szymkiewicz
, Maciej

Re: Documentation of boolean column operators missing?

2018-10-23 Thread Maciej Szymkiewicz
Even if these were documented, Sphinx doesn't include dunder methods by default (with the exception of __init__). There is a :special-members: option which could be passed to, for example, autoclass. On Tue, 23 Oct 2018 at 21:32, Sean Owen wrote: > (& and | are both logical and bitwise operators in
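For context, a minimal sketch of why these operators end up in dunder methods at all: Python routes `&`, `|` and `~` through `__and__`, `__or__` and `__invert__`, so any docstrings for them sit on methods that Sphinx skips unless special members are requested. The `Expr` class is a hypothetical stand-in for illustration only, not PySpark's `Column`.

```python
class Expr:
    """Hypothetical expression node (illustration only, not pyspark.sql.Column)."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        """Logical conjunction, invoked for `a & b`."""
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        """Logical disjunction, invoked for `a | b`."""
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self):
        """Logical negation, invoked for `~a`."""
        return Expr(f"(NOT {self.text})")


# Parentheses matter: & and | bind more tightly than comparison operators.
combined = (Expr("a > 1") & Expr("b < 2")) | ~Expr("c")
print(combined.text)  # ((a > 1 AND b < 2) OR (NOT c))
```

The docstrings on `__and__`, `__or__` and `__invert__` above are exactly the kind that would only show up in generated documentation once the special-members option mentioned in the message is enabled.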

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Maciej Szymkiewicz
ut do we want to take that baggage into Apache Spark 3.x > era? The next time you may drop it would be only 4.0 release because > of breaking change. > > -- > ,,,^..^,,, > On Sat, Sep 15, 2018 at 2:21 PM Maciej Szymkiewicz > wrote: > > > > There is no need to ditch Python 2.

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Maciej Szymkiewicz
There is no need to ditch Python 2. There are basically two options: - Use stub files and limit type hint support to Python 3. Python 3 users benefit from type hints, Python 2 users don't, but no core functionality is affected. This is the approach I've used with

Re: [DISCUSS] move away from python doctests

2018-08-29 Thread Maciej Szymkiewicz
Hi Imran, On Wed, 29 Aug 2018 at 22:26, Imran Rashid wrote: > Hi Li, > > yes that makes perfect sense. That more-or-less is the same as my view, > though I framed it differently. I guess in that case, I'm really asking: > > Can pyspark changes please be accompanied by more unit tests, and not

Re: Spark DataFrame UNPIVOT feature

2018-08-22 Thread Maciej Szymkiewicz
Given the popularity of related SO questions: - https://stackoverflow.com/q/41670103/1560062 - https://stackoverflow.com/q/42465568/1560062 - https://stackoverflow.com/q/41670103/1560062 it is probably more "nobody thought about asking" than "it is not used often". On Wed, 22 Aug 2018

Re: Increase Timeout or optimize Spark UT?

2017-08-24 Thread Maciej Szymkiewicz
/src/test/scala/org/apache/spark/sql/test/TestSQLContext.scala#L60-L61> > ? > > On Tue, Aug 22, 2017 at 3:25 PM, Maciej Szymkiewicz < > mszymkiew...@gmail.com> wrote: > >> Hi, >> >> From my experience it is possible to cut quite a lot by reducing >> s

Re: Increase Timeout or optimize Spark UT?

2017-08-22 Thread Maciej Szymkiewicz
Hi, From my experience it is possible to cut quite a lot by reducing spark.sql.shuffle.partitions to some reasonable value (let's say comparable to the number of cores). 200 is serious overkill for most of the test cases anyway. Best, Maciej On 21 August 2017 at 03:00, Dong Joon Hyun
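The suggestion above could be sketched roughly as follows, assuming the test harness accepts configuration overrides as key/value pairs. The helper name `local_test_conf` is hypothetical; only the `spark.sql.shuffle.partitions` key comes from the message.

```python
import os


def local_test_conf(cores=None):
    """Build conf overrides for a local test session (hypothetical helper)."""
    # Fall back to 1 when the core count cannot be determined.
    n = cores if cores is not None else (os.cpu_count() or 1)
    # Spark configuration values are passed as strings.
    return {"spark.sql.shuffle.partitions": str(n)}


conf = local_test_conf(cores=4)
print(conf)  # {'spark.sql.shuffle.partitions': '4'}
```

In a PySpark test setup, each pair could then be applied with `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. Whether the number of cores is the right target for a given suite is a judgment call, as the message itself hedges.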

Re: Possible bug: inconsistent timestamp behavior

2017-08-15 Thread Maciej Szymkiewicz
ere? > > > > Thanks, > > Assaf > > -- Best regards, Maciej Szymkiewicz

Re: Speeding up Catalyst engine

2017-07-25 Thread Maciej Bryński
difference changing spark.sql.constraintPropagation.enabled and any other spark.sql option. So I will leave your patch on top of 2.2 Thank you. M. 2017-07-25 1:39 GMT+02:00 Liang-Chi Hsieh <vii...@gmail.com>: > > Hi Maciej, > > For backporting https://issues.apache.org/jira/browse/SPAR

Speeding up Catalyst engine

2017-07-24 Thread Maciej Bryński
Hi Everyone, I'm trying to speed up my Spark streaming application and I have the following problem. I'm using a lot of joins in my app and full Catalyst analysis is triggered during every join. I found 2 options to speed it up. 1) spark.sql.selfJoinAutoResolveAmbiguity option But looking at code:

Re: [ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-19 Thread Maciej Bryński
Oh yeah, new Spark version, new regression bugs :) https://issues.apache.org/jira/browse/SPARK-21470 M. 2017-07-17 22:01 GMT+02:00 Sam Elamin : > Well done! This is amazing news :) Congrats and really can't wait to > spread the structured streaming love! > > On Mon,

Re: Slowness of Spark Thrift Server

2017-07-17 Thread Maciej Bryński
I did the test on Spark 2.2.0 and the problem still exists. Any ideas how to fix it? Regards, Maciek 2017-07-11 11:52 GMT+02:00 Maciej Bryński <mac...@brynski.pl>: > Hi, > I have the following issue. > I'm trying to use Spark as a proxy to Cassandra. > The problem is the thri

Slowness of Spark Thrift Server

2017-07-11 Thread Maciej Bryński
Hi, I have the following issue. I'm trying to use Spark as a proxy to Cassandra. The problem is the thrift server overhead. I'm using the following query: select * from table where primary_key = 123 Job time (from the jobs tab) is around 50ms (similar to the query time from the SQL tab). Unfortunately query

Re: Handling nulls in vector columns is non-trivial

2017-06-21 Thread Maciej Szymkiewicz
Since 2.2 there is Imputer: https://github.com/apache/spark/blob/branch-2.2/examples/src/main/python/ml/imputer_example.py which should at least partially address the problem. On 06/22/2017 03:03 AM, Franklyn D'souza wrote: > I just wanted to highlight some of the rough edges around using >

Re: spark messing up handling of native dependency code?

2017-06-02 Thread Maciej Szymkiewicz
Maybe not related, but in general geotools are not thread safe, so using them from workers is most likely a gamble. On 06/03/2017 01:26 AM, Georg Heiler wrote: > Hi, > > There is a weird problem with spark when handling native dependency code: > I want to use a library (JAI) with spark to parse some

Re: [PYTHON] PySpark typing hints

2017-05-23 Thread Maciej Szymkiewicz
pyspark, they just have > to be run with a compatible packaging (e.g. mypy). > > Meaning that porting for python 2 would provide a very small advantage > over the immediate advantages (IDE usage and testing for most cases). > > > > Am I missing something? > > > >

Re: [PYTHON] PySpark typing hints

2017-05-23 Thread Maciej Szymkiewicz
metaclasses), which could be resolved without significant loss of function. On 05/23/2017 12:08 PM, Reynold Xin wrote: > Seems useful to do. Is there a way to do this so it doesn't break > Python 2.x? > > > On Sun, May 14, 2017 at 11:44 PM, Maciej Szymkiewicz > <m

[PYTHON] PySpark typing hints

2017-05-14 Thread Maciej Szymkiewicz
- interesting presentation by Marco Bonzanini -- Best, Maciej

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-29 Thread Maciej Szymkiewicz
I am not sure if it is relevant but explode_outer and posexplode_outer seem to be broken: SPARK-20534 On 04/28/2017 12:49 AM, Sean Owen wrote: > By the way the RC looks good. Sigs and license are OK, tests pass with > -Phive -Pyarn

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-14 Thread Maciej Bryński
https://issues.apache.org/jira/browse/SPARK-12717 This bug has been in Spark since 1.6.0. Any chance of getting this fixed? M. 2017-04-14 6:39 GMT+02:00 Holden Karau : > If it would help I'd be more than happy to look at kicking off the packaging > for RC3 since I've been poking

Re: [Pyspark, SQL] Very slow IN operator

2017-04-06 Thread Maciej Bryński
2017-04-06 4:00 GMT+02:00 Michael Segel : > Just out of curiosity, what would happen if you put your 10K values into a > temp table and then did a join against it? The answer is predicate pushdown. In my case I'm using this kind of query on a JDBC table and IN

[Pyspark, SQL] Very slow IN operator

2017-04-05 Thread Maciej Bryński
Hi, I'm trying to run queries with many values in the IN operator. With more than 10K values, the IN operator gets slower. For example, this code runs for about 20 seconds. df = spark.range(0,10,1,1) df.where('id in ({})'.format(','.join(map(str,range(10))))).count()
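The predicate string in the snippet above is built with plain Python string formatting before Spark ever sees it. A minimal sketch of that step on its own, assuming a hypothetical helper named build_in_predicate (not part of PySpark or of the original message):

```python
def build_in_predicate(column, values):
    """Return a SQL predicate string such as 'id in (0,1,2)'.

    Hypothetical helper for illustration only; it performs no quoting
    or escaping, so it is suitable for numeric values only.
    """
    rendered = [str(v) for v in values]
    if not rendered:
        # An empty IN list is not valid SQL, so fail early.
        raise ValueError("IN list must not be empty")
    return "{} in ({})".format(column, ",".join(rendered))


# Same shape as the expression passed to df.where(...) in the message.
predicate = build_in_predicate("id", range(10))
print(predicate)
```

With a live SparkSession, the resulting string could be passed as df.where(predicate).count(), which is the call made in the message.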

[SQL] Unresolved reference with chained window functions.

2017-03-24 Thread Maciej Szymkiewicz
errors.package$.attachTree(package.scala:56) ... Caused by: java.lang.RuntimeException: Couldn't find AmtPaidCumSum#366 in [sum#385,max#386,x#360,AmtPaid#361] ... Is it a known issue or do we need a JIRA? -- Best, Maciej Szymkiewicz

[ML][PYTHON] Collecting data in a class extending SparkSessionTestCase causes AttributeError:

2017-03-06 Thread Maciej Szymkiewicz
there something obvious I'm missing here? -- Best, Maciej diff --git a/python/pyspark/ml/tests.py b/python/pyspark/ml/tests.py index 3524160557..cc6e49d6cf 100755 --- a/python/pyspark/ml/tests.py +++ b/python/pyspark/ml/tests.py @@ -1245,6 +1245,17 @@ class ALSTest(SparkSessionTestCa

Re: [PYTHON][DISCUSS] Moving to cloudpickle and or Py4J as a dependencies?

2017-02-14 Thread Maciej Szymkiewicz
> py4j in our repo but could instead have a pinned version > required. While we do depend on a lot of py4j internal APIs, > version pinning should be sufficient to ensure functionality > (and simplify the update process). > > Cheers, > >

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-13 Thread Maciej Szymkiewicz
Congratulations! On 02/13/2017 08:16 PM, Reynold Xin wrote: > Hi all, > > Takuya-san has recently been elected an Apache Spark committer. He's > been active in the SQL area and writes very small, surgical patches > that are high quality. Please join me in congratulating Takuya-san! >

Re: [SQL][ML] Pipeline performance regression between 1.6 and 2.x

2017-02-03 Thread Maciej Szymkiewicz
Hi Liang-Chi, Thank you for the updates. This looks promising. On 02/03/2017 08:34 AM, Liang-Chi Hsieh wrote: > Hi Maciej, > > FYI, this fix is submitted at https://github.com/apache/spark/pull/16785. > > > Liang-Chi Hsieh wrote >> Hi Maciej, >> >> After

Re: [SQL][ML] Pipeline performance regression between 1.6 and 2.x

2017-02-02 Thread Maciej Szymkiewicz
processing part in the first place. On 02/02/2017 07:22 AM, Liang-Chi Hsieh wrote: > Hi Maciej, > > FYI, the PR is at https://github.com/apache/spark/pull/16775. > > > Liang-Chi Hsieh wrote >> Hi Maciej, >> >> Basically the fitting algorithm in Pipeline is an itera

[SQL][ML] Pipeline performance regression between 1.6 and 2.x

2017-01-31 Thread Maciej Szymkiewicz
utCol)) val stages: Array[PipelineStage] = indexers ++ encoders :+ assembler new Pipeline().setStages(stages).fit(df).transform(df).show Task execution time is comparable and executors are idle most of the time, so it looks like it is a problem with the optimizer

Re: [SQL][SPARK-14160] Maximum interval for o.a.s.sql.functions.window

2017-01-18 Thread Maciej Szymkiewicz
/18/2017 05:52 PM, Burak Yavuz wrote: > Hi Maciej, > > I believe it would be useful to either fix the documentation or fix > the implementation. I'll leave it to the community to comment on. The > code right now disallows intervals provided in months and years, > because they are n

[SQL][SPARK-14160] Maximum interval for o.a.s.sql.functions.window

2017-01-18 Thread Maciej Szymkiewicz
-01-01").toDF("date").groupBy(window($"date", "999 days")) with results which look sensible at first glance. Is it a matter of faulty validation logic (months will be assigned only if there is a match against years or months https://git.io/vMPdi) or expected behavior, and I simply misunderstood the intentions? -- Best, Maciej
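To make the "999 days" case concrete, here is a rough sketch in plain Python of how a fixed-length tumbling window can assign a timestamp to a bucket. It is not Spark's implementation: the function name tumbling_window, the alignment of windows to the Unix epoch, and the sample date 2017-01-01 are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: windows are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window(ts, duration):
    """Return (start, end) of the half-open window [start, end) containing ts.

    Hypothetical helper; duration must be a positive timedelta.
    """
    if duration <= timedelta(0):
        raise ValueError("duration must be positive")
    # Time elapsed since the start of the window that ts falls into.
    offset = (ts - EPOCH) % duration
    start = ts - offset
    return start, start + duration


# Hypothetical sample date; the date in the message is truncated.
ts = datetime(2017, 1, 1, tzinfo=timezone.utc)
start, end = tumbling_window(ts, timedelta(days=999))
print(start.date(), end.date())
```

A duration given in days is a fixed number of seconds, so this arithmetic is well defined. A month or a year has no fixed length, which is one plausible reason for validating those units separately.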
