Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread antonkulaga
-1, as https://issues.apache.org/jira/browse/SPARK-16406 does not seem to have been back-ported to 2.3.1, and it causes a lot of pain -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail:

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-14 Thread antonkulaga
Is it not going to be backported to 2.3.2? I am totally blocked by this issue in one of my projects.

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-06 Thread antonkulaga
I have the same problem with gene expression data (gtex_analysis_v7/rna_seq_data/GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_tpm.gct.gz), where I have tens of thousands of genes as

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-20 Thread antonkulaga
makatun, did you try to test something more complex, like dataframe.describe or PCA?

Re: Interesting implications of supporting Scala 2.13

2019-05-29 Thread antonkulaga
There is https://github.com/scala/scala-collection-compat to enable 2.13 collections in Scala 2.12, so probably you can use it to avoid having separate source trees for 2.12 and 2.13

Re: [VOTE] Release Apache Spark 2.4.3

2019-05-06 Thread antonkulaga
> Hadoop 3 has not been supported in 2.4.x. 2.12 has been since 2.4.0,
I see. I thought it was, as I saw many posts about configuring Spark for Hadoop 3, as well as Hadoop 3-based Spark Docker containers

Re: [VOTE] Release Apache Spark 2.4.3

2019-05-03 Thread antonkulaga
Can you provide a release version for Hadoop 3 and Scala 2.12 this time?

Re: Packages to release in 3.0.0-preview

2019-10-30 Thread antonkulaga
Why not try the current Scala (2.13)? Spark has always been one (sometimes two) Scala versions behind the rest of the Scala ecosystem, and that has always been a big pain point for everybody. I understand that in the past you could not switch because of compatibility issues, but 3.x is a major

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread antonkulaga
I think SPARK-28547, for sure. At the moment there are some flaws in Spark's architecture, and it performs miserably or even freezes wherever the number of columns exceeds 10-15K (even a simple describe function takes ages while the