from:"antonkulaga"

Re: Packages to release in 3.0.0-preview

2019-10-30 Thread antonkulaga

Why not try the current Scala (2.13)? Spark has always been one (sometimes
two) Scala versions behind the rest of the Scala ecosystem, and that has
always been a big pain point for everybody. I understand that in the past you
could not switch because of compatibility issues, but 3.x is a major version
update where you can break things, so maybe you can finally consider using
the current Scala?



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread antonkulaga

I think SPARK-28547 for sure.

At the moment there are some flaws in the Spark architecture: it performs
miserably or even freezes wherever the number of columns exceeds 10-15K
(even a simple describe call takes ages, while the same functions in pandas
without Spark take seconds). In many fields (like bioinformatics) wide
datasets with large numbers of both rows and columns are very common (gene
expression data is a good example), and Spark is totally useless there.
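
For anyone who wants to check this on their own setup, here is a minimal sketch of how the describe timing could be measured against column count. It is only an illustration under assumptions: the object name WideDescribeSketch, the helper names, the row count and the column counts are all made up for the example, the data is random rather than real gene expression data, and no timings are claimed here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.rand

// Hypothetical benchmark sketch; names and sizes are assumptions, not from the thread.
object WideDescribeSketch {

  // Build a DataFrame with numRows rows and numCols random double columns c0, c1, ...
  def wideFrame(spark: SparkSession, numRows: Long, numCols: Int): DataFrame = {
    val cols = (0 until numCols).map(i => rand(i.toLong).as(s"c$i"))
    spark.range(numRows).select(cols: _*)
  }

  // Run describe() over all columns and return the wall-clock time in seconds.
  def timeDescribe(df: DataFrame): Double = {
    val start = System.nanoTime()
    df.describe().collect()
    (System.nanoTime() - start) / 1e9
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("wide-describe-sketch")
      .getOrCreate()

    // Column counts chosen arbitrarily; raise them to approach the 10-15K range.
    for (numCols <- Seq(100, 1000, 5000)) {
      val seconds = timeDescribe(wideFrame(spark, 100L, numCols))
      println(f"$numCols%6d columns: describe took $seconds%.1f s")
    }
    spark.stop()
  }
}
```

Keeping the row count small and varying only the column count should make it easier to see how much of the cost comes from the number of columns rather than from the data volume.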




Re: Interesting implications of supporting Scala 2.13

2019-05-29 Thread antonkulaga

There is https://github.com/scala/scala-collection-compat, which makes the
2.13 collections API available in Scala 2.12, so you can probably use it to
avoid having separate source trees for 2.12 and 2.13.
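
To illustrate the idea, here is a minimal sketch of source that should compile unchanged on both 2.12 and 2.13. It assumes the org.scala-lang.modules %% scala-collection-compat module is on the classpath; the object and method names are made up for the example and are not taken from Spark.

```scala
// On 2.12 this import supplies the 2.13-style `to(Factory)` syntax;
// on 2.13 it is effectively a no-op because the standard library already has it.
import scala.collection.compat._

// Hypothetical example object, not Spark code.
object CompatSketch {

  // 2.13-style conversion: `.to(List)` instead of the 2.12-only `.to[List]`.
  def doubled(xs: Vector[Int]): List[Int] =
    xs.map(_ * 2).to(List)

  def main(args: Array[String]): Unit = {
    println(doubled(Vector(1, 2, 3)))
  }
}
```

The point is that the 2.12-only `.to[List]` form no longer compiles on 2.13, while the `.to(List)` form works on both once the compat import is in place, so a single source tree can be cross-built.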




Re: [VOTE] Release Apache Spark 2.4.3

2019-05-06 Thread antonkulaga

>Hadoop 3 has not been supported in 2.4.x. 2.12 has been since 2.4.0,

I see. I thought it was, as I saw many posts about configuring Spark for
Hadoop 3, as well as Hadoop 3 based Spark Docker containers.




Re: [VOTE] Release Apache Spark 2.4.3

2019-05-03 Thread antonkulaga

Can you provide a release version for Hadoop 3 and Scala 2.12 this time?




Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-20 Thread antonkulaga

makatun, did you try to test something more complex, like dataframe.describe
or PCA?




Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-14 Thread antonkulaga

Is it not going to be backported to 2.3.2? I am totally blocked by this issue
in one of my projects.




Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread antonkulaga

-1 as https://issues.apache.org/jira/browse/SPARK-16406 does not seem to be
back-ported to 2.3.1 and it causes a lot of pain




Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-06 Thread antonkulaga

I have the same problem with gene expression data
(GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_tpm.gct.gz, under
gtex_analysis_v7/rna_seq_data/), where I have tens of thousands of genes as
columns. No idea why Spark is slooow there.



