Re: spark stddev() giving '?' as output how to handle it ? i.e replace null/0

2019-04-23 Thread Shyam P
Sorry, yeah, I fixed this... it's a formatting issue. Please ignore. Thank you.

On Wed, Apr 24, 2019 at 11:58 AM Shyam P wrote:
> https://stackoverflow.com/questions/55823608/how-to-handle-spark-stddev-function-output-value-when-there-there-is-no-data
>
> Regards,
> Shyam

Handle empty partitions in pyspark

2019-04-23 Thread kanchan tewary
Hi All, I have a situation where the RDD has some empty partitions, which I would like to identify and handle while applying mapPartitions or similar functions. Is there a way to do this in PySpark? The method isEmpty works only on the whole RDD and cannot be applied here. Much appreciated. Code blo
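[A minimal PySpark sketch of one possible approach, not necessarily what the poster ends up using: peek at each partition's iterator inside mapPartitionsWithIndex and branch on whether it yielded anything. All names and the "skip empty partitions" handling are illustrative.]

import itertools
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("empty-partition-sketch").getOrCreate()
sc = spark.sparkContext

# More partitions than elements, so some partitions are guaranteed empty.
rdd = sc.parallelize(range(10), 20)

def process_partition(index, it):
    first = next(it, None)
    if first is None:
        # Empty partition: skip it (or log / emit a sentinel instead).
        return iter([])
    # Put the peeked element back and process the partition normally.
    return ((index, x * 2) for x in itertools.chain([first], it))

print(rdd.mapPartitionsWithIndex(process_partition).collect())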

spark stddev() giving '?' as output how to handle it ? i.e replace null/0

2019-04-23 Thread Shyam P
https://stackoverflow.com/questions/55823608/how-to-handle-spark-stddev-function-output-value-when-there-there-is-no-data Regards, Shyam
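[For reference, a minimal PySpark sketch of the usual workaround; the DataFrame and column names are made up. stddev() returns NULL when a group has fewer than two rows, and coalesce() can substitute 0.0 so downstream code never sees the NULL.]

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stddev-null-sketch").getOrCreate()

df = spark.createDataFrame([("a", 1.0), ("a", 3.0), ("b", 5.0)], ["key", "value"])

agg = df.groupBy("key").agg(
    F.coalesce(F.stddev("value"), F.lit(0.0)).alias("value_stddev")
)
agg.show()  # group "b" has a single row, so its stddev is NULL and becomes 0.0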

Fwd: autoBroadcastJoinThreshold not working as expected

2019-04-23 Thread Mike Chan
Dear all, I'm on a case where, when a certain table is exposed to a broadcast join, the query eventually fails with a remote block error. First, we set spark.sql.autoBroadcastJoinThreshold to 10MB, namely 10485760. Then we proceeded to perform the query. In the SQL plan, we fo
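[For context, a minimal PySpark sketch of the setting being discussed; the tables here are synthetic stand-ins. The physical plan from explain() is where the broadcast decision shows up.]

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-threshold-sketch").getOrCreate()

# 10 MB threshold, expressed in bytes (10485760), as in the message above.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)

small = spark.range(100).withColumnRenamed("id", "k")
large = spark.range(1000000).withColumnRenamed("id", "k")

# explain() shows BroadcastHashJoin when the smaller side is estimated to fit
# under the threshold, and SortMergeJoin otherwise.
large.join(small, "k").explain()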

Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Weichen Xu
Could you provide your code and your cluster info? On Tue, Apr 23, 2019 at 4:10 PM Qian He wrote: > The dataset was using a sparse representation before feeding into > LogisticRegression. > > On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu > wrote: >> Hi Qian, >> >> Does your dataset use sparse

Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Qian He
The dataset was using a sparse representation before feeding into LogisticRegression. On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu wrote: > Hi Qian, > > Does your dataset use a sparse vector format? > > > > On Mon, Apr 22, 2019 at 5:03 PM Qian He wrote: > >> Hi all, >> >> I'm using the Spark-provided Lo

spark 2.4.1 -> 3.0.0-SNAPSHOT mllib

2019-04-23 Thread Koert Kuipers
We recently started compiling against Spark 3.0.0-SNAPSHOT (built in-house from the master branch) to uncover any breaking changes that might be an issue for us. We ran into some of our tests breaking where we use MLlib. Most of it is immaterial: we had some magic numbers hard-coded and the results ar

Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Weichen Xu
Hi Qian, Does your dataset use a sparse vector format? On Mon, Apr 22, 2019 at 5:03 PM Qian He wrote: > Hi all, > > I'm using the Spark-provided LogisticRegression to fit a dataset. Each row of > the data has 1.7 million columns, but it is sparse with only hundreds of > 1s. The Spark UI reported hig
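[A minimal PySpark sketch of what "sparse vector format" means here; the data is synthetic and the original poster may well be on Scala. Passing features as SparseVector keeps a row with 1.7 million columns and only a handful of 1s cheap to store and ship.]

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("sparse-lr-sketch").getOrCreate()

num_features = 1700000  # ~1.7 million columns, almost all zero
rows = [
    (1.0, Vectors.sparse(num_features, [3, 17, 42], [1.0, 1.0, 1.0])),
    (0.0, Vectors.sparse(num_features, [5, 99, 1000], [1.0, 1.0, 1.0])),
]
df = spark.createDataFrame(rows, ["label", "features"])

lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(df)
print(model.coefficients.numNonzeros())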

Re: toDebugString - RDD Logical Plan

2019-04-23 Thread kanchan tewary
Hello Dylan, Thank you for the help. The result does look formatted after making the change. However, from the following code, I was expecting RDD types like MappedRDD and FilteredRDD to be present in the lineage; instead, I can only see PythonRDD and ParallelCollectionRDD in the lineage [I am running i
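[For what it's worth, a small PySpark sketch with a synthetic RDD that reproduces this observation: chained Python map/filter lambdas are pipelined into a single PythonRDD on the JVM side, so per-transformation RDD names do not appear in the lineage.]

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("todebugstring-sketch").getOrCreate()
sc = spark.sparkContext

rdd = (sc.parallelize(range(100))
         .map(lambda x: x * 2)
         .filter(lambda x: x % 3 == 0))

# In PySpark, toDebugString() returns bytes; the output typically lists only
# PythonRDD and ParallelCollectionRDD for the lineage above.
print(rdd.toDebugString().decode("utf-8"))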

Re: Update / Delete records in Parquet

2019-04-23 Thread Khare, Ankit
Hi Chetan, I also agree that for this use case Parquet would not be the best option. I had a similar use case: 50 different tables to be downloaded from MSSQL. Source: MSSQL. Destination: Apache Kudu (since it supports change data capture use cases very well). We used the StreamSets CDC module to co