Re: dynamic allocation in spark-shell

2019-05-30 Thread Deepak Sharma
You can start spark-shell with these properties: --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=2 --conf spark.dynamicAllocation.minExecutors=2 --conf spark.dynamicAllocation.maxExecutors=5 On Fri, May 31, 2019 at 5:30 AM Qian He wrote: > Sometimes it
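
A minimal PySpark sketch of the same settings, for anyone who prefers setting them in code rather than on the spark-shell command line (the app name is made up; on YARN, dynamic allocation in Spark 2.x also needs the external shuffle service):

    from pyspark.sql import SparkSession

    # Dynamic allocation settings must be in place before the SparkContext
    # starts, so set them on the builder (or via --conf as above).
    spark = (SparkSession.builder
             .appName("dyn-alloc-demo")                                # hypothetical
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.dynamicAllocation.initialExecutors", "2")
             .config("spark.dynamicAllocation.minExecutors", "2")
             .config("spark.dynamicAllocation.maxExecutors", "5")
             .config("spark.shuffle.service.enabled", "true")          # required on YARN
             .getOrCreate())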

Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-30 Thread Gourav Sengupta
Hi Rishi, I think that if you are sorting and then appending data locally, there will be no need to bucket the data, and you are good with external tables that way. Regards, Gourav On Fri, May 31, 2019 at 3:43 AM Rishi Shah wrote: > Hi All, > > Can we use bucketing with sorting functionality to s
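
A rough sketch of the sort-then-append approach Gourav describes, with made-up column names and paths, writing Parquet under an external table's location:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    daily = spark.read.parquet("/staging/events/2019-05-30")  # hypothetical input

    # Sort within each output partition and append; no bucketing involved,
    # so plain mode("append") works with a table defined over this path.
    (daily.sortWithinPartitions("user_id")                    # hypothetical column
          .write
          .mode("append")
          .parquet("/warehouse/external/events"))             # hypothetical path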

[pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-30 Thread Rishi Shah
Hi All, Can we use bucketing with sorting functionality to save data incrementally (say daily)? I understand bucketing is supported in Spark only with saveAsTable; however, can this be used with mode "append" instead of "overwrite"? My understanding around bucketing was, you need to rewrite entir
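
For reference, the API combination being asked about looks like the sketch below (table name, column, and bucket count are made up; whether append keeps the per-bucket files usefully sorted across daily loads is exactly the open question):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    daily = spark.read.parquet("/staging/2019-05-30")  # hypothetical daily load

    # Bucketing is only supported through saveAsTable; the question is whether
    # mode("append") can replace rewriting the whole table each day.
    (daily.write
          .bucketBy(16, "user_id")   # hypothetical bucket column and count
          .sortBy("user_id")
          .mode("append")
          .saveAsTable("events_bucketed"))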

dynamic allocation in spark-shell

2019-05-30 Thread Qian He
Sometimes it's convenient to start a spark-shell on the cluster, like ./spark/bin/spark-shell --master yarn --deploy-mode client --num-executors 100 --executor-memory 15g --executor-cores 4 --driver-memory 10g --queue myqueue However, with a command like this, those allocated resources will be occupied u
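
One setting worth noting alongside the reply above: with dynamic allocation enabled, executors that sit idle are released back to YARN after spark.dynamicAllocation.executorIdleTimeout (60s by default), so an interactive shell stops holding the queue's resources between queries. A hedged sketch:

    from pyspark.sql import SparkSession

    # Idle executors are returned to the cluster after the timeout instead
    # of being held until the shell exits.
    spark = (SparkSession.builder
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.shuffle.service.enabled", "true")
             .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
             .getOrCreate())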

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Xiangrui Meng
Here is the draft announcement: === Plan for dropping Python 2 support As many of you already know, the Python core development team and many widely used Python packages like Pandas and NumPy will drop Python 2 support on or before 2020/01/01. Apache Spark has supported both Python 2 and 3 since Spark 1

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Xiangrui Meng
I created https://issues.apache.org/jira/browse/SPARK-27884 to track the work. On Thu, May 30, 2019 at 2:18 AM Felix Cheung wrote: > We don’t usually reference a future release on website > > > Spark website and state that Python 2 is deprecated in Spark 3.0 > > I suspect people will then ask wh

Re: Upsert for hive tables

2019-05-30 Thread Magnus Nilsson
Since Parquet doesn't support updates, you have to backfill your dataset. If that is your regular scenario, you should partition your Parquet files so backfilling becomes easier. As the data is structured now, you have to update everything just to upsert quite a small amount of changed data. Look at yo
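
A sketch of the partition-and-backfill pattern Magnus describes, with made-up paths and a made-up partition column; dynamic partition overwrite is available from Spark 2.3 on:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             # Overwrite only the partitions present in the incoming data,
             # leaving all other partitions untouched (Spark 2.3+).
             .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
             .getOrCreate())

    changed = spark.read.parquet("/staging/changed_rows")  # hypothetical input

    # Rewrites just the affected partitions instead of the whole dataset.
    (changed.write
            .partitionBy("event_date")                     # hypothetical column
            .mode("overwrite")
            .parquet("/warehouse/events"))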

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Felix Cheung
We don’t usually reference a future release on the website > Spark website and state that Python 2 is deprecated in Spark 3.0 I suspect people will then ask when Spark 3.0 is coming out. Might need to provide some clarity on that. From: Reynold Xin Sent: Thur

Re: Upsert for hive tables

2019-05-30 Thread Tomasz Krol
Unfortunately, I don't have timestamps in those tables :( Only a key on which I can check the existence of a specific record. But even with the timestamp, how would you make the update? When I say update, I mean to overwrite the existing record. For example, you have the following in table A: key | field1 | field2 1
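
One common workaround for a key-only upsert, sketched with made-up names: keep every row of the existing table whose key does not appear in the incoming batch, union the new versions in, and write the result out fresh (Parquet files cannot be updated in place):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    existing = spark.table("table_a")                      # existing records
    updates = spark.read.parquet("/staging/updates")       # hypothetical batch

    # Drop rows whose key is being replaced, then add the new versions.
    # Write to a new table/location: overwriting a table while reading
    # from it in the same job will fail.
    upserted = (existing.join(updates, on="key", how="left_anti")
                        .unionByName(updates))
    upserted.write.mode("overwrite").saveAsTable("table_a_upserted")  # hypothetical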

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Reynold Xin
+1 on Xiangrui’s plan. On Thu, May 30, 2019 at 7:55 AM shane knapp wrote: >> I don't have a good sense of the overhead of continuing to support Python 2; is it large enough to consider dropping it in Spark 3.0? > from the build/test side, it will actually be pretty easy to continue suppo