You can start spark-shell with these properties:
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.initialExecutors=2 \
--conf spark.dynamicAllocation.minExecutors=2 \
--conf spark.dynamicAllocation.maxExecutors=5
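Putting these together with the command from the original mail, a full invocation could look roughly like this (just a sketch; it assumes the YARN external shuffle service is enabled, which dynamic allocation requires, and drops --num-executors since the executor count now floats between min and max):

./spark/bin/spark-shell --master yarn --deploy-mode client \
  --executor-memory 15g --executor-cores 4 --driver-memory 10g --queue myqueue \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=5

Idle executors are then released after spark.dynamicAllocation.executorIdleTimeout (60s by default), so the shell no longer holds its full allocation while it sits unused.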
On Fri, May 31, 2019 at 5:30 AM Qian He wrote:
> Sometimes it's convenient to start a spark-shell on a cluster ...
Hi Rishi,
I think that if you sort the data and then append it locally, there is no need
to bucket it, and you are fine with external tables that way.
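A rough sketch of that approach (the path, column names, and daily load_date partition below are only placeholders, not a tested recipe):

// In spark-shell: sort the day's data and append it under the location
// that the external table points to, one partition per day.
df.sortWithinPartitions("user_id")
  .write
  .mode("append")
  .partitionBy("load_date")
  .parquet("/warehouse/external/my_table")

// The external table is defined once over that location, e.g.
//   CREATE EXTERNAL TABLE my_table (...) PARTITIONED BY (load_date STRING)
//   STORED AS PARQUET LOCATION '/warehouse/external/my_table'
// and new daily partitions are picked up with MSCK REPAIR TABLE or ADD PARTITION.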
Regards,
Gourav
On Fri, May 31, 2019 at 3:43 AM Rishi Shah wrote:
> Hi All,
>
> Can we use bucketing with sorting functionality to save data incrementally ...
Hi All,
Can we use bucketing with sorting functionality to save data incrementally
(say daily) ? I understand bucketing is supported in Spark only with
saveAsTable; however, can this be used with mode "append" instead of
"overwrite"?
My understanding around bucketing was that you need to rewrite the entire table
each time.
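For context, the kind of write being asked about would look roughly like this (the bucket count, column, and table name are made up):

df.write
  .mode("append")                  // instead of "overwrite"
  .bucketBy(16, "user_id")
  .sortBy("user_id")
  .saveAsTable("daily_events_bucketed")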
Sometimes it's convenient to start a spark-shell on a cluster, for example:
./spark/bin/spark-shell --master yarn --deploy-mode client \
  --num-executors 100 --executor-memory 15g --executor-cores 4 \
  --driver-memory 10g --queue myqueue
However, with a command like this, those allocated resources stay occupied
until the shell exits, even while it is idle.
Here is the draft announcement:
===
Plan for dropping Python 2 support
As many of you already know, the Python core development team and many widely
used Python packages such as Pandas and NumPy will drop Python 2 support on or
before 2020/01/01. Apache Spark has supported both Python 2 and 3 since
Spark 1.4.
I created https://issues.apache.org/jira/browse/SPARK-27884 to track the
work.
On Thu, May 30, 2019 at 2:18 AM Felix Cheung wrote:
> We don’t usually reference a future release on the website.
>
> > Spark website and state that Python 2 is deprecated in Spark 3.0
>
> I suspect people will then ask when Spark 3.0 is coming out.
Since Parquet doesn't support updates, you have to backfill your dataset. If
that is your regular scenario, you should partition your Parquet files so
backfilling becomes easier.
As the data is structured now, you have to rewrite everything just to upsert
quite a small amount of changed data. Look at yo
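As a rough illustration of that partitioning idea (the paths and column names are hypothetical, and dynamic partition overwrite needs Spark 2.3+):

// Write the dataset partitioned by day, so a backfill only touches the
// affected folders instead of the whole dataset.
df.write
  .mode("overwrite")
  .partitionBy("event_date")
  .parquet("/data/events")

// Backfill: with dynamic partition overwrite, only the partitions present
// in correctedDf are rewritten.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
correctedDf.write
  .mode("overwrite")
  .partitionBy("event_date")
  .parquet("/data/events")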
We don’t usually reference a future release on the website.
> Spark website and state that Python 2 is deprecated in Spark 3.0
I suspect people will then ask when Spark 3.0 is coming out. We might need to
provide some clarity on that.
From: Reynold Xin
Unfortunately, I don't have timestamps in those tables :( Only a key on which I
can check the existence of a specific record.
But even with a timestamp, how would you make the update? When I say update,
I mean overwriting the existing record.
For example, you have the following in table A:
key | field1 | field2
1
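One common way to emulate that key-based overwrite with Parquet, sketched below as an assumption rather than a drop-in solution (it requires both sides to share the same schema): drop the old versions of the changed keys with an anti join, then union the updated rows back in and rewrite the affected output.

import org.apache.spark.sql.DataFrame

// Keep the rows of A whose key does not appear in the updates, then add the
// updated rows; the result replaces the old output.
def upsertByKey(existing: DataFrame, updates: DataFrame): DataFrame =
  existing.join(updates, Seq("key"), "left_anti").unionByName(updates)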
+1 on Xiangrui’s plan.
On Thu, May 30, 2019 at 7:55 AM shane knapp wrote:
>> I don't have a good sense of the overhead of continuing to support
>> Python 2; is it large enough to consider dropping it in Spark 3.0?
>>
> from the build/test side, it will actually be pretty easy to continue
> supporting it.