Re: Not what I‘ve expected Performance

Jürgen Albersdorfer Thu, 01 Feb 2018 00:06:52 -0800

Hi Kurt, thanks for your response.
I am indeed using Spark, which I forgot to mention, and I did it nearly
the same way as in the example you gave me,
just without the .select(PK).sample(false, 0.1) instruction. I don't
actually understand what that is useful for, and maybe that's the key to the castle.

I have already found out that I need more Spark Executors, really lots
of them.
It was a bad idea in the first place to run ./spark-submit without any
parameters for executor-memory, total-executor-cores and especially
executor-cores.
I have now submitted it with --executor-cores 1 --total-executor-cores 100
--executor-memory 8G to get more Executors out of my Cluster.
Without those limits, a Spark Executor will use all of the available
cores. With the limits, each Spark Worker is able to start more
Executors in parallel, which gives a boost in my case,
but it is still way too slow and far away from needing to be throttled. And
throttling is what I actually expected to need when 100 processes start
hammering the Database Cluster.
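
As a rough sketch of the arithmetic behind those flags: the function names below are made up for illustration, and the numbers rest on assumptions that are not stated in this thread. It assumes Spark standalone mode with one Worker per node, 8 cores per Worker (2x quad core, hyperthreading ignored) and the full 128 GB offered to Spark, which is optimistic since Cassandra runs on the same machines.

```python
# Rough executor math for Spark standalone mode.
# Assumptions (not from the thread): one Worker per node, each Worker
# offers 8 cores and 128 GB; real Workers may advertise different values.

def executors_per_worker(worker_cores, worker_mem_gb,
                         executor_cores, executor_mem_gb):
    """Executors that fit on one Worker, limited by cores and by memory."""
    by_cores = worker_cores // executor_cores
    by_memory = worker_mem_gb // executor_mem_gb
    return min(by_cores, by_memory)


def total_executors(nodes, worker_cores, worker_mem_gb,
                    executor_cores, executor_mem_gb,
                    total_executor_cores=None):
    """Executors across the cluster, capped by --total-executor-cores if set."""
    fit = nodes * executors_per_worker(worker_cores, worker_mem_gb,
                                       executor_cores, executor_mem_gb)
    if total_executor_cores is None:
        return fit
    requested = total_executor_cores // executor_cores
    return min(fit, requested)


# No core limit: one executor grabs all 8 cores of its Worker
# (executor memory taken as the 1 GB default).
unlimited = total_executors(9, 8, 128, executor_cores=8, executor_mem_gb=1)

# --executor-cores 1 --total-executor-cores 100 --executor-memory 8G
limited = total_executors(9, 8, 128, executor_cores=1, executor_mem_gb=8,
                          total_executor_cores=100)

print(unlimited, limited)
```

Under these assumptions the cluster can only place as many single-core Executors as the Workers have cores, so a request for 100 total cores may be capped below 100 by the hardware.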

I'll definitely give your code a try.

2018-02-01 6:36 GMT+01:00 kurt greaves <k...@instaclustr.com>:

> How are you copying? With CQLSH COPY or your own script? If you've got
> spark already it's quite simple to copy between tables and it should be
> pretty much as fast as you can get it. (you may even need to throttle).
> There's some sample code here (albeit it's copying between clusters but
> easily tailored to copy between tables):
> https://www.instaclustr.com/support/documentation/apache-spark/using-spark-to-sample-data-from-one-cassandra-cluster-and-write-to-another/
>
> On 30 January 2018 at 21:05, Jürgen Albersdorfer <jalbersdor...@gmail.com>
> wrote:
>
>> Hi, We are using C* 3.11.1 with a 9 Node Cluster built on CentOS Servers
>> each having 2x Quad Core Xeon, 128GB of RAM and two separate 2TB spinning
>> Disks, one for Log one for Data with Spark on Top.
>>
>> Due to bad Schema (Partitions of about 4 to 8 GB) I need to copy a whole
>> Table into another having same fields but different partitioning.
>>
>> I expected glowing Iron when I started the copy Job, but instead cannot
>> even see some Impact on CPU, mem or disks. - but the Job does copy the Data
>> over veeerry slowly at about a MB or two per Minute.
>>
>> Any suggestion where to start investigation?
>>
>> Thanks already
>>
>> Sent from my iPhone
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
