RE: Regarding spark-3.2.0 decommission features.

2022-01-26 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi Dongjoon Hyun,

Any inputs on the issue below would be helpful. Please let us know if we're 
missing anything.

Thanks and Regards,
Abhishek


RE: Regarding spark-3.2.0 decommission features.

2022-01-19 Thread Patidar, Mohanlal (Nokia - IN/Bangalore)
Gentle reminder!!!

Br,
-Mohan Patidar



From: Patidar, Mohanlal (Nokia - IN/Bangalore)
Sent: Tuesday, January 18, 2022 2:02 PM
To: user@spark.apache.org
Cc: Rao, Abhishek (Nokia - IN/Bangalore); Gowda Tp, Thimme (Nokia - 
IN/Bangalore); Sharma, Prakash (Nokia - IN/Bangalore); Tarun, N (Nokia - 
IN/Bangalore); Badagandi, Srinivas B. (Nokia - IN/Bangalore)
Subject: Regarding spark-3.2.0 decommission features.


Regarding spark-3.2.0 decommission features.

2022-01-18 Thread Patidar, Mohanlal (Nokia - IN/Bangalore)
Hi,
We're using Spark 3.2.0 with the decommission feature enabled. To validate 
it, we wanted to check whether the RDD blocks and shuffle blocks from 
decommissioned executors are migrated to other executors.
However, we could not see this happening. Below is the configuration we used.

  1.  Spark Configuration used:
 spark.local.dir /mnt/spark-ldir
 spark.decommission.enabled true
 spark.storage.decommission.enabled true
 spark.storage.decommission.rddBlocks.enabled true
 spark.storage.decommission.shuffleBlocks.enabled true
 spark.dynamicAllocation.enabled true
  2.  Brought up the Spark driver and executors on different nodes.
NAME                                           READY  STATUS       NODE
decommission-driver                            1/1    Running      Node1
gzip-compression-test-ae0b0b7e4d7fbe40-exec-1  1/1    Running      Node1
gzip-compression-test-ae0b0b7e4d7fbe40-exec-2  1/1    Running      Node2
gzip-compression-test-ae0b0b7e4d7fbe40-exec-3  1/1    Running      Node1
gzip-compression-test-ae0b0b7e4d7fbe40-exec-4  1/1    Running      Node2
gzip-compression-test-ae0b0b7e4d7fbe40-exec-5  1/1    Running      Node1
  3.  Brought down Node2; the pod status was then as follows.

NAME                                           READY  STATUS       NODE
decommission-driver                            1/1    Running      Node1
gzip-compression-test-ae0b0b7e4d7fbe40-exec-1  1/1    Running      Node1
gzip-compression-test-ae0b0b7e4d7fbe40-exec-2  1/1    Terminating  Node2
gzip-compression-test-ae0b0b7e4d7fbe40-exec-3  1/1    Running      Node1
gzip-compression-test-ae0b0b7e4d7fbe40-exec-4  1/1    Terminating  Node2
gzip-compression-test-ae0b0b7e4d7fbe40-exec-5  1/1    Running      Node1
  4.  Driver logs:
{"type":"log", "level":"INFO", "time":"2022-01-12T08:55:28.296Z", "timezone":"UTC", "log":"Adding decommission script to lifecycle"}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:55:28.459Z", "timezone":"UTC", "log":"Adding decommission script to lifecycle"}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:55:28.564Z", "timezone":"UTC", "log":"Adding decommission script to lifecycle"}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:55:28.601Z", "timezone":"UTC", "log":"Adding decommission script to lifecycle"}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:55:28.667Z", "timezone":"UTC", "log":"Adding decommission script to lifecycle"}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:58:21.885Z", "timezone":"UTC", "log":"Notify executor 5 to decommissioning."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:58:21.887Z", "timezone":"UTC", "log":"Notify executor 1 to decommissioning."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:58:21.887Z", "timezone":"UTC", "log":"Notify executor 3 to decommissioning."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:58:21.887Z", "timezone":"UTC", "log":"Mark BlockManagers (BlockManagerId(5, X.X.X.X, 33359, None), BlockManagerId(1, X.X.X.X, 38655, None), BlockManagerId(3, X.X.X.X, 35797, None)) as being decommissioning."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:59:24.426Z", "timezone":"UTC", "log":"Executor 2 is removed. Remove reason statistics: (gracefully decommissioned: 0, decommision unfinished: 0, driver killed: 0, unexpectedly exited: 1)."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:59:24.426Z", "timezone":"UTC", "log":"Executor 4 is removed. Remove reason statistics: (gracefully decommissioned: 0, decommision unfinished: 0, driver killed: 0, unexpectedly exited: 2)."}
  5.  Verified by exec'ing into each live executor (1, 3, 5) and checking 
/mnt/spark-ldir/. Only one BlockManager directory is present there; no other 
BlockManager's blocks have been copied to this location.
Example:
$ kubectl exec -it gzip-compression-test-ae0b0b7e4d7fbe40-exec-1 -n test bash
$ cd /mnt/spark-ldir/
$ ls
blockmgr-60872c99-e7d6-43ba-a43e-a97fc9f619ca
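
To cross-check the driver log in step 4 against the pod placement in steps 2 and 3, a small script along these lines could tally which executors were notified to decommission and which were removed. This is only a sketch: the function name summarize_driver_log and the two regular expressions are assumptions derived from the log lines quoted above, not part of Spark, and the sample is limited to the records shown in this mail.

```python
import json
import re

# Sample records copied from the driver log in step 4, one JSON object per line.
SAMPLE_LOG = """\
{"type":"log", "level":"INFO", "time":"2022-01-12T08:58:21.885Z", "timezone":"UTC", "log":"Notify executor 5 to decommissioning."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:58:21.887Z", "timezone":"UTC", "log":"Notify executor 1 to decommissioning."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:58:21.887Z", "timezone":"UTC", "log":"Notify executor 3 to decommissioning."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:59:24.426Z", "timezone":"UTC", "log":"Executor 2 is removed. Remove reason statistics: (gracefully decommissioned: 0, decommision unfinished: 0, driver killed: 0, unexpectedly exited: 1)."}
{"type":"log", "level":"INFO", "time":"2022-01-12T08:59:24.426Z", "timezone":"UTC", "log":"Executor 4 is removed. Remove reason statistics: (gracefully decommissioned: 0, decommision unfinished: 0, driver killed: 0, unexpectedly exited: 2)."}
"""

# Patterns are assumptions based on the message text quoted in this thread.
NOTIFY_RE = re.compile(r"Notify executor (\d+) to decommissioning")
REMOVED_RE = re.compile(r"Executor (\d+) is removed")


def summarize_driver_log(text):
    """Return executor ids that were notified, removed, and removed without notice."""
    notified, removed = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        message = record.get("log", "")
        match = NOTIFY_RE.search(message)
        if match:
            notified.add(int(match.group(1)))
        match = REMOVED_RE.search(message)
        if match:
            removed.add(int(match.group(1)))
    return {
        "notified": sorted(notified),
        "removed": sorted(removed),
        "removed_without_notice": sorted(removed - notified),
    }


if __name__ == "__main__":
    print(summarize_driver_log(SAMPLE_LOG))
```

On the excerpt above, this reports executors 1, 3 and 5 (all on Node1) as notified, and executors 2 and 4 (on Node2) as removed with no preceding notification in the driver log.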

Since the migration was not happening, we tried the fallback storage option, 
pointing it at HDFS. Unfortunately, we could not see the RDD and shuffle 
blocks in the fallback storage location either. Below is the configuration we 
used.


  1.  Spark Configuration Used:
 spark.decommission.enabled true
 spark.storage.decommission.enabled true
 spark.storage.decommission.rddBlocks.enabled true
 spark.storage.decommiss