[jira] [Commented] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361347#comment-16361347
 ] 

Apache Spark commented on SPARK-23394:
--

User 'attilapiros' has created a pull request for this issue:
https://github.com/apache/spark/pull/20589

> Storage info's Cached Partitions doesn't consider the replications (but 
> sc.getRDDStorageInfo does)
> --
>
> Key: SPARK-23394
> URL: https://issues.apache.org/jira/browse/SPARK-23394
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Attila Zsolt Piros
>Priority: Major
> Attachments: Spark_2.2.1.png, Spark_2.4.0-SNAPSHOT.png, 
> Storage_Tab.png
>
>
> Start spark as:
> {code:bash}
> $ bin/spark-shell --master local-cluster[2,1,1024]
> {code}
> {code:scala}
> scala> import org.apache.spark.storage.StorageLevel._
> import org.apache.spark.storage.StorageLevel._
> scala> sc.parallelize((1 to 100), 10).persist(MEMORY_AND_DISK_2).count
> res0: Long = 100  
>   
> scala> sc.getRDDStorageInfo(0).numCachedPartitions
> res1: Int = 20
> {code}
> h2. Cached Partitions 
> On the UI at the Storage tab Cached Partitions is 10:
>  !Storage_Tab.png! .
> h2. Full tab
> Moreover, the replicated partitions were also listed in the old 2.2.1 UI, like:
>  !Spark_2.2.1.png! 
> But now it is like:
>  !Spark_2.4.0-SNAPSHOT.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360757#comment-16360757
 ] 

Marcelo Vanzin commented on SPARK-23394:


I talked to Attila offline, and to me the new UI seems more correct. 
There are only 10 cached partitions, each one replicated to 2 executors; the 
table also reflects that (unlike the old UI, where the same block showed up 
twice). The only potential adjustment here would be to show the executor 
addresses instead of the executor IDs.

In the context of what led us here (SPARK-20659 / 
https://github.com/apache/spark/pull/20546#discussion_r167070392), I think that 
we should fix the tests that rely on the old code returning the total count 
including replication, so that they work with the new code that returns more 
accurate information.
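
The two ways of counting discussed above can be illustrated with a minimal, 
Spark-free sketch. The names `BlockReplica`, `CachedPartitionCount`, 
`countWithReplication`, `countDistinct` and `executorsByPartition` are 
hypothetical and are not taken from Spark's code; the executor IDs "0" and "1" 
are assumed to match the two-executor `local-cluster[2,1,1024]` setup in the 
issue description.

```scala
// Hypothetical model of cached blocks; these are NOT Spark's internal classes.
final case class BlockReplica(partition: Int, executorId: String)

object CachedPartitionCount {
  // 10 partitions, each stored on both executors (replication factor 2),
  // mirroring persist(MEMORY_AND_DISK_2) on a 10-partition RDD.
  val replicas: Seq[BlockReplica] =
    for {
      partition  <- 0 until 10
      executorId <- Seq("0", "1") // assumed executor IDs
    } yield BlockReplica(partition, executorId)

  // Counts every stored copy, so replicas are included.
  def countWithReplication(rs: Seq[BlockReplica]): Int = rs.size

  // Counts each partition once, no matter how many copies exist.
  def countDistinct(rs: Seq[BlockReplica]): Int =
    rs.map(_.partition).distinct.size

  // One entry per partition, listing the executors that hold a copy.
  def executorsByPartition(rs: Seq[BlockReplica]): Map[Int, Seq[String]] =
    rs.groupBy(_.partition).map { case (p, copies) => p -> copies.map(_.executorId) }
}
```

Under these assumptions, `countWithReplication(replicas)` is 20 (the value 
reported by `numCachedPartitions` in the description) and 
`countDistinct(replicas)` is 10 (the value shown on the Storage tab). 
`executorsByPartition` corresponds to a table with one row per block listing 
both executors, rather than the same block appearing twice.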







[jira] [Commented] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Marco Gaido (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360593#comment-16360593
 ] 

Marco Gaido commented on SPARK-23394:
-

I don't think this is an issue. `numCachedPartitions` is 20 because each 
partition is cached twice (it has 2 replicas). It is not a bug, although 
we could improve the names/docs around it. Since `RDDInfo` is a developer 
API, though, I don't think that is needed.

> Storage info's Cached Partitions doesn't consider the replications (but 
> sc.getRDDStorageInfo does)
> --
>
> Key: SPARK-23394
> URL: https://issues.apache.org/jira/browse/SPARK-23394
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Attila Zsolt Piros
>Priority: Major
> Attachments: Screen Shot 2018-02-12 at 11.24.22.png
>
>
> Start spark as:
> {code:bash}
> $ bin/spark-shell --master local-cluster[2,1,1024]
> {code}
> {code:scala}
> scala> import org.apache.spark.storage.StorageLevel._
> import org.apache.spark.storage.StorageLevel._
> scala> sc.parallelize((1 to 100), 10).persist(MEMORY_AND_DISK_2).count
> res0: Long = 100  
>   
> scala> sc.getRDDStorageInfo(0).numCachedPartitions
> res1: Int = 20
> {code}
> But on the UI at the Storage tab Cached Partitions is 10. See attached 
> screenshot.


