got it. makes sense. i am surprised it worked before...
On Apr 18, 2014 9:12 PM, Andrew Or and...@databricks.com wrote:
The reason why it worked before was because the UI would directly access
sc.getStorageStatus, instead of getting it through Task and Stage events.
This is not necessarily the best design, however, because the SparkContext
and the SparkUI are closely coupled, and there is no way to create a
SparkUI ...
Hi Koert,
I've tracked down what the bug is. The caveat is that each StageInfo only
keeps around the RDDInfo of the last RDD associated with the Stage. More
concretely, if you have something like
sc.parallelize(1 to 1000).persist.map(i => (i, i)).count()
This creates two RDDs within one Stage, ...
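In code terms: persist followed by a narrow map keeps both RDDs inside a
single stage, so before the fix only the last RDD's info survived the stage
events, and the persisted RDD never reached the storage tab. A minimal
sketch of that shape, assuming a spark-shell session where sc is already
defined (the println is purely illustrative):

    // persist + narrow map: no shuffle boundary, so one stage holds both RDDs.
    // Before the fix, the StageInfo only carried the RDDInfo of the last RDD,
    // i.e. the un-persisted map output.
    val cached = sc.parallelize(1 to 1000).persist()  // the RDD we cache
    val mapped = cached.map(i => (i, i))              // last RDD of the stage
    mapped.count()                                    // materializes the cache
    println(s"cached id = ${cached.id}, mapped id = ${mapped.id}")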
i tried again with latest master, which includes commit below, but ui page
still shows nothing on storage tab.
koert
commit ada310a9d3d5419e101b24d9b41398f609da1ad3
Author: Andrew Or andrewo...@gmail.com
Date: Mon Mar 31 23:01:14 2014 -0700
[Hot Fix #42] Persisted RDD disappears on ...
That commit did work for me. Could you confirm the following:
1) After you called cache(), did you run any actions like count() or
reduce()? If you don't materialize the RDD, it won't show up in the
storage tab (see the sketch after this list).
2) Did you run ./make-distribution.sh after you switched to the current master?
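For point 1, concretely: cache() is lazy, so the storage tab stays empty
until an action actually computes the partitions. A minimal sketch, assuming
a spark-shell session where sc is defined:

    val data = sc.parallelize(1 to 1000).cache()  // cache() is lazy: nothing stored yet
    data.count()  // the first action computes and caches the partitions
    // only after this should the RDD appear in the storage tab
    // (and in sc.getRDDStorageInfo)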
note that for a cached rdd in the spark shell it all works fine. but
something is going wrong with the spark-shell in our applications that
extensively cache and re-use RDDs
On Tue, Apr 8, 2014 at 12:33 PM, Koert Kuipers ko...@tresata.com wrote:
i tried again with latest master, which includes ...
sorry, i meant to say: note that for a cached rdd in the spark shell it all
works fine. but something is going wrong with the SPARK-APPLICATION-UI in
our applications that extensively cache and re-use RDDs
On Tue, Apr 8, 2014 at 12:55 PM, Koert Kuipers ko...@tresata.com wrote:
note that for a ...
That commit fixed the exact problem you described. That is why I want to
confirm that you switched to the master branch. bin/spark-shell doesn't
detect code changes, so you need to run ./make-distribution.sh to
re-compile Spark first. -Xiangrui
On Tue, Apr 8, 2014 at 9:57 AM, Koert Kuipers wrote: ...
yes i call an action after cache, and i can see that the RDDs are fully
cached using context.getRDDStorageInfo which we expose via our own api.
i did not run make-distribution.sh, we have our own scripts to build a
distribution. however if your question is whether i correctly deployed the
latest: yes i am definitely using latest
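For reference, that driver-side check can be done directly against
sc.getRDDStorageInfo; a sketch of the idea (not Koert's actual REST code):

    // Driver-side check that an RDD is fully cached, using the same API
    // Koert mentions. RDDInfo exposes cached vs. total partition counts
    // plus in-memory and on-disk sizes in bytes.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"RDD ${info.id} '${info.name}': " +
        s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
        s"memSize=${info.memSize} bytes, diskSize=${info.diskSize} bytes")
    }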
On Tue, Apr 8, 2014 at 1:07 PM, Xiangrui Meng men...@gmail.com wrote:
That commit fixed the exact problem you described. That is why I want to
confirm that you switched to the master branch. bin/spark-shell doesn't
detect code changes, so you need to run ...
i put some println statements in BlockManagerUI
i have RDDs that are cached in memory. I see this:
*** onStageSubmitted ***
rddInfo: RDD 2 (2) Storage: StorageLevel(false, false, false, false, 1);
CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0 B
yet at the same time i can see via our own api:
storageInfo: {
  diskSize: 0,
  memSize: 19944,
  numCachedPartitions: 1,
  numPartitions: 1
}
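The same kind of trace can be produced without patching BlockManagerUI, via
a SparkListener. This is a hypothetical reconstruction of the
instrumentation, written against the post-fix API where StageInfo.rddInfos
holds one RDDInfo per RDD in the stage:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted}

    // Print the RDDInfo(s) the UI sees for each submitted stage, for
    // comparison with sc.getRDDStorageInfo.
    class StorageDebugListener extends SparkListener {
      override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
        println("*** onStageSubmitted ***")
        event.stageInfo.rddInfos.foreach(info => println(s"rddInfo: $info"))
      }
    }

    // register on the driver before running any jobs:
    // sc.addSparkListener(new StorageDebugListener())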
On Tue, Apr 8, 2014 at 2:25 PM, Koert Kuipers ko...@tresata.com wrote:
i put some println statements in BlockManagerUI
1) at the end of the callback
2) yes we simply expose sc.getRDDStorageInfo to the user via REST
3) yes exactly. we define the RDDs at startup, all of them are cached. from
that point on we only do calculations on these cached RDDs.
i will add some more println statements for storageStatusList
our one cached RDD in this run has id 3
*** onStageSubmitted ***
rddInfo: RDD 2 (2) Storage: StorageLevel(false, false, false, false, 1);
CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0 B; TachyonSize: 0.0
B; DiskSize: 0.0 B
_rddInfoMap: Map(2 -> RDD 2 ...)