Re: [graph-tool] Cookbook model averaging question

2017-08-09 Thread Tiago de Paula Peixoto
On 09.08.2017 18:04, topinsky wrote:
> I attached the plots. As you can see, the model always uses only a few
> (6 to 9) nonempty blocks.
> But at the same time, the number of distinct marginal states (with
> positive probability) for some vertices is around 70 (almost all of
> the potential 77 = g.num_vertices()).
> This means that during independent runs the model can arrive at a new
> set of 6 to 9 blocks that differ only in their labels. This is what I
> meant by:
> "Maybe it's just the result of independent runs of the MCMC algorithm
> and the random nature of group labelling?"

Oh, the actual vertex labels are not meaningful. You can just re-label
them into a contiguous range before computing the histogram.
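
For instance, a minimal sketch using plain numpy (assuming `state` is
the fitted BlockState):

    import numpy as np

    b = state.get_blocks().a   # group labels; not necessarily contiguous
    # map the labels into the contiguous range [0, B)
    labels, b_contig = np.unique(b, return_inverse=True)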

> Is there any way to do the sampling without specifying an exact B,
> but rather with B itself being sampled, as described in
> https://arxiv.org/pdf/1705.10225.pdf, Ch. IV?

This is exactly what happens; this is why your histogram shows many
different values for the number of non-empty groups.

(The total number of groups, including empty ones, will always grow as
necessary.)
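
In the cookbook example, this is enabled by resizing the state before
equilibration (a sketch, assuming `state` is the BlockState returned by
the initial minimize_blockmodel_dl() fit):

    # make room for up to N groups, most of which remain empty, so
    # that the number of occupied groups can fluctuate during sampling
    state = state.copy(B=g.num_vertices())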

Best,
Tiago

-- 
Tiago de Paula Peixoto 







Re: [graph-tool] Cookbook model averaging question

2017-08-09 Thread topinsky
Good day,
Thank you for the reply.

I want to demonstrate the confusing observation in question.
I ran the model-averaging example from the cookbook:
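
For reference, the example is roughly the following (a sketch of the
lesmis-based model-averaging recipe from the inference HOWTO; details
may differ from the exact cookbook version):

    import graph_tool.all as gt

    g = gt.collection.data["lesmis"]        # 77 vertices

    state = gt.minimize_blockmodel_dl(g)    # initial fit
    state = state.copy(B=g.num_vertices())  # let B fluctuate

    pv = None                               # vertex marginals

    def collect_marginals(s):
        global pv
        pv = s.collect_vertex_marginals(pv)

    # sample from the posterior, collecting the marginals along the way
    gt.mcmc_equilibrate(state, force_niter=10000,
                        mcmc_args=dict(niter=10),
                        callback=collect_marginals)

    # draw each vertex as a pie chart of its marginal group memberships
    gt.graph_draw(g, pos=g.vp.pos, vertex_shape="pie",
                  vertex_pie_fractions=pv)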



I attached the plots. As you can see, the model always uses only a few
(6 to 9) nonempty blocks.
But at the same time, the number of distinct marginal states (with
positive probability) for some vertices is around 70 (almost all of
the potential 77 = g.num_vertices()).
This means that during independent runs the model can arrive at a new
set of 6 to 9 blocks that differ only in their labels. This is what I
meant by:
"Maybe it's just the result of independent runs of the MCMC algorithm
and the random nature of group labelling?"

Is there any way to do the sampling without specifying an exact B,
but rather with B itself being sampled, as described in
https://arxiv.org/pdf/1705.10225.pdf, Ch. IV?




Re: [graph-tool] Cookbook model averaging question

2017-08-08 Thread Tiago de Paula Peixoto
On 08.08.2017 01:10, Valery Topinsky wrote:
> 
> I am working on my own network in exactly the same way, trying to
> perform sampling to estimate some metrics. But the results in some way
> replicate the behaviour of the cookbook example:
> for both cases (simple and nested SBM), the marginal distributions of
> the vertices most of the time have too many non-zero values across
> different clusters, hence the colouring is so finely granular. Only a
> few (1-2) clusters show an explicit dominant group membership, while
> the rest of the clusters exhibit very spread-out marginals.
> Do you have any explanation for this?

This means that the posterior distribution is broad, i.e. not
concentrated on any particular partition. This implies either that the
model is misspecified, i.e. your network does not have well-defined
groups, or that the data is very noisy.

> In the case of my network, I also have only 1-3 groups of nodes with
> an explicit dominant group membership, and the rest of the vertices
> have too many non-zero, almost uniformly distributed marginals. I was
> thinking that for the simple cookbook example it's not natural that
> some vertices have more than 10 non-zero marginal values.
> Maybe it's just the result of independent runs of the MCMC algorithm
> and the random nature of group labelling? Or is there some intuition
> behind this high marginal variance in group membership?
> I ran the optimisation several times and drew the results.
> Topologically the outputs were very close to each other, although the
> colouring was always different, except for a few kind of "stable"
> vertices. Hence, I guess, the resulting marginals for them have the
> same properties. But the labels are not informative in themselves.
> Maybe there is some trick to force a deterministic labelling policy
> to stabilise them?

There is no trick; this variance in the posterior reflects the nature
of your data. If you want a single partition to represent it, you have
to choose between two extremes of the bias-variance trade-off:

   1. Choose the most likely partition, i.e. the one that minimizes the
      description length. (more bias, less variance)

   2. Choose the maximum a posteriori estimate for each node, i.e. the
      most likely node label according to the node marginals. (less
      bias, more variance)

Option 2 averages over the noise, but might not be representative of
any particular fit (especially if the number of groups is fluctuating).
Option 1 usually underfits, but may also overfit, depending on your
data.
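
In graph-tool, the two options can be obtained roughly as follows (a
sketch; it assumes a graph `g` and vertex marginals `pv` collected
during equilibration, as in the cookbook example):

    import numpy as np
    import graph_tool.all as gt

    # Option 1: the minimum description length partition
    # (more bias, less variance)
    state = gt.minimize_blockmodel_dl(g)
    b_mdl = state.get_blocks()

    # Option 2: the marginal MAP estimate per node (less bias, more
    # variance), taking the most likely label from the marginals `pv`
    b_map = g.new_vertex_property("int")
    for v in g.vertices():
        b_map[v] = np.argmax(pv[v])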

There is a discussion of this here: https://arxiv.org/abs/1705.10225

Best,
Tiago


-- 
Tiago de Paula Peixoto 


