Re: [Haskell-cafe] Large graphs

2012-05-21 Thread tomberek
Benjamin,

 - Immutable structure, mutable labels.  After initially reading in the 
 graphs, their shape doesn't change, but information flows around the graph, 
 changing the labels on nodes and edges.

I have been working on a similar problem for a while now, hence my
interest and dissatisfaction with FGL. I'm not sure if this is exactly
what you are looking for, but I've been playing around with mutable
graphs (both structure and labels) and found issues with GC, so I set
about the goal of creating an unboxed in-place mutable graph that has
O(1) neighbor retrieval. The graph has to live in ST or IO (i've been
using ST, and it works fine, though I'd like to try STM and try a
multiple-spider traversal). So far, it can do very fast traversal and
label mutation with no allocation or GC after the initial build.

 I wrote some code that reads in graphs and some some basic flow computations 
 on them ... When I tried some larger graphs (~100k), the memory consumption 
 spiked into multiple GB, ...

I can try to use the nodes/specs you provide to give you an estimate
of what my framework can handle. If that works for you, I'll clean up
my code and you can give it a shot. Send me whatever other details you
think are relevant.

I am also curious what would happen if I take out the mutable
structure feature and just stick with mutable labels and how it
affects performance.

-Tom

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Large graphs

2012-05-21 Thread tomberek
 I can try to use the nodes/specs you provide to give you an estimate
 of what my framework can handle. If that works for you, I'll clean up
 my code and you can give it a shot. Send me whatever other details you
 think are relevant.

Benjamin,

I had a few moments, so I made a sparse graph of 100k nodes, each with
2-3 edges with each node holding an unboxed int. This is created in
0.12 sec and then traversed for 100 million steps (add one to each
node, then move to next node) in an additional 9 secs but no more GC's
or allocations.

Let me know if this would help you out, I would like a chance to use
my pet project for something useful.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Large graphs

2012-05-20 Thread Benjamin Ylvisaker
I have a problem that I'm trying to use Haskell for, and I think I'm running 
into scalability issues in FGL.  However, I am quite new to practical 
programming in Haskell, so it's possible that I have some other bone-headed 
performance bug in my code.  I tried looking around for concrete information 
about the scalability of Haskell's graph libraries, but didn't find much.  So 
here are the characteristics of the problem I'm working on:

- Large directed graphs.  Mostly 10k-100k nodes, but some in the low 100ks.
- Sparse graphs.  The number of edges is only 2-3x the number of nodes.
- Immutable structure, mutable labels.  After initially reading in the graphs, 
their shape doesn't change, but information flows around the graph, changing 
the labels on nodes and edges.

I wrote some code that reads in graphs and some some basic flow computations on 
them.  The first few graphs I tried were around 10k nodes, and the performance 
was okay (on the order of several seconds).  When I tried some larger graphs 
(~100k), the memory consumption spiked into multiple GB, the CPU utilization 
went down to single digit percentages and the overall running time was closer 
to hours than seconds.

Because the graph structure is basically immutable for my problem, I'm tempted 
to write my own graph representation based on mutable arrays.  Before I embark 
on that, I wonder if anyone else can share their experience with large graphs 
in Haskell?  Is there a library (FGL or otherwise) that should be able to scale 
up to the size of graph I'm interested in, if I write my code correctly?

Thanks,
Ben

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Large graphs

2012-05-20 Thread Serguey Zefirov
2012/5/20 Benjamin Ylvisaker benjam...@fastmail.fm:
 I have a problem that I'm trying to use Haskell for, and I think I'm running 
 into scalability issues in FGL.  However, I am quite new to practical 
 programming in Haskell, so it's possible that I have some other bone-headed 
 performance bug in my code.  I tried looking around for concrete information 
 about the scalability of Haskell's graph libraries, but didn't find much.  So 
 here are the characteristics of the problem I'm working on:

 - Large directed graphs.  Mostly 10k-100k nodes, but some in the low 100ks.
 - Sparse graphs.  The number of edges is only 2-3x the number of nodes.
 - Immutable structure, mutable labels.  After initially reading in the 
 graphs, their shape doesn't change, but information flows around the graph, 
 changing the labels on nodes and edges.

I would like to suggest to you a representation based in 32-bit
integers as vertex index. I.e., roll your own

Use strict IntMap IntSet for neighbor information, it is very efficient.

 I wrote some code that reads in graphs and some some basic flow computations 
 on them.  The first few graphs I tried were around 10k nodes, and the 
 performance was okay (on the order of several seconds).  When I tried some 
 larger graphs (~100k), the memory consumption spiked into multiple GB, the 
 CPU utilization went down to single digit percentages and the overall running 
 time was closer to hours than seconds.

Looks like your code does not force everything. It leaves some thunks
unevaluated, check for that situation.

It is common pitfall, not only for computations on graphs.


 Because the graph structure is basically immutable for my problem, I'm 
 tempted to write my own graph representation based on mutable arrays.  Before 
 I embark on that, I wonder if anyone else can share their experience with 
 large graphs in Haskell?  Is there a library (FGL or otherwise) that should 
 be able to scale up to the size of graph I'm interested in, if I write my 
 code correctly?

The above structure (IntMap IntSet) allowed for fast computations on
relatively large arrays, in order of 1M vertices and 16M
undirected/32M directed edges.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Large graphs

2012-05-20 Thread Clark Gaebel
I had issues with FGL in the past, too. Although FGL is really nice to
work with, it just uses a ridiculous amount of memory for large
graphs.

In the end, I used Data.Graph from containers [1]. This was a lot more
reasonable, and let me finish my project relatively easily.

Regards,
  - Clark

[1] 
http://hackage.haskell.org/packages/archive/containers/0.5.0.0/doc/html/Data-Graph.html

On Sun, May 20, 2012 at 10:55 AM, Serguey Zefirov sergu...@gmail.com wrote:
 2012/5/20 Benjamin Ylvisaker benjam...@fastmail.fm:
 I have a problem that I'm trying to use Haskell for, and I think I'm running 
 into scalability issues in FGL.  However, I am quite new to practical 
 programming in Haskell, so it's possible that I have some other bone-headed 
 performance bug in my code.  I tried looking around for concrete information 
 about the scalability of Haskell's graph libraries, but didn't find much.  
 So here are the characteristics of the problem I'm working on:

 - Large directed graphs.  Mostly 10k-100k nodes, but some in the low 100ks.
 - Sparse graphs.  The number of edges is only 2-3x the number of nodes.
 - Immutable structure, mutable labels.  After initially reading in the 
 graphs, their shape doesn't change, but information flows around the 
 graph, changing the labels on nodes and edges.

 I would like to suggest to you a representation based in 32-bit
 integers as vertex index. I.e., roll your own

 Use strict IntMap IntSet for neighbor information, it is very efficient.

 I wrote some code that reads in graphs and some some basic flow computations 
 on them.  The first few graphs I tried were around 10k nodes, and the 
 performance was okay (on the order of several seconds).  When I tried some 
 larger graphs (~100k), the memory consumption spiked into multiple GB, the 
 CPU utilization went down to single digit percentages and the overall 
 running time was closer to hours than seconds.

 Looks like your code does not force everything. It leaves some thunks
 unevaluated, check for that situation.

 It is common pitfall, not only for computations on graphs.


 Because the graph structure is basically immutable for my problem, I'm 
 tempted to write my own graph representation based on mutable arrays.  
 Before I embark on that, I wonder if anyone else can share their experience 
 with large graphs in Haskell?  Is there a library (FGL or otherwise) that 
 should be able to scale up to the size of graph I'm interested in, if I 
 write my code correctly?

 The above structure (IntMap IntSet) allowed for fast computations on
 relatively large arrays, in order of 1M vertices and 16M
 undirected/32M directed edges.

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe