Graph Data Structures

2006-11-25 Thread Nathan Harmston
Hi All,

Currently I am working on a generic graph library  so I can do various
graph based analysis for various projects I have ideas for. Currently
I am implementing Graph as a wrapper around a dictionary. Currently my
implementation works like this:

t = Graph()
n1 = Node(Node1)
n2 = Node(Test2)
edge1 = Edge(Test3)
t += n1{ n1:{}}
t[n1][n2] = edge1{ n1:{n2:edge1}

However this isnt actually ending up with the structure I want. I want
it to finally end up as ..{ n1:{n2:edge1}, n2:{}}. Is
there anyway I can do this simply

Also I am looking at having a large graph and was wondering if anyone
knew of anyway I could reduce the memory requirements of this
structure and improve the speed of queries on it. I m thinking writing
a C extension for itis this a good idea and where would I start?
Or does Python have some kind of transparent memory access module I
can implement.

Many Thanks in advance,

Nathan

PS.Please find my code below:

class Graph(object):
def __init__(self, g= { } ):
self.graph = g
def __iadd__(self, p):
if p not in self.graph:
self.graph[p] = PathsDict()
return self
def __getitem__(self, p):
try:
return self.graph[p]
except KeyError:
raise KeyError( %s not in graph %(repr(p)) )
def __str__(self):
return str(self.graph)
def filter(self, filter):
pass

class PathsDict(object):
def __init__(self):
self.paths = { }
def __setitem__(self, p, val):
if p not in self.paths:
self.paths[p] = val
def __getitem__(self, p):
return self.paths[p]
# catch exception here
def paths(self):
for k, v in self.paths:
yield (k, v)
def edges(self):
return self.paths.values()
def __str__(self):
return str(self.paths)
def __len__(self):
return len(self.paths)

class Node(object):
def __init__(self, name):
self.name = name
def __str__(self):
return self.name

class Edge(dict):
def __init__(self, name, weight = 1):
self[name] = name
self[weight] = weight
def __str__(self):
return self[name]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread Szabolcs Nagy
i haven't read your code, but there are many graph implementations in
python.
in case you haven't found these yet:
http://wiki.python.org/moin/PythonGraphApi

if you only want to do some analysis i think you need this one (as it's
pretty complete and simple):
https://networkx.lanl.gov/

i also recommend Guido's essay to read:
http://www.python.org/doc/essays/graphs.html

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread Steven D'Aprano
On Sat, 25 Nov 2006 14:05:27 +, Nathan Harmston wrote:

 Hi All,
 
 Currently I am working on a generic graph library  so I can do various
 graph based analysis for various projects I have ideas for. Currently
 I am implementing Graph as a wrapper around a dictionary. Currently my
 implementation works like this:

[snip]

http://www.python.org/doc/essays/graphs.html

http://mail.python.org/pipermail/python-list/2002-April/137593.html


Hope this helps.


-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread Robin Becker
Szabolcs Nagy wrote:

 if you only want to do some analysis i think you need this one (as it's
 pretty complete and simple):
 https://networkx.lanl.gov/


seems to be broken at present with a python traceback coming out; not a 
good advert for python and/or trac
-- 
Robin Becker
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread bearophileHUGS
Szabolcs Nagy:
 i haven't read your code, but there are many graph implementations in
 python.
 in case you haven't found these yet:
 http://wiki.python.org/moin/PythonGraphApi

 if you only want to do some analysis i think you need this one (as it's
 pretty complete and simple):
 https://networkx.lanl.gov/

 i also recommend Guido's essay to read:
 http://www.python.org/doc/essays/graphs.html

I can also suggest my one:
http://sourceforge.net/projects/pynetwork/

And boost graph bindings for Python, quite fast:
http://www.osl.iu.edu/~dgregor/bgl-python/

Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread Nathan Harmston
  https://networkx.lanl.gov/

This was working for me earlier, I managed to get everything from
there earlier. It seems a very good package. It seems theres more out
there than what I had thought, which unfortunately makes it harder for
me to decide what to use (pynetwork and bgl look useful aswell). I m
going to do some testing on it later and see what happens with it.
Thanks a lot for your help.

Has anyone got an idea how I could split the contents of a node and
its representation (to save memory in my graph). ie the nodes
contain the start and end coordinates and id and the actual
representation contains the string. I was going to have :

class Node(object):
pass

class Section(Node):
pass

class Item(object):
   pass

Where section contains a slice of the Item which im interested. I m
just not sure how I can access the contents of item without storing
it. --- If u get what I mean???

Many Thanks in advance

Nathan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread John Machin

Nathan Harmston wrote:
   https://networkx.lanl.gov/

 This was working for me earlier, I managed to get everything from
 there earlier. It seems a very good package. It seems theres more out
 there than what I had thought, which unfortunately makes it harder for
 me to decide what to use (pynetwork and bgl look useful aswell). I m
 going to do some testing on it later and see what happens with it.
 Thanks a lot for your help.

 Has anyone got an idea how I could split the contents of a node and
 its representation (to save memory in my graph). ie the nodes
 contain the start and end coordinates and id and the actual
 representation contains the string. I was going to have :

 class Node(object):
 pass

 class Section(Node):
 pass

 class Item(object):
pass

 Where section contains a slice of the Item which im interested. I m
 just not sure how I can access the contents of item without storing
 it. --- If u get what I mean???

No. Not at all. pass is not very informative. Neither are
representation and the string. Please tell us what you mean by
slice. What is an item, if it's not a node? Try listing out the
attributes of a node, with a couple of sample values for each, and then
we might get a clue.

What makes you think that you need to save memory?

What makes you think that you could save memory by splitting whatever
it is?

HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread Nathan Harmston
Hi,

The idea is that I m going to use it to build graphs for sequence
alignment (at the moment), I read a discussion on the corebio
(reimplementation of biopython) group about using intervals to
represent sequence slices. The idea being that, my graph may contain
millions of alignments and storing the sequence (the actual ATGC) is
not required.

class Node(object):
 pass

class Interval(Node):
_id = gene1
 _start = 50
_end = 200
_strand = 1

class Sequence(object):
_sequence = atgtcgtgagagagagttgtgag.

So one interval on one sequence would align to another interval from
another sequence, but I want changes I make to the interval to be
reflected in the representation later. If I reverse complement it i
want the interval to store this information but the Sequence only
shows this later on when I call use it calling repr or str.

Do you get what I mean.
Many Thanks

Nathan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread Robin Becker
Nathan Harmston wrote:
  https://networkx.lanl.gov/
..

I got it back just once, but when I clicked again I see this



RuntimeErrorPython 2.4.4c1: /usr/bin/python
Sat Nov 25 16:21:16 2006

A problem occurred in a Python script. Here is the sequence of function 
calls leading up to the error, in the order they occurred.
  /build/bdist.linux-x86_64/egg/tracrst/macro.py in 
render_macro(self=tracrst.macro.TracReSTMacro object, req=trac.web.api

..

  782   self.__dict__[_parent_pool] = \
   783 parent_pool or libsvn.core.application_pool;
   784   if self.__dict__[_parent_pool]:
self = libsvn.repos.svn_repos_t; proxy of C svn_repos_t instance, 
self.__dict__ = {'this': Swig Object of type 'svn_repos_t *'}, 
parent_pool = libsvn.core.apr_pool_t; proxy of C apr_pool_t instance, 
libsvn = module 'libsvn' from 
'/usr/lib/python2.4/site-packages/libsvn/__init__.pyc', libsvn.core = 
module 'libsvn.core' from 
'/usr/lib/python2.4/site-packages/libsvn/core.pyc', 
libsvn.core.application_pool = libsvn.core.apr_pool_t; proxy of C 
apr_pool_t instance

RuntimeError: instance.__dict__ not accessible in restricted mode
   args = ('instance.__dict__ not accessible in restricted mode',)


perhaps I'm seeing different apache processes or something
-- 
Robin Becker
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread John Machin

Nathan Harmston wrote:
 Hi,

 The idea is that I m going to use it to build graphs for sequence
 alignment (at the moment), I read a discussion on the corebio
 (reimplementation of biopython) group about using intervals to
 represent sequence slices. The idea being that, my graph may contain
 millions of alignments and storing the sequence (the actual ATGC) is
 not required.

 class Node(object):
  pass

 class Interval(Node):
 _id = gene1
  _start = 50
 _end = 200
 _strand = 1

What is the point of subclassing Node if it's just a dummy?


 class Sequence(object):
 _sequence = atgtcgtgagagagagttgtgag.

 So one interval on one sequence would align to another interval from
 another sequence, but I want changes I make to the interval to be
 reflected in the representation later. If I reverse complement it i
 want the interval to store this information but the Sequence only
 shows this later on when I call use it calling repr or str.

 Do you get what I mean.

Only vaguely. You use several terms which appear to be from your trade
jargon as they are not understandable when interpreted in either the
context of Python-speak or ordinary English e.g. sequence,
alignment, ATGC, reverse complement, interval.

Two options:
(a) communicate understandably
(b) wait till your wontoks are back from holidays.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread Nathan Harmston
Hi,

It seems that by just going through the problem writing out a better
explanation for the reply I have figured out a solution and the
problem isnt as difficult as I thought it would be.

What is a wontok?

Thanks

Nathan

PS -- the start of my reply:

class Interval(object):
 _id = gene1
  _start = 50
 _end = 200
 _strand = 1

class Sequence(object):
_sequence = atgtcgtgagagagagttgtgag.

 Only vaguely. You use several terms which appear to be from your trade
 jargon

Sequence is a string made from a restricted alphabet (A,T,G,C...).
Sequences can be aligned:  1 ATGCTGCAT
  2 TAGCTGTTA
  ---
 25

I m trying to represent this as a graph Interval(id=1, start=2, end=6,
strand=1) ---edge--Interval(id=2, start=2, end=6, strand=1)

The problem is I was planning on storing the sequences in a dictionary
{id:Seq}, however each dictionary would represent a different source
of sequences. File1, File2... (
STORE THE SOURCES AS A DICT AND HAVE SOURCE IN INTERVAL ASWELL
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread John Machin

Nathan Harmston wrote:
 Hi,

 It seems that by just going through the problem writing out a better
 explanation for the reply I have figured out a solution and the
 problem isnt as difficult as I thought it would be.

Often happens.


 What is a wontok?

It's Melanesian Pidgin (from the English one talk) meaning a person
who speaks the same language as you, a member of your clan, ... the
context being that [at least in Papua New Guinea] there are relatively
many languages each with relatively not many speakers :-)


 Thanks

 Nathan

 PS -- the start of my reply:

 class Interval(object):
  _id = gene1
   _start = 50
  _end = 200
  _strand = 1

 class Sequence(object):
 _sequence = atgtcgtgagagagagttgtgag.

  Only vaguely. You use several terms which appear to be from your trade
  jargon

 Sequence is a string made from a restricted alphabet (A,T,G,C...).
 Sequences can be aligned:  1 ATGCTGCAT
   2 TAGCTGTTA
   ---
  25

I'm sure they can be, but appearances can be deceptive when you mix
tabs and spaces -- or whatever caused the above 4 lines to be not
vertically aligned but staggered diagonally like a flight of ducks
heading equatorwards for winter.

Sometimes a line of code (e.g. str1[2:6] == str2[2:6]) is worth a
thousand pictures :-)


 I m trying to represent this as a graph Interval(id=1, start=2, end=6,
 strand=1) ---edge--Interval(id=2, start=2, end=6, strand=1)

 The problem is I was planning on storing the sequences in a dictionary
 {id:Seq}, however each dictionary would represent a different source
 of sequences. File1, File2... (
 STORE THE SOURCES AS A DICT

Mapping what keys to what values?

 AND HAVE SOURCE IN INTERVAL ASWELL

So you had a data modelling problem. These are often better solved as a
separate step before you think about implementation details like
dictionaries.

Good luck with your project.

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Graph Data Structures

2006-11-25 Thread Pieter Swart
Nathan Harmston wrote:

 Currently I am working on a generic graph library  so I can do various
 graph based analysis for various projects I have ideas for. Currently
 I am implementing Graph as a wrapper around a dictionary. Currently my
 implementation works like this:

 t = Graph()
 n1 = Node(Node1)
 n2 = Node(Test2)
 edge1 = Edge(Test3)
 t += n1{ n1:{}}
 t[n1][n2] = edge1{ n1:{n2:edge1}

 However this isnt actually ending up with the structure I want. I want
 it to finally end up as ..{ n1:{n2:edge1}, n2:{}}. Is
 there anyway I can do this simply

Nathan

By now you probably discovered that the networkx package can handle
this.
If I have this right, you want to create a digraph with
a directed edge from Node1 to Node2 and this edge
has the string Test3 attached to it. In networkx, this is exacty what
the  XDiGraph class was designed to do. Here DiGraph means
directed graph and the X means you are allowed to add (any)
data to the edge,for example:

 import networkx as nx
 t = nx.XDiGraph()
 t.add_edge( Node1, Node2, Test3)


 Also I am looking at having a large graph and was wondering if anyone
 knew of anyway I could reduce the memory requirements of this
 structure and improve the speed of queries on it. I m thinking writing
 a C extension for itis this a good idea and where would I start?
 Or does Python have some kind of transparent memory access module I
 can implement.


Networkx was designed so that you can hook your own
C extension in. However, making it ispeed or memory efficient
is quite application dependent. I am still not clear as to exactly what

class of algorithms you want to implement via a string-interval
representation, and whether you demand exact alignment or whether
missing/incorrect data etc. is allowed as part of the alignment
problem.

HTH
Pieter Swart

-- 
http://mail.python.org/mailman/listinfo/python-list