Graph Data Structures
Hi All, Currently I am working on a generic graph library so I can do various graph based analysis for various projects I have ideas for. Currently I am implementing Graph as a wrapper around a dictionary. Currently my implementation works like this: t = Graph() n1 = Node(Node1) n2 = Node(Test2) edge1 = Edge(Test3) t += n1{ n1:{}} t[n1][n2] = edge1{ n1:{n2:edge1} However this isnt actually ending up with the structure I want. I want it to finally end up as ..{ n1:{n2:edge1}, n2:{}}. Is there anyway I can do this simply Also I am looking at having a large graph and was wondering if anyone knew of anyway I could reduce the memory requirements of this structure and improve the speed of queries on it. I m thinking writing a C extension for itis this a good idea and where would I start? Or does Python have some kind of transparent memory access module I can implement. Many Thanks in advance, Nathan PS.Please find my code below: class Graph(object): def __init__(self, g= { } ): self.graph = g def __iadd__(self, p): if p not in self.graph: self.graph[p] = PathsDict() return self def __getitem__(self, p): try: return self.graph[p] except KeyError: raise KeyError( %s not in graph %(repr(p)) ) def __str__(self): return str(self.graph) def filter(self, filter): pass class PathsDict(object): def __init__(self): self.paths = { } def __setitem__(self, p, val): if p not in self.paths: self.paths[p] = val def __getitem__(self, p): return self.paths[p] # catch exception here def paths(self): for k, v in self.paths: yield (k, v) def edges(self): return self.paths.values() def __str__(self): return str(self.paths) def __len__(self): return len(self.paths) class Node(object): def __init__(self, name): self.name = name def __str__(self): return self.name class Edge(dict): def __init__(self, name, weight = 1): self[name] = name self[weight] = weight def __str__(self): return self[name] -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
i haven't read your code, but there are many graph implementations in python. in case you haven't found these yet: http://wiki.python.org/moin/PythonGraphApi if you only want to do some analysis i think you need this one (as it's pretty complete and simple): https://networkx.lanl.gov/ i also recommend Guido's essay to read: http://www.python.org/doc/essays/graphs.html -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
On Sat, 25 Nov 2006 14:05:27 +, Nathan Harmston wrote: Hi All, Currently I am working on a generic graph library so I can do various graph based analysis for various projects I have ideas for. Currently I am implementing Graph as a wrapper around a dictionary. Currently my implementation works like this: [snip] http://www.python.org/doc/essays/graphs.html http://mail.python.org/pipermail/python-list/2002-April/137593.html Hope this helps. -- Steven. -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Szabolcs Nagy wrote: if you only want to do some analysis i think you need this one (as it's pretty complete and simple): https://networkx.lanl.gov/ seems to be broken at present with a python traceback coming out; not a good advert for python and/or trac -- Robin Becker -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Szabolcs Nagy: i haven't read your code, but there are many graph implementations in python. in case you haven't found these yet: http://wiki.python.org/moin/PythonGraphApi if you only want to do some analysis i think you need this one (as it's pretty complete and simple): https://networkx.lanl.gov/ i also recommend Guido's essay to read: http://www.python.org/doc/essays/graphs.html I can also suggest my one: http://sourceforge.net/projects/pynetwork/ And boost graph bindings for Python, quite fast: http://www.osl.iu.edu/~dgregor/bgl-python/ Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
https://networkx.lanl.gov/ This was working for me earlier, I managed to get everything from there earlier. It seems a very good package. It seems theres more out there than what I had thought, which unfortunately makes it harder for me to decide what to use (pynetwork and bgl look useful aswell). I m going to do some testing on it later and see what happens with it. Thanks a lot for your help. Has anyone got an idea how I could split the contents of a node and its representation (to save memory in my graph). ie the nodes contain the start and end coordinates and id and the actual representation contains the string. I was going to have : class Node(object): pass class Section(Node): pass class Item(object): pass Where section contains a slice of the Item which im interested. I m just not sure how I can access the contents of item without storing it. --- If u get what I mean??? Many Thanks in advance Nathan -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Nathan Harmston wrote: https://networkx.lanl.gov/ This was working for me earlier, I managed to get everything from there earlier. It seems a very good package. It seems theres more out there than what I had thought, which unfortunately makes it harder for me to decide what to use (pynetwork and bgl look useful aswell). I m going to do some testing on it later and see what happens with it. Thanks a lot for your help. Has anyone got an idea how I could split the contents of a node and its representation (to save memory in my graph). ie the nodes contain the start and end coordinates and id and the actual representation contains the string. I was going to have : class Node(object): pass class Section(Node): pass class Item(object): pass Where section contains a slice of the Item which im interested. I m just not sure how I can access the contents of item without storing it. --- If u get what I mean??? No. Not at all. pass is not very informative. Neither are representation and the string. Please tell us what you mean by slice. What is an item, if it's not a node? Try listing out the attributes of a node, with a couple of sample values for each, and then we might get a clue. What makes you think that you need to save memory? What makes you think that you could save memory by splitting whatever it is? HTH, John -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Hi, The idea is that I m going to use it to build graphs for sequence alignment (at the moment), I read a discussion on the corebio (reimplementation of biopython) group about using intervals to represent sequence slices. The idea being that, my graph may contain millions of alignments and storing the sequence (the actual ATGC) is not required. class Node(object): pass class Interval(Node): _id = gene1 _start = 50 _end = 200 _strand = 1 class Sequence(object): _sequence = atgtcgtgagagagagttgtgag. So one interval on one sequence would align to another interval from another sequence, but I want changes I make to the interval to be reflected in the representation later. If I reverse complement it i want the interval to store this information but the Sequence only shows this later on when I call use it calling repr or str. Do you get what I mean. Many Thanks Nathan -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Nathan Harmston wrote: https://networkx.lanl.gov/ .. I got it back just once, but when I clicked again I see this RuntimeErrorPython 2.4.4c1: /usr/bin/python Sat Nov 25 16:21:16 2006 A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred. /build/bdist.linux-x86_64/egg/tracrst/macro.py in render_macro(self=tracrst.macro.TracReSTMacro object, req=trac.web.api .. 782 self.__dict__[_parent_pool] = \ 783 parent_pool or libsvn.core.application_pool; 784 if self.__dict__[_parent_pool]: self = libsvn.repos.svn_repos_t; proxy of C svn_repos_t instance, self.__dict__ = {'this': Swig Object of type 'svn_repos_t *'}, parent_pool = libsvn.core.apr_pool_t; proxy of C apr_pool_t instance, libsvn = module 'libsvn' from '/usr/lib/python2.4/site-packages/libsvn/__init__.pyc', libsvn.core = module 'libsvn.core' from '/usr/lib/python2.4/site-packages/libsvn/core.pyc', libsvn.core.application_pool = libsvn.core.apr_pool_t; proxy of C apr_pool_t instance RuntimeError: instance.__dict__ not accessible in restricted mode args = ('instance.__dict__ not accessible in restricted mode',) perhaps I'm seeing different apache processes or something -- Robin Becker -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Nathan Harmston wrote: Hi, The idea is that I m going to use it to build graphs for sequence alignment (at the moment), I read a discussion on the corebio (reimplementation of biopython) group about using intervals to represent sequence slices. The idea being that, my graph may contain millions of alignments and storing the sequence (the actual ATGC) is not required. class Node(object): pass class Interval(Node): _id = gene1 _start = 50 _end = 200 _strand = 1 What is the point of subclassing Node if it's just a dummy? class Sequence(object): _sequence = atgtcgtgagagagagttgtgag. So one interval on one sequence would align to another interval from another sequence, but I want changes I make to the interval to be reflected in the representation later. If I reverse complement it i want the interval to store this information but the Sequence only shows this later on when I call use it calling repr or str. Do you get what I mean. Only vaguely. You use several terms which appear to be from your trade jargon as they are not understandable when interpreted in either the context of Python-speak or ordinary English e.g. sequence, alignment, ATGC, reverse complement, interval. Two options: (a) communicate understandably (b) wait till your wontoks are back from holidays. -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Hi, It seems that by just going through the problem writing out a better explanation for the reply I have figured out a solution and the problem isnt as difficult as I thought it would be. What is a wontok? Thanks Nathan PS -- the start of my reply: class Interval(object): _id = gene1 _start = 50 _end = 200 _strand = 1 class Sequence(object): _sequence = atgtcgtgagagagagttgtgag. Only vaguely. You use several terms which appear to be from your trade jargon Sequence is a string made from a restricted alphabet (A,T,G,C...). Sequences can be aligned: 1 ATGCTGCAT 2 TAGCTGTTA --- 25 I m trying to represent this as a graph Interval(id=1, start=2, end=6, strand=1) ---edge--Interval(id=2, start=2, end=6, strand=1) The problem is I was planning on storing the sequences in a dictionary {id:Seq}, however each dictionary would represent a different source of sequences. File1, File2... ( STORE THE SOURCES AS A DICT AND HAVE SOURCE IN INTERVAL ASWELL -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Nathan Harmston wrote: Hi, It seems that by just going through the problem writing out a better explanation for the reply I have figured out a solution and the problem isnt as difficult as I thought it would be. Often happens. What is a wontok? It's Melanesian Pidgin (from the English one talk) meaning a person who speaks the same language as you, a member of your clan, ... the context being that [at least in Papua New Guinea] there are relatively many languages each with relatively not many speakers :-) Thanks Nathan PS -- the start of my reply: class Interval(object): _id = gene1 _start = 50 _end = 200 _strand = 1 class Sequence(object): _sequence = atgtcgtgagagagagttgtgag. Only vaguely. You use several terms which appear to be from your trade jargon Sequence is a string made from a restricted alphabet (A,T,G,C...). Sequences can be aligned: 1 ATGCTGCAT 2 TAGCTGTTA --- 25 I'm sure they can be, but appearances can be deceptive when you mix tabs and spaces -- or whatever caused the above 4 lines to be not vertically aligned but staggered diagonally like a flight of ducks heading equatorwards for winter. Sometimes a line of code (e.g. str1[2:6] == str2[2:6]) is worth a thousand pictures :-) I m trying to represent this as a graph Interval(id=1, start=2, end=6, strand=1) ---edge--Interval(id=2, start=2, end=6, strand=1) The problem is I was planning on storing the sequences in a dictionary {id:Seq}, however each dictionary would represent a different source of sequences. File1, File2... ( STORE THE SOURCES AS A DICT Mapping what keys to what values? AND HAVE SOURCE IN INTERVAL ASWELL So you had a data modelling problem. These are often better solved as a separate step before you think about implementation details like dictionaries. Good luck with your project. Cheers, John -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph Data Structures
Nathan Harmston wrote: Currently I am working on a generic graph library so I can do various graph based analysis for various projects I have ideas for. Currently I am implementing Graph as a wrapper around a dictionary. Currently my implementation works like this: t = Graph() n1 = Node(Node1) n2 = Node(Test2) edge1 = Edge(Test3) t += n1{ n1:{}} t[n1][n2] = edge1{ n1:{n2:edge1} However this isnt actually ending up with the structure I want. I want it to finally end up as ..{ n1:{n2:edge1}, n2:{}}. Is there anyway I can do this simply Nathan By now you probably discovered that the networkx package can handle this. If I have this right, you want to create a digraph with a directed edge from Node1 to Node2 and this edge has the string Test3 attached to it. In networkx, this is exacty what the XDiGraph class was designed to do. Here DiGraph means directed graph and the X means you are allowed to add (any) data to the edge,for example: import networkx as nx t = nx.XDiGraph() t.add_edge( Node1, Node2, Test3) Also I am looking at having a large graph and was wondering if anyone knew of anyway I could reduce the memory requirements of this structure and improve the speed of queries on it. I m thinking writing a C extension for itis this a good idea and where would I start? Or does Python have some kind of transparent memory access module I can implement. Networkx was designed so that you can hook your own C extension in. However, making it ispeed or memory efficient is quite application dependent. I am still not clear as to exactly what class of algorithms you want to implement via a string-interval representation, and whether you demand exact alignment or whether missing/incorrect data etc. is allowed as part of the alignment problem. HTH Pieter Swart -- http://mail.python.org/mailman/listinfo/python-list