Hi all, I'd like to ask for suggestions regarding suitable datastracture for storing textual metadata along with a plain text string. The input format I have is simply text with custom tags like <tag_name tag_value>; I'd prefer to have this human readable format the original data source. For the application, the tags are extracted and stored in a custom datastructure along with the plain text.The queries should either return the tagset for a given text position (index) or reversely the text indices for a given tag-value combination.. (I posted some more detailed remarks earlier, as I was beginning with this topic http://mail.python.org/pipermail/python-list/2007-December/1130275.html http://mail.python.org/pipermail/python-list/2008-May/1141958.html
Meanwhile I managed to get working code using python native datastructures (nested defaultdicts and sets); however after some time I am now planning to adapt this code for a web-based program and am considering alternative datastructures, for simplicity I started with sqlite (which should actually suffice for the data volume and traffic in question). I put together some code, which works as expected, but I suspect somehow, that there must be better ways of doing it. Two things I am not quite clear about are using the placeholders for the data identifiers and "chaining" the SELECT parameters. I Couldn't find a way to use "?" placeholder for table or column names, hence I ended up using string interpolation for them and placeholders for the data values, like. curs.execute('SELECT * FROM "%s" WHERE "%s"==?' % (text_name, index_col), (text_index,)) is there a better way or is it not supposed to supply these identifiers programatically? For getting the matching text indices given the tags, tag_values combination I ended up with a clumsy query: combined_query_list = ['SELECT "%s" FROM "%s" WHERE "%s"==?' % (index_col, text_name, tag) for tag in tags] sql_query = " INTERSECT ".join(combined_query_list) curs.execute(sql_query, tag_values) which produces e.g.: SELECT "ind" FROM "n" WHERE "KC"==? INTERSECT SELECT "ind" FROM "n" WHERE "VN"==? or alternatively: select_begin = 'SELECT "%s" FROM "%s" WHERE ' % (index_col, text_name) where_subquery = " AND ".join('"%s"==?' % (tag,) for tag in tags) sql_query = select_begin + where_subquery with the resulting query string like: SELECT "ind" FROM "n" WHERE "KC"==? AND "VN"==? ('12', '1') (BTW, are these queries equivalent, as the outputs suggest, or are there some distinctions to be aware of?) Anyway, I can't really believe, this would be the expected way ... (I only marginally looked into sqlalchemy, which might simplify this a bit, is this true? - Currently I only use the standard lib, depending on the available server setup (python 2.6)). Thanks in advance for any suggestions or pointers on both the sql usage as well as the general datatype question. regards, Vlastimil Brom -- http://mail.python.org/mailman/listinfo/python-list