OK, the first part I get, that was the model you have. The second part, with that new table "single_values": is the idea that it's a simplified version of the more complex model you want to do? Also, the way that CTE is being used, that is, getting data from it and then using that to *render* SQL, is kind of wrong; there has to be another way to do what you want there. I see some XML stuff going on and I don't know what that has to do with anything, so I'd need to see what the *query* you want to do looks like, e.g. what input/output you want.
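Roughly, the alternative looks like this: pull the long-format rows with one ordinary query, then reshape in the application. A minimal sketch only, with table and column names guessed from your DDL and trimmed to two variables, not a drop-in:

```python
import sqlite3

# Hypothetical, cut-down versions of the tables in this thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variables (id INTEGER PRIMARY KEY, dataset_id INT, name TEXT);
    CREATE TABLE single_values (
        id INTEGER PRIMARY KEY, variable_id INT,
        observation_id TEXT, value TEXT);
    INSERT INTO variables VALUES (1, 1, 'v1'), (2, 1, 'v2');
    INSERT INTO single_values VALUES
        (1, 1, 'resp1', '1'), (2, 2, 'resp1', 'one'),
        (3, 1, 'resp2', '2'), (4, 2, 'resp2', 'two');
""")

# One plain long-format query; no SQL that renders more SQL.
rows = conn.execute("""
    SELECT sv.observation_id, v.name, sv.value
    FROM single_values sv
    JOIN variables v ON v.id = sv.variable_id
    WHERE v.dataset_id = ?
""", (1,))

# Pivot client-side: observation_id -> {variable name: value}.
wide = {}
for obs, name, value in rows:
    wide.setdefault(obs, {})[name] = value

print(wide)
```

The reshape step is exactly what a pandas pivot would do anyway, so nothing is lost by doing it outside the database.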
But if you can work with the example I gave you, do that; I don't have a pressing need to figure out the other way you were doing it if you don't.

On Fri, Mar 1, 2019 at 12:23 PM Andrew Martin <[email protected]> wrote:
>
> Thanks for the reply, Mike. I should've offered the SQL equivalent of what I
> was going for to begin with. This is for a side project and I'm at work, so I
> didn't put as much thought into the question as I should've.
>
> I think the link to the Polymorphic valued vertical table is what I'm after.
> I'll have to dig into it to understand it fully, but it looks like what I
> want to do.
>
> Let me see if I can clear up what I was originally after. I only have SQL
> Server at work, but the first part is basically the same in Postgres (which
> is what I'll be using at some point when I get outside the bounds of what
> SQLite can handle).
>
> The way I had the tables defined before, with the Values class and the
> subclasses, leads me to this raw SQL for what I'm wanting. Note, I've
> renamed values to vals because shadowing is bad. Here's the whole
> creation/populating and the queries I'm looking at.
>
> CREATE TABLE datasets (
>     id INTEGER NOT NULL,
>     uid varchar(50) NOT NULL,
>     name varchar(50) NOT NULL,
>     description varchar(50),
>     CONSTRAINT pk_datasets PRIMARY KEY (id)
> )
>
> CREATE TABLE variables (
>     id INTEGER NOT NULL,
>     uid varchar(50) NOT NULL,
>     dataset_id INT NOT NULL,
>     name varchar(50) NOT NULL,
>     description varchar(50),
>     group_var_col_id INTEGER,
>     group_var_row_id INTEGER,
>     value_map varchar(50),
>     CONSTRAINT pk_variables PRIMARY KEY (id),
>     CONSTRAINT fk_variables_dataset_id_datasets FOREIGN KEY(dataset_id)
>         REFERENCES datasets (id),
>     CONSTRAINT fk_variables_group_var_col_id_variables FOREIGN KEY(group_var_col_id)
>         REFERENCES variables (id),
>     CONSTRAINT fk_variables_group_var_row_id_variables FOREIGN KEY(group_var_row_id)
>         REFERENCES variables (id)
> )
>
> INSERT INTO datasets (id, uid, name, description)
> VALUES
>     (1, 'a1', 'ds1', 'test1')
>
> INSERT INTO variables (id, uid, dataset_id, name, description)
> VALUES
>     (2, 'av2', 1, 'v2', 'testvar2'),
>     (1, 'av1', 1, 'v1', 'testvar1'),
>     (3, 'av3', 1, 'v3', 'testvar3')
>
> CREATE TABLE vals (
>     id INTEGER NOT NULL,
>     uid varchar(50) NOT NULL,
>     variable_id INT NOT NULL,
>     observation_id varchar(50) NOT NULL,
>     val_type varchar(50) NOT NULL,
>     CONSTRAINT pk_values PRIMARY KEY (id),
>     CONSTRAINT fk_values_variable_id_variables FOREIGN KEY(variable_id)
>         REFERENCES variables (id)
> )
>
> INSERT INTO vals (id, uid, variable_id, observation_id, val_type)
> VALUES
>     (1, 'intval1', 1, 'resp1', 'int'),
>     (2, 'intval2', 1, 'resp2', 'int'),
>     (3, 'intval3', 1, 'resp3', 'int'),
>     (4, 'intval4', 1, 'resp4', 'int'),
>     (5, 'intval5', 1, 'resp5', 'int'),
>     (6, 'strval1', 2, 'resp1', 'str'),
>     (7, 'strval2', 2, 'resp2', 'str'),
>     (8, 'strval3', 2, 'resp3', 'str'),
>     (9, 'strval4', 2, 'resp4', 'str'),
>     (10, 'strval5', 2, 'resp5', 'str')
>
> CREATE TABLE int_vals (
>     id INTEGER NOT NULL,
>     val INTEGER,
>     CONSTRAINT pk_int_val PRIMARY KEY (id),
>     CONSTRAINT fk_int_vals_id_values FOREIGN KEY(id)
>         REFERENCES vals (id)
> )
>
> INSERT INTO int_vals (id, val)
> VALUES
>     (1, 1),
>     (2, 2),
>     (3, 3),
>     (4, 4),
>     (5, 5)
>
> CREATE TABLE string_vals (
>     id INTEGER NOT NULL,
>     val varchar(50),
>     CONSTRAINT pk_string_values PRIMARY KEY (id),
>     CONSTRAINT fk_string_values_id_values FOREIGN KEY(id) REFERENCES vals (id)
> )
>
> INSERT INTO string_vals (id, val)
> VALUES
>     (6, 'one'),
>     (7, 'two'),
>     (8, 'three'),
>     (9, 'four'),
>     (10, 'five')
>
> SELECT COALESCE(iv.val, sv.val) AS val
> FROM vals v
> LEFT JOIN int_vals iv ON iv.id = v.id
> LEFT JOIN string_vals sv ON sv.id = v.id
> WHERE v.variable_id = 1
>
> This is only sort of close, because COALESCE implicitly converts the string
> to an int and blows up. But if that didn't happen, it would be the behavior
> I want.
>
> For dynamic PIVOT queries, it works *really* differently between SQL Server
> and Postgres. Both of them are ugly, and SQL Server is also gross. Postgres
> is more complex to set up because the outputs of pivots have to have column
> types defined, while SQL Server does not care at all, which I'm assuming
> means that everything is cast to varchar and defeats my purpose of trying
> to maintain types.
>
> But I'll show it in SQL Server because that's what I have in front of me.
> For this case, I'm leaving the first coalescing part alone and using a
> single table to store all the values as strings.
>
> CREATE TABLE single_values (
>     id INTEGER NOT NULL,
>     uid varchar(50) NOT NULL,
>     variable_id INT NOT NULL,
>     observation_id varchar(50) NOT NULL,
>     value_type varchar(50) NOT NULL,
>     value varchar(50),
>     CONSTRAINT pk_single_values PRIMARY KEY (id),
>     CONSTRAINT fk_single_values_variable_id_variables FOREIGN KEY(variable_id)
>         REFERENCES variables (id)
> )
>
> INSERT INTO single_values (id, uid, variable_id, observation_id, value_type, value)
> VALUES
>     (6, 'val6', 2, 'respid1', 'string', '6'),
>     (7, 'val7', 2, 'respid2', 'string', '7'),
>     (8, 'val8', 2, 'respid3', 'string', '8'),
>     (9, 'val9', 2, 'respid4', 'string', '9'),
>     (10, 'val10', 2, 'respid5', 'string', '10'),
>     (1, 'val1', 1, 'respid1', 'int', '1'),
>     (2, 'val2', 1, 'respid2', 'int', '2'),
>     (3, 'val3', 1, 'respid3', 'int', '3'),
>     (4, 'val4', 1, 'respid4', 'int', '4'),
>     (5, 'val5', 1, 'respid5', 'int', '5'),
>     (11, 'val11', 3, 'respid1', 'float', '1.0'),
>     (12, 'val12', 3, 'respid2', 'float', '2.0'),
>     (13, 'val13', 3, 'respid3', 'float', '3.0'),
>     (14, 'val14', 3, 'respid4', 'float', '4.0'),
>     (15, 'val15', 3, 'respid5', 'float', '5.0')
>
> DECLARE @DynamicPivotQuery AS NVARCHAR(MAX)
>
> ;WITH ColumnNames (names) AS (
>     SELECT TOP 1 STUFF((SELECT ',' + QUOTENAME(name)
>                         FROM variables
>                         WHERE dataset_id = 1
>                         FOR XML PATH('')), 1, 1, '') [cols]
>     FROM (SELECT DISTINCT name FROM variables WHERE dataset_id = 1) n
> )
> SELECT @DynamicPivotQuery =
>     N'SELECT observation_id, ' + (SELECT names FROM ColumnNames) + '
>     FROM
>         (SELECT sv.observation_id, v.name, sv.value
>          FROM single_values sv
>          INNER JOIN variables v ON v.id = sv.variable_id
>          WHERE v.dataset_id = 1) s
>     PIVOT(MAX(value)
>         FOR name IN (' + (SELECT names FROM ColumnNames) + ')) p'
> --print @DynamicPivotQuery
> EXEC sp_executesql @DynamicPivotQuery
>
> Like I said, it's not only ugly, it's also gross. When I get home from work
> today, I can show how I would do it in Postgres if that's at all useful to
> anyone.
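As an aside on keeping types with the one-table approach: since single_values stores everything as varchar plus a value_type tag, real Python types can be rebuilt on the way out. A rough sketch only; the schema here is cut down and the CASTS map is an invention of this sketch, not anything from the thread:

```python
import sqlite3

# Sketch: restore Python types from the value_type tag when reading
# the single_values table back out. The CASTS map is hypothetical.
CASTS = {"int": int, "float": float, "string": str}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE single_values (
        id INTEGER PRIMARY KEY, variable_id INT,
        observation_id TEXT, value_type TEXT, value TEXT)
""")
conn.executemany(
    "INSERT INTO single_values VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "respid1", "int", "1"),
     (6, 2, "respid1", "string", "6"),
     (11, 3, "respid1", "float", "1.0")])

# Cast each stored string back through its tagged type.
typed = [CASTS[vt](v)
         for vt, v in conn.execute(
             "SELECT value_type, value FROM single_values ORDER BY id")]
print(typed)  # [1, '6', 1.0]
```

The same converter map could be handed to pandas after a read, so the pivot itself never has to know about types.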
>
> Anyway, like I said above, I think your polymorphic example is where I want
> to go with the data model. I just need to understand it, because I hate not
> understanding code I'm using. I suspect that for some of these cases I need
> to either use pandas to do the pivot, which is fine as long as I'm keeping
> types, or call a stored procedure to do the dynamic pivot if I just need to
> grab the whole dataset in that format.
>
> Thanks!
>
>
> On Thursday, February 28, 2019 at 12:34:16 PM UTC-6, Andrew Martin wrote:
>>
>> Hi All,
>>
>> I'm trying something out for an analytics app here, and I'm not sure any
>> of this is a good idea. I'm trying to create a data model that's flexible
>> enough to handle a lot of different types of analysis, from plain CSVs to
>> time series to survey questions with complex hierarchies. My approach is
>> to have a pretty plain model and query as needed to construct dataframes
>> that will end up in pandas and scikit-learn.
>>
>> The basic idea is that there's a dataset that has variables associated
>> with it, and variables have values associated with them. I want to retain
>> type information on the values. If I'm reading a file and loading it, I
>> want to store an int as an int or a datetime as a datetime, not stuff
>> everything into a string and have to guess what things are based on
>> variable metadata. That's where I'm kind of stuck. I think I have the
>> multi-table inheritance set up correctly to store different types of
>> values. I'm totally unsure how to query without having to know the
>> subclass. I'd like to be able to extract values for pandas processing with
>> something like this:
>>
>> vals = Values.value.filter_by(variable_id=123456).all()
>>
>> without having to know per variable which subclass to choose.
>> I.e., I don't want to have to do:
>>
>> vals = IntValue.value.filter_by(variable_id=123456).all()
>>
>> The second part of my question is only relevant if this whole design isn't
>> hosed, which I'm fine with if you tell me that it is and I need to go back
>> to the drawing board.
>>
>> The second part is how to extract an entire dataset by ID in a way that
>> would be comfortable in pandas. If I were doing this in raw SQL, I'd use a
>> CTE to get the variables by dataset id and use a pivot to create the
>> data-frame-shaped table. I'm confounded by how to do this in alchemy,
>> especially with the polymorphic setup.
>>
>> Here are my current model definitions:
>>
>> from sqlalchemy import Column, Index, Integer, Text, DateTime, Float, ForeignKey
>> from sqlalchemy.orm import relationship
>>
>> from .meta import Base
>>
>>
>> class DataSet(Base):
>>     __tablename__ = 'datasets'
>>     id = Column(Integer, primary_key=True)
>>     name = Column(Text, nullable=False)
>>     description = Column(Text, nullable=True)
>>
>>
>> class Variable(Base):
>>     __tablename__ = 'variables'
>>     id = Column(Integer, primary_key=True)
>>     dataset_id = Column(Integer, ForeignKey('datasets.id'), nullable=False)
>>     name = Column(Text, nullable=False)
>>     description = Column(Text, nullable=True)
>>     group_var_col_id = Column(Integer, ForeignKey('variables.id'), nullable=True)
>>     group_var_row_id = Column(Integer, ForeignKey('variables.id'), nullable=True)
>>     # change to JSONB when moving from SQLite to Postgres
>>     value_map = Column(Text, nullable=True)
>>
>>     dataset = relationship('DataSet', backref='variables')
>>
>>
>> class Value(Base):
>>     __tablename__ = 'values'
>>     id = Column(Integer, primary_key=True)
>>     variable_id = Column(Integer, ForeignKey('variables.id'), nullable=False)
>>     observation_id = Column(Text, nullable=False)
>>     value_type = Column(Text, nullable=False)
>>
>>     variable = relationship('Variable', backref='values')
>>
>>     __mapper_args__ = {'polymorphic_on': value_type}
>>
>>
>> class IntValue(Value):
>>     __tablename__ = 'int_values'
>>     id = Column(None, ForeignKey('values.id'), primary_key=True)
>>     value = Column(Integer, nullable=True)
>>
>>     __mapper_args__ = {'polymorphic_identity': 'int'}
>>
>>
>> class StringValue(Value):
>>     __tablename__ = 'string_values'
>>     id = Column(None, ForeignKey('values.id'), primary_key=True)
>>     value = Column(Text, nullable=True)
>>
>>     __mapper_args__ = {'polymorphic_identity': 'string'}
>>
>>
>> class FloatValue(Value):
>>     __tablename__ = 'float_values'
>>     id = Column(None, ForeignKey('values.id'), primary_key=True)
>>     value = Column(Float, nullable=True)
>>
>>     __mapper_args__ = {'polymorphic_identity': 'float'}
>>
>>
>> class DateTimeValue(Value):
>>     __tablename__ = 'datetime_values'
>>     id = Column(None, ForeignKey('values.id'), primary_key=True)
>>     value = Column(DateTime, nullable=True)
>>
>>     __mapper_args__ = {'polymorphic_identity': 'datetime'}
>>
>> Thanks in advance for any suggestions you might have!
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
