Hi Michael,

Glad you figured it out. And indeed, field names in Arrow are case sensitive (the same is true of column names in pandas DataFrames).
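If the driver keeps lowercasing the column names, one possible workaround is to rename the DataFrame columns to the schema's spelling before conversion. This is only a rough sketch: the helper `case_insensitive_rename_map` is a hypothetical name, not part of pandas or pyarrow, and the column lists below are shortened from the ones in your example.

```python
def case_insensitive_rename_map(column_names, schema_names):
    """Map each column to the schema name that differs from it only by case.

    Hypothetical helper for illustration; not a pandas or pyarrow API.
    """
    # Index the schema names by their lowercase form, refusing ambiguous schemas.
    by_lower = {}
    for name in schema_names:
        key = name.lower()
        if key in by_lower:
            raise ValueError(
                f"schema names {by_lower[key]!r} and {name!r} differ only by case"
            )
        by_lower[key] = name

    # Keep only the columns that actually need renaming.
    mapping = {}
    for col in column_names:
        target = by_lower.get(col.lower())
        if target is not None and target != col:
            mapping[col] = target
    return mapping


# Names as returned by the driver vs. names declared in the schema.
columns = ["fielda", "b", "c", "subdate", "subhour"]
schema_names = ["fieldA", "b", "c", "subdate", "subhour"]

mapping = case_insensitive_rename_map(columns, schema_names)
print(mapping)  # {'fielda': 'fieldA'}

# With pandas and pyarrow available, the mapping could then be applied as:
#   result_port_map = result_port_map.rename(columns=mapping)
#   table = pa.Table.from_pandas(result_port_map, schema=my_schema)
```

In the real case the two lists would come from `result_port_map.columns` and `my_schema.names`. Renaming explicitly (rather than lowercasing everything) keeps the schema as the single source of truth for the spelling.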
Joris

On Thu, 19 Sep 2019 at 15:46, Bourgon, Michael < [email protected]> wrote:

>
> It doesn't look like 'fieldA' is a column name in the object you're
> passing in. Does result_port_map['fieldA'] work? That's the error that's
> happening
>
> Wes, thanks for the quick response. Let me try to make a better repro.
>
> (...half an hour later)
> *expletive*.... the various parts are case sensitive. I think I need to
> open a ticket with pypyodbc, at the least.
>
> Here's what happened:
> My query against the SQL server is "select fieldA", but when I do
> "result_port_map' it comes back as "fielda"... which doesn't match the
> "fieldA" in my dict.
> If I change the field to match, case-wise, it succeeds. I don't do enough
> python, but I assume case is supposed to matter with Dataframes and PyArrow?
>
> >>> result_port_map
> fielda b c d
> e f g h i j k l m n o p
> q r s subdate subhour
> 0 1 1 2129825741647779 2129825237696360 2019-09-01
> 00:00:00.147 14 245 19010 None 175135.0 D0 B1 1 A 01 11838489
> 20190901 0017089 1 2019-09-01 0
> 1 3 1 1769537722825846 1769537267874427 2019-09-01
> 00:00:00.503 14 337 5736 None 180942.0 D0 B1 1 A 01 1386152858
> 20190901 000000103470 1 2019-09-01 0
>
>
> But if the case doesn't match exactly, I get:
>
> >>> fields = [
> ...     pa.field('fieldA', pa.int64()),
> ...     pa.field('b', pa.int64()),
> ...     pa.field('c', pa.int64()),
> ...     pa.field('d', pa.int64()),
> ...     pa.field('e', pa.timestamp('ms')),
> ...     pa.field('f', pa.float64()),
> ...     pa.field('g', pa.float64()),
> ...     pa.field('h', pa.float64()),
> ...     pa.field('i', pa.float64()),
> ...     pa.field('j', pa.float64()),
> ...     pa.field('k', pa.string()),
> ...     pa.field('l', pa.string()),
> ...     pa.field('m', pa.string()),
> ...     pa.field('n', pa.string()),
> ...     pa.field('o', pa.string()),
> ...     pa.field('p', pa.string()),
> ...     pa.field('q', pa.string()),
> ...     pa.field('r', pa.string()),
> ...     pa.field('s', pa.int16()),
> ...     pa.field('subdate',pa.string()),
> ...     pa.field('subhour',pa.int8())
> ... ]
> >>> my_schema = pa.schema(fields)
> >>> tableForcedSchema = pa.Table.from_pandas(result_port_map,
> schema=my_schema)
> Traceback (most recent call last):
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py",
> line 2890, in get_loc
>     return self._engine.get_loc(key)
>   File "pandas\_libs\index.pyx", line 107, in
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas\_libs\index.pyx", line 131, in
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in
> pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in
> pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: 'fieldA'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow\table.pxi", line 1174, in pyarrow.lib.Table.from_pandas
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py",
> line 460, in dataframe_to_arrays
>     columns)
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py",
> line 346, in _get_columns_to_convert
>     col = df[name]
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py",
> line 2975, in __getitem__
>     indexer = self.columns.get_loc(key)
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py",
> line 2892, in get_loc
>     return self._engine.get_loc(self._maybe_cast_indexer(key))
>   File "pandas\_libs\index.pyx", line 107, in
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas\_libs\index.pyx", line 131, in
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in
> pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in
> pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: 'fieldA'
> >>>
>
