Re: Trying to use schema when using a dataframe and from_pandas, but getting exceptions around the first field.

Bourgon, Michael Thu, 19 Sep 2019 06:46:08 -0700

> It doesn't look like 'fieldA' is a column name in the object you're passing 
> in. Does result_port_map['fieldA'] work? That's the error that's happening


Wes, thanks for the quick response.  Let me try to make a better repro.

(...half an hour later)
*expletive*... the various parts are case-sensitive.  I think I need to open a 
ticket with pypyodbc, at the least.

Here's what happened: 
My query against SQL Server is "select fieldA", but when I display 
"result_port_map" the column comes back as "fielda", which doesn't match the 
"fieldA" in my schema.
If I change the field name to match, case-wise, it succeeds.  I don't write much 
Python, but I assume case is supposed to matter with DataFrames and PyArrow?
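
One possible workaround while the driver lowercases column names, sketched under 
assumptions: build a rename map that pairs each DataFrame column with the schema 
field that differs from it only by case. The helper name 
case_insensitive_rename_map is hypothetical (it is not part of pandas or 
PyArrow), and the column lists below are shortened stand-ins for the real ones.

```python
def case_insensitive_rename_map(df_columns, schema_names):
    """Map each column name to the schema name that differs from it only by case."""
    # Index the schema names by their lowercase form, refusing ambiguous schemas.
    by_lower = {}
    for name in schema_names:
        key = name.lower()
        if key in by_lower:
            raise ValueError(
                f"schema names {by_lower[key]!r} and {name!r} differ only by case"
            )
        by_lower[key] = name
    # Keep only the columns whose spelling actually needs to change.
    return {
        col: by_lower[col.lower()]
        for col in df_columns
        if col.lower() in by_lower and col != by_lower[col.lower()]
    }


# Shortened stand-ins for result_port_map.columns and my_schema.names.
rename_map = case_insensitive_rename_map(
    ["fielda", "b", "subdate"], ["fieldA", "b", "subdate"]
)
print(rename_map)  # {'fielda': 'fieldA'}
```

With the real objects this would presumably be applied as 
result_port_map.rename(columns=case_insensitive_rename_map(result_port_map.columns, my_schema.names)) 
before calling pa.Table.from_pandas(..., schema=my_schema), so that the lookup 
of each schema field name in the DataFrame finds a column with the same spelling.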

>>> result_port_map
       fielda  b                   c                   d                       
e   f    g      h     i         j   k   l  m  n   o                p         q  
           r  s     subdate  subhour
0           1  1  2129825741647779  2129825237696360 2019-09-01 00:00:00.147  
14  245  19010  None  175135.0  D0  B1  1  A  01  11838489       20190901       
0017089  1  2019-09-01        0
1           3  1  1769537722825846  1769537267874427 2019-09-01 00:00:00.503  
14  337   5736  None  180942.0  D0  B1  1  A  01  1386152858       20190901  
000000103470  1  2019-09-01        0


But if the case doesn't match exactly, I get:

>>> fields = [
...        pa.field('fieldA', pa.int64()),
...        pa.field('b', pa.int64()),
...        pa.field('c', pa.int64()),
...        pa.field('d', pa.int64()),
...        pa.field('e', pa.timestamp('ms')),
...        pa.field('f', pa.float64()),
...        pa.field('g', pa.float64()),
...        pa.field('h', pa.float64()),
...        pa.field('i', pa.float64()),
...        pa.field('j', pa.float64()),
...        pa.field('k', pa.string()),
...        pa.field('l', pa.string()),
...        pa.field('m', pa.string()),
...        pa.field('n', pa.string()),
...        pa.field('o', pa.string()),
...        pa.field('p', pa.string()),
...        pa.field('q', pa.string()),
...        pa.field('r', pa.string()),
...        pa.field('s', pa.int16()),
...        pa.field('subdate',pa.string()),
...        pa.field('subhour',pa.int8())
... ]
>>> my_schema = pa.schema(fields)
>>> tableForcedSchema = pa.Table.from_pandas(result_port_map, schema=my_schema)
Traceback (most recent call last):
  File "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2890, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'fieldA'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow\table.pxi", line 1174, in pyarrow.lib.Table.from_pandas
  File "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 460, in dataframe_to_arrays
    columns)
  File "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 346, in _get_columns_to_convert
    col = df[name]
  File "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 2975, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2892, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'fieldA'
>>>
