Re: Trying to use schema when using a dataframe and from_pandas, but getting exceptions around the first field.

Joris Van den Bossche Thu, 19 Sep 2019 06:53:01 -0700

Hi Michael,

Glad you figured it out. And yes, field names in Arrow are case-sensitive,
and so are column labels in pandas DataFrames, so 'fieldA' and 'fielda'
are treated as two different names.
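
If you cannot control the casing the driver returns, one option is to
rename the DataFrame columns to match the schema before converting. Below
is a minimal sketch of that idea, not code from this thread: the helper
name case_insensitive_rename_map is made up for illustration, and it
assumes no two schema names differ only by case.

```python
def case_insensitive_rename_map(df_columns, schema_names):
    """Map DataFrame column labels to schema names that differ only by case.

    Hypothetical helper for illustration; it is not part of pandas or pyarrow.
    """
    # Index the schema names by their lowercase form, refusing ambiguous schemas.
    by_lower = {}
    for name in schema_names:
        key = name.lower()
        if key in by_lower:
            raise ValueError(
                f"schema names {by_lower[key]!r} and {name!r} differ only by case"
            )
        by_lower[key] = name

    # Keep only the columns whose spelling actually needs to change.
    rename_map = {}
    for col in df_columns:
        target = by_lower.get(str(col).lower())
        if target is not None and target != col:
            rename_map[col] = target
    return rename_map


# Column labels as the driver returned them vs. names as written in the schema.
rename_map = case_insensitive_rename_map(
    ["fielda", "b", "subdate"],
    ["fieldA", "b", "subdate"],
)
print(rename_map)  # {'fielda': 'fieldA'}
```

With the real objects you would pass list(result_port_map.columns) and
my_schema.names, then call result_port_map.rename(columns=rename_map)
and hand the renamed frame to pa.Table.from_pandas(..., schema=my_schema).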


Joris

On Thu, 19 Sep 2019 at 15:46, Bourgon, Michael <
[email protected]> wrote:

> > It doesn't look like 'fieldA' is a column name in the object you're
> > passing in. Does result_port_map['fieldA'] work? That's the error that's
> > happening.
>
> Wes, thanks for the quick response.  Let me try to make a better repro.
>
> (...half an hour later)
> *expletive*.... the various parts are case sensitive.  I think I need to
> open a ticket with pypyodbc, at the least.
>
> Here's what happened:
> My query against the SQL server is "select fieldA", but when I look at
> "result_port_map" the column comes back as "fielda", which doesn't match
> the "fieldA" in my dict.
> If I change the field to match, case-wise, it succeeds.  I don't do enough
> python, but I assume case is supposed to matter with Dataframes and PyArrow?
>
> >>> result_port_map
>    fielda  b                 c                 d                       e  \
> 0       1  1  2129825741647779  2129825237696360 2019-09-01 00:00:00.147
> 1       3  1  1769537722825846  1769537267874427 2019-09-01 00:00:00.503
>
>     f    g      h     i         j   k   l  m  n   o           p  \
> 0  14  245  19010  None  175135.0  D0  B1  1  A  01    11838489
> 1  14  337   5736  None  180942.0  D0  B1  1  A  01  1386152858
>
>           q             r  s     subdate  subhour
> 0  20190901       0017089  1  2019-09-01        0
> 1  20190901  000000103470  1  2019-09-01        0
>
>
> But if the case doesn't match exactly, I get:
>
> >>> fields = [
> ...        pa.field('fieldA', pa.int64()),
> ...        pa.field('b', pa.int64()),
> ...        pa.field('c', pa.int64()),
> ...        pa.field('d', pa.int64()),
> ...        pa.field('e', pa.timestamp('ms')),
> ...        pa.field('f', pa.float64()),
> ...        pa.field('g', pa.float64()),
> ...        pa.field('h', pa.float64()),
> ...        pa.field('i', pa.float64()),
> ...        pa.field('j', pa.float64()),
> ...        pa.field('k', pa.string()),
> ...        pa.field('l', pa.string()),
> ...        pa.field('m', pa.string()),
> ...        pa.field('n', pa.string()),
> ...        pa.field('o', pa.string()),
> ...        pa.field('p', pa.string()),
> ...        pa.field('q', pa.string()),
> ...        pa.field('r', pa.string()),
> ...        pa.field('s', pa.int16()),
> ...        pa.field('subdate',pa.string()),
> ...        pa.field('subhour',pa.int8())
> ... ]
> >>> my_schema = pa.schema(fields)
> >>> tableForcedSchema = pa.Table.from_pandas(result_port_map, schema=my_schema)
> Traceback (most recent call last):
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py",
> line 2890, in get_loc
>     return self._engine.get_loc(key)
>   File "pandas\_libs\index.pyx", line 107, in
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas\_libs\index.pyx", line 131, in
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in
> pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in
> pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: 'fieldA'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow\table.pxi", line 1174, in pyarrow.lib.Table.from_pandas
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py",
> line 460, in dataframe_to_arrays
>     columns)
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py",
> line 346, in _get_columns_to_convert
>     col = df[name]
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py",
> line 2975, in __getitem__
>     indexer = self.columns.get_loc(key)
>   File
> "C:\Users\michael.bourgon\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py",
> line 2892, in get_loc
>     return self._engine.get_loc(self._maybe_cast_indexer(key))
>   File "pandas\_libs\index.pyx", line 107, in
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas\_libs\index.pyx", line 131, in
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in
> pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in
> pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: 'fieldA'
> >>>
>
