[jira] [Updated] (ARROW-1998) [Python] Table.from_pandas crashes when data frame is empty
[ https://issues.apache.org/jira/browse/ARROW-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Jimenez updated ARROW-1998:
----------------------------------
    Environment:
Windows 10 Build 15063.850
Python: 3.6.3
Numpy: 1.14.0
Pandas: 0.22.0

  was:
Windows 10 Build 15063.850
Numpy: 1.14.0
Pandas: 0.22.0

> [Python] Table.from_pandas crashes when data frame is empty
> -----------------------------------------------------------
>
>                 Key: ARROW-1998
>                 URL: https://issues.apache.org/jira/browse/ARROW-1998
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: Windows 10 Build 15063.850
>                      Python: 3.6.3
>                      Numpy: 1.14.0
>                      Pandas: 0.22.0
>            Reporter: Victor Jimenez
>            Priority: Major
>
> Loading an empty CSV file, and then attempting to create a PyArrow Table from
> it makes the application crash. The following code should be able to
> reproduce the issue:
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
>
> FIELDS = ['id', 'name']
> NUMPY_TYPES = {
>     'id': np.int64,
>     'name': np.unicode
> }
> PYARROW_SCHEMA = pa.schema([
>     pa.field('id', pa.int64()),
>     pa.field('name', pa.string())
> ])
>
> file = open('input.csv', 'w')
> file.close()
>
> df = pd.read_csv(
>     'input.csv',
>     header=None,
>     names=FIELDS,
>     dtype=NUMPY_TYPES,
>     engine='c',
> )
> pa.Table.from_pandas(df, schema=PYARROW_SCHEMA)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (ARROW-1999) [Python] from_numpy_dtype returns wrong types
[ https://issues.apache.org/jira/browse/ARROW-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Jimenez updated ARROW-1999:
----------------------------------
    Environment:
Windows 10 Build 15063.850
Python: 3.6.3
Numpy: 1.14.0

  was:
Windows 10 Build 15063.850
Numpy: 1.14.0

> [Python] from_numpy_dtype returns wrong types
> ---------------------------------------------
>
>                 Key: ARROW-1999
>                 URL: https://issues.apache.org/jira/browse/ARROW-1999
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: Windows 10 Build 15063.850
>                      Python: 3.6.3
>                      Numpy: 1.14.0
>            Reporter: Victor Jimenez
>            Priority: Major
>
> The following code shows multiple issues when using {{from_numpy_dtype}}:
> {code}
> import numpy as np
> import pyarrow as pa
>
> pa.from_numpy_dtype(np.unicode) # returns DataType(bool)
> pa.from_numpy_dtype(np.int) # returns DataType(bool)
> pa.from_numpy_dtype(np.int64) # Fails with the following message:
> #
> # ArrowNotImplementedError Traceback (most recent call last)
> # in ()
> # > 1 pa.from_numpy_dtype(np.int64)
> # 2
> #
> # types.pxi in pyarrow.lib.from_numpy_dtype()
> #
> # error.pxi in pyarrow.lib.check_status()
> #
> # ArrowNotImplementedError: Unsupported numpy type 32760
> {code}
> Additionally, a potentially related issue is also seen when using
> {{to_pandas_dtype}}:
> {code}
> pa.DataType.to_pandas_dtype(pa.string()) # Returns numpy.object_
> # (shouldn't it be numpy.unicode?)
> {code}
[jira] [Updated] (ARROW-1998) [Python] Table.from_pandas crashes when data frame is empty
[ https://issues.apache.org/jira/browse/ARROW-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Jimenez updated ARROW-1998:
----------------------------------
    Environment:
Windows 10 Build 15063.850
Numpy: 1.14.0
Pandas: 0.22.0

  was: Windows 10 Build 15063.850

> [Python] Table.from_pandas crashes when data frame is empty
> -----------------------------------------------------------
>
>                 Key: ARROW-1998
>                 URL: https://issues.apache.org/jira/browse/ARROW-1998
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: Windows 10 Build 15063.850
>                      Numpy: 1.14.0
>                      Pandas: 0.22.0
>            Reporter: Victor Jimenez
>            Priority: Major
>
> Loading an empty CSV file, and then attempting to create a PyArrow Table from
> it makes the application crash. The following code should be able to
> reproduce the issue:
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
>
> FIELDS = ['id', 'name']
> NUMPY_TYPES = {
>     'id': np.int64,
>     'name': np.unicode
> }
> PYARROW_SCHEMA = pa.schema([
>     pa.field('id', pa.int64()),
>     pa.field('name', pa.string())
> ])
>
> file = open('input.csv', 'w')
> file.close()
>
> df = pd.read_csv(
>     'input.csv',
>     header=None,
>     names=FIELDS,
>     dtype=NUMPY_TYPES,
>     engine='c',
> )
> pa.Table.from_pandas(df, schema=PYARROW_SCHEMA)
> {code}
[jira] [Created] (ARROW-1999) [Python] from_numpy_dtype returns wrong types
Victor Jimenez created ARROW-1999:
-------------------------------------

             Summary: [Python] from_numpy_dtype returns wrong types
                 Key: ARROW-1999
                 URL: https://issues.apache.org/jira/browse/ARROW-1999
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
         Environment: Windows 10 Build 15063.850
                      Numpy: 1.14.0
            Reporter: Victor Jimenez

The following code shows multiple issues when using {{from_numpy_dtype}}:

{code}
import numpy as np
import pyarrow as pa

pa.from_numpy_dtype(np.unicode) # returns DataType(bool)
pa.from_numpy_dtype(np.int) # returns DataType(bool)
pa.from_numpy_dtype(np.int64) # Fails with the following message:
#
# ArrowNotImplementedError Traceback (most recent call last)
# in ()
# > 1 pa.from_numpy_dtype(np.int64)
# 2
#
# types.pxi in pyarrow.lib.from_numpy_dtype()
#
# error.pxi in pyarrow.lib.check_status()
#
# ArrowNotImplementedError: Unsupported numpy type 32760
{code}

Additionally, a potentially related issue is also seen when using {{to_pandas_dtype}}:

{code}
pa.DataType.to_pandas_dtype(pa.string()) # Returns numpy.object_
# (shouldn't it be numpy.unicode?)
{code}
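
Note on the calls above: each one passes a type class ({{np.unicode}}, {{np.int}}, {{np.int64}}) rather than a {{numpy.dtype}} instance. Whether that is what triggers the reported behaviour is not established in this report; it is only an assumption worth checking. The sketch below uses NumPy alone to show how such a class can be normalized into a dtype instance with {{np.dtype}}. The helper name {{normalize_dtype}} is hypothetical, and the builtins {{int}} and {{str}} stand in for {{np.int}} and {{np.unicode}}, which were aliases for them on Python 3 with NumPy 1.14.

```python
import numpy as np


def normalize_dtype(obj):
    """Coerce a type class, type name, or dtype into a numpy.dtype instance.

    Hypothetical helper for illustration; it only wraps np.dtype().
    """
    return np.dtype(obj)


# Type classes comparable to the ones passed in the report.
candidates = {"int64": np.int64, "int": int, "str": str}

# Normalize each one into a real numpy.dtype instance.
normalized = {name: normalize_dtype(obj) for name, obj in candidates.items()}

for name, dt in normalized.items():
    # A type class is not a dtype instance; the normalized value is.
    print(name,
          "class is dtype:", isinstance(candidates[name], np.dtype),
          "| normalized:", repr(dt),
          "| kind:", dt.kind)
```

Passing the normalized value on to {{pa.from_numpy_dtype}} is deliberately left out, since the behaviour of that function in 0.8.0 is exactly what this issue questions.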
[jira] [Created] (ARROW-1998) [Python] Table.from_pandas crashes when data frame is empty
Victor Jimenez created ARROW-1998:
-------------------------------------

             Summary: [Python] Table.from_pandas crashes when data frame is empty
                 Key: ARROW-1998
                 URL: https://issues.apache.org/jira/browse/ARROW-1998
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
         Environment: Windows 10 Build 15063.850
            Reporter: Victor Jimenez

Loading an empty CSV file, and then attempting to create a PyArrow Table from it makes the application crash. The following code should be able to reproduce the issue:

{code}
import numpy as np
import pandas as pd
import pyarrow as pa

FIELDS = ['id', 'name']
NUMPY_TYPES = {
    'id': np.int64,
    'name': np.unicode
}
PYARROW_SCHEMA = pa.schema([
    pa.field('id', pa.int64()),
    pa.field('name', pa.string())
])

file = open('input.csv', 'w')
file.close()

df = pd.read_csv(
    'input.csv',
    header=None,
    names=FIELDS,
    dtype=NUMPY_TYPES,
    engine='c',
)
pa.Table.from_pandas(df, schema=PYARROW_SCHEMA)
{code}
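
Until the crash is resolved, one possible defensive measure on the caller's side is to check whether the CSV file contains any rows before building the data frame and converting it. The following is only a sketch under that assumption and is not a fix for the underlying bug. It uses the standard library alone; the helper name {{csv_has_rows}} and the file name {{filled.csv}} are hypothetical, while {{input.csv}} is the zero-byte file from the report.

```python
import csv
import os
import tempfile


def csv_has_rows(path):
    """Return True if the CSV file at `path` has at least one non-blank row."""
    with open(path, newline="") as handle:
        # csv.reader yields an empty list for a blank line, which is falsy.
        return any(row for row in csv.reader(handle))


tmp_dir = tempfile.mkdtemp()

# Same zero-byte file as in the report.
empty_path = os.path.join(tmp_dir, "input.csv")
open(empty_path, "w").close()

# A non-empty file for comparison (hypothetical sample data).
filled_path = os.path.join(tmp_dir, "filled.csv")
with open(filled_path, "w", newline="") as handle:
    csv.writer(handle).writerow([1, "alice"])

print(csv_has_rows(empty_path))
print(csv_has_rows(filled_path))
```

A caller could skip the {{pa.Table.from_pandas}} call when the check reports no rows and handle the empty case separately.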