[jira] [Commented] (ARROW-1743) Table to_pandas fails when index contains categorical column
[ https://issues.apache.org/jira/browse/ARROW-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223173#comment-16223173 ]

ASF GitHub Bot commented on ARROW-1743:
---------------------------------------

Licht-T opened a new pull request #1260: ARROW-1743: [Python] Avoid non-array writable check
URL: https://github.com/apache/arrow/pull/1260

This closes [ARROW-1743](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-1743).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Table to_pandas fails when index contains categorical column
> ------------------------------------------------------------
>
>                 Key: ARROW-1743
>                 URL: https://issues.apache.org/jira/browse/ARROW-1743
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Brian Pendleton
>            Assignee: Licht Takeuchi
>              Labels: pull-request-available
>
> Categorical columns in the index of a dataframe are causing a roundtrip
> failure.
> {code}
> >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
> >>> df['a'] = df.a.astype('category')
> >>> df = df.set_index('a')
> >>> tbl = pa.Table.from_pandas(df)
> >>> tbl.to_pandas()
> Traceback (most recent call last):
>   File "", line 1, in
>   File "table.pxi", line 881, in pyarrow.lib.Table.to_pandas
>   File "C:\Users\bpendlet\Miniconda3\envs\panpy3\lib\site-packages\pyarrow\pandas_compat.py", line 303, in table_to_blockmanager
>     if not values.flags.writeable:
> AttributeError: 'Categorical' object has no attribute 'flags'
> {code}
> Works as expected when the index is not categorical:
> {code}
> >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
> >>> df = df.set_index('a')
> >>> tbl = pa.Table.from_pandas(df)
> >>> tbl.to_pandas()
>    b
> a
> 1  1
> 2  2
> 3  3
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
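The traceback in this report comes from reading `values.flags.writeable` on an object that is not a NumPy array. Below is a minimal sketch of the kind of guard the pull request title ("Avoid non-array writable check") suggests. It is not taken from the actual patch: `FakeArray`, `FakeCategorical` and `ensure_writable` are hypothetical stand-ins, used so that the example runs without NumPy or pandas installed.

```python
from types import SimpleNamespace


class FakeArray:
    """Hypothetical stand-in for a NumPy array: exposes `flags.writeable`."""

    def __init__(self, data, writeable=True):
        self.data = list(data)
        self.flags = SimpleNamespace(writeable=writeable)

    def copy(self):
        # Like ndarray.copy(), the copy is writable even if the source is not.
        return FakeArray(self.data, writeable=True)


class FakeCategorical:
    """Hypothetical stand-in for pandas.Categorical: it has no `flags`."""

    def __init__(self, codes):
        self.codes = list(codes)


def ensure_writable(values):
    """Return `values`, copying only when it is a read-only array-like.

    Objects without a `flags` attribute are passed through untouched
    instead of raising AttributeError.
    """
    flags = getattr(values, "flags", None)
    if flags is not None and not flags.writeable:
        return values.copy()
    return values
```

With this guard, a read-only array is copied, while a categorical-like object is returned unchanged rather than triggering the `AttributeError` shown above.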
[jira] [Assigned] (ARROW-1743) Table to_pandas fails when index contains categorical column
[ https://issues.apache.org/jira/browse/ARROW-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Licht Takeuchi reassigned ARROW-1743:
-------------------------------------

    Assignee: Licht Takeuchi

> Table to_pandas fails when index contains categorical column
> ------------------------------------------------------------
>
>                 Key: ARROW-1743
>                 URL: https://issues.apache.org/jira/browse/ARROW-1743
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Brian Pendleton
>            Assignee: Licht Takeuchi
>
> Categorical columns in the index of a dataframe are causing a roundtrip
> failure.
> {code}
> >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
> >>> df['a'] = df.a.astype('category')
> >>> df = df.set_index('a')
> >>> tbl = pa.Table.from_pandas(df)
> >>> tbl.to_pandas()
> Traceback (most recent call last):
>   File "", line 1, in
>   File "table.pxi", line 881, in pyarrow.lib.Table.to_pandas
>   File "C:\Users\bpendlet\Miniconda3\envs\panpy3\lib\site-packages\pyarrow\pandas_compat.py", line 303, in table_to_blockmanager
>     if not values.flags.writeable:
> AttributeError: 'Categorical' object has no attribute 'flags'
> {code}
> Works as expected when the index is not categorical:
> {code}
> >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
> >>> df = df.set_index('a')
> >>> tbl = pa.Table.from_pandas(df)
> >>> tbl.to_pandas()
>    b
> a
> 1  1
> 2  2
> 3  3
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-1689) [Python] Categorical Indices Should Be Zero-Copy
[ https://issues.apache.org/jira/browse/ARROW-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223122#comment-16223122 ] ASF GitHub Bot commented on ARROW-1689: --- Licht-T commented on issue #1237: ARROW-1689: [Python] Implement zero-copy conversions for DictionaryArray URL: https://github.com/apache/arrow/pull/1237#issuecomment-340129218 @wesm Wow! Thanks for your great idea! I didn't get such idea! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Categorical Indices Should Be Zero-Copy > > > Key: ARROW-1689 > URL: https://issues.apache.org/jira/browse/ARROW-1689 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.7.1 >Reporter: Nick White >Assignee: Nick White > Labels: pull-request-available > Fix For: 0.8.0 > > > It seems like > [WriteIndices|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L955-L981] > could reuse some of the logic in > [ConvertValuesZeroCopy|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L1348-L1385] > to avoid copying the integer indices array? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1689) [Python] Categorical Indices Should Be Zero-Copy
[ https://issues.apache.org/jira/browse/ARROW-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223121#comment-16223121 ] ASF GitHub Bot commented on ARROW-1689: --- Licht-T commented on issue #1237: ARROW-1689: [Python] Implement zero-copy conversions for DictionaryArray URL: https://github.com/apache/arrow/pull/1237#issuecomment-340129218 @wesm Wow! Thanks for your good idea! I didn't get such idea! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Categorical Indices Should Be Zero-Copy > > > Key: ARROW-1689 > URL: https://issues.apache.org/jira/browse/ARROW-1689 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.7.1 >Reporter: Nick White >Assignee: Nick White > Labels: pull-request-available > Fix For: 0.8.0 > > > It seems like > [WriteIndices|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L955-L981] > could reuse some of the logic in > [ConvertValuesZeroCopy|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L1348-L1385] > to avoid copying the integer indices array? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1609) Plasma: Build fails with Xcode 9.0
[ https://issues.apache.org/jira/browse/ARROW-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222877#comment-16222877 ] ASF GitHub Bot commented on ARROW-1609: --- pcmoritz commented on issue #1144: ARROW-1609: [Plasma] Xcode 9 compilation workaround URL: https://github.com/apache/arrow/pull/1144#issuecomment-340094718 I was still using XCode 8 but installing version 9 now and looking into this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Plasma: Build fails with Xcode 9.0 > -- > > Key: ARROW-1609 > URL: https://issues.apache.org/jira/browse/ARROW-1609 > Project: Apache Arrow > Issue Type: Bug > Components: Plasma (C++) >Affects Versions: 0.7.0 >Reporter: Uwe L. Korn >Assignee: Philipp Moritz > Labels: pull-request-available > Fix For: 0.8.0 > > > Tensorflow has the same issue: > https://github.com/tensorflow/tensorflow/issues/13220 > {code} > [4/102] Building CXX object src/plasma/CMakeFiles/plasma_store.dir/store.cc.o > FAILED: src/plasma/CMakeFiles/plasma_store.dir/store.cc.o > /usr/local/bin/ccache > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ >-isystem /Users/ukorn/miniconda3/envs/pyarrow-dev/include -isystem > googletest_ep-prefix/src/googletest_ep/include -isystem > gbenchmark_ep/src/gbenchmark_ep-install/include -isystem > ../thirdparty/hadoop/include -I../src -isystem > /Users/ukorn/miniconda3/envs/pyarrow-dev/include/python3.6m -I../src/plasma > -I../src/plasma/thirdparty -I../src/plasma/.. 
-O3 -DNDEBUG -Wall -std=c++11 > -msse3 -stdlib=libc++ -Qunused-arguments -D_XOPEN_SOURCE=500 > -D_POSIX_C_SOURCE=200809L -fPIC -O3 -DNDEBUG -std=gnu++11 -MD -MT > src/plasma/CMakeFiles/plasma_store.dir/store.cc.o -MF > src/plasma/CMakeFiles/plasma_store.dir/store.cc.o.d -o > src/plasma/CMakeFiles/plasma_store.dir/store.cc.o -c ../src/plasma/store.cc > In file included from ../src/plasma/store.cc:29: > In file included from ../src/plasma/store.h:25: > In file included from ../src/plasma/common.h:30: > In file included from ../src/arrow/util/logging.h:22: > In file included from > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/iostream:38: > In file included from > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/ios:216: > In file included from > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__locale:18: > In file included from > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/mutex:189: > In file included from > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__mutex_base:17: > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:156:1: > error: unknown type name 'mach_port_t' > mach_port_t __libcpp_thread_get_port(); > ^ > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:300:1: > error: unknown type name 'mach_port_t' > mach_port_t __libcpp_thread_get_port() { > ^ > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:301:12: > error: use of undeclared identifier 'pthread_mach_thread_np' > return pthread_mach_thread_np(pthread_self()); >^ > 3 errors generated. > ninja: build stopped: subcommand failed. 
> {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1718) [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas
[ https://issues.apache.org/jira/browse/ARROW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222844#comment-16222844 ] ASF GitHub Bot commented on ARROW-1718: --- xhochy commented on a change in pull request #1258: ARROW-1718: [C++/Python] Implement casts from timestamp to date32/64, properly handle NumPy datetime64[D] -> date32 URL: https://github.com/apache/arrow/pull/1258#discussion_r147518348 ## File path: .travis.yml ## @@ -51,12 +51,12 @@ matrix: os: linux group: deprecated before_script: -- export CC="gcc-4.9" Review comment: Is this expected to be in here? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Implement casts from timestamp to date32/date64 and support in > Array.from_pandas > - > > Key: ARROW-1718 > URL: https://issues.apache.org/jira/browse/ARROW-1718 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Bryan Cutler >Assignee: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > > When calling {{Array.from_pandas}} with a pandas.Series of dates and > specifying the desired pyarrow type, an error occurs. If the type is not > specified then {{from_pandas}} will interpret the data as a timestamp type. 
> {code} > import pandas as pd > import pyarrow as pa > import datetime > arr = pa.array([datetime.date(2017, 10, 23)]) > c = pa.Column.from_array("d", arr) > s = c.to_pandas() > print(s) > # 0 2017-10-23 > # Name: d, dtype: datetime64[ns] > result = pa.Array.from_pandas(s, type=pa.date32()) > print(result) > """ > Traceback (most recent call last): > File "", line 1, in > File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221) > File > "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", > line 28, in array_format > values.append(value_format(x, 0)) > File > "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", > line 49, in value_format > return repr(x) > File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535) > File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368) > ValueError: year is out of range > """ > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
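For context on the cast this issue asks for: Arrow's `date32` stores whole days since the UNIX epoch, while a pandas `datetime64[ns]` value is a count of nanoseconds since the epoch. The "year is out of range" error is consistent with nanosecond counts being read as day counts, though the report does not confirm the cause. The sketch below shows the arithmetic only, using the standard library; the function names are hypothetical and this is not pyarrow's implementation.

```python
import datetime

NANOS_PER_DAY = 86_400 * 10**9
EPOCH = datetime.date(1970, 1, 1)


def timestamp_ns_to_date32(ts_ns):
    """Convert nanoseconds since the epoch to whole days since the epoch.

    Floor division rounds toward negative infinity, so timestamps before
    1970 land on the correct (earlier) calendar day.
    """
    return ts_ns // NANOS_PER_DAY


def date32_to_date(days):
    """Convert days since the epoch to a datetime.date."""
    return EPOCH + datetime.timedelta(days=days)


# Midnight on the date used in the report, expressed in nanoseconds.
ts_ns = (datetime.date(2017, 10, 23) - EPOCH).days * NANOS_PER_DAY
days = timestamp_ns_to_date32(ts_ns)
print(days, date32_to_date(days))
```

Passing the raw nanosecond value straight to `date32_to_date` would overflow, which is the same class of failure as the one in the traceback.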
[jira] [Commented] (ARROW-1727) [Format] Expand Arrow streaming format to permit new dictionaries and deltas / additions to existing dictionaries
[ https://issues.apache.org/jira/browse/ARROW-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222832#comment-16222832 ] ASF GitHub Bot commented on ARROW-1727: --- TheNeuralBit commented on a change in pull request #1257: ARROW-1727: [Format] Expand Arrow streaming format to permit deltas / additions to existing dictionaries URL: https://github.com/apache/arrow/pull/1257#discussion_r147517384 ## File path: format/IPC.md ## @@ -67,15 +67,18 @@ We provide a streaming format for record batches. It is presented as a sequence of encapsulated messages, each of which follows the format above. The schema comes first in the stream, and it is the same for all of the record batches that follow. If any fields in the schema are dictionary-encoded, one or more -`DictionaryBatch` messages will follow the schema. +`DictionaryBatch` messages will be included. `DictionaryBatch` and +`RecordBatch` messages may be interleaved, but before any dictionary key is used +in a `RecordBatch` it should be defined in a `DictionaryBatch`. ``` ... - ... + +... Review comment: Yeah thats fair, I can tweak it to make it clear that any dictionaries after the first record batch should be modifying the originals. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Format] Expand Arrow streaming format to permit new dictionaries and deltas > / additions to existing dictionaries > - > > Key: ARROW-1727 > URL: https://issues.apache.org/jira/browse/ARROW-1727 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Reporter: Wes McKinney >Assignee: Brian Hulette > Labels: pull-request-available > Fix For: 0.8.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1727) [Format] Expand Arrow streaming format to permit new dictionaries and deltas / additions to existing dictionaries
[ https://issues.apache.org/jira/browse/ARROW-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222825#comment-16222825 ]

ASF GitHub Bot commented on ARROW-1727:
---------------------------------------

TheNeuralBit commented on a change in pull request #1257: ARROW-1727: [Format] Expand Arrow streaming format to permit deltas / additions to existing dictionaries
URL: https://github.com/apache/arrow/pull/1257#discussion_r147516370

 ## File path: format/IPC.md ##
 @@ -189,6 +197,10 @@ in the schema, so that dictionaries can even be used for multiple fields. See
  the [Physical Layout][4] document for more about the semantics of
  dictionary-encoded data.

 +The dictionary `isDelta` flag allows dictionary batches to be modified mid-stream.
 +A dictionary batch with `isDelta` set indicates that its vector should be
 +concatenated with those of any previous batches with the same `id`.

 Review comment:
   Would something like the example in my initial email work?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [Format] Expand Arrow streaming format to permit new dictionaries and deltas / additions to existing dictionaries
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1727
>                 URL: https://issues.apache.org/jira/browse/ARROW-1727
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Wes McKinney
>            Assignee: Brian Hulette
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
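The `isDelta` semantics proposed in this diff can be illustrated with a small reader loop. This is a sketch under stated assumptions, not the Arrow implementation: messages are modelled as plain Python dicts rather than flatbuffer-encoded IPC messages, `replay_stream` is a hypothetical name, and the choices that a non-delta batch replaces an earlier dictionary with the same id and that a delta for an unseen id simply defines it are assumptions made for the example.

```python
def replay_stream(messages):
    """Decode record batches from a list of simplified stream messages.

    Hypothetical message shapes (stand-ins for the real IPC messages):
      {"kind": "dictionary", "id": int, "values": list, "is_delta": bool}
      {"kind": "record", "id": int, "indices": list}
    """
    dictionaries = {}
    decoded = []
    for msg in messages:
        if msg["kind"] == "dictionary":
            if msg.get("is_delta", False):
                # Delta: concatenate onto the dictionary with the same id.
                # Assumption: a delta for an unseen id starts a new dictionary.
                dictionaries.setdefault(msg["id"], []).extend(msg["values"])
            else:
                # Assumption: a non-delta batch defines or replaces the dictionary.
                dictionaries[msg["id"]] = list(msg["values"])
        elif msg["kind"] == "record":
            dictionary = dictionaries.get(msg["id"])
            if dictionary is None:
                raise ValueError("record batch uses undefined dictionary %r" % msg["id"])
            for index in msg["indices"]:
                # A key must be defined before a record batch refers to it.
                if not 0 <= index < len(dictionary):
                    raise ValueError("dictionary key %r not defined yet" % index)
            decoded.append([dictionary[i] for i in msg["indices"]])
        else:
            raise ValueError("unknown message kind %r" % msg["kind"])
    return decoded
```

A stream that sends `["a", "b"]`, then a record batch, then a delta `["c"]`, then another record batch decodes the second batch against `["a", "b", "c"]`, which is the interleaving the proposed wording allows.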
[jira] [Commented] (ARROW-1663) [Java] Follow up on ARROW-1347 and make schema backward compatible
[ https://issues.apache.org/jira/browse/ARROW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222791#comment-16222791 ] ASF GitHub Bot commented on ARROW-1663: --- BryanCutler commented on a change in pull request #1193: ARROW-1663: use consistent name for null and not-null in FixedSizeLis… URL: https://github.com/apache/arrow/pull/1193#discussion_r147513746 ## File path: java/vector/src/test/java/org/apache/arrow/vector/pojo/TestConvert.java ## @@ -48,6 +55,73 @@ */ public class TestConvert { + private static String badSchemaJson = "{\n" + +" \"fields\" : [ {\n" + +"\"nullable\" : true,\n" + +"\"type\" : {\n" + +" \"name\" : \"struct\"\n" + +"},\n" + +"\"children\" : [ {\n" + +" \"nullable\" : true,\n" + +" \"type\" : {\n" + +"\"name\" : \"list\"\n" + +" },\n" + +" \"children\" : [ {\n" + +"\"nullable\" : true,\n" + +"\"type\" : {\n" + +" \"name\" : \"null\"\n" + +"},\n" + +"\"children\" : [ ],\n" + +"\"typeLayout\" : {\n" + +" \"vectors\" : [ ]\n" + +"},\n" + +"\"name\" : \"[DEFAULT]\"\n" + +" } ],\n" + +" \"typeLayout\" : {\n" + +"\"vectors\" : [ {\n" + +" \"type\" : \"VALIDITY\",\n" + +" \"typeBitWidth\" : 1\n" + +"}, {\n" + +" \"type\" : \"OFFSET\",\n" + +" \"typeBitWidth\" : 32\n" + +"} ]\n" + +" },\n" + +" \"name\" : \"list\"\n" + +"}, {\n" + +" \"nullable\" : true,\n" + +" \"type\" : {\n" + +"\"name\" : \"fixedsizelist\",\n" + +"\"listSize\" : 5\n" + +" },\n" + +" \"children\" : [ {\n" + +"\"nullable\" : true,\n" + +"\"type\" : {\n" + +" \"name\" : \"null\"\n" + +"},\n" + +"\"children\" : [ ],\n" + +"\"typeLayout\" : {\n" + +" \"vectors\" : [ ]\n" + +"},\n" + +"\"name\" : \"$data$\"\n" + +" } ],\n" + +" \"typeLayout\" : {\n" + +"\"vectors\" : [ {\n" + +" \"type\" : \"VALIDITY\",\n" + +" \"typeBitWidth\" : 1\n" + +"} ]\n" + +" },\n" + +" \"name\" : \"fixedlist\"\n" + +"} ],\n" + +"\"typeLayout\" : {\n" + +" \"vectors\" : [ {\n" + +"\"type\" : \"VALIDITY\",\n" + +"\"typeBitWidth\" : 1\n" + +" } ]\n" + +"},\n" + +"\"name\" : 
\"a\"\n" + +" } ]\n" + +"}\n"; Review comment: This is a large block that's very hard to read just to check the name of a field. Is there a more compact way you can test this? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Follow up on ARROW-1347 and make schema backward compatible > -- > > Key: ARROW-1663 > URL: https://issues.apache.org/jira/browse/ARROW-1663 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Yuliya Feldman >Assignee: Yuliya Feldman > Labels: pull-request-available > Fix For: 0.8.0 > > > ARROW-1347 covered ListVector to have name of the field $data$ instead of > [DEFAULT] > We left FixedSizeListVector behind. > Another case is backward compatibility - if schema was created before > ARROW-1347 was in place application may still suffer from side effects as it > would not be updated based on new code. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1663) [Java] Follow up on ARROW-1347 and make schema backward compatible
[ https://issues.apache.org/jira/browse/ARROW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222787#comment-16222787 ] ASF GitHub Bot commented on ARROW-1663: --- BryanCutler commented on a change in pull request #1193: ARROW-1663: use consistent name for null and not-null in FixedSizeLis… URL: https://github.com/apache/arrow/pull/1193#discussion_r147513485 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java ## @@ -633,11 +633,11 @@ public void testGetBufferAddress() throws Exception { public void testConsistentChildName() throws Exception { try (ListVector listVector = ListVector.empty("sourceVector", allocator)) { String emptyListStr = listVector.getField().toString(); - assertTrue(emptyListStr.contains(ListVector.DATA_VECTOR_NAME)); + Assert.assertTrue(emptyListStr.contains(ListVector.DATA_VECTOR_NAME)); - listVector.addOrGetVector(FieldType.nullable(MinorType.INT.getType())); + listVector.addOrGetVector(FieldType.nullable(Types.MinorType.INT.getType())); String emptyVectorStr = listVector.getField().toString(); - assertTrue(emptyVectorStr.contains(ListVector.DATA_VECTOR_NAME)); + Assert.assertTrue(emptyVectorStr.contains(ListVector.DATA_VECTOR_NAME)); Review comment: The rest of the tests in this file match the way it was previously. If you're going to change it here then I would prefer making the others consistent as well. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Follow up on ARROW-1347 and make schema backward compatible > -- > > Key: ARROW-1663 > URL: https://issues.apache.org/jira/browse/ARROW-1663 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Yuliya Feldman >Assignee: Yuliya Feldman > Labels: pull-request-available > Fix For: 0.8.0 > > > ARROW-1347 covered ListVector to have name of the field $data$ instead of > [DEFAULT] > We left FixedSizeListVector behind. > Another case is backward compatibility - if schema was created before > ARROW-1347 was in place application may still suffer from side effects as it > would not be updated based on new code. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
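The backward-compatibility concern in this issue (schemas written before ARROW-1347 still carrying `[DEFAULT]` as the list child name) can be sketched as a small normalization pass over the JSON form of a schema. This is an illustration only, written in Python rather than Java, and it is not the change made in the pull request: `normalize_child_names` and `normalize_schema` are hypothetical helpers, and the set of type names treated as list-like is an assumption based on the JSON quoted in the review.

```python
LEGACY_NAME = "[DEFAULT]"
DATA_VECTOR_NAME = "$data$"
# Assumption: these are the JSON type names whose child should be "$data$".
LIST_TYPES = {"list", "fixedsizelist"}


def normalize_child_names(field):
    """Return a copy of `field` with legacy list child names renamed."""
    type_name = field.get("type", {}).get("name")
    children = []
    for child in field.get("children", []):
        child = normalize_child_names(child)  # fresh dict, safe to modify
        if type_name in LIST_TYPES and child.get("name") == LEGACY_NAME:
            child["name"] = DATA_VECTOR_NAME
        children.append(child)
    result = dict(field)
    result["children"] = children
    return result


def normalize_schema(schema):
    """Apply the rename to every top-level field of a schema dict."""
    fields = [normalize_child_names(f) for f in schema.get("fields", [])]
    return {**schema, "fields": fields}


# A compact schema with one legacy list child, for illustration.
old_schema = {
    "fields": [{
        "name": "a",
        "type": {"name": "struct"},
        "children": [{
            "name": "list",
            "type": {"name": "list"},
            "children": [{"name": "[DEFAULT]", "type": {"name": "null"}, "children": []}],
        }],
    }]
}
new_schema = normalize_schema(old_schema)
```

A schema this small may also be enough to check a child field name in a test, which is the kind of compaction the review comment asks about.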
[jira] [Commented] (ARROW-1718) [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas
[ https://issues.apache.org/jira/browse/ARROW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222775#comment-16222775 ] ASF GitHub Bot commented on ARROW-1718: --- BryanCutler commented on issue #1258: ARROW-1718: [C++/Python] Implement casts from timestamp to date32/64, properly handle NumPy datetime64[D] -> date32 URL: https://github.com/apache/arrow/pull/1258#issuecomment-340080163 +1 looks good, thanks for doing this! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Implement casts from timestamp to date32/date64 and support in > Array.from_pandas > - > > Key: ARROW-1718 > URL: https://issues.apache.org/jira/browse/ARROW-1718 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Bryan Cutler >Assignee: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > > When calling {{Array.from_pandas}} with a pandas.Series of dates and > specifying the desired pyarrow type, an error occurs. If the type is not > specified then {{from_pandas}} will interpret the data as a timestamp type. 
> {code} > import pandas as pd > import pyarrow as pa > import datetime > arr = pa.array([datetime.date(2017, 10, 23)]) > c = pa.Column.from_array("d", arr) > s = c.to_pandas() > print(s) > # 0 2017-10-23 > # Name: d, dtype: datetime64[ns] > result = pa.Array.from_pandas(s, type=pa.date32()) > print(result) > """ > Traceback (most recent call last): > File "", line 1, in > File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221) > File > "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", > line 28, in array_format > values.append(value_format(x, 0)) > File > "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", > line 49, in value_format > return repr(x) > File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535) > File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368) > ValueError: year is out of range > """ > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1743) Table to_pandas fails when index contains categorical column
[ https://issues.apache.org/jira/browse/ARROW-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Pendleton updated ARROW-1743:
-----------------------------------
    Description: 
Categorical columns in the index of a dataframe are causing a roundtrip failure.
{code}
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
>>> df['a'] = df.a.astype('category')
>>> df = df.set_index('a')
>>> tbl = pa.Table.from_pandas(df)
>>> tbl.to_pandas()
Traceback (most recent call last):
  File "", line 1, in
  File "table.pxi", line 881, in pyarrow.lib.Table.to_pandas
  File "C:\Users\bpendlet\Miniconda3\envs\panpy3\lib\site-packages\pyarrow\pandas_compat.py", line 303, in table_to_blockmanager
    if not values.flags.writeable:
AttributeError: 'Categorical' object has no attribute 'flags'
{code}
Works as expected when the index is not categorical:
{code}
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
>>> df = df.set_index('a')
>>> tbl = pa.Table.from_pandas(df)
>>> tbl.to_pandas()
   b
a
1  1
2  2
3  3
{code}

  was:
Categorical columns in the index of a dataframe are causing a roundtrip failure.
{code}
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
>>> df['a'] = df.a.astype('category')
>>> df = df.set_index('a')
>>> tbl = pa.Table.from_pandas(df)
>>> tbl.to_pandas()
Traceback (most recent call last):
  File "", line 1, in
  File "table.pxi", line 881, in pyarrow.lib.Table.to_pandas
  File "C:\Users\bpendlet\Miniconda3\envs\panpy3\lib\site-packages\pyarrow\pandas_compat.py", line 303, in table_to_blockmanager
    if not values.flags.writeable:
AttributeError: 'Categorical' object has no attribute 'flags'
{code}


> Table to_pandas fails when index contains categorical column
> ------------------------------------------------------------
>
>                 Key: ARROW-1743
>                 URL: https://issues.apache.org/jira/browse/ARROW-1743
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Brian Pendleton
>
> Categorical columns in the index of a dataframe are causing a roundtrip
> failure.
> {code}
> >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
> >>> df['a'] = df.a.astype('category')
> >>> df = df.set_index('a')
> >>> tbl = pa.Table.from_pandas(df)
> >>> tbl.to_pandas()
> Traceback (most recent call last):
>   File "", line 1, in
>   File "table.pxi", line 881, in pyarrow.lib.Table.to_pandas
>   File "C:\Users\bpendlet\Miniconda3\envs\panpy3\lib\site-packages\pyarrow\pandas_compat.py", line 303, in table_to_blockmanager
>     if not values.flags.writeable:
> AttributeError: 'Categorical' object has no attribute 'flags'
> {code}
> Works as expected when the index is not categorical:
> {code}
> >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
> >>> df = df.set_index('a')
> >>> tbl = pa.Table.from_pandas(df)
> >>> tbl.to_pandas()
>    b
> a
> 1  1
> 2  2
> 3  3
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-1718) [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas
[ https://issues.apache.org/jira/browse/ARROW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222737#comment-16222737 ] ASF GitHub Bot commented on ARROW-1718: --- wesm opened a new pull request #1258: ARROW-1718: [C++/Python] Implement casts from timestamp to date32/64, properly handle NumPy datetime64[D] -> date32 URL: https://github.com/apache/arrow/pull/1258 This was sort of a can of worms. cc @xhochy @cpcloud @BryanCutler This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Implement casts from timestamp to date32/date64 and support in > Array.from_pandas > - > > Key: ARROW-1718 > URL: https://issues.apache.org/jira/browse/ARROW-1718 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Bryan Cutler >Assignee: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > > When calling {{Array.from_pandas}} with a pandas.Series of dates and > specifying the desired pyarrow type, an error occurs. If the type is not > specified then {{from_pandas}} will interpret the data as a timestamp type. 
> {code} > import pandas as pd > import pyarrow as pa > import datetime > arr = pa.array([datetime.date(2017, 10, 23)]) > c = pa.Column.from_array("d", arr) > s = c.to_pandas() > print(s) > # 0 2017-10-23 > # Name: d, dtype: datetime64[ns] > result = pa.Array.from_pandas(s, type=pa.date32()) > print(result) > """ > Traceback (most recent call last): > File "", line 1, in > File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221) > File > "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", > line 28, in array_format > values.append(value_format(x, 0)) > File > "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", > line 49, in value_format > return repr(x) > File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535) > File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py > (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368) > ValueError: year is out of range > """ > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (ARROW-1718) [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas
[ https://issues.apache.org/jira/browse/ARROW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-1718:
---------------------------------
Labels: pull-request-available (was: )

> [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas
> -----------------------------------------------------------------------------------------
>
> Key: ARROW-1718
> URL: https://issues.apache.org/jira/browse/ARROW-1718
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Bryan Cutler
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0
>
> When calling {{Array.from_pandas}} with a pandas.Series of dates and specifying the desired pyarrow type, an error occurs. If the type is not specified then {{from_pandas}} will interpret the data as a timestamp type.
> {code}
> import pandas as pd
> import pyarrow as pa
> import datetime
> arr = pa.array([datetime.date(2017, 10, 23)])
> c = pa.Column.from_array("d", arr)
> s = c.to_pandas()
> print(s)
> # 0 2017-10-23
> # Name: d, dtype: datetime64[ns]
> result = pa.Array.from_pandas(s, type=pa.date32())
> print(result)
> """
> Traceback (most recent call last):
>   File "", line 1, in
>   File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
>   File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 28, in array_format
>     values.append(value_format(x, 0))
>   File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 49, in value_format
>     return repr(x)
>   File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
>   File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368)
> ValueError: year is out of range
> """
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
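
For orientation: Arrow's date32 type stores whole days since the UNIX epoch and date64 stores milliseconds since the epoch, so a cast from a nanosecond timestamp comes down to an integer rescaling. The following is a minimal pure-Python sketch of that arithmetic, assuming a UTC nanosecond timestamp. The function names are hypothetical, and this is not pyarrow's implementation.

```python
import datetime

NANOS_PER_DAY = 24 * 60 * 60 * 10**9
MILLIS_PER_DAY = 24 * 60 * 60 * 1000
EPOCH = datetime.date(1970, 1, 1)


def timestamp_ns_to_date32(ts_ns):
    """Whole days since the UNIX epoch (floor division also handles pre-1970 values)."""
    return ts_ns // NANOS_PER_DAY


def timestamp_ns_to_date64(ts_ns):
    """Milliseconds since the UNIX epoch, truncated to the start of the day."""
    return timestamp_ns_to_date32(ts_ns) * MILLIS_PER_DAY


def date32_to_py(days):
    """Turn a date32 value (days since epoch) back into a datetime.date."""
    return EPOCH + datetime.timedelta(days=days)


# 2017-10-23T00:00:00Z expressed as nanoseconds since the epoch
ts_ns = 1508716800 * 10**9
days = timestamp_ns_to_date32(ts_ns)
print(days, date32_to_py(days))
```

Floor division is used rather than truncation so that timestamps before 1970 map to the correct (earlier) day.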
[jira] [Created] (ARROW-1742) C++: clang-format is not detected correct on OSX anymore
Uwe L. Korn created ARROW-1742:
-------------------------------

Summary: C++: clang-format is not detected correct on OSX anymore
Key: ARROW-1742
URL: https://issues.apache.org/jira/browse/ARROW-1742
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
Fix For: 0.8.0

Paths changed slightly in recent homebrew builds. We need to adjust our script to call the correct executable again.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
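
One way to make executable detection less sensitive to path changes is to probe a list of candidate directories before falling back to `PATH`. This is a hedged sketch only: the function name `find_clang_format` and the idea of passing in candidate directories (for example a Homebrew prefix such as `/usr/local/opt/llvm@<version>/bin`) are assumptions for illustration, not the script Arrow uses.

```python
import os
import shutil


def find_clang_format(version, search_dirs=()):
    """Return the path of a clang-format executable, or None if none is found.

    Versioned names (clang-format-<version>) are preferred over the bare name.
    `search_dirs` is a caller-supplied list of directories to probe first.
    """
    names = ["clang-format-{}".format(version), "clang-format"]
    # Probe the explicitly supplied directories first.
    for name in names:
        for directory in search_dirs:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    # Fall back to whatever is reachable through PATH.
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    return None
```

A caller could then fail with a clear message when the function returns `None`, instead of invoking a wrong or missing binary.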
[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222508#comment-16222508 ]

ASF GitHub Bot commented on ARROW-1728:
---------------------------------------

xhochy commented on issue #1251: ARROW-1728: [C++] Run clang-format checks in Travis CI
URL: https://github.com/apache/arrow/pull/1251#issuecomment-339996289

On macOS it is a simple `brew install llvm@{version}`. I think currently the autodetection of the correct version is broken there and only working for me as I have a quickfix-symlink in place, made https://issues.apache.org/jira/browse/ARROW-1742 for this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [C++] Run clang-format checks in Travis CI
> ------------------------------------------
>
> Key: ARROW-1728
> URL: https://issues.apache.org/jira/browse/ARROW-1728
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0
>
> I think it's reasonable to expect contributors to run clang-format on their code. This may lead to a higher number of failed builds but will eliminate noise diffs in unrelated patches

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222500#comment-16222500 ]

ASF GitHub Bot commented on ARROW-1728:
---------------------------------------

wesm commented on issue #1251: ARROW-1728: [C++] Run clang-format checks in Travis CI
URL: https://github.com/apache/arrow/pull/1251#issuecomment-339995039

Now that we have gitbox unless contributors uncheck the "allow edits from maintainers" box, we can also do the clang-format ourselves if that's blocking a time-sensitive PR getting merged. The one downside is that getting the right version of clang-format on Windows and macOS may be more work than on Linux; we probably should add instructions to help contributors
[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222491#comment-16222491 ]

ASF GitHub Bot commented on ARROW-1728:
---------------------------------------

xhochy commented on issue #1251: ARROW-1728: [C++] Run clang-format checks in Travis CI
URL: https://github.com/apache/arrow/pull/1251#issuecomment-339994265

@wesm I'm ok with this. If this gets onerous, it might be worth taking a look at how easy it would be to program a bot that could `clang-format` someone's PR (but I guess this is not worth the effort just for Arrow).
[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222465#comment-16222465 ]

ASF GitHub Bot commented on ARROW-1728:
---------------------------------------

wesm closed pull request #1251: ARROW-1728: [C++] Run clang-format checks in Travis CI
URL: https://github.com/apache/arrow/pull/1251

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.travis.yml b/.travis.yml
index c682a9d9d..039ae9520 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -56,6 +56,8 @@ matrix:
     - export ARROW_TRAVIS_USE_TOOLCHAIN=1
     - export ARROW_TRAVIS_VALGRIND=1
     - export ARROW_TRAVIS_PLASMA=1
+    - export ARROW_TRAVIS_CLANG_FORMAT=1
+    - $TRAVIS_BUILD_DIR/ci/travis_install_clang_tools.sh
     - $TRAVIS_BUILD_DIR/ci/travis_lint.sh
     - $TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh
     script:
diff --git a/ci/travis_install_clang_tools.sh b/ci/travis_install_clang_tools.sh
old mode 100644
new mode 100755
diff --git a/ci/travis_lint.sh b/ci/travis_lint.sh
index 8c956646c..e234b7b01 100755
--- a/ci/travis_lint.sh
+++ b/ci/travis_lint.sh
@@ -26,6 +26,10 @@ pushd $TRAVIS_BUILD_DIR/cpp/lint
 cmake ..
 make lint
 
+if [ "$ARROW_TRAVIS_CLANG_FORMAT" == "1" ]; then
+  make check-format
+fi
+
 popd
 
 # Fail fast on style checks
diff --git a/ci/travis_script_cpp.sh b/ci/travis_script_cpp.sh
index a2079036c..3d61bc5b8 100755
--- a/ci/travis_script_cpp.sh
+++ b/ci/travis_script_cpp.sh
@@ -27,14 +27,6 @@ git archive HEAD --prefix=apache-arrow/ --output=arrow-src.tar.gz
 
 pushd $CPP_BUILD_DIR
 
-# ARROW-209: checks depending on the LLVM toolchain are disabled temporarily
-# until we are able to install the full LLVM toolchain in Travis CI again
-
-# if [ $TRAVIS_OS_NAME == "linux" ]; then
-#   make check-format
-#   make check-clang-tidy
-# fi
-
 ctest -VV -L unittest
 
 popd
diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index a159b1e56..d8dc5df88 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -446,10 +446,10 @@ add_custom_target(format ${BUILD_SUPPORT_DIR}/run_clang_format.py
 
 # runs clang format and exits with a non-zero exit code if any files need to be reformatted
 # TODO(wesm): Make this work in run_clang_format.py
-# add_custom_target(check-format ${BUILD_SUPPORT_DIR}/run_clang_format.py
-#    ${CLANG_FORMAT_VERSION}
-#    ${BUILD_SUPPORT_DIR}/clang_format_exclusions.txt
-#    ${CMAKE_CURRENT_SOURCE_DIR}/src 1)
+add_custom_target(check-format ${BUILD_SUPPORT_DIR}/run_clang_format.py
+   ${CLANG_FORMAT_VERSION}
+   ${BUILD_SUPPORT_DIR}/clang_format_exclusions.txt
+   ${CMAKE_CURRENT_SOURCE_DIR}/src 1)
 
 # "make clang-tidy" and "make check-clang-tidy" targets
diff --git a/cpp/build-support/run_clang_format.py b/cpp/build-support/run_clang_format.py
index f1a448f53..fcf39ecc6 100755
--- a/cpp/build-support/run_clang_format.py
+++ b/cpp/build-support/run_clang_format.py
@@ -31,6 +31,12 @@ EXCLUDE_GLOBS_FILENAME = sys.argv[2]
 SOURCE_DIR = sys.argv[3]
 
+if len(sys.argv) > 4:
+    CHECK_FORMAT = int(sys.argv[4]) == 1
+else:
+    CHECK_FORMAT = False
+
+
 exclude_globs = [line.strip() for line in open(EXCLUDE_GLOBS_FILENAME, "r")]
 
 files_to_format = []
@@ -49,18 +55,24 @@ if not excluded:
             files_to_format.append(name)
 
 
-# TODO(wesm): Port this to work with Python, for check-format
-# NUM_CORRECTIONS=`$CLANG_FORMAT -output-replacements-xml $@ |
-# grep offset | wc -l`
-# if [ "$NUM_CORRECTIONS" -gt "0" ]; then
-#   echo "clang-format suggested changes, please run 'make format'"
-#   exit 1
-# fi
+if CHECK_FORMAT:
+    output = subprocess.check_output([CLANG_FORMAT, '-output-replacements-xml']
+                                     + files_to_format,
+                                     stderr=subprocess.STDOUT).decode('utf8')
+
+    to_fix = []
+    for line in output.split('\n'):
+        if 'offset' in line:
+            to_fix.append(line)
 
-try:
-    cmd = [CLANG_FORMAT, '-i'] + files_to_format
-    subprocess.check_output(cmd, stderr=subprocess.STDOUT)
-except Exception as e:
-    print(e)
-    print(' '.join(cmd))
-    raise
+    if len(to_fix) > 0:
+        print("clang-format checks failed, run 'make format' to fix")
+        sys.exit(-1)
+else:
+    try:
+        cmd = [CLANG_FORMAT, '-i'] + files_to_format
+        subprocess.check_output(cmd, stderr=subprocess.STDOUT)
+    except Exception as e:
+        print(e)
+        print(' '.join(cmd))
+        raise

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For
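
The check-format branch in the patch above decides pass or fail by looking for replacement entries in the output of `clang-format -output-replacements-xml`. The sketch below isolates that decision so it can be exercised without running clang-format. It is a variation, not the merged code: it matches the `<replacement ` element prefix instead of the substring `offset`, and the function names and the sample XML are illustrative assumptions about the typical shape of the output.

```python
def count_replacements(xml_output):
    """Count replacement entries in clang-format -output-replacements-xml text.

    The trailing space in the pattern keeps the enclosing <replacements ...>
    element from being counted.
    """
    return sum(1 for line in xml_output.splitlines() if "<replacement " in line)


def check_format(xml_output):
    """Return a process exit code: 0 when nothing needs reformatting, else 1."""
    n = count_replacements(xml_output)
    if n > 0:
        print("clang-format suggested {} change(s), run 'make format' to fix".format(n))
        return 1
    return 0


# Illustrative samples only; real output depends on the clang-format version.
CLEAN = """<?xml version='1.0'?>
<replacements xml:space='preserve' incomplete_format='false'>
</replacements>
"""

DIRTY = """<?xml version='1.0'?>
<replacements xml:space='preserve' incomplete_format='false'>
<replacement offset='120' length='1'>  </replacement>
<replacement offset='310' length='0'> </replacement>
</replacements>
"""
```

Working line by line, as the patch does, sidesteps the fact that passing several files yields several XML documents in one output stream, which a single XML parse would reject.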
[jira] [Resolved] (ARROW-1728) [C++] Run clang-format checks in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-1728.
--------------------------------
Resolution: Fixed

Issue resolved by pull request 1251
[https://github.com/apache/arrow/pull/1251]
[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222463#comment-16222463 ]

ASF GitHub Bot commented on ARROW-1728:
---------------------------------------

wesm commented on issue #1251: ARROW-1728: [C++] Run clang-format checks in Travis CI
URL: https://github.com/apache/arrow/pull/1251#issuecomment-339989949

+1, I'm going to go out on a limb here and merge this. If failing builds due to clang-format becomes onerous, I will be happy to revisit the issue
[jira] [Commented] (ARROW-1609) Plasma: Build fails with Xcode 9.0
[ https://issues.apache.org/jira/browse/ARROW-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222309#comment-16222309 ]

ASF GitHub Bot commented on ARROW-1609:
---------------------------------------

xhochy commented on issue #1144: ARROW-1609: [Plasma] Xcode 9 compilation workaround
URL: https://github.com/apache/arrow/pull/1144#issuecomment-339962409

@pcmoritz Do you also see these problems or is there something non-standard in my setup?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Plasma: Build fails with Xcode 9.0
> ----------------------------------
>
> Key: ARROW-1609
> URL: https://issues.apache.org/jira/browse/ARROW-1609
> Project: Apache Arrow
> Issue Type: Bug
> Components: Plasma (C++)
> Affects Versions: 0.7.0
> Reporter: Uwe L. Korn
> Assignee: Philipp Moritz
> Labels: pull-request-available
> Fix For: 0.8.0
>
> Tensorflow has the same issue: https://github.com/tensorflow/tensorflow/issues/13220
> {code}
> [4/102] Building CXX object src/plasma/CMakeFiles/plasma_store.dir/store.cc.o
> FAILED: src/plasma/CMakeFiles/plasma_store.dir/store.cc.o
> /usr/local/bin/ccache /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -isystem /Users/ukorn/miniconda3/envs/pyarrow-dev/include -isystem googletest_ep-prefix/src/googletest_ep/include -isystem gbenchmark_ep/src/gbenchmark_ep-install/include -isystem ../thirdparty/hadoop/include -I../src -isystem /Users/ukorn/miniconda3/envs/pyarrow-dev/include/python3.6m -I../src/plasma -I../src/plasma/thirdparty -I../src/plasma/.. -O3 -DNDEBUG -Wall -std=c++11 -msse3 -stdlib=libc++ -Qunused-arguments -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200809L -fPIC -O3 -DNDEBUG -std=gnu++11 -MD -MT src/plasma/CMakeFiles/plasma_store.dir/store.cc.o -MF src/plasma/CMakeFiles/plasma_store.dir/store.cc.o.d -o src/plasma/CMakeFiles/plasma_store.dir/store.cc.o -c ../src/plasma/store.cc
> In file included from ../src/plasma/store.cc:29:
> In file included from ../src/plasma/store.h:25:
> In file included from ../src/plasma/common.h:30:
> In file included from ../src/arrow/util/logging.h:22:
> In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/iostream:38:
> In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/ios:216:
> In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__locale:18:
> In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/mutex:189:
> In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__mutex_base:17:
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:156:1: error: unknown type name 'mach_port_t'
> mach_port_t __libcpp_thread_get_port();
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:300:1: error: unknown type name 'mach_port_t'
> mach_port_t __libcpp_thread_get_port() {
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:301:12: error: use of undeclared identifier 'pthread_mach_thread_np'
>     return pthread_mach_thread_np(pthread_self());
>            ^
> 3 errors generated.
> ninja: build stopped: subcommand failed.
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-1609) Plasma: Build fails with Xcode 9.0
[ https://issues.apache.org/jira/browse/ARROW-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222306#comment-16222306 ]

ASF GitHub Bot commented on ARROW-1609:
---------------------------------------

xhochy commented on issue #1144: ARROW-1609: [Plasma] Xcode 9 compilation workaround
URL: https://github.com/apache/arrow/pull/1144#issuecomment-339962156

I'm seeing this error now also pop up in the unittests of plasma on macOS.
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222196#comment-16222196 ]

ASF GitHub Bot commented on ARROW-1555:
---------------------------------------

benjigoldberg commented on issue #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339946849

@wesm my username on JIRA is `benjigoldberg`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [Python] write_to_dataset on s3
> -------------------------------
>
> Key: ARROW-1555
> URL: https://issues.apache.org/jira/browse/ARROW-1555
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Young-Jun Ko
> Assignee: Florian Jetter
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 0.8.0
>
> When writing an Arrow table to S3, I get a NotImplemented exception.
> The root cause is in _ensure_filesystem and can be reproduced as follows:
> import pyarrow
> import pyarrow.parquet as pqa
> import s3fs
> s3 = s3fs.S3FileSystem()
> pqa._ensure_filesystem(s3).exists("anything")
> It appears that the S3FSWrapper that is instantiated in _ensure_filesystem does not expose the exists method of s3.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
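
The failure mode described in this report, a wrapper whose base class raises NotImplementedError for a method the wrapped filesystem actually supports, can be illustrated with a small self-contained sketch. All class names here are hypothetical stand-ins (no pyarrow or s3fs code is used), and the fix shown is simple delegation, which may differ from what the linked pull request does.

```python
class BaseFileSystem:
    """Stand-in for an abstract filesystem interface (hypothetical)."""

    def exists(self, path):
        raise NotImplementedError


class FakeS3FileSystem:
    """Stand-in for a third-party filesystem object that implements exists()."""

    def __init__(self, keys):
        self._keys = set(keys)

    def exists(self, path):
        return path in self._keys


class BrokenWrapper(BaseFileSystem):
    """Wraps a filesystem but forgets to forward exists()."""

    def __init__(self, fs):
        self.fs = fs


class DelegatingWrapper(BaseFileSystem):
    """Wraps a filesystem and forwards exists() to the wrapped object."""

    def __init__(self, fs):
        self.fs = fs

    def exists(self, path):
        return self.fs.exists(path)


s3 = FakeS3FileSystem(["bucket/data.parquet"])
wrapped = DelegatingWrapper(s3)
print(wrapped.exists("bucket/data.parquet"))
```

With `BrokenWrapper`, the same call falls through to the base class and raises NotImplementedError, which matches the symptom in the report.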