[jira] [Commented] (ARROW-1743) Table to_pandas fails when index contains categorical column

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223173#comment-16223173
 ] 

ASF GitHub Bot commented on ARROW-1743:
---

Licht-T opened a new pull request #1260: ARROW-1743: [Python] Avoid non-array 
writable check
URL: https://github.com/apache/arrow/pull/1260
 
 
   This closes 
[ARROW-1743](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-1743).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Table to_pandas fails when index contains categorical column
> 
>
> Key: ARROW-1743
> URL: https://issues.apache.org/jira/browse/ARROW-1743
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Brian Pendleton
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
>
> Categorical columns in the index of a dataframe are causing a roundtrip 
> failure.  
> {code}
> >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
> >>> df['a'] = df.a.astype('category')
> >>> df = df.set_index('a')
> >>> tbl = pa.Table.from_pandas(df)
> >>> tbl.to_pandas()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "table.pxi", line 881, in pyarrow.lib.Table.to_pandas
>   File "C:\Users\bpendlet\Miniconda3\envs\panpy3\lib\site-packages\pyarrow\pandas_compat.py", line 303, in table_to_blockmanager
>     if not values.flags.writeable:
> AttributeError: 'Categorical' object has no attribute 'flags'
> {code}
> Works as expected when the index is not categorical:
> {code}
> >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
> >>> df = df.set_index('a')
> >>> tbl = pa.Table.from_pandas(df)
> >>> tbl.to_pandas()
>    b
> a
> 1  1
> 2  2
> 3  3
> {code}
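
The traceback points at the unconditional `values.flags.writeable` check in `table_to_blockmanager`: a pandas `Categorical` is not a NumPy array and has no `flags` attribute. A minimal sketch of a guarded check follows. It is not the change made in pull request #1260; the function name `needs_writable_copy` and the stand-in objects are hypothetical, chosen so that the example runs without pandas or NumPy installed.

```python
from types import SimpleNamespace


def needs_writable_copy(values):
    """Return True only if `values` has NumPy-style flags and is read-only."""
    flags = getattr(values, "flags", None)
    if flags is None:
        # Objects such as a pandas Categorical expose no `flags`;
        # skip the writability check instead of raising AttributeError.
        return False
    return not flags.writeable


# Hypothetical stand-ins for a read-only ndarray, a writable ndarray,
# and a Categorical-like object without a `flags` attribute.
read_only_array = SimpleNamespace(flags=SimpleNamespace(writeable=False))
writable_array = SimpleNamespace(flags=SimpleNamespace(writeable=True))
categorical_like = object()

print(needs_writable_copy(read_only_array))
print(needs_writable_copy(writable_array))
print(needs_writable_copy(categorical_like))
```

Using `getattr` with a default keeps the check duck-typed; an explicit `isinstance(values, numpy.ndarray)` test would be an equally plausible guard.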



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1743) Table to_pandas fails when index contains categorical column

2017-10-27 Thread Licht Takeuchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Licht Takeuchi reassigned ARROW-1743:
-

Assignee: Licht Takeuchi






[jira] [Commented] (ARROW-1689) [Python] Categorical Indices Should Be Zero-Copy

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223122#comment-16223122
 ] 

ASF GitHub Bot commented on ARROW-1689:
---

Licht-T commented on issue #1237: ARROW-1689: [Python] Implement zero-copy 
conversions for DictionaryArray
URL: https://github.com/apache/arrow/pull/1237#issuecomment-340129218
 
 
   @wesm Wow! Thanks for the great idea! I hadn't thought of that!




> [Python] Categorical Indices Should Be Zero-Copy
> 
>
> Key: ARROW-1689
> URL: https://issues.apache.org/jira/browse/ARROW-1689
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Nick White
>Assignee: Nick White
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> It seems like 
> [WriteIndices|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L955-L981]
>  could reuse some of the logic in 
> [ConvertValuesZeroCopy|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L1348-L1385]
>  to avoid copying the integer indices array?
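
As a rough illustration of what zero-copy means for an integer indices buffer, the following sketch uses only the Python standard library. It does not use `WriteIndices` or `ConvertValuesZeroCopy`; `array` and `memoryview` merely stand in for an Arrow buffer and a view onto it, and the variable names are hypothetical.

```python
from array import array

# Stand-in for the integer indices buffer of a dictionary-encoded column.
indices = array("i", [0, 1, 0, 2])

# Zero-copy: the memoryview shares memory with `indices`.
view = memoryview(indices)

# Copy: a new array with its own storage.
copied = array("i", indices)

# Mutating the source is visible through the view but not through the copy.
indices[0] = 2
print(view[0], copied[0])
```

The trade-off is the usual one: the view costs no extra memory, but it is only valid while the underlying buffer stays alive and unchanged.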





[jira] [Commented] (ARROW-1689) [Python] Categorical Indices Should Be Zero-Copy

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223121#comment-16223121
 ] 

ASF GitHub Bot commented on ARROW-1689:
---

Licht-T commented on issue #1237: ARROW-1689: [Python] Implement zero-copy 
conversions for DictionaryArray
URL: https://github.com/apache/arrow/pull/1237#issuecomment-340129218
 
 
   @wesm Wow! Thanks for the good idea! I hadn't thought of that!
   









[jira] [Commented] (ARROW-1609) Plasma: Build fails with Xcode 9.0

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222877#comment-16222877
 ] 

ASF GitHub Bot commented on ARROW-1609:
---

pcmoritz commented on issue #1144: ARROW-1609: [Plasma] Xcode 9 compilation 
workaround
URL: https://github.com/apache/arrow/pull/1144#issuecomment-340094718
 
 
   I was still using Xcode 8, but I am installing version 9 now and will look into this.




> Plasma: Build fails with Xcode 9.0
> --
>
> Key: ARROW-1609
> URL: https://issues.apache.org/jira/browse/ARROW-1609
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Philipp Moritz
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Tensorflow has the same issue: 
> https://github.com/tensorflow/tensorflow/issues/13220
> {code}
> [4/102] Building CXX object src/plasma/CMakeFiles/plasma_store.dir/store.cc.o
> FAILED: src/plasma/CMakeFiles/plasma_store.dir/store.cc.o
> /usr/local/bin/ccache 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
>-isystem /Users/ukorn/miniconda3/envs/pyarrow-dev/include -isystem 
> googletest_ep-prefix/src/googletest_ep/include -isystem 
> gbenchmark_ep/src/gbenchmark_ep-install/include -isystem 
> ../thirdparty/hadoop/include -I../src -isystem 
> /Users/ukorn/miniconda3/envs/pyarrow-dev/include/python3.6m -I../src/plasma 
> -I../src/plasma/thirdparty -I../src/plasma/.. -O3 -DNDEBUG -Wall -std=c++11 
> -msse3 -stdlib=libc++  -Qunused-arguments  -D_XOPEN_SOURCE=500 
> -D_POSIX_C_SOURCE=200809L -fPIC -O3 -DNDEBUG   -std=gnu++11 -MD -MT 
> src/plasma/CMakeFiles/plasma_store.dir/store.cc.o -MF 
> src/plasma/CMakeFiles/plasma_store.dir/store.cc.o.d -o 
> src/plasma/CMakeFiles/plasma_store.dir/store.cc.o -c ../src/plasma/store.cc
> In file included from ../src/plasma/store.cc:29:
> In file included from ../src/plasma/store.h:25:
> In file included from ../src/plasma/common.h:30:
> In file included from ../src/arrow/util/logging.h:22:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/iostream:38:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/ios:216:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__locale:18:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/mutex:189:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__mutex_base:17:
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:156:1:
>  error: unknown type name 'mach_port_t'
> mach_port_t __libcpp_thread_get_port();
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:300:1:
>  error: unknown type name 'mach_port_t'
> mach_port_t __libcpp_thread_get_port() {
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:301:12:
>  error: use of undeclared identifier 'pthread_mach_thread_np'
> return pthread_mach_thread_np(pthread_self());
>^
> 3 errors generated.
> ninja: build stopped: subcommand failed.
> {code}





[jira] [Commented] (ARROW-1718) [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222844#comment-16222844
 ] 

ASF GitHub Bot commented on ARROW-1718:
---

xhochy commented on a change in pull request #1258: ARROW-1718: [C++/Python] 
Implement casts from timestamp to date32/64, properly handle NumPy 
datetime64[D] -> date32
URL: https://github.com/apache/arrow/pull/1258#discussion_r147518348
 
 

 ##
 File path: .travis.yml
 ##
 @@ -51,12 +51,12 @@ matrix:
 os: linux
 group: deprecated
 before_script:
-- export CC="gcc-4.9"
 
 Review comment:
   Is this expected to be in here?




> [Python] Implement casts from timestamp to date32/date64 and support in 
> Array.from_pandas
> -
>
> Key: ARROW-1718
> URL: https://issues.apache.org/jira/browse/ARROW-1718
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Bryan Cutler
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> When calling {{Array.from_pandas}} with a pandas.Series of dates and 
> specifying the desired pyarrow type, an error occurs.  If the type is not 
> specified then {{from_pandas}} will interpret the data as a timestamp type.
> {code}
> import pandas as pd
> import pyarrow as pa
> import datetime
> arr = pa.array([datetime.date(2017, 10, 23)])
> c = pa.Column.from_array("d", arr)
> s = c.to_pandas()
> print(s)
> # 0   2017-10-23
> # Name: d, dtype: datetime64[ns]
> result = pa.Array.from_pandas(s, type=pa.date32())
> print(result)
> """
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
>   File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 28, in array_format
>     values.append(value_format(x, 0))
>   File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 49, in value_format
>     return repr(x)
>   File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
>   File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368)
> ValueError: year is out of range
> """
> {code}
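
For context, Arrow's date32 type stores whole days since the UNIX epoch, while the pandas Series above holds `datetime64[ns]` values, that is, nanoseconds since the epoch. A cast between the two therefore has to divide by the number of nanoseconds per day. The sketch below shows only that arithmetic, using the standard library; the helper names are hypothetical and this is not the implementation in the pull request.

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)
NANOS_PER_DAY = 86400 * 10**9


def timestamp_ns_to_date32(ts_ns):
    """Convert nanoseconds since the epoch to whole days since the epoch."""
    # Floor division also rounds pre-epoch (negative) timestamps toward
    # the earlier day, which is what a calendar date requires.
    return ts_ns // NANOS_PER_DAY


def date32_to_date(days):
    """Convert days since the epoch back to a datetime.date."""
    return EPOCH + datetime.timedelta(days=days)


# The date used in the report, expressed as midnight UTC in nanoseconds.
report_date = datetime.date(2017, 10, 23)
ts_ns = (report_date - EPOCH).days * NANOS_PER_DAY

days = timestamp_ns_to_date32(ts_ns)
print(days, date32_to_date(days))
```

Passing the raw nanosecond count where a day count is expected would yield a value far outside the range `datetime.date` can represent, which is one plausible reading of the `year is out of range` error above.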





[jira] [Commented] (ARROW-1727) [Format] Expand Arrow streaming format to permit new dictionaries and deltas / additions to existing dictionaries

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222832#comment-16222832
 ] 

ASF GitHub Bot commented on ARROW-1727:
---

TheNeuralBit commented on a change in pull request #1257: ARROW-1727: [Format] 
Expand Arrow streaming format to permit deltas / additions to existing 
dictionaries
URL: https://github.com/apache/arrow/pull/1257#discussion_r147517384
 
 

 ##
 File path: format/IPC.md
 ##
 @@ -67,15 +67,18 @@ We provide a streaming format for record batches. It is presented as a sequence
 of encapsulated messages, each of which follows the format above. The schema
 comes first in the stream, and it is the same for all of the record batches
 that follow. If any fields in the schema are dictionary-encoded, one or more
-`DictionaryBatch` messages will follow the schema.
+`DictionaryBatch` messages will be included. `DictionaryBatch` and
+`RecordBatch` messages may be interleaved, but before any dictionary key is used
+in a `RecordBatch` it should be defined in a `DictionaryBatch`.
 
 ```
 
 
 ...
-
 
 ...
+
+...
 
 Review comment:
   Yeah, that's fair. I can tweak it to make it clear that any dictionaries after 
the first record batch should be modifying the originals.
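
The ordering rule in the proposed wording (a dictionary key must be defined in a `DictionaryBatch` before a `RecordBatch` uses it) can be checked with a small reader-side sketch. The tuple-based message encoding and the function name `validate_stream` are assumptions made for illustration, not the Arrow IPC wire format, and the sketch assumes every later dictionary batch appends to the original.

```python
def validate_stream(messages):
    """Raise ValueError if a record batch uses a key that is not yet defined.

    Each message is a tuple (hypothetical encoding):
      ("dictionary", dict_id, values)  -- adds entries to a dictionary
      ("record", dict_id, indices)     -- refers to entries by position
    """
    sizes = {}  # dictionary id -> number of entries defined so far
    for message in messages:
        kind = message[0]
        if kind == "dictionary":
            _, dict_id, values = message
            sizes[dict_id] = sizes.get(dict_id, 0) + len(values)
        elif kind == "record":
            _, dict_id, indices = message
            defined = sizes.get(dict_id, 0)
            for index in indices:
                if not 0 <= index < defined:
                    raise ValueError(
                        "key %d of dictionary %d used before definition"
                        % (index, dict_id)
                    )
    return True


interleaved = [
    ("dictionary", 0, ["a", "b"]),
    ("record", 0, [0, 1]),
    ("dictionary", 0, ["c"]),
    ("record", 0, [2]),
]
print(validate_stream(interleaved))
```

A stream that references key 1 after only one entry has been defined would raise `ValueError` under this check.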




> [Format] Expand Arrow streaming format to permit new dictionaries and deltas 
> / additions to existing dictionaries
> -
>
> Key: ARROW-1727
> URL: https://issues.apache.org/jira/browse/ARROW-1727
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1727) [Format] Expand Arrow streaming format to permit new dictionaries and deltas / additions to existing dictionaries

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222825#comment-16222825
 ] 

ASF GitHub Bot commented on ARROW-1727:
---

TheNeuralBit commented on a change in pull request #1257: ARROW-1727: [Format] 
Expand Arrow streaming format to permit deltas / additions to existing 
dictionaries
URL: https://github.com/apache/arrow/pull/1257#discussion_r147516370
 
 

 ##
 File path: format/IPC.md
 ##
 @@ -189,6 +197,10 @@ in the schema, so that dictionaries can even be used for multiple fields. See
 the [Physical Layout][4] document for more about the semantics of
 dictionary-encoded data.
 
+The dictionary `isDelta` flag allows dictionary batches to be modified mid-stream.
+A dictionary batch with `isDelta` set indicates that its vector should be
+concatenated with those of any previous batches with the same `id`.
 
 Review comment:
   Would something like the example in my initial email work?
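
One way to picture the `isDelta` semantics described in the diff is the sketch below: a delta batch is concatenated onto the existing dictionary with the same `id`, and later record batches can then refer to the new positions. The helper names are hypothetical, plain Python lists stand in for dictionary vectors, and treating a non-delta batch as a fresh definition is an assumption of this sketch rather than something the diff states.

```python
def apply_dictionary_batch(dictionaries, dict_id, values, is_delta):
    """Update `dictionaries` in place with one dictionary batch."""
    if is_delta:
        # Delta: concatenate onto whatever was defined before for this id.
        dictionaries[dict_id] = dictionaries.get(dict_id, []) + list(values)
    else:
        # Assumption: a non-delta batch defines the dictionary from scratch.
        dictionaries[dict_id] = list(values)


def decode(dictionaries, dict_id, indices):
    """Map dictionary indices of a record batch back to their values."""
    return [dictionaries[dict_id][i] for i in indices]


dictionaries = {}
apply_dictionary_batch(dictionaries, 0, ["a", "b"], is_delta=False)
first = decode(dictionaries, 0, [0, 1, 0])

apply_dictionary_batch(dictionaries, 0, ["c"], is_delta=True)
second = decode(dictionaries, 0, [2, 0])

print(first, second)
```

Because deltas only append, indices handed out by earlier record batches remain valid after the dictionary grows.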










[jira] [Commented] (ARROW-1663) [Java] Follow up on ARROW-1347 and make schema backward compatible

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222791#comment-16222791
 ] 

ASF GitHub Bot commented on ARROW-1663:
---

BryanCutler commented on a change in pull request #1193: ARROW-1663: use 
consistent name for null and not-null in FixedSizeLis…
URL: https://github.com/apache/arrow/pull/1193#discussion_r147513746
 
 

 ##
 File path: java/vector/src/test/java/org/apache/arrow/vector/pojo/TestConvert.java
 ##
 @@ -48,6 +55,73 @@
  */
 public class TestConvert {
 
+  private static String badSchemaJson = "{\n" +
+"  \"fields\" : [ {\n" +
+"\"nullable\" : true,\n" +
+"\"type\" : {\n" +
+"  \"name\" : \"struct\"\n" +
+"},\n" +
+"\"children\" : [ {\n" +
+"  \"nullable\" : true,\n" +
+"  \"type\" : {\n" +
+"\"name\" : \"list\"\n" +
+"  },\n" +
+"  \"children\" : [ {\n" +
+"\"nullable\" : true,\n" +
+"\"type\" : {\n" +
+"  \"name\" : \"null\"\n" +
+"},\n" +
+"\"children\" : [ ],\n" +
+"\"typeLayout\" : {\n" +
+"  \"vectors\" : [ ]\n" +
+"},\n" +
+"\"name\" : \"[DEFAULT]\"\n" +
+"  } ],\n" +
+"  \"typeLayout\" : {\n" +
+"\"vectors\" : [ {\n" +
+"  \"type\" : \"VALIDITY\",\n" +
+"  \"typeBitWidth\" : 1\n" +
+"}, {\n" +
+"  \"type\" : \"OFFSET\",\n" +
+"  \"typeBitWidth\" : 32\n" +
+"} ]\n" +
+"  },\n" +
+"  \"name\" : \"list\"\n" +
+"}, {\n" +
+"  \"nullable\" : true,\n" +
+"  \"type\" : {\n" +
+"\"name\" : \"fixedsizelist\",\n" +
+"\"listSize\" : 5\n" +
+"  },\n" +
+"  \"children\" : [ {\n" +
+"\"nullable\" : true,\n" +
+"\"type\" : {\n" +
+"  \"name\" : \"null\"\n" +
+"},\n" +
+"\"children\" : [ ],\n" +
+"\"typeLayout\" : {\n" +
+"  \"vectors\" : [ ]\n" +
+"},\n" +
+"\"name\" : \"$data$\"\n" +
+"  } ],\n" +
+"  \"typeLayout\" : {\n" +
+"\"vectors\" : [ {\n" +
+"  \"type\" : \"VALIDITY\",\n" +
+"  \"typeBitWidth\" : 1\n" +
+"} ]\n" +
+"  },\n" +
+"  \"name\" : \"fixedlist\"\n" +
+"} ],\n" +
+"\"typeLayout\" : {\n" +
+"  \"vectors\" : [ {\n" +
+"\"type\" : \"VALIDITY\",\n" +
+"\"typeBitWidth\" : 1\n" +
+"  } ]\n" +
+"},\n" +
+"\"name\" : \"a\"\n" +
+"  } ]\n" +
+"}\n";
 
 Review comment:
   This is a large block that's very hard to read just to check the name of a 
field.  Is there a more compact way you can test this?




> [Java] Follow up on ARROW-1347 and make schema backward compatible
> --
>
> Key: ARROW-1663
> URL: https://issues.apache.org/jira/browse/ARROW-1663
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> ARROW-1347 changed ListVector so that its child field is named $data$ instead 
> of [DEFAULT], but FixedSizeListVector was left behind.
> Another case is backward compatibility: if a schema was created before 
> ARROW-1347 was in place, an application may still suffer from side effects, 
> because that schema would not be updated by the new code.
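
The backward-compatibility concern can be sketched as a normalization pass over a JSON schema: any child of a list-like field that still carries the legacy name is renamed on read. This is an illustration in Python rather than the Java change under review; the function name `normalize_field` and the trimmed-down schema are hypothetical, and only the names `[DEFAULT]`, `$data$`, `list`, and `fixedsizelist` come from the thread.

```python
import json

LEGACY_CHILD_NAME = "[DEFAULT]"
DATA_VECTOR_NAME = "$data$"
LIST_TYPE_NAMES = {"list", "fixedsizelist"}


def normalize_field(field):
    """Return a copy of `field` whose list children use the current name."""
    children = [normalize_field(child) for child in field.get("children", [])]
    if field["type"]["name"] in LIST_TYPE_NAMES:
        for child in children:
            if child["name"] == LEGACY_CHILD_NAME:
                child["name"] = DATA_VECTOR_NAME
    # Shallow copy with the rewritten children; the input is left untouched.
    return dict(field, children=children)


# A deliberately small schema written by "old" code (hypothetical).
old_schema = json.loads("""
{"fields": [{"name": "a", "type": {"name": "list"},
             "children": [{"name": "[DEFAULT]", "type": {"name": "null"},
                           "children": []}]}]}
""")

new_fields = [normalize_field(f) for f in old_schema["fields"]]
print(new_fields[0]["children"][0]["name"])
```

A compact schema like this one is also enough to exercise the renaming without a large JSON literal in the test.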





[jira] [Commented] (ARROW-1663) [Java] Follow up on ARROW-1347 and make schema backward compatible

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222787#comment-16222787
 ] 

ASF GitHub Bot commented on ARROW-1663:
---

BryanCutler commented on a change in pull request #1193: ARROW-1663: use 
consistent name for null and not-null in FixedSizeLis…
URL: https://github.com/apache/arrow/pull/1193#discussion_r147513485
 
 

 ##
 File path: java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java
 ##
 @@ -633,11 +633,11 @@ public void testGetBufferAddress() throws Exception {
   public void testConsistentChildName() throws Exception {
 try (ListVector listVector = ListVector.empty("sourceVector", allocator)) {
   String emptyListStr = listVector.getField().toString();
-  assertTrue(emptyListStr.contains(ListVector.DATA_VECTOR_NAME));
+  Assert.assertTrue(emptyListStr.contains(ListVector.DATA_VECTOR_NAME));
 
-  listVector.addOrGetVector(FieldType.nullable(MinorType.INT.getType()));
+  listVector.addOrGetVector(FieldType.nullable(Types.MinorType.INT.getType()));
   String emptyVectorStr = listVector.getField().toString();
-  assertTrue(emptyVectorStr.contains(ListVector.DATA_VECTOR_NAME));
+  Assert.assertTrue(emptyVectorStr.contains(ListVector.DATA_VECTOR_NAME));
 
 Review comment:
   The rest of the tests in this file match the way it was previously.  If 
you're going to change it here then I would prefer making the others consistent 
as well.









[jira] [Commented] (ARROW-1718) [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222775#comment-16222775
 ] 

ASF GitHub Bot commented on ARROW-1718:
---

BryanCutler commented on issue #1258: ARROW-1718: [C++/Python] Implement casts 
from timestamp to date32/64, properly handle NumPy datetime64[D] -> date32
URL: https://github.com/apache/arrow/pull/1258#issuecomment-340080163
 
 
   +1 looks good, thanks for doing this!









[jira] [Updated] (ARROW-1743) Table to_pandas fails when index contains categorical column

2017-10-27 Thread Brian Pendleton (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Pendleton updated ARROW-1743:
---
Description: 
Categorical columns in the index of a dataframe are causing a roundtrip 
failure.  

{code}
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
>>> df['a'] = df.a.astype('category')
>>> df = df.set_index('a')
>>> tbl = pa.Table.from_pandas(df)
>>> tbl.to_pandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "table.pxi", line 881, in pyarrow.lib.Table.to_pandas
  File "C:\Users\bpendlet\Miniconda3\envs\panpy3\lib\site-packages\pyarrow\pandas_compat.py", line 303, in table_to_blockmanager
    if not values.flags.writeable:
AttributeError: 'Categorical' object has no attribute 'flags'
{code}


Works as expected when the index is not categorical:
{code}
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
>>> df = df.set_index('a')
>>> tbl = pa.Table.from_pandas(df)
>>> tbl.to_pandas()
   b
a
1  1
2  2
3  3
{code}


  was:
Categorical columns in the index of a dataframe are causing a roundtrip 
failure.  

{code}
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
>>> df['a'] = df.a.astype('category')
>>> df = df.set_index('a')
>>> tbl = pa.Table.from_pandas(df)
>>> tbl.to_pandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "table.pxi", line 881, in pyarrow.lib.Table.to_pandas
  File "C:\Users\bpendlet\Miniconda3\envs\panpy3\lib\site-packages\pyarrow\pandas_compat.py", line 303, in table_to_blockmanager
    if not values.flags.writeable:
AttributeError: 'Categorical' object has no attribute 'flags'
{code}









[jira] [Commented] (ARROW-1718) [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222737#comment-16222737
 ] 

ASF GitHub Bot commented on ARROW-1718:
---

wesm opened a new pull request #1258: ARROW-1718: [C++/Python] Implement casts 
from timestamp to date32/64, properly handle NumPy datetime64[D] -> date32
URL: https://github.com/apache/arrow/pull/1258
 
 
   This was sort of a can of worms. cc @xhochy @cpcloud @BryanCutler 




> [Python] Implement casts from timestamp to date32/date64 and support in 
> Array.from_pandas
> -
>
> Key: ARROW-1718
> URL: https://issues.apache.org/jira/browse/ARROW-1718
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Bryan Cutler
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> When calling {{Array.from_pandas}} with a pandas.Series of dates and 
> specifying the desired pyarrow type, an error occurs.  If the type is not 
> specified then {{from_pandas}} will interpret the data as a timestamp type.
> {code}
> import pandas as pd
> import pyarrow as pa
> import datetime
> arr = pa.array([datetime.date(2017, 10, 23)])
> c = pa.Column.from_array("d", arr)
> s = c.to_pandas()
> print(s)
> # 0   2017-10-23
> # Name: d, dtype: datetime64[ns]
> result = pa.Array.from_pandas(s, type=pa.date32())
> print(result)
> """
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ 
> (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
>   File 
> "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py",
>  line 28, in array_format
> values.append(value_format(x, 0))
>   File 
> "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py",
>  line 49, in value_format
> return repr(x)
>   File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ 
> (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
>   File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py 
> (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368)
> ValueError: year is out of range
> """
> {code}
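
For illustration, the arithmetic a timestamp-to-date32 cast has to perform can be sketched in plain Python. This is a minimal sketch under stated assumptions, not pyarrow's implementation: it assumes the input holds nanoseconds since the UNIX epoch (as in the `datetime64[ns]` series above) and that date32 stores whole days since 1970-01-01. The helper names `ns_timestamp_to_date32` and `date32_to_date` are hypothetical.

```python
import datetime

NANOS_PER_DAY = 86400 * 10**9
EPOCH = datetime.date(1970, 1, 1)


def ns_timestamp_to_date32(ns_since_epoch):
    """Hypothetical helper: nanoseconds since epoch -> whole days since epoch."""
    # Floor division keeps pre-1970 timestamps on the correct calendar day.
    return ns_since_epoch // NANOS_PER_DAY


def date32_to_date(days_since_epoch):
    """Hypothetical helper: days since epoch -> datetime.date."""
    return EPOCH + datetime.timedelta(days=days_since_epoch)


# 2017-10-23 00:00:00 expressed in nanoseconds since the epoch.
ns_value = (datetime.date(2017, 10, 23) - EPOCH).days * NANOS_PER_DAY
days = ns_timestamp_to_date32(ns_value)
print(date32_to_date(days))
```

The division is the essential step: a raw nanosecond count is many orders of magnitude larger than any valid day count, so it cannot be read back as a date without it.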



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1718) [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas

2017-10-27 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1718:
--
Labels: pull-request-available  (was: )



[jira] [Created] (ARROW-1742) C++: clang-format is not detected correct on OSX anymore

2017-10-27 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-1742:
--

 Summary: C++: clang-format is not detected correct on OSX anymore
 Key: ARROW-1742
 URL: https://issues.apache.org/jira/browse/ARROW-1742
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.8.0


Paths changed slightly in recent homebrew builds. We need to adjust our script 
to call the correct executable again.
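
One way such a lookup could be made tolerant of changing install paths is sketched below. It is only a sketch under assumptions, not the logic of Arrow's build scripts: the function name `find_clang_format`, the candidate executable names, and the idea of passing extra directories to search are all hypothetical.

```python
import os
import shutil


def find_clang_format(version, extra_dirs=()):
    """Return the path of the first matching clang-format executable, or None.

    Hypothetical helper: tries a versioned name first, then the plain name,
    searching extra_dirs before the directories on PATH.
    """
    names = ["clang-format-{0}".format(version), "clang-format"]
    dirs = list(extra_dirs) + os.environ.get("PATH", "").split(os.pathsep)
    search_path = os.pathsep.join(d for d in dirs if d)
    for name in names:
        found = shutil.which(name, path=search_path)
        if found:
            return found
    return None
```

A caller would pass whatever directory the package manager currently installs into, for example `find_clang_format("4.0", ["/path/to/llvm/bin"])`, where the directory shown is a placeholder.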





[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222508#comment-16222508
 ] 

ASF GitHub Bot commented on ARROW-1728:
---

xhochy commented on issue #1251: ARROW-1728: [C++] Run clang-format checks in 
Travis CI
URL: https://github.com/apache/arrow/pull/1251#issuecomment-339996289
 
 
   On macOS it is a simple `brew install llvm@{version}`. I think the 
autodetection of the correct version is currently broken there and only works 
for me because I have a quick-fix symlink in place; I opened 
https://issues.apache.org/jira/browse/ARROW-1742 for this.




> [C++] Run clang-format checks in Travis CI
> --
>
> Key: ARROW-1728
> URL: https://issues.apache.org/jira/browse/ARROW-1728
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> I think it's reasonable to expect contributors to run clang-format on their 
> code. This may lead to a higher number of failed builds but will eliminate 
> noise diffs in unrelated patches





[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222500#comment-16222500
 ] 

ASF GitHub Bot commented on ARROW-1728:
---

wesm commented on issue #1251: ARROW-1728: [C++] Run clang-format checks in 
Travis CI
URL: https://github.com/apache/arrow/pull/1251#issuecomment-339995039
 
 
   Now that we have gitbox, unless contributors uncheck the "allow edits from 
maintainers" box, we can also run clang-format ourselves if that's blocking 
a time-sensitive PR from getting merged. The one downside is that getting the 
right version of clang-format on Windows and macOS may be more work than on 
Linux; we should probably add instructions to help contributors.




[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222491#comment-16222491
 ] 

ASF GitHub Bot commented on ARROW-1728:
---

xhochy commented on issue #1251: ARROW-1728: [C++] Run clang-format checks in 
Travis CI
URL: https://github.com/apache/arrow/pull/1251#issuecomment-339994265
 
 
   @wesm I'm ok with this. If this gets onerous, it might be worth taking a 
look at how easy it would be to program a bot that could `clang-format` 
someone's PR (but I guess this is not worth the effort just for Arrow).




[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222465#comment-16222465
 ] 

ASF GitHub Bot commented on ARROW-1728:
---

wesm closed pull request #1251: ARROW-1728: [C++] Run clang-format checks in 
Travis CI
URL: https://github.com/apache/arrow/pull/1251
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/.travis.yml b/.travis.yml
index c682a9d9d..039ae9520 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -56,6 +56,8 @@ matrix:
 - export ARROW_TRAVIS_USE_TOOLCHAIN=1
 - export ARROW_TRAVIS_VALGRIND=1
 - export ARROW_TRAVIS_PLASMA=1
+- export ARROW_TRAVIS_CLANG_FORMAT=1
+- $TRAVIS_BUILD_DIR/ci/travis_install_clang_tools.sh
 - $TRAVIS_BUILD_DIR/ci/travis_lint.sh
 - $TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh
 script:
diff --git a/ci/travis_install_clang_tools.sh b/ci/travis_install_clang_tools.sh
old mode 100644
new mode 100755
diff --git a/ci/travis_lint.sh b/ci/travis_lint.sh
index 8c956646c..e234b7b01 100755
--- a/ci/travis_lint.sh
+++ b/ci/travis_lint.sh
@@ -26,6 +26,10 @@ pushd $TRAVIS_BUILD_DIR/cpp/lint
 cmake ..
 make lint
 
+if [ "$ARROW_TRAVIS_CLANG_FORMAT" == "1" ]; then
+  make check-format
+fi
+
 popd
 
 # Fail fast on style checks
diff --git a/ci/travis_script_cpp.sh b/ci/travis_script_cpp.sh
index a2079036c..3d61bc5b8 100755
--- a/ci/travis_script_cpp.sh
+++ b/ci/travis_script_cpp.sh
@@ -27,14 +27,6 @@ git archive HEAD --prefix=apache-arrow/ --output=arrow-src.tar.gz
 
 pushd $CPP_BUILD_DIR
 
-# ARROW-209: checks depending on the LLVM toolchain are disabled temporarily
-# until we are able to install the full LLVM toolchain in Travis CI again
-
-# if [ $TRAVIS_OS_NAME == "linux" ]; then
-#   make check-format
-#   make check-clang-tidy
-# fi
-
 ctest -VV -L unittest
 
 popd
diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index a159b1e56..d8dc5df88 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -446,10 +446,10 @@ add_custom_target(format ${BUILD_SUPPORT_DIR}/run_clang_format.py
 # runs clang format and exits with a non-zero exit code if any files need to be reformatted
 
 # TODO(wesm): Make this work in run_clang_format.py
-# add_custom_target(check-format ${BUILD_SUPPORT_DIR}/run_clang_format.py
-#   ${CLANG_FORMAT_VERSION}
-#   ${BUILD_SUPPORT_DIR}/clang_format_exclusions.txt
-#   ${CMAKE_CURRENT_SOURCE_DIR}/src 1)
+add_custom_target(check-format ${BUILD_SUPPORT_DIR}/run_clang_format.py
+   ${CLANG_FORMAT_VERSION}
+   ${BUILD_SUPPORT_DIR}/clang_format_exclusions.txt
+   ${CMAKE_CURRENT_SOURCE_DIR}/src 1)
 
 
 # "make clang-tidy" and "make check-clang-tidy" targets
diff --git a/cpp/build-support/run_clang_format.py b/cpp/build-support/run_clang_format.py
index f1a448f53..fcf39ecc6 100755
--- a/cpp/build-support/run_clang_format.py
+++ b/cpp/build-support/run_clang_format.py
@@ -31,6 +31,12 @@
 EXCLUDE_GLOBS_FILENAME = sys.argv[2]
 SOURCE_DIR = sys.argv[3]
 
+if len(sys.argv) > 4:
+    CHECK_FORMAT = int(sys.argv[4]) == 1
+else:
+    CHECK_FORMAT = False
+
+
 exclude_globs = [line.strip() for line in open(EXCLUDE_GLOBS_FILENAME, "r")]
 
 files_to_format = []
@@ -49,18 +55,24 @@
 if not excluded:
 files_to_format.append(name)
 
-# TODO(wesm): Port this to work with Python, for check-format
-# NUM_CORRECTIONS=`$CLANG_FORMAT -output-replacements-xml  $@ |
-# grep offset | wc -l`
-# if [ "$NUM_CORRECTIONS" -gt "0" ]; then
-#   echo "clang-format suggested changes, please run 'make format'"
-#   exit 1
-# fi
+if CHECK_FORMAT:
+    output = subprocess.check_output([CLANG_FORMAT, '-output-replacements-xml']
+                                     + files_to_format,
+                                     stderr=subprocess.STDOUT).decode('utf8')
+
+    to_fix = []
+    for line in output.split('\n'):
+        if 'offset' in line:
+            to_fix.append(line)
 
-try:
-    cmd = [CLANG_FORMAT, '-i'] + files_to_format
-    subprocess.check_output(cmd, stderr=subprocess.STDOUT)
-except Exception as e:
-    print(e)
-    print(' '.join(cmd))
-    raise
+    if len(to_fix) > 0:
+        print("clang-format checks failed, run 'make format' to fix")
+        sys.exit(-1)
+else:
+    try:
+        cmd = [CLANG_FORMAT, '-i'] + files_to_format
+        subprocess.check_output(cmd, stderr=subprocess.STDOUT)
+    except Exception as e:
+        print(e)
+        print(' '.join(cmd))
+        raise
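
The check added in the diff above boils down to scanning clang-format's `-output-replacements-xml` output for lines that mention an `offset`. The following is a minimal standalone sketch of that idea so it can be tried without clang-format installed; the function name `lines_needing_fixes` is hypothetical, and the two XML strings are hand-written approximations of what clang-format emits, not captured output.

```python
def lines_needing_fixes(replacements_xml):
    """Return the lines of the replacement XML that describe an edit."""
    return [line for line in replacements_xml.split('\n') if 'offset' in line]


# Approximate shape of the XML when a file is already formatted.
clean = (
    "<?xml version='1.0'?>\n"
    "<replacements xml:space='preserve' incomplete_format='false'>\n"
    "</replacements>\n"
)

# Approximate shape of the XML when one replacement is suggested.
dirty = (
    "<?xml version='1.0'?>\n"
    "<replacements xml:space='preserve' incomplete_format='false'>\n"
    "<replacement offset='12' length='2'> </replacement>\n"
    "</replacements>\n"
)

for name, xml in [("clean", clean), ("dirty", dirty)]:
    status = "needs formatting" if lines_needing_fixes(xml) else "ok"
    print(name, status)
```

Matching on the substring `offset` is a deliberately simple heuristic; it avoids parsing the XML at the cost of relying on that attribute name appearing only in replacement entries.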


 



[jira] [Resolved] (ARROW-1728) [C++] Run clang-format checks in Travis CI

2017-10-27 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1728.
-
Resolution: Fixed

Issue resolved by pull request 1251
[https://github.com/apache/arrow/pull/1251]



[jira] [Commented] (ARROW-1728) [C++] Run clang-format checks in Travis CI

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222463#comment-16222463
 ] 

ASF GitHub Bot commented on ARROW-1728:
---

wesm commented on issue #1251: ARROW-1728: [C++] Run clang-format checks in 
Travis CI
URL: https://github.com/apache/arrow/pull/1251#issuecomment-339989949
 
 
   +1, I'm going to go out on a limb here and merge this. If builds failing due 
to clang-format become onerous, I will be happy to revisit the issue.




[jira] [Commented] (ARROW-1609) Plasma: Build fails with Xcode 9.0

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222309#comment-16222309
 ] 

ASF GitHub Bot commented on ARROW-1609:
---

xhochy commented on issue #1144: ARROW-1609: [Plasma] Xcode 9 compilation 
workaround
URL: https://github.com/apache/arrow/pull/1144#issuecomment-339962409
 
 
   @pcmoritz Do you also see these problems or is there something non-standard 
in my setup?




> Plasma: Build fails with Xcode 9.0
> --
>
> Key: ARROW-1609
> URL: https://issues.apache.org/jira/browse/ARROW-1609
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Philipp Moritz
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Tensorflow has the same issue: 
> https://github.com/tensorflow/tensorflow/issues/13220
> {code}
> [4/102] Building CXX object src/plasma/CMakeFiles/plasma_store.dir/store.cc.o
> FAILED: src/plasma/CMakeFiles/plasma_store.dir/store.cc.o
> /usr/local/bin/ccache 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
>-isystem /Users/ukorn/miniconda3/envs/pyarrow-dev/include -isystem 
> googletest_ep-prefix/src/googletest_ep/include -isystem 
> gbenchmark_ep/src/gbenchmark_ep-install/include -isystem 
> ../thirdparty/hadoop/include -I../src -isystem 
> /Users/ukorn/miniconda3/envs/pyarrow-dev/include/python3.6m -I../src/plasma 
> -I../src/plasma/thirdparty -I../src/plasma/.. -O3 -DNDEBUG -Wall -std=c++11 
> -msse3 -stdlib=libc++  -Qunused-arguments  -D_XOPEN_SOURCE=500 
> -D_POSIX_C_SOURCE=200809L -fPIC -O3 -DNDEBUG   -std=gnu++11 -MD -MT 
> src/plasma/CMakeFiles/plasma_store.dir/store.cc.o -MF 
> src/plasma/CMakeFiles/plasma_store.dir/store.cc.o.d -o 
> src/plasma/CMakeFiles/plasma_store.dir/store.cc.o -c ../src/plasma/store.cc
> In file included from ../src/plasma/store.cc:29:
> In file included from ../src/plasma/store.h:25:
> In file included from ../src/plasma/common.h:30:
> In file included from ../src/arrow/util/logging.h:22:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/iostream:38:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/ios:216:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__locale:18:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/mutex:189:
> In file included from 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__mutex_base:17:
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:156:1:
>  error: unknown type name 'mach_port_t'
> mach_port_t __libcpp_thread_get_port();
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:300:1:
>  error: unknown type name 'mach_port_t'
> mach_port_t __libcpp_thread_get_port() {
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/__threading_support:301:12:
>  error: use of undeclared identifier 'pthread_mach_thread_np'
> return pthread_mach_thread_np(pthread_self());
>^
> 3 errors generated.
> ninja: build stopped: subcommand failed.
> {code}





[jira] [Commented] (ARROW-1609) Plasma: Build fails with Xcode 9.0

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222306#comment-16222306
 ] 

ASF GitHub Bot commented on ARROW-1609:
---

xhochy commented on issue #1144: ARROW-1609: [Plasma] Xcode 9 compilation 
workaround
URL: https://github.com/apache/arrow/pull/1144#issuecomment-339962156
 
 
   I'm now also seeing this error pop up in the Plasma unit tests on macOS.




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222196#comment-16222196
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on issue #1240: ARROW-1555 [Python] Implement Dask 
exists function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339946849
 
 
   @wesm my username on JIRA is `benjigoldberg`




> [Python] write_to_dataset on s3
> ---
>
> Key: ARROW-1555
> URL: https://issues.apache.org/jira/browse/ARROW-1555
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Young-Jun Ko
>Assignee: Florian Jetter
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> When writing an Arrow table to S3, I get a NotImplemented exception.
> The root cause is in _ensure_filesystem and can be reproduced as follows:
> import pyarrow
> import pyarrow.parquet as pqa
> import s3fs
> s3 = s3fs.S3FileSystem()
> pqa._ensure_filesystem(s3).exists("anything")
> It appears that the S3FSWrapper that is instantiated in _ensure_filesystem 
> does not expose the exists method of s3.
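
The kind of fix the description points at can be sketched as a wrapper that forwards `exists` to the filesystem it wraps. This is an illustrative sketch only, not pyarrow's code: `DelegatingFileSystem` is a hypothetical stand-in for the S3FSWrapper named above, and `FakeS3` is an in-memory stand-in for `s3fs.S3FileSystem` so the example runs without S3 access.

```python
class FakeS3(object):
    """In-memory stand-in for an S3 filesystem, used only for this sketch."""

    def __init__(self, keys):
        self._keys = set(keys)

    def exists(self, path):
        return path in self._keys


class DelegatingFileSystem(object):
    """Hypothetical wrapper that exposes exists() of the wrapped filesystem."""

    def __init__(self, fs):
        self.fs = fs

    def exists(self, path):
        # Forward the call instead of leaving the method unimplemented.
        return self.fs.exists(path)


wrapped = DelegatingFileSystem(FakeS3(["bucket/data.parquet"]))
print(wrapped.exists("bucket/data.parquet"))
print(wrapped.exists("anything"))
```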


