[jira] [Commented] (ARROW-2422) Support more filter operators on Hive partitioned Parquet files

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430628#comment-16430628
 ] 

ASF GitHub Bot commented on ARROW-2422:
---

xhochy commented on issue #1861: ARROW-2422 Support more operators for 
partition filtering
URL: https://github.com/apache/arrow/pull/1861#issuecomment-379777120
 
 
   Can you add unit tests for more than just integer as a type?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support more filter operators on Hive partitioned Parquet files
> ---
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them ('>', '<', '<=', '>=') in a new PR on 
> GitHub.
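The operator set above maps naturally onto Python's `operator` module. Below is a minimal sketch of how partition-filter evaluation with those comparison operators could work; the function name `matches_partition` and the `(key, op, value)` tuple shape are illustrative assumptions, not pyarrow's actual API.

```python
import operator

# Map filter operator strings to Python comparisons; this mirrors the
# '=', '!=', '<', '>', '<=', '>=' set discussed in ARROW-2422.
_OPS = {
    '=': operator.eq,
    '!=': operator.ne,
    '<': operator.lt,
    '>': operator.gt,
    '<=': operator.le,
    '>=': operator.ge,
}

def matches_partition(partition_keys, filters):
    """Return True if a partition (dict of key -> value) satisfies
    every (key, op, value) filter tuple."""
    return all(_OPS[op](partition_keys[key], value)
               for key, op, value in filters)
```

A partition directory like `year=2016/` would then be pruned by a filter such as `('year', '>', 2016)` before any Parquet file in it is opened.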



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2326) cannot import pip installed pyarrow on OS X (10.9)

2018-04-09 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430674#comment-16430674
 ] 

Phillip Cloud commented on ARROW-2326:
--

[~xhochy] Is this fixed?

> cannot import pip installed pyarrow on OS X (10.9)
> --
>
> Key: ARROW-2326
> URL: https://issues.apache.org/jira/browse/ARROW-2326
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: OS X (10.9), Python 3.6
>Reporter: Paul Ivanov
>Priority: Major
> Fix For: 0.10.0
>
>
> {code:java}
> $ pip3 install pyarrow --user
> Collecting pyarrow
> Using cached pyarrow-0.8.0-cp36-cp36m-macosx_10_6_intel.whl
> Requirement already satisfied: six>=1.0.0 in 
> ./Library/Python/3.6/lib/python/site-packages (from pyarrow)
> Collecting numpy>=1.10 (from pyarrow)
> Using cached 
> numpy-1.14.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
> Installing collected packages: numpy, pyarrow
> Successfully installed numpy-1.14.2 pyarrow-0.8.0
> $ python3
> Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04) 
> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File 
> "/Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/__init__.py", 
> line 32, in <module>
> from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: @rpath/libarrow.0.dylib
> Referenced from: 
> /Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/lib.cpython-36m-darwin.so
> Reason: image not found
> {code}
> I dug into it a bit and found that in older versions of install.rst, Wes 
> mentioned that XCode 6 had trouble with rpath, so not sure if that's what's 
> going on here for me. I'm on 10.9, I know it's really old, so if these wheels 
> can't be made to run on my ancient OS, I just wanted to report this so the 
> wheels uploaded to PyPI can reflect this incompatibility, if that is indeed 
> the case. I might also try some otool / install_name_tool tomfoolery to see 
> if I can get a workaround for myself.
> Thank you!





[jira] [Commented] (ARROW-2305) [Python] Cython 0.25.2 compilation failure

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430753#comment-16430753
 ] 

ASF GitHub Bot commented on ARROW-2305:
---

pitrou closed pull request #1863: ARROW-2305: [Python] Bump Cython requirement 
to 0.27+
URL: https://github.com/apache/arrow/pull/1863
 
 
   

This is a PR merged from a forked repository. As GitHub hides the
original diff on merge, it is displayed below for the sake of
provenance:

diff --git a/ci/msvc-build.bat b/ci/msvc-build.bat
index 678e29d58..d3f540b2d 100644
--- a/ci/msvc-build.bat
+++ b/ci/msvc-build.bat
@@ -68,10 +68,8 @@ if "%JOB%" == "Build_Debug" (
   exit /B 0
 )
 
-@rem Note: avoid Cython 0.28.0 due to 
https://github.com/cython/cython/issues/2148
 conda create -n arrow -q -y python=%PYTHON% ^
-  six pytest setuptools numpy pandas ^
-  cython=0.27.3 ^
+  six pytest setuptools numpy pandas cython ^
   thrift-cpp=0.11.0
 
 call activate arrow
diff --git a/ci/travis_script_python.sh b/ci/travis_script_python.sh
index aa3c3154c..a776c4263 100755
--- a/ci/travis_script_python.sh
+++ b/ci/travis_script_python.sh
@@ -36,13 +36,12 @@ source activate $CONDA_ENV_DIR
 python --version
 which python
 
-# Note: avoid Cython 0.28.0 due to https://github.com/cython/cython/issues/2148
 conda install -y -q pip \
   nomkl \
   cloudpickle \
   numpy=1.13.1 \
   pandas \
-  cython=0.27.3
+  cython
 
 # ARROW-2093: PyTorch increases the size of our conda dependency stack
 # significantly, and so we have disabled these tests in Travis CI for now
diff --git a/dev/release/verify-release-candidate.sh 
b/dev/release/verify-release-candidate.sh
index 34aff209a..ef058d172 100755
--- a/dev/release/verify-release-candidate.sh
+++ b/dev/release/verify-release-candidate.sh
@@ -104,7 +104,7 @@ setup_miniconda() {
 numpy \
 pandas \
 six \
-cython=0.27.3 -c conda-forge
+cython -c conda-forge
   source activate arrow-test
 }
 
diff --git a/python/manylinux1/scripts/build_virtualenvs.sh 
b/python/manylinux1/scripts/build_virtualenvs.sh
index 7e0d80cc7..a983721e9 100755
--- a/python/manylinux1/scripts/build_virtualenvs.sh
+++ b/python/manylinux1/scripts/build_virtualenvs.sh
@@ -34,7 +34,7 @@ for PYTHON_TUPLE in ${PYTHON_VERSIONS}; do
 
 echo "=== (${PYTHON}, ${U_WIDTH}) Installing build dependencies ==="
 $PIP install "numpy==1.10.4"
-$PIP install "cython==0.27.3"
+$PIP install "cython==0.28.1"
 $PIP install "pandas==0.20.3"
 $PIP install "virtualenv==15.1.0"
 
diff --git a/python/setup.py b/python/setup.py
index 7b0f17544..dd042c956 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -42,8 +42,8 @@
 # Check if we're running 64-bit Python
 is_64_bit = sys.maxsize > 2**32
 
-if Cython.__version__ < '0.19.1':
-raise Exception('Please upgrade to Cython 0.19.1 or newer')
+if Cython.__version__ < '0.27':
+raise Exception('Please upgrade to Cython 0.27 or newer')
 
 setup_dir = os.path.abspath(os.path.dirname(__file__))
 
@@ -491,7 +491,7 @@ def parse_version(root):
 ]
 },
 use_scm_version={"root": "..", "relative_to": __file__, "parse": 
parse_version},
-setup_requires=['setuptools_scm', 'cython >= 0.23'] + setup_requires,
+setup_requires=['setuptools_scm', 'cython >= 0.27'] + setup_requires,
 install_requires=install_requires,
 tests_require=['pytest', 'pandas'],
 description="Python library for Apache Arrow",


 




> [Python] Cython 0.25.2 compilation failure 
> ---
>
> Key: ARROW-2305
> URL: https://issues.apache.org/jira/browse/ARROW-2305
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Observed on master branch
> {code}
> Error compiling Cython file:
> 
> ...
> if hasattr(self, 'as_py'):
> return repr(self.as_py())
> else:
> return super(Scalar, self).__repr__()
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/scalar.pxi:67:4: Special method __eq__ 
> must be implemented via __richcmp__
> Error compiling Cython file:
> 

[jira] [Resolved] (ARROW-2305) [Python] Cython 0.25.2 compilation failure

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2305.
---
Resolution: Fixed

Issue resolved by pull request 1863
[https://github.com/apache/arrow/pull/1863]

> [Python] Cython 0.25.2 compilation failure 
> ---
>
> Key: ARROW-2305
> URL: https://issues.apache.org/jira/browse/ARROW-2305
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Observed on master branch
> {code}
> Error compiling Cython file:
> 
> ...
> if hasattr(self, 'as_py'):
> return repr(self.as_py())
> else:
> return super(Scalar, self).__repr__()
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/scalar.pxi:67:4: Special method __eq__ 
> must be implemented via __richcmp__
> Error compiling Cython file:
> 
> ...
> Return true if the tensors contains exactly equal data
> """
> self._validate()
> return self.tp.Equals(deref(other.tp))
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/array.pxi:571:4: Special method __eq__ 
> must be implemented via __richcmp__
> Error compiling Cython file:
> 
> ...
> cdef c_bool result = False
> with nogil:
> result = self.buffer.get().Equals(deref(other.buffer.get()))
> return result
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/io.pxi:675:4: Special method __eq__ must 
> be implemented via __richcmp__
> {code}
> Upgrading Cython made this go away. We should probably use {{__richcmp__}} 
> though





[jira] [Assigned] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2328:
-

Assignee: (was: Antoine Pitrou)

> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  
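The bitmap misalignment mentioned above comes from the validity bitmap being shared between the parent array and the slice: each logical index must be shifted by the slice offset before indexing into the bitmap. A pure-Python sketch of Arrow-style (LSB-first) bit addressing follows; the helper name `validity_bit` is hypothetical, not Arrow's API.

```python
def validity_bit(bitmap: bytes, offset: int, i: int) -> bool:
    """Read the validity bit for logical index i of a slice whose data
    starts `offset` bits into the shared bitmap (Arrow packs validity
    bits LSB-first within each byte). Forgetting to add `offset` here
    is exactly the kind of misalignment described in this issue."""
    bit = offset + i
    return (bitmap[bit // 8] >> (bit % 8)) & 1 == 1
```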





[jira] [Commented] (ARROW-2369) Large (>~20 GB) files written to Parquet via PyArrow are corrupted

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430650#comment-16430650
 ] 

ASF GitHub Bot commented on ARROW-2369:
---

pitrou opened a new pull request #1866: ARROW-2369: [Python] Fix reading large 
Parquet files (> 4 GB)
URL: https://github.com/apache/arrow/pull/1866
 
 
   - Fix PythonFile.seek() for offsets > 4 GB
   - Avoid instantiating a PythonFile in ParquetFile, for efficiency




> Large (>~20 GB) files written to Parquet via PyArrow are corrupted
> --
>
> Key: ARROW-2369
> URL: https://issues.apache.org/jira/browse/ARROW-2369
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Reproduced on Ubuntu + Mac OSX
>Reporter: Justin Tan
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: Parquet, bug, pandas, parquetWriter, 
> pull-request-available, pyarrow
> Fix For: 0.10.0
>
> Attachments: Screen Shot 2018-03-30 at 11.54.01 pm.png
>
>
> When writing large Parquet files (above 10 GB or so) from Pandas to Parquet 
> via the command
> {{pq.write_table(my_df, 'table.parquet')}}
> The write succeeds, but when the parquet file is loaded, the error message
> {{ArrowIOError: Invalid parquet file. Corrupt footer.}}
> appears. This same error occurs when the parquet file is written chunkwise as 
> well. When the parquet files are small, say < 5 GB or so (drawn randomly from 
> the same dataset), everything proceeds as normal. I've also tried this with 
> Pandas df.to_parquet(), with the same results.
> Update: Looks like any DataFrame with size above ~5GB (on disk) returns the 
> same error.
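The multi-gigabyte threshold is characteristic of a file offset passing through signed 32-bit arithmetic, where it silently wraps. A pure-Python illustration of that wrap-around is below; whether this is the exact mechanism inside `PythonFile.seek()` is an assumption, and the helper name is made up.

```python
def as_int32(offset: int) -> int:
    """Reinterpret a byte offset as a signed 32-bit integer (two's
    complement), mimicking what happens when a 64-bit file offset is
    squeezed through a 32-bit seek parameter."""
    v = offset & 0xFFFFFFFF
    return v - 0x100000000 if v >= 0x80000000 else v

# A 5 GiB offset silently becomes 1 GiB: seeks land in the wrong
# place and the footer read fails with "Corrupt footer".
```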





[jira] [Commented] (ARROW-2100) [Python] Drop Python 3.4 support

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430746#comment-16430746
 ] 

ASF GitHub Bot commented on ARROW-2100:
---

pitrou closed pull request #1862: ARROW-2100: [Python] Drop Python 3.4 support
URL: https://github.com/apache/arrow/pull/1862
 
 
   

This is a PR merged from a forked repository. As GitHub hides the
original diff on merge, it is displayed below for the sake of
provenance:

diff --git a/python/manylinux1/build_arrow.sh b/python/manylinux1/build_arrow.sh
index 6697733d0..9742da09f 100755
--- a/python/manylinux1/build_arrow.sh
+++ b/python/manylinux1/build_arrow.sh
@@ -26,7 +26,7 @@
 # * Copyright (c) 2013-2016, Matt Terry and Matthew Brett (BSD 2-clause)
 
 # Build different python versions with various unicode widths
-PYTHON_VERSIONS="${PYTHON_VERSIONS:-2.7,16 2.7,32 3.4,16 3.5,16 3.6,16}"
+PYTHON_VERSIONS="${PYTHON_VERSIONS:-2.7,16 2.7,32 3.5,16 3.6,16}"
 
 source /multibuild/manylinux_utils.sh
 
diff --git a/python/setup.py b/python/setup.py
index 7b0f17544..d9a68846b 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -500,7 +500,6 @@ def parse_version(root):
 classifiers=[
 'License :: OSI Approved :: Apache Software License',
 'Programming Language :: Python :: 2.7',
-'Programming Language :: Python :: 3.4',
 'Programming Language :: Python :: 3.5',
 'Programming Language :: Python :: 3.6'
 ],


 




> [Python] Drop Python 3.4 support
> 
>
> Key: ARROW-2100
> URL: https://issues.apache.org/jira/browse/ARROW-2100
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> conda-forge has already dropped it, Pandas dropped it in 0.21, we should also 
> think of dropping support for it.





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430775#comment-16430775
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379803722
 
 
   Waiting for the AppVeyor build before merging this.




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.
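Setting the crash itself aside, `date64` stores milliseconds since the epoch and requires the intraday part to be zero, which a `datetime64[ns]` value with a time component violates. A small pure-Python sketch of that constraint follows; the helper name `intraday_ms` is hypothetical, and `MS_PER_DAY` mirrors the `kMillisecondsInDay` constant in Arrow's C++ cast kernel.

```python
from datetime import datetime, timezone

MS_PER_DAY = 86_400_000

def intraday_ms(dt: datetime) -> int:
    """Milliseconds left over after truncating a timestamp to midnight
    UTC. Arrow's date64 type expects this to be zero; the cast kernel
    discussed in this thread rejects (or truncates) values where it is
    not."""
    epoch_ms = int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)
    return epoch_ms % MS_PER_DAY
```

The `'2018-05-10T10:24:01'` value from the reproducer has a non-zero intraday remainder, so a valid cast must either zero it out or raise rather than crash.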





[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430786#comment-16430786
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

pitrou commented on issue #1784: ARROW-2328: [C++] Fixed and unit tested 
feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#issuecomment-379806635
 
 
   Thank you! I will merge once the AppVeyor build passes (the Travis-CI 
failures in the Rust and glib builds are unrelated).




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430623#comment-16430623
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180118162
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
 ShiftTime(ctx, options, conversion.first, 
conversion.second, input,
 output);
 
-internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-  input.length);
+if (input.null_count != 0) {
+  internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+input.length);
 
-// Ensure that intraday milliseconds have been zeroed out
-auto out_data = GetMutableValues(output, 1);
-for (int64_t i = 0; i < input.length; ++i) {
-  const int64_t remainder = out_data[i] % kMillisecondsInDay;
-  if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
-  remainder > 0)) {
-ctx->SetStatus(
-Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
-break;
+  // Ensure that intraday milliseconds have been zeroed out
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
+remainder > 0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
+bit_reader.Next();
+  }
+} else {
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 
0)) {
 
 Review comment:
   What I'm suggesting is:
   ```cpp
   if (!options.allow_time_truncate) {
 // Ensure that intraday milliseconds have been zeroed out
 auto out_data = GetMutableValues(output, 1);
   
 if (input.null_count != 0) {
   internal::BitmapReader bit_reader(input.buffers[0]->data(), 
input.offset,
 input.length);
   
   for (int64_t i = 0; i < input.length; ++i) {
 const int64_t remainder = out_data[i] % kMillisecondsInDay;
 if (ARROW_PREDICT_FALSE(remainder > 0 && bit_reader.IsSet())) {
   ctx->SetStatus(
   Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
   break;
 }
 out_data[i] -= remainder;
 bit_reader.Next();
   }
 } else {
   for (int64_t i = 0; i < input.length; ++i) {
 const int64_t remainder = out_data[i] % kMillisecondsInDay;
 if (ARROW_PREDICT_FALSE(remainder > 0)) {
   ctx->SetStatus(
   Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
   break;
 }
 out_data[i] -= remainder;
   }
 }
   }
   ```
   
   Does it make sense?




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> 

[jira] [Resolved] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2424.
---
   Resolution: Fixed
Fix Version/s: (was: 0.10.0)
   JS-0.4.0

Issue resolved by pull request 1864
[https://github.com/apache/arrow/pull/1864]

> [Rust] Missing import causing broken build
> --
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> Recent merges broke the build.





[jira] [Commented] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430737#comment-16430737
 ] 

ASF GitHub Bot commented on ARROW-2424:
---

pitrou closed pull request #1864: ARROW-2424: [Rust] Fix build - add missing 
import
URL: https://github.com/apache/arrow/pull/1864
 
 
   

This is a PR merged from a forked repository. As GitHub hides the
original diff on merge, it is displayed below for the sake of
provenance:

diff --git a/rust/src/builder.rs b/rust/src/builder.rs
index 832b2a4a8..9915a8b52 100644
--- a/rust/src/builder.rs
+++ b/rust/src/builder.rs
@@ -18,6 +18,7 @@
 use libc;
 use std::mem;
 use std::ptr;
+use std::slice;
 
 use super::buffer::*;
 use super::memory::*;


 




> [Rust] Missing import causing broken build
> --
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> Recent merges broke the build.





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430843#comment-16430843
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180156054
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
 ShiftTime(ctx, options, conversion.first, 
conversion.second, input,
 output);
 
-internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-  input.length);
+if (input.null_count != 0) {
+  internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+input.length);
 
-// Ensure that intraday milliseconds have been zeroed out
-auto out_data = GetMutableValues(output, 1);
-for (int64_t i = 0; i < input.length; ++i) {
-  const int64_t remainder = out_data[i] % kMillisecondsInDay;
-  if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
-  remainder > 0)) {
-ctx->SetStatus(
-Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
-break;
+  // Ensure that intraday milliseconds have been zeroed out
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
+remainder > 0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
+bit_reader.Next();
+  }
+} else {
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 
0)) {
 
 Review comment:
   No problem :) I'm still learning arrow.




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.





[jira] [Resolved] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2328.
---
Resolution: Fixed

Issue resolved by pull request 1784
[https://github.com/apache/arrow/pull/1784]

> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430953#comment-16430953
 ] 

ASF GitHub Bot commented on ARROW-2427:
---

pitrou opened a new pull request #1867: [WIP] ARROW-2427: [C++] Implement 
ReadAt properly
URL: https://github.com/apache/arrow/pull/1867
 
 
   Allow for concurrent I/O by avoiding locking and seeking.




> [C++] ReadAt implementations suboptimal
> ---
>
> Key: ARROW-2427
> URL: https://issues.apache.org/jira/browse/ARROW-2427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The {{ReadAt}} implementations for at least {{OSFile}} and 
> {{MemoryMappedFile}} take the file lock and seek. They could instead read 
> directly from the given offset, allowing concurrent I/O from multiple threads.
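The idea behind the improvement — reading at an absolute offset instead of locking, seeking, and reading — can be illustrated with POSIX positional reads via Python's `os.pread` (a sketch of the concept only, not Arrow's C++ implementation; assumes a POSIX system):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# os.pread reads at an absolute offset without moving the shared file
# position, so no lock is needed around a seek+read pair and multiple
# threads can issue reads concurrently.
def read_at(fd, offset, nbytes):
    return os.pread(fd, nbytes, offset)

with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()
    fd = f.fileno()
    with ThreadPoolExecutor(max_workers=4) as pool:
        chunks = list(pool.map(lambda off: read_at(fd, off, 2), [0, 2, 4, 8]))

assert chunks == [b"01", b"23", b"45", b"89"]
```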





[jira] [Updated] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2427:
--
Labels: pull-request-available  (was: )

> [C++] ReadAt implementations suboptimal
> ---
>
> Key: ARROW-2427
> URL: https://issues.apache.org/jira/browse/ARROW-2427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The {{ReadAt}} implementations for at least {{OSFile}} and 
> {{MemoryMappedFile}} take the file lock and seek. They could instead read 
> directly from the given offset, allowing concurrent I/O from multiple threads.





[jira] [Created] (ARROW-2426) [CI] glib build failure

2018-04-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2426:
-

 Summary: [CI] glib build failure
 Key: ARROW-2426
 URL: https://issues.apache.org/jira/browse/ARROW-2426
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Antoine Pitrou


The glib build on Travis-CI fails:

[https://travis-ci.org/apache/arrow/jobs/364123364#L6840]

{code}
==> Installing gobject-introspection
==> Downloading 
https://homebrew.bintray.com/bottles/gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
==> Pouring gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
  /usr/local/Cellar/gobject-introspection/1.56.0_1: 173 files, 9.8MB
Installing gobject-introspection has failed!
{code}





[jira] [Commented] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430631#comment-16430631
 ] 

ASF GitHub Bot commented on ARROW-2424:
---

andygrove commented on issue #1864: ARROW-2424: [Rust] Fix build - add missing 
import
URL: https://github.com/apache/arrow/pull/1864#issuecomment-379777861
 
 
   @pitrou I updated it as requested




> [Rust] Missing import causing broken build
> --
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Recent merges broke the build.





[jira] [Commented] (ARROW-2426) [CI] glib build failure

2018-04-09 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430630#comment-16430630
 ] 

Antoine Pitrou commented on ARROW-2426:
---

[~kou]

> [CI] glib build failure
> ---
>
> Key: ARROW-2426
> URL: https://issues.apache.org/jira/browse/ARROW-2426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Major
>
> The glib build on Travis-CI fails:
> [https://travis-ci.org/apache/arrow/jobs/364123364#L6840]
> {code}
> ==> Installing gobject-introspection
> ==> Downloading 
> https://homebrew.bintray.com/bottles/gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
> ==> Pouring gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
>   /usr/local/Cellar/gobject-introspection/1.56.0_1: 173 files, 9.8MB
> Installing gobject-introspection has failed!
> {code}





[jira] [Updated] (ARROW-2424) [Rust] Missing import causing broken build

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2424:
--
Labels: pull-request-available  (was: )

> [Rust] Missing import causing broken build
> --
>
> Key: ARROW-2424
> URL: https://issues.apache.org/jira/browse/ARROW-2424
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Recent merges broke the build.





[jira] [Updated] (ARROW-2428) [Python] Support ExtensionArrays in to_pandas conversion

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2428:
---
Labels: beginner  (was: )

> [Python] Support ExtensionArrays in to_pandas conversion
> 
>
> Key: ARROW-2428
> URL: https://issues.apache.org/jira/browse/ARROW-2428
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> With the next release of Pandas, it will be possible to define custom column 
> types that back a {{pandas.Series}}. Thus we will not be able to cover all 
> possible column types in the {{to_pandas}} conversion by default as we won't 
> be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in 
> the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
> call where they can overload the default conversion routines with the ones 
> that produce their {{ExtensionArray}} instances.
> This should avoid additional copies in the case where we would currently 
> first convert the Arrow column into a default Pandas column (probably of 
> object type) and the user would afterwards convert it to a more efficient 
> {{ExtensionArray}}. This hook will be especially useful when building 
> {{ExtensionArrays}} whose storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is: 
> https://github.com/pandas-dev/pandas/issues/19696





[jira] [Updated] (ARROW-2429) [Python] Timestamp unit in schema changes when writing to Parquet file then reading back

2018-04-09 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis updated ARROW-2429:

Description: 
When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')

print(table.schema[0])  # pyarrow.Field (nanosecond units)
print(table2.schema[0]) # pyarrow.Field (microsecond units)
{code}

  was:
When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')



print(table.schema[0])  # pyarrow.Field (nanosecond units)
print(table2.schema[0]) # pyarrow.Field (microsecond units)
{code}


> [Python] Timestamp unit in schema changes when writing to Parquet file then 
> reading back
> 
>
> Key: ARROW-2429
> URL: https://issues.apache.org/jira/browse/ARROW-2429
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python
>Reporter: Dave Challis
>Priority: Minor
>
> When creating an Arrow table from a Pandas DataFrame, the table schema 
> contains a field of type `timestamp[ns]`.
> When serialising that table to a parquet file and then immediately reading it 
> back, the schema of the table read instead contains a field with type 
> `timestamp[us]`.
>  
> {code:python}
> #!/usr/bin/env python
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> # create DataFrame with a datetime column
> df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
> df['created'] = pd.to_datetime(df['created'])
> # create Arrow table from DataFrame
> table = pa.Table.from_pandas(df, preserve_index=False)
> # write the table as a parquet file, then read it back again
> pq.write_table(table, 'foo.parquet')
> table2 = pq.read_table('foo.parquet')
> print(table.schema[0])  # pyarrow.Field (nanosecond units)
> print(table2.schema[0]) # pyarrow.Field (microsecond units)
> {code}





[jira] [Created] (ARROW-2429) [Python] Timestamp unit in schema changes when writing to Parquet file then reading back

2018-04-09 Thread Dave Challis (JIRA)
Dave Challis created ARROW-2429:
---

 Summary: [Python] Timestamp unit in schema changes when writing to 
Parquet file then reading back
 Key: ARROW-2429
 URL: https://issues.apache.org/jira/browse/ARROW-2429
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
 Environment: Mac OS High Sierra
PyArrow 0.9.0 (py36_1)
Python
Reporter: Dave Challis


When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')



print(table.schema[0])  # pyarrow.Field (nanosecond units)
print(table2.schema[0]) # pyarrow.Field (microsecond units)
{code}
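The schema change comes down to a unit coercion: the Parquet 1.x format has no nanosecond timestamp type, so nanoseconds are stored as microseconds and read back as such. The arithmetic can be sketched in plain Python (an illustration of the precision loss, assuming the coercion is a simple integer floor division; this is not the pyarrow code itself):

```python
# Nanosecond -> microsecond coercion on write, widened back on read.
# Sub-microsecond precision is lost on the round trip.
def ns_to_us(ts_ns):
    return ts_ns // 1000   # floor division drops the last three digits

def us_to_ns(ts_us):
    return ts_us * 1000    # reading back cannot restore them

ts_ns = 1_522_836_854_123_456_789   # some nanosecond-resolution timestamp
round_tripped = us_to_ns(ns_to_us(ts_ns))
assert round_tripped == 1_522_836_854_123_456_000  # trailing 789 ns truncated
```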





[jira] [Updated] (ARROW-2429) [Python] Timestamp unit in schema changes when writing to Parquet file then reading back

2018-04-09 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis updated ARROW-2429:

Description: 
When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

Minimal example:
 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')

print(table.schema[0])  # pyarrow.Field (nanosecond units)
print(table2.schema[0]) # pyarrow.Field (microsecond units)
{code}

  was:
When creating an Arrow table from a Pandas DataFrame, the table schema contains 
a field of type `timestamp[ns]`.

When serialising that table to a parquet file and then immediately reading it 
back, the schema of the table read instead contains a field with type 
`timestamp[us]`.

 
{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')

print(table.schema[0])  # pyarrow.Field (nanosecond units)
print(table2.schema[0]) # pyarrow.Field (microsecond units)
{code}


> [Python] Timestamp unit in schema changes when writing to Parquet file then 
> reading back
> 
>
> Key: ARROW-2429
> URL: https://issues.apache.org/jira/browse/ARROW-2429
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python
>Reporter: Dave Challis
>Priority: Minor
>
> When creating an Arrow table from a Pandas DataFrame, the table schema 
> contains a field of type `timestamp[ns]`.
> When serialising that table to a parquet file and then immediately reading it 
> back, the schema of the table read instead contains a field with type 
> `timestamp[us]`.
> Minimal example:
>  
> {code:python}
> #!/usr/bin/env python
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> # create DataFrame with a datetime column
> df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
> df['created'] = pd.to_datetime(df['created'])
> # create Arrow table from DataFrame
> table = pa.Table.from_pandas(df, preserve_index=False)
> # write the table as a parquet file, then read it back again
> pq.write_table(table, 'foo.parquet')
> table2 = pq.read_table('foo.parquet')
> print(table.schema[0])  # pyarrow.Field (nanosecond units)
> print(table2.schema[0]) # pyarrow.Field (microsecond units)
> {code}





[jira] [Updated] (ARROW-1964) [Python] Expose Builder classes

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1964:
---
Description: 
Having the builder classes available from Python would be very helpful. 
Currently, constructing an Arrow array always requires a Python list or NumPy 
array as an intermediate. As the builders in combination with jemalloc are 
very efficient at building up non-chunked memory, it would be nice to use them 
directly in certain cases.

The most useful builders are the 
[StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
 and 
[DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
 as they provide functionality to create columns that are not easily 
constructed using NumPy methods in Python.

The basic approach would be to wrap the C++ classes in 
https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
 so that they can be used from Cython. Afterwards, we should start a new file 
{{python/pyarrow/builder.pxi}} where we have classes take typical Python 
objects like {{str}} and pass them on to the C++ classes. At the end, these 
classes should also return (Python accessible) {{pyarrow.Array}} instances.

  was:Having the builder classes available from Python would be very helpful. 
Currently, constructing an Arrow array always requires a Python list or NumPy 
array as an intermediate. As the builders in combination with jemalloc are 
very efficient at building up non-chunked memory, it would be nice to use them 
directly in certain cases.


> [Python] Expose Builder classes
> ---
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> Having the builder classes available from Python would be very helpful. 
> Currently, constructing an Arrow array always requires a Python list or 
> NumPy array as an intermediate. As the builders in combination with jemalloc 
> are very efficient at building up non-chunked memory, it would be nice to 
> use them directly in certain cases.
> The most useful builders are the 
> [StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
>  and 
> [DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
>  as they provide functionality to create columns that are not easily 
> constructed using NumPy methods in Python.
> The basic approach would be to wrap the C++ classes in 
> https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
>  so that they can be used from Cython. Afterwards, we should start a new file 
> {{python/pyarrow/builder.pxi}} where we have classes take typical Python 
> objects like {{str}} and pass them on to the C++ classes. At the end, these 
> classes should also return (Python accessible) {{pyarrow.Array}} instances.
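The builders named above accumulate values directly into Arrow's offsets-plus-data layout instead of going through an intermediate Python list. A minimal pure-Python sketch of what a `StringBuilder` does internally (a toy model for illustration, not the proposed pyarrow API):

```python
class StringBuilderSketch:
    """Toy model of Arrow's StringBuilder: appends accumulate UTF-8 bytes
    plus an offsets array and a validity list, with no intermediate copy
    of the final structure."""
    def __init__(self):
        self._data = bytearray()
        self._offsets = [0]
        self._valid = []

    def append(self, value):
        if value is None:
            # Null slot: mark invalid, offset does not advance.
            self._valid.append(False)
            self._offsets.append(self._offsets[-1])
        else:
            encoded = value.encode("utf-8")
            self._data.extend(encoded)
            self._valid.append(True)
            self._offsets.append(self._offsets[-1] + len(encoded))

    def finish(self):
        return bytes(self._data), self._offsets, self._valid

b = StringBuilderSketch()
for v in ["foo", None, "bar"]:
    b.append(v)
data, offsets, valid = b.finish()
assert data == b"foobar"
assert offsets == [0, 3, 3, 6]
assert valid == [True, False, True]
```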





[jira] [Created] (ARROW-2428) [Python] Support ExtensionArrays in to_pandas conversion

2018-04-09 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2428:
--

 Summary: [Python] Support ExtensionArrays in to_pandas conversion
 Key: ARROW-2428
 URL: https://issues.apache.org/jira/browse/ARROW-2428
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 1.0.0


With the next release of Pandas, it will be possible to define custom column 
types that back a {{pandas.Series}}. Thus we will not be able to cover all 
possible column types in the {{to_pandas}} conversion by default as we won't be 
aware of all extension arrays.

To enable users to create {{ExtensionArray}} instances from Arrow columns in 
the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
call where they can overload the default conversion routines with the ones that 
produce their {{ExtensionArray}} instances.

This should avoid additional copies in the case where we would currently first 
convert the Arrow column into a default Pandas column (probably of object 
type) and the user would afterwards convert it to a more efficient 
{{ExtensionArray}}. This hook will be especially useful when building 
{{ExtensionArrays}} whose storage is backed by Arrow.

The meta-issue that tracks the implementation inside of Pandas is: 
https://github.com/pandas-dev/pandas/issues/19696
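The proposed hook can be modelled as a registry mapping Arrow types to user-supplied conversion callables that are consulted before the default conversion. A pure-Python sketch of that dispatch pattern (all names here are hypothetical, not the eventual pyarrow API):

```python
# Registry of user-supplied converters, keyed by (a stand-in for) the
# Arrow type. The default path converts to plain Python objects.
_converters = {}

def register_extension_converter(arrow_type, func):
    """Hypothetical registration hook for ExtensionArray producers."""
    _converters[arrow_type] = func

def to_pandas_column(arrow_type, values):
    """Consult user converters first, fall back to the default."""
    convert = _converters.get(arrow_type, list)
    return convert(values)

# A user plugs in a converter producing their ExtensionArray-like object:
register_extension_converter(
    "decimal128", lambda vals: ("MyDecimalArray", tuple(vals)))

assert to_pandas_column("int64", [1, 2]) == [1, 2]
assert to_pandas_column("decimal128", [1, 2]) == ("MyDecimalArray", (1, 2))
```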





[jira] [Commented] (ARROW-2326) cannot import pip installed pyarrow on OS X (10.9)

2018-04-09 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430679#comment-16430679
 ] 

Uwe L. Korn commented on ARROW-2326:


Yes it is.

> cannot import pip installed pyarrow on OS X (10.9)
> --
>
> Key: ARROW-2326
> URL: https://issues.apache.org/jira/browse/ARROW-2326
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: OS X (10.9), Python 3.6
>Reporter: Paul Ivanov
>Priority: Major
> Fix For: 0.10.0
>
>
> {code:java}
> $ pip3 install pyarrow --user
> Collecting pyarrow
> Using cached pyarrow-0.8.0-cp36-cp36m-macosx_10_6_intel.whl
> Requirement already satisfied: six>=1.0.0 in 
> ./Library/Python/3.6/lib/python/site-packages (from pyarrow)
> Collecting numpy>=1.10 (from pyarrow)
> Using cached 
> numpy-1.14.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
> Installing collected packages: numpy, pyarrow
> Successfully installed numpy-1.14.2 pyarrow-0.8.0
> $ python3
> Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04) 
> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File 
> "/Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/__init__.py", 
> line 32, in <module>
> from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: @rpath/libarrow.0.dylib
> Referenced from: 
> /Users/pi/Library/Python/3.6/lib/python/site-packages/pyarrow/lib.cpython-36m-darwin.so
> Reason: image not found
> {code}
> I dug into it a bit and found that in older versions of install.rst, Wes 
> mentioned that XCode 6 had trouble with rpath, so I'm not sure if that's 
> what's going on here for me. I'm on 10.9, which I know is really old, so if 
> these wheels can't be made to run on my ancient OS, I just wanted to report 
> this so the wheels uploaded to PyPI can reflect the incompatibility, if that 
> is indeed the case. I might also try some otool / install_name_tool 
> tomfoolery to see if I can get a workaround for myself.
> Thank you!





[jira] [Commented] (ARROW-2305) [Python] Cython 0.25.2 compilation failure

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430748#comment-16430748
 ] 

ASF GitHub Bot commented on ARROW-2305:
---

pitrou commented on issue #1863: ARROW-2305: [Python] Bump Cython requirement 
to 0.27+
URL: https://github.com/apache/arrow/pull/1863#issuecomment-379798200
 
 
   AppVeyor build at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.270
   
   The Travis-CI failure is unrelated.




> [Python] Cython 0.25.2 compilation failure 
> ---
>
> Key: ARROW-2305
> URL: https://issues.apache.org/jira/browse/ARROW-2305
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Observed on master branch
> {code}
> Error compiling Cython file:
> 
> ...
> if hasattr(self, 'as_py'):
> return repr(self.as_py())
> else:
> return super(Scalar, self).__repr__()
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/scalar.pxi:67:4: Special method __eq__ 
> must be implemented via __richcmp__
> Error compiling Cython file:
> 
> ...
> Return true if the tensors contains exactly equal data
> """
> self._validate()
> return self.tp.Equals(deref(other.tp))
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/array.pxi:571:4: Special method __eq__ 
> must be implemented via __richcmp__
> Error compiling Cython file:
> 
> ...
> cdef c_bool result = False
> with nogil:
> result = self.buffer.get().Equals(deref(other.buffer.get()))
> return result
> def __eq__(self, other):
>^
> 
> /home/wesm/code/arrow/python/pyarrow/io.pxi:675:4: Special method __eq__ must 
> be implemented via __richcmp__
> {code}
> Upgrading Cython made this go away. We should probably use {{__richcmp__}}, 
> though.
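The `__richcmp__` suggestion concerns how older Cython maps Python's rich comparisons; the semantics the wrapped classes need can be shown in plain Python, where `__eq__` returning `NotImplemented` plays the role of the unsupported-comparison branch (an illustration of the comparison protocol, not the pyarrow code):

```python
class Buffer:
    """Illustrates the comparison semantics the Cython classes need:
    return NotImplemented for foreign types so Python can try the
    reflected operation, instead of raising an exception."""
    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        if not isinstance(other, Buffer):
            return NotImplemented
        return self.data == other.data

assert Buffer(b"ab") == Buffer(b"ab")
assert Buffer(b"ab") != Buffer(b"cd")
assert (Buffer(b"ab") == "ab") is False  # graceful fallback, not an error
```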





[jira] [Resolved] (ARROW-2100) [Python] Drop Python 3.4 support

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2100.
---
Resolution: Fixed

Issue resolved by pull request 1862
[https://github.com/apache/arrow/pull/1862]

> [Python] Drop Python 3.4 support
> 
>
> Key: ARROW-2100
> URL: https://issues.apache.org/jira/browse/ARROW-2100
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> conda-forge has already dropped it, Pandas dropped it in 0.21, we should also 
> think of dropping support for it.





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430645#comment-16430645
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180122730
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
     ShiftTime(ctx, options, conversion.first, conversion.second, input, output);

-    internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset, input.length);
+    if (input.null_count != 0) {
+      internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset, input.length);

-    // Ensure that intraday milliseconds have been zeroed out
-    auto out_data = GetMutableValues(output, 1);
-    for (int64_t i = 0; i < input.length; ++i) {
-      const int64_t remainder = out_data[i] % kMillisecondsInDay;
-      if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() && remainder > 0)) {
-        ctx->SetStatus(Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
-        break;
+      // Ensure that intraday milliseconds have been zeroed out
+      auto out_data = GetMutableValues(output, 1);
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() && remainder > 0)) {
+          ctx->SetStatus(Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
+          break;
+        }
+        out_data[i] -= remainder;
+        bit_reader.Next();
+      }
+    } else {
+      auto out_data = GetMutableValues(output, 1);
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 0)) {
 
 Review comment:
   Sure, but don't we need another branch then to handle when time truncation 
is allowed?
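The diff under review splits the loop on `input.null_count` so the validity bitmap is only consulted when nulls are actually present. The control flow can be mirrored in plain Python (a simplified sketch of the branching, not the Arrow C++ kernel; names are illustrative):

```python
MS_PER_DAY = 86_400_000  # milliseconds in a day (kMillisecondsInDay)

def zero_intraday_ms(values, valid=None, allow_time_truncate=False):
    """Zero out intraday milliseconds, erroring on non-null values with a
    non-zero remainder unless truncation is allowed. `valid=None` models
    the null_count == 0 fast path with no bitmap reads."""
    out = list(values)
    for i, v in enumerate(out):
        remainder = v % MS_PER_DAY
        is_set = valid[i] if valid is not None else True
        if not allow_time_truncate and is_set and remainder > 0:
            raise ValueError("Timestamp value had non-zero intraday milliseconds")
        out[i] = v - remainder
    return out

day = 3 * MS_PER_DAY
assert zero_intraday_ms([day, 2 * MS_PER_DAY]) == [day, 2 * MS_PER_DAY]
assert zero_intraday_ms([day + 5], valid=[False]) == [day]  # null slot: no error
try:
    zero_intraday_ms([day + 5])       # non-null, truncation not allowed
except ValueError:
    pass
else:
    raise AssertionError("expected a ValueError")
```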




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.





[jira] [Updated] (ARROW-1964) [Python] Expose Builder classes

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1964:
---
Summary: [Python] Expose Builder classes  (was: Python: Expose Builder 
classes)

> [Python] Expose Builder classes
> ---
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> Having the builder classes available from Python would be very helpful. 
> Currently, constructing an Arrow array always requires a Python list or 
> NumPy array as an intermediate. As the builders in combination with jemalloc 
> are very efficient at building up non-chunked memory, it would be nice to 
> use them directly in certain cases.





[jira] [Updated] (ARROW-1964) [Python] Expose Builder classes

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1964:
---
Labels: beginner  (was: )

> [Python] Expose Builder classes
> ---
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>
> Having the builder classes available from Python would be very helpful. 
> Currently, constructing an Arrow array always requires a Python list or 
> NumPy array as an intermediate. As the builders in combination with jemalloc 
> are very efficient at building up non-chunked memory, it would be nice to 
> use them directly in certain cases.





[jira] [Updated] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-564:
--
Labels: beginner  (was: )

> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner
> Fix For: 1.0.0
>
>






[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430734#comment-16430734
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180136727
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
 ShiftTime(ctx, options, conversion.first, 
conversion.second, input,
 output);
 
-internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-  input.length);
+if (input.null_count != 0) {
+  internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+input.length);
 
-// Ensure that intraday milliseconds have been zeroed out
-auto out_data = GetMutableValues(output, 1);
-for (int64_t i = 0; i < input.length; ++i) {
-  const int64_t remainder = out_data[i] % kMillisecondsInDay;
-  if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
-  remainder > 0)) {
-ctx->SetStatus(
-Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
-break;
+  // Ensure that intraday milliseconds have been zeroed out
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
+remainder > 0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
+bit_reader.Next();
+  }
+} else {
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 
0)) {
 
 Review comment:
   Wow. Sorry, I had completely overlooked the `out_data[i] -= remainder;` line 
:-S


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
>
> When calling `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and a 
> `pyarrow.Schema` provided, the call results in a segmentation fault if a 
> Pandas `datetime64[ns]` column is converted to a `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.





[jira] [Assigned] (ARROW-2100) [Python] Drop Python 3.4 support

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2100:
-

Assignee: Antoine Pitrou

> [Python] Drop Python 3.4 support
> 
>
> Key: ARROW-2100
> URL: https://issues.apache.org/jira/browse/ARROW-2100
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> conda-forge has already dropped it, Pandas dropped it in 0.21, we should also 
> think of dropping support for it.





[jira] [Assigned] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2328:
-

Assignee: Antoine Pitrou

> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2360) Add set_chunksize for RecordBatchReader in arrow/record_batch.h

2018-04-09 Thread Xianjin YE (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430118#comment-16430118
 ] 

Xianjin YE commented on ARROW-2360:
---

Ping [~wesmckinn]

> Add set_chunksize for RecordBatchReader in arrow/record_batch.h
> ---
>
> Key: ARROW-2360
> URL: https://issues.apache.org/jira/browse/ARROW-2360
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Xianjin YE
>Priority: Major
>
> As discussed in [https://github.com/apache/parquet-cpp/pull/445], maybe it's 
> better to expose a chunksize-related API in RecordBatchReader.
>  
> However, RecordBatchStreamReader doesn't conform to this requirement. 





[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431103#comment-16431103
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] 
JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector 
Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180205035
 
 

 ##
 File path: 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.nio.charset.Charset;
+import java.sql.*;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow 
columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+private static final int DEFAULT_BUFFER_SIZE = 256;
+
+/**
+ * Create Arrow {@link Schema} object for the given JDBC {@link 
ResultSetMetaData}.
+ *
+ * This method currently performs following type mapping for JDBC SQL data 
types to corresponding Arrow data types.
+ *
+ * CHAR--> ArrowType.Utf8
+ * NCHAR   --> ArrowType.Utf8
+ * VARCHAR --> ArrowType.Utf8
+ * NVARCHAR --> ArrowType.Utf8
+ * LONGVARCHAR --> ArrowType.Utf8
+ * LONGNVARCHAR --> ArrowType.Utf8
+ * NUMERIC --> ArrowType.Decimal(precision, scale)
+ * DECIMAL --> ArrowType.Decimal(precision, scale)
+ * BIT --> ArrowType.Bool
+ * TINYINT --> ArrowType.Int(8, signed)
+ * SMALLINT --> ArrowType.Int(16, signed)
+ * INTEGER --> ArrowType.Int(32, signed)
+ * BIGINT --> ArrowType.Int(64, signed)
+ * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY --> ArrowType.Binary
+ * VARBINARY --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB --> ArrowType.Utf8
+ * BLOB --> ArrowType.Binary
+ *
+ * @param rsmd
+ * @return {@link Schema}
+ * @throws SQLException
+ */
+public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws 
SQLException {
+
+assert rsmd != null;
+
+//ImmutableList.Builder<Field> fields = ImmutableList.builder();
+List<Field> fields = new ArrayList<>();
+int columnCount = rsmd.getColumnCount();
+for (int i = 1; i <= columnCount; i++) {
+String columnName = rsmd.getColumnName(i);
+switch (rsmd.getColumnType(i)) {
+case Types.BOOLEAN:
+case Types.BIT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Bool()), null));
+break;
+case Types.TINYINT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(8, true)), null));
+break;
+case Types.SMALLINT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(16, true)), null));
+break;
+case Types.INTEGER:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(32, true)), null));
+break;
+case 

[jira] [Created] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-2430:
--

 Summary: MVP for branch based packaging automation
 Key: ARROW-2430
 URL: https://issues.apache.org/jira/browse/ARROW-2430
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs


Described in 
https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Commented] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431126#comment-16431126
 ] 

ASF GitHub Bot commented on ARROW-2430:
---

kszucs opened a new pull request #1869: ARROW-2430: [Packaging] MVP for branch 
based packaging automation
URL: https://github.com/apache/arrow/pull/1869
 
 
   




> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Updated] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2430:
--
Labels: pull-request-available  (was: )

> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Assigned] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2328:
--

Assignee: Adrian

> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Assignee: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Updated] (ARROW-2399) Builder should not provide a set() method

2018-04-09 Thread Maximilian Roos (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Roos updated ARROW-2399:
---
Description: 
Arrays should be immutable, but we have a `set` method on Buffer that 
should not be there.

This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
own memory instead and not use Buffer?

  was:
Arrays should be immutable, but we have a `set` method on Buffer that should 
not be there.

This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
own memory instead and not use Buffer?


> Builder should not provide a set() method
> 
>
> Key: ARROW-2399
> URL: https://issues.apache.org/jira/browse/ARROW-2399
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Arrays should be immutable, but we have a `set` method on Buffer that 
> should not be there.
> This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
> own memory instead and not use Buffer?





[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 7:48 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers




was (Author: kszucs):
Additional TODOs:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Commented] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs commented on ARROW-2430:


Additional TODOs:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Commented] (ARROW-2399) Builder should not provide a set() method

2018-04-09 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431021#comment-16431021
 ] 

Antoine Pitrou commented on ARROW-2399:
---

Could you also please prefix Rust issues with "[Rust]", so that the list of 
issues gives more information? Thanks :-)

> Builder should not provide a set() method
> 
>
> Key: ARROW-2399
> URL: https://issues.apache.org/jira/browse/ARROW-2399
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> Arrays should be immutable, but we have a `set` method on Buffer that 
> should not be there.
> This is only used from the Bitmap struct. Perhaps Bitmap should maintain its 
> own memory instead and not use Buffer?





[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 7:50 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments
- consult about flattening the builds (remove build matrices)




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 7:59 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments
- consult about flattening the builds (remove build matrices)



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 7:59 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments
- consult about flattening the builds (remove build matrices)




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments
- consult about flattening the builds (remove build matrices)



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Comment Edited] (ARROW-2430) MVP for branch based packaging automation

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431124#comment-16431124
 ] 

Krisztian Szucs edited comment on ARROW-2430 at 4/9/18 8:00 PM:


Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)
- format commit message




was (Author: kszucs):
Additional TODO notes:
- write readme
- create a docker container with the dependencies pre-installed
- note about turning off the auto-cancellation feature of CI servers
- setup deployments + conda deploy script
- consult about flattening the builds (remove build matrices)



> MVP for branch based packaging automation
> -
>
> Key: ARROW-2430
> URL: https://issues.apache.org/jira/browse/ARROW-2430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Described in 
> https://docs.google.com/document/d/1IyhbQpiElxTsI8HbMZ-g9EGPOtcFdtMBzEyDJv48BKc/edit





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431167#comment-16431167
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379882116
 
 
   Thank you @kszucs !




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When calling `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and a 
> `pyarrow.Schema` provided, the call results in a segmentation fault if a 
> Pandas `datetime64[ns]` column is converted to a `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431175#comment-16431175
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379883273
 
 
   My pleasure!




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When calling `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and a 
> `pyarrow.Schema` provided, the call results in a segmentation fault if a 
> Pandas `datetime64[ns]` column is converted to a `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.





[jira] [Assigned] (ARROW-1938) [Python] Error writing to partitioned Parquet dataset

2018-04-09 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud reassigned ARROW-1938:


Assignee: (was: Phillip Cloud)

> [Python] Error writing to partitioned Parquet dataset
> -
>
> Key: ARROW-1938
> URL: https://issues.apache.org/jira/browse/ARROW-1938
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Linux (Ubuntu 16.04)
>Reporter: Robert Dailey
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: ARROW-1938-test-data.csv.gz, ARROW-1938.py, 
> pyarrow_dataset_error.png
>
>
> I receive the following error after upgrading to pyarrow 0.8.0 when writing 
> to a dataset:
> * ArrowIOError: Column 3 had 187374 while previous column had 1
> The command was:
> write_table_values = {'row_group_size': 1}
> pq.write_to_dataset(pa.Table.from_pandas(df, preserve_index=True), 
> '/logs/parsed/test', partition_cols=['Product', 'year', 'month', 'day', 
> 'hour'], **write_table_values)
> I've also tried write_table_values = {'chunk_size': 1} and received the 
> same error.
> This same command works in version 0.7.1.  I am trying to troubleshoot the 
> problem but wanted to submit a ticket.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if contain None

2018-04-09 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431276#comment-16431276
 ] 

Bryan Cutler commented on ARROW-2432:
-

I can work on this

> [Python] from_pandas fails when converting decimals if contain None
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> In [4]: s_dec
> Out[4]:
> 0 3.14
> 1 None
> dtype: object{code}
> The above error is raised when a decimal type is specified. When no type is 
> specified, a segfault happens.
> This previously worked in 0.8.0.





[jira] [Created] (ARROW-2432) [Python] from_pandas fails when converting decimals if contain None

2018-04-09 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2432:
---

 Summary: [Python] from_pandas fails when converting decimals if 
contain None
 Key: ARROW-2432
 URL: https://issues.apache.org/jira/browse/ARROW-2432
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Bryan Cutler


Using from_pandas to convert decimals fails if it encounters a value of {{None}}. 
For example:
{code:java}
In [1]: import pyarrow as pa
...: import pandas as pd
...: from decimal import Decimal
...:

In [2]: s_dec = pd.Series([Decimal('3.14'), None])

In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
---
ArrowInvalid Traceback (most recent call last)
 in ()
> 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))

array.pxi in pyarrow.lib.Array.from_pandas()

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
object of type NoneType but can only handle these types: decimal.Decimal

In [4]: s_dec
Out[4]:
0 3.14
1 None
dtype: object{code}

The above error is raised when specifying decimal type.  When no type is 
specified, a seg fault happens.

This previously worked in 0.8.0.





[jira] [Updated] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-2432:

Summary: [Python] from_pandas fails when converting decimals if have None 
values  (was: [Python] from_pandas fails when converting decimals if contain 
None)

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> In [4]: s_dec
> Out[4]:
> 0 3.14
> 1 None
> dtype: object{code}
> The above error is raised when specifying decimal type.  When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Updated] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-2432:

Description: 
Using from_pandas to convert decimals fails if it encounters a value of {{None}}. 
For example:
{code:java}
In [1]: import pyarrow as pa
...: import pandas as pd
...: from decimal import Decimal
...:

In [2]: s_dec = pd.Series([Decimal('3.14'), None])

In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
---
ArrowInvalid Traceback (most recent call last)
 in ()
> 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))

array.pxi in pyarrow.lib.Array.from_pandas()

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
object of type NoneType but can only handle these types: decimal.Decimal
{code}
The above error is raised when specifying decimal type. When no type is 
specified, a seg fault happens.

This previously worked in 0.8.0.

  was:
Using from_pandas to convert decimals fails if it encounters a value of {{None}}. 
For example:
{code:java}
In [1]: import pyarrow as pa
...: import pandas as pd
...: from decimal import Decimal
...:

In [2]: s_dec = pd.Series([Decimal('3.14'), None])

In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
---
ArrowInvalid Traceback (most recent call last)
 in ()
> 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))

array.pxi in pyarrow.lib.Array.from_pandas()

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
object of type NoneType but can only handle these types: decimal.Decimal

In [4]: s_dec
Out[4]:
0 3.14
1 None
dtype: object{code}

The above error is raised when specifying decimal type.  When no type is 
specified, a seg fault happens.

This previously worked in 0.8.0.


> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431331#comment-16431331
 ] 

Antoine Pitrou commented on ARROW-2432:
---

Ow. For some reason it seems we have various code conversion paths depending on 
which API is called :-/

{code:python}
>>> data = [decimal.Decimal('3.14'), None]
>>> pa.array(data, type=pa.decimal128(12, 4))

[
  Decimal('3.1400'),
  NA
]
>>> pa.array(data, type=pa.decimal128(12, 4), from_pandas=True)

[
  Decimal('3.1400'),
  NA
]
>>> pa.Array.from_pandas(data, type=pa.decimal128(12, 4))

[
  Decimal('3.1400'),
  NA
]
>>> pa.Array.from_pandas(pd.Series(data), type=pa.decimal128(12, 4))
Traceback (most recent call last):
  File "", line 1, in 
pa.Array.from_pandas(pd.Series(data), type=pa.decimal128(12, 4))
  File "array.pxi", line 383, in pyarrow.lib.Array.from_pandas
  File "array.pxi", line 177, in pyarrow.lib.array
  File "error.pxi", line 77, in pyarrow.lib.check_status
  File "error.pxi", line 77, in pyarrow.lib.check_status
ArrowInvalid: /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1702 
code: converter.Convert()
Error converting from Python objects to Decimal: Got Python object of type 
NoneType but can only handle these types: decimal.Decimal

{code}

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431341#comment-16431341
 ] 

Bryan Cutler commented on ARROW-2432:
-

We really need to get the integration testing running regularly, or at least 
before a release

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431398#comment-16431398
 ] 

Phillip Cloud commented on ARROW-2432:
--

[~pitrou] FWIW, the code conversion paths are not specific to decimal types and 
have been around since before decimals existed. [~bryanc] If you're not already 
working on this, then I can probably get it fixed up pretty quickly.

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431438#comment-16431438
 ] 

Bryan Cutler commented on ARROW-2432:
-

It should be possible to share code paths when converting objects, right? I'd 
like to keep this to the minimum fix; let's look at possible refactoring 
afterwards. Thanks [~cpcloud], I already made the fix, just going to add tests.

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-09 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431450#comment-16431450
 ] 

Phillip Cloud commented on ARROW-2432:
--

[~bryanc] Awesome, thanks.

> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Using from_pandas to convert decimals fails if it encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2387) negative decimal values get spurious rescaling error

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431508#comment-16431508
 ] 

ASF GitHub Bot commented on ARROW-2387:
---

cpcloud commented on issue #1832: ARROW-2387: flip test for rescale loss if 
value < 0
URL: https://github.com/apache/arrow/pull/1832#issuecomment-379927866
 
 
   @bwo Looks like this is failing for unrelated reasons, can you rebase on top 
of master and push again? Then we can merge.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> negative decimal values get spurious rescaling error
> 
>
> Key: ARROW-2387
> URL: https://issues.apache.org/jira/browse/ARROW-2387
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: ben w
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> $ python
> Python 2.7.12 (default, Nov 20 2017, 18:23:56)
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow as pa, decimal
> >>> one = decimal.Decimal('1.00')
> >>> neg_one = decimal.Decimal('-1.00')
> >>> pa.array([one], pa.decimal128(24, 12))
> 
> [
> Decimal('1.')
> ]
> >>> pa.array([neg_one], pa.decimal128(24, 12))
> Traceback (most recent call last):
> File "", line 1, in 
> File "array.pxi", line 181, in pyarrow.lib.array
> File "array.pxi", line 36, in pyarrow.lib._sequence_to_array
> File "error.pxi", line 77, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Rescaling decimal value -100.00 from 
> original scale of 6 to new scale of 12 would cause data loss
> >>> pa.__version__
> '0.9.0'
> {code}
> not only is the error spurious, the decimal value has been multiplied by one 
> million (i.e. 10 ** 6, where 6 is the difference in scales), which is still 
> pretty strange to me.
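
The report points at the rescaling check mishandling negative values (the linked PR flips the loss test for value < 0). A pure-Python sketch of sign-safe rescaling of an unscaled decimal integer (illustrative only; `rescale` is a hypothetical name, not Arrow's C++ code):

```python
def rescale(unscaled: int, delta_scale: int) -> int:
    """Rescale an unscaled decimal integer by delta_scale digits,
    raising only on genuine data loss (sign-safe sketch)."""
    if delta_scale >= 0:
        return unscaled * 10 ** delta_scale   # widening never loses digits
    quotient, remainder = divmod(unscaled, 10 ** -delta_scale)
    # divmod's remainder is 0 iff the value divides exactly, for negative
    # inputs too: divmod(-100, 10) == (-10, 0), so no spurious loss error
    if remainder != 0:
        raise ValueError("rescaling would cause data loss")
    return quotient

# Decimal('-1.00') is stored as unscaled -100 at scale 2; widening to
# scale 8 should succeed instead of raising as in the report above.
assert rescale(-100, 6) == -100000000
```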





[jira] [Created] (ARROW-2433) [Rust] Add Builder.push_slice(&[T])

2018-04-09 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2433:
-

 Summary: [Rust] Add Builder.push_slice(&[T])
 Key: ARROW-2433
 URL: https://issues.apache.org/jira/browse/ARROW-2433
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


When populating a Builder with Utf8 data it is more efficient to push whole 
strings as &[u8] rather than one byte at a time.

The same optimization works for all other types too.
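
The bulk-append idea is language-agnostic; a toy Python sketch contrasting the two interfaces (the `push`/`push_slice` names follow the JIRA title — this is not the Rust implementation):

```python
class Builder:
    """Toy byte builder contrasting per-element push with bulk push_slice."""
    def __init__(self):
        self._buf = bytearray()

    def push(self, byte: int) -> None:
        self._buf.append(byte)        # one element at a time

    def push_slice(self, data: bytes) -> None:
        self._buf.extend(data)        # whole slice in one call

b = Builder()
b.push_slice(b"hello")                # bulk append of a whole string
for byte in b" world":
    b.push(byte)                      # element-wise fallback still works
assert bytes(b._buf) == b"hello world"
```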

 





[jira] [Updated] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2423:
--
Labels: pull-request-available  (was: )

> [Python] PyArrow datatypes raise ValueError on equality checks against 
> non-PyArrow objects
> --
>
> Key: ARROW-2423
> URL: https://issues.apache.org/jira/browse/ARROW-2423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Minor
>  Labels: pull-request-available
>
> Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
> causes a `ValueError` to be raised, rather than either returning a True/False 
> value, or returning 
> [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
>  if the comparison isn't implemented.
> E.g. attempting to call:
> {code:java}
> import pyarrow
> pyarrow.int32() == 'foo'
> {code}
> results in:
> {code:java}
> Traceback (most recent call last):
>   File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
> KeyError: 'foo'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "t.py", line 2, in 
> pyarrow.int32() == 'foo'
>   File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
>   File "types.pxi", line 113, in pyarrow.lib.DataType.equals
>   File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
> ValueError: No type alias for foo
> {code}
> The expected outcome for the above would be for the comparison to return 
> `False`, as that's the general behaviour for comparisons between objects of 
> different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).
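
The expected behaviour maps onto Python's `NotImplemented` protocol: when `__eq__` cannot interpret the other operand, returning `NotImplemented` lets Python fall back to its default comparison, which yields `False` instead of raising. An illustrative sketch (not pyarrow's actual Cython code):

```python
class DataType:
    """Sketch of a datatype whose == never raises on foreign operands."""
    def __init__(self, name):
        self.name = name

    def equals(self, other):
        # strict API: raises for operands it cannot interpret
        if not isinstance(other, DataType):
            raise TypeError("no type alias for %r" % (other,))
        return self.name == other.name

    def __eq__(self, other):
        try:
            return self.equals(other)
        except TypeError:
            # defer to Python's default comparison instead of raising
            return NotImplemented

result = (DataType('int32') == 'foo')   # -> False, no exception raised
```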





[jira] [Commented] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431509#comment-16431509
 ] 

ASF GitHub Bot commented on ARROW-2423:
---

andygrove opened a new pull request #1871: ARROW-2423: [Rust] Add 
Builder.push_slice(&[T])
URL: https://github.com/apache/arrow/pull/1871
 
 
   This PR also fixes another instance of memory not being released.




> [Python] PyArrow datatypes raise ValueError on equality checks against 
> non-PyArrow objects
> --
>
> Key: ARROW-2423
> URL: https://issues.apache.org/jira/browse/ARROW-2423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Minor
>  Labels: pull-request-available
>
> Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
> causes a `ValueError` to be raised, rather than either returning a True/False 
> value, or returning 
> [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
>  if the comparison isn't implemented.
> E.g. attempting to call:
> {code:java}
> import pyarrow
> pyarrow.int32() == 'foo'
> {code}
> results in:
> {code:java}
> Traceback (most recent call last):
>   File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
> KeyError: 'foo'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "t.py", line 2, in 
> pyarrow.int32() == 'foo'
>   File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
>   File "types.pxi", line 113, in pyarrow.lib.DataType.equals
>   File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
> ValueError: No type alias for foo
> {code}
> The expected outcome for the above would be for the comparison to return 
> `False`, as that's the general behaviour for comparisons between objects of 
> different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).





[jira] [Commented] (ARROW-2433) [Rust] Add Builder.push_slice(&[T])

2018-04-09 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431510#comment-16431510
 ] 

Andy Grove commented on ARROW-2433:
---

PR: https://github.com/apache/arrow/pull/1871

> [Rust] Add Builder.push_slice(&[T])
> ---
>
> Key: ARROW-2433
> URL: https://issues.apache.org/jira/browse/ARROW-2433
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.10.0
>
>
> When populating a Builder with Utf8 data it is more efficient to push 
> whole strings as &[u8] rather than one byte at a time.
> The same optimization works for all other types too.
>  





[jira] [Updated] (ARROW-2426) [CI] glib build failure

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2426:
--
Labels: pull-request-available  (was: )

> [CI] glib build failure
> ---
>
> Key: ARROW-2426
> URL: https://issues.apache.org/jira/browse/ARROW-2426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The glib build on Travis-CI fails:
> [https://travis-ci.org/apache/arrow/jobs/364123364#L6840]
> {code}
> ==> Installing gobject-introspection
> ==> Downloading 
> https://homebrew.bintray.com/bottles/gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
> ==> Pouring gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
>   /usr/local/Cellar/gobject-introspection/1.56.0_1: 173 files, 9.8MB
> Installing gobject-introspection has failed!
> {code}





[jira] [Commented] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431613#comment-16431613
 ] 

ASF GitHub Bot commented on ARROW-2423:
---

paddyhoran commented on issue #1871: ARROW-2423: [Rust] Add 
Builder.push_slice(&[T])
URL: https://github.com/apache/arrow/pull/1871#issuecomment-379952111
 
 
   @andygrove just noticed that the JIRA for this one is 2433, not 2423




> [Python] PyArrow datatypes raise ValueError on equality checks against 
> non-PyArrow objects
> --
>
> Key: ARROW-2423
> URL: https://issues.apache.org/jira/browse/ARROW-2423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Minor
>  Labels: pull-request-available
>
> Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
> causes a `ValueError` to be raised, rather than either returning a True/False 
> value, or returning 
> [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
>  if the comparison isn't implemented.
> E.g. attempting to call:
> {code:java}
> import pyarrow
> pyarrow.int32() == 'foo'
> {code}
> results in:
> {code:java}
> Traceback (most recent call last):
>   File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
> KeyError: 'foo'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "t.py", line 2, in 
> pyarrow.int32() == 'foo'
>   File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
>   File "types.pxi", line 113, in pyarrow.lib.DataType.equals
>   File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
> ValueError: No type alias for foo
> {code}
> The expected outcome for the above would be for the comparison to return 
> `False`, as that's the general behaviour for comparisons between objects of 
> different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).





[jira] [Created] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2434:
--

 Summary: [Rust] Add windows support
 Key: ARROW-2434
 URL: https://issues.apache.org/jira/browse/ARROW-2434
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
 Fix For: 0.10.0


Currently `cargo test` fails on Windows.





[jira] [Commented] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431621#comment-16431621
 ] 

ASF GitHub Bot commented on ARROW-2434:
---

paddyhoran opened a new pull request #1873: ARROW-2434: [Rust] Add windows 
support
URL: https://github.com/apache/arrow/pull/1873
 
 
   




> [Rust] Add windows support
> --
>
> Key: ARROW-2434
> URL: https://issues.apache.org/jira/browse/ARROW-2434
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently `cargo test` fails on Windows.





[jira] [Created] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-2435:
-

 Summary: [Rust] Add memory pool abstraction.
 Key: ARROW-2435
 URL: https://issues.apache.org/jira/browse/ARROW-2435
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.9.0
Reporter: Renjie Liu


Add a memory pool abstraction matching the C++ API.
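
For reference, the C++ API this mirrors tracks allocations through a MemoryPool interface (allocate/free plus accounting). A toy Python sketch of the shape such an abstraction could take (hypothetical names, not the eventual Rust design):

```python
class MemoryPool:
    """Toy sketch of a memory-pool interface with allocation accounting."""
    def __init__(self):
        self._allocated = 0
        self._peak = 0

    def allocate(self, size: int) -> bytearray:
        self._allocated += size
        self._peak = max(self._peak, self._allocated)  # track high-water mark
        return bytearray(size)

    def free(self, size: int) -> None:
        self._allocated -= size

    def bytes_allocated(self) -> int:
        return self._allocated

    def max_memory(self) -> int:
        return self._peak

pool = MemoryPool()
buf = pool.allocate(64)
pool.free(len(buf))
# pool.bytes_allocated() -> 0, pool.max_memory() -> 64
```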





[jira] [Updated] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2434:
--
Labels: pull-request-available  (was: )

> [Rust] Add windows support
> --
>
> Key: ARROW-2434
> URL: https://issues.apache.org/jira/browse/ARROW-2434
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently `cargo test` fails on Windows.





[jira] [Commented] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431640#comment-16431640
 ] 

ASF GitHub Bot commented on ARROW-2434:
---

paddyhoran commented on issue #1873: ARROW-2434: [Rust] Add windows support
URL: https://github.com/apache/arrow/pull/1873#issuecomment-379958446
 
 
   ARROW-2436 will add CI for windows




> [Rust] Add windows support
> --
>
> Key: ARROW-2434
> URL: https://issues.apache.org/jira/browse/ARROW-2434
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently `cargo test` fails on Windows.





[jira] [Commented] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431703#comment-16431703
 ] 

ASF GitHub Bot commented on ARROW-2434:
---

andygrove commented on issue #1873: ARROW-2434: [Rust] Add windows support
URL: https://github.com/apache/arrow/pull/1873#issuecomment-379973257
 
 
   Hi @paddyhoran, I tried to assign this to you in JIRA but couldn't find your 
username there. I think you need to create a JIRA account first, and then you 
should be able to self-assign.




> [Rust] Add windows support
> --
>
> Key: ARROW-2434
> URL: https://issues.apache.org/jira/browse/ARROW-2434
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently `cargo test` fails on Windows.





[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431015#comment-16431015
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] 
JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector 
Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180185358
 
 

 ##
 File path: 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.nio.charset.Charset;
+import java.sql.*;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow 
columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+private static final int DEFAULT_BUFFER_SIZE = 256;
+
+/**
+ * Create Arrow {@link Schema} object for the given JDBC {@link 
ResultSetMetaData}.
+ *
+ * This method currently performs the following type mapping from JDBC SQL data 
types to the corresponding Arrow data types.
+ *
+ * CHAR--> ArrowType.Utf8
+ * NCHAR   --> ArrowType.Utf8
+ * VARCHAR --> ArrowType.Utf8
+ * NVARCHAR --> ArrowType.Utf8
+ * LONGVARCHAR --> ArrowType.Utf8
+ * LONGNVARCHAR --> ArrowType.Utf8
+ * NUMERIC --> ArrowType.Decimal(precision, scale)
+ * DECIMAL --> ArrowType.Decimal(precision, scale)
+ * BIT --> ArrowType.Bool
+ * TINYINT --> ArrowType.Int(8, signed)
+ * SMALLINT --> ArrowType.Int(16, signed)
+ * INTEGER --> ArrowType.Int(32, signed)
+ * BIGINT --> ArrowType.Int(64, signed)
+ * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY --> ArrowType.Binary
+ * VARBINARY --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB --> ArrowType.Utf8
+ * BLOB --> ArrowType.Binary
+ *
+ * @param rsmd ResultSetMetaData of the JDBC query to map
+ * @return {@link Schema} with one Arrow field per JDBC column
+ * @throws SQLException if the column metadata cannot be read
+ */
+public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws 
SQLException {
+
+assert rsmd != null;
+
+//ImmutableList.Builder fields = ImmutableList.builder();
+List<Field> fields = new ArrayList<>();
+int columnCount = rsmd.getColumnCount();
+for (int i = 1; i <= columnCount; i++) {
+String columnName = rsmd.getColumnName(i);
+switch (rsmd.getColumnType(i)) {
+case Types.BOOLEAN:
+case Types.BIT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Bool()), null));
+break;
+case Types.TINYINT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(8, true)), null));
+break;
+case Types.SMALLINT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(16, true)), null));
+break;
+case Types.INTEGER:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(32, true)), null));
+break;
+case 

[jira] [Created] (ARROW-2431) [Rust] Schema fidelity

2018-04-09 Thread Maximilian Roos (JIRA)
Maximilian Roos created ARROW-2431:
--

 Summary: [Rust] Schema fidelity
 Key: ARROW-2431
 URL: https://issues.apache.org/jira/browse/ARROW-2431
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Maximilian Roos


ref [https://github.com/apache/arrow/pull/1829#discussion_r179248743]

Currently our traits are not faithful to 
[https://arrow.apache.org/docs/metadata.html].

For example, we nest `Field`s in the `DataType` (aka `type`) attribute of the 
parent `Field`, rather than having the type be `Struct` with a separate 
`children` parameter.

 

Is this OK, assuming that we can read and write accurate schemas? Or should we 
move towards having the Schema trait be consistent with the metadata spec?
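One way to picture the spec-faithful layout being discussed: the parent's type is plain `Struct`, and child fields live in a separate `children` list rather than inside the `DataType`. All names here are illustrative, not a proposed API:

```rust
// Sketch of a metadata-spec-faithful Field: the nested structure is carried
// by a `children` list on the Field, not by nesting Fields in DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
    Struct,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub children: Vec<Field>, // empty for non-nested types
}

impl Field {
    pub fn leaf(name: &str, data_type: DataType, nullable: bool) -> Field {
        Field { name: name.to_string(), data_type, nullable, children: vec![] }
    }
}

// Example schema fragment: struct person { id: int32, name: utf8 }
pub fn person_field() -> Field {
    Field {
        name: "person".to_string(),
        data_type: DataType::Struct,
        nullable: false,
        children: vec![
            Field::leaf("id", DataType::Int32, false),
            Field::leaf("name", DataType::Utf8, true),
        ],
    }
}

fn main() {
    let person = person_field();
    assert_eq!(person.data_type, DataType::Struct);
    assert_eq!(person.children.len(), 2);
    assert_eq!(person.children[0].name, "id");
}
```

With this shape, serializing to the metadata format is a direct walk over `children`, instead of pattern-matching fields back out of the type.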





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431128#comment-16431128
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou closed pull request #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/compute/kernels/cast.cc 
b/cpp/src/arrow/compute/kernels/cast.cc
index eaebd7cef..bfd519d18 100644
--- a/cpp/src/arrow/compute/kernels/cast.cc
+++ b/cpp/src/arrow/compute/kernels/cast.cc
@@ -396,21 +396,34 @@ struct CastFunctor {
 ShiftTime(ctx, options, conversion.first, 
conversion.second, input,
 output);
 
-internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-  input.length);
-
 // Ensure that intraday milliseconds have been zeroed out
 auto out_data = GetMutableValues(output, 1);
-for (int64_t i = 0; i < input.length; ++i) {
-  const int64_t remainder = out_data[i] % kMillisecondsInDay;
-  if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
-  remainder > 0)) {
-ctx->SetStatus(
-Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
-break;
+
+if (input.null_count != 0) {
+  internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+input.length);
+
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
+remainder > 0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
+bit_reader.Next();
+  }
+} else {
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 
0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
   }
-  out_data[i] -= remainder;
-  bit_reader.Next();
 }
   }
 };
diff --git a/python/pyarrow/tests/test_convert_pandas.py 
b/python/pyarrow/tests/test_convert_pandas.py
index c6e2b75be..de6120176 100644
--- a/python/pyarrow/tests/test_convert_pandas.py
+++ b/python/pyarrow/tests/test_convert_pandas.py
@@ -807,6 +807,44 @@ def test_datetime64_to_date32(self):
 
 assert arr2.equals(arr.cast('date32'))
 
+@pytest.mark.parametrize('mask', [
+None,
+np.ones(3),
+np.array([True, False, False]),
+])
+def test_pandas_datetime_to_date64(self, mask):
+s = pd.to_datetime([
+'2018-05-10T00:00:00',
+'2018-05-11T00:00:00',
+'2018-05-12T00:00:00',
+])
+arr = pa.Array.from_pandas(s, type=pa.date64(), mask=mask)
+
+data = np.array([
+date(2018, 5, 10),
+date(2018, 5, 11),
+date(2018, 5, 12)
+])
+expected = pa.array(data, mask=mask, type=pa.date64())
+
+assert arr.equals(expected)
+
+@pytest.mark.parametrize('mask', [
+None,
+np.ones(3),
+np.array([True, False, False])
+])
+def test_pandas_datetime_to_date64_failures(self, mask):
+s = pd.to_datetime([
+'2018-05-10T10:24:01',
+'2018-05-11T10:24:01',
+'2018-05-12T10:24:01',
+])
+
+expected_msg = 'Timestamp value had non-zero intraday milliseconds'
+with pytest.raises(pa.ArrowInvalid, msg=expected_msg):
+pa.Array.from_pandas(s, type=pa.date64(), mask=mask)
+
 def test_date_infer(self):
 df = pd.DataFrame({
 'date': [date(2000, 1, 1),


 




> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> 

[jira] [Resolved] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2391.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1859
[https://github.com/apache/arrow/pull/1859]

> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if Pandas `datetime64[ns]` column tries to be converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.
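The invariant the merged C++ fix enforces, namely that a date64 value must be midnight-aligned unless time truncation is explicitly allowed, can be sketched as follows. This is Rust for illustration only, not the actual kernel code:

```rust
const MILLIS_PER_DAY: i64 = 86_400_000;

// Cast a millisecond timestamp to a date64 value (midnight-aligned
// milliseconds), rejecting non-zero intraday milliseconds unless
// truncation is allowed -- the check the fixed cast kernel performs.
fn timestamp_ms_to_date64(ts: i64, allow_time_truncate: bool) -> Result<i64, String> {
    let remainder = ts % MILLIS_PER_DAY;
    if remainder != 0 && !allow_time_truncate {
        return Err("Timestamp value had non-zero intraday milliseconds".to_string());
    }
    Ok(ts - remainder)
}

fn main() {
    // A midnight-aligned timestamp passes through unchanged.
    let midnight = 17_000 * MILLIS_PER_DAY;
    assert_eq!(timestamp_ms_to_date64(midnight, false), Ok(midnight));

    // A timestamp with an intraday component fails unless truncation
    // is allowed, in which case it is rounded down to midnight.
    let intraday = midnight + 10 * 3_600_000 + 24 * 60_000 + 1_000;
    assert!(timestamp_ms_to_date64(intraday, false).is_err());
    assert_eq!(timestamp_ms_to_date64(intraday, true), Ok(midnight));
}
```

The segfault itself came from reading the null bitmap when no nulls were present; the merged patch splits the loop on `null_count` so the bitmap is only touched when it exists.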





[jira] [Created] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2427:
-

 Summary: [C++] ReadAt implementations suboptimal
 Key: ARROW-2427
 URL: https://issues.apache.org/jira/browse/ARROW-2427
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


The {{ReadAt}} implementations for at least {{OSFile}} and {{MemoryMappedFile}} 
take the file lock and seek. They could instead read directly from the given 
offset, allowing concurrent I/O from multiple threads.
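On POSIX systems this corresponds to pread-style positional reads. A Unix-only Rust illustration of the idea, using the standard-library `FileExt::read_exact_at` (the file name is arbitrary; this is not Arrow's C++ implementation):

```rust
use std::fs::File;
use std::io::Write;
use std::os::unix::fs::FileExt; // read_exact_at: positional read, no seek

// Read a span at an explicit offset. Unlike seek-then-read, this never
// moves the file's shared cursor, so no lock is needed and multiple
// threads can issue reads on the same File concurrently.
fn read_span(f: &File, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    f.read_exact_at(&mut buf, offset)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("readat_demo.bin");
    File::create(&path)?.write_all(b"hello world")?;

    let f = File::open(&path)?;
    // Two reads at different offsets share no cursor state.
    assert_eq!(read_span(&f, 0, 5)?, b"hello".to_vec());
    assert_eq!(read_span(&f, 6, 5)?, b"world".to_vec());
    std::fs::remove_file(&path)?;
    Ok(())
}
```

The seek-then-read pattern forces the lock because the cursor is shared mutable state; passing the offset explicitly removes that state entirely.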





[jira] [Commented] (ARROW-2353) Test correctness of built wheel on AppVeyor

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430283#comment-16430283
 ] 

ASF GitHub Bot commented on ARROW-2353:
---

pitrou closed pull request #1793: ARROW-2353: [CI] Check correctness of built 
wheel on AppVeyor
URL: https://github.com/apache/arrow/pull/1793
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/ci/msvc-build.bat b/ci/msvc-build.bat
index cec14297e..678e29d58 100644
--- a/ci/msvc-build.bat
+++ b/ci/msvc-build.bat
@@ -104,12 +104,12 @@ cmake -G "%GENERATOR%" ^
 cmake --build . --target install --config %CONFIGURATION%  || exit /B
 
 @rem Needed so python-test.exe works
-set OLD_PYTHONPATH=%PYTHONPATH%
-set 
PYTHONPATH=%CONDA_PREFIX%\Lib;%CONDA_PREFIX%\Lib\site-packages;%CONDA_PREFIX%\python35.zip;%CONDA_PREFIX%\DLLs;%CONDA_PREFIX%;%PYTHONPATH%
+set OLD_PYTHONHOME=%PYTHONHOME%
+set PYTHONHOME=%CONDA_PREFIX%
 
 ctest -VV  || exit /B
 
-set PYTHONPATH=%OLD_PYTHONPATH%
+set PYTHONHOME=%OLD_PYTHONHOME%
 popd
 
 @rem Build parquet-cpp
@@ -124,7 +124,8 @@ cmake -G "%GENERATOR%" ^
  -DCMAKE_INSTALL_PREFIX=%PARQUET_HOME% ^
  -DCMAKE_BUILD_TYPE=%CONFIGURATION% ^
  -DPARQUET_BOOST_USE_SHARED=OFF ^
- -DPARQUET_BUILD_TESTS=off .. || exit /B
+ -DPARQUET_BUILD_TESTS=OFF ^
+ .. || exit /B
 cmake --build . --target install --config %CONFIGURATION% || exit /B
 popd
 
@@ -135,13 +136,39 @@ popd
 pushd python
 
 set PYARROW_CXXFLAGS=/WX
-python setup.py build_ext --with-parquet --bundle-arrow-cpp 
--with-static-boost ^
+set PYARROW_CMAKE_GENERATOR=%GENERATOR%
+set PYARROW_BUNDLE_ARROW_CPP=ON
+set PYARROW_BUNDLE_BOOST=OFF
+set PYARROW_WITH_STATIC_BOOST=ON
+set PYARROW_WITH_PARQUET=ON
+
+python setup.py build_ext ^
 install -q --single-version-externally-managed --record=record.text ^
-bdist_wheel || exit /B
+bdist_wheel -q || exit /B
+
+for /F %%i in ('dir /B /S dist\*.whl') do set WHEEL_PATH=%%i
 
 @rem Test directly from installed location
 
+@rem Needed for test_cython
 SET PYARROW_PATH=%CONDA_PREFIX%\Lib\site-packages\pyarrow
 py.test -r sxX --durations=15 -v %PYARROW_PATH% --parquet || exit /B
 
 popd
+
+@rem Test pyarrow wheel from pristine environment
+
+call deactivate
+
+conda create -n wheel_test -q -y python=%PYTHON%
+
+call activate wheel_test
+
+pip install %WHEEL_PATH% || exit /B
+
+python -c "import pyarrow" || exit /B
+python -c "import pyarrow.parquet" || exit /B
+
+pip install pandas pytest pytest-faulthandler
+
+py.test -r sxX --durations=15 --pyargs pyarrow.tests || exit /B
diff --git a/python/CMakeLists.txt b/python/CMakeLists.txt
index cb3cd7023..fcc1d3cdc 100644
--- a/python/CMakeLists.txt
+++ b/python/CMakeLists.txt
@@ -141,13 +141,18 @@ endif()
 # For any C code, use the same flags.
 set(CMAKE_C_FLAGS "${CMAKE_CXX_FLAGS}")
 
-# set compile output directory
-string (TOLOWER ${CMAKE_BUILD_TYPE} BUILD_SUBDIR_NAME)
+if (MSVC)
+  # MSVC makes its own output directories based on the build configuration
+  set(BUILD_SUBDIR_NAME "")
+else()
+  # Set compile output directory
+  string (TOLOWER ${CMAKE_BUILD_TYPE} BUILD_SUBDIR_NAME)
+endif()
 
 # If build in-source, create the latest symlink. If build out-of-source, which 
is
 # preferred, simply output the binaries in the build folder
 if (${CMAKE_SOURCE_DIR} STREQUAL ${CMAKE_CURRENT_BINARY_DIR})
-  set(BUILD_OUTPUT_ROOT_DIRECTORY 
"${CMAKE_CURRENT_BINARY_DIR}/build/${BUILD_SUBDIR_NAME}/")
+  set(BUILD_OUTPUT_ROOT_DIRECTORY 
"${CMAKE_CURRENT_BINARY_DIR}/build/${BUILD_SUBDIR_NAME}")
   # Link build/latest to the current build directory, to avoid developers
   # accidentally running the latest debug build when in fact they're building
   # release builds.
@@ -155,15 +160,10 @@ if (${CMAKE_SOURCE_DIR} STREQUAL 
${CMAKE_CURRENT_BINARY_DIR})
   if (NOT APPLE)
 set(MORE_ARGS "-T")
   endif()
-EXECUTE_PROCESS(COMMAND ln ${MORE_ARGS} -sf ${BUILD_OUTPUT_ROOT_DIRECTORY}
-  ${CMAKE_CURRENT_BINARY_DIR}/build/latest)
+  EXECUTE_PROCESS(COMMAND ln ${MORE_ARGS} -sf ${BUILD_OUTPUT_ROOT_DIRECTORY}
+${CMAKE_CURRENT_BINARY_DIR}/build/latest)
 else()
-  if (MSVC)
-# MSVC makes its own output directories based on the build configuration
-set(BUILD_OUTPUT_ROOT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/")
-  else()
-set(BUILD_OUTPUT_ROOT_DIRECTORY 
"${CMAKE_CURRENT_BINARY_DIR}/${BUILD_SUBDIR_NAME}/")
-  endif()
+  set(BUILD_OUTPUT_ROOT_DIRECTORY 
"${CMAKE_CURRENT_BINARY_DIR}/${BUILD_SUBDIR_NAME}")
 endif()
 
 message(STATUS "Build output directory: ${BUILD_OUTPUT_ROOT_DIRECTORY}")
diff --git a/python/pyarrow/tests/test_feather.py 
b/python/pyarrow/tests/test_feather.py
index a14673f9f..171f28dfa 100644
--- a/python/pyarrow/tests/test_feather.py
+++ 

[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a [T] from Builder

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430288#comment-16430288
 ] 

ASF GitHub Bot commented on ARROW-2408:
---

xhochy closed pull request #1847: ARROW-2408: [Rust] Remove build warnings
URL: https://github.com/apache/arrow/pull/1847
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/ci/travis_script_rust.sh b/ci/travis_script_rust.sh
index 4405d036d..6a8ecf081 100755
--- a/ci/travis_script_rust.sh
+++ b/ci/travis_script_rust.sh
@@ -23,8 +23,12 @@ RUST_DIR=${TRAVIS_BUILD_DIR}/rust
 
 pushd $RUST_DIR
 
+# raises on any formatting errors
 rustup component add rustfmt-preview
 cargo fmt --all -- --write-mode=diff
+# raises on any warnings
+cargo rustc -- -D warnings 
+
 cargo build
 cargo test
 
diff --git a/rust/examples/array_from_builder.rs 
b/rust/examples/array_from_builder.rs
index 3a273a64d..ea1ecec45 100644
--- a/rust/examples/array_from_builder.rs
+++ b/rust/examples/array_from_builder.rs
@@ -18,7 +18,6 @@
 extern crate arrow;
 
 use arrow::array::*;
-use arrow::buffer::*;
 use arrow::builder::*;
 
 fn main() {
diff --git a/rust/src/buffer.rs b/rust/src/buffer.rs
index ab90a5b08..1f2ec6c8d 100644
--- a/rust/src/buffer.rs
+++ b/rust/src/buffer.rs
@@ -18,7 +18,6 @@
 use bytes::Bytes;
 use libc;
 use std::mem;
-use std::ptr;
 use std::slice;
 
 use super::memory::*;
diff --git a/rust/src/builder.rs b/rust/src/builder.rs
index 1cc024042..c8ba27477 100644
--- a/rust/src/builder.rs
+++ b/rust/src/builder.rs
@@ -15,11 +15,9 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use bytes::Bytes;
 use libc;
 use std::mem;
 use std::ptr;
-use std::slice;
 
 use super::buffer::*;
 use super::memory::*;
diff --git a/rust/src/datatypes.rs b/rust/src/datatypes.rs
index 85278f7bb..ac2c2c6ea 100644
--- a/rust/src/datatypes.rs
+++ b/rust/src/datatypes.rs
@@ -16,7 +16,6 @@
 // under the License.
 
 use super::error::ArrowError;
-use serde_json;
 use serde_json::Value;
 use std::fmt;
 
@@ -241,6 +240,7 @@ impl fmt::Display for Schema {
 #[cfg(test)]
 mod tests {
 use super::*;
+use serde_json;
 
 #[test]
 fn create_struct_type() {


 




> [Rust] It should be possible to get a [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  





[jira] [Commented] (ARROW-2419) [Site] Website generation depends on local timezone

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430291#comment-16430291
 ] 

ASF GitHub Bot commented on ARROW-2419:
---

xhochy closed pull request #1858: ARROW-2419: [Site] Hard-code timezone
URL: https://github.com/apache/arrow/pull/1858
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/site/_config.yml b/site/_config.yml
index cbcf97dd3..b9dd72303 100644
--- a/site/_config.yml
+++ b/site/_config.yml
@@ -18,6 +18,7 @@ permalink: /blog/:year/:month/:day/:title/
 repository: https://github.com/apache/arrow
 destination: build
 excerpt_separator: ""
+timezone: America/New_York
 
 kramdown:
   input: GFM


 




> [Site] Website generation depends on local timezone
> ---
>
> Key: ARROW-2419
> URL: https://issues.apache.org/jira/browse/ARROW-2419
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See discussion at 
> https://github.com/apache/arrow/pull/1853#issuecomment-379670199





[jira] [Resolved] (ARROW-2419) [Site] Website generation depends on local timezone

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2419.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1858
[https://github.com/apache/arrow/pull/1858]

> [Site] Website generation depends on local timezone
> ---
>
> Key: ARROW-2419
> URL: https://issues.apache.org/jira/browse/ARROW-2419
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See discussion at 
> https://github.com/apache/arrow/pull/1853#issuecomment-379670199





[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430299#comment-16430299
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r180032970
 
 

 ##
 File path: cpp/src/arrow/ipc/test-common.h
 ##
 @@ -223,15 +223,17 @@ Status MakeRandomBinaryArray(int64_t length, bool 
include_nulls, MemoryPool* poo
 if (include_nulls && values_index == 0) {
   RETURN_NOT_OK(builder.AppendNull());
 } else {
-  const std::string& value = values[values_index];
+  const std::string value =
+  i < int64_t(values.size()) ? values[values_index] : 
std::to_string(i);
 
 Review comment:
   The original random strings were not very random, just a repetition of the 
same few strings. I admit the new ones aren't very random either; in particular 
there are no repetitions, which you would expect in real data. I probably 
should have added a new method (MakeRangeBinaryArray?).
   Of course you don't want truly random data in a test either.
   If I reverted the change I wouldn't be confident that my tests were still 
testing anything; the repetition might just align in a lucky pattern.
   Are you ok with leaving it? I think it is better than the previous code.




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  
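The class of bug described in the issue, reading a sliced array from index 0 instead of from its offset for both the values and the null bitmap, can be sketched with a simplified, assumed layout (not the actual Arrow C++ structures):

```rust
// Simplified slice: values and validity buffers are shared with the parent
// array; the slice itself is just (offset, len).
struct Slice<'a> {
    values: &'a [i32],
    validity: &'a [bool], // stand-in for the null bitmap
    offset: usize,
    len: usize,
}

fn materialize(s: &Slice) -> Vec<Option<i32>> {
    (0..s.len)
        .map(|i| {
            // Indexing from `offset + i` instead of `i` is the whole fix:
            // the buggy writer read rows 0..len and a misaligned bitmap.
            let j = s.offset + i;
            if s.validity[j] { Some(s.values[j]) } else { None }
        })
        .collect()
}

fn main() {
    let values = [10, 20, 30, 40, 50];
    let validity = [true, true, false, true, true];
    // Slice of length 3 starting at row 1.
    let s = Slice { values: &values, validity: &validity, offset: 1, len: 3 };
    assert_eq!(materialize(&s), vec![Some(20), None, Some(40)]);
}
```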





[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430304#comment-16430304
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

pitrou commented on a change in pull request #1784: ARROW-2328: [C++] Fixed and 
unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r180034330
 
 

 ##
 File path: cpp/src/arrow/ipc/test-common.h
 ##
 @@ -223,15 +223,17 @@ Status MakeRandomBinaryArray(int64_t length, bool 
include_nulls, MemoryPool* poo
 if (include_nulls && values_index == 0) {
   RETURN_NOT_OK(builder.AppendNull());
 } else {
-  const std::string& value = values[values_index];
+  const std::string value =
+  i < int64_t(values.size()) ? values[values_index] : 
std::to_string(i);
 
 Review comment:
   I'd rather revert this change. If we want better random generation of binary 
arrays we should do it more thoroughly.




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Updated] (ARROW-2353) Test correctness of built wheel on AppVeyor

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2353:
--
Fix Version/s: (was: JS-0.4.0)

> Test correctness of built wheel on AppVeyor
> ---
>
> Key: ARROW-2353
> URL: https://issues.apache.org/jira/browse/ARROW-2353
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Resolved] (ARROW-2408) [Rust] It should be possible to get a [T] from Builder

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2408.

Resolution: Fixed

Issue resolved by pull request 1847
[https://github.com/apache/arrow/pull/1847]

> [Rust] It should be possible to get a [T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  





[jira] [Resolved] (ARROW-2353) Test correctness of built wheel on AppVeyor

2018-04-09 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2353.
---
   Resolution: Fixed
Fix Version/s: JS-0.4.0

Issue resolved by pull request 1793
[https://github.com/apache/arrow/pull/1793]

> Test correctness of built wheel on AppVeyor
> ---
>
> Key: ARROW-2353
> URL: https://issues.apache.org/jira/browse/ARROW-2353
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>






[jira] [Commented] (ARROW-2416) [C++] Support system libprotobuf

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430199#comment-16430199
 ] 

ASF GitHub Bot commented on ARROW-2416:
---

xhochy closed pull request #1854: ARROW-2416: [C++] Support system libprotobuf
URL: https://github.com/apache/arrow/pull/1854
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index a61bcad83..b91368585 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -139,6 +139,10 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL 
"${CMAKE_CURRENT_SOURCE_DIR}")
 "Use vendored Boost instead of existing Boost"
 OFF)
 
+  option(ARROW_PROTOBUF_USE_SHARED
+"Rely on Protocol Buffers shared libraries where relevant"
+OFF)
+
   option(ARROW_PYTHON
 "Build the Arrow CPython extensions"
 OFF)
@@ -531,6 +535,7 @@ endif(UNIX)
 # Linker and Dependencies
 
 
+set(ARROW_LINK_LIBS)
 set(ARROW_STATIC_LINK_LIBS)
 
 if (ARROW_WITH_BROTLI)
@@ -568,8 +573,16 @@ endif()
 if (ARROW_ORC)
   SET(ARROW_STATIC_LINK_LIBS
 orc
-protobuf
 ${ARROW_STATIC_LINK_LIBS})
+  if (ARROW_PROTOBUF_USE_SHARED)
+SET(ARROW_LINK_LIBS
+  protobuf
+  ${ARROW_LINK_LIBS})
+  else()
+SET(ARROW_STATIC_LINK_LIBS
+  protobuf
+  ${ARROW_STATIC_LINK_LIBS})
+  endif()
 endif()
 
 if (ARROW_STATIC_LINK_LIBS)
@@ -583,7 +596,8 @@ set(ARROW_BENCHMARK_LINK_LIBS
   ${ARROW_STATIC_LINK_LIBS})
 
 set(ARROW_LINK_LIBS
-  ${ARROW_STATIC_LINK_LIBS})
+  ${ARROW_STATIC_LINK_LIBS}
+  ${ARROW_LINK_LIBS})
 
 set(ARROW_SHARED_PRIVATE_LINK_LIBS
   ${BOOST_SYSTEM_LIBRARY}
diff --git a/cpp/cmake_modules/FindProtobuf.cmake 
b/cpp/cmake_modules/FindProtobuf.cmake
index a42f4493a..9591bd1eb 100644
--- a/cpp/cmake_modules/FindProtobuf.cmake
+++ b/cpp/cmake_modules/FindProtobuf.cmake
@@ -36,15 +36,23 @@ find_path (PROTOBUF_INCLUDE_DIR 
google/protobuf/io/coded_stream.h HINTS
   NO_DEFAULT_PATH
   PATH_SUFFIXES "include")
 
+set (lib_dirs "lib")
+if (EXISTS "${_protobuf_path}/lib64")
+  set (lib_dirs "lib64" ${lib_dirs})
+endif ()
+if (EXISTS "${_protobuf_path}/lib/${CMAKE_LIBRARY_ARCHITECTURE}")
+  set (lib_dirs "lib/${CMAKE_LIBRARY_ARCHITECTURE}" ${lib_dirs})
+endif ()
+
 find_library (PROTOBUF_LIBRARY NAMES protobuf PATHS
   ${_protobuf_path}
   NO_DEFAULT_PATH
-  PATH_SUFFIXES "lib")
+  PATH_SUFFIXES ${lib_dirs})
 
 find_library (PROTOC_LIBRARY NAMES protoc PATHS
   ${_protobuf_path}
   NO_DEFAULT_PATH
-  PATH_SUFFIXES "lib")
+  PATH_SUFFIXES ${lib_dirs})
 
 find_program(PROTOBUF_EXECUTABLE protoc HINTS
   ${_protobuf_path}
@@ -53,6 +61,8 @@ find_program(PROTOBUF_EXECUTABLE protoc HINTS
 
 if (PROTOBUF_INCLUDE_DIR AND PROTOBUF_LIBRARY AND PROTOC_LIBRARY AND 
PROTOBUF_EXECUTABLE)
   set (PROTOBUF_FOUND TRUE)
+  set (PROTOBUF_SHARED_LIB ${PROTOBUF_LIBRARY})
+  set (PROTOC_SHARED_LIB ${PROTOC_LIBRARY})
   get_filename_component (PROTOBUF_LIBS ${PROTOBUF_LIBRARY} PATH)
   set (PROTOBUF_LIB_NAME protobuf)
   set (PROTOC_LIB_NAME protoc)
@@ -64,7 +74,9 @@ endif ()
 
 if (PROTOBUF_FOUND)
   message (STATUS "Found the Protobuf headers: ${PROTOBUF_INCLUDE_DIR}")
+  message (STATUS "Found the Protobuf shared library: ${PROTOBUF_SHARED_LIB}")
   message (STATUS "Found the Protobuf library: ${PROTOBUF_STATIC_LIB}")
+  message (STATUS "Found the Protoc shared library: ${PROTOC_SHARED_LIB}")
   message (STATUS "Found the Protoc library: ${PROTOC_STATIC_LIB}")
   message (STATUS "Found the Protoc executable: ${PROTOBUF_EXECUTABLE}")
 else()
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake 
b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index be9d55c53..129174c8d 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -915,8 +915,13 @@ if (ARROW_ORC)
   endif ()
 
   include_directories (SYSTEM ${PROTOBUF_INCLUDE_DIR})
-  ADD_THIRDPARTY_LIB(protobuf
-STATIC_LIB ${PROTOBUF_STATIC_LIB})
+  if (ARROW_PROTOBUF_USE_SHARED)
+ADD_THIRDPARTY_LIB(protobuf
+  SHARED_LIB ${PROTOBUF_LIBRARY})
+  else ()
+ADD_THIRDPARTY_LIB(protobuf
+  STATIC_LIB ${PROTOBUF_STATIC_LIB})
+  endif ()
 
   if (PROTOBUF_VENDORED)
 add_dependencies (protobuf protobuf_ep)


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Support system libprotobuf
> 
>
> Key: ARROW-2416
> URL: 

[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430241#comment-16430241
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

pitrou commented on a change in pull request #1784: ARROW-2328: [C++] Fixed and 
unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r180016562
 
 

 ##
 File path: cpp/src/arrow/ipc/test-common.h
 ##
 @@ -223,15 +223,17 @@ Status MakeRandomBinaryArray(int64_t length, bool 
include_nulls, MemoryPool* poo
 if (include_nulls && values_index == 0) {
   RETURN_NOT_OK(builder.AppendNull());
 } else {
-  const std::string& value = values[values_index];
+  const std::string value =
+  i < int64_t(values.size()) ? values[values_index] : 
std::to_string(i);
 
 Review comment:
   Why did you need this change? Is this just a debugging leftover?




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pull request #1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-09 Thread Dave Challis (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430261#comment-16430261
 ] 

Dave Challis commented on ARROW-2406:
-

[~kszucs] My mistake: I retested and noticed I was using an older env with 
pyarrow 0.8.0; it looks like the issue was resolved in 0.9.0.

> [Python] Segfault when creating PyArrow table from Pandas for empty string 
> column when schema provided
> --
>
> Key: ARROW-2406
> URL: https://issues.apache.org/jira/browse/ARROW-2406
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Mac OS High Sierra
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Major
> Fix For: 0.9.0
>
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema){code}
>  
> This causes the python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
> {code}
> # column 'a' is empty, but no type 'str' specified in Pandas
> df = pd.DataFrame({'a': []})
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  





[jira] [Resolved] (ARROW-2418) [Rust] List builder fails due to memory not being reserved correctly

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2418.

Resolution: Fixed

Issue resolved by pull request 1857
[https://github.com/apache/arrow/pull/1857]

> [Rust] List builder fails due to memory not being reserved correctly
> 
>
> Key: ARROW-2418
> URL: https://issues.apache.org/jira/browse/ARROW-2418
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I didn't realize that BytesMut.put() doesn't automatically grow the 
> underlying buffer. Therefore the code fails if the data is larger than the 
> pre-allocated buffer.
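The behavior described above concerns Rust's bytes::BytesMut, whose put() (in the bytes crate version used at the time) did not grow capacity automatically, so writes past the pre-allocated buffer failed until the patch added an explicit reserve(). A rough Python analogue of the failure mode and the fix (illustrative only; Python's bytearray grows automatically, so the hypothetical FixedBuf class below simulates a fixed-capacity buffer):

```python
# Illustrative analogue of a buffer whose put() does not grow the underlying
# storage, mirroring the ARROW-2418 bug and the reserve-before-put fix.

class FixedBuf:
    def __init__(self, capacity):
        self._data = bytearray()
        self._capacity = capacity

    def remaining(self):
        return self._capacity - len(self._data)

    def reserve(self, n):
        # The fix: grow capacity before writing, as the Rust patch does
        # with buf.reserve(slice.len()).
        if self.remaining() < n:
            self._capacity = len(self._data) + n

    def put(self, b):
        # The bug: writing past capacity fails instead of growing.
        if len(b) > self.remaining():
            raise OverflowError("buffer capacity exceeded")
        self._data.extend(b)

buf = FixedBuf(4)
for chunk in [b"abc", b"defgh"]:
    buf.reserve(len(chunk))  # without this, the second chunk would overflow
    buf.put(chunk)
print(bytes(buf._data))  # b'abcdefgh'
```

Without the reserve() call, the second put() exceeds the initial capacity of 4 and raises, which is the shape of the failure the issue reports.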





[jira] [Commented] (ARROW-2414) A variety of typos can be found

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430225#comment-16430225
 ] 

ASF GitHub Bot commented on ARROW-2414:
---

xhochy closed pull request #1850: ARROW-2414: Fix a variety of typos.
URL: https://github.com/apache/arrow/pull/1850
 
 
   


diff --git a/cpp/src/plasma/format/plasma.fbs b/cpp/src/plasma/format/plasma.fbs
index 66cb00f77..71e6f5c19 100644
--- a/cpp/src/plasma/format/plasma.fbs
+++ b/cpp/src/plasma/format/plasma.fbs
@@ -222,7 +222,7 @@ enum ObjectStatus:int {
   Remote,
   // Object is not stored in the system.
   Nonexistent,
-  // Object is currently transferred from a remote Plasma store the the local
+  // Object is currently transferred from a remote Plasma store the local
   // Plasma Store.
   Transfer
 }
diff --git a/format/Guidelines.md b/format/Guidelines.md
index ff3a63d9a..7b5f3a11b 100644
--- a/format/Guidelines.md
+++ b/format/Guidelines.md
@@ -32,4 +32,4 @@ Consumption of vectors should at least convert the 
unsupported input vectors to
 ## Extensibility
 An execution engine implementor can also extend their memory representation 
with their own vectors internally as long as they are never exposed. Before 
sending data to another system expecting Arrow data these custom vectors should 
be converted to a type that exist in the Arrow spec.
 An example of this is operating on compressed data.
-These custom vectors are not exchanged externaly and there is no support for 
custom metadata.
+These custom vectors are not exchanged externally and there is no support for 
custom metadata.
diff --git a/format/Layout.md b/format/Layout.md
index 963202f9f..2f4b3a77b 100644
--- a/format/Layout.md
+++ b/format/Layout.md
@@ -21,7 +21,7 @@
 
 ## Definitions / Terminology
 
-Since different projects have used differents words to describe various
+Since different projects have used different words to describe various
 concepts, here is a small glossary to help disambiguate.
 
 * Array: a sequence of values with known length all having the same type.
@@ -273,7 +273,7 @@ A list-array is represented by the combination of the 
following:
 
 The offsets array encodes a start position in the values array, and the length
 of the value in each slot is computed using the first difference with the next
-element in the offsets array. For example. the position and length of slot j is
+element in the offsets array. For example, the position and length of slot j is
 computed as:
 
 ```
@@ -610,7 +610,7 @@ reinterpreted as a non-nested array.
 Similar to structs, a particular child array may have a non-null slot
 even if the null bitmap of the parent union array indicates the slot is
 null.  Additionally, a child array may have a non-null slot even if
-the the types array indicates that a slot contains a different type at the 
index.
+the types array indicates that a slot contains a different type at the index.
 
 ## Dictionary encoding
 
diff --git a/format/Metadata.md b/format/Metadata.md
index 893b0a474..219df2124 100644
--- a/format/Metadata.md
+++ b/format/Metadata.md
@@ -68,7 +68,7 @@ table Field {
 The `type` is the logical type of the field. Nested types, such as List,
 Struct, and Union, have a sequence of child fields.
 
-a JSON representation of the schema is also provided:
+A JSON representation of the schema is also provided:
 Field:
 ```
 {
@@ -373,7 +373,7 @@ according to the child logical type (e.g. `List` vs. 
`List`).
 
 We specify two logical types for variable length bytes:
 
-* `Utf8` data is unicode values with UTF-8 encoding
+* `Utf8` data is Unicode values with UTF-8 encoding
 * `Binary` is any other variable length bytes
 
 These types both have the same memory layout as the nested type `List`,
diff --git 
a/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java 
b/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java
index 419be3429..e1149774c 100644
--- a/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java
+++ b/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java
@@ -275,7 +275,7 @@ public boolean transferBalance(final BufferLedger target) {
 }
 
 /**
- * Print the current ledger state to a the provided StringBuilder.
+ * Print the current ledger state to the provided StringBuilder.
  *
  * @param sbThe StringBuilder to populate.
  * @param indentThe level of indentation to position the data.
@@ -329,7 +329,7 @@ private void inc() {
  * should release its
  * ownership back to the AllocationManager
  *
- * @param decrement amout to decrease the reference count by
+   
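The Layout.md hunk in the diff above describes how a list array's offsets encode each slot: slot j starts at offsets[j] and its length is the first difference, offsets[j+1] - offsets[j]. A minimal Python sketch of that encoding (illustrative only, not Arrow's actual implementation; plain lists stand in for Arrow buffers):

```python
# Illustrative sketch of the list-array offsets layout described in Layout.md.

def encode_list_array(lists):
    """Flatten a list of lists into (offsets, values)."""
    offsets = [0]
    values = []
    for lst in lists:
        values.extend(lst)
        offsets.append(len(values))
    return offsets, values

def slot(offsets, values, j):
    """Recover slot j: position offsets[j], length offsets[j+1] - offsets[j]."""
    return values[offsets[j]:offsets[j + 1]]

offsets, values = encode_list_array([[1, 2], [], [3, 4, 5]])
print(offsets)                   # [0, 2, 2, 5]
print(slot(offsets, values, 2))  # [3, 4, 5]
```

Note how the empty slot at index 1 is represented purely by a repeated offset (2, 2), with no sentinel needed in the values buffer.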

[jira] [Assigned] (ARROW-2414) A variety of typos can be found

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2414:
--

Assignee: Bruce Mitchener

> A variety of typos can be found
> ---
>
> Key: ARROW-2414
> URL: https://issues.apache.org/jira/browse/ARROW-2414
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Bruce Mitchener
>Assignee: Bruce Mitchener
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This is just so that I can submit a PR for a bunch of typo fixes.





[jira] [Resolved] (ARROW-2414) A variety of typos can be found

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2414.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1850
[https://github.com/apache/arrow/pull/1850]

> A variety of typos can be found
> ---
>
> Key: ARROW-2414
> URL: https://issues.apache.org/jira/browse/ARROW-2414
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Bruce Mitchener
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This is just so that I can submit a PR for a bunch of typo fixes.





[jira] [Commented] (ARROW-2418) [Rust] List builder fails due to memory not being reserved correctly

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430182#comment-16430182
 ] 

ASF GitHub Bot commented on ARROW-2418:
---

xhochy closed pull request #1857: ARROW-2418: [Rust] BUG FIX: reserve memory 
when building list
URL: https://github.com/apache/arrow/pull/1857
 
 
   


diff --git a/rust/src/list.rs b/rust/src/list.rs
index 461f8e645..a7b11a628 100644
--- a/rust/src/list.rs
+++ b/rust/src/list.rs
@@ -44,7 +44,9 @@ impl From for List {
 let mut buf = BytesMut::with_capacity(v.len() * 32);
 offsets.push(0_i32);
 v.iter().for_each(|s| {
-buf.put(s.as_bytes());
+let slice = s.as_bytes();
+buf.reserve(slice.len());
+buf.put(slice);
 offsets.push(buf.len() as i32);
 });
 List {


 




> [Rust] List builder fails due to memory not being reserved correctly
> 
>
> Key: ARROW-2418
> URL: https://issues.apache.org/jira/browse/ARROW-2418
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I didn't realize that BytesMut.put() doesn't automatically grow the 
> underlying buffer. Therefore the code fails if the data is larger than the 
> pre-allocated buffer.





[jira] [Resolved] (ARROW-2416) [C++] Support system libprotobuf

2018-04-09 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2416.

Resolution: Fixed

Issue resolved by pull request 1854
[https://github.com/apache/arrow/pull/1854]

> [C++] Support system libprotobuf
> 
>
> Key: ARROW-2416
> URL: https://issues.apache.org/jira/browse/ARROW-2416
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Closed] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-09 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis closed ARROW-2406.
---
   Resolution: Fixed
Fix Version/s: (was: 0.10.0)
   0.9.0

> [Python] Segfault when creating PyArrow table from Pandas for empty string 
> column when schema provided
> --
>
> Key: ARROW-2406
> URL: https://issues.apache.org/jira/browse/ARROW-2406
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Mac OS High Sierra
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Major
> Fix For: 0.9.0
>
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema){code}
>  
> This causes the python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
> {code}
> # column 'a' is empty, but no type 'str' specified in Pandas
> df = pd.DataFrame({'a': []})
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  





[jira] [Updated] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-09 Thread Dave Challis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Challis updated ARROW-2406:

Affects Version/s: (was: 0.9.0)
   0.8.0

> [Python] Segfault when creating PyArrow table from Pandas for empty string 
> column when schema provided
> --
>
> Key: ARROW-2406
> URL: https://issues.apache.org/jira/browse/ARROW-2406
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Mac OS High Sierra
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Major
> Fix For: 0.9.0
>
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema){code}
>  
> This causes the python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
> {code}
> # column 'a' is empty, but no type 'str' specified in Pandas
> df = pd.DataFrame({'a': []})
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  




