[jira] [Created] (ARROW-2268) Remove MD5 checksums from release process

2018-03-05 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2268:
---

 Summary: Remove MD5 checksums from release process
 Key: ARROW-2268
 URL: https://issues.apache.org/jira/browse/ARROW-2268
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Wes McKinney
 Fix For: 0.9.0


The ASF has changed its release policy for signatures and checksums to 
contraindicate the use of MD5 checksums: 
http://www.apache.org/dev/release-distribution#sigs-and-sums. We should remove 
MD5 from our various release scripts prior to the 0.9.0 release.
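
As a rough illustration of the change, a release artifact could get SHA-256 and SHA-512 checksums in place of MD5. This is only a sketch in Python using the standard hashlib module; the function names file_digest and write_checksums and the .sha256/.sha512 file suffixes are assumptions made for the example and do not describe the actual Arrow release scripts.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm):
    """Return the hex digest of the file at `path` for the named algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large release tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(path, algorithms=("sha256", "sha512")):
    """Write one `<artifact>.<algorithm>` file per algorithm (hypothetical layout)."""
    path = Path(path)
    written = []
    for algorithm in algorithms:
        out = path.with_name(path.name + "." + algorithm)
        out.write_text("{}  {}\n".format(file_digest(path, algorithm), path.name))
        written.append(out)
    return written
```

Calling write_checksums on an artifact path would leave two sibling files next to it, each holding the digest followed by the artifact name.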



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2267) Rust bindings

2018-03-05 Thread Joshua Howard (JIRA)
Joshua Howard created ARROW-2267:


 Summary: Rust bindings
 Key: ARROW-2267
 URL: https://issues.apache.org/jira/browse/ARROW-2267
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Joshua Howard


Provide Rust bindings for Arrow. 





[jira] [Created] (ARROW-2266) [CI] Improve runtime of integration tests in Travis CI

2018-03-05 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2266:
---

 Summary: [CI] Improve runtime of integration tests in Travis CI
 Key: ARROW-2266
 URL: https://issues.apache.org/jira/browse/ARROW-2266
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration
Reporter: Wes McKinney


I was surprised to see that travis_script_integration.sh is taking over 25 
minutes to run. My only real guess about what's going on is that JVM startup 
time on these hosts is super slow.

I can think of some things we could do to make things better:

* Add debugging output so we can see what's slow
* Write a Java integration test handler that validates multiple files at once
* Generate a single set of binary files for each producer rather than 
regenerating them each time (so Java would only need to produce binary files 
once instead of 3 times like now)
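
The first suggestion, adding debugging output, could look roughly like the following Python sketch, which times each step and reports the slowest first. The helper names timed and report and the step label are hypothetical; nothing here reflects the actual contents of travis_script_integration.sh.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


def report():
    """Return one line per label, slowest first."""
    ordered = sorted(timings.items(), key=lambda item: item[1], reverse=True)
    return ["{:>8.3f}s  {}".format(seconds, label) for label, seconds in ordered]


with timed("example step"):  # hypothetical step label
    time.sleep(0.01)

print("\n".join(report()))
```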





Re: How to properly serialize subclasses of supported classes

2018-03-05 Thread Robert Nishihara
We just chatted offline. Should be fixed by
https://github.com/apache/arrow/pull/1704.

On Mon, Mar 5, 2018 at 3:42 AM Mitar wrote:

> Hi!
>
> You mean, this explains why a subclass of list is not being matched? Maybe.
>
> But I do not get why my custom serialization for ndarray subclass is
> never called.
>
> Or how hard would it be to automatically serialize/deserialize into
> subclasses so that I would not have to have a custom serialization for
> ndarray but the existing ndarray serialization would work, casting it
> into a proper subclass.
>
>
> Mitar
>
> On Sun, Mar 4, 2018 at 2:39 PM, Robert Nishihara wrote:
> > The issue is probably this line
> >
> >
> > https://github.com/apache/arrow/blob/8b1c8118b017a941f0102709d72df7e5a9783aa4/cpp/src/arrow/python/python_to_arrow.cc#L504
> >
> > which uses PyList_Check instead of PyList_CheckExact. Changing it to the
> > exact form will cause it to use the custom serializer for subclasses of
> > list.
> >
> > On Sun, Mar 4, 2018 at 1:08 AM Mitar wrote:
> >>
> >> Hi!
> >>
> >> I have a subclass of numpy and another of pandas which add a metadata
> >> attribute to them. Moreover, I have a subclass of typing.List as a
> >> Python generic with this metadata attribute as well.
> >>
> >> Now, it seems if I serialize this to plasma store and back I get
> >> standard numpy, pandas, or list back, respectively.
> >>
> >> My question is: how can I make it so that proper subclasses are
> >> returned, including the custom metadata attribute?
> >>
> >> I tried to use pyarrow_lib._default_serialization_context.register_type
> >> but it does not seem to work. Moreover, I still worry that even if I
> >> create a serialization for a custom class, if anyone makes a subclass
> >> and tries to store it in plasma store they will get back the custom class
> >> and not a subclass.
> >>
> >> This is how I am testing:
> >>
> >>
> >>
> >> https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/tests/test_plasma.py#L50
> >>
> >> And here is the code for custom numpy class and attempt at registering
> >> custom serialization:
> >>
> >>
> >>
> >> https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/d3m_metadata/container/numpy.py#L135
> >>
> >> It looks like custom serialization is not called.
> >>
> >>
> >> Mitar
> >>
> >> --
> >> http://mitar.tnode.com/
> >> https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>
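
For readers following the thread, the distinction between PyList_Check and PyList_CheckExact discussed in the quoted messages has a direct Python-level analogue, sketched below. isinstance accepts subclasses (like PyList_Check), while comparing type(obj) is exact (like PyList_CheckExact). The class MetadataList and the function pick_serializer are hypothetical names for illustration and are not pyarrow code.

```python
class MetadataList(list):
    """Hypothetical list subclass carrying an extra metadata attribute."""

    def __init__(self, items=(), metadata=None):
        super().__init__(items)
        self.metadata = metadata


def pick_serializer(obj, exact):
    """Return which path a dispatcher would take for `obj`."""
    if exact:
        is_builtin_list = type(obj) is list      # analogous to PyList_CheckExact
    else:
        is_builtin_list = isinstance(obj, list)  # analogous to PyList_Check
    return "builtin list path" if is_builtin_list else "custom serializer"


value = MetadataList([1, 2, 3], metadata={"source": "example"})
print(pick_serializer(value, exact=False))  # subclass is treated as a plain list
print(pick_serializer(value, exact=True))   # subclass falls through to the custom path
```

With the non-exact check the subclass is swallowed by the built-in list path and its metadata attribute is never consulted, which matches the behaviour reported in the thread.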


[jira] [Created] (ARROW-2265) Serializing subclasses of np.ndarray returns a np.ndarray.

2018-03-05 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-2265:
---

 Summary: Serializing subclasses of np.ndarray returns a np.ndarray.
 Key: ARROW-2265
 URL: https://issues.apache.org/jira/browse/ARROW-2265
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Robert Nishihara
Assignee: Robert Nishihara








[jira] [Created] (ARROW-2264) Efficiently serialize numpy arrays with dtype of unicode fixed length string

2018-03-05 Thread Mitar (JIRA)
Mitar created ARROW-2264:


 Summary: Efficiently serialize numpy arrays with dtype of unicode 
fixed length string
 Key: ARROW-2264
 URL: https://issues.apache.org/jira/browse/ARROW-2264
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Mitar


Looking at the numpy array serialization code it seems that if I have a dtype 
like ">> np.array(['aaa', 'bbb'])}}
{{array(['aaa', 'bbb'], dtype='

[jira] [Created] (ARROW-2263) [Python] test_cython.py fails if pyarrow is not in import path (e.g. with inplace builds)

2018-03-05 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2263:
---

 Summary: [Python] test_cython.py fails if pyarrow is not in import 
path (e.g. with inplace builds)
 Key: ARROW-2263
 URL: https://issues.apache.org/jira/browse/ARROW-2263
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.9.0


see 

{code}
$ py.test pyarrow/tests/test_cython.py
= test session starts =
platform linux -- Python 3.6.4, pytest-3.4.1, py-1.5.2, pluggy-0.6.0
rootdir: /home/wesm/code/arrow/python, inifile: setup.cfg
collected 1 item

pyarrow/tests/test_cython.py F [100%]

== FAILURES ===
___ test_cython_api ___

tmpdir = local('/tmp/pytest-of-wesm/pytest-3/test_cython_api0')

    @pytest.mark.skipif(
        'ARROW_HOME' not in os.environ,
        reason='ARROW_HOME environment variable not defined')
    def test_cython_api(tmpdir):
        """
        Basic test for the Cython API.
        """
        pytest.importorskip('Cython')

        ld_path_default = os.path.join(os.environ['ARROW_HOME'], 'lib')

        test_ld_path = os.environ.get('PYARROW_TEST_LD_PATH', ld_path_default)

        with tmpdir.as_cwd():
            # Set up temporary workspace
            pyx_file = 'pyarrow_cython_example.pyx'
            shutil.copyfile(os.path.join(here, pyx_file),
                            os.path.join(str(tmpdir), pyx_file))
            # Create setup.py file
            if os.name == 'posix':
                compiler_opts = ['-std=c++11']
            else:
                compiler_opts = []
            setup_code = setup_template.format(pyx_file=pyx_file,
                                               compiler_opts=compiler_opts,
                                               test_ld_path=test_ld_path)
            with open('setup.py', 'w') as f:
                f.write(setup_code)

            # Compile extension module
            subprocess.check_call([sys.executable, 'setup.py',
>                                  'build_ext', '--inplace'])

pyarrow/tests/test_cython.py:90:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

popenargs = (['/home/wesm/miniconda/envs/arrow-dev/bin/python', 'setup.py', 'build_ext', '--inplace'],)
kwargs = {}, retcode = 1
cmd = ['/home/wesm/miniconda/envs/arrow-dev/bin/python', 'setup.py', 'build_ext', '--inplace']

    def check_call(*popenargs, **kwargs):
        """Run command with arguments.  Wait for command to complete.  If
        the exit code was zero then return, otherwise raise
        CalledProcessError.  The CalledProcessError object will have the
        return code in the returncode attribute.

        The arguments are the same as for the call function.  Example:

        check_call(["ls", "-l"])
        """
        retcode = call(*popenargs, **kwargs)
        if retcode:
            cmd = kwargs.get("args")
            if cmd is None:
                cmd = popenargs[0]
>           raise CalledProcessError(retcode, cmd)
E           subprocess.CalledProcessError: Command '['/home/wesm/miniconda/envs/arrow-dev/bin/python', 'setup.py', 'build_ext', '--inplace']' returned non-zero exit status 1.

../../../miniconda/envs/arrow-dev/lib/python3.6/subprocess.py:291: CalledProcessError
- Captured stderr call -
Traceback (most recent call last):
  File "setup.py", line 7, in <module>
    import pyarrow as pa
ModuleNotFoundError: No module named 'pyarrow'
== 1 failed in 0.23 seconds ===
{code}

I encountered this bit of brittleness in a fresh install where I had run 
neither {{setup.py develop}} nor {{setup.py install}} on my local pyarrow dev area.
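
One way to make this failure mode clearer would be to check up front whether pyarrow imports in a fresh interpreter started from the temporary build directory, since that is where setup.py runs. The following Python sketch uses only the standard library; the helper name importable_in_subprocess is an assumption for illustration and is not part of the pyarrow test suite.

```python
import subprocess
import sys
import tempfile


def importable_in_subprocess(module, cwd):
    """Return True if `module` imports in a fresh interpreter started in `cwd`."""
    result = subprocess.run(
        [sys.executable, "-c", "import " + module],
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


with tempfile.TemporaryDirectory() as workdir:
    if not importable_in_subprocess("pyarrow", workdir):
        # In a pytest test this is where one might skip with an explicit reason.
        print("pyarrow is not importable from", workdir)
```

Running the check from a different working directory matters because an in-place build may be importable from the source tree but not from the temporary directory.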





Making a bugfix Arrow JS release

2018-03-05 Thread Wes McKinney
Brian mentioned on GitHub that it might be good to make a 0.3.1 JS
release due to bugs fixed since 0.3.0. Is there any other work that
needs to be merged before doing this?

Thanks
Wes


[jira] [Created] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-05 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2262:
--

 Summary: [Python] Support slicing on pyarrow.ChunkedArray
 Key: ARROW-2262
 URL: https://issues.apache.org/jira/browse/ARROW-2262
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 0.9.0








[jira] [Created] (ARROW-2261) [GLib] Can't share the same memory in GArrowBuffer safely

2018-03-05 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2261:
---

 Summary: [GLib] Can't share the same memory in GArrowBuffer safely
 Key: ARROW-2261
 URL: https://issues.apache.org/jira/browse/ARROW-2261
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Affects Versions: 0.8.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.9.0








[jira] [Created] (ARROW-2260) plasma_store should show usage

2018-03-05 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2260:
-

 Summary: plasma_store should show usage
 Key: ARROW-2260
 URL: https://issues.apache.org/jira/browse/ARROW-2260
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


Currently the options exposed by the {{plasma_store}} executable aren't very 
discoverable:

{code:bash}
$ plasma_store -h
please specify socket for incoming connections with -s switch
Abandon
(pyarrow) antoine@fsol:~/arrow/cpp (ARROW-2135-nan-conversion-when-casting *)$ plasma_store
please specify socket for incoming connections with -s switch
Abandon
(pyarrow) antoine@fsol:~/arrow/cpp (ARROW-2135-nan-conversion-when-casting *)$ plasma_store --help
plasma_store: invalid option -- '-'
{code}
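
To illustrate the kind of discoverability being asked for, here is a small Python argparse sketch in which -h and --help print a usage summary. plasma_store itself is a C++ executable, so this is only an analogy: apart from the -s socket switch that appears in the output above, the program name, option metadata, and help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser whose -h/--help output lists every option."""
    parser = argparse.ArgumentParser(
        prog="plasma_store_example",  # hypothetical name, not the real binary
        description="Illustrative option parser with a usage message.",
    )
    parser.add_argument(
        "-s", dest="socket", required=True, metavar="SOCKET",
        help="socket for incoming connections",
    )
    return parser


parser = build_parser()
print(parser.format_help())
args = parser.parse_args(["-s", "/tmp/example_socket"])  # example path only
print(args.socket)
```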





[jira] [Created] (ARROW-2259) [C++] importing pyarrow segfaults in boost_regex

2018-03-05 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2259:
-

 Summary: [C++] importing pyarrow segfaults in boost_regex
 Key: ARROW-2259
 URL: https://issues.apache.org/jira/browse/ARROW-2259
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


This is new (started on changeset bfac60dd73bffa5f7bcefc890486268036182278) and 
seems related to the use of boost_regex. I am building on Ubuntu 16.04 with the 
{{boost-cpp}} package from conda-forge.

Here is the gdb backtrace:

{code}
#0  std::string::_Rep::_M_is_leaked (this=this@entry=0xffe8)
at 
/home/msarahan/miniconda2/conda-bld/compilers_linux-64_1507259624353/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3075
#1  0x71014856 in std::string::_Rep::_M_grab (this=0xffe8, 
__alloc1=..., __alloc2=...)
at 
/home/msarahan/miniconda2/conda-bld/compilers_linux-64_1507259624353/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3126
#2  0x7101489d in std::basic_string::basic_string (this=0x7fffa0e0, __str=...)
at 
/home/msarahan/miniconda2/conda-bld/compilers_linux-64_1507259624353/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:613
#3  0x70a791fc in 
boost::re_detail_106600::cpp_regex_traits_char_layer::init() ()
   from 
/home/antoine/miniconda3/envs/pyarrow/bin/../lib/libboost_regex.so.1.66.0
#4  0x70ac1803 in 
boost::object_cache::do_get(boost::re_detail_106600::cpp_regex_traits_base const&, unsigned 
long) ()
   from 
/home/antoine/miniconda3/envs/pyarrow/bin/../lib/libboost_regex.so.1.66.0
#5  0x70acb62b in boost::basic_regex >::do_assign(char const*, char const*, unsigned 
int) () from 
/home/antoine/miniconda3/envs/pyarrow/bin/../lib/libboost_regex.so.1.66.0
#6  0x7182b6cb in boost::basic_regex >::assign (this=0x7fffa700, 
p1=0x718a61e2 
"(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
 p2=0x718a622a "", f=0)
at 
/home/antoine/miniconda3/envs/pyarrow/include/boost/regex/v4/basic_regex.hpp:381
#7  0x7182b657 in boost::basic_regex >::assign (this=0x7fffa700, 
p=0x718a61e2 
"(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
 f=0)
at 
/home/antoine/miniconda3/envs/pyarrow/include/boost/regex/v4/basic_regex.hpp:366
#8  0x7180e103 in boost::basic_regex >::basic_regex (this=0x7fffa700, 
p=0x718a61e2 
"(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
 f=0)
at 
/home/antoine/miniconda3/envs/pyarrow/include/boost/regex/v4/basic_regex.hpp:335
#9  0x7180b430 in parquet::ApplicationVersion::ApplicationVersion 
(this=0x71b98bf8 , 
created_by=...) at /home/antoine/parquet-cpp/src/parquet/metadata.cc:452
#10 0x716bfa21 in __cxx_global_var_init.1(void) () at 
/home/antoine/parquet-cpp/src/parquet/metadata.cc:35
#11 0x716bfbfe in _GLOBAL__sub_I_metadata.stdout.fsol.16106.d9N3Ps.ii 
() from /home/antoine/miniconda3/envs/pyarrow/bin/../lib/libparquet.so.1
#12 0x77de76ba in call_init (l=, argc=argc@entry=3, 
argv=argv@entry=0x7fffd7b8, env=env@entry=0x7fffd7d8) at dl-init.c:72
#13 0x77de77cb in call_init (env=0x7fffd7d8, argv=0x7fffd7b8, 
argc=3, l=) at dl-init.c:30
#14 _dl_init (main_map=main_map@entry=0x8acbb0, argc=3, argv=0x7fffd7b8, 
env=0x7fffd7d8) at dl-init.c:120
#15 0x77dec8e2 in dl_open_worker (a=a@entry=0x7fffaa90) at 
dl-open.c:575
#16 0x77de7564 in _dl_catch_error 
(objname=objname@entry=0x7fffaa80, 
errstring=errstring@entry=0x7fffaa88, 
mallocedp=mallocedp@entry=0x7fffaa7f, 
operate=operate@entry=0x77dec4d0 , 
args=args@entry=0x7fffaa90) at dl-error.c:187
#17 0x77debda9 in _dl_open (file=0x76238ae0 
"/home/antoine/arrow/python/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so", 
mode=-2147483646, 
caller_dlopen=0x77a53e21 <_PyImport_FindSharedFuncptr+417>, nsid=-2, 
argc=, argv=, env=0x7fffd7d8) at dl-open.c:660
#18 0x7747bf09 in dlopen_doit (a=a@entry=0x7fffacc0) at dlopen.c:66
#19 0x77de7564 in _dl_catch_error (objname=0x632e00, 
errstring=0x632e08, mallocedp=0x632df8, operate=0x7747beb0 ,