Re: C++ buildings and Regex issue

2018-12-11 Thread Rares Vernica
Worked like a charm. Thanks a lot! Much appreciated!
--
Rares

On Tue, Dec 11, 2018 at 11:20 PM Kouhei Sutou  wrote:

> Hi,
>
> Can you try "-DBOOST_ROOT=${YOUR_BOOST_INSTALL_PREFIX}" option?
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:53:58 -0800,
>   Rares Vernica  wrote:
>
> > Hi,
> >
> > Unfortunately we need to stay on CentOS 6 for now.
> >
> > We have a locally built libboost-devel-1.54 for CentOS 6 which installs
> in
> > a custom location. I added the installation steps at the end of
> >
> https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/centos-6/Dockerfile
> > and the library is in the Docker container now. How can I ask Arrow to
> pick
> > up this Boost library from its custom location?
> >
> > Right now I see this:
> >
> >
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
> > error: 'class boost::filesystem::basic_path,
> > boost::filesystem::path_traits>' has no member named 'make_preferred'
> >  i.make_preferred();
> >^~
> >
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
> > error: 'class boost::filesystem::basic_path,
> > boost::filesystem::path_traits>' has no member named 'native'
> >  out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);
> >
> > and I assume Arrow is picking up the default CentOS Boost, which as you
> > mention it won't work.
> >
> > Thanks!
> > Rares
> >
> >
> > On Tue, Dec 11, 2018 at 10:18 PM Kouhei Sutou 
> wrote:
> >
> >> Hi,
> >>
> >> You can't use system Boost on CentOS 6. Because system Boost
> >> is old. It's better that you upgrade to CentOS 7.
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In 
> >>   "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:07:20
> -0800,
> >>   Rares Vernica  wrote:
> >>
> >> > Wes,
> >> >
> >> > Thanks! We do plan to upgrade, as soon as we put down the fire. We
> >> noticed
> >> > some API changes and we will have to get our code updated.
> >> >
> >> > It looks like it is boost::regex. In our application we link
> dynamically
> >> > against a locally compiled Boost. For Arrow we noticed this for CentOS
> >> >
> >>
> https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/arrow.spec.in#L69
> >> >
> >> > %if %{_centos_ver} == 6
> >> > -DARROW_BOOST_VENDORED=ON \
> >> > %endif
> >> >
> >> > I tried replacing it with
> >> >
> >> > -DARROW_BOOST_USE_SHARED=ON
> >> >
> >> > but it does not look like it is going to build
> >> >
> >> >
> >>
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
> >> > error: 'class boost::filesystem::basic_path,
> >> > boost::filesystem::path_traits>' has no member named 'make_preferred'
> >> >  i.make_preferred();
> >> >^~
> >> >
> >>
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
> >> > error: 'class boost::filesystem::basic_path,
> >> > boost::filesystem::path_traits>' has no member named 'native'
> >> >  out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);
> >> >
> >> > I remember we had a similar conflict with ProtocolBuffers. In that
> case,
> >> > changing Arrow to use the system provided version did the trick.
> >> >
> >> > Thanks,
> >> > Rares
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Dec 11, 2018 at 9:30 PM Wes McKinney 
> >> wrote:
> >> >
> >> >> hi,
> >> >>
> >> >> Could you clarify what you mean by "regex calls"? Are you talking
> >> >> about boost::regex, std::regex, something else? How did you link the
> >> >> relevant libraries in each part of your application, and in the Arrow
> >> >> + Parquet libraries
> >> >>
> >> >> 0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade
> >> >>
> >> >> $ git hist apache-arrow-0.9.0..master | wc -l
> >> >> 1540
> >> >>
> >> >> - Wes
> >> >> On Tue, Dec 11, 2018 at 10:58 PM Rares Vernica 
> >> wrote:
> >> >> >
> >> >> > Hello,
> >> >> >
> >> >> > We are using the C++ bindings of Arrow 0.9.0 on our system on
> CentOS.
> >> >> Once
> >> >> > we load the Arrow library, our regular regex calls (outside of
> Arrow)
> >> >> > misbehave and trigger some unknown crashes. We are still trying to
> >> figure
> >> >> > things out but I was wondering if there are any know issues
> regarding
> >> >> regex
> >> >> > and the C++ binding. Also, how can one turn on/off flags related to
> >> regex
> >> >> > when compiling Arrow? We are still trying to isolate the crash.
> >> >> >
> >> >> > Thanks!
> >> >> > Rares
> >> >>
> >>
>


Re: C++ buildings and Regex issue

2018-12-11 Thread Kouhei Sutou
Hi,

Can you try "-DBOOST_ROOT=${YOUR_BOOST_INSTALL_PREFIX}" option?
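
Once the build is pointed at a custom prefix, one way to confirm which Boost
it actually picked up is to read the BOOST_VERSION macro from
<prefix>/include/boost/version.hpp. The following is only a minimal sketch:
the helper names are made up for illustration and are not part of Arrow or
Boost. It relies on the encoding documented in that header
(major * 100000 + minor * 100 + patch).

```python
import re


def decode_boost_version(value):
    """Split a BOOST_VERSION integer into (major, minor, patch)."""
    return value // 100000, value // 100 % 1000, value % 100


def boost_version_from_header(header_text):
    """Extract BOOST_VERSION from the text of boost/version.hpp.

    Hypothetical helper: the caller reads the header file and passes
    its contents in as a string.
    """
    match = re.search(
        r"^\s*#\s*define\s+BOOST_VERSION\s+(\d+)", header_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("BOOST_VERSION not found in header text")
    return decode_boost_version(int(match.group(1)))


if __name__ == "__main__":
    # 105400 is the value a Boost 1.54.0 header would define.
    print(decode_boost_version(105400))
```

If the decoded version is older than the locally built Boost, the build is
probably still finding the system headers first.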


Thanks,
--
kou

In 
  "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:53:58 -0800,
  Rares Vernica  wrote:

> Hi,
> 
> Unfortunately we need to stay on CentOS 6 for now.
> 
> We have a locally built libboost-devel-1.54 for CentOS 6 which installs in
> a custom location. I added the installation steps at the end of
> https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/centos-6/Dockerfile
> and the library is in the Docker container now. How can I ask Arrow to pick
> up this Boost library from its custom location?
> 
> Right now I see this:
> 
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
> error: 'class boost::filesystem::basic_path,
> boost::filesystem::path_traits>' has no member named 'make_preferred'
>  i.make_preferred();
>^~
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
> error: 'class boost::filesystem::basic_path,
> boost::filesystem::path_traits>' has no member named 'native'
>  out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);
> 
> and I assume Arrow is picking up the default CentOS Boost, which as you
> mention it won't work.
> 
> Thanks!
> Rares
> 
> 
> On Tue, Dec 11, 2018 at 10:18 PM Kouhei Sutou  wrote:
> 
>> Hi,
>>
>> You can't use system Boost on CentOS 6. Because system Boost
>> is old. It's better that you upgrade to CentOS 7.
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:07:20 -0800,
>>   Rares Vernica  wrote:
>>
>> > Wes,
>> >
>> > Thanks! We do plan to upgrade, as soon as we put down the fire. We
>> noticed
>> > some API changes and we will have to get our code updated.
>> >
>> > It looks like it is boost::regex. In our application we link dynamically
>> > against a locally compiled Boost. For Arrow we noticed this for CentOS
>> >
>> https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/arrow.spec.in#L69
>> >
>> > %if %{_centos_ver} == 6
>> > -DARROW_BOOST_VENDORED=ON \
>> > %endif
>> >
>> > I tried replacing it with
>> >
>> > -DARROW_BOOST_USE_SHARED=ON
>> >
>> > but it does not look like it is going to build
>> >
>> >
>> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
>> > error: 'class boost::filesystem::basic_path,
>> > boost::filesystem::path_traits>' has no member named 'make_preferred'
>> >  i.make_preferred();
>> >^~
>> >
>> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
>> > error: 'class boost::filesystem::basic_path,
>> > boost::filesystem::path_traits>' has no member named 'native'
>> >  out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);
>> >
>> > I remember we had a similar conflict with ProtocolBuffers. In that case,
>> > changing Arrow to use the system provided version did the trick.
>> >
>> > Thanks,
>> > Rares
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Dec 11, 2018 at 9:30 PM Wes McKinney 
>> wrote:
>> >
>> >> hi,
>> >>
>> >> Could you clarify what you mean by "regex calls"? Are you talking
>> >> about boost::regex, std::regex, something else? How did you link the
>> >> relevant libraries in each part of your application, and in the Arrow
>> >> + Parquet libraries
>> >>
>> >> 0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade
>> >>
>> >> $ git hist apache-arrow-0.9.0..master | wc -l
>> >> 1540
>> >>
>> >> - Wes
>> >> On Tue, Dec 11, 2018 at 10:58 PM Rares Vernica 
>> wrote:
>> >> >
>> >> > Hello,
>> >> >
>> >> > We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS.
>> >> Once
>> >> > we load the Arrow library, our regular regex calls (outside of Arrow)
>> >> > misbehave and trigger some unknown crashes. We are still trying to
>> figure
>> >> > things out but I was wondering if there are any know issues regarding
>> >> regex
>> >> > and the C++ binding. Also, how can one turn on/off flags related to
>> regex
>> >> > when compiling Arrow? We are still trying to isolate the crash.
>> >> >
>> >> > Thanks!
>> >> > Rares
>> >>
>>


Re: C++ buildings and Regex issue

2018-12-11 Thread Rares Vernica
Hi,

Unfortunately we need to stay on CentOS 6 for now.

We have a locally built libboost-devel-1.54 for CentOS 6 which installs in
a custom location. I added the installation steps at the end of
https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/centos-6/Dockerfile
and the library is in the Docker container now. How can I ask Arrow to pick
up this Boost library from its custom location?

Right now I see this:

/root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
error: 'class boost::filesystem::basic_path<std::basic_string<char>,
boost::filesystem::path_traits>' has no member named 'make_preferred'
 i.make_preferred();
   ^~
/root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
error: 'class boost::filesystem::basic_path<std::basic_string<char>,
boost::filesystem::path_traits>' has no member named 'native'
 out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);

and I assume Arrow is picking up the default CentOS Boost, which, as you
mention, won't work.

Thanks!
Rares


On Tue, Dec 11, 2018 at 10:18 PM Kouhei Sutou  wrote:

> Hi,
>
> You can't use system Boost on CentOS 6. Because system Boost
> is old. It's better that you upgrade to CentOS 7.
>
> Thanks,
> --
> kou
>
> In 
>   "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:07:20 -0800,
>   Rares Vernica  wrote:
>
> > Wes,
> >
> > Thanks! We do plan to upgrade, as soon as we put down the fire. We
> noticed
> > some API changes and we will have to get our code updated.
> >
> > It looks like it is boost::regex. In our application we link dynamically
> > against a locally compiled Boost. For Arrow we noticed this for CentOS
> >
> https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/arrow.spec.in#L69
> >
> > %if %{_centos_ver} == 6
> > -DARROW_BOOST_VENDORED=ON \
> > %endif
> >
> > I tried replacing it with
> >
> > -DARROW_BOOST_USE_SHARED=ON
> >
> > but it does not look like it is going to build
> >
> >
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
> > error: 'class boost::filesystem::basic_path,
> > boost::filesystem::path_traits>' has no member named 'make_preferred'
> >  i.make_preferred();
> >^~
> >
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
> > error: 'class boost::filesystem::basic_path,
> > boost::filesystem::path_traits>' has no member named 'native'
> >  out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);
> >
> > I remember we had a similar conflict with ProtocolBuffers. In that case,
> > changing Arrow to use the system provided version did the trick.
> >
> > Thanks,
> > Rares
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Dec 11, 2018 at 9:30 PM Wes McKinney 
> wrote:
> >
> >> hi,
> >>
> >> Could you clarify what you mean by "regex calls"? Are you talking
> >> about boost::regex, std::regex, something else? How did you link the
> >> relevant libraries in each part of your application, and in the Arrow
> >> + Parquet libraries
> >>
> >> 0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade
> >>
> >> $ git hist apache-arrow-0.9.0..master | wc -l
> >> 1540
> >>
> >> - Wes
> >> On Tue, Dec 11, 2018 at 10:58 PM Rares Vernica 
> wrote:
> >> >
> >> > Hello,
> >> >
> >> > We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS.
> >> Once
> >> > we load the Arrow library, our regular regex calls (outside of Arrow)
> >> > misbehave and trigger some unknown crashes. We are still trying to
> figure
> >> > things out but I was wondering if there are any know issues regarding
> >> regex
> >> > and the C++ binding. Also, how can one turn on/off flags related to
> regex
> >> > when compiling Arrow? We are still trying to isolate the crash.
> >> >
> >> > Thanks!
> >> > Rares
> >>
>


Re: C++ buildings and Regex issue

2018-12-11 Thread Kouhei Sutou
Hi,

You can't use system Boost on CentOS 6 because it is too old.
It would be better to upgrade to CentOS 7.

Thanks,
--
kou

In 
  "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:07:20 -0800,
  Rares Vernica  wrote:

> Wes,
> 
> Thanks! We do plan to upgrade, as soon as we put down the fire. We noticed
> some API changes and we will have to get our code updated.
> 
> It looks like it is boost::regex. In our application we link dynamically
> against a locally compiled Boost. For Arrow we noticed this for CentOS
> https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/arrow.spec.in#L69
> 
> %if %{_centos_ver} == 6
> -DARROW_BOOST_VENDORED=ON \
> %endif
> 
> I tried replacing it with
> 
> -DARROW_BOOST_USE_SHARED=ON
> 
> but it does not look like it is going to build
> 
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
> error: 'class boost::filesystem::basic_path,
> boost::filesystem::path_traits>' has no member named 'make_preferred'
>  i.make_preferred();
>^~
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
> error: 'class boost::filesystem::basic_path,
> boost::filesystem::path_traits>' has no member named 'native'
>  out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);
> 
> I remember we had a similar conflict with ProtocolBuffers. In that case,
> changing Arrow to use the system provided version did the trick.
> 
> Thanks,
> Rares
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Tue, Dec 11, 2018 at 9:30 PM Wes McKinney  wrote:
> 
>> hi,
>>
>> Could you clarify what you mean by "regex calls"? Are you talking
>> about boost::regex, std::regex, something else? How did you link the
>> relevant libraries in each part of your application, and in the Arrow
>> + Parquet libraries
>>
>> 0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade
>>
>> $ git hist apache-arrow-0.9.0..master | wc -l
>> 1540
>>
>> - Wes
>> On Tue, Dec 11, 2018 at 10:58 PM Rares Vernica  wrote:
>> >
>> > Hello,
>> >
>> > We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS.
>> Once
>> > we load the Arrow library, our regular regex calls (outside of Arrow)
>> > misbehave and trigger some unknown crashes. We are still trying to figure
>> > things out but I was wondering if there are any know issues regarding
>> regex
>> > and the C++ binding. Also, how can one turn on/off flags related to regex
>> > when compiling Arrow? We are still trying to isolate the crash.
>> >
>> > Thanks!
>> > Rares
>>


Re: C++ buildings and Regex issue

2018-12-11 Thread Rares Vernica
Wes,

Thanks! We do plan to upgrade, as soon as we put down the fire. We noticed
some API changes and we will have to get our code updated.

It looks like it is boost::regex. In our application we link dynamically
against a locally compiled Boost. For Arrow we noticed this for CentOS
https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/arrow.spec.in#L69

%if %{_centos_ver} == 6
-DARROW_BOOST_VENDORED=ON \
%endif

I tried replacing it with

-DARROW_BOOST_USE_SHARED=ON

but it does not look like it is going to build

/root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
error: 'class boost::filesystem::basic_path<std::basic_string<char>,
boost::filesystem::path_traits>' has no member named 'make_preferred'
 i.make_preferred();
   ^~
/root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
error: 'class boost::filesystem::basic_path<std::basic_string<char>,
boost::filesystem::path_traits>' has no member named 'native'
 out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);

I remember we had a similar conflict with ProtocolBuffers. In that case,
changing Arrow to use the system provided version did the trick.

Thanks,
Rares











On Tue, Dec 11, 2018 at 9:30 PM Wes McKinney  wrote:

> hi,
>
> Could you clarify what you mean by "regex calls"? Are you talking
> about boost::regex, std::regex, something else? How did you link the
> relevant libraries in each part of your application, and in the Arrow
> + Parquet libraries
>
> 0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade
>
> $ git hist apache-arrow-0.9.0..master | wc -l
> 1540
>
> - Wes
> On Tue, Dec 11, 2018 at 10:58 PM Rares Vernica  wrote:
> >
> > Hello,
> >
> > We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS.
> Once
> > we load the Arrow library, our regular regex calls (outside of Arrow)
> > misbehave and trigger some unknown crashes. We are still trying to figure
> > things out but I was wondering if there are any know issues regarding
> regex
> > and the C++ binding. Also, how can one turn on/off flags related to regex
> > when compiling Arrow? We are still trying to isolate the crash.
> >
> > Thanks!
> > Rares
>


Re: C++ buildings and Regex issue

2018-12-11 Thread Wes McKinney
hi,

Could you clarify what you mean by "regex calls"? Are you talking
about boost::regex, std::regex, something else? How did you link the
relevant libraries in each part of your application, and in the Arrow
+ Parquet libraries?

0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade

$ git hist apache-arrow-0.9.0..master | wc -l
1540

- Wes
On Tue, Dec 11, 2018 at 10:58 PM Rares Vernica  wrote:
>
> Hello,
>
> We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS. Once
> we load the Arrow library, our regular regex calls (outside of Arrow)
> misbehave and trigger some unknown crashes. We are still trying to figure
> things out but I was wondering if there are any know issues regarding regex
> and the C++ binding. Also, how can one turn on/off flags related to regex
> when compiling Arrow? We are still trying to isolate the crash.
>
> Thanks!
> Rares


C++ buildings and Regex issue

2018-12-11 Thread Rares Vernica
Hello,

We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS. Once
we load the Arrow library, our regular regex calls (outside of Arrow)
misbehave and trigger some unknown crashes. We are still trying to figure
things out but I was wondering if there are any known issues regarding regex
and the C++ binding. Also, how can one turn on/off flags related to regex
when compiling Arrow? We are still trying to isolate the crash.

Thanks!
Rares


Re: Timeline for Arrow 0.12.0 release

2018-12-11 Thread Wes McKinney
hi all,

I'm looking at the 0.12 backlog and I am not too comfortable with the
things that would have to be cut to get a release out next week.
Additionally, not a lot of developers are going to be working the week
of December 24 because of the Christmas and New Year's holidays, so
even if we did release, it might not get seen by a lot of people until
after the New Year.

Based on this, I would suggest we push to complete as much work as
possible (from the 0.12 backlog and beyond) by the end of the year,
and release as soon as possible in 2019. Of course, anyone is welcome
to contribute work that is not found in the 0.12 milestone =)

Any objections?

Thanks
Wes
On Mon, Dec 10, 2018 at 8:04 AM Andy Grove  wrote:
>
> Cool. I will continue to add primitive operations but I am now adding this
> in a separate source file to keep it separate from the core array code.
>
> I'm not sure how important it will be to support Rust data sources with
> Gandiva. I can see that each language should be able to construct the
> logical query plan to submit to Gandiva and let Gandiva handle execution. I
> think the more interesting part is how do we support language-specific
> lambda functions as part of that logical query plan. Maybe it is possible
> to compile the lambda down to LLVM (I haven't started learning about LLVM
> in detail yet so this is wild speculation on my part). Another option is
> for Gandiva to support calling into shared libraries and that maybe is
> simpler for languages that support building C-native shared libraries (Rust
> supports this with zero overhead).
>
> Andy.
>
>
>
>
> On Sun, Dec 9, 2018 at 11:42 AM Wes McKinney  wrote:
>
> > hi Andy,
> >
> > I can see an argument for having some basic native function kernel
> > support in Rust. One of the things that Gandiva has begun is a
> > Protobuf-based serialized representation of projection
> > and filter expressions. In the long run I would like to see a more
> > complete relational algebra / logical query plan that can be submitted
> > for execution. There's complexities, though, such as bridging
> > iteration of data sources written in Rust, say, with a query engine
> > written in C++. You would need to provide some kind of a callback
> > mechanism for the query engine to request the next chunk of a dataset
> > to be materialized.
> >
> > It will be interesting to see what contributors will be motivated
> > enough to build over the next few years. At the end of the day, Apache
> > projects are do-ocracies.
> >
> > - Wes
> > On Fri, Dec 7, 2018 at 6:22 AM Andy Grove  wrote:
> > >
> > > I've added one PR to the list (https://github.com/apache/arrow/pull/3119
> > )
> > > to update the project to use Rust 2018 Edition.
> > >
> > > I'm also considering removing one PR from the list and would like to get
> > > opinions here.
> > >
> > > I have a PR (https://github.com/apache/arrow/pull/3033) to add some
> > basic
> > > math and comparison operators to primitive arrays. These are baby steps
> > > towards implementing more query execution capabilities such as
> > projection,
> > > selection, etc but Chao made a good point that other Rust implementations
> > > don't have these kind of capabilities and I am now wondering if this is a
> > > distraction. We already have Gandiva and the new efforts in Ursa labs and
> > > it would probably make more sense to look at having Rust bindings for the
> > > query execution capabilities there rather than having a competing (and
> > less
> > > capable) implementation in Rust.
> > >
> > > Thoughts?
> > >
> > > Andy.
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Dec 6, 2018 at 8:42 PM paddy horan 
> > wrote:
> > >
> > > > Other than Andy’s PR below I’m going to try and find time to work on
> > > > ARROW-3827, I’ll bump it 0.13 if I can’t find the time early next week.
> > > > There is nothing else in the 0.12 backlog for Rust.  It would be nice
> > to
> > > > get the parquet merge in though.
> > > >
> > > >
> > > >
> > > > Paddy
> > > >
> > > >
> > > >
> > > > 
> > > > From: Andy Grove 
> > > > Sent: Thursday, December 6, 2018 10:20:48 AM
> > > > To: dev@arrow.apache.org
> > > > Subject: Re: Timeline for Arrow 0.12.0 release
> > > >
> > > > I have PRs pending for all the Rust issues that I want to get into
> > 0.12.0
> > > > and would appreciate some reviews so I can go ahead and merge:
> > > >
> > > > https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and
> > > > ARROW-3881
> > > > - add math and comparison operations to primitive arrays)
> > > > https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release
> > > > process)
> > > > https://github.com/apache/arrow/pull/3111 (ARROW-3838 - CSV Writer)
> > > >
> > > > With these in place I plan on writing a tutorial for reading a CSV
> > file,
> > > > performing some operations on primitive arrays and writing the output
> > to a
> > > > new CSV file.
> > > >
> > > > I am deferring ARROW-3882 (casting for primitive 

[jira] [Created] (ARROW-4000) Error running test_read_options on Windows

2018-12-11 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4000:


 Summary: Error running test_read_options on Windows
 Key: ARROW-4000
 URL: https://issues.apache.org/jira/browse/ARROW-4000
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 0.11.1
Reporter: Benjamin Kietzman


`py.test pyarrow -v` crashed at `pyarrow/tests/test_csv.py::test_read_options`.

errorlevel was -1073741819, not sure what that means.
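
For reference, a negative errorlevel like this is usually a Windows NTSTATUS
code printed as a signed 32-bit integer. The sketch below reinterprets it as
unsigned and looks it up in a small table. The table is a hand-picked
illustration, not a complete list, and the helper name is made up.

```python
# A few well-known NTSTATUS values (illustrative subset only).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_errorlevel(errorlevel):
    """Reinterpret a signed 32-bit errorlevel as an unsigned NTSTATUS code."""
    code = errorlevel & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return "0x{:08X} ({})".format(code, name)


if __name__ == "__main__":
    print(describe_errorlevel(-1073741819))
```

Read this way, -1073741819 is 0xC0000005, an access violation, so the crash
is most likely an invalid memory access in native code rather than a Python
exception.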



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4001) Create Parquet Schema in python

2018-12-11 Thread David Stauffer (JIRA)
David Stauffer created ARROW-4001:
-

 Summary: Create Parquet Schema in python
 Key: ARROW-4001
 URL: https://issues.apache.org/jira/browse/ARROW-4001
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Affects Versions: 0.9.0
Reporter: David Stauffer


Enable the creation of a Parquet schema in python. For functions like 
pyarrow.parquet.ParquetDataset, a schema must be a Parquet schema. See: 
https://stackoverflow.com/questions/53725691/pyarrow-lib-schema-vs-pyarrow-parquet-schema





[jira] [Created] (ARROW-3999) [Python] Can't read large file that pyarrow wrote

2018-12-11 Thread Diego Argueta (JIRA)
Diego Argueta created ARROW-3999:


 Summary: [Python] Can't read large file that pyarrow wrote
 Key: ARROW-3999
 URL: https://issues.apache.org/jira/browse/ARROW-3999
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.11.1
 Environment: OS: OSX High Sierra 10.13.6
Python: 3.7.0
PyArrow: 0.11.1
Pandas: 0.23.4
Reporter: Diego Argueta


I loaded a large Pandas DataFrame from a CSV and successfully wrote it to a 
Parquet file using the DataFrame's {{to_parquet}} method. However, reading that 
same file back results in an exception:
{code:java}
>>> source_df.shape
(32070402, 7)

>>> source_df.dtypes
Url Source object
Url Destination object
Anchor text object
Follow / No-Follow object
Link No-Follow bool
Meta No-Follow bool
Robot No-Follow bool
dtype: object

>>> source_df.to_parquet('export.parq', compression='gzip',
 use_deprecated_int96_timestamps=True)

>>> loaded_df = pd.read_parquet('export.parq')
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File 
"/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py",
 line 288, in read_parquet
 return impl.read(path, columns=columns, **kwargs)
 File 
"/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py",
 line 131, in read
 **kwargs).to_pandas()
 File 
"/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/parquet.py",
 line 1074, in read_table
 use_pandas_metadata=use_pandas_metadata)
 File 
"/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/filesystem.py",
 line 184, in read_parquet
 use_pandas_metadata=use_pandas_metadata)
 File 
"/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/parquet.py",
 line 943, in read
 use_pandas_metadata=use_pandas_metadata)
 File 
"/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/parquet.py",
 line 500, in read
 table = reader.read(**options)
 File 
"/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/parquet.py",
 line 187, in read
 use_threads=use_threads)
 File "pyarrow/_parquet.pyx", line 721, in 
pyarrow._parquet.ParquetReader.read_all
 File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: Capacity error: BinaryArray cannot 
contain more than 2147483646 bytes, have 2147483685

Arrow error: Capacity error: BinaryArray cannot contain more than 2147483646 
bytes, have 2147483685
 {code}
 

One would expect that if PyArrow can write a file successfully, it can read it 
back as well. Fortunately the {{fastparquet}} library has no problem reading 
this file, so we didn't lose any data, but the roundtripping problem was a bit 
of a surprise.
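
The limit in the error message is consistent with signed 32-bit offsets:
2147483646 is the int32 maximum minus one, and the column overshoots it by
39 bytes. The sketch below shows that arithmetic and a generic way of
splitting a sequence of byte strings into chunks that each stay under such a
limit. It is plain Python meant to illustrate the idea. It is not how
PyArrow implements chunking, and the function name is hypothetical.

```python
INT32_MAX = 2**31 - 1
BINARY_LIMIT = INT32_MAX - 1  # 2147483646, the figure in the error message


def split_into_chunks(values, limit=BINARY_LIMIT):
    """Group byte strings into lists whose total size never exceeds limit."""
    chunks, current, size = [], [], 0
    for value in values:
        if len(value) > limit:
            raise ValueError("single value larger than the limit")
        # Start a new chunk when adding this value would cross the limit.
        if current and size + len(value) > limit:
            chunks.append(current)
            current, size = [], 0
        current.append(value)
        size += len(value)
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(2147483685 - BINARY_LIMIT)  # bytes over the limit
    print(split_into_chunks([b"aa", b"bbb", b"c"], limit=4))
```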





[jira] [Created] (ARROW-3998) Support TPC-H dbgen in Arrow

2018-12-11 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-3998:
-

 Summary: Support TPC-H dbgen in Arrow
 Key: ARROW-3998
 URL: https://issues.apache.org/jira/browse/ARROW-3998
 Project: Apache Arrow
  Issue Type: Wish
Reporter: Francois Saint-Jacques


Integration tests and benchmarks should read TPC-H data. This is going to be 
useful for future query execution engine benchmarking.

It could also attract researchers.





[jira] [Created] (ARROW-3997) [C++] [Doc] Clarify dictionary encoding integer signedness (and width?)

2018-12-11 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3997:
-

 Summary: [C++] [Doc] Clarify dictionary encoding integer 
signedness (and width?)
 Key: ARROW-3997
 URL: https://issues.apache.org/jira/browse/ARROW-3997
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation, Format
Affects Versions: 0.11.1
Reporter: Antoine Pitrou


The Arrow spec states that a dictionary-encoded array uses int32 indices. 
Signed or unsigned? The spec doesn't say.

Also, the C++ implementation supports all kinds of integers as indices (8- to 
64-bit, signed and unsigned). I wonder if we should at least mandate a specific 
signedness.
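
To make the question concrete, here is a small pure-Python sketch of
dictionary encoding and of the value range each index type would allow. It
does not use the Arrow libraries, and both function names are made up for
illustration.

```python
def index_range(bits, signed):
    """Return (min, max) representable by an integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def dictionary_encode(values):
    """Return (dictionary, indices) with indices pointing into dictionary."""
    dictionary, positions, indices = [], {}, []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


if __name__ == "__main__":
    print(dictionary_encode(["a", "b", "a"]))
    print(index_range(32, signed=True))
    print(index_range(32, signed=False))
```

Indices are never negative, so a signed type gives up half of its range,
while an unsigned type of the same width can address twice as many
dictionary entries.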





[jira] [Created] (ARROW-3996) [C++] Insufficient description on build

2018-12-11 Thread Kunihisa Abukawa (JIRA)
Kunihisa Abukawa created ARROW-3996:
---

 Summary: [C++] Insufficient description on build
 Key: ARROW-3996
 URL: https://issues.apache.org/jira/browse/ARROW-3996
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.11.1
 Environment: Ubuntu Linux(Include Ubuntu 18.04 LTS on Windows 10 WSL)
Reporter: Kunihisa Abukawa


The C/C++ build instructions for the Ubuntu Linux environment do not list all 
of the required libraries and components.

Requirement
* g++
* autoconf
* Jemalloc

Accordingly, the following packages need to be added to the list of libraries 
and components installed with apt-get install.

* libboost-regex-dev
* libjemalloc-dev
* autotools-dev





[jira] [Created] (ARROW-3995) [CI] Use understandable names in Travis Matrix

2018-12-11 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-3995:
--

 Summary: [CI] Use understandable names in Travis Matrix 
 Key: ARROW-3995
 URL: https://issues.apache.org/jira/browse/ARROW-3995
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.12.0


Travis has a new feature to assign labels to the matrix entries, making it much 
easier to navigate.


