Re: C++ buildings and Regex issue
Worked like a charm. Thanks a lot! Much appreciated!

--
Rares

On Tue, Dec 11, 2018 at 11:20 PM Kouhei Sutou wrote:

> Hi,
>
> Can you try "-DBOOST_ROOT=${YOUR_BOOST_INSTALL_PREFIX}" option?
>
> Thanks,
> --
> kou
Re: C++ buildings and Regex issue
Hi,

Can you try "-DBOOST_ROOT=${YOUR_BOOST_INSTALL_PREFIX}" option?

Thanks,
--
kou

In "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:53:58 -0800,
  Rares Vernica wrote:

> Hi,
>
> Unfortunately we need to stay on CentOS 6 for now.
>
> We have a locally built libboost-devel-1.54 for CentOS 6 which installs in
> a custom location. I added the installation steps at the end of
> https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/centos-6/Dockerfile
> and the library is in the Docker container now. How can I ask Arrow to pick
> up this Boost library from its custom location?
>
> Right now I see this:
>
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
> error: 'class boost::filesystem::basic_path, boost::filesystem::path_traits>'
> has no member named 'make_preferred'
>     i.make_preferred();
>       ^~
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
> error: 'class boost::filesystem::basic_path, boost::filesystem::path_traits>'
> has no member named 'native'
>     out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);
>
> and I assume Arrow is picking up the default CentOS Boost, which, as you
> mention, won't work.
>
> Thanks!
> Rares
Re: C++ buildings and Regex issue
Hi,

Unfortunately we need to stay on CentOS 6 for now.

We have a locally built libboost-devel-1.54 for CentOS 6 which installs in
a custom location. I added the installation steps at the end of
https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/centos-6/Dockerfile
and the library is in the Docker container now. How can I ask Arrow to pick
up this Boost library from its custom location?

Right now I see this:

/root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
error: 'class boost::filesystem::basic_path, boost::filesystem::path_traits>'
has no member named 'make_preferred'
    i.make_preferred();
      ^~
/root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
error: 'class boost::filesystem::basic_path, boost::filesystem::path_traits>'
has no member named 'native'
    out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);

and I assume Arrow is picking up the default CentOS Boost, which, as you
mention, won't work.

Thanks!
Rares

On Tue, Dec 11, 2018 at 10:18 PM Kouhei Sutou wrote:

> Hi,
>
> You can't use system Boost on CentOS 6 because system Boost
> is old. It's better that you upgrade to CentOS 7.
>
> Thanks,
> --
> kou
Re: C++ buildings and Regex issue
Hi,

You can't use system Boost on CentOS 6 because system Boost
is old. It's better that you upgrade to CentOS 7.

Thanks,
--
kou

In "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:07:20 -0800,
  Rares Vernica wrote:

> Wes,
>
> Thanks! We do plan to upgrade, as soon as we put down the fire. We noticed
> some API changes and we will have to get our code updated.
>
> It looks like it is boost::regex. In our application we link dynamically
> against a locally compiled Boost. For Arrow we noticed this for CentOS
> https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/arrow.spec.in#L69
>
> %if %{_centos_ver} == 6
>     -DARROW_BOOST_VENDORED=ON \
> %endif
>
> I tried replacing it with
>
> -DARROW_BOOST_USE_SHARED=ON
>
> but it does not look like it is going to build
>
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
> error: 'class boost::filesystem::basic_path, boost::filesystem::path_traits>'
> has no member named 'make_preferred'
>     i.make_preferred();
>       ^~
> /root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
> error: 'class boost::filesystem::basic_path, boost::filesystem::path_traits>'
> has no member named 'native'
>     out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);
>
> I remember we had a similar conflict with ProtocolBuffers. In that case,
> changing Arrow to use the system provided version did the trick.
>
> Thanks,
> Rares
Re: C++ buildings and Regex issue
Wes,

Thanks! We do plan to upgrade, as soon as we put down the fire. We noticed
some API changes and we will have to get our code updated.

It looks like it is boost::regex. In our application we link dynamically
against a locally compiled Boost. For Arrow we noticed this for CentOS
https://github.com/apache/arrow-dist/blob/master/cpp-linux/yum/arrow.spec.in#L69

%if %{_centos_ver} == 6
    -DARROW_BOOST_VENDORED=ON \
%endif

I tried replacing it with

-DARROW_BOOST_USE_SHARED=ON

but it does not look like it is going to build

/root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:204:7:
error: 'class boost::filesystem::basic_path, boost::filesystem::path_traits>'
has no member named 'make_preferred'
    i.make_preferred();
      ^~
/root/rpmbuild/BUILD/apache-arrow-0.9.0/cpp/src/arrow/io/hdfs-internal.cc:205:27:
error: 'class boost::filesystem::basic_path, boost::filesystem::path_traits>'
has no member named 'native'
    out_handle = dlopen(i.native().c_str(), RTLD_NOW | RTLD_LOCAL);

I remember we had a similar conflict with ProtocolBuffers. In that case,
changing Arrow to use the system provided version did the trick.

Thanks,
Rares

On Tue, Dec 11, 2018 at 9:30 PM Wes McKinney wrote:

> hi,
>
> Could you clarify what you mean by "regex calls"? Are you talking
> about boost::regex, std::regex, something else? How did you link the
> relevant libraries in each part of your application, and in the Arrow
> + Parquet libraries?
>
> 0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade
>
> $ git hist apache-arrow-0.9.0..master | wc -l
> 1540
>
> - Wes
Re: C++ buildings and Regex issue
hi,

Could you clarify what you mean by "regex calls"? Are you talking
about boost::regex, std::regex, something else? How did you link the
relevant libraries in each part of your application, and in the Arrow
+ Parquet libraries?

0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade

$ git hist apache-arrow-0.9.0..master | wc -l
1540

- Wes

On Tue, Dec 11, 2018 at 10:58 PM Rares Vernica wrote:
>
> Hello,
>
> We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS. Once
> we load the Arrow library, our regular regex calls (outside of Arrow)
> misbehave and trigger some unknown crashes. We are still trying to figure
> things out but I was wondering if there are any known issues regarding regex
> and the C++ binding. Also, how can one turn on/off flags related to regex
> when compiling Arrow? We are still trying to isolate the crash.
>
> Thanks!
> Rares
C++ buildings and Regex issue
Hello,

We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS. Once
we load the Arrow library, our regular regex calls (outside of Arrow)
misbehave and trigger some unknown crashes. We are still trying to figure
things out but I was wondering if there are any known issues regarding regex
and the C++ binding. Also, how can one turn on/off flags related to regex
when compiling Arrow? We are still trying to isolate the crash.

Thanks!
Rares
Re: Timeline for Arrow 0.12.0 release
hi all,

I'm looking at the 0.12 backlog and I am not too comfortable with the
things that would have to be cut to get a release out next week.
Additionally, not a lot of developers are going to be working the week
of December 24 because of the Christmas and New Year's holidays, so
even if we did release, it might not get seen by a lot of people until
after the New Year.

Based on this, I would suggest we push to complete as much work as
possible (from the 0.12 backlog and beyond) by the end of the year,
and release as soon as possible in 2019. Of course, anyone is welcome
to contribute work that is not found in the 0.12 milestone =)

Any objections?

Thanks
Wes

On Mon, Dec 10, 2018 at 8:04 AM Andy Grove wrote:
>
> Cool. I will continue to add primitive operations but I am now adding this
> in a separate source file to keep it separate from the core array code.
>
> I'm not sure how important it will be to support Rust data sources with
> Gandiva. I can see that each language should be able to construct the
> logical query plan to submit to Gandiva and let Gandiva handle execution. I
> think the more interesting part is how do we support language-specific
> lambda functions as part of that logical query plan. Maybe it is possible
> to compile the lambda down to LLVM (I haven't started learning about LLVM
> in detail yet so this is wild speculation on my part). Another option is
> for Gandiva to support calling into shared libraries and that maybe is
> simpler for languages that support building C-native shared libraries (Rust
> supports this with zero overhead).
>
> Andy.
>
> On Sun, Dec 9, 2018 at 11:42 AM Wes McKinney wrote:
>
> > hi Andy,
> >
> > I can see an argument for having some basic native function kernel
> > support in Rust. One of the things that Gandiva has begun is a
> > Protobuf-based serialized representation of projection
> > and filter expressions. In the long run I would like to see a more
> > complete relational algebra / logical query plan that can be submitted
> > for execution. There are complexities, though, such as bridging
> > iteration of data sources written in Rust, say, with a query engine
> > written in C++. You would need to provide some kind of a callback
> > mechanism for the query engine to request the next chunk of a dataset
> > to be materialized.
> >
> > It will be interesting to see what contributors will be motivated
> > enough to build over the next few years. At the end of the day, Apache
> > projects are do-ocracies.
> >
> > - Wes
> >
> > On Fri, Dec 7, 2018 at 6:22 AM Andy Grove wrote:
> > >
> > > I've added one PR to the list (https://github.com/apache/arrow/pull/3119)
> > > to update the project to use Rust 2018 Edition.
> > >
> > > I'm also considering removing one PR from the list and would like to get
> > > opinions here.
> > >
> > > I have a PR (https://github.com/apache/arrow/pull/3033) to add some basic
> > > math and comparison operators to primitive arrays. These are baby steps
> > > towards implementing more query execution capabilities such as projection,
> > > selection, etc but Chao made a good point that other Rust implementations
> > > don't have these kinds of capabilities and I am now wondering if this is a
> > > distraction. We already have Gandiva and the new efforts in Ursa labs and
> > > it would probably make more sense to look at having Rust bindings for the
> > > query execution capabilities there rather than having a competing (and less
> > > capable) implementation in Rust.
> > >
> > > Thoughts?
> > >
> > > Andy.
> > >
> > > On Thu, Dec 6, 2018 at 8:42 PM paddy horan wrote:
> > >
> > > > Other than Andy’s PR below I’m going to try and find time to work on
> > > > ARROW-3827; I’ll bump it to 0.13 if I can’t find the time early next week.
> > > > There is nothing else in the 0.12 backlog for Rust. It would be nice to
> > > > get the parquet merge in though.
> > > >
> > > > Paddy
> > > >
> > > > From: Andy Grove
> > > > Sent: Thursday, December 6, 2018 10:20:48 AM
> > > > To: dev@arrow.apache.org
> > > > Subject: Re: Timeline for Arrow 0.12.0 release
> > > >
> > > > I have PRs pending for all the Rust issues that I want to get into 0.12.0
> > > > and would appreciate some reviews so I can go ahead and merge:
> > > >
> > > > https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and ARROW-3881
> > > > - add math and comparison operations to primitive arrays)
> > > > https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release
> > > > process)
> > > > https://github.com/apache/arrow/pull/3111 (ARROW-3838 - CSV Writer)
> > > >
> > > > With these in place I plan on writing a tutorial for reading a CSV file,
> > > > performing some operations on primitive arrays and writing the output to a
> > > > new CSV file.
> > > >
> > > > I am deferring ARROW-3882 (casting for primitive
[jira] [Created] (ARROW-4000) Error running test_read_options on Windows
Benjamin Kietzman created ARROW-4000:

Summary: Error running test_read_options on Windows
Key: ARROW-4000
URL: https://issues.apache.org/jira/browse/ARROW-4000
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 0.11.1
Reporter: Benjamin Kietzman

`py.test pyarrow -v` crashed at `pyarrow/tests/test_csv.py::test_read_options`. errorlevel was -1073741819, not sure what that means.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
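On the errorlevel quoted in ARROW-4000: a negative errorlevel of this size is usually the signed 32-bit view of a Windows NTSTATUS code. The sketch below reinterprets the number as unsigned and looks it up in a small table. The helper `decode_errorlevel` and the two-entry table are assumptions made for illustration, not part of pyarrow or pytest, and the result only names the kind of crash, not where in the test it happened.

```python
# Illustrative subset of NTSTATUS codes; the full list is in Microsoft's documentation.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def decode_errorlevel(errorlevel: int) -> str:
    """Reinterpret a signed 32-bit errorlevel as unsigned hex (hypothetical helper)."""
    unsigned = errorlevel & 0xFFFFFFFF  # two's-complement view of the same 32 bits
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(decode_errorlevel(-1073741819))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

Read this way, the value reported in the issue corresponds to an access violation, i.e. a native crash rather than a Python-level test failure.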
[jira] [Created] (ARROW-4001) Create Parquet Schema in python
David Stauffer created ARROW-4001:

Summary: Create Parquet Schema in python
Key: ARROW-4001
URL: https://issues.apache.org/jira/browse/ARROW-4001
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Affects Versions: 0.9.0
Reporter: David Stauffer

Enable the creation of a Parquet schema in python. For functions like pyarrow.parquet.ParquetDataset, a schema must be a Parquet schema.

See: https://stackoverflow.com/questions/53725691/pyarrow-lib-schema-vs-pyarrow-parquet-schema
[jira] [Created] (ARROW-3999) [Python] Can't read large file that pyarrow wrote
Diego Argueta created ARROW-3999:

Summary: [Python] Can't read large file that pyarrow wrote
Key: ARROW-3999
URL: https://issues.apache.org/jira/browse/ARROW-3999
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.11.1
Environment: OS: OSX High Sierra 10.13.6
Python: 3.7.0
PyArrow: 0.11.1
Pandas: 0.23.4
Reporter: Diego Argueta

I loaded a large Pandas DataFrame from a CSV and successfully wrote it to a Parquet file using the DataFrame's {{to_parquet}} method. However, reading that same file back results in an exception:

{code:java}
>>> source_df.shape
(32070402, 7)
>>> source_df.dtypes
Url Source            object
Url Destination       object
Anchor text           object
Follow / No-Follow    object
Link No-Follow          bool
Meta No-Follow          bool
Robot No-Follow         bool
dtype: object
>>> source_df.to_parquet('export.parq', compression='gzip', use_deprecated_int96_timestamps=True)
>>> loaded_df = pd.read_parquet('export.parq')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 288, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 131, in read
    **kwargs).to_pandas()
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/parquet.py", line 1074, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/filesystem.py", line 184, in read_parquet
    use_pandas_metadata=use_pandas_metadata)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/parquet.py", line 943, in read
    use_pandas_metadata=use_pandas_metadata)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/parquet.py", line 500, in read
    table = reader.read(**options)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/parquet.py", line 187, in read
    use_threads=use_threads)
  File "pyarrow/_parquet.pyx", line 721, in pyarrow._parquet.ParquetReader.read_all
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: Capacity error: BinaryArray cannot contain more than 2147483646 bytes, have 2147483685
Arrow error: Capacity error: BinaryArray cannot contain more than 2147483646 bytes, have 2147483685
{code}

One would expect that if PyArrow can write a file successfully, it can read it back as well. Fortunately the {{fastparquet}} library has no problem reading this file, so we didn't lose any data, but the roundtripping problem was a bit of a surprise.
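On the capacity error in ARROW-3999: the limit in the message, 2147483646, is 2**31 - 2, which is consistent with a single binary array addressing its data through signed 32-bit offsets. A common way around such a limit is to spread the values over several chunks that each stay under it. The sketch below shows that idea in plain Python; `chunk_by_bytes` is a hypothetical helper and not a pyarrow function, and the report does not say whether or how pyarrow's Parquet reader chunks its output.

```python
from typing import Iterable, Iterator, List

# Limit quoted in the error message (2**31 - 2 bytes per BinaryArray).
BINARY_ARRAY_MAX_BYTES = 2147483646


def chunk_by_bytes(
    values: Iterable[bytes], max_bytes: int = BINARY_ARRAY_MAX_BYTES
) -> Iterator[List[bytes]]:
    """Yield lists of values whose combined length stays within max_bytes."""
    chunk: List[bytes] = []
    used = 0
    for value in values:
        if len(value) > max_bytes:
            raise ValueError("a single value exceeds the per-chunk limit")
        # Start a new chunk when adding this value would cross the limit.
        if chunk and used + len(value) > max_bytes:
            yield chunk
            chunk, used = [], 0
        chunk.append(value)
        used += len(value)
    if chunk:
        yield chunk


# Tiny limit so the demonstration runs instantly.
demo = list(chunk_by_bytes([b"aaaa", b"bbb", b"cc", b"d"], max_bytes=5))
print(demo)  # [[b'aaaa'], [b'bbb', b'cc'], [b'd']]

# The column in the report overshoots the limit by only a few bytes.
print(2147483685 - BINARY_ARRAY_MAX_BYTES)  # 39
```

The last line shows why the failure looks surprising: the column was only 39 bytes over what one array can hold, so two chunks would have been enough.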
[jira] [Created] (ARROW-3998) Support TPC-H dbgen in Arrow
Francois Saint-Jacques created ARROW-3998:

Summary: Support TPC-H dbgen in Arrow
Key: ARROW-3998
URL: https://issues.apache.org/jira/browse/ARROW-3998
Project: Apache Arrow
Issue Type: Wish
Reporter: Francois Saint-Jacques

Integration tests and benchmarks should read TPC-H data. This is going to be useful for future query execution engine benchmarking. It could also attract researchers.
[jira] [Created] (ARROW-3997) [C++] [Doc] Clarify dictionary encoding integer signedness (and width?)
Antoine Pitrou created ARROW-3997:

Summary: [C++] [Doc] Clarify dictionary encoding integer signedness (and width?)
Key: ARROW-3997
URL: https://issues.apache.org/jira/browse/ARROW-3997
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Documentation, Format
Affects Versions: 0.11.1
Reporter: Antoine Pitrou

The Arrow spec states that a dictionary-encoded array uses int32 indices. Signed or unsigned? The spec doesn't say. Also, the C++ implementation supports all kinds of integers as indices (8- to 64-bit, signed and unsigned).

I wonder if we should at least mandate a specific signedness.
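To make the signedness question in ARROW-3997 concrete, here is a plain-Python sketch of dictionary encoding together with the narrowest index type that can address a dictionary of a given size. Both `dictionary_encode` and `smallest_index_type` are hypothetical helpers written for this illustration; they are not Arrow APIs and do not describe what the spec or the C++ implementation mandates.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Return (dictionary, indices) with dictionary entries in first-seen order."""
    dictionary: List[Hashable] = []
    positions: Dict[Hashable, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def smallest_index_type(dictionary_size: int, signed: bool) -> str:
    """Name the narrowest integer type able to hold every index."""
    largest_index = dictionary_size - 1
    for bits in (8, 16, 32, 64):
        max_value = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
        if largest_index <= max_value:
            return f"{'int' if signed else 'uint'}{bits}"
    raise ValueError("dictionary too large for a 64-bit index")


dictionary, indices = dictionary_encode(["a", "b", "a", "c", "b"])
print(dictionary, indices)  # ['a', 'b', 'c'] [0, 1, 0, 2, 1]

# With 200 distinct values, the choice of signedness changes the index width.
print(smallest_index_type(200, signed=True))   # int16
print(smallest_index_type(200, signed=False))  # uint8
```

The last two lines show the practical effect of the choice: an unsigned index type covers twice as many dictionary entries at each width, so leaving signedness unspecified means two implementations can pick different widths for the same data.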
[jira] [Created] (ARROW-3996) [C++] Insufficient description on build
Kunihisa Abukawa created ARROW-3996:

Summary: [C++] Insufficient description on build
Key: ARROW-3996
URL: https://issues.apache.org/jira/browse/ARROW-3996
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.11.1
Environment: Ubuntu Linux (including Ubuntu 18.04 LTS on Windows 10 WSL)
Reporter: Kunihisa Abukawa

The build description for the C/C++ version on Ubuntu Linux lists too few of the required libraries and components.

Requirements:
* g++
* autoconf
* Jemalloc

Accordingly, the following need to be added to the libraries and components installed with apt-get install:
* libboost-regex-dev
* libjemalloc-dev
* autotools-dev
[jira] [Created] (ARROW-3995) [CI] Use understandable names in Travis Matrix
Uwe L. Korn created ARROW-3995:

Summary: [CI] Use understandable names in Travis Matrix
Key: ARROW-3995
URL: https://issues.apache.org/jira/browse/ARROW-3995
Project: Apache Arrow
Issue Type: Improvement
Components: Continuous Integration
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
Fix For: 0.12.0

Travis has a new feature to assign labels to the matrix entries, making the matrix much easier to navigate.