[jira] [Created] (ARROW-6474) Provide mechanism for python to write out old format
Micah Kornfield created ARROW-6474: -- Summary: Provide mechanism for python to write out old format Key: ARROW-6474 URL: https://issues.apache.org/jira/browse/ARROW-6474 Project: Apache Arrow Issue Type: Sub-task Reporter: Micah Kornfield Fix For: 0.15.0 I think this needs to be an environment variable, so it can be made to work with old versions of the Java library in the pyspark integration. [~bryanc] can you check if this captures the requirements? -- This message was sent by Atlassian Jira (v8.3.2#803003)
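A minimal sketch of how such an environment-variable switch could be consulted on the writer side. Everything here is an assumption for illustration, not a settled design: the variable name `ARROW_PRE_0_15_IPC_FORMAT` and the helper function are hypothetical.

```python
import os

# Assumed variable name for illustration only; the real name would be
# decided as part of ARROW-6474.
LEGACY_IPC_ENV = "ARROW_PRE_0_15_IPC_FORMAT"

def use_legacy_ipc_format() -> bool:
    """True when the writer should emit the pre-0.15 IPC format, e.g. so
    the pyspark integration keeps working against an old Java Arrow
    library."""
    return os.environ.get(LEGACY_IPC_ENV, "0") == "1"
```

An environment variable (rather than a keyword argument) means existing pyspark deployments can opt in without code changes, which is the requirement the issue describes.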
[jira] [Created] (ARROW-6473) [Format] Clarify dictionary encoding edge cases
Micah Kornfield created ARROW-6473: -- Summary: [Format] Clarify dictionary encoding edge cases Key: ARROW-6473 URL: https://issues.apache.org/jira/browse/ARROW-6473 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Format Reporter: Micah Kornfield Assignee: Micah Kornfield Several recent threads on the mailing list: 1. Edge case for all-null columns and interleaved dictionaries 2. Semantics of non-delta dictionaries (and their relation to the file format). 3. Proposal for a forward-compatible enum so dictionaries can be represented as types other than a "flat" vector. -- This message was sent by Atlassian Jira (v8.3.2#803003)
Re: [DISCUSS] IPC buffer layout for Null type
Hi Wes and others, I don't have a sense of where Null arrays get created in the existing code base. Also, do you think it is worth the effort to make this backwards compatible? We could in theory tie the buffer count to having the continuation value for alignment. The one area where I'm slightly concerned is that we seem to have users in the wild who are depending on backwards compatibility, and I'm trying to better understand the odds that we break them. Thanks, Micah On Thu, Sep 5, 2019 at 7:25 AM Wes McKinney wrote: > hi folks, > > One of the as-yet-untested (in integration tests) parts of the > columnar specification is the Null layout. In C++ we additionally > implemented this by writing two length-0 "placeholder" buffers in the > RecordBatch data header, but since the Null layout has no memory > allocated nor any buffers in-memory it may be more proper to write no > buffers (since the length of the Null layout is all you need to > reconstruct it). There are 3 implementations of the placeholder > version (C++, Go, JS, maybe also C#) but it never got implemented in > Java. While technically this would break old serialized data, I would > not expect this to be very frequently occurring in many of the > currently-deployed Arrow applications > > Here is my C++ patch > > https://github.com/apache/arrow/pull/5287 > > I'm not sure we need to formalize this with a vote but I'm interested > in the community's feedback on how to proceed here. > > - Wes >
Re: Timeline for 0.15.0 release
Just for reference [1] has a dashboard of the current issues: https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney wrote: > hi all, > > It doesn't seem like we're going to be in a position to release at the > beginning of next week. I hope that one more week of work (or less) > will be enough to get us there. Aside from merging the alignment > changes, we need to make sure that our packaging jobs required for the > release candidate are all working. > > If folks could remove issues from the 0.15.0 backlog that they don't > think they will finish by end of next week that would help focus > efforts (there are currently 78 issues in 0.15.0 still). I am looking > to tackle a few small features related to dictionaries while the > release window is still open. > > - Wes > > On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney wrote: > > > > hi, > > > > I think we should try to release the week of September 9, so > > development work should be completed by end of next week. > > > > Does that seem reasonable? > > > > I plan to get up a patch for the protocol alignment changes for C++ in > > the next couple of days -- I think that getting the alignment work > > done is the main barrier to releasing. > > > > Thanks > > Wes > > > > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu > wrote: > > > > > > Hi, Wes, on the java side, I can think of several bugs that need to be > fixed or reminded. > > > > > > i. ARROW-6040: Dictionary entries are required in IPC streams even > when empty[1] > > > This one is under review now, however through this PR we find that > there seems a bug in java reading and writing dictionaries in IPC which is > Inconsistent with spec[2] since it assumes all dictionaries are at the > start of stream (see details in PR comments, and this fix may not catch up > with version 0.15). @Micah Kornfield > > > > > > ii. 
ARROW-1875: Write 64-bit ints as strings in integration test JSON > files[3] > > > Java side code already checked in, other implementations seems not. > > > > > > iii. ARROW-6202: OutOfMemory in JdbcAdapter[4] > > > Caused by trying to load all records in one contiguous batch, fixed by > providing iterator API for iteratively reading in ARROW-6219[5]. > > > > > > Thanks, > > > Ji Liu > > > > > > [1] https://github.com/apache/arrow/pull/4960 > > > [2] https://arrow.apache.org/docs/ipc.html > > > [3] https://issues.apache.org/jira/browse/ARROW-1875 > > > [4] https://issues.apache.org/jira/browse/ARROW-6202[5] > https://issues.apache.org/jira/browse/ARROW-6219 > > > > > > > > > > > > -- > > > From:Wes McKinney > > > Send Time:2019年8月19日(星期一) 23:03 > > > To:dev > > > Subject:Re: Timeline for 0.15.0 release > > > > > > I'm going to work some on organizing the 0.15.0 backlog some this > > > week, if anyone wants to help with grooming (particularly for > > > languages other than C++/Python where I'm focusing) that would be > > > helpful. There have been almost 500 JIRA issues opened since the > > > 0.14.0 release, so we should make sure to check whether there's any > > > regressions or other serious bugs that we should try to fix for > > > 0.15.0. > > > > > > On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney > wrote: > > > > > > > > The Windows wheel issue in 0.14.1 seems to be > > > > > > > > https://issues.apache.org/jira/browse/ARROW-6015 > > > > > > > > I think the root cause could be the Windows changes in > > > > > > > > > https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a > > > > > > > > I would be appreciative if a volunteer would look into what was wrong > > > > with the 0.14.1 wheels on Windows. 
Otherwise 0.15.0 Windows wheels > > > > will be broken, too > > > > > > > > The bad wheels can be found at > > > > > > > > https://bintray.com/apache/arrow/python#files/python%2F0.14.1 > > > > > > > > On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou > wrote: > > > > > > > > > > On Thu, 15 Aug 2019 11:17:07 -0700 > > > > > Micah Kornfield wrote: > > > > > > > > > > > > > > In C++ they are > > > > > > > independent, we could have 32-bit array lengths and > variable-length > > > > > > > types with 64-bit offsets if we wanted (we just wouldn't be > able to > > > > > > > have a List child with more than INT32_MAX elements). > > > > > > > > > > > > I think the point is we could do this in C++ but we don't. I'm > not sure we > > > > > > would have introduced the "Large" types if we did. > > > > > > > > > > 64-bit offsets take twice as much space as 32-bit offsets, so if > you're > > > > > storing lots of small-ish lists or strings, 32-bit offsets are > > > > > preferrable. So even with 64-bit array lengths from the start it > would > > > > > still be beneficial to have types with 32-bit offsets. > > > > > > > > > > > Going with the limited address space in Java and calling it a > reference > > > > > > implementation seems suboptimal. If a consumer uses a
Re: [ANNOUNCE] New committers: Ben Kietzman, Kenta Murata, and Neal Richardson
Congrats everyone. On Thu, Sep 5, 2019 at 7:06 PM Ji Liu wrote: > Congratulations! > > Thanks, > Ji Liu > > > -- > From:Fan Liya > Send Time:2019年9月6日(星期五) 09:28 > To:dev > Subject:Re: [ANNOUNCE] New committers: Ben Kietzman, Kenta Murata, and > Neal Richardson > > Big congratulations to Ben, Kenta and Neal! > > Best, > Liya Fan > > On Fri, Sep 6, 2019 at 5:33 AM Wes McKinney wrote: > > > hi all, > > > > on behalf of the Arrow PMC, I'm pleased to announce that Ben, Kenta, > > and Neal have accepted invitations to become Arrow committers. Welcome > > and thank you for all your contributions! > > >
Re: New Users on JIRA
Thanks on both counts Wes! From: Wes McKinney Sent: Thursday, September 5, 2019 10:52 PM To: dev Subject: Re: New Users on JIRA hi Paddy, I keep all the e-mail in Gmail, it's easy to search there. The Pony Mail interface works well too https://lists.apache.org/list.html?dev@arrow.apache.org To assign issues to new users * Navigate to "JIRA Administration > Projects" in the top right * Click on "Apache Arrow" * Click "Users and Roles" on the left * Click "Add users to role" * Type user name or username * Make sure to select "Contributor" * Click Add I just took care of this one. - Wes On Thu, Sep 5, 2019 at 9:44 PM paddy horan wrote: > > Hi All, > > I have the same issue again where there is a new user (hengruo) that needs > permissions changed so I can assign an issue. I know that this was discussed > recently which leads me to another question. > > How do others find previous conversations in the mailing list archives? I > find it pretty tedious to navigate the archive when looking for specific > threads. Do others keep the mail in their e-mail clients for future > searching or is there some search functionality or tool I am missing? > > Thanks, > Paddy
New Users on JIRA
Hi All, I have the same issue again where there is a new user (hengruo) that needs permissions changed so I can assign an issue. I know that this was discussed recently which leads me to another question. How do others find previous conversations in the mailing list archives? I find it pretty tedious to navigate the archive when looking for specific threads. Do others keep the mail in their e-mail clients for future searching or is there some search functionality or tool I am missing? Thanks, Paddy
[jira] [Created] (ARROW-6472) [Java] ValueVector#accept may have potential cast exception
Ji Liu created ARROW-6472: - Summary: [Java] ValueVector#accept may have potential cast exception Key: ARROW-6472 URL: https://issues.apache.org/jira/browse/ARROW-6472 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu Per discussion [https://github.com/apache/arrow/pull/5195#issuecomment-528425302] We may use the API this way: {code:java} RangeEqualsVisitor visitor = new RangeEqualsVisitor(vector1, vector2); vector3.accept(visitor, range){code} If vector1/vector2 are, say, {{StructVector}}s and vector3 is an {{IntVector}}, things can go bad: we'll use {{compareBaseFixedWidthVectors()}} and do wrong type-casts for vector1/vector2. -- This message was sent by Atlassian Jira (v8.3.2#803003)
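A Python sketch of the failure mode and the guard that would prevent it. The class names mirror the Java ones purely for illustration; these are toy stand-ins, and the actual fix belongs in the Java visitor implementation.

```python
# Toy stand-ins for the Java vector classes (illustration only).
class StructVector:
    def __init__(self, values):
        self.values = values

class IntVector:
    def __init__(self, values):
        self.values = values

class RangeEqualsVisitor:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def visit(self, vector, start, length):
        # Guard against the reported bug: visiting a vector whose concrete
        # type differs from the visitor's constructor arguments would make
        # the Java implementation pick the wrong compare routine and
        # perform an invalid cast.
        if type(vector) is not type(self.left):
            raise TypeError("visited vector type does not match visitor's vectors")
        return (self.left.values[start:start + length]
                == self.right.values[start:start + length])
```

With the guard in place, `IntVector` visited by a visitor built for `StructVector`s raises instead of comparing garbage.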
Re: Timeline for 0.15.0 release
hi all, It doesn't seem like we're going to be in a position to release at the beginning of next week. I hope that one more week of work (or less) will be enough to get us there. Aside from merging the alignment changes, we need to make sure that our packaging jobs required for the release candidate are all working. If folks could remove issues from the 0.15.0 backlog that they don't think they will finish by end of next week that would help focus efforts (there are currently 78 issues in 0.15.0 still). I am looking to tackle a few small features related to dictionaries while the release window is still open. - Wes On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney wrote: > > hi, > > I think we should try to release the week of September 9, so > development work should be completed by end of next week. > > Does that seem reasonable? > > I plan to get up a patch for the protocol alignment changes for C++ in > the next couple of days -- I think that getting the alignment work > done is the main barrier to releasing. > > Thanks > Wes > > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu wrote: > > > > Hi, Wes, on the java side, I can think of several bugs that need to be > > fixed or reminded. > > > > i. ARROW-6040: Dictionary entries are required in IPC streams even when > > empty[1] > > This one is under review now, however through this PR we find that there > > seems a bug in java reading and writing dictionaries in IPC which is > > Inconsistent with spec[2] since it assumes all dictionaries are at the > > start of stream (see details in PR comments, and this fix may not catch up > > with version 0.15). @Micah Kornfield > > > > ii. ARROW-1875: Write 64-bit ints as strings in integration test JSON > > files[3] > > Java side code already checked in, other implementations seems not. > > > > iii. ARROW-6202: OutOfMemory in JdbcAdapter[4] > > Caused by trying to load all records in one contiguous batch, fixed by > > providing iterator API for iteratively reading in ARROW-6219[5]. 
> > > > Thanks, > > Ji Liu > > > > [1] https://github.com/apache/arrow/pull/4960 > > [2] https://arrow.apache.org/docs/ipc.html > > [3] https://issues.apache.org/jira/browse/ARROW-1875 > > [4] https://issues.apache.org/jira/browse/ARROW-6202[5] > > https://issues.apache.org/jira/browse/ARROW-6219 > > > > > > > > -- > > From:Wes McKinney > > Send Time:2019年8月19日(星期一) 23:03 > > To:dev > > Subject:Re: Timeline for 0.15.0 release > > > > I'm going to work some on organizing the 0.15.0 backlog some this > > week, if anyone wants to help with grooming (particularly for > > languages other than C++/Python where I'm focusing) that would be > > helpful. There have been almost 500 JIRA issues opened since the > > 0.14.0 release, so we should make sure to check whether there's any > > regressions or other serious bugs that we should try to fix for > > 0.15.0. > > > > On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney wrote: > > > > > > The Windows wheel issue in 0.14.1 seems to be > > > > > > https://issues.apache.org/jira/browse/ARROW-6015 > > > > > > I think the root cause could be the Windows changes in > > > > > > https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a > > > > > > I would be appreciative if a volunteer would look into what was wrong > > > with the 0.14.1 wheels on Windows. Otherwise 0.15.0 Windows wheels > > > will be broken, too > > > > > > The bad wheels can be found at > > > > > > https://bintray.com/apache/arrow/python#files/python%2F0.14.1 > > > > > > On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou > > > wrote: > > > > > > > > On Thu, 15 Aug 2019 11:17:07 -0700 > > > > Micah Kornfield wrote: > > > > > > > > > > > > In C++ they are > > > > > > independent, we could have 32-bit array lengths and variable-length > > > > > > types with 64-bit offsets if we wanted (we just wouldn't be able to > > > > > > have a List child with more than INT32_MAX elements). > > > > > > > > > > I think the point is we could do this in C++ but we don't. 
I'm not > > > > > sure we > > > > > would have introduced the "Large" types if we did. > > > > > > > > 64-bit offsets take twice as much space as 32-bit offsets, so if you're > > > > storing lots of small-ish lists or strings, 32-bit offsets are > > > > preferrable. So even with 64-bit array lengths from the start it would > > > > still be beneficial to have types with 32-bit offsets. > > > > > > > > > Going with the limited address space in Java and calling it a > > > > > reference > > > > > implementation seems suboptimal. If a consumer uses a "Large" type > > > > > presumably it is because they need the ability to store more than > > > > > INT32_MAX > > > > > child elements in a column, otherwise it is just wasting space [1]. > > > > > > > > Probably. Though if the individual elements (lists or strings) are > > > > large, not much space is wasted in proportion, so it may be simpler in > > > > such a
Re: [PROPOSAL] Consolidate Arrow's CI configuration
hi Krisztian, Anyone who's developing in the project can see that the Buildbot setup is working well (at least for Linux builds) and giving much more timely feedback, which has been very helpful. I'm concerned about the "ursabot" approach for a few reasons: * If we are to centralize our tooling for Arrow CI builds, why can we not have the build tool itself under Arrow governance? * The current "ursabot" tool has GPL dependencies. Can these be factored out into plugins so that the tool itself is ASF-compatible? * This is a bit nitpicky but the name "ursabot" bears the name mark of an organization that funds developers in this project. I'm concerned about this, as I would about a tool named "clouderabot", "dremiobot", "databricksbot", "googlebot", "ibmbot" or anything like that. It's different from using a tool developed by an unaffiliated third party In any case, I think putting the build configurations for the current Ursa Labs-managed build cluster in the Apache Arrow repository is a good idea, but there are likely a number of issues that we need to address to be able to contemplate having a hard dependency between the CI that we depend on to merge patches and this tool. - Wes On Thu, Sep 5, 2019 at 8:17 AM Antoine Pitrou wrote: > > > Le 05/09/2019 à 15:04, Krisztián Szűcs a écrit : > >> > >> If going with buildbot, this means that the various build steps need to > >> be generic like in Travis-CI (e.g. "install", "setup", "before-test", > >> "test", "after-test"...) and their contents expressed outside of the > >> buildmaster configuration per se. > >> > > This is partially resolved with the Builder abstraction, see an example > > here [1]. We just need to add and reload these Builder configurations > > dynamically on certain events, like when someone changes a builder > > from a PR. > > This is inside the buildmaster process, right? I don't understand how > you plan to change those dynamically without affecting all concurrent > builds. > > Regards > > Antoine.
[jira] [Created] (ARROW-6471) [Python] arrow_to_pandas.cc has separate code paths for populating list values into an object array
Wes McKinney created ARROW-6471: --- Summary: [Python] arrow_to_pandas.cc has separate code paths for populating list values into an object array Key: ARROW-6471 URL: https://issues.apache.org/jira/browse/ARROW-6471 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney See patch for ARROW-6369 https://github.com/apache/arrow/pull/5301. There are two different code paths for writing list values into a {{PyObject**}} output buffer. This seems like it could be simplified -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6470) Segmentation fault when trying to serialize empty SerializeRecordBatch
Wamsi Viswanath created ARROW-6470: -- Summary: Segmentation fault when trying to serialize empty SerializeRecordBatch Key: ARROW-6470 URL: https://issues.apache.org/jira/browse/ARROW-6470 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.13.0 Reporter: Wamsi Viswanath Below is a simple reproducible example, please let me know if the behavior is valid:

{code:cpp}
int main() {
  std::shared_ptr<arrow::Schema> schema =
      arrow::schema({arrow::field("int_", arrow::int32(), false)});
  std::vector<std::shared_ptr<arrow::Array>> arrays = {};
  std::shared_ptr<arrow::RecordBatch> record_batch =
      arrow::RecordBatch::Make(schema, arrays[0]->length(), arrays);
  std::shared_ptr<arrow::Buffer> serialized_buffer;
  if (!arrow::ipc::SerializeRecordBatch(*record_batch, arrow::default_memory_pool(),
                                        &serialized_buffer)
           .ok()) {
    throw std::runtime_error("Error: Serializing Records.");
  }
}
{code}

-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6469) PyArrow HDFS documentation does not mention HDFS short circuit readings
Paulo Roberto Cerioni created ARROW-6469: Summary: PyArrow HDFS documentation does not mention HDFS short circuit readings Key: ARROW-6469 URL: https://issues.apache.org/jira/browse/ARROW-6469 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Paulo Roberto Cerioni Due to PyArrow using libhdfs underneath, it is expected that files read from HDFS are going to make use of short-circuit reads. However, the PyArrow documentation does not explain whether this feature is supported (and in what situations), nor whether it works without any configuration. For instance, I'm interested in the use case in which we use the short-circuit feature to read some of the columns from a Parquet file located in HDFS into a dataframe. -- This message was sent by Atlassian Jira (v8.3.2#803003)
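For reference, short-circuit reads are controlled by standard Hadoop client settings that libhdfs reads from the usual client configuration; the documentation could simply point at something like the fragment below (the socket path is illustrative, not PyArrow-specific):

```xml
<!-- hdfs-site.xml on the client: enable short-circuit local reads -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```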
[ANNOUNCE] New committers: Ben Kietzman, Kenta Murata, and Neal Richardson
hi all, on behalf of the Arrow PMC, I'm pleased to announce that Ben, Kenta, and Neal have accepted invitations to become Arrow committers. Welcome and thank you for all your contributions!
[jira] [Created] (ARROW-6468) [C++] Remove unused hashing routines
Antoine Pitrou created ARROW-6468: - Summary: [C++] Remove unused hashing routines Key: ARROW-6468 URL: https://issues.apache.org/jira/browse/ARROW-6468 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Antoine Pitrou The adoption of xxh3 for hashing (in ARROW-6385) probably left around some specialized but unused hashing functions (e.g. CRC-based hashing, perhaps also murmurhash). We should probably remove them if no problem surfaces with xxh3. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6467) [Website] Transition to new .asf.yaml machinery for website publishing
Wes McKinney created ARROW-6467: --- Summary: [Website] Transition to new .asf.yaml machinery for website publishing Key: ARROW-6467 URL: https://issues.apache.org/jira/browse/ARROW-6467 Project: Apache Arrow Issue Type: Improvement Components: Website Reporter: Wes McKinney The ASF is providing a new configuration option for website publishing https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories This is timely since I've found deploys to be slow of late via the current mechanism https://issues.apache.org/jira/browse/INFRA-18987 -- This message was sent by Atlassian Jira (v8.3.2#803003)
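For reference, the new mechanism is driven by a small `.asf.yaml` file at the repository root; a minimal sketch of the publishing stanza (the branch name here is an assumption):

```yaml
# .asf.yaml -- publish the website from the asf-site branch (assumed name)
publish:
  whoami: asf-site
```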
[jira] [Created] (ARROW-6466) [Developer] Refactor integration/integration_test.py into a proper Python package
Wes McKinney created ARROW-6466: --- Summary: [Developer] Refactor integration/integration_test.py into a proper Python package Key: ARROW-6466 URL: https://issues.apache.org/jira/browse/ARROW-6466 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Wes McKinney This could also facilitate writing unit tests for the integration tests. Maybe this could be a part of archery? -- This message was sent by Atlassian Jira (v8.3.2#803003)
[DISCUSS] IPC buffer layout for Null type
hi folks, One of the as-yet-untested (in integration tests) parts of the columnar specification is the Null layout. In C++ we additionally implemented this by writing two length-0 "placeholder" buffers in the RecordBatch data header, but since the Null layout has no memory allocated nor any buffers in-memory it may be more proper to write no buffers (since the length of the Null layout is all you need to reconstruct it). There are 3 implementations of the placeholder version (C++, Go, JS, maybe also C#) but it never got implemented in Java. While technically this would break old serialized data, I would not expect this to be very frequently occurring in many of the currently-deployed Arrow applications Here is my C++ patch https://github.com/apache/arrow/pull/5287 I'm not sure we need to formalize this with a vote but I'm interested in the community's feedback on how to proceed here. - Wes
[jira] [Created] (ARROW-6465) [Python] Improve Windows build instructions
ARF created ARROW-6465: -- Summary: [Python] Improve Windows build instructions Key: ARROW-6465 URL: https://issues.apache.org/jira/browse/ARROW-6465 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: ARF Assignee: ARF The current instructions for building the pyarrow python extension are incomplete. Problems include: * missing re2, llvm, clang prerequisites * missing info on which MSVC toolsets are supported * missing info on how the build commands differ between MSVC toolsets * missing warning about the currently broken Windows build config The linked PR amends the python developer documentation with the above. -- This message was sent by Atlassian Jira (v8.3.2#803003)
Re: [PROPOSAL] Consolidate Arrow's CI configuration
Le 05/09/2019 à 15:04, Krisztián Szűcs a écrit : >> >> If going with buildbot, this means that the various build steps need to >> be generic like in Travis-CI (e.g. "install", "setup", "before-test", >> "test", "after-test"...) and their contents expressed outside of the >> buildmaster configuration per se. >> > This is partially resolved with the Builder abstraction, see an example > here [1]. We just need to add and reload these Builder configurations > dynamically on certain events, like when someone changes a builder > from a PR. This is inside the buildmaster process, right? I don't understand how you plan to change those dynamically without affecting all concurrent builds. Regards Antoine.
[jira] [Created] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
Ji Liu created ARROW-6464: - Summary: [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API Key: ARROW-6464 URL: https://issues.apache.org/jira/browse/ARROW-6464 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently {{FixedSizeListVector#splitAndTransfer}} uses {{copyValueSafe}}, which performs a memory copy; we should use the slice API instead. Meanwhile, {{splitAndTransfer}} in all classes should perform index checks at the beginning. -- This message was sent by Atlassian Jira (v8.3.2#803003)
Re: [PROPOSAL] Consolidate Arrow's CI configuration
Hey Antoine, On Thu, Sep 5, 2019 at 2:54 PM Antoine Pitrou wrote: > > Le 05/09/2019 à 14:43, Uwe L. Korn a écrit : > > Hello Krisztián, > > > >> Am 05.09.2019 um 14:22 schrieb Krisztián Szűcs < > szucs.kriszt...@gmail.com>: > >> > >>> * The build configuration is automatically updated on a merge to > master? > >>> > >> Not yet, but this can be automatized too with buildbot itself. > > > > This is something I would actually like to have before getting rid of > the Travis jobs. Otherwise we would be constrainted quite a bit in > development when master CI breaks because of an environment issue until one > of the few people who can update the config become available. > > I would go further and say that PRs and branches need to be able to run > different build configurations. We are moving too fast to afford an > inflexible centralized configuration. > Agree. I haven't had time to work on it yet, although I have a couple of solutions in mind. Once we decide to move on with this proposal we can allocate time to resolve it. > > If going with buildbot, this means that the various build steps need to > be generic like in Travis-CI (e.g. "install", "setup", "before-test", > "test", "after-test"...) and their contents expressed outside of the > buildmaster configuration per se. > This is partially resolved with the Builder abstraction, see an example here [1]. We just need to add and reload these Builder configurations dynamically on certain events, like when someone changes a builder from a PR. [1]: https://github.com/apache/arrow/blob/305e7387d429f095019c74f17e0c9c7cb443bb70/ci/buildbot/arrow/builders.py#L366 > > Regards > > Antoine. >
Re: [PROPOSAL] Consolidate Arrow's CI configuration
On 05/09/2019 at 14:43, Uwe L. Korn wrote: > Hello Krisztián, > >> On 05.09.2019 at 14:22, Krisztián Szűcs wrote: >> >>> * The build configuration is automatically updated on a merge to master? >>> >> Not yet, but this can be automated too with buildbot itself. > > This is something I would actually like to have before getting rid of the > Travis jobs. Otherwise we would be constrained quite a bit in development > when master CI breaks because of an environment issue until one of the few > people who can update the config becomes available. I would go further and say that PRs and branches need to be able to run different build configurations. We are moving too fast to afford an inflexible centralized configuration. If going with buildbot, this means that the various build steps need to be generic like in Travis-CI (e.g. "install", "setup", "before-test", "test", "after-test"...) and their contents expressed outside of the buildmaster configuration per se. Regards Antoine.
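The generic, provider-agnostic phases Antoine describes map naturally onto buildbot's factory API, where each phase is an ordinary step. A rough sketch under plain buildbot — the builder/worker names and the `ci/*.sh` scripts are made up for illustration, and this is not the actual ursabot code:

```python
from buildbot.plugins import steps, util

# Generic phases expressed as ordinary shell steps, so the same
# commands could equally run on Travis or on a developer machine.
factory = util.BuildFactory()
factory.addStep(steps.Git(repourl="https://github.com/apache/arrow",
                          mode="incremental"))
factory.addStep(steps.ShellCommand(name="install",
                                   command=["ci/install.sh"]))
factory.addStep(steps.ShellCommand(name="test",
                                   command=["ci/test.sh"]))

# A builder ties the step sequence to one or more workers.
builder = util.BuilderConfig(name="cpp-linux-example",
                             workernames=["docker-worker-1"],
                             factory=factory)
```

Keeping the commands in scripts rather than in the buildmaster configuration is what makes the steps portable across CI providers, which is the point being discussed.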
Re: [PROPOSAL] Consolidate Arrow's CI configuration
Hello Krisztián, > On 05.09.2019 at 14:22, Krisztián Szűcs wrote: > >> * The build configuration is automatically updated on a merge to master? >> > Not yet, but this can be automated too with buildbot itself. This is something I would actually like to have before getting rid of the Travis jobs. Otherwise we would be constrained quite a bit in development when master CI breaks because of an environment issue until one of the few people who can update the config becomes available. Uwe > >> >> And then a not so simple one: What will happen to our current >> docker-compose setup? From the PR it seems like we do similar things with >> ursabot but not using the central docker-compose.yml? >> > Currently we're using docker-compose to run one-off containers rather > than long running, multi-container services (which docker-compose is > designed for). Ursabot already supports the features we need from > docker-compose, so it can effectively replace the docker-compose > setup as well. We have low-level control over the docker API, so we > are able to tailor it to our requirements. > >> >> >> Cheers >> Uwe >> >>> On 29.08.2019 at 14:19, Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote: >>> >>> Hi, >>> >>> Arrow's current continuous integration setup utilizes multiple CI >>> providers, >>> tools, and scripts: >>> >>> - Unit tests are running on Travis and Appveyor >>> - Binary packaging builds are running on crossbow, an abstraction over >>> multiple >>> CI providers driven through a GitHub repository >>> - For local tests and tasks, there is a docker-compose setup, or of >> course >>> you >>> can maintain your own environment >>> >>> This setup has run into some limitations: >>> - It’s slow: the CI parallelism of Travis has degraded over the last >>> couple of >>> months. Testing a PR takes more than an hour, which is a long time for >>> both >>> the maintainers and the contributors, and it has a negative effect on >>> the >>> development throughput. 
>>> - Build configurations are not portable, they are tied to specific >>> services. >>> You can’t just take a Travis script and run it somewhere else. >>> - Because they’re not portable, build configurations are duplicated in >>> several >>> places. >>> - The Travis, Appveyor and crossbow builds are not reproducible locally, >>> so >>> developing them requires the slow git push cycles. >>> - Public CI has limited platform support, just for example ARM machines >>> are >>> not available. >>> - Public CI also has limited hardware support, no GPUs are available >>> >>> Resolving all of the issues above is complicated, but is a must for the >>> long >>> term sustainability of Arrow. >>> >>> For some time, we’ve been working on a tool called Ursabot[1], a library >> on >>> top >>> of the CI framework Buildbot[2]. Buildbot is well maintained and widely >>> used >>> for complex projects, including CPython, Webkit, LLVM, MariaDB, etc. >>> Buildbot >>> is not another hosted CI service like Travis or Appveyor: it is an >>> extensible >>> framework to implement various automations like continuous integration >>> tasks. >>> >>> You’ve probably noticed additional “Ursabot” builds appearing on pull >>> requests, >>> in addition to the Travis and Appveyor builds. We’ve been testing the >>> framework >>> with a fully featured CI server at ci.ursalabs.org. This service runs >> build >>> configurations we can’t run on Travis, does it faster than Travis, and >> has >>> the >>> GitHub comment bot integration for ad hoc build triggering. >>> >>> While we’re not prepared to propose moving all CI to a self-hosted setup, >>> our >>> work has demonstrated the potential of using buildbot to resolve Arrow’s >>> continuous integration challenges: >>> - The docker-based builders are reusing the docker images, which >> eliminate >>> slow dependency installation steps. Some builds on this setup, run on >>> Ursa Labs’s infrastructure, run 20 minutes faster than the comparable >>> Travis-CI jobs. 
>>> - It’s scalable. We can deploy buildbot wherever and add more masters and >>> workers, which we can’t do with public CI. >>> - It’s platform and CI-provider independent. Builds can be run on >>> arbitrary >>> architectures, operating systems, and hardware: Python is the only >>> requirement. Additionally builds specified in buildbot/ursabot can be >>> run >>> anywhere: not only on custom buildbot infrastructure but also on >> Travis, >>> or >>> even on your own machine. >>> - It improves reproducibility and encourages consolidation of >>> configuration. >>> You can run the exact job locally that ran on Travis, and you can even >>> get >>> an interactive shell in the build so you can debug a test failure. And >>> because you can run the same job anywhere, we wouldn’t need to have >>> duplicated, Travis-specific or the docker-compose build configuration >>> stored >>> separately. >>> - It’s extensible. More exotic features like a comment bot,
Re: [PROPOSAL] Consolidate Arrow's CI configuration
Hey Uwe, On Thu, Sep 5, 2019 at 1:49 PM Uwe L. Korn wrote: > Hello Krisztián, > > I like this proposal. CI coverage and response time are crucial for > the health of the project. In general I like the consolidation and local > reproducibility of the builds. Some questions I wanted to ask to make sure > I understand your proposal correctly (hopefully they all can be answered > with a simple yes): > > * Windows builds will stay in Appveyor for now? > Yes. Afterwards I'd go with the following steps: 1. Port the AppVeyor configurations to buildbot and run them on AppVeyor with `ursabot project build windows-builder-name` 2. Once we have Windows workers, and they are reliable, we can decommission the AppVeyor builds. > * MacOS builds will stay in Travis? > Yes, same as above. > * All other builds will be removed from Travis? Not all of the Travis builds are ported to buildbot yet, namely: c_glib, ruby, and the format integration tests. I suggest an incremental procedure: if a Travis build is ported to buildbot, we can choose to still run it on Travis or to disable it. In this case Travis would only be a hosting provider. > * Machines are currently run and funded by UrsaLabs but others could also > sponsor an instance that could be added to the setup? > Exactly, either in the cloud or on bare machines; buildbot enables us to scale our cluster pretty easily. > * The build configuration is automatically updated on a merge to master? > Not yet, but this can be automated too with buildbot itself. > > And then a not so simple one: What will happen to our current > docker-compose setup? From the PR it seems like we do similar things with > ursabot but not using the central docker-compose.yml? > Currently we're using docker-compose to run one-off containers rather than long running, multi-container services (which docker-compose is designed for). 
Ursabot already supports the features we need from docker-compose, so it can effectively replace the docker-compose setup as well. We have low-level control over the docker API, so we are able to tailor it to our requirements. > > > Cheers > Uwe > > > Am 29.08.2019 um 14:19 schrieb Krisztián Szűcs < > szucs.kriszt...@gmail.com>: > > > > Hi, > > > > Arrow's current continuous integration setup utilizes multiple CI > > providers, > > tools, and scripts: > > > > - Unit tests are running on Travis and Appveyor > > - Binary packaging builds are running on crossbow, an abstraction over > > multiple > > CI providers driven through a GitHub repository > > - For local tests and tasks, there is a docker-compose setup, or of > course > > you > > can maintain your own environment > > > > This setup has run into some limitations: > > - It’s slow: the CI parallelism of Travis has degraded over the last > > couple of > > months. Testing a PR takes more than an hour, which is a long time for > > both > > the maintainers and the contributors, and it has a negative effect on > > the > > development throughput. > > - Build configurations are not portable, they are tied to specific > > services. > > You can’t just take a Travis script and run it somewhere else. > > - Because they’re not portable, build configurations are duplicated in > > several > > places. > > - The Travis, Appveyor and crossbow builds are not reproducible locally, > > so > > developing them requires the slow git push cycles. > > - Public CI has limited platform support, just for example ARM machines > > are > > not available. > > - Public CI also has limited hardware support, no GPUs are available > > > > Resolving all of the issues above is complicated, but is a must for the > > long > > term sustainability of Arrow. > > > > For some time, we’ve been working on a tool called Ursabot[1], a library > on > > top > > of the CI framework Buildbot[2]. 
Re: [Discuss][Java] Support conversions between delta vector and partial sum vector
Hi Micah, Thanks for your comments. I am aware that you have invested lots of time and effort in reviewing the algorithm-related code. We really appreciate it. Thank you so much. I agree with you that the plan document is a good idea. In general, the algorithms are driven by applications, so it is difficult to give a precise plan. However, I am going to prepare a document about the requirements/design/implementation of the algorithms. Hope that will make discussions/code review more efficient. Best, Liya Fan On Thu, Sep 5, 2019 at 11:46 AM Fan Liya wrote: > Hi Wes, > > Thanks a lot for the comments. > You are right. This can be applied to data encoding/compression, and I > think this is one of the building blocks for encoding/compression. > > In the short term, it will provide conversions between the two memory > layouts of run-length vectors. > In the mid term, it can help to reduce the network traffic for > varchar/varbinary vectors. > In the long term, it will provide compression for more scenarios. > > The basic idea is based on the observation that a vector usually has a > smaller width after converting to a delta vector. > > For example, for a varchar vector with a large number of elements, the > offset buffer will use 4 bytes for each element. > However, it is likely that the strings in the vectors are not big (less > than 65536 in length). So by converting the offset buffer to a delta > vector, we can use an int vector with 2-byte width. > > Best, > Liya Fan > > > On Thu, Sep 5, 2019 at 3:05 AM Wes McKinney wrote: >> >> hi, >> >> Having utility algorithms to perform data transformations seems fine >> if there is a use for them and maintaining the code in the Arrow >> libraries makes sense. >> >> I don't understand point #2 "We can transform them to delta vectors >> before IPC". It sounds like you are proposing a data compression >> technique. Should this be a part of the >> sparseness/encoding/compression discussion? 
>> >> - Wes >> >> On Sun, Sep 1, 2019 at 10:14 PM Fan Liya wrote: >> > >> > Dear all, >> > >> > We want to support a feature for conversions between delta vector and >> > partial sum vector. Please give your valuable feedback. >> > >> > Best, >> > >> > Liya Fan >> > >> > What is a delta vector/partial sum vector? >> > >> > Given an integer vector a with length n, its partial sum vector is >> another >> > integer vector b with length n + 1, with values defined as: >> > >> > b(0) = initial sum >> > b(i) = a(0) + a(1) + ... + a(i - 1), i = 1, 2, ..., n >> > >> > Given an integer vector a with length n + 1, its delta vector is another >> > integer vector b with length n, with values defined as: >> > >> > b(i) = a(i + 1) - a(i), i = 0, 1, ..., n - 1 >> > >> > In this issue, we provide utilities to convert between delta vector and >> partial >> > sum vector. It is interesting to note that the two operations >> correspond >> > to discrete integration and differentiation. >> > >> > These conversions have wide applications. For example, >> > >> >1. >> > >> >The run-length vector proposed by Micah is based on the partial sum >> >vector, while the deduplication functionality is based on the delta >> vector. >> >This issue provides conversions between them. >> >2. >> > >> >The current VarCharVector/VarBinaryVector implementations are based >> on >> >the partial sum vector. We can transform them to delta vectors before >> IPC, to >> >reduce network traffic. >> >3. >> > >> >Converting to delta can be considered a form of data compression. To >> >further reduce the data volume, the operation can be applied more >> than >> >once. >> > >> > Points to discuss: >> > The API should be provided at the level of vector or ArrowBuf, or both? >> > 1. If it is based on vector, there can be performance overhead due to >> > virtual method calls. >> > 2. 
If it is based on ArrowBuf, some underlying details (type width) are >> > exposed to the end user, which is not compliant with the principle of >> > encapsulation. >>
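For concreteness, the two conversions defined in the proposal can be sketched in a few lines; plain Python lists stand in for Arrow vectors here, and the varchar offset-buffer example from the thread serves as the test case:

```python
def to_partial_sum(delta, initial_sum=0):
    # b[0] = initial_sum, b[i] = a[0] + ... + a[i-1]; length n + 1.
    out = [initial_sum]
    for d in delta:
        out.append(out[-1] + d)
    return out

def to_delta(psum):
    # b[i] = a[i+1] - a[i]; length n. Inverse of to_partial_sum.
    return [psum[i + 1] - psum[i] for i in range(len(psum) - 1)]

# A varchar vector with string lengths [3, 1, 4] has offset buffer
# [0, 3, 4, 8]; the delta form stores the (narrower) lengths themselves.
offsets = to_partial_sum([3, 1, 4])
print(offsets)            # [0, 3, 4, 8]
print(to_delta(offsets))  # [3, 1, 4]
```

This makes the width argument visible: the partial sums grow with the total data size, while the deltas stay bounded by the largest individual string length.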
Re: [PROPOSAL] Consolidate Arrow's CI configuration
Hello Krisztián, I like this proposal. CI coverage and response time are crucial for the health of the project. In general I like the consolidation and local reproducibility of the builds. Some questions I wanted to ask to make sure I understand your proposal correctly (hopefully they all can be answered with a simple yes): * Windows builds will stay in Appveyor for now? * MacOS builds will stay in Travis? * All other builds will be removed from Travis? * Machines are currently run and funded by UrsaLabs but others could also sponsor an instance that could be added to the setup? * The build configuration is automatically updated on a merge to master? And then a not so simple one: What will happen to our current docker-compose setup? From the PR it seems like we do similar things with ursabot but not using the central docker-compose.yml? Cheers Uwe > On 29.08.2019 at 14:19, Krisztián Szűcs wrote: > > Hi, > > Arrow's current continuous integration setup utilizes multiple CI > providers, > tools, and scripts: > > - Unit tests are running on Travis and Appveyor > - Binary packaging builds are running on crossbow, an abstraction over > multiple > CI providers driven through a GitHub repository > - For local tests and tasks, there is a docker-compose setup, or of course > you > can maintain your own environment > > This setup has run into some limitations: > - It’s slow: the CI parallelism of Travis has degraded over the last > couple of > months. Testing a PR takes more than an hour, which is a long time for > both > the maintainers and the contributors, and it has a negative effect on > the > development throughput. > - Build configurations are not portable, they are tied to specific > services. > You can’t just take a Travis script and run it somewhere else. > - Because they’re not portable, build configurations are duplicated in > several > places. 
> - The Travis, Appveyor and crossbow builds are not reproducible locally, > so > developing them requires the slow git push cycles. > - Public CI has limited platform support, just for example ARM machines > are > not available. > - Public CI also has limited hardware support, no GPUs are available > > Resolving all of the issues above is complicated, but is a must for the > long > term sustainability of Arrow. > > For some time, we’ve been working on a tool called Ursabot[1], a library on > top > of the CI framework Buildbot[2]. Buildbot is well maintained and widely > used > for complex projects, including CPython, Webkit, LLVM, MariaDB, etc. > Buildbot > is not another hosted CI service like Travis or Appveyor: it is an > extensible > framework to implement various automations like continuous integration > tasks. > > You’ve probably noticed additional “Ursabot” builds appearing on pull > requests, > in addition to the Travis and Appveyor builds. We’ve been testing the > framework > with a fully featured CI server at ci.ursalabs.org. This service runs build > configurations we can’t run on Travis, does it faster than Travis, and has > the > GitHub comment bot integration for ad hoc build triggering. > > While we’re not prepared to propose moving all CI to a self-hosted setup, > our > work has demonstrated the potential of using buildbot to resolve Arrow’s > continuous integration challenges: > - The docker-based builders are reusing the docker images, which eliminate > slow dependency installation steps. Some builds on this setup, run on > Ursa Labs’s infrastructure, run 20 minutes faster than the comparable > Travis-CI jobs. > - It’s scalable. We can deploy buildbot wherever and add more masters and > workers, which we can’t do with public CI. > - It’s platform and CI-provider independent. Builds can be run on > arbitrary > architectures, operating systems, and hardware: Python is the only > requirement. 
Additionally builds specified in buildbot/ursabot can be > run > anywhere: not only on custom buildbot infrastructure but also on Travis, > or > even on your own machine. > - It improves reproducibility and encourages consolidation of > configuration. > You can run the exact job locally that ran on Travis, and you can even > get > an interactive shell in the build so you can debug a test failure. And > because you can run the same job anywhere, we wouldn’t need to have > duplicated, Travis-specific or the docker-compose build configuration > stored > separately. > - It’s extensible. More exotic features like a comment bot, benchmark > database, benchmark dashboard, artifact store, integrating other systems > are > easily implementable within the same system. > > I’m proposing to donate the build configuration we’ve been iterating on in > Ursabot to the Arrow codebase. Here [3] is a patch that adds the > configuration. > This will enable us to explore consolidating build configuration using the > buildbot framework. A next step after to explore that would be to port a > Travis >
[jira] [Created] (ARROW-6463) [C++][Python] Rename arrow::fs::Selector to FileSelector
Krisztian Szucs created ARROW-6463: -- Summary: [C++][Python] Rename arrow::fs::Selector to FileSelector Key: ARROW-6463 URL: https://issues.apache.org/jira/browse/ARROW-6463 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Reporter: Krisztian Szucs Assignee: Krisztian Szucs In both the C++ implementation and the python binding. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6462) [C++] Can't build with bundled double-conversion on CentOS 6 x86_64
Sutou Kouhei created ARROW-6462: --- Summary: [C++] Can't build with bundled double-conversion on CentOS 6 x86_64 Key: ARROW-6462 URL: https://issues.apache.org/jira/browse/ARROW-6462 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Sutou Kouhei Assignee: Sutou Kouhei https://travis-ci.org/ursa-labs/crossbow/builds/581001313#L8163 {noformat} -- Installing: /root/rpmbuild/BUILD/apache-arrow-0.14.0.dev451/cpp/build/double-conversion_ep/src/double-conversion_ep/lib64/libdouble-conversion.a ... make[2]: *** No rule to make target 'double-conversion_ep/src/double-conversion_ep/lib/libdouble-conversion.a', needed by 'release/libarrow.so.15.0.0'. Stop. {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003)