[jira] [Created] (ARROW-14488) [Python] Incorrect inferred schema from pandas dataframe with length 0.

2021-10-26 Thread Yuan Zhou (Jira)
Yuan Zhou created ARROW-14488:
-

 Summary: [Python] Incorrect inferred schema from pandas dataframe 
with length 0.
 Key: ARROW-14488
 URL: https://issues.apache.org/jira/browse/ARROW-14488
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 5.0.0
 Environment: OS: Windows 10, CentOS 7
Reporter: Yuan Zhou


We use pandas (with the pyarrow engine) to write out Parquet files, and those outputs 
are consumed by other applications, such as Java apps using 
org.apache.parquet.hadoop.ParquetFileReader. We found that some empty 
dataframes get an incorrect schema for their string columns in those applications. 
After some investigation, we narrowed the issue down to the schema inference 
done by pyarrow:

{{In [1]: import pandas as pd}}

{{In [2]: df = pd.DataFrame([['a', 1, 1.0]], columns=['a', 'b', 'c'])}}

{{In [3]: import pyarrow as pa}}

{{In [4]: pa.Schema.from_pandas(df)}}
{{Out[4]:}}
{{a: string}}
{{b: int64}}
{{c: double}}
{{-- schema metadata --}}
{{pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 
562}}

{{In [5]: pa.Schema.from_pandas(df.head(0))}}
{{Out[5]:}}
{{a: null}}
{{b: int64}}
{{c: double}}
{{-- schema metadata --}}
{{pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 
560}}

{{In [6]: pa.__version__}}
{{Out[6]: '5.0.0'}}

 

Is this expected behavior, or is there a workaround for this issue? 
Could anyone take a look, please? Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-14487) [R] Implement altrep Extract_subset() methods

2021-10-26 Thread Romain Francois (Jira)
Romain Francois created ARROW-14487:
---

 Summary: [R] Implement altrep Extract_subset() methods
 Key: ARROW-14487
 URL: https://issues.apache.org/jira/browse/ARROW-14487
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Romain Francois
Assignee: Romain Francois








[jira] [Created] (ARROW-14486) [Packaging][deb] libthrift-dev dependency is missing

2021-10-26 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-14486:


 Summary: [Packaging][deb] libthrift-dev dependency is missing
 Key: ARROW-14486
 URL: https://issues.apache.org/jira/browse/ARROW-14486
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Affects Versions: 6.0.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 7.0.0, 6.0.1








[jira] [Created] (ARROW-14485) ParquetFile.read_row_group loses struct nullability when selecting one column from a struct

2021-10-26 Thread Jim Pivarski (Jira)
Jim Pivarski created ARROW-14485:


 Summary: ParquetFile.read_row_group loses struct nullability when 
selecting one column from a struct
 Key: ARROW-14485
 URL: https://issues.apache.org/jira/browse/ARROW-14485
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 6.0.0
Reporter: Jim Pivarski
 Attachments: test8.parquet

This appeared minutes ago because we have a test suite that saw Arrow 6.0.0 
land in PyPI. (Congrats, by the way! I've been looking forward to this one!)

Below, you'll see one thing that version 6 fixed (asking for one column in a 
nested struct returns only that one column) and a new error (it does not 
preserve nullability of the surrounding struct). Here, I'll write down the 
steps to reproduce and then explain.
{code:python}
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.parquet
>>> pyarrow.__version__
'5.0.0'
>>> file = pyarrow.parquet.ParquetFile("test8.parquet")
>>> file.schema

required group field_id=-1 schema {
  required group field_id=-1 x (List) {
repeated group field_id=-1 list {
  required group field_id=-1 item {
required int64 field_id=-1 y;
required double field_id=-1 z;
  }
}
  }
}

>>> file.schema_arrow
x: large_list<item: struct<y: int64 not null, z: double not null> not null> not null
  child 0, item: struct<y: int64 not null, z: double not null> not null
      child 0, y: int64 not null
      child 1, z: double not null
>>> file.read_row_group(0, ["x.list.item.y"]).schema
x: large_list<item: struct<y: int64 not null, z: double not null> not null> not null
  child 0, item: struct<y: int64 not null, z: double not null> not null
      child 0, y: int64 not null
      child 1, z: double not null
>>> file.read_row_group(0, ["x.list.item.y", "x.list.item.z"]).schema
x: large_list<item: struct<y: int64 not null, z: double not null> not null> not null
  child 0, item: struct<y: int64 not null, z: double not null> not null
      child 0, y: int64 not null
      child 1, z: double not null
>>> file.read_row_group(0).schema
x: large_list<item: struct<y: int64 not null, z: double not null> not null> not null
  child 0, item: struct<y: int64 not null, z: double not null> not null
      child 0, y: int64 not null
      child 1, z: double not null

Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.parquet
>>> pyarrow.__version__
'6.0.0'
>>> file = pyarrow.parquet.ParquetFile("test8.parquet")
>>> file.schema

required group field_id=-1 schema {
  required group field_id=-1 x (List) {
repeated group field_id=-1 list {
  required group field_id=-1 item {
required int64 field_id=-1 y;
required double field_id=-1 z;
  }
}
  }
}

>>> file.schema_arrow
x: large_list<item: struct<y: int64 not null, z: double not null> not null> not null
  child 0, item: struct<y: int64 not null, z: double not null> not null
      child 0, y: int64 not null
      child 1, z: double not null
>>> file.read_row_group(0, ["x.list.item.y"]).schema
x: large_list<item: struct<y: int64 not null>> not null
  child 0, item: struct<y: int64 not null>
      child 0, y: int64 not null
>>> file.read_row_group(0, ["x.list.item.y", "x.list.item.z"]).schema
x: large_list<item: struct<y: int64 not null, z: double not null> not null> not null
  child 0, item: struct<y: int64 not null, z: double not null> not null
      child 0, y: int64 not null
      child 1, z: double not null
>>> file.read_row_group(0).schema
x: large_list<item: struct<y: int64 not null, z: double not null> not null> not null
  child 0, item: struct<y: int64 not null, z: double not null> not null
      child 0, y: int64 not null
      child 1, z: double not null
{code}
 In Arrow 5, asking for only column {{"x.list.item.y"}} returns a struct of 
type {{x: large_list<item: struct<y: int64 not null, z: double not null> not 
null> not null}}, which was undesirable because it unnecessarily read the 
{{"z"}} column, but it got all of the {{"not null"}} qualifiers right. In 
test8.parquet, the data are non-nullable at every level.

 In Arrow 6, asking for only column {{"x.list.item.y"}} returns a struct of 
type {{x: large_list<item: struct<y: int64 not null>> not null}}, which is 
great because it no longer reads the {{"z"}} column, but the struct's 
nullability is wrong: we should see three {{"not null"}} qualifiers here, one 
for the data in {{y}}, one for the {{struct}}, and one for the {{list}}. It's 
just missing the middle one.

When I ask for two columns explicitly, or don't specify the columns at all, 
the nullability is correct. I hope that helps to narrow it down.

I've attached the file (test8.parquet). It was the same in both of the above 
tests (generated by Arrow 5).

I labeled this as "Python" because I've only seen the symptom in Python, but I 
suspect that the actual error is in C++.





[jira] [Created] (ARROW-14484) [Crossbow] Add support for specifying queue path by environment variable

2021-10-26 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-14484:


 Summary: [Crossbow] Add support for specifying queue path by 
environment variable
 Key: ARROW-14484
 URL: https://issues.apache.org/jira/browse/ARROW-14484
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-14483) [Release] Packages for AlmaLinux and Amazon Linux aren't downloaded in verification script

2021-10-26 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-14483:


 Summary: [Release] Packages for AlmaLinux and Amazon Linux aren't 
downloaded in verification script
 Key: ARROW-14483
 URL: https://issues.apache.org/jira/browse/ARROW-14483
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-14482) [C++][Gandiva] Implement MASK_FIRST_N and MASK_LAST_N functions

2021-10-26 Thread Augusto Alves Silva (Jira)
Augusto Alves Silva created ARROW-14482:
---

 Summary: [C++][Gandiva] Implement MASK_FIRST_N and MASK_LAST_N 
functions
 Key: ARROW-14482
 URL: https://issues.apache.org/jira/browse/ARROW-14482
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++ - Gandiva
Reporter: Augusto Alves Silva


*MASK_FIRST_N*

Returns a masked version of str with the first n values masked. Upper-case 
letters are converted to "X", lower-case letters are converted to "x", and 
numbers are converted to "n". For example, mask_first_n("1234-5678-8765-4321", 
4) results in nnnn-5678-8765-4321.

*MASK_LAST_N*

Returns a masked version of str with the last n values masked. Upper-case 
letters are converted to "X", lower-case letters are converted to "x", and 
numbers are converted to "n". For example, mask_last_n("1234-5678-8765-4321", 
4) results in 1234-5678-8765-nnnn.
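The described semantics can be sketched in Python as follows (the actual Gandiva kernels would be implemented in C++; these helper names simply mirror the function names above):

```python
def _mask_char(c: str) -> str:
    # Upper case -> "X", lower case -> "x", digits -> "n", others unchanged.
    if c.isupper():
        return 'X'
    if c.islower():
        return 'x'
    if c.isdigit():
        return 'n'
    return c

def mask_first_n(s: str, n: int) -> str:
    # Mask only the first n characters.
    return ''.join(_mask_char(c) for c in s[:n]) + s[n:]

def mask_last_n(s: str, n: int) -> str:
    # Mask only the last n characters.
    if n <= 0:
        return s
    return s[:-n] + ''.join(_mask_char(c) for c in s[-n:])

print(mask_first_n("1234-5678-8765-4321", 4))  # nnnn-5678-8765-4321
print(mask_last_n("1234-5678-8765-4321", 4))   # 1234-5678-8765-nnnn
```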





[jira] [Created] (ARROW-14481) [C++] Investigate recent regressions in some utf8 kernel benchmarks

2021-10-26 Thread David Li (Jira)
David Li created ARROW-14481:


 Summary: [C++] Investigate recent regressions in some utf8 kernel 
benchmarks
 Key: ARROW-14481
 URL: https://issues.apache.org/jira/browse/ARROW-14481
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


See [https://conbench.ursa.dev/benchmarks/6ccff6887e7c47148a09fe46f18c8688/]

Some (on the surface) unrelated commits have caused performance for a few 
string kernels to plummet. We should try to replicate this locally.





[jira] [Created] (ARROW-14480) [R] Expose to R

2021-10-26 Thread Weston Pace (Jira)
Weston Pace created ARROW-14480:
---

 Summary: [R] Expose to R
 Key: ARROW-14480
 URL: https://issues.apache.org/jira/browse/ARROW-14480
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: R
Reporter: Weston Pace
 Fix For: 6.0.1


Trying to keep this an R-only change for ease of patching/CRAN release.  Not 
sure if the fix version should be 6.0.1 or 7.0.0





[jira] [Created] (ARROW-14479) [C++][Compute] Hash Join microbenchmarks

2021-10-26 Thread Michal Nowakiewicz (Jira)
Michal Nowakiewicz created ARROW-14479:
--

 Summary: [C++][Compute] Hash Join microbenchmarks
 Key: ARROW-14479
 URL: https://issues.apache.org/jira/browse/ARROW-14479
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 7.0.0
Reporter: Michal Nowakiewicz
Assignee: Sasha Krassovsky
 Fix For: 7.0.0


Implement a series of microbenchmarks giving a good picture of the performance 
of the hash join implemented in Arrow across different sets of dimensions.
Compare the performance against some other product(s).
Add scripts for generating useful visual reports of the costs of the hash 
join.

Examples of dimensions to explore in microbenchmarks:
 * number of duplicate keys on build side
 * relative size of build side to probe side
 * selectivity of the join
 * number of key columns
 * number of payload columns
 * filtering performance for semi- and anti- joins
 * dense integer key vs sparse integer key vs string key
 * build size
 * scaling of build, filtering, probe
 * inner vs left outer, inner vs right outer
 * left semi vs right semi, left anti vs right anti, left outer vs right outer
 * non-uniform key distribution
 * monotonic key values in input, partitioned key values in input (with and 
without per batch min-max metadata)
 * chain of multiple hash joins
 * overhead of Bloom filter for non-selective Bloom filter





[jira] [Created] (ARROW-14478) [C++] Potential stack overflow in async scanner

2021-10-26 Thread David Li (Jira)
David Li created ARROW-14478:


 Summary: [C++] Potential stack overflow in async scanner
 Key: ARROW-14478
 URL: https://issues.apache.org/jira/browse/ARROW-14478
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


Observed in [AppVeyor 
CI|https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/41288964/job/6s4bx6cd2kc6eld6]
 on the main branch:
{noformat}
[ RUN  ] TestScan/TestCsvFileFormatScan.ScanBatchSize/0AsyncThreaded16b1024r
unknown file: error: SEH exception with code 0xc0fd thrown in the test body.
[  FAILED  ] 
TestScan/TestCsvFileFormatScan.ScanBatchSize/0AsyncThreaded16b1024r, where 
GetParam() = AsyncThreaded16b1024r (250 ms){noformat}
From some searching, this code corresponds to a stack overflow. We've 
previously seen errors similar to this, so it might be good to identify and 
track this down too. (It seems less likely on Linux due to the larger default 
stack size.)





[jira] [Created] (ARROW-14477) [C++] Timezone-aware kernels should also handle offset strings

2021-10-26 Thread David Li (Jira)
David Li created ARROW-14477:


 Summary: [C++] Timezone-aware kernels should also handle offset 
strings
 Key: ARROW-14477
 URL: https://issues.apache.org/jira/browse/ARROW-14477
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


Both the 
[format|https://github.com/apache/arrow/blob/836ffa5656d5107fd4895ae8d7eb0e20a3df23ba/format/Schema.fbs#L341-L347]
 and the [C++ 
library|https://github.com/apache/arrow/blob/836ffa5656d5107fd4895ae8d7eb0e20a3df23ba/cpp/src/arrow/type.h#L1233-L1237]
 allow this, but kernels rely on a helper assuming that the timezone field of a 
timestamp is always a timezone name and not a timezone offset.





[jira] [Created] (ARROW-14476) [CI] Crossbow should comment cause of failure

2021-10-26 Thread Balazs Jeszenszky (Jira)
Balazs Jeszenszky created ARROW-14476:
-

 Summary: [CI] Crossbow should comment cause of failure
 Key: ARROW-14476
 URL: https://issues.apache.org/jira/browse/ARROW-14476
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Balazs Jeszenszky
 Fix For: 7.0.0


Instead of just giving a thumbs down, Crossbow should comment with a link to 
the failing job (e.g. 
https://github.com/apache/arrow/runs/4010195788?check_suite_focus=true), or its 
stack trace (usually under 'handle github commit event').





[jira] [Created] (ARROW-14475) [C++] Don't shadow enable_if helpers in kernel implementations

2021-10-26 Thread David Li (Jira)
David Li created ARROW-14475:


 Summary: [C++] Don't shadow enable_if helpers in kernel 
implementations
 Key: ARROW-14475
 URL: https://issues.apache.org/jira/browse/ARROW-14475
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


A few kernel implementation files define enable_if helpers that shadow existing 
ones, which can cause strange errors in unity builds. For example: 
scalar_arithmetic.cc defines {{enable_if_floating_point}} for the C types 
float/double which conflicts with the one defined in type_traits.h for the 
Arrow types.





[jira] [Created] (ARROW-14474) [Java] Add support for sliced arrays in C Data Interface

2021-10-26 Thread Roee Shlomo (Jira)
Roee Shlomo created ARROW-14474:
---

 Summary: [Java] Add support for sliced arrays in C Data Interface
 Key: ARROW-14474
 URL: https://issues.apache.org/jira/browse/ARROW-14474
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 6.0.0
Reporter: Roee Shlomo


The Java implementation of the C Data Interface does not support arrays with a 
non-zero offset. This means that arrays like pyarrow.array([0, None, 2, 3, 
4]).slice(1, 2) cannot be moved to a Java process. This requirement is not even 
documented in the spec, because it was an oversight.

 





[jira] [Created] (ARROW-14473) [JS][Release] Ensure can use nohup with the release script

2021-10-26 Thread Benson Muite (Jira)
Benson Muite created ARROW-14473:


 Summary: [JS][Release] Ensure can use nohup with the release script
 Key: ARROW-14473
 URL: https://issues.apache.org/jira/browse/ARROW-14473
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Affects Versions: 7.0.0
Reporter: Benson Muite
Assignee: Benson Muite


Node may have problems reading and writing files when called using nohup. 
Directly running 

{code:bash}
env "TEST_DEFAULT=0" env "TEST_JS=1"  bash 
dev/release/verify-release-candidate.sh source 6.0.0 3
{code}

seems to work, but
{code:bash}
nohup env "TEST_DEFAULT=0" env "TEST_JS=1"  bash 
dev/release/verify-release-candidate.sh source 6.0.0 3 > log.out &
{code}
may not work [1]. Either document that one can use 
{code:bash}
(nohup env "TEST_DEFAULT=0" env "TEST_JS=1"  bash 
dev/release/verify-release-candidate.sh source 6.0.0 3 > log.out & )
{code}
or modify the JavaScript implementation so that it can run as a background 
process and still find files, so that the error:
{code:bash}
yarn run v1.22.17
$ /tmp/arrow-6.0.0.BDnN3/apache-arrow-6.0.0/js/node_modules/.bin/run-s
clean:all lint build
events.js:377
throw er; // Unhandled 'error' event
^

Error: EBADF: bad file descriptor, read
Emitted 'error' event on ReadStream instance at:
  at internal/fs/streams.js:173:14
  at FSReqCallback.wrapper [as oncomplete] (fs.js:562:5) {
errno: -9,
code: 'EBADF',
syscall: 'read'
}
error Command failed with exit code 1.
{code}
does not occur.

[1] 
https://stackoverflow.com/questions/16604176/error-ebadf-bad-file-descriptor-when-running-node-using-nohup-of-forever






[jira] [Created] (ARROW-14472) [Dev][Archery] Generate contribution statistics using archery

2021-10-26 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-14472:
---

 Summary: [Dev][Archery] Generate contribution statistics using 
archery 
 Key: ARROW-14472
 URL: https://issues.apache.org/jira/browse/ARROW-14472
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery, Developer Tools
Reporter: Krisztian Szucs


Currently we use a bash script for this:
https://github.com/apache/arrow/blob/master/dev/release/post-03-website.sh#L47-L67

Since the Rust repository split, this logic needs to be extended.
Additionally, the script expects the GNU {{date}} command, which is not 
available on macOS by default.





[jira] [Created] (ARROW-14471) [R] Implement lubridate's date/time parsing functions

2021-10-26 Thread Nicola Crane (Jira)
Nicola Crane created ARROW-14471:


 Summary: [R] Implement lubridate's date/time parsing functions
 Key: ARROW-14471
 URL: https://issues.apache.org/jira/browse/ARROW-14471
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Nicola Crane


Parse dates with year, month, and day components:
ymd() ydm() mdy() myd() dmy() dym() yq() ym() my()

Parse date-times with year, month, and day, hour, minute, and second components:
ymd_hms() ymd_hm() ymd_h() dmy_hms() dmy_hm() dmy_h() mdy_hms() mdy_hm() 
mdy_h() ydm_hms() ydm_hm() ydm_h()

Parse periods with hour, minute, and second components:
ms() hm() hms()








[jira] [Created] (ARROW-14470) [Python] Expose the use_threads option in Feather read functions

2021-10-26 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-14470:
-

 Summary: [Python] Expose the use_threads option in Feather read 
functions
 Key: ARROW-14470
 URL: https://issues.apache.org/jira/browse/ARROW-14470
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche


On the C++ side, the Feather V2 reader wraps the IPC RecordBatchFileReader, 
which accepts IpcReadOptions that can control the use of threads (as well as 
the default memory pool and some other options). 

On the Python (Cython) side, those options are not passed through. As a 
consequence, the {{use_threads}} keyword only disables multithreading in the 
conversion from Arrow table to pandas DataFrame, not in the actual reading. As 
a follow-up on ARROW-13317, we can make this keyword control both.





[jira] [Created] (ARROW-14469) [R] Binding for lubridate::month() doesn't have `label` argument implemented

2021-10-26 Thread Nicola Crane (Jira)
Nicola Crane created ARROW-14469:


 Summary: [R] Binding for lubridate::month() doesn't have `label` 
argument implemented
 Key: ARROW-14469
 URL: https://issues.apache.org/jira/browse/ARROW-14469
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Nicola Crane


It'll be worth checking the other lubridate temporal extraction bindings to 
see whether any of them need extra arguments implemented too





[jira] [Created] (ARROW-14468) [Python] Resolve parquet version deprecation warnings when compiling pyarrow

2021-10-26 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-14468:
---

 Summary: [Python] Resolve parquet version deprecation warnings 
when compiling pyarrow
 Key: ARROW-14468
 URL: https://issues.apache.org/jira/browse/ARROW-14468
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Krisztian Szucs


{code}
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:
 
In function ‘PyObject* 
__pyx_pf_7pyarrow_8_parquet_12FileMetaData_14format_version___get__(__pyx_obj_7pyarrow_8_parquet_FileMetaData*)’:
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:14168:36:
 
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use 
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection 
[-Wdeprecated-declarations]
14168 | case  parquet::ParquetVersion::PARQUET_2_0:
   |^~~
In file included from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
  from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
  from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
  from 
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note: 
declared here
44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or 
PARQUET_2_6 "
   | ^~~
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:14168:36:
 
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use 
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection 
[-Wdeprecated-declarations]
14168 | case  parquet::ParquetVersion::PARQUET_2_0:
   |^~~
In file included from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
  from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
  from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
  from 
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note: 
declared here
44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or 
PARQUET_2_6 "
   | ^~~
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:
 
In function ‘std::shared_ptr 
__pyx_f_7pyarrow_8_parquet__create_writer_properties(__pyx_opt_args_7pyarrow_8_parquet__create_writer_properties*)’:
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:23800:62:
 
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use 
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection 
[-Wdeprecated-declarations]
23800 |   (void)(__pyx_v_props.version( 
parquet::ParquetVersion::PARQUET_2_0));
   | 
^~~
In file included from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
  from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
  from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
  from 
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note: 
declared here
44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or 
PARQUET_2_6 "
   | ^~~
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:23800:62:
 
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use 
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection 
[-Wdeprecated-declarations]
23800 |   (void)(__pyx_v_props.version( 
parquet::ParquetVersion::PARQUET_2_0));
   | 
^~~
In file included from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
  from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
  from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
  from 
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note: 
declared here
44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or 
PARQUET_2_6 "
   | ^~~
{code}





[jira] [Created] (ARROW-14467) [C++][Python][Parquet] Uniform encryption

2021-10-26 Thread Maya Anderson (Jira)
Maya Anderson created ARROW-14467:
-

 Summary: [C++][Python][Parquet] Uniform encryption
 Key: ARROW-14467
 URL: https://issues.apache.org/jira/browse/ARROW-14467
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Parquet, Python
Reporter: Maya Anderson
Assignee: Maya Anderson


PME supports using the same encryption key for all columns, which is useful in 
a number of scenarios. However, misuse of this feature can break the NIST limit 
on the number of AES-GCM operations with one key, as reported in PARQUET-2040. 
We will develop limit-enforcing code and provide a Python API for uniform 
encryption, similar to PARQUET-2040 and based on ARROW-9947.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-14466) [Java] Introduce memory leak detector/handler utility to hook on unused unreleased buffers

2021-10-26 Thread Hongze Zhang (Jira)
Hongze Zhang created ARROW-14466:


 Summary: [Java] Introduce memory leak detector/handler utility to 
hook on unused unreleased buffers
 Key: ARROW-14466
 URL: https://issues.apache.org/jira/browse/ARROW-14466
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Hongze Zhang
Assignee: Hongze Zhang


See previous discussions in mail thread: 
https://lists.apache.org/thread.html/re9896b902cddc0931e4efbdecf27203710fb87505b63e927eef7ea77%40%3Cdev.arrow.apache.org%3E


