[jira] [Created] (ARROW-17182) [C++][Docs] Show examples of multiple Acero-compatible language APIs

2022-07-21 Thread Ian Cook (Jira)
Ian Cook created ARROW-17182:


 Summary: [C++][Docs] Show examples of multiple Acero-compatible 
language APIs
 Key: ARROW-17182
 URL: https://issues.apache.org/jira/browse/ARROW-17182
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Documentation
Reporter: Ian Cook


Today, there is really only one feature-complete high-level API wrapping the 
lower-level Acero ExecPlan API: the dplyr interface in the arrow R package. But in the 
future, when Acero has full capability to consume and execute Substrait plans, 
there will be more high-level APIs wrapping Acero, including Ibis. It would be 
nice to include a tabbed interface in a prominent place on the front page of 
the Acero docs showing several different Acero-compatible language APIs, 
similar to what's on the front page of the [Apache Spark 
website|https://spark.apache.org/]. Credit to [~willjones127] for suggesting 
this idea.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17181) [Python] Scalar UDF Experimental Documentation

2022-07-21 Thread Vibhatha Lakmal Abeykoon (Jira)
Vibhatha Lakmal Abeykoon created ARROW-17181:


 Summary: [Python] Scalar UDF Experimental Documentation
 Key: ARROW-17181
 URL: https://issues.apache.org/jira/browse/ARROW-17181
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon


At the moment, the existing Scalar UDF usage is not documented. A final 
documentation update will follow once other features are integrated, but to 
support users and developers in the meantime, the existing functionality needs 
to be documented.





[jira] [Created] (ARROW-17180) [C++] Backpressure should resume as a new task, assuming executor is present

2022-07-21 Thread Weston Pace (Jira)
Weston Pace created ARROW-17180:
---

 Summary: [C++] Backpressure should resume as a new task, assuming 
executor is present
 Key: ARROW-17180
 URL: https://issues.apache.org/jira/browse/ARROW-17180
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Weston Pace


As brought up on the mailing list: 
https://lists.apache.org/thread/hscjqyw7lpt95vlkzoslm6pyhy9x6wso

When we resume producing in the source node, we should run the continuation 
(which starts scanning again) as a new task. This will avoid potential stack 
overflow (if we pause and resume many times) and make stack traces easier to 
understand. The continuation is unlikely to have anything to do with the task 
that is calling resume, so there is little downside to a context switch.
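A minimal Go sketch of the proposed behavior follows. The `source` and `serialExecutor` types are hypothetical stand-ins, not Arrow APIs, and the single-threaded queue is an assumption chosen to keep the example deterministic. When no executor is present the continuation runs inline, so each pause/resume cycle adds a stack frame; when an executor is present the continuation is spawned as a new task and the current frame unwinds first.

```go
package main

// serialExecutor is a hypothetical single-threaded executor: Spawn queues a
// task and RunAll drains the queue, running each task on a fresh stack frame.
type serialExecutor struct {
	queue []func()
}

func (e *serialExecutor) Spawn(task func()) {
	e.queue = append(e.queue, task)
}

func (e *serialExecutor) RunAll() {
	for len(e.queue) > 0 {
		task := e.queue[0]
		e.queue = e.queue[1:]
		task()
	}
}

// source is a hypothetical stand-in for a source node that is paused by
// backpressure after every batch and immediately resumed (the worst case).
type source struct {
	exec      *serialExecutor // nil means no executor is available
	remaining int             // batches still to produce
	depth     int             // current nesting depth of scan
	maxDepth  int             // deepest nesting observed
}

func (s *source) scan() {
	s.depth++
	if s.depth > s.maxDepth {
		s.maxDepth = s.depth
	}
	defer func() { s.depth-- }()
	if s.remaining == 0 {
		return
	}
	s.remaining-- // "produce" one batch
	s.resume()
}

// resume runs the continuation (the next scan) as a new task when an
// executor is present, and falls back to calling it inline otherwise.
func (s *source) resume() {
	if s.exec == nil {
		s.scan() // inline: one extra stack frame per pause/resume cycle
		return
	}
	s.exec.Spawn(s.scan) // new task: the current frame unwinds first
}
```

With 100 batches that are each paused and immediately resumed, the inline path nests `scan` 101 deep, while the spawned path never nests more than one level.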





[jira] [Created] (ARROW-17179) [R] Support more objects in as_schema() and use it in more places

2022-07-21 Thread Dewey Dunnington (Jira)
Dewey Dunnington created ARROW-17179:


 Summary: [R] Support more objects in as_schema() and use it in 
more places
 Key: ARROW-17179
 URL: https://issues.apache.org/jira/browse/ARROW-17179
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Dewey Dunnington


Right now the {{as_schema()}} method isn't used in many places and doesn't 
support many object types. It is probably a good fit for sanitizing arguments 
where a schema is expected and likely has more classes that can be interpreted 
in this way (e.g., Field or DataType as identified in ARROW-16444). After 
ARROW-16444, the internal {{in_type_as_schema()}} function used to sanitize the 
{{in_type}} argument of {{register_scalar_function()}} may be a good candidate 
for this.





[jira] [Created] (ARROW-17178) [R] Support head() in arrow_dplyr_query with user-defined function

2022-07-21 Thread Dewey Dunnington (Jira)
Dewey Dunnington created ARROW-17178:


 Summary: [R] Support head() in arrow_dplyr_query with user-defined 
function
 Key: ARROW-17178
 URL: https://issues.apache.org/jira/browse/ARROW-17178
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Dewey Dunnington


After ARROW-16444 and ARROW-16703 we will have some arrow_dplyr_query objects 
whose pipeline can't contain {{head()}} after the part that calls R code. This 
is a significant feature to leave unsupported, and we need to find a workaround. 
The complete solution is to make sure that we support an R-level 
RecordBatchReader, but there may be a workaround that we can support in the 
meantime.





[jira] [Created] (ARROW-17177) [C++][Docs] Re-Organize the Existing ACERO Streaming Engine Documentation

2022-07-21 Thread Vibhatha Lakmal Abeykoon (Jira)
Vibhatha Lakmal Abeykoon created ARROW-17177:


 Summary: [C++][Docs] Re-Organize the Existing ACERO Streaming 
Engine Documentation
 Key: ARROW-17177
 URL: https://issues.apache.org/jira/browse/ARROW-17177
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon


The current document is too long. Creating a sub-page for each example, 
explaining the code and providing a better description, would make the content 
much easier to read and browse. The idea is to create a sub-folder in the 
examples called `acero` and include each example in a separate `.cc` file. This 
is the code change. Following this, the documentation page on the website can 
be split into sub-pages. This is the only change suggested for this sub-task. 

There is already a JIRA, https://issues.apache.org/jira/browse/ARROW-16802, to 
improve the content itself, so that ticket will be used for rewriting the 
content. 





[jira] [Created] (ARROW-17176) [Rust] Activate generate_decimal256_case arrow integration test for rust

2022-07-21 Thread L. C. Hsieh (Jira)
L. C. Hsieh created ARROW-17176:
---

 Summary: [Rust] Activate generate_decimal256_case arrow 
integration test for rust
 Key: ARROW-17176
 URL: https://issues.apache.org/jira/browse/ARROW-17176
 Project: Apache Arrow
  Issue Type: Bug
  Components: Archery
Reporter: L. C. Hsieh








[jira] [Created] (ARROW-17175) [CI][macOS] macos-10.15 is deprecated and macos-latest is macos-11

2022-07-21 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17175:


 Summary: [CI][macOS] macos-10.15 is deprecated and macos-latest is 
macos-11
 Key: ARROW-17175
 URL: https://issues.apache.org/jira/browse/ARROW-17175
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


https://github.com/actions/virtual-environments#available-environments

{quote}
macOS 11: macos-latest or macos-11
macOS 10.15 (deprecated): macos-10.15
{quote}





[jira] [Created] (ARROW-17174) FileSystemDataset FilenamePartitioning error - fsspec filesystem

2022-07-21 Thread Adam Kirby (Jira)
Adam Kirby created ARROW-17174:
--

 Summary: FileSystemDataset FilenamePartitioning error - fsspec 
filesystem
 Key: ARROW-17174
 URL: https://issues.apache.org/jira/browse/ARROW-17174
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 8.0.1
Reporter: Adam Kirby
 Attachments: zip_of_csvs_test.py

Unless this is user error (which it may well be!), it seems that Dataset 
FilenamePartitioning on read doesn't work with an fsspec filesystem. 
From what I can glean, the filenames can be parsed successfully when passed to 
the parse() method, but the fields do not seem to be extracted from the 
filenames passed to dataset(); instead, they appear as nulls. When trying to 
use the partitioning discover() method (assuming this is a reasonable thing to 
try), I get the traceback below (repro Python script attached).

Traceback (most recent call last):
  File "/zip_of_csvs_test.py", line 82, in <module>
    ds_partitioned = pds.dataset(
  File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 697, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 449, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 1857, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: No non-null segments were available for field 'frequency'; couldn't infer type
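For reference while triaging: the pyarrow documentation describes filename partitioning as expecting one `_`-separated segment at the start of the file name for each partition field, so that `2009_11_part-0.parquet` maps to `year` = 2009 and `month` = 11. The Go sketch below illustrates that rule only; `parseFilenamePartition` is a hypothetical helper, not Arrow's implementation, and treating a missing segment as an absent value is a simplification. Comparing the paths that the fsspec filesystem reports with this rule may help narrow down where the values are lost.

```go
package main

import (
	"path"
	"strings"
)

// parseFilenamePartition is a hypothetical sketch of filename-based
// partitioning: the base name of filePath is split on "_" and the leading
// segments are assigned, in order, to the partition field names. The last
// segment is the rest of the file name and never becomes a field value.
// Fields without a matching segment are omitted, which is roughly how they
// would surface as nulls.
func parseFilenamePartition(filePath string, fields []string) map[string]string {
	segments := strings.Split(path.Base(filePath), "_")
	values := make(map[string]string, len(fields))
	for i, field := range fields {
		if i >= len(segments)-1 {
			break
		}
		values[field] = segments[i]
	}
	return values
}
```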





[GitHub] [arrow-adbc] lidavidm opened a new pull request, #44: [C] Fix compatibility issues noticed with Ibis

2022-07-21 Thread GitBox


lidavidm opened a new pull request, #44:
URL: https://github.com/apache/arrow-adbc/pull/44

   - The SQLite catalog is called "main"
   - Fix a couple bugs and note areas of improvement


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (ARROW-17173) [C++] Clarify lifecycle of a StopSource/StopToken

2022-07-21 Thread Dewey Dunnington (Jira)
Dewey Dunnington created ARROW-17173:


 Summary: [C++] Clarify lifecycle of a StopSource/StopToken
 Key: ARROW-17173
 URL: https://issues.apache.org/jira/browse/ARROW-17173
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Dewey Dunnington


In ARROW-11841 we ran into an issue where the single-cancellable-operation model 
(i.e., {{SetSignalStopSource()}}/{{ResetSignalStopSource()}}) was a poor fit: the 
{{StopToken}} must be assigned to an {{IOContext}} when a filesystem is 
created; however, the filesystem may be reused for more than one cancellable 
operation (e.g., reading a CSV). Following the instructions in the current API 
(in util/cancel.h) results in a situation in which the lifecycle of the 
filesystem must match the lifecycle of the {{StopSource}}, which can be 
difficult to program around.

A related problem arises when the Python and R Arrow libraries are loaded and 
link to the same .so. After ARROW-11841, R will have the ability to register 
signal handlers to interrupt Arrow operations, and users who load pyarrow via 
reticulate must be careful to disable it or they will get an error along the 
lines of "StopSource already set up".

From a purely R-centric point of view, we could provide our own {{StopToken}} 
implementation if we were allowed to, since R already implements the proper 
signal handler and the arrow R package implements the proper event loop to 
make this thread-safe. Currently the {{StopToken}} is passed by value, so a 
subclass is not an option. For R, at least, this would eliminate any need to 
consider the lifecycle of another object.
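To make the proposal concrete, here is a hypothetical Go sketch (not Arrow code) of the two ideas above: the token is an interface rather than a concrete value type, so a host such as the R package could supply its own implementation, and it is passed to each operation instead of being fixed in the filesystem's I/O context, so one filesystem can outlive many stop sources. `Poll` loosely mirrors the name of the C++ `StopToken::Poll()`; everything else is illustrative.

```go
package main

import "errors"

// StopToken is a hypothetical interface analogue of Arrow's StopToken.
// Because it is an interface, a host (for example an R binding with its own
// signal handling) could supply its own implementation.
type StopToken interface {
	// Poll returns a non-nil error once a stop has been requested.
	Poll() error
}

var errStopRequested = errors.New("operation cancelled")

// flagToken is one possible implementation, backed by a simple flag.
type flagToken struct {
	stopped bool
}

func (t *flagToken) Poll() error {
	if t.stopped {
		return errStopRequested
	}
	return nil
}

// fileSystem is a hypothetical filesystem that stores no token, so one
// instance can serve many independently cancellable operations.
type fileSystem struct{}

// ReadCSV simulates reading n chunks, polling the per-call token before each
// chunk and returning early (with the chunks read so far) on a stop request.
func (fs *fileSystem) ReadCSV(tok StopToken, n int) (int, error) {
	for i := 0; i < n; i++ {
		if err := tok.Poll(); err != nil {
			return i, err
		}
	}
	return n, nil
}
```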





[jira] [Created] (ARROW-17172) [C++][Python] test_cython_api fails on windows

2022-07-21 Thread Alenka Frim (Jira)
Alenka Frim created ARROW-17172:
---

 Summary: [C++][Python] test_cython_api fails on windows 
 Key: ARROW-17172
 URL: https://issues.apache.org/jira/browse/ARROW-17172
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Python
Reporter: Alenka Frim
Assignee: Alenka Frim
 Fix For: 10.0.0


With the change in https://github.com/apache/arrow/pull/13311, the second part 
of test_cython_api, which checks that the extension module is loadable from a 
subprocess without pyarrow imported first, fails on Windows.

Research the issue and make sure the test runs and passes in CI.





[GitHub] [arrow-testing] pitrou commented on pull request #80: ARROW-17100: Add example of Arrow 2.0 DataPageV2 compression issue

2022-07-21 Thread GitBox


pitrou commented on PR #80:
URL: https://github.com/apache/arrow-testing/pull/80#issuecomment-1191800075

   Thanks for the update @wjones127 :-)





[GitHub] [arrow-testing] pitrou merged pull request #80: ARROW-17100: Add example of Arrow 2.0 DataPageV2 compression issue

2022-07-21 Thread GitBox


pitrou merged PR #80:
URL: https://github.com/apache/arrow-testing/pull/80





[jira] [Created] (ARROW-17171) [C++][Gandiva] Implement case-insensitive

2022-07-21 Thread Vinicius Souza Roque (Jira)
Vinicius Souza Roque created ARROW-17171:


 Summary: [C++][Gandiva] Implement case-insensitive
 Key: ARROW-17171
 URL: https://issues.apache.org/jira/browse/ARROW-17171
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++ - Gandiva
Reporter: Vinicius Souza Roque


Implement changes so that the function is case-insensitive.





[jira] [Created] (ARROW-17170) Research Documentation Formats

2022-07-21 Thread Kae Suarez (Jira)
Kae Suarez created ARROW-17170:
--

 Summary: Research Documentation Formats
 Key: ARROW-17170
 URL: https://issues.apache.org/jira/browse/ARROW-17170
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Kae Suarez
Assignee: Kae Suarez


In order to revise the documentation, some inspiration is needed to get the 
format right. This ticket provides a space for exploration of possible 
inspiration for the C++ documentation – once we have some good examples and/or 
agreement, we can move to some content creation.





[jira] [Created] (ARROW-17169) [Go] goPanicIndex in firstTimeBitmapWriter.Finish()

2022-07-21 Thread Robert Purdom (Jira)
Robert Purdom created ARROW-17169:
-

 Summary: [Go] goPanicIndex in firstTimeBitmapWriter.Finish()
 Key: ARROW-17169
 URL: https://issues.apache.org/jira/browse/ARROW-17169
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Affects Versions: 8.0.1, 9.0.0
 Environment: go (1.18.3), Linux, AMD64
Reporter: Robert Purdom


I'm working with complex Parquet files with 500+ "root" columns, where some 
fields are lists of structs, internally referred to as 'topics'. Some of these 
structs have hundreds of columns. When reading a particular topic, I get an 
index-out-of-range panic at the line indicated below. This error occurs when 
the value for the topic is null, i.e., for this particular root record, the 
topic has no data. The root is household data and the topic is auto, so the 
error occurs when the household has no autos. The auto field is a nullable 
list of structs.

 
{code:go}
/* Finish() was called from defLevelsToBitmapInternal.

data values when the panic occurs:
bw.length     == 17531
bw.bitMask    == 1
bw.pos        == 3424
len(bw.buf)   == 428
cap(bw.buf)   == 448
bw.byteOffset == 428
bw.curByte    == 0
*/

// bitmap_writer.go
func (bw *firstTimeBitmapWriter) Finish() {
	// store curByte into the bitmap
	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
		bw.buf[int(bw.byteOffset)] = bw.curByte // <- panics: index out of range
	}
}
{code}
In every case, when the panic occurs, bw.byteOffset == len(bw.buf). I tested 
the modification below and it does prevent the panic. However, it is probably 
only masking the actual bug.
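{code:go}
// Test version: no panic
func (bw *firstTimeBitmapWriter) Finish() {
	// store curByte into the bitmap
	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
		// byteOffset is not an int, so convert before comparing with len()
		if int(bw.byteOffset) == len(bw.buf) {
			// buf is full: grow it instead of indexing past the end. Callers
			// holding the original slice header will not see the extra byte.
			bw.buf = append(bw.buf, bw.curByte)
		} else {
			bw.buf[int(bw.byteOffset)] = bw.curByte
		}
	}
}
{code}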
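The dump above appears to show an inconsistency rather than an off-by-one in `Finish()` alone: `bw.length` is 17531 bits, which would need 2192 bytes, yet `bw.buf` holds only 428 bytes, and `bw.pos` (3424 = 428 × 8) shows that every byte of the buffer has already been written. The simplified, hypothetical Go writer below (not the Arrow Go implementation) isolates the invariant involved: after N full bytes, `byteOffset == N`, so storing `curByte` is only safe when a partial byte is pending and `byteOffset < len(buf)`. It omits the declared `length`, and therefore the `pos < length` branch, so it illustrates the bounds check rather than a fix for the sizing mismatch.

```go
package main

// bitmapWriter is a simplified, hypothetical sketch of a first-time bitmap
// writer (not the Arrow Go implementation). Bits are packed LSB-first into
// buf, and Finish flushes a partially filled trailing byte.
type bitmapWriter struct {
	buf        []byte
	pos        int64 // bits written so far
	curByte    byte  // byte currently being assembled
	bitMask    byte  // mask for the next bit within curByte
	byteOffset int64 // index in buf where curByte will be stored
}

func newBitmapWriter(buf []byte) *bitmapWriter {
	return &bitmapWriter{buf: buf, bitMask: 0x01}
}

// Append records one bit; a completed byte is stored immediately.
func (bw *bitmapWriter) Append(bit bool) {
	if bit {
		bw.curByte |= bw.bitMask
	}
	bw.pos++
	bw.bitMask <<= 1 // 0x80 << 1 wraps to 0 for a byte
	if bw.bitMask == 0 {
		bw.buf[bw.byteOffset] = bw.curByte
		bw.byteOffset++
		bw.curByte = 0
		bw.bitMask = 0x01
	}
}

// Finish stores curByte only if it holds pending bits and still fits in buf.
// When every byte has already been written, byteOffset == len(buf), so an
// unconditional store would panic with an index out of range.
func (bw *bitmapWriter) Finish() {
	if bw.bitMask != 0x01 && bw.byteOffset < int64(len(bw.buf)) {
		bw.buf[bw.byteOffset] = bw.curByte
	}
}
```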





[jira] [Created] (ARROW-17168) [C++][Docs] Expand C++ Cookbook

2022-07-21 Thread Kae Suarez (Jira)
Kae Suarez created ARROW-17168:
--

 Summary: [C++][Docs] Expand C++ Cookbook
 Key: ARROW-17168
 URL: https://issues.apache.org/jira/browse/ARROW-17168
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Documentation
Reporter: Kae Suarez


Currently, the C++ Cookbook has very few examples compared to the Python 
Cookbook, even though C++ Arrow can be used for the same tasks. Achieving 
parity between the cookbooks would be useful for developers to be able to use 
Arrow via C++ as a primary language.

Requires creation of examples and some light supporting prose – Python Cookbook 
can likely be copied in structure and some prose to save time.





[jira] [Created] (ARROW-17167) [C++][Docs] Improve C++ Documentation

2022-07-21 Thread Kae Suarez (Jira)
Kae Suarez created ARROW-17167:
--

 Summary: [C++][Docs] Improve C++ Documentation
 Key: ARROW-17167
 URL: https://issues.apache.org/jira/browse/ARROW-17167
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Reporter: Kae Suarez


Parent ticket for tasks that aim to improve C++ Arrow documentation.

 

General goal could be parity with Python documentation, so there's a baseline – 
open to further suggestions. Suggestions from new users would be incredibly 
valued, due to their experiences being more likely to have been impacted by any 
lacking documentation.





[GitHub] [arrow-testing] pitrou commented on pull request #80: ARROW-17100: Add example of Arrow 2.0 DataPageV2 compression issue

2022-07-21 Thread GitBox


pitrou commented on PR #80:
URL: https://github.com/apache/arrow-testing/pull/80#issuecomment-1191734043

   Some nits:
   * avoid creating a subdir for a single file?
   * add a README.md in `data/parquet` to start describing the files being 
added? A bit like in 
https://github.com/apache/parquet-testing/blob/master/data/README.md





[GitHub] [arrow-testing] wjones127 commented on pull request #80: ARROW-17100: Add example of Arrow 2.0 DataPageV2 compression issue

2022-07-21 Thread GitBox


wjones127 commented on PR #80:
URL: https://github.com/apache/arrow-testing/pull/80#issuecomment-1191717690

   cc @pitrou 





[GitHub] [arrow-testing] wjones127 opened a new pull request, #80: ARROW-17100: Add example of Arrow 2.0 DataPageV2 compression issue

2022-07-21 Thread GitBox


wjones127 opened a new pull request, #80:
URL: https://github.com/apache/arrow-testing/pull/80

   We used to always set `is_compressed=false` in page headers regardless of 
whether there was actual compression. Readers should check that they can read 
this file if they want to support files written by Arrow C++ 2.0.
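A reader that wants to handle such files needs to distrust the flag for the affected writer. The Go sketch below shows one way that check could look; `pageIsCompressed`, the string-typed codec, and the exact `created_by` prefix are assumptions for illustration, and the affected versions are taken from the description above (Arrow C++ 2.0), which may not be exhaustive.

```go
package main

import "strings"

// pageIsCompressed is a hypothetical reader-side check for DataPageV2 pages.
// Normally the page header's is_compressed flag is authoritative. For files
// whose created_by string identifies Arrow C++ 2.0 (the exact prefix checked
// here is an assumption), the flag was always written as false, so the check
// falls back to the column chunk's codec instead.
func pageIsCompressed(isCompressedFlag bool, codec, createdBy string) bool {
	if strings.HasPrefix(createdBy, "parquet-cpp-arrow version 2.0.") {
		return codec != "UNCOMPRESSED"
	}
	return isCompressedFlag
}
```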





[GitHub] [arrow-adbc] lidavidm opened a new issue, #43: ADBC/Ibis pain points

2022-07-21 Thread GitBox


lidavidm opened a new issue, #43:
URL: https://github.com/apache/arrow-adbc/issues/43

   - Need way to query driver and server version
   - Need handling of multiple databases within a connection





[jira] [Created] (ARROW-17166) [R] [CI] Remove ENV TZ from docker files

2022-07-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17166:
--

 Summary: [R] [CI] Remove ENV TZ from docker files
 Key: ARROW-17166
 URL: https://issues.apache.org/jira/browse/ARROW-17166
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Rok Mihevc
 Fix For: 9.0.0


We have noticed the R CI job (AMD64 Ubuntu 20.04 R 4.2 Force-Tests true) failing on 
master: 
[1|https://github.com/apache/arrow/runs/7424773120?check_suite_focus=true#step:7:5547],
 
[2|https://github.com/apache/arrow/runs/7431821192?check_suite_focus=true#step:7:5804],
 
[3|https://github.com/apache/arrow/runs/7445803518?check_suite_focus=true#step:7:16305]
with:
{code:java}
Start test: array uses local timezone for POSIXct without timezone
  test-Array.R:269:3 [success]
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
{code}





[jira] [Created] (ARROW-17165) [Python] Add python bindings to ExecuteScalarExpression

2022-07-21 Thread Weston Pace (Jira)
Weston Pace created ARROW-17165:
---

 Summary: [Python] Add python bindings to ExecuteScalarExpression
 Key: ARROW-17165
 URL: https://issues.apache.org/jira/browse/ARROW-17165
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Weston Pace


Currently, if a user wants to execute an expression, we require them to create 
an exec plan, add a project node, and then run the exec plan.  However, for 
simple use cases, where a user already has a record batch in memory, we could 
probably expose ExecuteScalarExpression on its own.
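As an illustration of the idea (evaluating an expression directly against a batch that is already in memory, with no exec plan or project node), here is a small hypothetical Go sketch. `Expr`, `Batch`, and `executeScalarExpression` are invented for this example and only support int64 columns with `add` and `multiply`; they are not the Arrow C++ or Python API.

```go
package main

// Expr is a hypothetical, minimal scalar expression: a field reference, a
// literal, or a call ("add" or "multiply") on exactly two arguments.
type Expr struct {
	Field   string
	Literal *int64
	Call    string
	Args    []Expr
}

// Batch is a hypothetical in-memory record batch of int64 columns that all
// have NumRows values.
type Batch struct {
	NumRows int
	Columns map[string][]int64
}

// executeScalarExpression evaluates expr element-wise over b and returns one
// value per row, without building an execution plan. Unknown call names
// leave the output as zeros.
func executeScalarExpression(expr Expr, b Batch) []int64 {
	out := make([]int64, b.NumRows)
	switch {
	case expr.Field != "":
		copy(out, b.Columns[expr.Field])
	case expr.Literal != nil:
		for i := range out {
			out[i] = *expr.Literal
		}
	default:
		lhs := executeScalarExpression(expr.Args[0], b)
		rhs := executeScalarExpression(expr.Args[1], b)
		for i := range out {
			switch expr.Call {
			case "add":
				out[i] = lhs[i] + rhs[i]
			case "multiply":
				out[i] = lhs[i] * rhs[i]
			}
		}
	}
	return out
}
```

For a batch with column `x` = [1, 2, 3], evaluating `multiply(x, 2)` returns [2, 4, 6].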





[jira] [Created] (ARROW-17164) [C++] Expose higher-level utility to execute the kernel

2022-07-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17164:
--

 Summary: [C++] Expose higher-level utility to execute the kernel
 Key: ARROW-17164
 URL: https://issues.apache.org/jira/browse/ARROW-17164
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 9.0.0


Currently, the compute layer exposes several high-level facilities to execute a 
compute function: {{CallFunction}} and {{Function::Execute}}.

However, if you'd prefer a two-step approach (first resolve the {{Kernel}} 
for a given set of argument types, then execute that kernel), you're forced 
to deal with the rather cumbersome {{Kernel}} execution interface.

It would be nice if the base {{Kernel}} class had something similar to the 
{{Function::Execute}} method.
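The two-step flow could look roughly like the hypothetical Go sketch below: resolve a kernel for the argument types, then call a convenience `Execute` on it that hides the lower-level calling convention (here, a caller-allocated output slice). `Function`, `Kernel`, and their methods are illustrative only; the name `DispatchExact` mirrors the C++ `Function::DispatchExact`, but the signatures are not Arrow's.

```go
package main

import "fmt"

// Kernel is a hypothetical kernel specialized for one input signature. exec
// is the low-level interface: the caller must allocate out.
type Kernel struct {
	InTypes []string
	exec    func(args [][]int64, out []int64)
}

// Execute is the kind of convenience the ticket asks for: it allocates an
// output as long as the first argument and runs the kernel. It assumes at
// least one argument and that all arguments have equal length.
func (k Kernel) Execute(args ...[]int64) []int64 {
	out := make([]int64, len(args[0]))
	k.exec(args, out)
	return out
}

// Function is a hypothetical named function holding several kernels.
type Function struct {
	Name    string
	Kernels []Kernel
}

// DispatchExact returns the kernel whose signature matches types exactly.
func (f Function) DispatchExact(types ...string) (Kernel, error) {
	for _, k := range f.Kernels {
		if equalTypes(k.InTypes, types) {
			return k, nil
		}
	}
	return Kernel{}, fmt.Errorf("function %q has no kernel for %v", f.Name, types)
}

func equalTypes(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}
```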






[GitHub] [arrow-adbc] lidavidm merged pull request #42: [Docs] Update README

2022-07-21 Thread GitBox


lidavidm merged PR #42:
URL: https://github.com/apache/arrow-adbc/pull/42





[jira] [Created] (ARROW-17163) [C++] Don't install jni_util.h

2022-07-21 Thread David Li (Jira)
David Li created ARROW-17163:


 Summary: [C++] Don't install jni_util.h
 Key: ARROW-17163
 URL: https://issues.apache.org/jira/browse/ARROW-17163
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


ARROW-17086 fixed some compiler warnings and restored the installation of 
jni_util.h to match prior behavior. But we never intended to expose this 
header, and the downstream Gluten project [no longer depends on 
it|https://github.com/apache/arrow/pull/13614#issuecomment-1191198106], so we 
can stop installing it.





[jira] [Created] (ARROW-17162) [C++] Bump version of bundled protobuf to include ABI mismatch on debug builds

2022-07-21 Thread Jira
Raúl Cumplido created ARROW-17162:
-

 Summary: [C++] Bump version of bundled protobuf to include ABI 
mismatch on debug builds
 Key: ARROW-17162
 URL: https://issues.apache.org/jira/browse/ARROW-17162
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Raúl Cumplido
Assignee: Raúl Cumplido
 Fix For: 9.0.0


As part of the investigation for https://issues.apache.org/jira/browse/ARROW-17104 
and the issue in https://issues.apache.org/jira/browse/ARROW-16520, we identified 
some missing symbols when compiling protobuf. See the upstream fix: 
[https://github.com/protocolbuffers/protobuf/pull/10271]

This ticket's purpose is only to update the version of the vendored protobuf 
defined for Arrow.

 





[jira] [Created] (ARROW-17161) [C++][Java] Dataset: Support reading from fixed offset of a file for Parquet format

2022-07-21 Thread Hongze Zhang (Jira)
Hongze Zhang created ARROW-17161:


 Summary: [C++][Java] Dataset: Support reading from fixed offset of 
a file for Parquet format
 Key: ARROW-17161
 URL: https://issues.apache.org/jira/browse/ARROW-17161
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Java
Reporter: Hongze Zhang
 Fix For: 9.0.0


This adds the properties *start_offset_* and *length_* to FileSource, which 
should be functional for the Parquet dataset format. Both the Java and C++ 
dataset APIs are supported at this time.
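A byte range on its own does not say which Parquet row groups to read. One possible rule, assumed here rather than taken from the ticket, assigns each row group to the range that contains its starting offset, so adjacent, non-overlapping ranges over one file read every row group exactly once. A hypothetical Go sketch of that selection step:

```go
package main

// rowGroup is a hypothetical summary of Parquet row-group metadata: where
// the row group's data starts in the file and how many bytes it spans.
type rowGroup struct {
	Offset int64
	Size   int64
}

// selectRowGroups returns the indices of row groups assigned to the byte
// range [start, start+length). The rule used here (a row group belongs to
// the range containing its first byte) is an assumption for illustration.
func selectRowGroups(groups []rowGroup, start, length int64) []int {
	end := start + length
	var selected []int
	for i, g := range groups {
		if g.Offset >= start && g.Offset < end {
			selected = append(selected, i)
		}
	}
	return selected
}
```

With row groups starting at offsets 4, 104, and 204, the range [0, 150) selects row groups 0 and 1, and [150, 1150) selects row group 2.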





[jira] [Created] (ARROW-17160) [C++] Create a base directory for PyArrow CPP header files

2022-07-21 Thread Alenka Frim (Jira)
Alenka Frim created ARROW-17160:
---

 Summary: [C++] Create a base directory for PyArrow CPP header files
 Key: ARROW-17160
 URL: https://issues.apache.org/jira/browse/ARROW-17160
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Alenka Frim
Assignee: Alenka Frim
 Fix For: 10.0.0


See: https://github.com/apache/arrow/pull/13311#discussion_r925344753





[jira] [Created] (ARROW-17159) [C++][JAVA] Dataset: Support reading from fixed offset of a file for Parquet format

2022-07-21 Thread Jin Chengcheng (Jira)
Jin Chengcheng created ARROW-17159:
--

 Summary: [C++][JAVA] Dataset: Support reading from fixed offset of 
a file for Parquet format
 Key: ARROW-17159
 URL: https://issues.apache.org/jira/browse/ARROW-17159
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Java
Affects Versions: 9.0.0
Reporter: Jin Chengcheng
Assignee: Jin Chengcheng


With this, we can use the Substrait plan's ReadRel_LocalFiles_FileOrFiles.start() 
and length() to push down the scan filter.





[jira] [Created] (ARROW-17158) [GLib][Flight] Add support for GetFlightInfo

2022-07-21 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17158:


 Summary: [GLib][Flight] Add support for GetFlightInfo
 Key: ARROW-17158
 URL: https://issues.apache.org/jira/browse/ARROW-17158
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-17157) [GLib][Ruby][Flight] Add support for headers to GAFlightCallOptions

2022-07-21 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17157:


 Summary: [GLib][Ruby][Flight] Add support for headers to 
GAFlightCallOptions
 Key: ARROW-17157
 URL: https://issues.apache.org/jira/browse/ARROW-17157
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, GLib, Ruby
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-17156) [GLib][Flight] Add GAFlightClientOptions::disable-server-verification

2022-07-21 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17156:


 Summary: [GLib][Flight] Add 
GAFlightClientOptions::disable-server-verification
 Key: ARROW-17156
 URL: https://issues.apache.org/jira/browse/ARROW-17156
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-17155) pyarrow compute module does not contain functions described in documentation

2022-07-21 Thread Volodymyr (Jira)
Volodymyr created ARROW-17155:
-

 Summary: pyarrow compute module does not contain functions 
described in documentation
 Key: ARROW-17155
 URL: https://issues.apache.org/jira/browse/ARROW-17155
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 8.0.0
Reporter: Volodymyr


It looks like the pyarrow compute module 
(https://github.com/apache/arrow/blob/master/python/pyarrow/compute.py) contains 
entirely different contents than what is described in the documentation: 
[https://arrow.apache.org/docs/python/api/compute.html]


