[GitHub] [arrow-julia] kou closed issue #301: Release script publishes the artifacts to wrong URL
kou closed issue #301: URL: https://github.com/apache/arrow-julia/issues/301 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-julia] kou opened a new issue #301: Release script publishes the artifacts to wrong URL
kou opened a new issue #301: URL: https://github.com/apache/arrow-julia/issues/301 It publishes to https://dist.apache.org/repos/dist/release/arrow/apache-arrow-julia-X.Y.Z, but we should remove the "apache-" prefix because the other releases don't have it.
[jira] [Created] (ARROW-15865) [Java][Doc]: Configure local maven to consume github arrow nightly assets
David Dali Susanibar Arce created ARROW-15865: - Summary: [Java][Doc]: Configure local maven to consume github arrow nightly assets Key: ARROW-15865 URL: https://issues.apache.org/jira/browse/ARROW-15865 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: David Dali Susanibar Arce Assignee: David Dali Susanibar Arce Current maven configuration to integrate with github assets repository: {code:java} http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd; xmlns="http://maven.apache.org/SETTINGS/1.1.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;> staged staged-releases https://repository.apache.org/content/repositories/staging/ true true never arrowrc staged staged-releases https://github.com/ursacomputing/crossbow/releases/tag/release-7.0.0-rc10-0-github-java-jars true true arrownightly {code} Running "mvn -Parrownightly clean install" downloads files to the local .m2 repository, but they are invalid jar/pom files. Define a way to integrate Maven with the current GitHub assets repository so that the assets download properly without errors. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15864) [Java][Doc]: Arrow java nightly build
David Dali Susanibar Arce created ARROW-15864: - Summary: [Java][Doc]: Arrow java nightly build Key: ARROW-15864 URL: https://issues.apache.org/jira/browse/ARROW-15864 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: David Dali Susanibar Arce Assignee: David Dali Susanibar Arce The Java artifacts from the nightly build are currently uploaded to GitHub as assets.
[jira] [Created] (ARROW-15863) [Packaging][C++][Python] Conda package build failure
Antoine Pitrou created ARROW-15863: -- Summary: [Packaging][C++][Python] Conda package build failure Key: ARROW-15863 URL: https://issues.apache.org/jira/browse/ARROW-15863 Project: Apache Arrow Issue Type: Bug Components: C++, Packaging, Python Reporter: Antoine Pitrou The Windows conda package builds are failing: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=20856&view=logs&j=4c86bc1b-1091-5192-4404-c74dfaad23e7&t=1e0e7149-0c33-565b-af41-050b54dd61be
[GitHub] [arrow-julia] nilshg opened a new issue #300: Possible bug in `Any` concretization routine
nilshg opened a new issue #300: URL: https://github.com/apache/arrow-julia/issues/300 As discussed on Slack:
```
julia> using Arrow, DataFrames

julia> Arrow.write("test.arrow", (a = [1, 2], b = Any[3, 4.5]))
"test.arrow"

julia> DataFrame(Arrow.Table("test.arrow"))
2×2 DataFrame
 Row │ a      b
     │ Int64  Float64
─────┼─────────────────
   1 │     1  1.5e-323
   2 │     2  4.5

(jl_aYpToJ) pkg> st
Status `C:\Users\ngudat\AppData\Local\Temp\jl_aYpToJ\Project.toml`
  [69666777] Arrow v2.2.0
```
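The value `1.5e-323` in row 1 is what results when the 8 bytes of the Int64 value `3` are read back as a Float64 without a numeric conversion. The sketch below shows that arithmetic in Python using only the standard library. It is one reading that matches the output above, not a confirmed diagnosis: the issue does not state the cause, and the helper name `reinterpret_int64_as_float64` is hypothetical.

```python
import struct


def reinterpret_int64_as_float64(value: int) -> float:
    """Return the Float64 whose 8 bytes equal the Int64 encoding of `value`.

    This copies the bits unchanged; it is not a numeric conversion.
    """
    raw = struct.pack("<q", value)      # little-endian signed 64-bit integer
    return struct.unpack("<d", raw)[0]  # the same 8 bytes read as an IEEE 754 double


if __name__ == "__main__":
    # Bit reinterpretation of the integer 3 gives a tiny subnormal number.
    print(reinterpret_int64_as_float64(3))  # 1.5e-323
    # A numeric conversion would give the expected value instead.
    print(float(3))                         # 3.0
```

An integer `n` reinterpreted this way becomes `n` times the smallest positive subnormal double (about `4.94e-324`), which is why small integers show up as values near `1e-323`.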
[jira] [Created] (ARROW-15862) [R][C++] Provide a way to go from integer to duration
Dragoș Moldovan-Grünfeld created ARROW-15862: Summary: [R][C++] Provide a way to go from integer to duration Key: ARROW-15862 URL: https://issues.apache.org/jira/browse/ARROW-15862 Project: Apache Arrow Issue Type: Improvement Components: C++, R Reporter: Dragoș Moldovan-Grünfeld Currently it is not possible to directly create a duration object from a numeric one (for example through casting).
{code:r}
library(arrow)
a <- Array$create(32L)
a$cast(duration("s"))
#> Error: NotImplemented: Unsupported cast from int32 to duration using function cast_duration
#> /Users/dragos/Documents/arrow/cpp/src/arrow/compute/function.cc:231  DispatchBest()
{code}
This underpins a lot of the date-time arithmetic in R, which supports the conversion/coercion of an integer to difftime (R's equivalent of duration), such as in the pipeline below.
{code:r}
library(arrow, warn.conflicts = FALSE)
#> See arrow_info() for available features
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

df <- tibble(time = as_datetime(c("2022-03-07 15:00:28", "2022-03-06 14:00:28")))
df
#> # A tibble: 2 × 1
#>   time
#>   <dttm>
#> 1 2022-03-07 15:00:28
#> 2 2022-03-06 14:00:28

df %>%
  mutate(time2 = time + seconds(2))
#> # A tibble: 2 × 2
#>   time                time2
#>   <dttm>              <dttm>
#> 1 2022-03-07 15:00:28 2022-03-07 15:00:30
#> 2 2022-03-06 14:00:28 2022-03-06 14:00:30
{code}
[jira] [Created] (ARROW-15861) [Java][Flight] grpc-netty, version mismatch, incompatible ctor for "PooledByteBufAllocator" in io.grpc.netty.Utils#createByteBufAllocator
Gavin Ray created ARROW-15861: - Summary: [Java][Flight] grpc-netty, version mismatch, incompatible ctor for "PooledByteBufAllocator" in io.grpc.netty.Utils#createByteBufAllocator Key: ARROW-15861 URL: https://issues.apache.org/jira/browse/ARROW-15861 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Affects Versions: 8.0.0 Reporter: Gavin Ray Attachments: image-2022-03-07-10-47-09-355.png Using Arrow nightly jars from 03/03/2022
{code:java}
val LOCALHOST = "localhost"
val allocator = RootAllocator(Long.MAX_VALUE)

val serverLocation = Location.forGrpcInsecure(LOCALHOST, 0)
val producer = DataWrapperFlightSQLProducer(serverLocation)
val server = FlightServer.builder(allocator, serverLocation, producer).build().start()

val clientLocation = Location.forGrpcInsecure(LOCALHOST, server.port)
val client = FlightSqlClient(FlightClient.builder(allocator, clientLocation).build())
{code}
This throws the following error (from "FlightServer.builder"):
{code:java}
'void io.netty.buffer.PooledByteBufAllocator.<init>(boolean, int, int, int, int, int, int, boolean)'
java.lang.NoSuchMethodError: 'void io.netty.buffer.PooledByteBufAllocator.<init>(boolean, int, int, int, int, int, int, boolean)'
    at io.grpc.netty.Utils.createByteBufAllocator(Utils.java:176)
    at io.grpc.netty.Utils.access$000(Utils.java:75)
    at io.grpc.netty.Utils$ByteBufAllocatorPreferDirectHolder.(Utils.java:97)
    at io.grpc.netty.Utils.getByteBufAllocator(Utils.java:144)
    at io.grpc.netty.NettyServer.start(NettyServer.java:205)
    at io.grpc.internal.ServerImpl.start(ServerImpl.java:183)
    at io.grpc.internal.ServerImpl.start(ServerImpl.java:92)
    at org.apache.arrow.flight.FlightServer.start(FlightServer.java:83)
    at FlightSQLServerAndClientTest.(FlightSQLServerAndClientTest.kt:33)
{code}
The cause is that the constructor signature is incompatible: !image-2022-03-07-10-47-09-355.png!
To fix this, you can override the versions of Arrow's dependencies:
{code:groovy}
implementation("io.grpc", "grpc-netty").version { strictly("1.44.1") }
implementation("io.netty", "netty-all").version { strictly("4.1.74.Final") }
implementation("io.netty", "netty-codec").version { strictly("4.1.74.Final") }
{code}
[jira] [Created] (ARROW-15860) [Python][Docs] Document RecordBatchReader
Will Jones created ARROW-15860: -- Summary: [Python][Docs] Document RecordBatchReader Key: ARROW-15860 URL: https://issues.apache.org/jira/browse/ARROW-15860 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Python Affects Versions: 7.0.0 Reporter: Will Jones Fix For: 8.0.0 RecordBatchReader seems like a pretty important type, but it is missing from the Python API docs.
[jira] [Created] (ARROW-15859) [C++] Add nightly test for static build with arrow_flight_static and arrow_bundled_dependencies
Rok Mihevc created ARROW-15859: -- Summary: [C++] Add nightly test for static build with arrow_flight_static and arrow_bundled_dependencies Key: ARROW-15859 URL: https://issues.apache.org/jira/browse/ARROW-15859 Project: Apache Arrow Issue Type: Improvement Components: C++, Continuous Integration Reporter: Rok Mihevc Because of the abseil dependencies, static builds with arrow_bundled_dependencies are brittle. We could test them nightly with the example proposed in ARROW-14708.
[jira] [Created] (ARROW-15858) [R][C++] Support duration creation from integer
Dragoș Moldovan-Grünfeld created ARROW-15858: Summary: [R][C++] Support duration creation from integer Key: ARROW-15858 URL: https://issues.apache.org/jira/browse/ARROW-15858 Project: Apache Arrow Issue Type: Bug Components: C++, R Reporter: Dragoș Moldovan-Grünfeld I would expect both {{a}} and {{b}} to create a {{duration}} object of 32 seconds, but the second one returns an {{int32}}.
{code:r}
library(arrow, warn.conflicts = FALSE)

a <- as.difftime(32, units = "secs")
b <- as.difftime(32L, units = "secs")

Array$create(a)
#> Array
#>
#> [
#>   32
#> ]
Array$create(b)
#> Array
#>
#> [
#>   32
#> ]
{code}
If I try to be explicit, I get somewhat of a clue why that might be happening:
{code:r}
Array$create(a, type = duration())
#> Array
#>
#> [
#>   32
#> ]
Array$create(b, type = duration())
#> Error:
#> ! NotImplemented: Extend
{code}
Nevertheless, the fallback to creating an integer was unexpected. Also, I am not sure whether this is a bug, an improvement or a new feature.
[jira] [Created] (ARROW-15857) [R] rhub/fedora-clang-devel fails to install 'sass' (rmarkdown dependency)
Dewey Dunnington created ARROW-15857: Summary: [R] rhub/fedora-clang-devel fails to install 'sass' (rmarkdown dependency) Key: ARROW-15857 URL: https://issues.apache.org/jira/browse/ARROW-15857 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Dewey Dunnington Starting 2022-03-03, we get a failure on the rhub/fedora-clang-devel nightly build. It seems to be a linking error but nothing in the sass package seems to have changed for some time (last update May 2021). https://github.com/ursacomputing/crossbow/runs/5444005154?check_suite_focus=true#step:5:3007 Build log for the sass package:
{noformat}
#14 1099.2 make[1]: Entering directory '/tmp/RtmpvEMraB/R.INSTALL555d42b8f18e/sass/src'
#14 1099.2 /opt/R-devel/lib64/R/share/make/shlib.mk:18: warning: overriding recipe for target 'shlib-clean'
#14 1099.2 Makevars:12: warning: ignoring old recipe for target 'shlib-clean'
#14 1099.2 /usr/bin/clang -I"/opt/R-devel/lib64/R/include" -DNDEBUG -I./libsass/include -I/usr/local/include -fpic -g -O2 -c compile.c -o compile.o
#14 1099.2 /usr/bin/clang -I"/opt/R-devel/lib64/R/include" -DNDEBUG -I./libsass/include -I/usr/local/include -fpic -g -O2 -c init.c -o init.o
#14 1099.2 MAKEFLAGS= CC="/usr/bin/clang" CFLAGS="-g -O2 " CXX="/usr/bin/clang++ -std=gnu++14 -stdlib=libc++" AR="ar" LDFLAGS="-L/usr/local/lib64" make -C libsass
#14 1099.2 make[2]: Entering directory '/tmp/RtmpvEMraB/R.INSTALL555d42b8f18e/sass/src/libsass'
#14 1099.2 /usr/bin/clang -g -O2 -O2 -I ./include -fPIC -c -o src/cencode.o src/cencode.c
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/ast.o src/ast.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/ast_values.o src/ast_values.cpp
#14 1099.2 src/ast_values.cpp:484:23: warning: loop variable 'numerator' creates a copy from type 'const std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>' [-Wrange-loop-construct]
#14 1099.2 for (const auto numerator : numerators)
#14 1099.2 ^
#14 1099.2 src/ast_values.cpp:484:12: note: use reference type 'const std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> &' to prevent copying
#14 1099.2 for (const auto numerator : numerators)
#14 1099.2 ^~
#14 1099.2 &
#14 1099.2 src/ast_values.cpp:486:23: warning: loop variable 'denominator' creates a copy from type 'const std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>' [-Wrange-loop-construct]
#14 1099.2 for (const auto denominator : denominators)
#14 1099.2 ^
#14 1099.2 src/ast_values.cpp:486:12: note: use reference type 'const std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> &' to prevent copying
#14 1099.2 for (const auto denominator : denominators)
#14 1099.2 ^~~~
#14 1099.2 &
#14 1099.2 2 warnings generated.
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/ast_supports.o src/ast_supports.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/ast_sel_cmp.o src/ast_sel_cmp.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/ast_sel_unify.o src/ast_sel_unify.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/ast_sel_super.o src/ast_sel_super.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/ast_sel_weave.o src/ast_sel_weave.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/ast_selectors.o src/ast_selectors.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/context.o src/context.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/constants.o src/constants.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/fn_utils.o src/fn_utils.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/fn_miscs.o src/fn_miscs.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/fn_maps.o src/fn_maps.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/fn_lists.o src/fn_lists.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I ./include -fPIC -c -o src/fn_colors.o src/fn_colors.cpp
#14 1099.2 /usr/bin/clang++ -std=gnu++14 -stdlib=libc++ -Wall -O2 -std=c++11 -I
[jira] [Created] (ARROW-15856) [R] S3FileSystem - open_dataset
Martin du Toit created ARROW-15856: -- Summary: [R] S3FileSystem - open_dataset Key: ARROW-15856 URL: https://issues.apache.org/jira/browse/ARROW-15856 Project: Apache Arrow Issue Type: New Feature Components: R Affects Versions: 7.0.0 Reporter: Martin du Toit Hi, I can successfully create an S3FileSystem that connects via minio. I can create a SubTreeFileSystem: s3://investmentaccountingdata/rawdata/transactions/transactions-xxx/v1.1/ I can list the files in the SubTreeFileSystem, and I can open a dataset from the list of files:
{code:r}
list_files <- sfs$ls(recursive=TRUE)
ds <- arrow::open_dataset(sources = list_files, schema = schema_file, format = csv_format, filesystem = sfs)
{code}
This all works fine if I provide the list of files, but I want to specify a path higher up so that the sub-folders are included as partitions. The code I use works perfectly if I run it on a local disk. How can I call open_dataset with a folder as the source?
[jira] [Created] (ARROW-15855) [Python] Add dictionary_pagesize_limit to Parquet writer
Xinyu Zeng created ARROW-15855: -- Summary: [Python] Add dictionary_pagesize_limit to Parquet writer Key: ARROW-15855 URL: https://issues.apache.org/jira/browse/ARROW-15855 Project: Apache Arrow Issue Type: Improvement Components: Parquet, Python Reporter: Xinyu Zeng Fix For: 7.0.0 Although the Python Parquet API is a wrapper around the C++ one, some tuning knobs are not exposed in Python, for example dictionary_pagesize_limit_. The dictionary page size will easily exceed the limit when one or more of the following hold:
1. The row_group_size is relatively large, e.g. the default is 64M.
2. The size per entry is large, e.g. a large string column.
3. The data is not highly repetitive.
If this parameter cannot be tuned, dictionary encoding may not be fully utilized. In C++, however, this parameter can be tuned to an optimized setting. There are also other parameters not exposed in Python, for example max_statistics_size.
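The three conditions above can be put into rough numbers. The sketch below is a back-of-the-envelope estimate in plain Python, not the Parquet writer's actual accounting. The 1 MiB limit is an assumption to check against the Parquet C++ writer properties, and the function names and example numbers are hypothetical. It models a string dictionary as one entry per distinct value, each stored with a 4-byte length prefix (PLAIN encoding of byte arrays).

```python
# Assumed limit of 1 MiB; verify against the Parquet C++ writer properties.
ASSUMED_DICTIONARY_PAGESIZE_LIMIT = 1024 * 1024

# PLAIN-encoded byte arrays carry a 4-byte length prefix per value.
LENGTH_PREFIX_BYTES = 4


def estimated_dictionary_bytes(distinct_values: int, avg_entry_bytes: int) -> int:
    """Rough size of a string dictionary page: one entry per distinct value."""
    return distinct_values * (avg_entry_bytes + LENGTH_PREFIX_BYTES)


def dictionary_fits(distinct_values: int, avg_entry_bytes: int,
                    limit: int = ASSUMED_DICTIONARY_PAGESIZE_LIMIT) -> bool:
    """True if the estimated dictionary stays within the page size limit."""
    return estimated_dictionary_bytes(distinct_values, avg_entry_bytes) <= limit


if __name__ == "__main__":
    # Few distinct short strings: 1_000 * (20 + 4) = 24_000 bytes, within the limit.
    print(dictionary_fits(distinct_values=1_000, avg_entry_bytes=20))    # True
    # Many distinct long strings: 50_000 * (100 + 4) = 5_200_000 bytes, over the limit.
    print(dictionary_fits(distinct_values=50_000, avg_entry_bytes=100))  # False
```

Under this estimate, a larger row group or less repetitive data raises `distinct_values` and larger strings raise `avg_entry_bytes`, so any of the three conditions pushes the dictionary past a fixed limit sooner.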