jianxind edited a comment on pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#issuecomment-649198815
Here are the results:
https://ci.ursalabs.org/#/builders/73/builds/90/steps/3/logs/result
I don't know why they were not pasted here. All of the 0.01% cases show
positive results.
jianxind commented on pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#issuecomment-649197066
@ursabot benchmark --suite-filter=parquet-encoding-benchmark
--benchmark-filter=BM_Plain
This is an automated
kiszk commented on a change in pull request #7507:
URL: https://github.com/apache/arrow/pull/7507#discussion_r445289548
##
File path: cpp/src/arrow/ipc/read_write_test.cc
##
@@ -427,6 +469,14 @@ class TestIpcRoundTrip : public
::testing::TestWithParam,
void TearDown() {
wesm commented on pull request #7439:
URL: https://github.com/apache/arrow/pull/7439#issuecomment-649195633
The Java docs don't build at all
```
+ mvn -B -DskipTests -Drat.skip=true
-Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn
wesm closed pull request #7439:
URL: https://github.com/apache/arrow/pull/7439
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
wesm commented on a change in pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#discussion_r445281773
##
File path: cpp/src/arrow/util/spaced.h
##
@@ -0,0 +1,199 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license
wesm closed pull request #7538:
URL: https://github.com/apache/arrow/pull/7538
wesm commented on pull request #7538:
URL: https://github.com/apache/arrow/pull/7538#issuecomment-649186060
+1
bkietz commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-649154064
Actually, on reflection: I'm not sure it's worthwhile to check the count of
unique values at all. In any given batch a virtual column would be materialized
with a single-item
jianxind commented on a change in pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#discussion_r445249689
##
File path: cpp/src/arrow/util/spaced.h
##
@@ -0,0 +1,199 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor
jianxind commented on a change in pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#discussion_r445249248
##
File path: cpp/src/arrow/util/spaced.h
##
@@ -0,0 +1,199 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor
jianxind commented on a change in pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#discussion_r445249183
##
File path: cpp/src/arrow/util/spaced.h
##
@@ -0,0 +1,199 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor
bkietz commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-649148829
I think there's value in finding the smallest index type possible; we expect
partition fields to have few unique values in most cases.
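The idea of picking the smallest index type can be illustrated with a minimal sketch. This is not the code from the pull request; the function name `smallest_index_type` is hypothetical, and it assumes signed integer indices of 8, 16, 32, or 64 bits.

```python
def smallest_index_type(num_unique: int) -> str:
    """Return the name of the narrowest signed integer type that can
    index `num_unique` distinct dictionary values (hypothetical helper)."""
    if num_unique < 0:
        raise ValueError("num_unique must be non-negative")
    # The largest index needed is num_unique - 1, and a signed N-bit
    # integer holds values up to 2**(N-1) - 1.
    for bits in (8, 16, 32, 64):
        if num_unique <= 2 ** (bits - 1):
            return f"int{bits}"
    raise OverflowError("too many unique values for a 64-bit index")


if __name__ == "__main__":
    for count in (3, 128, 129, 40000):
        print(count, smallest_index_type(count))
```

With few unique partition values the result is usually `int8`, which is the saving this comment argues for; always using `int32` trades that saving for simplicity.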
jianxind commented on pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#issuecomment-649147883
@ursabot benchmark --suite-filter=parquet-encoding-benchmark
jianxind commented on a change in pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#discussion_r445245990
##
File path: cpp/src/arrow/util/spaced.h
##
@@ -0,0 +1,199 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor
wesm commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-649145717
We could use just int32() dictionary indices and call it a day?
github-actions[bot] commented on pull request #7538:
URL: https://github.com/apache/arrow/pull/7538#issuecomment-649144542
https://issues.apache.org/jira/browse/ARROW-7925
wesm opened a new pull request #7538:
URL: https://github.com/apache/arrow/pull/7538
wesm commented on pull request #4140:
URL: https://github.com/apache/arrow/pull/4140#issuecomment-649124737
You should be able to just rebase and the problem will go away
wesm commented on a change in pull request #7537:
URL: https://github.com/apache/arrow/pull/7537#discussion_r445221159
##
File path: cpp/src/arrow/python/helpers.cc
##
@@ -254,14 +255,45 @@ bool PyFloat_IsNaN(PyObject* obj) {
return PyFloat_Check(obj) &&
wesm opened a new pull request #7537:
URL: https://github.com/apache/arrow/pull/7537
This has been the root cause of a number of bugs. I'm unclear if there's a
race condition with tearing down a `static OwnedRef` so we might need some
other approach to managing symbols imported from
github-actions[bot] commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-649112307
https://issues.apache.org/jira/browse/ARROW-8647
github-actions[bot] commented on pull request #7535:
URL: https://github.com/apache/arrow/pull/7535#issuecomment-649112294
Thanks for opening a pull request!
Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW
Then
bkietz opened a new pull request #7536:
URL: https://github.com/apache/arrow/pull/7536
wesm opened a new pull request #7535:
URL: https://github.com/apache/arrow/pull/7535
See mailing list discussion.
BryanCutler commented on pull request #6316:
URL: https://github.com/apache/arrow/pull/6316#issuecomment-649103303
I mean the current process for integration tests with the master branch is
to build Spark with Arrow Java master, then run Java and Python tests. That
process is good for
wesm commented on pull request #7396:
URL: https://github.com/apache/arrow/pull/7396#issuecomment-649102716
I used perf to record some data about the hung function
```
+ 83.37% 0.00% gandiva-decimal [unknown] [.]
0x
+ 65.62%
wesm commented on pull request #7452:
URL: https://github.com/apache/arrow/pull/7452#issuecomment-649098754
Once the utf8_lower/utf8_upper patch lands I am going to make utf8proc not
mandatory. See ARROW-9220.
kszucs commented on a change in pull request #7519:
URL: https://github.com/apache/arrow/pull/7519#discussion_r445194648
##
File path: python/pyarrow/tests/test_misc.py
##
@@ -120,7 +120,6 @@ def test_cpu_count():
pa.LargeListValue,
pa.MapValue,
kou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-649090497
Rebased.
kou closed pull request #7452:
URL: https://github.com/apache/arrow/pull/7452
kou commented on pull request #7452:
URL: https://github.com/apache/arrow/pull/7452#issuecomment-649085709
+1
kou commented on a change in pull request #7507:
URL: https://github.com/apache/arrow/pull/7507#discussion_r445184425
##
File path: cpp/src/arrow/ipc/read_write_test.cc
##
@@ -427,6 +469,14 @@ class TestIpcRoundTrip : public
::testing::TestWithParam,
void TearDown() {
xrl commented on pull request #4140:
URL: https://github.com/apache/arrow/pull/4140#issuecomment-649081934
@wesm I'm the original author and I'd love to wrap this up. I can probably
figure out how to debug some ruby for that release script bug.
wesm commented on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-649079161
There's a small rebase conflict here. Need help from Java folks here,
@rymurr can you help?
wesm closed pull request #7014:
URL: https://github.com/apache/arrow/pull/7014
wesm commented on pull request #6979:
URL: https://github.com/apache/arrow/pull/6979#issuecomment-649077394
@sonthonaxrk I would recommend opening a new PR
wesm commented on pull request #6725:
URL: https://github.com/apache/arrow/pull/6725#issuecomment-649076794
@sbinet could you assist with this?
wesm commented on pull request #6725:
URL: https://github.com/apache/arrow/pull/6725#issuecomment-649076694
Hm, at a high level it seems like it might be better to have a separate set
of LargeBinary types rather than try to pack both 32-bit and 64-bit into the
same types. This means some
wesm commented on pull request #4140:
URL: https://github.com/apache/arrow/pull/4140#issuecomment-649075783
Any hope of rehabilitating this for 1.0.0?
wesm commented on pull request #7374:
URL: https://github.com/apache/arrow/pull/7374#issuecomment-649075242
I'm closing this
wesm closed pull request #7374:
URL: https://github.com/apache/arrow/pull/7374
wesm commented on pull request #7376:
URL: https://github.com/apache/arrow/pull/7376#issuecomment-649074875
@kszucs can you take this over?
wesm closed pull request #7520:
URL: https://github.com/apache/arrow/pull/7520
wesm closed pull request #7533:
URL: https://github.com/apache/arrow/pull/7533
wesm commented on pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#issuecomment-649067717
Benchmark results
```
$ archery benchmark diff --cc=gcc-8 --cxx=g++-8 jianxind/BitBlockSpaced
master --suite-filter=parquet-encoding
kou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-649065240
Oh, sorry.
It seems that I looked at the wrong CI jobs.
The link problem has been fixed by the workaround.
I'll cherry pick the workaround to
kszucs commented on pull request #6316:
URL: https://github.com/apache/arrow/pull/6316#issuecomment-649043272
> @kszucs I submitted a patch to fix Java compilation with Spark master and
branch-3.0, and tested locally with the latest pyarrow so Spark integration
tests should pass for these
wesm commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r445127367
##
File path: r/src/array_from_vector.cpp
##
@@ -915,6 +924,39 @@ class Time64Converter : public TimeConverter {
}
};
+class BinaryVectorConverter :
wesm commented on a change in pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#discussion_r445122603
##
File path: cpp/src/arrow/util/spaced.h
##
@@ -0,0 +1,199 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license
bkietz closed pull request #7493:
URL: https://github.com/apache/arrow/pull/7493
github-actions[bot] commented on pull request #7534:
URL: https://github.com/apache/arrow/pull/7534#issuecomment-649022873
https://issues.apache.org/jira/browse/ARROW-8729
bkietz opened a new pull request #7534:
URL: https://github.com/apache/arrow/pull/7534
This bug is inherited from `parquet::arrow::RowGroupRecordBatchReader`,
which yielded empty record batches when no columns were projected because no
field readers were available from which to derive
wesm closed pull request #7498:
URL: https://github.com/apache/arrow/pull/7498
nealrichardson commented on pull request #7527:
URL: https://github.com/apache/arrow/pull/7527#issuecomment-649000526
@romainfrancois this is ready for (and seriously needs) your review. Tests
should be passing now.
nealrichardson commented on a change in pull request #7527:
URL: https://github.com/apache/arrow/pull/7527#discussion_r445098056
##
File path: r/R/schema.R
##
@@ -83,16 +83,21 @@ Schema <- R6Class("Schema",
}
),
active = list(
-names = function()
alexbaden commented on pull request #7263:
URL: https://github.com/apache/arrow/pull/7263#issuecomment-648988459
Maybe with this PR that is possible, I'll have to explore a bit once this is
merged. The concern is more around getting the order of the dictionary, etc
right in the message
BryanCutler commented on pull request #6316:
URL: https://github.com/apache/arrow/pull/6316#issuecomment-648982266
@kszucs I submitted a patch to fix Java compilation with Spark master and
branch-3.0, and tested locally with the latest pyarrow so Spark integration
tests should pass for
wesm commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r445042115
##
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##
@@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator {
ArrayType
kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r445007543
##
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##
@@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator {
ArrayType
kszucs commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r445006111
##
File path: cpp/src/arrow/testing/gtest_util.h
##
@@ -137,6 +137,8 @@ namespace arrow {
//
nealrichardson closed pull request #7297:
URL: https://github.com/apache/arrow/pull/7297
wesm commented on a change in pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#discussion_r444996273
##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -399,15 +434,59 @@ class TestNumericMinMaxKernel : public ::testing::Test {
};
nevi-me commented on pull request #7297:
URL: https://github.com/apache/arrow/pull/7297#issuecomment-648901870
> Ok there was one other test needing to be skipped, which I've done, and
now the tests "pass". Should we merge this and progressively unskip tests as
you can?
Yes please
andygrove commented on pull request #7297:
URL: https://github.com/apache/arrow/pull/7297#issuecomment-648901377
Yes, that would be great. Thanks!
On Wed, Jun 24, 2020 at 9:43 AM Neal Richardson
wrote:
> Ok there was one other test needing to be skipped, which I've done,
nealrichardson commented on pull request #7297:
URL: https://github.com/apache/arrow/pull/7297#issuecomment-648900443
Ok there was one other test needing to be skipped, which I've done, and now
the tests "pass". Should we merge this and progressively unskip tests as you
can?
nealrichardson commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-648889866
@xhochy if you're trying to add R Windows dependencies, see the discussion
on https://issues.apache.org/jira/browse/ARROW-6960 for pointers
nealrichardson commented on pull request #7275:
URL: https://github.com/apache/arrow/pull/7275#issuecomment-648886550
Is this good to merge now? @BryanCutler are you still planning to review
this? Would like to get this in 1.0.
wesm commented on pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#issuecomment-648879878
Looking
wesm commented on pull request #7532:
URL: https://github.com/apache/arrow/pull/7532#issuecomment-648877226
Here's 0.17.1 with the benchmark changes backported
https://github.com/wesm/arrow/tree/BitBlockSpacedBM-0.17.1
wesm closed pull request #7532:
URL: https://github.com/apache/arrow/pull/7532
wesm commented on pull request #7532:
URL: https://github.com/apache/arrow/pull/7532#issuecomment-648875010
For interest, I benchmarked 0.17.1 versus master with these new benchmarks
(gcc-8 on i9-9960X, with SSE4.2):
```
benchmark
wesm commented on pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#issuecomment-648873558
I sent an e-mail to dev@ -- let's discuss there
alippai edited a comment on pull request #7533:
URL: https://github.com/apache/arrow/pull/7533#issuecomment-648752119
Can this be extended to support any scalar value? Creating a column with
a single value is a common step for me (before concatenating tables, so the
fragments are
kszucs commented on pull request #7478:
URL: https://github.com/apache/arrow/pull/7478#issuecomment-648856803
ping @wesm
kszucs commented on pull request #6592:
URL: https://github.com/apache/arrow/pull/6592#issuecomment-648856487
This PR is outdated; I will keep working on the Windows Docker containers
instead.
romainfrancois commented on pull request #7524:
URL: https://github.com/apache/arrow/pull/7524#issuecomment-648848097
Added support for record batches.
Toying with the idea of a print method for the metadata, to make it less
opaque:
``` r
library(arrow)
#>
#>
kszucs commented on a change in pull request #7533:
URL: https://github.com/apache/arrow/pull/7533#discussion_r444924523
##
File path: python/pyarrow/__init__.py
##
@@ -90,7 +90,7 @@ def parse_git(root, **kwargs):
schema,
bkietz commented on a change in pull request #7526:
URL: https://github.com/apache/arrow/pull/7526#discussion_r444914459
##
File path: cpp/src/arrow/dataset/file_parquet.cc
##
@@ -357,13 +355,20 @@ static inline Result>
AugmentRowGroups(
return row_groups;
}
-Result
wesm commented on a change in pull request #7533:
URL: https://github.com/apache/arrow/pull/7533#discussion_r444890495
##
File path: python/pyarrow/__init__.py
##
@@ -90,7 +90,7 @@ def parse_git(root, **kwargs):
schema,
wesm commented on pull request #7533:
URL: https://github.com/apache/arrow/pull/7533#issuecomment-648817609
@alippai that is doable but would need to get done in a separate PR
rymurr commented on pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#issuecomment-648810719
Thanks @wesm and @jacques-n for the review. I will leave this up until
consensus is reached on the format change. Please let me know if I can help w/
the c++ patch, would be happy
alippai commented on pull request #7533:
URL: https://github.com/apache/arrow/pull/7533#issuecomment-648752119
Can this be extended to support any scalar value? Creating a column with
a single value is a common step for us (before concatenating tables, so the
fragments are differentiated
github-actions[bot] commented on pull request #7533:
URL: https://github.com/apache/arrow/pull/7533#issuecomment-648694810
https://issues.apache.org/jira/browse/ARROW-7375
kszucs opened a new pull request #7533:
URL: https://github.com/apache/arrow/pull/7533
jianxind commented on pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#issuecomment-648656865
PR https://github.com/apache/arrow/pull/7532 adds the 0.01% benchmark
case; I can trigger a benchmark action once 7532 is merged.
Below are the results for 0.01% on my
github-actions[bot] commented on pull request #7532:
URL: https://github.com/apache/arrow/pull/7532#issuecomment-648655465
https://issues.apache.org/jira/browse/ARROW-9217
xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-648653322
> The R ones probably?
For these, we need to add `utf8proc` to rtools40 and rtools35 and add them
to the linker line of the R build.
jianxind opened a new pull request #7532:
URL: https://github.com/apache/arrow/pull/7532
Add a 0.01% null probability case, representing data in which most values are true values.
Signed-off-by: Frank Du
xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-648649038
The R ones probably?
xhochy commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-648648745
@kou What is the problematic CI job that shows your problem? The MinGW ones
seem fine.
github-actions[bot] commented on pull request #7531:
URL: https://github.com/apache/arrow/pull/7531#issuecomment-648648342
https://issues.apache.org/jira/browse/ARROW-9216
jianxind opened a new pull request #7531:
URL: https://github.com/apache/arrow/pull/7531
Speed up the typical use case in which most data are true values; also add a
null probability test case.
Signed-off-by: Frank Du
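As a rough illustration of why a mostly-valid input can be handled faster, here is a minimal sketch of a "spaced" expansion that scans the validity flags in blocks and copies an entire block at once when every slot in it is valid. It is an assumption-laden sketch, not the code in `cpp/src/arrow/util/spaced.h`: the function name `expand_spaced`, the list-of-booleans validity representation, and the block size are all hypothetical.

```python
def expand_spaced(values, valid_bits, null=None, block=8):
    """Scatter densely packed non-null `values` into the slots that
    `valid_bits` marks as valid; other slots are filled with `null`."""
    out = [null] * len(valid_bits)
    src = 0  # position of the next unread dense value
    for start in range(0, len(valid_bits), block):
        chunk = valid_bits[start:start + block]
        n = len(chunk)
        if all(chunk):
            # Fast path: the whole block is valid, so copy it contiguously.
            out[start:start + n] = values[src:src + n]
            src += n
        elif any(chunk):
            # Mixed block: fall back to a slot-by-slot copy.
            for i, ok in enumerate(chunk):
                if ok:
                    out[start + i] = values[src]
                    src += 1
        # An all-null block needs no work; `out` already holds `null`.
    return out


if __name__ == "__main__":
    print(expand_spaced([1, 2, 3], [True, False, True, True], block=2))
```

When nulls are very rare, as in the 0.01% case, nearly every block takes the contiguous-copy branch, which is the situation the description above targets.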
romainfrancois commented on a change in pull request #7524:
URL: https://github.com/apache/arrow/pull/7524#discussion_r444682932
##
File path: r/R/table.R
##
@@ -202,7 +210,27 @@ Table$create <- function(..., schema = NULL) {
#' @export
as.data.frame.Table <- function(x,
romainfrancois commented on a change in pull request #7524:
URL: https://github.com/apache/arrow/pull/7524#discussion_r444675449
##
File path: r/tests/testthat/test-Table.R
##
@@ -334,5 +334,5 @@ test_that("Table metadata", {
test_that("Table handles null type
praveenbingo closed pull request #7402:
URL: https://github.com/apache/arrow/pull/7402