Kouhei Sutou created ARROW-7668:
---
Summary: [Packaging][RPM] Use Ninja if possible to reduce build time
Key: ARROW-7668
URL: https://issues.apache.org/jira/browse/ARROW-7668
Project: Apache Arrow
I would vote for treating nulls as empty.
On Fri, Jan 10, 2020 at 12:36 AM Ji Liu wrote:
> Hi all,
>
> Currently the isEmpty API always returns false in BaseRepeatedValueVector,
> and its subclass ListVector does not override this method.
> This will lead to incorrect result, for example, a
Looking at this, it seems like the main change is to require empty lists
instead of null values? I think this might potentially be too strict for
existing degenerate cases (e.g. empty files, I also don't remember if we
said null type requires a buffer).
Most of the others like MessageHeader make
Sounds good, I'll leave it up to you which to implement. Thanks for taking
it on.
On Sun, Jan 19, 2020 at 8:47 PM Fan Liya wrote:
> Hi Jacques and Micah,
>
> Thanks for the fruitful discussion.
>
> It seems netty based allocator and unsafe based allocator have their
> specific advantages.
>
Hi John,
Not Wes, but my thoughts on this are as follows:
1. Alternate bit/byte arrangements can also be useful for processing [1] in
addition to compression.
2. I think they are quite a bit more complicated than the existing schemes
proposed in [2], so I think it would be more expedient to get
I mentioned this elsewhere, but my intent is to stop doing Java reviews for
the immediate future once I wrap up the few that I have requested change on.
I'm happy to try to triage incoming Java PRs, but in order to do this, I
need to know which committers have some bandwidth to do reviews (some of
One of the things that I think got overlooked in the conversation on having
a slice offset in the C API was a suggestion from Jacques of perhaps
generalizing the concept to an arbitrary "filter" for arrays/record batches.
I believe this point was discussed in the past as well. I'm not
Kouhei Sutou created ARROW-7667:
---
Summary: [Packaging][deb] ubuntu-eoan is missing in nightly jobs
Key: ARROW-7667
URL: https://issues.apache.org/jira/browse/ARROW-7667
Project: Apache Arrow
Kouhei Sutou created ARROW-7666:
---
Summary: [Packaging][deb] Always use Ninja to reduce build time
Key: ARROW-7666
URL: https://issues.apache.org/jira/browse/ARROW-7666
Project: Apache Arrow
Thanks for investigating this and for the quick fix, Joris and Wes! I just have
a couple questions about the behavior observed here. The pyspark code
assigns either the same series back to the pandas.DataFrame or makes some
modifications if it is a timestamp. In the case there are no timestamps, is
Wes, what do you think about Arrow supporting a new suite of fixed-length
data types that unshuffle on column->Value(i) calls? This would allow
memory/swap compressors and memory maps backed by compressing
filesystems (ZFS) or block devices (VDO) to operate more efficiently.
By doing it with new
Antoine Pitrou created ARROW-7665:
-
Summary: [R] linuxLibs.R should build in parallel
Key: ARROW-7665
URL: https://issues.apache.org/jira/browse/ARROW-7665
Project: Apache Arrow
Issue Type:
Ben Kietzman created ARROW-7664:
---
Summary: [C++] Extract localfs default from FileSystemFromUri
Key: ARROW-7664
URL: https://issues.apache.org/jira/browse/ARROW-7664
Project: Apache Arrow
On Thu, Jan 23, 2020 at 12:42 PM John Muehlhausen wrote:
>
> Again, I know very little about Parquet, so your patience is appreciated.
>
> At the moment I can Arrow/mmap a file without having anywhere near as
> much available memory as the file size. I can visit random places in the
> file
Again, I know very little about Parquet, so your patience is appreciated.
At the moment I can Arrow/mmap a file without having anywhere near as
much available memory as the file size. I can visit random places in the
file (such as a binary search if it is ordered) and only the locations
visited
David Li created ARROW-7663:
---
Summary: from_pandas gives TypeError instead of ArrowTypeError in
some cases
Key: ARROW-7663
URL: https://issues.apache.org/jira/browse/ARROW-7663
Project: Apache Arrow
Parquet is most relevant in scenarios where filesystem IO is constrained
(spinning rust HDD, network FS, cloud storage / S3 / GCS). For those
use cases memory-mapped Arrow is not viable.
Against local NVMe (> 2000 MB/s read throughput) your mileage may vary.
On Thu, Jan 23, 2020 at 12:06 PM Francois
This could also have utility in memory via things like zram/zswap, right?
Doesn't macOS also have a memory compressor?
I don't think Parquet is an option for me unless the integration with Arrow
is tighter than I imagine (i.e. zero-copy). That said, I confess I know
next to nothing about Parquet.
On Thu,
Forgot to give the URL:
https://github.com/apache/arrow/pull/6005
Regards
Antoine.
On 23/01/2020 at 18:23, Antoine Pitrou wrote:
>
> On 23/01/2020 at 18:16, John Muehlhausen wrote:
>> Perhaps related to this thread, are there any current or proposed tools to
>> transform columns for
On 23/01/2020 at 18:16, John Muehlhausen wrote:
> Perhaps related to this thread, are there any current or proposed tools to
> transform columns for fixed-length data types according to a "shuffle"?
> For precedent see the implementation of the shuffle filter in HDF5.
>
Perhaps related to this thread, are there any current or proposed tools to
transform columns for fixed-length data types according to a "shuffle"?
For precedent see the implementation of the shuffle filter in HDF5.
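For readers unfamiliar with the HDF5 shuffle filter: it byte-transposes a buffer of fixed-width values so that byte 0 of every element comes first, then byte 1 of every element, and so on. Below is a minimal pure-Python sketch of that idea, assuming little-endian fixed-width elements. The helpers `shuffle`, `unshuffle` and `value_at` are hypothetical illustrations, not an Arrow or HDF5 API; `value_at` mirrors the "unshuffle on column->Value(i) calls" idea raised in the thread by decoding one element straight from the shuffled buffer.

```python
import struct

# Hypothetical helpers (not an Arrow or HDF5 API) sketching byte shuffling
# of a buffer of fixed-width values, in the spirit of HDF5's shuffle filter.


def shuffle(data: bytes, elem_size: int) -> bytes:
    """Group byte 0 of every element, then byte 1 of every element, and so on."""
    if len(data) % elem_size:
        raise ValueError("buffer length must be a multiple of elem_size")
    # data[j::elem_size] picks byte j of each element, in element order.
    return b"".join(data[j::elem_size] for j in range(elem_size))


def unshuffle(data: bytes, elem_size: int) -> bytes:
    """Invert shuffle(): restore the original element-major byte order."""
    if len(data) % elem_size:
        raise ValueError("buffer length must be a multiple of elem_size")
    n = len(data) // elem_size
    out = bytearray(len(data))
    for j in range(elem_size):
        # Plane j holds byte j of all n elements; scatter it back into place.
        out[j::elem_size] = data[j * n:(j + 1) * n]
    return bytes(out)


def value_at(shuffled: bytes, elem_size: int, i: int, fmt: str):
    """Decode element i directly from a shuffled buffer without unshuffling it all."""
    n = len(shuffled) // elem_size
    # Byte j of element i lives at offset j * n + i in the shuffled layout.
    raw = bytes(shuffled[j * n + i] for j in range(elem_size))
    return struct.unpack(fmt, raw)[0]


# Usage: four little-endian int32 values.
raw = struct.pack("<4i", 1, 2, 3, 256)
shuffled = shuffle(raw, 4)  # b"\x01\x02\x03\x00\x00\x00\x00\x01" followed by 8 zero bytes
assert unshuffle(shuffled, 4) == raw
print(value_at(shuffled, 4, 3, "<i"))  # prints 256
```

For small integers the high-order byte planes turn into long runs of zeros, which is plausibly why shuffling helps general-purpose compressors (including compressing filesystems or swap, as discussed in the thread). Random access to one element costs `elem_size` strided reads rather than one contiguous read.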
Arrow Build Report for Job nightly-2020-01-23-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-23-0
Failed Tasks:
- conda-win-vs2015-py36:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-23-0-azure-conda-win-vs2015-py36
-
Michael Chirico created ARROW-7662:
--
Summary: Support for auto-inferring list column->array in
write_parquet
Key: ARROW-7662
URL: https://issues.apache.org/jira/browse/ARROW-7662
Project: Apache
Projjal Chanda created ARROW-7660:
-
Summary: [C++][Gandiva] Optimise castVarchar(string, int) function
for single byte characters
Key: ARROW-7660
URL: https://issues.apache.org/jira/browse/ARROW-7660