[jira] [Commented] (ARROW-6282) Support lossy compression

2019-08-17 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909870#comment-16909870
 ] 

Micah Kornfield commented on ARROW-6282:


[~domoritz] it is definitely worth discussing your implementation plans/design 
on the mailing list before getting too far, especially if this will require 
changes to the IPC specification.

> Support lossy compression
> -
>
> Key: ARROW-6282
> URL: https://issues.apache.org/jira/browse/ARROW-6282
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Dominik Moritz
>Priority: Major
>
> Arrow dataframes with large columns of integers or floats can be compressed 
> using gzip or brotli. However, in some cases it will be acceptable to compress 
> the data lossily to achieve even higher compression ratios. The main use case 
> for this is visualization, where small inaccuracies matter less. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (ARROW-6267) [Ruby] Add Arrow::Time for Arrow::Time{32,64}DataType value

2019-08-17 Thread Yosuke Shiro (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yosuke Shiro resolved ARROW-6267.
-
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request 5102
[https://github.com/apache/arrow/pull/5102]

> [Ruby] Add Arrow::Time for Arrow::Time{32,64}DataType value
> ---
>
> Key: ARROW-6267
> URL: https://issues.apache.org/jira/browse/ARROW-6267
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Ruby
>Reporter: Sutou Kouhei
>Assignee: Sutou Kouhei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-6282) Support lossy compression

2019-08-17 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909851#comment-16909851
 ] 

Brian Hulette commented on ARROW-6282:
--

Great idea! I think right now we only support compressing entire record 
batches; to make this work we would need buffer-level compression so that we 
could compress just the floating-point buffers. [~emkornfi...@gmail.com] did 
write up a proposal that included buffer-level compression, among other things: 
[strawman PR|https://github.com/apache/arrow/pull/4815], [ML 
discussion|https://lists.apache.org/thread.html/a99124e57c14c3c9ef9d98f3c80cfe1dd25496bf3ff7046778add937@%3Cdev.arrow.apache.org%3E]
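Buffer-level compression can be pictured with a toy model (an assumed layout for illustration only, not Arrow's actual IPC format): treat a record batch as a set of named buffers and compress only the ones where it pays off, such as large floating-point data buffers, while leaving small validity bitmaps alone.

```python
import gzip

def compress_buffers(buffers, should_compress):
    """Compress selected buffers, tagging each with its encoding.

    `buffers` maps a buffer name to raw bytes; `should_compress` decides
    per buffer. A real implementation would record the codec in the IPC
    metadata rather than in a Python tuple.
    """
    return {
        name: ("gzip", gzip.compress(data)) if should_compress(name) else ("raw", data)
        for name, data in buffers.items()
    }

# Hypothetical batch: one float64 data buffer and its validity bitmap.
batch = {
    "floats/data": bytes(8000),     # stand-in for 1000 float64 values
    "floats/validity": bytes(125),  # small validity bitmap, left uncompressed
}
encoded = compress_buffers(batch, lambda name: name.endswith("/data"))
```

The reader would then decompress only the buffers tagged as compressed, which is why the codec choice has to live in the per-buffer metadata.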

> Support lossy compression
> -
>
> Key: ARROW-6282
> URL: https://issues.apache.org/jira/browse/ARROW-6282
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Dominik Moritz
>Priority: Major
>
> Arrow dataframes with large columns of integers or floats can be compressed 
> using gzip or brotli. However, in some cases it will be acceptable to compress 
> the data lossily to achieve even higher compression ratios. The main use case 
> for this is visualization, where small inaccuracies matter less. 





[jira] [Resolved] (ARROW-6270) [C++][Fuzzing] IPC reads do not check buffer indices

2019-08-17 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6270.
-
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request 5105
[https://github.com/apache/arrow/pull/5105]

> [C++][Fuzzing] IPC reads do not check buffer indices
> 
>
> Key: ARROW-6270
> URL: https://issues.apache.org/jira/browse/ARROW-6270
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Marco Neumann
>Assignee: Marco Neumann
>Priority: Major
>  Labels: fuzzer, pull-request-available
> Fix For: 0.15.0
>
> Attachments: crash-bd7e00178af2d236fdf041fcc1fb30975bf8fbca
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The attached crash was found by {{arrow-ipc-fuzzing-test}} and indicates that 
> the IPC reader does not check the flatbuffer-encoded buffer metadata for 
> length and can produce out-of-bounds reads.
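The class of bug can be sketched with a hypothetical helper (names are illustrative, not the actual C++ fix): every buffer descriptor read from untrusted metadata must be validated against the message body before slicing.

```python
def safe_slice(body: bytes, offset: int, length: int) -> bytes:
    """Return body[offset:offset + length], rejecting bad descriptors.

    Without this check, a fuzzer-supplied (offset, length) pair taken
    from the flatbuffer metadata could make the reader address memory
    past the end of the message body.
    """
    if offset < 0 or length < 0 or offset + length > len(body):
        raise IndexError(
            f"buffer [{offset}, {offset + length}) out of bounds "
            f"for body of size {len(body)}"
        )
    return body[offset:offset + length]
```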





[jira] [Resolved] (ARROW-5085) [Python/C++] Conversion of dict encoded null column fails in parquet writing when using RowGroups

2019-08-17 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5085.
-
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request 5107
[https://github.com/apache/arrow/pull/5107]

> [Python/C++] Conversion of dict encoded null column fails in parquet writing 
> when using RowGroups
> -
>
> Key: ARROW-5085
> URL: https://issues.apache.org/jira/browse/ARROW-5085
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Florian Jetter
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Conversion of dict encoded null column fails in parquet writing when using 
> RowGroups
> {code:python}
> import pyarrow.parquet as pq
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"col": [None] * 100, "int": [1.0] * 100})
> df = df.astype({"col": "category"})
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(
>     table,
>     buf,
>     version="2.0",
>     chunk_size=10,
> )
> {code}
> fails with 
> {{pyarrow.lib.ArrowIOError: Column 2 had 100 while previous column had 10}}





[jira] [Resolved] (ARROW-5028) [Python][C++] Creating list with pyarrow.array can overflow child builder

2019-08-17 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5028.
-
Resolution: Fixed

Issue resolved by pull request 5108
[https://github.com/apache/arrow/pull/5108]

> [Python][C++] Creating list with pyarrow.array can overflow child 
> builder
> -
>
> Key: ARROW-5028
> URL: https://issues.apache.org/jira/browse/ARROW-5028
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1, 0.13.0
> Environment: python 3.6
>Reporter: Marco Neumann
>Assignee: Wes McKinney
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.15.0
>
> Attachments: dct.json.gz, dct.pickle.gz
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I am sorry if this bug report feels rather long and the reproduction data is 
> large, but I was not able to reduce the data further while still triggering 
> the problem. I was able to trigger this behavior on master and on {{0.11.1}}.
> {code:python}
> import io
> import os.path
> import pickle
>
> import numpy as np
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> def dct_to_table(index_dct):
>     labeled_array = pa.array(np.array(list(index_dct.keys())))
>     partition_array = pa.array(np.array(list(index_dct.values())))
>     return pa.Table.from_arrays(
>         [labeled_array, partition_array], names=['a', 'b']
>     )
>
> def check_pq_nulls(data):
>     fp = io.BytesIO(data)
>     pfile = pq.ParquetFile(fp)
>     assert pfile.num_row_groups == 1
>     md = pfile.metadata.row_group(0)
>     col = md.column(1)
>     assert col.path_in_schema == 'b.list.item'
>     assert col.statistics.null_count == 0  # fails
>
> def roundtrip(table):
>     buf = pa.BufferOutputStream()
>     pq.write_table(table, buf)
>     data = buf.getvalue().to_pybytes()
>     # this fails:
>     #   check_pq_nulls(data)
>     reader = pa.BufferReader(data)
>     return pq.read_table(reader)
>
> with open(os.path.join(os.path.dirname(__file__), 'dct.pickle'), 'rb') as fp:
>     dct = pickle.load(fp)
>
> # this does NOT help:
> #   pa.set_cpu_count(1)
> #   import gc; gc.disable()
>
> table = dct_to_table(dct)
>
> # this fixes the issue:
> #   table = pa.Table.from_pandas(table.to_pandas())
>
> table2 = roundtrip(table)
>
> assert table.column('b').null_count == 0
> assert table2.column('b').null_count == 0  # fails
>
> # If table2 is converted to pandas, you can also observe that some values at
> # the end of column b are `['']`, which clearly is not present in the
> # original data.
> {code}
> I would also be thankful for any pointers on where the bug comes from or on 
> how to reduce the test case further.





[jira] [Updated] (ARROW-6287) [Rust] [DataFusion] Refactor TableProvider to return thread-safe BatchIterator

2019-08-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6287:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Refactor TableProvider to return thread-safe BatchIterator
> --
>
> Key: ARROW-6287
> URL: https://issues.apache.org/jira/browse/ARROW-6287
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> This refactor is a step towards implementing the new query execution that 
> supports partitions and parallel execution.





[jira] [Created] (ARROW-6287) [Rust] [DataFusion] Refactor TableProvider to return thread-safe BatchIterator

2019-08-17 Thread Andy Grove (JIRA)
Andy Grove created ARROW-6287:
-

 Summary: [Rust] [DataFusion] Refactor TableProvider to return 
thread-safe BatchIterator
 Key: ARROW-6287
 URL: https://issues.apache.org/jira/browse/ARROW-6287
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.15.0


This refactor is a step towards implementing the new query execution that 
supports partitions and parallel execution.





[jira] [Commented] (ARROW-6282) Support lossy compression

2019-08-17 Thread Martin Radev (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909778#comment-16909778
 ] 

Martin Radev commented on ARROW-6282:
-

Hello Dominik,

Are you going to work on this new feature?

I actually already began working on this feature, though not directly for 
Arrow. In particular, my work focuses on investigating, designing, proposing, 
and implementing an extension to Apache Parquet to support lossy and lossless 
floating-point compression.
I have an initial report, which can be read here: 
[https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view?usp=sharing]

I investigated two lossy compressors: ZFP and SZ. I concluded that, despite 
SZ's better compression ratio, it cannot be introduced to Parquet since the 
implementation is not mature enough: the API is poorly designed, it is not 
thread-safe, and I observed two segfaults locally. The developers have also 
been slow to respond; for example, an issue I opened has not led to any 
discussion: [https://github.com/disheng222/SZ/issues/29]

ZFP seems to be the safer choice for use in Parquet. Some consideration is 
still needed on how the user should specify which data may be discarded. In 
particular, it should be designed so that other lossy compressors can be added 
in the future. These are the error modes I've observed in my investigation: 
absolute error, relative error, and number of mantissa bits to discard.
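The "mantissa bits to discard" mode can be sketched in a few lines of NumPy (a simplified illustration under my own assumptions, not ZFP's actual algorithm): zeroing the low mantissa bits bounds the relative error at 2^-keep_bits and leaves long runs of zero bytes for a lossless codec to exploit.

```python
import gzip
import numpy as np

def truncate_mantissa(values: np.ndarray, keep_bits: int) -> np.ndarray:
    """Keep only the top `keep_bits` of the 52-bit float64 mantissa.

    Reinterpret the floats as uint64, mask off the low mantissa bits,
    and reinterpret back. The truncated bits bound the relative error
    by 2**-keep_bits.
    """
    mask = np.uint64(0xFFFFFFFFFFFFFFFF) << np.uint64(52 - keep_bits)
    return (values.view(np.uint64) & mask).view(np.float64)

# Random float64 noise is essentially incompressible for gzip.
data = np.random.default_rng(0).random(100_000)
lossy = truncate_mantissa(data, keep_bits=20)

exact_size = len(gzip.compress(data.tobytes()))
lossy_size = len(gzip.compress(lossy.tobytes()))
```

With `keep_bits=20`, half of every little-endian float64 becomes zero bytes, so a generic codec like gzip compresses the lossy buffer far better than the exact one.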

Could you please share what stage you are currently at in working on this 
feature? I think we can collaborate.

> Support lossy compression
> -
>
> Key: ARROW-6282
> URL: https://issues.apache.org/jira/browse/ARROW-6282
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Dominik Moritz
>Priority: Major
>
> Arrow dataframes with large columns of integers or floats can be compressed 
> using gzip or brotli. However, in some cases it will be acceptable to compress 
> the data lossily to achieve even higher compression ratios. The main use case 
> for this is visualization, where small inaccuracies matter less. 





[jira] [Created] (ARROW-6286) [GLib] Add support for LargeList type

2019-08-17 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-6286:
---

 Summary: [GLib] Add support for LargeList type
 Key: ARROW-6286
 URL: https://issues.apache.org/jira/browse/ARROW-6286
 Project: Apache Arrow
  Issue Type: New Feature
  Components: GLib
Reporter: Yosuke Shiro
Assignee: Yosuke Shiro








[jira] [Updated] (ARROW-6285) [GLib] Add support for LargeBinary and LargeString types

2019-08-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6285:
--
Labels: pull-request-available  (was: )

> [GLib] Add support for LargeBinary and LargeString types
> 
>
> Key: ARROW-6285
> URL: https://issues.apache.org/jira/browse/ARROW-6285
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Reporter: Yosuke Shiro
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (ARROW-6285) [GLib] Add support for LargeBinary and LargeString types

2019-08-17 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-6285:
---

 Summary: [GLib] Add support for LargeBinary and LargeString types
 Key: ARROW-6285
 URL: https://issues.apache.org/jira/browse/ARROW-6285
 Project: Apache Arrow
  Issue Type: New Feature
  Components: GLib
Reporter: Yosuke Shiro
Assignee: Yosuke Shiro








[jira] [Updated] (ARROW-6284) [C++] Allow references in std::tuple when converting tuple to arrow array

2019-08-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6284:
--
Labels: pull-request-available  (was: )

> [C++] Allow references in std::tuple when converting tuple to arrow array
> -
>
> Key: ARROW-6284
> URL: https://issues.apache.org/jira/browse/ARROW-6284
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Minor
>  Labels: pull-request-available
>
> This allows using std::tuple (e.g. std::tie) to convert user data types. More 
> details will be provided in the PR.





[jira] [Created] (ARROW-6284) [C++] Allow references in std::tuple when converting tuple to arrow array

2019-08-17 Thread Omer Ozarslan (JIRA)
Omer Ozarslan created ARROW-6284:


 Summary: [C++] Allow references in std::tuple when converting 
tuple to arrow array
 Key: ARROW-6284
 URL: https://issues.apache.org/jira/browse/ARROW-6284
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Omer Ozarslan


This allows using std::tuple (e.g. std::tie) to convert user data types. More 
details will be provided in the PR.





[jira] [Updated] (ARROW-6101) [Rust] [DataFusion] Create physical plan from logical plan

2019-08-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6101:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Create physical plan from logical plan
> --
>
> Key: ARROW-6101
> URL: https://issues.apache.org/jira/browse/ARROW-6101
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
>
> Once the physical plan is in place and can be executed, I will implement 
> logic to convert the logical plan to a physical plan and remove the legacy 
> code for directly executing a logical plan.





[jira] [Created] (ARROW-6283) [Rust] [DataFusion] Implement operator to write query results to partitioned CSV

2019-08-17 Thread Andy Grove (JIRA)
Andy Grove created ARROW-6283:
-

 Summary: [Rust] [DataFusion] Implement operator to write query 
results to partitioned CSV
 Key: ARROW-6283
 URL: https://issues.apache.org/jira/browse/ARROW-6283
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove








[jira] [Updated] (ARROW-5227) [Rust] [DataFusion] Re-implement query execution with an extensible physical query plan

2019-08-17 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-5227:
--
Summary: [Rust] [DataFusion] Re-implement query execution with an 
extensible physical query plan  (was: [Rust] [DataFusion] Implement parallel 
query execution)

> [Rust] [DataFusion] Re-implement query execution with an extensible physical 
> query plan
> ---
>
> Key: ARROW-5227
> URL: https://issues.apache.org/jira/browse/ARROW-5227
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
>  This story (maybe it should have been an epic with hindsight) is to 
> re-implement query execution in DataFusion using a physical plan that 
> supports partitions and parallel execution.
> This will replace the current query execution which happens directly from the 
> logical plan.
> The new physical plan is based on traits and is therefore extensible by other 
> projects that use Arrow. For example, another project could add physical 
> plans for distributed compute.
> See design doc at 
> [https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing]
>  for more info





[jira] [Updated] (ARROW-5227) [Rust] [DataFusion] Implement parallel query execution

2019-08-17 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-5227:
--
Description: 
 This story (maybe it should have been an epic with hindsight) is to 
re-implement query execution in DataFusion using a physical plan that supports 
partitions and parallel execution.

This will replace the current query execution which happens directly from the 
logical plan.

The new physical plan is based on traits and is therefore extensible by other 
projects that use Arrow. For example, another project could add physical plans 
for distributed compute.

See design doc at 
[https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing]
 for more info

  was:
 

 

Implement parallel query execution to take advantage of multiple cores when 
running queries.

See design doc at 
[https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing]
 for more info


> [Rust] [DataFusion] Implement parallel query execution
> --
>
> Key: ARROW-5227
> URL: https://issues.apache.org/jira/browse/ARROW-5227
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
>  This story (maybe it should have been an epic with hindsight) is to 
> re-implement query execution in DataFusion using a physical plan that 
> supports partitions and parallel execution.
> This will replace the current query execution which happens directly from the 
> logical plan.
> The new physical plan is based on traits and is therefore extensible by other 
> projects that use Arrow. For example, another project could add physical 
> plans for distributed compute.
> See design doc at 
> [https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing]
>  for more info





[jira] [Updated] (ARROW-5227) [Rust] [DataFusion] Implement parallel query execution

2019-08-17 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-5227:
--
Description: 
 

 

Implement parallel query execution to take advantage of multiple cores when 
running queries.

See design doc at 
[https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing]
 for more info

  was:
Implement parallel query execution to take advantage of multiple cores when 
running queries.

See design doc at 
[https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing]
 for more info


> [Rust] [DataFusion] Implement parallel query execution
> --
>
> Key: ARROW-5227
> URL: https://issues.apache.org/jira/browse/ARROW-5227
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
>  
>  
> Implement parallel query execution to take advantage of multiple cores when 
> running queries.
> See design doc at 
> [https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing]
>  for more info





[jira] [Commented] (ARROW-4588) [JS] add logging

2019-08-17 Thread Dominik Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909640#comment-16909640
 ] 

Dominik Moritz commented on ARROW-4588:
---

I don't think we have logging set up yet. 

> [JS] add logging
> 
>
> Key: ARROW-4588
> URL: https://issues.apache.org/jira/browse/ARROW-4588
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Major
>
> As discussed in https://github.com/apache/arrow/pull/3634, the JavaScript 
> library will need some logging infrastructure. The goal for this 
> implementation is a lightweight logger that can easily be configured not to 
> write to the console. 





[jira] [Created] (ARROW-6282) Support lossy compression

2019-08-17 Thread Dominik Moritz (JIRA)
Dominik Moritz created ARROW-6282:
-

 Summary: Support lossy compression
 Key: ARROW-6282
 URL: https://issues.apache.org/jira/browse/ARROW-6282
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Dominik Moritz


Arrow dataframes with large columns of integers or floats can be compressed 
using gzip or brotli. However, in some cases it will be acceptable to compress 
the data lossily to achieve even higher compression ratios. The main use case 
for this is visualization, where small inaccuracies matter less. 
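As a minimal illustration of the trade-off (how this would plug into Arrow's format is exactly what needs discussion): simply downcasting float64 to float32 before a lossless codec already halves the payload, at the cost of reducing precision to roughly 7 significant decimal digits.

```python
import gzip
import numpy as np

# Noisy float64 values in [0, 1) are essentially incompressible for gzip.
values = np.random.default_rng(1).random(100_000)

exact = gzip.compress(values.tobytes())
# Lossy variant: drop to float32 before compressing.
lossy = gzip.compress(values.astype(np.float32).tobytes())
```

For visualization use cases, the float32 (or even coarser) representation is usually indistinguishable on screen, which is what motivates this issue.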


