[jira] [Created] (ARROW-12430) [C++] Support LZO compression

2021-04-16 Thread Haowei Yu (Jira)
Haowei Yu created ARROW-12430:
-

 Summary: [C++] Support LZO compression
 Key: ARROW-12430
 URL: https://issues.apache.org/jira/browse/ARROW-12430
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Haowei Yu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12429) [C++] MergedGeneratorTestFixture is incorrectly instantiated

2021-04-16 Thread David Li (Jira)
David Li created ARROW-12429:


 Summary: [C++] MergedGeneratorTestFixture is incorrectly 
instantiated
 Key: ARROW-12429
 URL: https://issues.apache.org/jira/browse/ARROW-12429
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li
Assignee: David Li


[https://gist.github.com/kou/868eaed328b348e45865747044044272#file-source-cpp-txt]

It looks like the base class was accidentally instantiated instead of the 
actual test.





[jira] [Created] (ARROW-12428) [Python] pyarrow.parquet.read_* should use pre_buffer=True

2021-04-16 Thread David Li (Jira)
David Li created ARROW-12428:


 Summary: [Python] pyarrow.parquet.read_* should use pre_buffer=True
 Key: ARROW-12428
 URL: https://issues.apache.org/jira/browse/ARROW-12428
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: David Li
Assignee: David Li
 Fix For: 5.0.0


If the user is synchronously reading a single file, we should try to read it as 
fast as possible. The one sticking point might be whether it's beneficial to 
enable this no matter the filesystem or whether we should try to only enable it 
on high-latency filesystems.
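
As a rough illustration of why pre-buffering matters most on high-latency filesystems, the sketch below merges nearby byte ranges so that several small reads become fewer large ones. It is a pure-Python toy model, not Arrow's implementation: the function name `coalesce_ranges`, the `hole_size_limit` parameter, and the column-chunk offsets are all assumptions made up for this example.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]  # (offset, length), both in bytes


def coalesce_ranges(ranges: List[ByteRange], hole_size_limit: int) -> List[ByteRange]:
    """Merge byte ranges whose gap is at most hole_size_limit bytes.

    Hypothetical helper for illustration only; it is not part of pyarrow.
    """
    merged: List[ByteRange] = []
    for offset, length in sorted(ranges):
        if length == 0:
            continue  # nothing to read for an empty range
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset - prev_end <= hole_size_limit:
                # Close enough to the previous range: extend it instead of
                # issuing a separate read.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


# Made-up column-chunk locations inside a single file.
column_chunks = [(4, 1000), (1004, 2000), (3100, 500), (50_000, 800)]
requests = coalesce_ranges(column_chunks, hole_size_limit=256)
print(requests)  # [(4, 3596), (50000, 800)]
```

In this toy model four reads collapse into two. Each read on a remote store costs a round trip, so the saving grows with latency, while on a local disk the extra bytes read across the gaps may outweigh it. That is the trade-off the question above is about.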





[jira] [Created] (ARROW-12427) [Rust][DataFusion] Reenable physical_optimizer::repartition::Repartition;

2021-04-16 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12427:
---

 Summary: [Rust][DataFusion] Reenable 
physical_optimizer::repartition::Repartition;
 Key: ARROW-12427
 URL: https://issues.apache.org/jira/browse/ARROW-12427
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb


To fix https://issues.apache.org/jira/browse/ARROW-12421, we disabled the 
physical_optimizer::repartition::Repartition rule in 
https://github.com/apache/arrow/pull/10069.

This ticket tracks finding the root cause of the CI test failure and 
re-enabling physical_optimizer::repartition::Repartition.






[jira] [Created] (ARROW-12426) [Rust] Concatenating dictionaries ignores values

2021-04-16 Thread Raphael Taylor-Davies (Jira)
Raphael Taylor-Davies created ARROW-12426:
-

 Summary: [Rust] Concatenating dictionaries ignores values
 Key: ARROW-12426
 URL: https://issues.apache.org/jira/browse/ARROW-12426
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Raphael Taylor-Davies
Assignee: Raphael Taylor-Davies


Concatenating dictionary arrays ignores their values arrays. At best this 
yields incorrect data, but often it yields keys that index beyond the bounds 
of the values array.
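
The failure mode can be modelled in a few lines of plain Python. This is only a sketch of the general technique (merge the values, then remap each input's keys), not the arrow-rs code or the eventual fix. The `(keys, values)` tuple representation and the names `concat_dictionaries` and `decode` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Toy dictionary array: keys index into values; None marks a null slot.
DictArray = Tuple[List[Optional[int]], List[str]]


def concat_dictionaries(arrays: Sequence[DictArray]) -> DictArray:
    """Concatenate toy dictionary arrays, merging values and remapping keys."""
    merged_values: List[str] = []
    index_of: Dict[str, int] = {}
    merged_keys: List[Optional[int]] = []
    for keys, values in arrays:
        # Build the old-key -> new-key mapping for this input's values.
        remap: List[int] = []
        for value in values:
            if value not in index_of:
                index_of[value] = len(merged_values)
                merged_values.append(value)
            remap.append(index_of[value])
        for key in keys:
            merged_keys.append(None if key is None else remap[key])
    return merged_keys, merged_values


def decode(array: DictArray) -> List[Optional[str]]:
    """Expand a toy dictionary array into its logical values."""
    keys, values = array
    return [None if key is None else values[key] for key in keys]


a: DictArray = ([0, 1, 0], ["x", "y"])
b: DictArray = ([0, None, 1], ["y", "z"])

# Ignoring b's values (the behaviour described above) silently changes data.
naive: DictArray = (a[0] + b[0], a[1])
merged = concat_dictionaries([a, b])

print(decode(naive))   # ['x', 'y', 'x', 'x', None, 'y']
print(decode(merged))  # ['x', 'y', 'x', 'y', None, 'z']
```

Here the naive result is merely wrong. If `b` had more values than `a`, some of its keys would point past the end of the retained values list, which corresponds to the out-of-bounds case mentioned in the report.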





[jira] [Created] (ARROW-12425) [Rust] new_null_array doesn't allocate keys buffer for dictionary arrays

2021-04-16 Thread Raphael Taylor-Davies (Jira)
Raphael Taylor-Davies created ARROW-12425:
-

 Summary: [Rust] new_null_array doesn't allocate keys buffer for 
dictionary arrays
 Key: ARROW-12425
 URL: https://issues.apache.org/jira/browse/ARROW-12425
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Raphael Taylor-Davies
Assignee: Raphael Taylor-Davies








[jira] [Created] (ARROW-12424) Add Schema Package

2021-04-16 Thread Matt Topol (Jira)
Matt Topol created ARROW-12424:
--

 Summary: Add Schema Package
 Key: ARROW-12424
 URL: https://issues.apache.org/jira/browse/ARROW-12424
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go, Parquet
Reporter: Matt Topol
Assignee: Matt Topol


Adding the ported code for the Schema module for the Go Parquet library.





[jira] [Created] (ARROW-12423) Codecov badge in main Readme only applies to Rust

2021-04-16 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-12423:
--

 Summary: Codecov badge in main Readme only applies to Rust
 Key: ARROW-12423
 URL: https://issues.apache.org/jira/browse/ARROW-12423
 Project: Apache Arrow
  Issue Type: Task
Reporter: Dominik Moritz


The badge in https://github.com/apache/arrow/blob/master/README.md links to 
https://app.codecov.io/gh/apache/arrow, which seems to only show the coverage 
for the Rust code. 





[jira] [Created] (ARROW-12422) Add castVARCHAR for milliseconds

2021-04-16 Thread Rodrigo Jacomozzi de Bem (Jira)
Rodrigo Jacomozzi de Bem created ARROW-12422:


 Summary: Add castVARCHAR for milliseconds
 Key: ARROW-12422
 URL: https://issues.apache.org/jira/browse/ARROW-12422
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Rodrigo Jacomozzi de Bem








[jira] [Created] (ARROW-12421) [Rust] [DataFusion] topk_query test fails in master

2021-04-16 Thread Andy Grove (Jira)
Andy Grove created ARROW-12421:
--

 Summary: [Rust] [DataFusion] topk_query test fails in master
 Key: ARROW-12421
 URL: https://issues.apache.org/jira/browse/ARROW-12421
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Andy Grove


{code:java}
     Running target/debug/deps/user_defined_plan-6b63acb904117235

running 3 tests
test topk_plan ... ok
test topk_query ... FAILED
test normal_query ... ok

failures:

---- topk_query stdout ----
thread 'topk_query' panicked at 'assertion failed: `(left == right)`
  left: `["+-------------+---------+", "| customer_id | revenue |", "+-------------+---------+", "| paul        | 300     |", "| jorge       | 200     |", "| andy        | 150     |", "+-------------+---------+"]`,
 right: `["++", "||", "++", "++"]`: output mismatch for Topk context. Expectedn
+-------------+---------+
| customer_id | revenue |
+-------------+---------+
| paul        | 300     |
| jorge       | 200     |
| andy        | 150     |
+-------------+---------+Actual:
++
||
++
++
', datafusion/tests/user_defined_plan.rs:133:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}





[jira] [Created] (ARROW-12420) [C++/Dataset] Reading null columns as dictionary no longer possible

2021-04-16 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-12420:


 Summary: [C++/Dataset] Reading null columns as dictionary no 
longer possible
 Key: ARROW-12420
 URL: https://issues.apache.org/jira/browse/ARROW-12420
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 4.0.0
Reporter: Uwe Korn
 Fix For: 4.0.0


Reading a dataset with a dictionary column where some of the files don't 
contain any data for that column (and thus are typed as null) broke with 
https://github.com/apache/arrow/pull/9532. It worked with the 3.0 release 
though and thus I would consider this a regression.

This can be reproduced using the following Python snippet:

{code}
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

table = pa.table({"a": [None, None]})
pq.write_table(table, "test.parquet")
schema = pa.schema([pa.field("a", pa.dictionary(pa.int32(), pa.string()))])
fsds = ds.FileSystemDataset.from_paths(
    paths=["test.parquet"],
    schema=schema,
    format=pa.dataset.ParquetFileFormat(),
    filesystem=pa.fs.LocalFileSystem(),
)
fsds.to_table()
{code}

The exception on master is currently:

{code}
---
ArrowNotImplementedError  Traceback (most recent call last)
 in 
  6 filesystem=pa.fs.LocalFileSystem(),
  7 )
> 8 fsds.to_table()

~/Development/arrow/python/pyarrow/_dataset.pyx in 
pyarrow._dataset.Dataset.to_table()
456 table : Table instance
457 """
--> 458 return self._scanner(**kwargs).to_table()
459 
460 def head(self, int num_rows, **kwargs):

~/Development/arrow/python/pyarrow/_dataset.pyx in 
pyarrow._dataset.Scanner.to_table()
   2887 result = self.scanner.ToTable()
   2888 
-> 2889 return pyarrow_wrap_table(GetResultValue(result))
   2890 
   2891 def take(self, object indices):

~/Development/arrow/python/pyarrow/error.pxi in 
pyarrow.lib.pyarrow_internal_check_status()
139 cdef api int pyarrow_internal_check_status(const CStatus& status) \
140 nogil except -1:
--> 141 return check_status(status)
142 
143 

~/Development/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
116 raise ArrowKeyError(message)
117 elif status.IsNotImplemented():
--> 118 raise ArrowNotImplementedError(message)
119 elif status.IsTypeError():
120 raise ArrowTypeError(message)

ArrowNotImplementedError: Unsupported cast from null to 
dictionary (no available cast function 
for target type)
{code}





[jira] [Created] (ARROW-12419) [Java] flatc is not used in mvn

2021-04-16 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-12419:


 Summary: [Java] flatc is not used in mvn
 Key: ARROW-12419
 URL: https://issues.apache.org/jira/browse/ARROW-12419
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 4.0.0
Reporter: Kazuaki Ishizaki
Assignee: Kazuaki Ishizaki


ARROW-12111 removed the usage of flatc during the build process in mvn. Thus, 
it is not necessary to explicitly download flatc for s390x.





[jira] [Created] (ARROW-12418) 1Z0-1072 PDF - Become Oracle Certified With The Help Of Prepare4test

2021-04-16 Thread Andrew Sharon (Jira)
Andrew Sharon created ARROW-12418:
-

 Summary: 1Z0-1072 PDF - Become Oracle Certified With The Help Of 
Prepare4test
 Key: ARROW-12418
 URL: https://issues.apache.org/jira/browse/ARROW-12418
 Project: Apache Arrow
  Issue Type: Task
Reporter: Andrew Sharon


*Take Up the 1Z0-1072 Exam For a Successful Career!*

To prove your expertise in Oracle Cloud Infrastructure, the best thing you can 
do is take the 1Z0-1072 exam (Oracle Cloud Infrastructure 2019 Architect 
Associate). This would bring instant fame and prove that you are an Oracle 
Cloud expert. The passing score is set by Oracle and is likely to change; 
refer to the Oracle website for the current passing score.

There are many recommended 1Z0-1072 courses for the Oracle Cloud 
Infrastructure 2019 Architect Associate exam, including courses on Oracle 
Cloud services. This knowledge would help you perform well in the 1Z0-1072 
exam. These training programs are offered by Oracle, and the online training 
option lets you train at home. The 
[Oracle Cloud exam|http://prepare4test.com/exam/1z0-1072-dumps/] syllabus 
includes topics such as basic Oracle Cloud Infrastructure. If you fail to pass 
the 1Z0-1072 exam with the required percentage of marks, you can retake it. To 
prepare for the Oracle Cloud exam, you can take various coaching programs from 
Oracle University, including instructor-led and web-based classes; choose 
whichever suits you best. Many 1Z0-1072 practice exams are also available 
online, and taking them helps you understand the exam pattern better.

[!https://i.imgur.com/maE1HKX.jpg!|http://prepare4test.com/exam/1z0-1072-dumps/]
*Why Oracle Cloud Infrastructure 2019 Architect Associate Exam training and 
certification?*

IT professionals who hold Oracle Cloud training and certification have a 
distinct advantage over other IT aspirants. The Oracle 1Z0-1072 certification 
is a valuable, globally recognized credential that proves the skills and 
expertise of IT professionals. Oracle Cloud is a top, highly innovative 
database product, developed to handle the massive and continuously growing 
requirements of modern organizations at lower cost and with high quality 
standards. The Oracle Cloud Infrastructure 2019 Architect Associate 
certification demonstrates the holder's knowledge and skills in creating and 
maintaining an Oracle Cloud environment. It can therefore be considered one of 
the most respected and viable Oracle certifications in the industry. 
Professionals with 1Z0-1072 who already work in the industry benefit by 
becoming eligible for a salary raise and by opening new avenues in the job 
market and career hierarchy.


