[jira] [Created] (ARROW-10764) [Rust] Inline small JSON and CSV test files
Neville Dipale created ARROW-10764:

Summary: [Rust] Inline small JSON and CSV test files
Key: ARROW-10764
URL: https://issues.apache.org/jira/browse/ARROW-10764
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Neville Dipale

Some of our tests use small CSV and JSON files, which we could inline in the code, instead of adding more files to test data.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
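A minimal sketch of what inlining a fixture could look like, using only the Rust standard library. The CSV text lives in a string constant and is wrapped in `std::io::Cursor`, which implements `Read` and `BufRead`, so it can be passed to reader-based code in place of a `File`. The names `INLINE_CSV`, `inline_reader` and `parse_csv_rows` are hypothetical and not part of the Arrow crate, and the naive comma split only stands in for a real CSV reader.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical inline fixture that would otherwise live in a test data file.
const INLINE_CSV: &str = "city,lat,lng\nElgin,57.653484,-3.335724\nStoke,53.002666,-2.179404\n";

/// Builds the in-memory reader that replaces `File::open(...)` in a test.
fn inline_reader() -> Cursor<&'static [u8]> {
    Cursor::new(INLINE_CSV.as_bytes())
}

/// Naive stand-in for a CSV reader: splits every non-empty line on commas.
/// It accepts any `BufRead`, so a `Cursor` over a string behaves like a file.
fn parse_csv_rows<R: BufRead>(reader: R) -> std::io::Result<Vec<Vec<String>>> {
    let mut rows = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if line.is_empty() {
            continue;
        }
        rows.push(line.split(',').map(|field| field.to_string()).collect());
    }
    Ok(rows)
}
```

A test would call `parse_csv_rows(inline_reader())` where it previously opened a file under the test data directory, which keeps the fixture next to the assertions that depend on it.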
[jira] [Created] (ARROW-10763) [Rust] Speed up take kernels
Daniël Heres created ARROW-10763:

Summary: [Rust] Speed up take kernels
Key: ARROW-10763
URL: https://issues.apache.org/jira/browse/ARROW-10763
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Daniël Heres

Speed up take kernels for non-null data.
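The issue does not describe the planned optimization. As an illustration only, the sketch below shows one common way a take (gather) operation can be cheaper for non-null data: when the input carries no validity information, the per-element null bookkeeping is skipped entirely. It works on plain slices rather than Arrow arrays, and `take_i32` is a hypothetical name, not the Arrow kernel.

```rust
/// Gathers `values[i]` for every `i` in `indices`.
///
/// `validity` is `None` when the input has no nulls; otherwise `validity[j]`
/// is `true` when `values[j]` is non-null. The second element of the result
/// is the validity of the output, or `None` when the input had no nulls.
/// Panics if an index is out of bounds.
fn take_i32(
    values: &[i32],
    validity: Option<&[bool]>,
    indices: &[usize],
) -> (Vec<i32>, Option<Vec<bool>>) {
    match validity {
        // Fast path: no nulls, so only the values are gathered.
        None => {
            let taken: Vec<i32> = indices.iter().map(|&i| values[i]).collect();
            (taken, None)
        }
        // General path: gather values and validity side by side.
        Some(valid) => {
            let mut taken = Vec::with_capacity(indices.len());
            let mut taken_valid = Vec::with_capacity(indices.len());
            for &i in indices {
                taken.push(values[i]);
                taken_valid.push(valid[i]);
            }
            (taken, Some(taken_valid))
        }
    }
}
```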
[jira] [Created] (ARROW-10762) Configuration does not provide a mapping for array column
Benjamin Du created ARROW-10762:

Summary: Configuration does not provide a mapping for array column
Key: ARROW-10762
URL: https://issues.apache.org/jira/browse/ARROW-10762
Project: Apache Arrow
Issue Type: Bug
Reporter: Benjamin Du

I tried to use `org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrow` to query a Hive table but got the following error message on array columns:

{{Configuration does not provide a mapping for array column 234}}

The error message makes me wonder whether it is possible to provide a customized configuration for array columns.
[jira] [Created] (ARROW-10761) [Rust] [DataFusion] Add SQL support for referencing fields in structs
Andy Grove created ARROW-10761:

Summary: [Rust] [DataFusion] Add SQL support for referencing fields in structs
Key: ARROW-10761
URL: https://issues.apache.org/jira/browse/ARROW-10761
Project: Apache Arrow
Issue Type: New Feature
Components: Rust - DataFusion
Reporter: Andy Grove
[jira] [Created] (ARROW-10760) [Rust] [DataFusion] Predicate push down does not support joins correctly
Andy Grove created ARROW-10760:

Summary: [Rust] [DataFusion] Predicate push down does not support joins correctly
Key: ARROW-10760
URL: https://issues.apache.org/jira/browse/ARROW-10760
Project: Apache Arrow
Issue Type: Bug
Components: Rust - DataFusion
Reporter: Andy Grove
Fix For: 3.0.0

{code:java}
equijoin_implicit_syntax_with_filter stdout
thread 'equijoin_implicit_syntax_with_filter' panicked at 'Creating physical plan for 'SELECT t1_id, t1_name, t2_name FROM t1, t2 WHERE t1_id > 0 AND t1_id = t2_id AND t2_id < 99 ORDER BY t1_id': Sort: #t1_id ASC NULLS FIRST
  Projection: #t1_id, #t1_name, #t2_name
    Join: t1_id = t2_id
      Filter: #t1_id Gt Int64(0) And #t2_id Lt Int64(99)
        TableScan: t1 projection=Some([0, 1])
      Filter: #t1_id Gt Int64(0) And #t2_id Lt Int64(99)
        TableScan: t2 projection=Some([0, 1]): ArrowError(InvalidArgumentError("Unable to get field named \"t2_id\". Valid fields: [\"t1_id\", \"t1_name\"]"))', datafusion/tests/sql.rs:1262:48
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
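In the plan above, the whole conjunction `t1_id > 0 AND t2_id < 99` is placed above both table scans, so the scan of `t1` is asked for `t2_id`, which it does not have. One way to avoid this, sketched here under assumptions, is to split the filter into its conjuncts and push each one only to the join input whose schema contains every column it references. The types `Conjunct` and `Routed` and the function `route_conjuncts` are hypothetical simplifications for illustration, not DataFusion's optimizer API.

```rust
/// One conjunct of a WHERE clause, reduced to its text and referenced columns.
#[derive(Debug, Clone, PartialEq)]
struct Conjunct {
    text: String,
    columns: Vec<String>,
}

/// Where each conjunct ends up relative to the join.
#[derive(Debug, Default, PartialEq)]
struct Routed {
    left: Vec<Conjunct>,
    right: Vec<Conjunct>,
    above_join: Vec<Conjunct>,
}

/// Convenience constructor for the example.
fn conjunct(text: &str, columns: &[&str]) -> Conjunct {
    Conjunct {
        text: text.to_string(),
        columns: columns.iter().map(|c| c.to_string()).collect(),
    }
}

/// Pushes a conjunct below the join only when one input can evaluate it alone.
/// Conjuncts that need columns from both inputs stay at or above the join.
fn route_conjuncts(
    conjuncts: Vec<Conjunct>,
    left_schema: &[&str],
    right_schema: &[&str],
) -> Routed {
    let mut routed = Routed::default();
    for c in conjuncts {
        let fits_left = c.columns.iter().all(|col| left_schema.contains(&col.as_str()));
        let fits_right = c.columns.iter().all(|col| right_schema.contains(&col.as_str()));
        if fits_left {
            routed.left.push(c);
        } else if fits_right {
            routed.right.push(c);
        } else {
            routed.above_join.push(c);
        }
    }
    routed
}
```

For the failing query, this routing would send `t1_id > 0` to the `t1` side, `t2_id < 99` to the `t2` side, and leave `t1_id = t2_id` with the join.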
[jira] [Created] (ARROW-10759) [Rust][DataFusion] Implement support for casting string to date in sql expressions
Yordan Pavlov created ARROW-10759:

Summary: [Rust][DataFusion] Implement support for casting string to date in sql expressions
Key: ARROW-10759
URL: https://issues.apache.org/jira/browse/ARROW-10759
Project: Apache Arrow
Issue Type: Improvement
Components: Rust - DataFusion
Affects Versions: 2.0.0
Reporter: Yordan Pavlov

If DataFusion had support for creating date literals using a cast expression such as CAST('2019-01-01' AS DATE), this would allow direct (and therefore more efficient) comparison of date columns to scalar values, compared to representing dates as strings and then resorting to string comparison.

I already have a basic implementation that works; I just have to add some more tests.
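To illustrate why a date literal makes comparison cheaper, the sketch below converts a `YYYY-MM-DD` string into days since 1970-01-01, the representation Arrow uses for Date32, so a comparison becomes a single integer comparison. This is not the implementation mentioned in the issue: `parse_date32` and `days_from_civil` are hypothetical names, the day-count arithmetic is the widely used civil-calendar formula, and the validation is deliberately loose.

```rust
use std::convert::TryFrom;

/// Days between 1970-01-01 and the given proleptic Gregorian date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as months 10 and 11 of the previous year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = if month > 2 { month - 3 } else { month + 9 };
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Parses `YYYY-MM-DD` into days since the UNIX epoch.
/// Only range-checks month and day; it does not reject dates like 02-31.
fn parse_date32(s: &str) -> Option<i32> {
    let mut parts = s.split('-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    i32::try_from(days_from_civil(year, month, day)).ok()
}
```

With the literal converted once at planning time, each row of a date column can be compared against the resulting integer instead of against a string.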
[jira] [Created] (ARROW-10758) Arrow Dataset Loading CSV format file from S3
Lynch created ARROW-10758:

Summary: Arrow Dataset Loading CSV format file from S3
Key: ARROW-10758
URL: https://issues.apache.org/jira/browse/ARROW-10758
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 2.0.0
Reporter: Lynch

I am using `S3FileSystem` along with `CsvFileFormat` in Arrow dataset to load all CSV files under an S3 bucket. The main test code is below:

{code:java}
auto format = std::make_shared<CsvFileFormat>();
string output_path;
auto s3_file_system = arrow::fs::FileSystemFromUri("s3://test-csv-bucket", &output_path).ValueOrDie();

FileSystemFactoryOptions options;
options.partition_base_dir = output_path;

arrow::fs::FileSelector _file_selector;

ASSERT_OK_AND_ASSIGN(auto factory, FileSystemDatasetFactory::Make(s3_file_system, _file_selector, format, options));
ASSERT_OK_AND_ASSIGN(auto schema, factory->Inspect());
ASSERT_OK_AND_ASSIGN(auto dataset, factory->Finish(schema));
{code}

However, the call `ASSERT_OK_AND_ASSIGN(auto schema, factory->Inspect());` appears to throw an exception while reading a file from the S3 bucket. The stack trace is as follows:

{code:java}
__pthread_kill 0x7fff70dc033a
pthread_kill 0x7fff70e7ce60
abort 0x7fff70d47808
malloc_vreport 0x7fff70e3d50b
malloc_report 0x7fff70e4040f
Aws::Free(void*) AWSMemory.cpp:97
std::__1::enable_if > >::value, void>::type Aws::Delete > >(std::__1::basic_iostream >*) AWSMemory.h:119
Aws::Utils::Stream::ResponseStream::ReleaseStream() ResponseStream.cpp:62
Aws::Utils::Stream::ResponseStream::~ResponseStream() ResponseStream.cpp:54
Aws::Utils::Stream::ResponseStream::~ResponseStream() ResponseStream.cpp:53
Aws::S3::Model::GetObjectResult::~GetObjectResult() GetObjectResult.h:30
Aws::S3::Model::GetObjectResult::~GetObjectResult() GetObjectResult.h:30
arrow::fs::(anonymous namespace)::ObjectInputFile::ReadAt(long long, long long, void*) s3fs.cc:724
arrow::fs::(anonymous namespace)::ObjectInputFile::ReadAt(long long, long long) s3fs.cc:735
arrow::dataset::OpenReader(arrow::dataset::FileSource const&, arrow::dataset::CsvFileFormat const&, std::__1::shared_ptr const&, arrow::MemoryPool*) file_csv.cc:119
arrow::dataset::CsvFileFormat::Inspect(arrow::dataset::FileSource const&) const file_csv.cc:182
arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions) discovery.cc:219
arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions) discovery.cc:41
{code}

Does Arrow dataset support reading CSV/Parquet/IPC files from `S3FileSystem`?
[jira] [Created] (ARROW-10757) [Rust] [CI] Sporadic failures due to disk filling up
Neville Dipale created ARROW-10757:

Summary: [Rust] [CI] Sporadic failures due to disk filling up
Key: ARROW-10757
URL: https://issues.apache.org/jira/browse/ARROW-10757
Project: Apache Arrow
Issue Type: Bug
Components: CI, Rust
Reporter: Neville Dipale
Assignee: Neville Dipale

CI is failing because the disk is filling up, which affects almost all Rust PRs.
[jira] [Created] (ARROW-10756) Fix redundant clone
Daniël Heres created ARROW-10756:

Summary: Fix redundant clone
Key: ARROW-10756
URL: https://issues.apache.org/jira/browse/ARROW-10756
Project: Apache Arrow
Issue Type: Improvement
Reporter: Daniël Heres

The redundant clone was introduced in a PR that was merged around the same time.