[jira] [Created] (ARROW-9640) [C++][Gandiva] Implement round() for integers and long integers

2020-08-03 Thread Sagnik Chakraborty (Jira)
Sagnik Chakraborty created ARROW-9640:
-

 Summary: [C++][Gandiva] Implement round() for integers and long 
integers
 Key: ARROW-9640
 URL: https://issues.apache.org/jira/browse/ARROW-9640
 Project: Apache Arrow
  Issue Type: Task
Reporter: Sagnik Chakraborty
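The ticket body is empty; as a reference for what SQL-style round() on integers could mean (an assumption on my part, not Gandiva's confirmed specification), here is a Python sketch with half-away-from-zero tie-breaking and support for negative precision:

```python
def round_int(x: int, ndigits: int = 0) -> int:
    """SQL-style round for integers (hypothetical reference semantics).

    Non-negative ndigits leave the value unchanged; negative ndigits round
    to tens, hundreds, ..., with ties rounded away from zero.
    """
    if ndigits >= 0:
        return x
    scale = 10 ** (-ndigits)
    q, r = divmod(abs(x), scale)
    if 2 * r >= scale:  # round half away from zero
        q += 1
    return q * scale if x >= 0 else -q * scale
```

Under these semantics round_int(1250, -2) gives 1300, whereas Python's built-in round (banker's rounding) gives 1200; which tie-breaking rule Gandiva should use is exactly the kind of detail this ticket would settle.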






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9639) [Ruby] Add dependency version check

2020-08-03 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-9639:
---

 Summary: [Ruby] Add dependency version check
 Key: ARROW-9639
 URL: https://issues.apache.org/jira/browse/ARROW-9639
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Ruby
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9638) [C++][Compute] Implement mode(most frequent number) kernel

2020-08-03 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-9638:
---

 Summary: [C++][Compute] Implement mode(most frequent number) kernel
 Key: ARROW-9638
 URL: https://issues.apache.org/jira/browse/ARROW-9638
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Yibo Cai
Assignee: Yibo Cai
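The issue body is empty; the intended semantics can be illustrated in Python (tie-breaking toward the smallest value is my assumption, matching a common convention, not necessarily the kernel's final specification):

```python
from collections import Counter

def mode(values):
    """Most frequent non-null value; ties broken toward the smallest value."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    # Rank by (count descending, value ascending).
    return min(counts, key=lambda v: (-counts[v], v))

print(mode([4, 1, 2, 2, None, 2]))  # 2
```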






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9637) Speed degradation with categoricals

2020-08-03 Thread Larry Parker (Jira)
Larry Parker created ARROW-9637:
---

 Summary: Speed degradation with categoricals
 Key: ARROW-9637
 URL: https://issues.apache.org/jira/browse/ARROW-9637
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Larry Parker


I have noticed some major speed degradation when using categorical data types. 
For example, consider a Parquet file with 1 million rows, where a query sums 10 
float columns and groups by two columns (one a date column and one a category 
column). The cardinality of the category seems to have a major effect. When 
grouping on a category column of cardinality 10, performance is decent (the 
query runs in 150 ms), but with a cardinality of 100, the query runs in 10 
seconds. If I switch over to my Parquet file that does *not* have categorical 
columns, the same query that took 10 seconds with categoricals runs in 350 ms.

I would be happy to post the Pandas code that I'm using (including how I'm 
creating the Parquet file), but I first wanted to report this and see if it's a 
known issue.

Thanks.
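One plausible explanation (unconfirmed for this report; the data shape below is made up) is pandas' historical groupby default of observed=False for categorical groupers, which materializes every (group, category) combination rather than only the observed ones, so cost scales with category cardinality:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=100).repeat(n // 100),
    "cat": pd.Categorical(rng.integers(0, 100, size=n).astype(str)),
    "x": rng.standard_normal(n),
})

# observed=False materializes every (date, category) combination, even pairs
# that never occur in the data; observed=True keeps only observed pairs.
full = df.groupby(["date", "cat"], observed=False)["x"].sum()
observed = df.groupby(["date", "cat"], observed=True)["x"].sum()
print(len(full), len(observed))
```

With 100 dates and 100 categories, the observed=False result has all 10,000 pairs while observed=True keeps only the pairs present in the data; whether this is actually what the reporter is hitting would need their code to confirm.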



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9636) Error when using 'LZO' compression in write_table

2020-08-03 Thread Pierre (Jira)
Pierre created ARROW-9636:
-

 Summary: Error when using 'LZO' compression in write_table
 Key: ARROW-9636
 URL: https://issues.apache.org/jira/browse/ARROW-9636
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Pierre






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9635) Can't install red-arrow 0.1.7.1

2020-08-03 Thread Natasha (Jira)
Natasha created ARROW-9635:
--

 Summary: Can't install red-arrow 0.1.7.1 
 Key: ARROW-9635
 URL: https://issues.apache.org/jira/browse/ARROW-9635
 Project: Apache Arrow
  Issue Type: Bug
  Components: Ruby
Affects Versions: 0.17.1
Reporter: Natasha


I need some help with this error. This was working fine 3 days ago; now I'm 
getting this:

Dependencies installed:
{code}
libarrow17 libarrow-glib17 gir1.2-arrow-1.0 \
libparquet17 libparquet-glib17 gir1.2-parquet-1.0 \
libarrow-dev libarrow-glib-dev libparquet-dev libparquet-glib-dev
{code}

{code}
ERROR:  Error installing red-parquet:
ERROR: Failed to build gem native extension.

current directory: /usr/local/bundle/gems/red-arrow-0.17.1/ext/arrow
/usr/local/bin/ruby -r ./siteconf20200803-9-v6ubfi.rb extconf.rb
checking --enable-debug-build option... no
checking C++ compiler... g++
checking g++ version... 6.3 (gnu++14)
mkmf-gnome2 is deprecated. Use mkmf-gnome instead.
checking for --enable-debug-build option... no
checking for -Wall option to compiler... yes
checking for -Waggregate-return option to compiler... yes
checking for -Wcast-align option to compiler... yes
checking for -Wextra option to compiler... yes
checking for -Wformat=2 option to compiler... yes
checking for -Winit-self option to compiler... yes
checking for -Wlarger-than-65500 option to compiler... yes
checking for -Wmissing-declarations option to compiler... yes
checking for -Wmissing-format-attribute option to compiler... yes
checking for -Wmissing-include-dirs option to compiler... yes
checking for -Wmissing-noreturn option to compiler... yes
checking for -Wmissing-prototypes option to compiler... yes
checking for -Wnested-externs option to compiler... yes
checking for -Wold-style-definition option to compiler... yes
checking for -Wpacked option to compiler... yes
checking for -Wp,-D_FORTIFY_SOURCE=2 option to compiler... yes
checking for -Wpointer-arith option to compiler... yes
checking for -Wundef option to compiler... yes
checking for -Wout-of-line-declaration option to compiler... no
checking for -Wunsafe-loop-optimizations option to compiler... yes
checking for -Wwrite-strings option to compiler... yes
checking for Homebrew... no
checking for arrow... yes
checking for arrow-glib... yes
creating Makefile

current directory: /usr/local/bundle/gems/red-arrow-0.17.1/ext/arrow
make "DESTDIR=" clean

current directory: /usr/local/bundle/gems/red-arrow-0.17.1/ext/arrow
make "DESTDIR="
compiling arrow.cpp
compiling converters.cpp
In file included from converters.cpp:20:0:
converters.hpp:258:19: error: ‘arrow::Status red_arrow::ListArrayValueConverter::Visit(const arrow::UnionArray&)’ marked ‘override’, but does not override
   arrow::Status Visit(const arrow::TYPE ## Array& array) override {   \
                 ^
converters.hpp:288:5: note: in expansion of macro ‘VISIT’
     VISIT(Union)
     ^
converters.hpp:360:19: error: ‘arrow::Status red_arrow::StructArrayValueConverter::Visit(const arrow::UnionArray&)’ marked ‘override’, but does not override
   arrow::Status Visit(const arrow::TYPE ## Array& array) override {   \
                 ^
converters.hpp:391:5: note: in expansion of macro ‘VISIT’
     VISIT(Union)
     ^
converters.hpp: In member function ‘VALUE red_arrow::StructArrayValueConverter::convert(const arrow::StructArray&, int64_t)’:
converters.hpp:342:48: warning: ‘int arrow::DataType::num_children() const’ is deprecated: Use num_fields() [-Wdeprecated-declarations]
     const auto n = struct_type->num_children();
                                              ^
In file included from /usr/include/arrow/array/array_base.h:31:0,
                 from /usr/include/arrow/array.h:25,
                 from /usr/include/arrow/api.h:22,
                 from red-arrow.hpp:22,
                 from converters.hpp:20,
                 from converters.cpp:20:
/usr/include/arrow/type.h:139:7: note: declared here
   int num_children() const { return num_fields(); }
       ^~~~
In file included from converters.cpp:20:0:
converters.hpp:344:53: warning: ‘const std::shared_ptr<arrow::DataType>& arrow::DataType::child(int) const’ is deprecated: Use field(i) [-Wdeprecated-declarations]
     const auto field_type = struct_type->child(i).get();
                                                 ^
In file included from /usr/include/arrow/array/array_base.h:31:0,
                 from /usr/include/arrow/array.h:25,
                 from /usr/include/arrow/api.h:22,
                 from red-arrow.hpp:22,
                 from converters.hpp:20,
                 from converters.cpp:20:
/usr/include/arrow/type.h:127:33: note: declared here
   const std::shared_ptr<DataType>& child(int i) const { return field(i); }
                                    ^
In file included from converters.cpp:20:0:
converters.hpp: At global scope:
converters.hpp:451:19: error: ‘arrow::Status
{code}

[jira] [Created] (ARROW-9634) [C++][Python] Restore non-UTC time zones when reading Parquet file that was previously Arrow

2020-08-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9634:
---

 Summary: [C++][Python] Restore non-UTC time zones when reading 
Parquet file that was previously Arrow
 Key: ARROW-9634
 URL: https://issues.apache.org/jira/browse/ARROW-9634
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Wes McKinney
 Fix For: 2.0.0


This was reported on the mailing list

{code}
In [20]: df = pd.DataFrame({'a': pd.Series(np.arange(0, 1, 1000)).astype(pd.DatetimeTZDtype('ns', 'America/Los_Angeles'))})

In [21]: t = pa.table(df)

In [22]: t
Out[22]:
pyarrow.Table
a: timestamp[ns, tz=America/Los_Angeles]

In [23]: pq.write_table(t, 'test.parquet')

In [24]: pq.read_table('test.parquet')
Out[24]:
pyarrow.Table
a: timestamp[us, tz=UTC]

In [25]: pq.read_table('test.parquet')[0]
Out[25]:
[
  [
    1970-01-01 00:00:00.00,
    1970-01-01 00:00:00.01,
    1970-01-01 00:00:00.02,
    1970-01-01 00:00:00.03,
    1970-01-01 00:00:00.04,
    1970-01-01 00:00:00.05,
    1970-01-01 00:00:00.06,
    1970-01-01 00:00:00.07,
    1970-01-01 00:00:00.08,
    1970-01-01 00:00:00.09
  ]
]
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9633) [C++] Do not toggle memory mapping globally in LocalFileSystem

2020-08-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9633:
---

 Summary: [C++] Do not toggle memory mapping globally in 
LocalFileSystem
 Key: ARROW-9633
 URL: https://issues.apache.org/jira/browse/ARROW-9633
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 2.0.0


In the context of the Datasets API, some file formats benefit greatly from 
memory mapping (such as Arrow IPC files), while others benefit less. 
Additionally, in some scenarios memory mapping can fail, for example on 
network-attached storage devices. Since a single filesystem may be used to read 
different kinds of files, both with and without memory mapping, and since the 
Datasets API should be able to fall back on non-memory-mapped reads when an 
attempt to memory map fails, it would make sense to have a non-global option 
for this:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/localfs.h

I would suggest adding a new filesystem API, something like 
{{OpenMappedInputFile}}, with options to control the behavior when memory 
mapping is not possible. These options might include:

* Falling back on a normal RandomAccessFile
* Reading the entire file into memory (or even tmpfs?) and then wrapping it in 
a BufferReader
* Failing
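The proposed control flow can be sketched in Python with the stdlib mmap module (the function name and the on_failure options mirror the ticket's proposal; this is an illustration, not the C++ API):

```python
import io
import mmap

def open_mapped_input_file(path, on_failure="fallback"):
    """Open `path` memory-mapped, with a per-call policy when mapping fails."""
    f = open(path, "rb")
    try:
        # The mapping stays valid for reads; this sketch keeps `f` open alongside it.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    except (OSError, ValueError):
        if on_failure == "fallback":
            return f                 # plain random-access file object
        if on_failure == "buffer":
            data = f.read()
            f.close()
            return io.BytesIO(data)  # whole file read into memory
        f.close()
        raise
```

The key design point is that the policy travels with the call rather than living in filesystem-global state, so two concurrent readers of the same filesystem can make different choices.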



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9632) add a func "new" for ExecutionContextSchemaProvider

2020-08-03 Thread qingcheng wu (Jira)
qingcheng wu created ARROW-9632:
---

 Summary: add a func "new" for ExecutionContextSchemaProvider
 Key: ARROW-9632
 URL: https://issues.apache.org/jira/browse/ARROW-9632
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 2.0.0
Reporter: qingcheng wu


I use ExecutionContextSchemaProvider in an outside app, so I added the keyword 
"pub" to ExecutionContextSchemaProvider and added a new func "new" for it.

I also added the keyword "pub" to build_schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9631) [Rust] Arrow crate should not depend on flight

2020-08-03 Thread Andy Grove (Jira)
Andy Grove created ARROW-9631:
-

 Summary: [Rust] Arrow crate should not depend on flight
 Key: ARROW-9631
 URL: https://issues.apache.org/jira/browse/ARROW-9631
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 2.0.0


It seems that the dependencies are inverted. The core arrow crate should 
contain the array data structures and compute kernels and should not depend on 
the flight crate, which contains protocols and brings in many dependencies.

If we have code for converting between arrow types and flight types then that 
code should live in the flight crate.
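The intended direction can be sketched as Cargo manifest fragments (paths and versions here are illustrative, not the actual manifests):

```toml
# rust/arrow/Cargo.toml: the core crate keeps the array data structures and
# compute kernels, and does not mention flight at all.
[package]
name = "arrow"

# rust/arrow-flight/Cargo.toml: flight depends on arrow, pulling in the
# protocol machinery only for users who ask for it.
[package]
name = "arrow-flight"

[dependencies]
arrow = { path = "../arrow" }
tonic = "0.3"  # illustrative version
```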



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9630) [Go] Support JSON reader/writer

2020-08-03 Thread Ryo Okubo (Jira)
Ryo Okubo created ARROW-9630:


 Summary: [Go] Support JSON reader/writer
 Key: ARROW-9630
 URL: https://issues.apache.org/jira/browse/ARROW-9630
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Ryo Okubo


Is there any plan to support a JSON reader and/or writer in the Go 
implementation, like the existing [CSV 
R/W|https://github.com/apache/arrow/blob/master/docs/source/status.rst#third-party-data-formats]?

The [arrjson 
package|https://github.com/apache/arrow/tree/master/go/arrow/internal/arrjson] 
seems to support it, but it's an internal package.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9629) [Python] Kartothek integration tests failing due to missing freezegun module

2020-08-03 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-9629:


 Summary: [Python] Kartothek integration tests failing due to 
missing freezegun module
 Key: ARROW-9629
 URL: https://issues.apache.org/jira/browse/ARROW-9629
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Joris Van den Bossche


See eg https://github.com/ursa-labs/crossbow/runs/939266052

{code}
==================================== ERRORS ====================================
________________________ ERROR collecting test session _________________________
/opt/conda/envs/arrow/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:967: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:677: in _load_unlocked
    ???
/opt/conda/envs/arrow/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:170: in exec_module
    exec(co, module.__dict__)
tests/cli/conftest.py:11: in <module>
    from freezegun import freeze_time
E   ModuleNotFoundError: No module named 'freezegun'
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9628) [Rust][DataFusion] Clippy PR test failing intermittently on Rust / AMD64 MacOS

2020-08-03 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-9628:
--

 Summary: [Rust][DataFusion] Clippy PR test failing intermittently 
on Rust / AMD64 MacOS 
 Key: ARROW-9628
 URL: https://issues.apache.org/jira/browse/ARROW-9628
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb


As reported by Jorge, on 

https://github.com/apache/arrow/commit/aa6889a74c57d6faea0d27ea8013d9b0c7ef809a#commitcomment-41124305

"I believe that this is somehow interacting with the caching system and 
sometimes failing the build of clippy. E.g. this build is failing for Mac OS, 
and it hits the cache: https://github.com/apache/arrow/runs/937976656"

{code}
  Downloaded heck v0.3.1
  Downloaded aho-corasick v0.7.13
  Downloaded fnv v1.0.7
  Downloaded futures-io v0.3.5
  Downloaded base64 v0.11.0
  Downloaded dirs v1.0.5
  Downloaded async-stream-impl v0.2.1
  Downloaded async-stream v0.2.1
  Downloaded anyhow v1.0.32
  Downloaded atty v0.2.14
  Downloaded num-integer v0.1.43
   Compiling arrow-flight v2.0.0-SNAPSHOT 
(/Users/runner/work/arrow/arrow/rust/arrow-flight)
error[E0463]: can't find crate for `prost_derive` which `tonic_build` depends on
  --> arrow-flight/build.rs:36:9
   |
36 | tonic_build::compile_protos("../../format/Flight.proto")?;
   | ^^^ can't find crate

error: aborting due to previous error
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9627) JVM failed when use gandiva udf with dynamic libraries

2020-08-03 Thread Leo89 (Jira)
Leo89 created ARROW-9627:


 Summary: JVM failed when use gandiva udf with dynamic libraries
 Key: ARROW-9627
 URL: https://issues.apache.org/jira/browse/ARROW-9627
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva, Java
 Environment: OS:Centos7.4
llvm:7.0.1
jdk:1.8.0_162
arrow:1.0.0
Reporter: Leo89


Hi there,

Recently I have been trying to add a UDF that uses a dynamic link library. It 
compiles and the tests run fine in C++, but when I call the UDF from Java, the 
JVM crashes with errors.

Steps to reproduce the issue:

1 Prepare the dynamic library 'libmytest.so'
{code:java}
#ifndef MYTEST_H
#define MYTEST_H
#ifdef __cplusplus
extern "C" {
#endif
float testSim();
#ifdef __cplusplus
}
#endif
#endif
{code}

2 Add simple code for the UDF in 'string_ops.cc'
{code:java}
FORCE_INLINE
gdv_float32 test_sim_binary_binary(gdv_int64 context, const char* left,
                                   gdv_int32 left_len, const char* right,
                                   gdv_int32 right_len) {
  float sim = testSim();
  return sim;
}
{code}

3 Add the function details in the function registry file 
'function_registry_string.cc'
{code:java}
NativeFunction("test_sim", {}, DataTypeVector{binary(), binary()}, float32(),
               kResultNullIfNull, "sim_binary_binary",
               NativeFunction::kNeedsContext | NativeFunction::kCanReturnErrors),
{code}

4 Create test functions

5 Add the link to CMakeLists.txt

6 Compile and test

7 Write a Java demo to call the UDF
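Not part of the report, but a quick isolation check (a debugging sketch, Python ctypes; the library path is the ticket's example): if the helper library cannot be loaded with global symbol visibility, Gandiva's LLVM JIT cannot resolve testSim when the expression runs, which is one common way to bring the JVM down.

```python
import ctypes

def call_test_sim(path="./libmytest.so"):
    """Load the UDF's helper library with global symbol visibility and call testSim().

    If this fails outside the JVM too, the problem is likely symbol
    resolution rather than anything Java-specific.
    """
    lib = ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    lib.testSim.restype = ctypes.c_float
    return float(lib.testSim())
```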



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9626) JVM failed when use gandiva udf with dynamic libraries

2020-08-03 Thread Leo89 (Jira)
Leo89 created ARROW-9626:


 Summary: JVM failed when use gandiva udf with dynamic libraries
 Key: ARROW-9626
 URL: https://issues.apache.org/jira/browse/ARROW-9626
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva, Java
 Environment: OS:Centos7.4
llvm:7.0.1
jdk:1.8.0_162
arrow:1.0.0
Reporter: Leo89
 Attachments: hs_err_pid28288.log

Hi there,

Recently I have been trying to add a UDF that uses a dynamic link library. It 
compiles and the tests run fine in C++, but when I call the UDF from Java, the 
JVM crashes with errors.

Steps to reproduce the issue:

1 Prepare the dynamic library 'libmytest.so'
{code:java}
#ifndef MYTEST_H
#define MYTEST_H
#ifdef __cplusplus
extern "C" {
#endif
float testSim();
#ifdef __cplusplus
}
#endif
#endif
{code}

2 Add simple code for the UDF in 'string_ops.cc'
{code:java}
FORCE_INLINE
gdv_float32 test_sim_binary_binary(gdv_int64 context, const char* left,
                                   gdv_int32 left_len, const char* right,
                                   gdv_int32 right_len) {
  float sim = testSim();
  return sim;
}
{code}

3 Add the function details in the function registry file 
'function_registry_string.cc'
{code:java}
NativeFunction("test_sim", {}, DataTypeVector{binary(), binary()}, float32(),
               kResultNullIfNull, "sim_binary_binary",
               NativeFunction::kNeedsContext | NativeFunction::kCanReturnErrors),
{code}

4 Create test functions

5 Add the link to CMakeLists.txt

6 Compile and test

7 Write a Java demo to call the UDF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9625) JVM failed when use gandiva udf with dynamic libraries

2020-08-03 Thread Leo89 (Jira)
Leo89 created ARROW-9625:


 Summary: JVM failed when use gandiva udf with dynamic libraries
 Key: ARROW-9625
 URL: https://issues.apache.org/jira/browse/ARROW-9625
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva, Java
 Environment: OS:Centos7.4
llvm:7.0.1
jdk:1.8.0_162
arrow:1.0.0
Reporter: Leo89
 Attachments: hs_err_pid28288.log

Hi there,

Recently I have been trying to add a UDF that uses a dynamic link library. It 
compiles and the tests run fine in C++, but when I call the UDF from Java, the 
JVM crashes with errors.

Steps to reproduce the issue:

1 Prepare the dynamic library 'libmytest.so'
{code:java}
#ifndef MYTEST_H
#define MYTEST_H
#ifdef __cplusplus
extern "C" {
#endif
float testSim();
#ifdef __cplusplus
}
#endif
#endif
{code}

2 Add simple code for the UDF in 'string_ops.cc'
{code:java}
FORCE_INLINE
gdv_float32 test_sim_binary_binary(gdv_int64 context, const char* left,
                                   gdv_int32 left_len, const char* right,
                                   gdv_int32 right_len) {
  float sim = testSim();
  return sim;
}
{code}

3 Add the function details in the function registry file 
'function_registry_string.cc'
{code:java}
NativeFunction("test_sim", {}, DataTypeVector{binary(), binary()}, float32(),
               kResultNullIfNull, "sim_binary_binary",
               NativeFunction::kNeedsContext | NativeFunction::kCanReturnErrors),
{code}

4 Create test functions

5 Add the link to CMakeLists.txt

6 Compile and test

7 Write a Java demo to call the UDF

[^hs_err_pid28288.log]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9624) JVM failed when use gandiva udf with dynamic libraries

2020-08-03 Thread Leo89 (Jira)
Leo89 created ARROW-9624:


 Summary: JVM failed when use gandiva udf with dynamic libraries
 Key: ARROW-9624
 URL: https://issues.apache.org/jira/browse/ARROW-9624
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva, Java
 Environment: OS:Centos7.4
llvm:7.0.1
jdk:1.8.0_162
arrow:1.0.0
Reporter: Leo89
 Attachments: hs_err_pid28288.log

Hi there,

Recently I have been trying to add a UDF that uses a dynamic link library. It 
compiles and the tests run fine in C++, but when I call the UDF from Java, the 
JVM crashes with errors.

Steps to reproduce the issue:

1 Prepare the dynamic library 'libmytest.so'
{code:java}
#ifndef MYTEST_H
#define MYTEST_H
#ifdef __cplusplus
extern "C" {
#endif
float testSim();
#ifdef __cplusplus
}
#endif
#endif
{code}

2 Add simple code for the UDF in 'string_ops.cc'
{code:java}
FORCE_INLINE
gdv_float32 test_sim_binary_binary(gdv_int64 context, const char* left,
                                   gdv_int32 left_len, const char* right,
                                   gdv_int32 right_len) {
  float sim = testSim();
  return sim;
}
{code}

3 Add the function details in the function registry file 
'function_registry_string.cc'
{code:java}
NativeFunction("test_sim", {}, DataTypeVector{binary(), binary()}, float32(),
               kResultNullIfNull, "sim_binary_binary",
               NativeFunction::kNeedsContext | NativeFunction::kCanReturnErrors),
{code}

4 Create test functions

5 Add the link to CMakeLists.txt

6 Compile and test

7 Write a Java demo to call the UDF

[^hs_err_pid28288.log]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9623) Performance difference between pc.multiply vs pd.multiply

2020-08-03 Thread H G (Jira)
H G created ARROW-9623:
--

 Summary: Performance difference between pc.multiply vs pd.multiply
 Key: ARROW-9623
 URL: https://issues.apache.org/jira/browse/ARROW-9623
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 1.0.0
 Environment: Windows
Pyarrow 1.0.0
Reporter: H G


Wanted to report the performance difference observed between Pandas and Pyarrow.

```
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.compute as pc

df = pd.DataFrame(np.random.randn(1))
%timeit -n 5 -r 5 df.multiply(df)

table = pa.Table.from_pandas(df)
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
```

Results:
```
%timeit -n 5 -r 5 df.multiply(df)
374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
```

```
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)