Re: Re: [DISCUSS][C++][Gandiva]About Secondary Cache

2024-05-26 Thread Sutou Kouhei
Hi,

> Would you kindly give me more info about its application
> scenario to make sure we got the point.

Could you ask the author about it on the issue?
https://github.com/apache/arrow/issues/35201

> By the way, if there's any plan to merge this code to
> official release?

The author closed the PR. So there is no plan.


Thanks,
-- 
kou

In 
 

  "Re: [DISCUSS][C++][Gandiva]About Secondary Cache" on Sun, 26 May 2024 
22:23:39 +,
  即 云  wrote:

>   Thank you, Kou. Every time we need help, you are always there.
>   We are actually trying to integrate Arrow into our DB engine (Java 
> based) and have encountered some challenges. One of them is about caching. From 
> the introduction to the secondary cache 
> (https://github.com/apache/arrow/pull/35447/files), it serves to persist 
> cached data across restarts. Would you kindly give me more info about its 
> application scenario, to make sure we got the point?
>   By the way, is there any plan to merge this code into an official 
> release?
> 
>  Thanks again.
> 
>  NaiYan.
> 
> From: Sutou Kouhei 
> Sent: May 26, 2024 20:30
> To: user@arrow.apache.org 
> Subject: Re: [DISCUSS][C++][Gandiva]About Secondary Cache
> 
> Hi,
> 
> It's not merged but there is another cache related
> improvement that is included in 16.0.0:
> https://github.com/apache/arrow/pull/40041
> 
> 
> 
> Thanks,
> --
> kou
> 
> In
>  
> 
>   "[DISCUSS][C++][Gandiva]About Secondary Cache" on Sun, 26 May 2024 14:00:54 
> +,
>   即 云  wrote:
> 
>> Greetings,
>> I noted "secondary cache" has been added to Arrow 
>> (https://github.com/apache/arrow/pull/35447/files). I was wondering if this 
>> has been released officially? And if so, which version of Arrow is available?
>>   Thanks in advance.
>>
>> NaiYan
>> GH-35201: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code 
>> by schavan6 ・ Pull Request #35447 ・ 
>> apache/arrow<https://github.com/apache/arrow/pull/35447/files>
>> Rationale for this change: Arrow Gandiva has a primary cache, but this cache 
>> doesn't persist across restarts. What changes are included in this PR? 
>> Integrate a new API in project and filter and ...
>> github.com
>>


Re: [DISCUSS][C++][Gandiva]About Secondary Cache

2024-05-26 Thread Sutou Kouhei
Hi,

It's not merged but there is another cache related
improvement that is included in 16.0.0:
https://github.com/apache/arrow/pull/40041



Thanks,
-- 
kou

In 
 

  "[DISCUSS][C++][Gandiva]About Secondary Cache" on Sun, 26 May 2024 14:00:54 
+,
  即 云  wrote:

> Greetings,
> I noted "secondary cache" has been added to Arrow 
> (https://github.com/apache/arrow/pull/35447/files). I was wondering if this 
> has been released officially? And if so, which version of Arrow is available?
>   Thanks in advance.
> 
> NaiYan
> GH-35201: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code 
> by schavan6 ・ Pull Request #35447 ・ 
> apache/arrow
> Rationale for this change: Arrow Gandiva has a primary cache, but this cache 
> doesn't persist across restarts. What changes are included in this PR? 
> Integrate a new API in project and filter and ...
> github.com
> 


Re: Re: Re: [DISCUSS][C++][JNI] libgandiva_jni.so fails to run on ARM platform (but compilation successful)

2024-05-25 Thread Sutou Kouhei
001 in ?? ()
> #15 0x in ?? ()
> 
> ## IDEA call stack
> make:238, Projector (org.apache.arrow.gandiva.evaluator)
> make:190, Projector (org.apache.arrow.gandiva.evaluator)
> make:95, Projector (org.apache.arrow.gandiva.evaluator)
> build:87, NativeProjector (com.bigknow.sabot.op.llvm)
> build:84, NativeProjectorBuilder (com.bigknow.sabot.op.llvm)
> setupFinish:185, SplitStageExecutor (com.bigknow.exec.expr)
> setupProjector:221, SplitStageExecutor (com.bigknow.exec.expr)
> projectorSetup:351, ExpressionSplitter (com.bigknow.exec.expr)
> setupProjector:562, ExpressionSplitter (com.bigknow.exec.expr)
> setupProjector:555, ExpressionSplitter (com.bigknow.exec.expr)
> setupProjector:159, CoercionReader (com.bigknow.exec.store)
> newSchema:136, CoercionReader (com.bigknow.exec.store)
> setup:119, CoercionReader (com.bigknow.exec.store)
> setupReaderAsCorrectUser:349, ScanOperator (com.bigknow.sabot.op.scan)
> setupReader:340, ScanOperator (com.bigknow.sabot.op.scan)
> setup:304, ScanOperator (com.bigknow.sabot.op.scan)
> setup:595, SmartOp$SmartProducer (com.bigknow.sabot.driver)
> visitProducer:80, Pipe$SetupVisitor (com.bigknow.sabot.driver)
> visitProducer:64, Pipe$SetupVisitor (com.bigknow.sabot.driver)
> accept:565, SmartOp$SmartProducer (com.bigknow.sabot.driver)
> setup:102, StraightPipe (com.bigknow.sabot.driver)
> setup:102, StraightPipe (com.bigknow.sabot.driver)
> setup:102, StraightPipe (com.bigknow.sabot.driver)
> setup:102, StraightPipe (com.bigknow.sabot.driver)
> setup:102, StraightPipe (com.bigknow.sabot.driver)
> setup:102, StraightPipe (com.bigknow.sabot.driver)
> setup:102, StraightPipe (com.bigknow.sabot.driver)
> setup:71, Pipeline (com.bigknow.sabot.driver)
> setupExecution:619, FragmentExecutor (com.bigknow.sabot.exec.fragment)
> run:441, FragmentExecutor (com.bigknow.sabot.exec.fragment)
> access$1700:107, FragmentExecutor (com.bigknow.sabot.exec.fragment)
> run:999, FragmentExecutor$AsyncTaskImpl (com.bigknow.sabot.exec.fragment)
> run:122, AsyncTaskWrapper (com.bigknow.sabot.task)
> mainExecutionLoop:249, SlicingThread (com.bigknow.sabot.task.slicing)
> run:171, SlicingThread (com.bigknow.sabot.task.slicing)
> 
> ## Values →  Projector.make
> schema: Schema
> exprs:  size = 1
> 
> ## Projector.make → JniWrapper.buildProjector
> schemaBuf.toByteArray():
> [10, 18, 10, 6, 101, 120, 112, 114, 36, 48, 18, 6, 8, 22, 24, 38, 32, 6, 24, 
> 1]
> builder.build().toByteArray():
> [18, 77, 10, 59, 18, 57, 10, 7, 99, 97, 115, 116, 73, 78, 84, 18, 42, 18, 40, 
> 10, 10, 99, 97, 115, 116, 70, 76, 79, 65, 84, 56, 18, 22, 10, 20, 10, 18, 10, 
> 6, 101, 120, 112, 114, 36, 48, 18, 6, 8, 22, 24, 38, 32, 6, 24, 1, 26, 2, 8, 
> 12, 26, 2, 8, 7, 18, 14, 10, 6, 69, 88, 80, 82, 36, 48, 18, 2, 8, 7, 24, 1]
> 
> ## Errors we found in JniWrapper.buildProjector calling libgandiva_jni.so at 
> Java_org_apache_arrow_gandiva_evaluator_JniWrapper_buildProjector
> schema_arr (schemaBuf.toByteArray() in the Java layer) is NOT null, but here it is 0x0
> 
> Java_org_apache_arrow_gandiva_evaluator_JniWrapper_buildProjector 
> (env=0x7f25737a7b48, obj=0x7f25280899a8, schema_arr=0x0, 
> exprs_arr=0x7f2528089998, selection_vector_type=671652240, configuration_id=0)
> 
> ## Error location in Java source
> at 
> arrow-gandiva-9.0.0-20221123064031-c39b8a6253-bigknow.jar!/org/apache/arrow/gandiva/evaluator/Projector.class
>Projector.make method
>wrapper.buildProjector(secondaryCache, schemaBuf.toByteArray(), 
> builder.build().toByteArray(), selectionVectorType.getNumber(), 
> configurationId);
> ```
> public static Projector make(Schema schema, List<ExpressionTree> exprs, 
> GandivaTypes.SelectionVectorType selectionVectorType, long configurationId, 
> JavaSecondaryCacheInterface secondaryCache) throws GandivaException {
> GandivaTypes.ExpressionList.Builder builder = 
> ExpressionList.newBuilder();
> Iterator<ExpressionTree> var7 = exprs.iterator();
> 
> while (var7.hasNext()) {
> ExpressionTree expr = (ExpressionTree) var7.next();
> builder.addExprs(expr.toProtobuf());
> }
> 
> GandivaTypes.Schema schemaBuf = 
> ArrowTypeHelper.arrowSchemaToProtobuf(schema);
> JniWrapper wrapper = JniLoader.getInstance().getWrapper();
> long moduleId = wrapper.buildProjector(secondaryCache, 
> schemaBuf.toByteArray(), builder.build().toByteArray(), 
> selectionVectorType.getNumber(), configurationId);
> logger.debug("Created module for the projector with id {}", moduleId);
> return new Projector(wrapper, moduleId, schema, exprs.size());
> }
> 
> ```
> 
> From: Sutou Kouhei 
> Sent: May 23, 2024 8:46
> To: user@arrow.apache.org 
> Subject: Re: 

Re: Re: [DISCUSS][C++][JNI] libgandiva_jni.so fails to run on ARM platform (but compilation successful)

2024-05-23 Thread Sutou Kouhei
Hi,

It seems that it crashed at

https://github.com/apache/arrow/blob/apache-arrow-16.1.0/java/gandiva/src/main/cpp/jni_common.cc#L612

or

https://github.com/apache/arrow/blob/apache-arrow-16.1.0/java/gandiva/src/main/cpp/jni_common.cc#L616

.

Could you check your schema and expressions that are passed
to Projector::make()?


Thanks,
-- 
kou


In 
 

  "Re: [DISCUSS][C++][JNI] libgandiva_jni.so fails to run on ARM platform (but 
compilation successful)" on Thu, 23 May 2024 02:04:14 +,
  即 云  wrote:

> Hello Kou,
>  Thanks a lot for your reply.
>  We have followed your advice and adopted a copy of the Arrow 16.x 
> package, including libgandiva_jni.so, from 
> https://repository.apache.org/#nexus-search;quick~arrow-gandiva.
>  It seemed to work without the previous issues; however, when we tried a Java 
> program (which worked with Arrow 9.x on the x86 platform), it failed with the 
> errors below. I was wondering if there is any difference in the JNI APIs 
> between Arrow 9.x and 16.x, or where I can find a workable compiled 
> version of Arrow that includes libgandiva_jni.so.
> 
>  Thanks again in advance!
> 
> 1.  Exception thrown from Java program:
> Stack slot to memory mapping:
> stack at sp + 0 slots: 0x7f5a347ebb69:  in 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so at 0x7f5a33f0c000
> stack at sp + 1 slots: 0x0 is NULL
> stack at sp + 2 slots: 0x7f59ed2698c0 is pointing into the stack for 
> thread: 0x7f59e875a000
> stack at sp + 3 slots: 0x7f59e875a348 points into unknown readable 
> memory: 0x7f5a35147b40 | 40 7b 14 35 5a 7f 00 00
> stack at sp + 4 slots: 0x7f59e2f8c3f7: 
> Java_org_apache_arrow_gandiva_evaluator_JniWrapper_buildProjector+0x0057
>  in /tmp/libgandiva_jni.soa81db5ba-fcd3-4494-b24b-3da3a3a99162 at 
> 0x7f59e2b1b000
> stack at sp + 5 slots: 0x7f59ed269500 is pointing into the stack for 
> thread: 0x7f59e875a000
> stack at sp + 6 slots: 0x7f5a344ec003:  in 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so at 0x7f5a33f0c000
> stack at sp + 7 slots: 0x7f59ed2697b0 is pointing into the stack for 
> thread: 0x7f59e875a000
> 
> 2.  GDB debug info:
> 
> gdb /usr/bin/java core
> 
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x7f5a351f7859 in __GI_abort () at abort.c:79
> #2  0x7f5a3418f435 in os::abort(bool, void*, void const*) [clone .cold] 
> () from /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
> #3  0x7f5a34da7e5d in VMError::report_and_die(int, char const*, char 
> const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, 
> int, unsigned long) () from 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
> #4  0x7f5a34da89cf in VMError::report_and_die(Thread*, unsigned int, 
> unsigned char*, void*, void*, char const*, ...) () from 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
> #5  0x7f5a34da8a02 in VMError::report_and_die(Thread*, unsigned int, 
> unsigned char*, void*, void*) () from 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
> #6  0x7f5a34afd166 in JVM_handle_linux_signal () from 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
> #7  0x7f5a34aefc8c in signalHandler(int, siginfo_t*, void*) () from 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
> #8  
> #9  0x7f5a34484764 in 
> AccessInternal::PostRuntimeDispatch G1BarrierSet>, (AccessInternal::BarrierType)2, 
> 1097844ul>::oop_access_barrier(void*) () from 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
> #10 0x7f5a347ebb69 in jni_GetArrayLength () from 
> /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
> #11 0x7f59e2f8c3f7 in 
> Java_org_apache_arrow_gandiva_evaluator_JniWrapper_buildProjector () from 
> /tmp/libgandiva_jni.soa81db5ba-fcd3-4494-b24b-3da3a3a99162
> #12 0x7f5a147eaa30 in ?? ()
> #13 0x0001 in ?? ()
> #14 0x in ?? ()
> 
> From: Sutou Kouhei 
> Sent: May 20, 2024 13:21
> To: user@arrow.apache.org 
> Subject: Re: [DISCUSS][C++][JNI] libgandiva_jni.so fails to run on ARM platform 
> (but compilation successful)
> 
> Hi,
> 
> https://github.com/apache/arrow/issues/30701 may be related.
> 
> BTW, our recent Gandiva packages include libgandiva_jni.so
> for ARM. You may be able to use it instead of building it
> manually.
> 
> 
> Thanks,
> --
> kou
> 
> In
>  
> 
>   "[DISCUSS][C++][JNI] libgandiva_jni.so fails to run on ARM platform (but 
> compilation successful)" on Mon, 20 May 2024 10:10:28 +,
>   即 云  wrote:
> 
>>

Re: [DISCUSS][C++][JNI] libgandiva_jni.so fails to run on ARM platform (but compilation successful)

2024-05-20 Thread Sutou Kouhei
Hi,

https://github.com/apache/arrow/issues/30701 may be related.

BTW, our recent Gandiva packages include libgandiva_jni.so
for ARM. You may be able to use it instead of building it
manually.


Thanks,
-- 
kou

In 
 

  "[DISCUSS][C++][JNI] libgandiva_jni.so fails to run on ARM platform (but 
compilation successful)" on Mon, 20 May 2024 10:10:28 +,
  即 云  wrote:

> Greetings,
>  I have encountered an issue with Arrow on the ARM platform.
>  We successfully compiled "libgandiva_jni.so" on ARMv8, but when we call 
> this lib from a Java environment, it throws an error: "Exception 
> java.lang.UnsatisfiedLinkError:/tmp/libgandiva_jni.so9f8bef08-ab7f-425d-8b42-11f522026a10;undefined
>  symbol: _ZTIN4llvm11ObjectCacheE"
>  Would anybody give me some clues to handle this issue?  Thanks in 
> advance!
> 
> Env. details:
> Arrow version:maint-9.0.0 (https://github.com/apache/arrow/tree/maint-9.0.0)
> OS:CentOS 7.6 with ARM
> Gcc/llvm:gcc 8.3.1 ,llvm 14.0.0
> Compilation flags:
> mkdir cpp/build
> cd  cpp/build
> cmake .. -DARROW_GANDIVA_JAVA=ON -DARROW_GANDIVA=ON  -DARROW_WITH_RE2=ON 
> -DARROW_WITH_UTF8PROC=ON
> make
> 
> CPU info:
> [root@ecs-5f21 arrow]# lscpu
> Architecture:  aarch64
> Byte Order:Little Endian
> CPU(s):2
> On-line CPU(s) list:   0,1
> Thread(s) per core:1
> Core(s) per socket:2
> Socket(s): 1
> NUMA node(s):  1
> Model: 0
> CPU max MHz:   2400.
> CPU min MHz:   2400.
> BogoMIPS:  200.00
> L1d cache: 64K
> L1i cache: 64K
> L2 cache:  512K
> L3 cache:  32768K
> NUMA node0 CPU(s): 0,1
> Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics 
> fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm


Re: [Java][Flight RPC] How to handle a client disconnection on Flight SQL server side?

2024-02-08 Thread Sutou Kouhei
Hi,

We're developing session support:
https://github.com/apache/arrow/pull/34817

If you're interested in this, please review the pull request
to check whether your use case is covered or not.


Thanks,
-- 
kou

In 
  "[Java][Flight RPC] How to handle a client disconnection on Flight SQL server 
side?" on Thu, 8 Feb 2024 12:08:33 +0300,
  Aleksandr Blazhkov  wrote:

> Hi, community!
> 
> I am developing a stateful Arrow Flight SQL server that manages user
> sessions. How can I figure out that the client has closed the connection so
> the session can be removed? What is the supposed implementation for this
> case?
> 
> Sincerely, Alexander


[ANNOUNCE] Apache Arrow Flight SQL adapter for PostgreSQL 0.1.0 released

2023-09-13 Thread Sutou Kouhei
The Apache Arrow team is pleased to announce the 0.1.0 release of
the Apache Arrow Flight SQL adapter for PostgreSQL.

The release is available now from our website:
  https://arrow.apache.org/flight-sql-postgresql/0.1.0/install.html

Read about what's new in the release:
  https://arrow.apache.org/blog/2023/09/13/flight-sql-postgresql-0.1.0-release/

Release note:
  
https://arrow.apache.org/flight-sql-postgresql/0.1.0/release-notes.html#version-0-1-0


What is Apache Arrow Flight SQL adapter for PostgreSQL?

Apache Arrow Flight SQL adapter for PostgreSQL is a
PostgreSQL extension that adds an Apache Arrow Flight SQL
endpoint to PostgreSQL.

Apache Arrow Flight SQL is a protocol for using the Apache Arrow
format to interact with SQL databases. You can use Apache
Arrow Flight SQL instead of the PostgreSQL wire protocol to
interact with PostgreSQL via the Apache Arrow Flight SQL adapter
for PostgreSQL.

The Apache Arrow format is designed for fast typed table data
exchange. If you retrieve large data with SELECT or
INSERT/UPDATE large data, Apache Arrow Flight SQL will be
faster than the PostgreSQL wire protocol.


Please report any feedback to the GitHub issues or mailing lists:
  * GitHub: https://github.com/apache/arrow-flight-sql-postgresql/issues
  * ML: https://arrow.apache.org/community/


Thanks,
-- 
The Apache Arrow community


Re: Is there a way to specify a particular Arrow version when using the Debian apt install?

2023-05-11 Thread Sutou Kouhei
Hi,

Could you try "sudo apt install -y -V libarrow-dev=11.0.0-1"?

See also apt-get(8):

> A specific version of a package can be selected for installation by
> following the package name with an equals and the version of the
> package to select. This will cause that version to be located and
> selected for install. Alternatively a specific distribution can be
> selected by following the package name with a slash and the version
> of the distribution or the Archive name (stable, testing,
> unstable).
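Both selection forms from the apt-get(8) excerpt above can be sketched as commands. The exact version pin is the one from this thread; the `/stable` form and `apt-mark hold` are illustrative extras, assuming the Apache Arrow APT repository is already configured:

```shell
# Select an exact candidate version with "name=version":
sudo apt install -y -V libarrow-dev=11.0.0-1

# Or select by distribution/archive name with "name/dist":
sudo apt install -y -V libarrow-dev/stable

# Optionally hold the package so a later "apt upgrade" keeps this version:
sudo apt-mark hold libarrow-dev
```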


Thanks,
-- 
kou

In 


  "Is there a way to specify a particular Arrow version when using the Debian 
apt install?" on Thu, 11 May 2023 16:51:07 +,
  "Philip Moore via user"  wrote:

> Hello,
> 
> This is likely a very newb question – but:
> 
> Is there a way to specify an Arrow version when using the 
> “apt” install instructions for Debian – as specified here: 
> https://arrow.apache.org/install/ ?
> 
> Thanks!
> 
> Phil
> 


Re: [C++] Undefined reference during static link Arrow (10.0.1) in Manylinux2014 container

2022-12-13 Thread Sutou Kouhei
Hi,

Our CentOS 7 packages use devtoolset-11, not devtoolset-10:
https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/centos-7/Dockerfile#L22

Could you try devtoolset-11 instead of devtoolset-10, or use the
binaries in pyarrow's manylinux wheel instead?
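The toolchain switch suggested above can be sketched as shell commands run inside the manylinux2014 (CentOS 7) container; this is a sketch, assuming the SCL devtoolset-11 packages are available in the image's configured yum repositories:

```shell
# manylinux2014 is CentOS 7 based; install devtoolset-11 from SCL
yum install -y devtoolset-11-gcc devtoolset-11-gcc-c++ devtoolset-11-binutils

# Put devtoolset-11's gcc/g++/ld on PATH for this shell session
source /opt/rh/devtoolset-11/enable
g++ --version   # should now report GCC 11.x
```

After enabling, rebuild the project so it links with the same toolchain family that the prebuilt Arrow static libraries were compiled with.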

Thanks,
-- 
kou

In 
  "[C++] Undefined reference during static link Arrow (10.0.1) in Manylinux2014 
container" on Tue, 13 Dec 2022 13:19:41 -0800,
  Lei Xu  wrote:

> Hi, there,
> 
> I was trying to statically link Arrow / ArrowDataset into one of my
> projects in Manylinux,
> 
> and it reports missing references to standard C++ functions.
> 
> ```
> /opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld:
> /usr/lib64/libarrow_dataset.a(dataset.cc.o): in function
> `arrow::dataset::(anonymous
> namespace)::BasicFragmentEvolution::EvolveBatch(std::shared_ptr
> const&, std::vector >
> const&, std::vector std::allocator > const&) const':
> (.text+0xe9b): undefined reference to `std::__throw_bad_array_new_length()'
> /opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld:
> /usr/lib64/libarrow_dataset.a(dataset.cc.o): in function
> `arrow::dataset::UnionDataset::ReplaceSchema(std::shared_ptr)
> const [clone .localalias]':
> (.text+0x1cac): undefined reference to `std::__throw_bad_array_new_length()'
> /opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld:
> /usr/lib64/libarrow_dataset.a(dataset.cc.o): in function
> `arrow::Result
> arrow::compute::ModifyExpression namespace)::BasicFragmentEvolution::DevolveFilter(arrow::compute::Expression
> const&) const::{lambda(arrow::compute::Expression)#1},
> arrow::dataset::(anonymous
> namespace)::BasicFragmentEvolution::DevolveFilter(arrow::compute::Expression
> const&) const::{lambda(arrow::compute::Expression,
> arrow::compute::Expression*)#2}>(arrow::compute::Expression,
> arrow::dataset::(anonymous
> namespace)::BasicFragmentEvolution::DevolveFilter(arrow::compute::Expression
> const&) const::{lambda(arrow::compute::Expression)#1} const&,
> arrow::dataset::(anonymous
> namespace)::BasicFragmentEvolution::DevolveFilter(arrow::compute::Expression
> const&) const::{lambda(arrow::compute::Expression,
> arrow::compute::Expression*)#2} const&)':
> (.text+0x384f): undefined reference to `std::__throw_bad_array_new_length()'
> /opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld:
> (.text+0x3867): undefined reference to `std::__throw_bad_array_new_length()'
> /opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld:
> /usr/lib64/libarrow_dataset.a(dataset.cc.o): in function
> `arrow::dataset::(anonymous
> namespace)::BasicFragmentEvolution::DevolveSelection(std::vector std::allocator > const&) const':
> (.text+0x4ea6): undefined reference to `std::__throw_bad_array_new_length()'
> /opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld:
> /usr/lib64/libarrow_dataset.a(dataset.cc.o):(.text+0x4eb6): more undefined
> references to `std::__throw_bad_array_new_length()' follow
> /opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld:
> /usr/lib64/libarrow_bundled_dependencies.a(regexp.cc.o): in function
> `re2::ConvertRunesToBytes(bool, int*, int, std::string*)':
> (.text+0x10d1): undefined reference to `std::string::reserve()'
> ```
> 
> My Manylinux2014 dockerfile
> 
> ```
> 
> FROM quay.io/pypa/manylinux2014_x86_64
> 
> ENV LD_LIBRARY_PATH=/usr/local/lib
> 
> ENV ARROW_VERSION=10.0.1-1.el7
> 
> RUN yum update -y \
>   && yum install -y epel-release || yum install -y
> https://dl.fedoraproject.org/pub/epel/epel-release-latest-$(cut -d:
> -f5 /etc/system-release-cpe | cut -d. -f1).noarch.rpm \
>   && yum install -y
> https://apache.jfrog.io/artifactory/arrow/centos/7/x86_64/Packages/apache-arrow-release-${ARROW_VERSION}.noarch.rpm
> \
>   && yum install -y --enablerepo=epel \
>   arrow-devel-${ARROW_VERSION} \
>   arrow-dataset-devel-${ARROW_VERSION} \
> 
> ```
> 
> Is the reason because "libarrow_dataset.a" was built by another version of
> the toolchain?
> 
> Best,
> 
> -- 
> Lei Xu
> Eto.ai


Re: [C++] [Windows] Building arrow minimal build sample on Windows

2022-11-04 Thread Sutou Kouhei
Hi,

> Just wondering if you know when arrow-cpp v10.0.0 packages will be
> available via conda mechanism?

This is work in progress. Please watch this pull request:
https://github.com/conda-forge/arrow-cpp-feedstock/pull/866


Thanks,
-- 
kou

In 
  "Re: [C++] [Windows] Building arrow minimal build sample on Windows" on Fri, 
4 Nov 2022 18:14:01 +1100,
  Raghavendra Prasad  wrote:

> Hi kou,
> 
> Thanks for the quick reply, that seems to have worked & I can build fine
> now!   I have run into other issues, but at least I can progress now.
> 
> Just wondering if you know when arrow-cpp v10.0.0 packages will be
> available via conda mechanism?
> 
> Regards
> Prasad
> 
> 
> On Fri, Nov 4, 2022 at 3:43 PM Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> Could you use "arrow_shared" instead of
>> "Arrow::arrow_shared"? "Arrow::arrow_shared" is only
>> available since Apache Arrow 10.0.0.
>>
>> FYI: "arrow_shared" is still available with Apache Arrow
>> 10.0.0 to keep backward compatibility.
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "[C++] [Windows] Building arrow minimal build sample on Windows" on Fri,
>> 4 Nov 2022 09:03:44 +1100,
>>   Raghavendra Prasad  wrote:
>>
>> > Hello everyone,
>> >
>> > I am exploring usage of Apache Arrow, specifically usage from Visual
>> Studio
>> > (VS2019) compiled C++ programs on my Windows 10 machine.
>> >
>> > I have Visual Studio 2019 installed already.   I wanted to simply use
>> pre-built
>> > binaries, so I installed Arrow 9.0.0 using miniconda:  conda install
>> > arrow-cpp=9.0.* -c conda-forge.  (9.0.0 was the latest package I can find
>> > there).   The install was successful.
>> >
>> > I now wanted to build the arrow minimal_build example & am failing at
>> multiple
>> > attempts.  Will gratefully accept any guidance to get this working!
>> >
>> > C:\Repos\arrow\cpp\examples\minimal_build> cmake CMakeLists.txt
>> > which immediately failed with:
>> >
>> > C:\Repos\arrow\cpp\examples\minimal_build>cmake CMakeLists.txt
>> > -- Selecting Windows SDK version 10.0.19041.0 to target Windows
>> 10.0.19044.
>> > -- Arrow version: 9.0.0
>> > -- Arrow SO version: 900.0.0
>> > -- Configuring done
>> > CMake Error at CMakeLists.txt:40 (add_executable):
>> >   Target "arrow-example" links to target "Arrow::arrow_shared" but the
>> target
>> >   was not found.  Perhaps a find_package() call is missing for an
>> IMPORTED
>> >   target, or an ALIAS target is missing?
>> >
>> > I next activated arrow-dev as per Developing on Windows & ran the same
>> command.
>> >
>> > conda create -y -n arrow-dev --file=ci\conda_env_cpp.txt  ==> successful
>> > conda activate arrow-dev ==> successful
>> > (arrow-dev) C:\Repos\arrow\cpp\examples\minimal_build>cmake
>> cmakelists.txt  ==>
>> > failed
>> > -- Selecting Windows SDK version 10.0.19041.0 to target Windows
>> 10.0.19044.
>> > -- The C compiler identification is MSVC 19.29.30143.0
>> > -- The CXX compiler identification is MSVC 19.29.30143.0
>> > -- Detecting C compiler ABI info
>> > -- Detecting C compiler ABI info - done
>> > -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual
>> >
>> Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe -
>> > skipped
>> > -- Detecting C compile features
>> > -- Detecting C compile features - done
>> > -- Detecting CXX compiler ABI info
>> > -- Detecting CXX compiler ABI info - done
>> > -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft
>> Visual
>> >
>> Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe -
>> > skipped
>> > -- Detecting CXX compile features
>> > -- Detecting CXX compile features - done
>> > -- Arrow version: 9.0.0
>> > -- Arrow SO version: 900.0.0
>> > -- Configuring done
>> > CMake Error at CMakeLists.txt:43 (target_link_libraries):
>> >   Target "arrow-example" links to:
>> >
>> > Arrow::arrow_shared
>> >
>> >   but the target was not found.  Possible reasons include:
>> >
>> > * There is a typo in the target name.
>> > * A find_package call is missing for an IMPORTED target.
>> > * An ALIAS target is missing.
>> >
>> > Regards
>> > Prasad
>>


Re: [C++] [Windows] Building arrow minimal build sample on Windows

2022-11-03 Thread Sutou Kouhei
Hi,

Could you use "arrow_shared" instead of
"Arrow::arrow_shared"? "Arrow::arrow_shared" is only
available since Apache Arrow 10.0.0.

FYI: "arrow_shared" is still available with Apache Arrow
10.0.0 to keep backward compatibility.
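The advice above can be sketched as a minimal CMakeLists.txt; this is a sketch, assuming the example source file is named example.cc and that find_package can locate the conda-installed Arrow 9.0.0:

```cmake
cmake_minimum_required(VERSION 3.16)
project(arrow-example CXX)

find_package(Arrow REQUIRED)

add_executable(arrow-example example.cc)
# Plain "arrow_shared" exists in Arrow 9.x and is kept in 10.x for
# backward compatibility; the namespaced "Arrow::arrow_shared" alias
# only appeared in Apache Arrow 10.0.0.
target_link_libraries(arrow-example PRIVATE arrow_shared)
```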


Thanks,
-- 
kou

In 
  "[C++] [Windows] Building arrow minimal build sample on Windows" on Fri, 4 
Nov 2022 09:03:44 +1100,
  Raghavendra Prasad  wrote:

> Hello everyone,
> 
> I am exploring usage of Apache Arrow, specifically usage from Visual Studio
> (VS2019) compiled C++ programs on my Windows 10 machine.
> 
> I have Visual Studio 2019 installed already.   I wanted to simply use 
> pre-built
> binaries, so I installed Arrow 9.0.0 using miniconda:  conda install
> arrow-cpp=9.0.* -c conda-forge.  (9.0.0 was the latest package I can find
> there).   The install was successful.
> 
> I now wanted to build the arrow minimal_build example & am failing at multiple
> attempts.  Will gratefully accept any guidance to get this working!
> 
> C:\Repos\arrow\cpp\examples\minimal_build> cmake CMakeLists.txt
> which immediately failed with:
> 
> C:\Repos\arrow\cpp\examples\minimal_build>cmake CMakeLists.txt
> -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
> -- Arrow version: 9.0.0
> -- Arrow SO version: 900.0.0
> -- Configuring done
> CMake Error at CMakeLists.txt:40 (add_executable):
>   Target "arrow-example" links to target "Arrow::arrow_shared" but the target
>   was not found.  Perhaps a find_package() call is missing for an IMPORTED
>   target, or an ALIAS target is missing?
> 
> I next activated arrow-dev as per Developing on Windows & ran the same 
> command.
> 
> conda create -y -n arrow-dev --file=ci\conda_env_cpp.txt  ==> successful
> conda activate arrow-dev ==> successful
> (arrow-dev) C:\Repos\arrow\cpp\examples\minimal_build>cmake cmakelists.txt  
> ==>
> failed
> -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
> -- The C compiler identification is MSVC 19.29.30143.0
> -- The CXX compiler identification is MSVC 19.29.30143.0
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual
> Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe -
> skipped
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual
> Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe -
> skipped
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> -- Arrow version: 9.0.0
> -- Arrow SO version: 900.0.0
> -- Configuring done
> CMake Error at CMakeLists.txt:43 (target_link_libraries):
>   Target "arrow-example" links to:
> 
> Arrow::arrow_shared
> 
>   but the target was not found.  Possible reasons include:
> 
> * There is a typo in the target name.
> * A find_package call is missing for an IMPORTED target.
> * An ALIAS target is missing.
> 
> Regards
> Prasad


Re: [c++] problem to use macro ARROW_ASSIGN_OR_RAISE with arrow::RecordBatchBuilder::Make

2022-10-23 Thread Sutou Kouhei
Hi,

ARROW_ASSIGN_OR_RAISE() may return arrow::Status. So we
can't use it in "int main()". Please define a function that
returns arrow::Status and use it from main() like
https://github.com/apache/arrow/blob/master/cpp/examples/minimal_build/example.cc
.

Thanks,
-- 
kou

In <1998467614.1608798.1666529604...@mail.yahoo.com>
  "[c++] problem to use macro ARROW_ASSIGN_OR_RAISE with 
arrow::RecordBatchBuilder::Make" on Sun, 23 Oct 2022 12:53:24 + (UTC),
  "Alan Souza via user"  wrote:

> Hello. I am trying to use the ARROW_ASSIGN_OR_RAISE macro with the Arrow
> function arrow::RecordBatchBuilder::Make.
> However, I am getting this error:
> 
> /usr/bin/c++  -isystem /usr/local/arrow/include -O3 -DNDEBUG -fPIE -std=c++17
> -MD -MT benchmark/CMakeFiles/reproducer.exe.dir/reproducer.cpp.o -MF
> benchmark/CMakeFiles/reproducer.exe.dir/reproducer.cpp.o.d -o
> benchmark/CMakeFiles/reproducer.exe.dir/reproducer.cpp.o -c
> /workspaces/cpptest/benchmark/reproducer.cpp
> In file included from /usr/local/arrow/include/arrow/buffer.h:29,
>  from /usr/local/arrow/include/arrow/array/data.h:26,
>  from /usr/local/arrow/include/arrow/array/array_base.h:26,
>  from /usr/local/arrow/include/arrow/array.h:37,
>  from /usr/local/arrow/include/arrow/api.h:22,
>  from /workspaces/cpptest/benchmark/reproducer.cpp:5:
> /workspaces/cpptest/benchmark/reproducer.cpp: In function ‘int main()’:
> /workspaces/cpptest/benchmark/reproducer.cpp:28:5: error: cannot convert
> ‘const arrow::Status’ to ‘int’ in return
>28 | ARROW_ASSIGN_OR_RAISE(batch_builder,
> arrow::RecordBatchBuilder::Make(schema, arrow::default_memory_pool(), nrows));
>   | ^
>   | |
>   | const arrow::Status
> ninja: build stopped: subcommand failed.
> 
> I am using g++ 12.2.1 (I have also tried clang++ 14.0.5) with a built
> Arrow library 10 (I have also tried Arrow 9). The strangest thing is that
> I am able to build and run the rapidjson conversion example, which uses a
> similar construct.
> 
> Without the macro (the commented-out code), it works without any issues.
> 
> #include <cstdint>
> #include <iostream>
> #include <memory>
> 
> #include <arrow/api.h>
> 
> std::shared_ptr<arrow::Schema> ExampleSchema1() {
> auto f0 = arrow::field("f0", arrow::int32());
> auto f1 = arrow::field("f1", arrow::utf8());
> auto f2 = arrow::field("f2", arrow::list(arrow::int8()));
> return arrow::schema({f0, f1, f2});
> }
> 
> int main(){
> std::shared_ptr<arrow::Schema> schema = ExampleSchema1();
> std::unique_ptr<arrow::RecordBatchBuilder> batch_builder;
> std::int64_t nrows = 10;
> ARROW_ASSIGN_OR_RAISE(batch_builder, arrow::RecordBatchBuilder::Make(schema,
> arrow::default_memory_pool(), nrows));
> // arrow::Result<std::unique_ptr<arrow::RecordBatchBuilder>> batch_builder_result =
> //     arrow::RecordBatchBuilder::Make(schema,
> //                                     arrow::default_memory_pool(),
> //                                     nrows);
> // if(!batch_builder_result.ok()){
> //     std::cerr << batch_builder_result.status() << std::endl;
> // }
> 
> return 0;
> }
> 
> thanks
> 
> *


[ANN] DataFusion C/DataFusion GLib/Red DataFusion 10.0.0 released

2022-08-22 Thread Sutou Kouhei
Hi,

I've released DataFusion C, DataFusion GLib and Red
DataFusion 10.0.0 that are based on Apache Arrow DataFusion
10.0.0.
(I'll release DataFusion C, DataFusion GLib and Red
DataFusion 11.0.0 that are based on Apache Arrow DataFusion
11.0.0 sometime soon.)

DataFusion C:
  https://datafusion-contrib.github.io/datafusion-c/latest/
DataFusion GLib:
  https://datafusion-contrib.github.io/datafusion-c/latest/glib/
Red DataFusion:
  https://github.com/datafusion-contrib/datafusion-ruby/

Apache Arrow DataFusion is written in Rust. DataFusion C
provides a C API for Apache Arrow DataFusion. Its API only
uses standard C and has no external dependencies.

DataFusion C supports the Apache Arrow C data interface.
This means that you can register your in-memory Apache
Arrow data with Apache Arrow DataFusion, and retrieve the
in-memory data returned by Apache Arrow DataFusion as
Apache Arrow data, without any external library, because
the Apache Arrow C data interface only uses the standard
C ABI.
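Concretely, the Apache Arrow C data interface is just a pair of small C structs defined in the Arrow columnar specification. The schema half is reproduced below (shown as C++; the companion ArrowArray struct carries the data buffers). Any producer and consumer that agree on this layout can exchange data with no shared library:

```cpp
#include <cstdint>

// ArrowSchema as defined by the Apache Arrow C data interface
// specification. Exchanging data across a language boundary only
// requires agreeing on this struct layout, i.e. the standard C ABI.
struct ArrowSchema {
  // Array type description
  const char* format;    // format string, e.g. "i" for int32
  const char* name;      // field name
  const char* metadata;  // binary-encoded key/value metadata
  int64_t flags;         // e.g. nullable, dictionary-ordered
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;

  // Release callback: lets the producer free what it allocated
  void (*release)(struct ArrowSchema*);
  void* private_data;  // opaque data for the producer's own use
};
```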

DataFusion C is suitable for creating language bindings of
Apache Arrow DataFusion because many languages have built-in
C support. Here are examples that use DataFusion C from
other languages:

Python with ctypes:
  
https://datafusion-contrib.github.io/datafusion-c/latest/example/sql.html#raw-c-api-from-python
Ruby with Fiddle:
  
https://datafusion-contrib.github.io/datafusion-c/latest/example/sql.html#raw-c-api-from-ruby

DataFusion GLib provides a GLib API. It is built on top of
DataFusion C. DataFusion GLib is also suitable for creating
language bindings, because it supports GObject
Introspection, a middleware that generates language
bindings dynamically:
  https://gi.readthedocs.io/en/latest/

DataFusion GLib is integrated with Apache Arrow GLib. It
offers a more convenient API than the one provided by
DataFusion C: DataFusion C exposes the raw Apache Arrow C
data interface, while DataFusion GLib uses Apache Arrow
GLib objects instead. DataFusion GLib uses the Apache
Arrow C data interface internally but hides the details
from users.

If a language supports GObject Introspection, you can
generate a language binding with a few lines of code. Here
are examples that use DataFusion GLib from other languages:

Python with PyGObject:
  
https://datafusion-contrib.github.io/datafusion-c/latest/example/sql.html#glib-api-from-python
Ruby with gobject-introspection gem:
  
https://datafusion-contrib.github.io/datafusion-c/latest/example/sql.html#glib-api-from-ruby

There are binary packages of DataFusion C and DataFusion
GLib for Debian GNU/Linux, Ubuntu and AlmaLinux:
  https://datafusion-contrib.github.io/datafusion-c/latest/install.html

They are provided from
https://apache.jfrog.io/artifactory/arrow/ , which is also
used by Apache Arrow C++ and Apache Arrow GLib. This means
that you can install Apache Arrow C++, Apache Arrow GLib,
DataFusion C and DataFusion GLib from the same APT/Yum
repositories.


Red DataFusion is a language binding of Apache Arrow
DataFusion for Ruby. It's based on DataFusion GLib and is
well integrated with Red Arrow.

There is another Apache Arrow DataFusion binding for Ruby:
https://github.com/jychen7/arrow-datafusion-ruby

It only depends on Apache Arrow DataFusion. It doesn't
depend on Red Arrow and doesn't use the Apache Arrow C
data interface; instead, it writes Rust code to convert
DataFusion objects to Ruby objects.


If you're interested in these projects, there are GitHub
Discussions for them:

  * DataFusion C/DataFusion GLib:
https://github.com/datafusion-contrib/datafusion-c/discussions
  * Red DataFusion:
https://github.com/datafusion-contrib/datafusion-ruby/discussions


Thanks,
-- 
kou


Re: StreamReader

2022-07-20 Thread Sutou Kouhei
Hi,

I can't understand why you want to mix dev@ and user@
mailing lists... Anyway...


Sorry, I misunderstood. I thought that your input was Apache
Arrow format and your output was CSV. You can't use
arrow::ipc::RecordBatchStreamReader for CSV. You need to use
arrow::csv::StreamingReader:

  buffer_type_t res = fut.get0();
  BOOST_LOG_TRIVIAL(trace) <<
    "RawxBackendReader: Got result with buffer size: " << res.size();
  auto input = std::make_shared<arrow::io::BufferReader>(
    reinterpret_cast<const uint8_t*>(res.get()),
    res.size());
  BOOST_LOG_TRIVIAL(trace) << "laa type input" << input.get();
  auto io_context = arrow::io::IOContext(arrow::default_memory_pool());
  auto read_options = arrow::csv::ReadOptions::Defaults();
  auto parse_options = arrow::csv::ParseOptions::Defaults();
  auto convert_options = arrow::csv::ConvertOptions::Defaults();
  auto reader_result =
    arrow::csv::StreamingReader::Make(io_context,
                                      input,
                                      read_options,
                                      parse_options,
                                      convert_options);
  if (!reader_result.ok()) {
    exit(1);
  }
  auto reader = *reader_result;
  for (auto record_batch_result : *reader) {
    if (!record_batch_result.ok()) {
      exit(1);
    }
    auto record_batch = *record_batch_result;
    // Filter record_batch and write CSV.
    // You can use arrow::csv::MakeCSVWriter() to write a CSV.
  }

  result.push_back(std::move(res));


Thanks,
-- 
kou

In 
  "Re: StreamReader" on Mon, 18 Jul 2022 10:31:08 +0200,
  L Ait  wrote:

> Hey,
> 
> I tested the suggestion here, adapting the code to read the stream from CSV
> format.
> But in my tests the method OnRecordBatchDecoded is never called, and my
> understanding is that it waits for the IPC format
> while I am reading CSV format?
> 
> Am I missing something?
> 
> In the meantime, in order to reply to this thread, do I only need to reply to
> d...@arrow.apache.org ?
> https://lists.apache.org/thread/5rpykkfoz416mq889pcpx9rwrrtjog60
> on dev@ to connect the existing thread?
> 
> class MyListener : public arrow::ipc::Listener {
>   public:
>     arrow::Status
>     OnRecordBatchDecoded(std::shared_ptr<arrow::RecordBatch> record_batch)
>     override {
>       ArrowFilter arrow_filter = ArrowFilter(record_batch);
>       arrow_filter.ToCsv();
>       return arrow::Status::OK();
>     }
> };
> 
> 
> Thanks
> 
> 
> Le mer. 13 juil. 2022 à 06:50, Sutou Kouhei  a écrit :
> 
>> Could you resend your reply to
>> https://lists.apache.org/thread/5rpykkfoz416mq889pcpx9rwrrtjog60
>> on dev@ to connect the existing thread?
>>
>> In 
>>   "Re: StreamReader" on Tue, 12 Jul 2022 10:01:00 +0200,
>>   L Ait  wrote:
>>
>> > Thank you, I will look into that.
>> > The real problem is that I read data in chunks and the end of a chunk
>> > is truncated (not a complete line). I need to wait for the next chunk
>> > to complete the line.
>> >
>> > Is there a way you suggest to process the chunks smoothly?
>> >
>> > Thank you
>> >
>> >
>> > Le ven. 8 juil. 2022 à 03:37, Sutou Kouhei  a écrit
>> :
>> >
>> >> Answered on dev@:
>> >> https://lists.apache.org/thread/5rpykkfoz416mq889pcpx9rwrrtjog60
>> >>
>> >> In 
>> >>   "StreamReader" on Sat, 2 Jul 2022 16:04:45 +0200,
>> >>   L Ait  wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I need help to integrate arrow cpp in my current project. In fact I
>> built
>> >> > cpp library and can call api.
>> >> >
>> >> > What I need is that:
>> >> >
>> >> > I have a c++ project that reads data by chunks then uses some erasure
>> >> code
>> >> > to rebuild original data.
>> >> >
>> >> > The rebuild is done in chunks , At each iteration I can access a
>> buffer
>> >> of
>> >> > rebuilt data.
>> >> >
>> >> > My need is to pass this data as a stream to arrow process then send
>> the
>> >> > processed stream.
>> >> >
>> >> > For example if my original file is a csv and I would like to filter
>> and
>> >> > save first column:
>> >> >
>> >> > file
>> >> >
>> >> > col1,col2, col3, col3
>> >> > a1,b1,c1,d1
>> >> > an,bn,cn,dn
>> >> >
>> >> > split to 6 chunks of equal sizes chunk1:
>> >> >
>> >> > a1,b1,c1,d1
>> &

Re: [C/GLib] Trying (and failing) to send RecordBatches between Client and Server in C

2022-07-12 Thread Sutou Kouhei
Hi,

How do you send the written data over the network? Do you
use a raw socket(2) and write(2)? If you use a raw socket,
we can wrap it in a GUnixOutputStream[1]. We can wrap the
raw socket with g_unix_output_stream_new()[2], passing the
file descriptor of the raw socket.

[1] https://docs.gtk.org/gio/class.UnixOutputStream.html
[2] https://docs.gtk.org/gio/ctor.UnixOutputStream.new.html

If we can wrap the raw socket in a GUnixOutputStream, we
don't need to create a GArrowBuffer for serialized record
batches. We can write serialized record batches to the raw
socket directly.

I created examples to send/receive record batches via
network: https://github.com/apache/arrow/pull/13590

This may help you.


Thanks,
-- 
kou

In <2d93a698-55ac-c8bd-d0ad-d724efdd5...@freenet.de>
  "Re: [C/GLib] Trying (and failing) to send RecordBatches between Client and 
Server in C" on Mon, 11 Jul 2022 15:48:38 +0200,
  Joel Ziegler  wrote:

> Man, I should stop assuming that return types with "data" in the name
> are just bytes, and instead read up on the datatype. Sorry for that,
> it's my first time with GLib and Arrow.
> 
> Thanks a lot for the help, again! I am able to send RecordBatches over
> the network now.
> 
> A new problem arose and I could solve it, but I am not sure whether
> my solution is appropriate. I would be glad if you could give me your
> opinion.
> I started splitting bigger tables into multiple RecordBatches, sending
> them over the network and reading them with
> garrow_record_batch_reader_read_next(). But I created a new
> GArrowRecordBatchStreamWriter for each RecordBatch and closed it with
> g_object_unref() before sending the data over, because I want to
> "close" the writer before reading from the output buffer. This led to
> the StreamReader only reading the first RecordBatch in the stream,
> probably because the writer writes an EOS. So I stopped using
> g_object_unref() on the StreamWriter and just read from the buffer,
> which seems to work fine. Am I just lucky? Or is there another way of
> safely reading parts of the buffer, even though more RecordBatches
> will be written in the future?
> 
> I also wanted to ask: where can I find the usage of these Arrow writer
> classes? The GLib classes are well documented, and I was just blind in
> not finding the information you provided because of false assumptions.
> But I can't find the Arrow documentation that explains the usage of the
> writer classes the way you did for me.
> 
> Sorry if I am asking too much. I am also fine if you just send some
> direction or links with which I can find the solution by myself. You
> don't have to build my code :)
> 
> 
> Sincerely, Joel Ziegler
> 
> 
> On 09.07.22 04:48, Sutou Kouhei wrote:
>> Hi,
>>
>>>      GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>>>      gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
>>>
>>>      GArrowBuffer *receivingBuffer = garrow_buffer_new(data, length);
>> The data is GBytes * not const char *. You need to get raw
>> data from GBytes *:
>>
>>   GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>>
>>   gsize data_size;
>>   gconstpointer data_raw = g_bytes_get_data(data, &data_size);
>>   GArrowBuffer *receivingBuffer = garrow_buffer_new(data_raw,
>>                                                     data_size);
>>
>> And you need to call g_bytes_unref() against the data when
>> no longer needed:
>>
>>g_bytes_unref(data);
>>
>>
>> Thanks,


Re: [C/GLib] Trying (and failing) to send RecordBatches between Client and Server in C

2022-07-08 Thread Sutou Kouhei
Hi,

>     GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>     gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
> 
>     GArrowBuffer *receivingBuffer = garrow_buffer_new(data, length);

The data is GBytes * not const char *. You need to get raw
data from GBytes *:

  GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));

  gsize data_size;
  gconstpointer data_raw = g_bytes_get_data(data, &data_size);
  GArrowBuffer *receivingBuffer = garrow_buffer_new(data_raw, data_size);

And you need to call g_bytes_unref() against the data when
no longer needed:

  g_bytes_unref(data);


Thanks,
-- 
kou

In <4f699681-f280-f0ef-2f2c-65f17e516...@freenet.de>
  "Re: [C/GLib] Trying (and failing) to send RecordBatches between Client and 
Server in C" on Fri, 8 Jul 2022 15:05:52 +0200,
  Joel Ziegler  wrote:

> Hi Sutou Kouhei,
> 
> closing the writer before requesting the data does not solve the
> problem on my side. Did I make any other error? The error happens at
> the creation of the RecordBatchStreamReader.
> 
> 
> void testRecordbatchStream(GArrowRecordBatch *rb){
>     GError *error = NULL;
>     GArrowResizableBuffer *buffer =
>         garrow_resizable_buffer_new(300, &error);
>     if(buffer == NULL){
>         fprintf(stderr, "Failed to initialize resizable buffer! Error "
>                 "message: %s\n", error->message);
>         g_error_free(error);
>     }
> 
>     GArrowBufferOutputStream *bufferStream =
>         garrow_buffer_output_stream_new(buffer);
>     GArrowSchema *schema = garrow_record_batch_get_schema(rb);
>     GArrowRecordBatchStreamWriter *sw =
>         garrow_record_batch_stream_writer_new(GARROW_OUTPUT_STREAM(bufferStream),
>                                               schema, &error);
>     if(sw == NULL){
>         fprintf(stderr, "Failed to create record batch writer! Error "
>                 "message: %s\n", error->message);
>         g_error_free(error);
>     }
> 
>     g_object_unref(bufferStream);
>     g_object_unref(schema);
> 
>     gboolean test =
>         garrow_record_batch_writer_write_record_batch(GARROW_RECORD_BATCH_WRITER(sw),
>                                                       rb, &error);
>     if(!test){
>         fprintf(stderr, "Failed to write record batch! Error message: "
>                 "%s\n", error->message);
>         g_error_free(error);
>     }
>     g_object_unref(sw);
> 
>     GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>     gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
> 
>     GArrowBuffer *receivingBuffer = garrow_buffer_new(data, length);
>     GArrowBufferInputStream *inputStream =
>         garrow_buffer_input_stream_new(GARROW_BUFFER(receivingBuffer));
>     // Using the initial buffer here, without the intermediate data
>     // pointer, works!
>     GArrowRecordBatchStreamReader *sr =
>         garrow_record_batch_stream_reader_new(GARROW_INPUT_STREAM(inputStream),
>                                               &error);
>     if(sr == NULL){
>         fprintf(stderr, "Failed to create stream reader! Error "
>                 "message: %s\n", error->message);
>         g_error_free(error);
>     }
> 
>     GArrowRecordBatch *rb2 =
>         garrow_record_batch_reader_read_next(GARROW_RECORD_BATCH_READER(sr),
>                                              &error);
>     if(rb2 == NULL){
>         printf("Failed to create record batch from stream... Error "
>                "message: %s\n", error->message);
>         g_error_free(error);
>     }else {
>         printf("Recordbatch:\n%s\n",
>                garrow_record_batch_to_string(rb2, &error));
>     }
> 
>     g_object_unref(inputStream);
>     g_object_unref(rb2);
>     g_object_unref(sr);
>     g_object_unref(buffer);
> }
> 
> 
> On 07.07.22 22:30, Sutou Kouhei wrote:
>> Hi,
>>
>>>      gboolean test = garrow_record_batch_writer_write_record_batch(
>>>          GARROW_RECORD_BATCH_WRITER(sw), rb, &error);
>>>
>>>      GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>>>      gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
>>>
>>>      g_object_unref(sw);
>> You need to "close" the writer before you get data from
>> buffer. g_object_unref(sw) closes the writer implicitly:
>>
>>    gboolean test = garrow_record_batch_writer_write_record_batch(
>>        GARROW_RECORD_BATCH_WRITER(sw), rb, &error);
>>    g_object_unref(sw);
>>
>>    GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>>    gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
>>
>>
>> Thanks,


Re: StreamReader

2022-07-07 Thread Sutou Kouhei
Answered on dev@: 
https://lists.apache.org/thread/5rpykkfoz416mq889pcpx9rwrrtjog60

In 
  "StreamReader" on Sat, 2 Jul 2022 16:04:45 +0200,
  L Ait  wrote:

> Hi,
> 
> I need help integrating Arrow C++ into my current project. In fact I have
> built the C++ library and can call the API.
> 
> What I need is this:
> 
> I have a C++ project that reads data in chunks and then uses some erasure
> code to rebuild the original data.
> 
> The rebuild is done in chunks. At each iteration I can access a buffer of
> rebuilt data.
> 
> My need is to pass this data as a stream to an Arrow process and then send
> the processed stream.
> 
> For example if my original file is a csv and I would like to filter and
> save first column:
> 
> file
> 
> col1,col2, col3, col3
> a1,b1,c1,d1
> an,bn,cn,dn
> 
> split into 6 chunks of equal size. chunk1:
> 
> a1,b1,c1,d1
> ak,bk
> 
> chunk2:
> 
> ck,dk
> ...
> am,bm,cm,dm
> 
> and so on.
> 
> My question is: which StreamReader in Arrow should I use, and how does it
> deal with incomplete records (lines) at the beginning and end of each
> chunk?
> 
> Here is a snippet of the code I use:
> buffer_type_t res = fut.get0();
> BOOST_LOG_TRIVIAL(trace) <<
> "RawxBackendReader: Got result with buffer size: " << res.size();
> std::shared_ptr<arrow::io::InputStream> input;
> 
> std::shared_ptr<arrow::io::BufferReader> buffer(new arrow::io::BufferReader(
> reinterpret_cast<const uint8_t*>(res.get()), res.size()));
> input = buffer;
> BOOST_LOG_TRIVIAL(trace) << "laa type input" << input.get();
> 
> ArrowFilter arrow_filter = ArrowFilter(input);
> arrow_filter.ToCsv();
> 
> 
> result.push_back(std::move(res));
> 
> Thank you


Re: [C/GLib] Trying (and failing) to send RecordBatches between Client and Server in C

2022-07-07 Thread Sutou Kouhei
Hi,

>     gboolean test = garrow_record_batch_writer_write_record_batch(
>         GARROW_RECORD_BATCH_WRITER(sw), rb, &error);
> 
>     GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>     gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
> 
>     g_object_unref(sw);

You need to "close" the writer before you get data from
the buffer. g_object_unref(sw) closes the writer implicitly:

  gboolean test = garrow_record_batch_writer_write_record_batch(
      GARROW_RECORD_BATCH_WRITER(sw), rb, &error);
  g_object_unref(sw);

  GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
  gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));


Thanks,
-- 
kou

In <0451ce95-3d28-1bde-f58b-fc7f4083a...@freenet.de>
  "Re: [C/GLib] Trying (and failing) to send RecordBatches between Client and 
Server in C" on Thu, 7 Jul 2022 14:17:39 +0200,
  Joel Ziegler  wrote:

> Thanks a lot for the reply! The conversion works as you wrote it. I am
> still unsure how to send the buffer written in IPC format. I tried
> getting a pointer to the data and its length so that I can simply send
> the data over the network, but the buffer created from the data
> pointer and length is not the same.
> 
> 
>     gboolean test = garrow_record_batch_writer_write_record_batch(
>         GARROW_RECORD_BATCH_WRITER(sw), rb, &error);
> 
>     GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>     gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
> 
>     g_object_unref(sw);
> 
>     // Receiving side
>     GArrowBuffer *receivingBuffer = garrow_buffer_new(data, length);
>     GArrowBufferInputStream *inputStream =
> garrow_buffer_input_stream_new(GARROW_BUFFER(receivingBuffer));
> 
> 
> On 06.07.22 09:35, Sutou Kouhei wrote:
>> Hi,
>>
>> You need to use GArrowRecordBatchStreamWriter instead of
>> garrow_output_stream_write_record_batch() to read by
>> GArrowRecordBatchStreamReader.
>>
>> GArrowRecordBatchStreamWriter and
>> GArrowRecordBatchStreamReader assume
>> IPC Streaming Format
>> https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
>> but garrow_output_stream_write_record_batch() just writes a
>> RecordBatch message
>> https://arrow.apache.org/docs/format/Columnar.html#recordbatch-message
>> .
>>
>> void testRecordbatchStream(GArrowRecordBatch *rb){
>>   GError *error = NULL;
>>
>>   // Write RecordBatch
>>   GArrowResizableBuffer *buffer = garrow_resizable_buffer_new(300, &error);
>>   GArrowBufferOutputStream *bufferStream =
>>     garrow_buffer_output_stream_new(buffer);
>>   GArrowSchema *schema = garrow_record_batch_get_schema(rb);
>>   GArrowRecordBatchStreamWriter *writer =
>>     garrow_record_batch_stream_writer_new(GARROW_OUTPUT_STREAM(bufferStream),
>>                                           schema, &error);
>>   g_object_unref(schema);
>>   g_object_unref(bufferStream);
>>   garrow_record_batch_writer_write_record_batch(
>>     GARROW_RECORD_BATCH_WRITER(writer), rb, &error);
>>   g_object_unref(writer);
>>
>>   // Read RecordBatch from buffer
>>   GArrowBufferInputStream *inputStream =
>>     garrow_buffer_input_stream_new(GARROW_BUFFER(buffer));
>>   GArrowRecordBatchStreamReader *sr =
>>     garrow_record_batch_stream_reader_new(
>>       GARROW_INPUT_STREAM(inputStream), &error);
>>   g_object_unref(inputStream);
>>   GArrowRecordBatch *rb2 = garrow_record_batch_reader_read_next(
>>     GARROW_RECORD_BATCH_READER(sr), &error);
>>   printf("Received RB: \n%s\n", garrow_record_batch_to_string(rb2, &error));
>>   g_object_unref(rb2);
>>   g_object_unref(sr);
>>
>>   g_object_unref(buffer);
>> }
>>
>> Your code is missing g_object_unref() calls. You need to call
>> g_object_unref() when an object is no longer needed. If you
>> forget to call g_object_unref(), your program causes a
>> memory leak.
>>
>>
>> Thanks,


Re: [C/GLib] Trying (and failing) to send RecordBatches between Client and Server in C

2022-07-06 Thread Sutou Kouhei
Hi,

You need to use GArrowRecordBatchStreamWriter instead of
garrow_output_stream_write_record_batch() in order to read
with GArrowRecordBatchStreamReader.

GArrowRecordBatchStreamWriter and
GArrowRecordBatchStreamReader assume the IPC streaming
format:
https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
But garrow_output_stream_write_record_batch() just writes a
single RecordBatch message:
https://arrow.apache.org/docs/format/Columnar.html#recordbatch-message

void testRecordbatchStream(GArrowRecordBatch *rb){
  GError *error = NULL;

  // Write RecordBatch
  GArrowResizableBuffer *buffer = garrow_resizable_buffer_new(300, &error);
  GArrowBufferOutputStream *bufferStream =
    garrow_buffer_output_stream_new(buffer);
  GArrowSchema *schema = garrow_record_batch_get_schema(rb);
  GArrowRecordBatchStreamWriter *writer =
    garrow_record_batch_stream_writer_new(GARROW_OUTPUT_STREAM(bufferStream),
                                          schema, &error);
  g_object_unref(schema);
  g_object_unref(bufferStream);
  garrow_record_batch_writer_write_record_batch(
    GARROW_RECORD_BATCH_WRITER(writer), rb, &error);
  g_object_unref(writer);

  // Read RecordBatch from buffer
  GArrowBufferInputStream *inputStream =
    garrow_buffer_input_stream_new(GARROW_BUFFER(buffer));
  GArrowRecordBatchStreamReader *sr =
    garrow_record_batch_stream_reader_new(
      GARROW_INPUT_STREAM(inputStream), &error);
  g_object_unref(inputStream);
  GArrowRecordBatch *rb2 = garrow_record_batch_reader_read_next(
    GARROW_RECORD_BATCH_READER(sr), &error);
  printf("Received RB: \n%s\n", garrow_record_batch_to_string(rb2, &error));
  g_object_unref(rb2);
  g_object_unref(sr);

  g_object_unref(buffer);
}

Your code is missing g_object_unref() calls. You need to call
g_object_unref() when an object is no longer needed. If you
forget to call g_object_unref(), your program causes a
memory leak.


Thanks,
-- 
kou

In 
  "[C/GLib] Trying (and failing) to send RecordBatches between Client and 
Server in C" on Tue, 5 Jul 2022 17:04:33 +0200,
  Joel Ziegler  wrote:

> Hi folks,
> 
> I read some data from a PostgreSQL database, convert it into
> RecordBatches and try to send the data to a client. But I fail to
> properly understand the usage of Apache Arrow C/GLib.
> 
> My information sources are the [C++ docs][1], [the Apache Arrow C/GLib
> reference manual][2] and [the C/GLib Github files][3].
> 
> By following the usage description of Apache Arrow C++ and
> experimenting with the wrapper classes in C, I built this minimal
> example of writing out a RecordBatch into a buffer and (after
> theoretically sending and receiving the buffer) trying to read that
> buffer back into a RecordBatch. But it fails, and I would be glad if
> you could point out my mistakes!
> 
> I omitted the error catching for readability. The code errors out at
> the creation of the GArrowRecordBatchStreamReader. If I use the
> arrowbuffer or the buffer from the top when creating the InputStream,
> the error reads:
> 
> ```[record-batch-stream-reader][open]: IOError: Expected IPC message
> of type schema but got record batch```.
> 
> If I use the testBuffer, the error complains about an invalid IPC
> stream, so the data is just corrupt.
> 
> 
> ```
> void testRecordbatchStream(GArrowRecordBatch *rb){
>     GError *error = NULL;
> 
>     // Write Recordbatch
>     GArrowResizableBuffer *buffer =
>         garrow_resizable_buffer_new(300, &error);
>     GArrowBufferOutputStream *bufferStream =
>         garrow_buffer_output_stream_new(buffer);
>     long written =
>         garrow_output_stream_write_record_batch(GARROW_OUTPUT_STREAM(bufferStream),
>                                                 rb, NULL, &error);
> 
>     // Use buffer as plain bytes
>     void *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>     size_t length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
> 
>     // Read plain bytes and test serialize function
>     GArrowBuffer *testBuffer = garrow_buffer_new(data, length);
>     GArrowBuffer *arrowbuffer = garrow_record_batch_serialize(rb,
>                                                               NULL, &error);
> 
>     // Read RecordBatch from buffer
>     GArrowBufferInputStream *inputStream =
>         garrow_buffer_input_stream_new(arrowbuffer);
>     GArrowRecordBatchStreamReader *sr =
>         garrow_record_batch_stream_reader_new(GARROW_INPUT_STREAM(inputStream),
>                                               &error);
>     GArrowRecordBatch *rb2 =
>         garrow_record_batch_reader_read_next(GARROW_RECORD_BATCH_READER(sr),
>                                              &error);
> 
>     printf("Received RB: \n%s\n", garrow_record_batch_to_string(rb2,
>                                                                 &error));
> }
> ```
> 
> 
>   [1]: https://arrow.apache.org/docs/cpp/index.html
>   [2]: https://arrow.apache.org/docs/c_glib/arrow-glib/
>   [3]: https://github.com/apache/arrow/tree/master/c_glib
> 


Re: Does cpp Win32 work?

2022-05-30 Thread Sutou Kouhei
Hi,

It seems that 32 bit Windows doesn't provide the following:

* __popcnt64()
* _BitScanReverse64()
* _BitScanForward64()

We have fallback implementations for _BitScan*64(). So we
can use them with the following change:


diff --git a/cpp/src/arrow/util/bit_util.h b/cpp/src/arrow/util/bit_util.h
index 8583e10b22..e06e3399e1 100644
--- a/cpp/src/arrow/util/bit_util.h
+++ b/cpp/src/arrow/util/bit_util.h
@@ -199,7 +199,7 @@ static inline int CountLeadingZeros(uint64_t value) {
 #if defined(__clang__) || defined(__GNUC__)
   if (value == 0) return 64;
   return static_cast<int>(__builtin_clzll(value));
-#elif defined(_MSC_VER)
+#elif defined(_MSC_VER) && (defined(_M_AMD64) || defined(_M_X64))
   unsigned long index;  // NOLINT
   if (_BitScanReverse64(&index, value)) {  // NOLINT
     return 63 - static_cast<int>(index);
@@ -220,7 +220,7 @@ static inline int CountTrailingZeros(uint32_t value) {
 #if defined(__clang__) || defined(__GNUC__)
   if (value == 0) return 32;
   return static_cast<int>(__builtin_ctzl(value));
-#elif defined(_MSC_VER)
+#elif defined(_MSC_VER) && (defined(_M_AMD64) || defined(_M_X64))
   unsigned long index;  // NOLINT
   if (_BitScanForward(&index, value)) {
     return static_cast<int>(index);


But we don't have a fallback implementation for
__popcnt64(). Could you file this at
https://issues.apache.org/jira/browse/ARROW ?

BTW, do you want to work on this?
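For reference, a portable `__popcnt64()` fallback is only a few lines of bit arithmetic. This is a generic SWAR sketch of what such a fallback could look like, not the code Arrow actually adopted:

```cpp
#include <cstdint>

// Count set bits in a 64-bit word without the __popcnt64() intrinsic:
// classic SWAR popcount — sum adjacent 2-bit fields, then 4-bit fields,
// then fold the per-byte counts together with a multiply.
inline int PopCount64(uint64_t v) {
  v = v - ((v >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // per-byte sums
  return static_cast<int>((v * 0x0101010101010101ULL) >> 56);            // total in top byte
}
```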


Thanks,
-- 
kou


In <6290DDC6005E02EC00390001_0_37693@msllnjpmsgsv06>
  "Does cpp Win32 work?" on Fri, 27 May 2022 14:18:46 -,
  "Arkadiy Vertleyb (BLOOMBERG/ 120 PARK)"  wrote:

> Hi all,
> 
> After resolving my linker issue, I now have the following problem:
> 
> C:\Users\avertleyb\git\arrow\cpp\src\arrow/util/bit_util.h(70,59): error 
> C3861: '__popcnt64': identifier not found 
> [C:\Users\avertleyb\git\arrow\cpp\build32\src\arrow\arrow_shared.vcxproj]
> C:\Users\avertleyb\git\arrow\cpp\src\arrow/util/bit_util.h(204,7): error 
> C3861: '_BitScanReverse64': identifier not found 
> [C:\Users\avertleyb\git\arrow\cpp\build32\src\arrow\arrow_shared.vcxproj]
> C:\Users\avertleyb\git\arrow\cpp\src\arrow/util/bit_util.h(250,7): error 
> C3861: '_BitScanForward64': identifier not found 
> [C:\Users\avertleyb\git\arrow\cpp\build32\src\arrow\arrow_shared.vcxproj]
> 
> Looks like it is trying to use 64 bit stuff, which isn't defined in the 32 
> bit architecture.
> 
> One thing I noticed - all vcproj files contain:
> 
>   
> x64
>   
> 
> Not sure if this is the issue, but looks suspicious.
> 
> Also, for some reason, generated vcproj files don't contain C++ properties, 
> including preprocessor properties, when I open them in MSVC.
> 
> Any help would be greatly appreciated.
> 
> Thanks,
> Arkadiy


Re: cpp build issue with gflags

2022-05-29 Thread Sutou Kouhei
Hi,

It seems that the following build dependency is missing:

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake 
b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index aa01b7528c..c62bd41c5f 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1379,6 +1379,7 @@ macro(build_gflags)
   add_dependencies(toolchain gflags_ep)
 
   add_thirdparty_lib(gflags::gflags_static STATIC ${GFLAGS_STATIC_LIB})
+  add_dependencies(gflags::gflags_static gflags_ep)
   set(GFLAGS_LIBRARY gflags::gflags_static)
   set_target_properties(${GFLAGS_LIBRARY}
 PROPERTIES INTERFACE_COMPILE_DEFINITIONS 
"GFLAGS_IS_A_DLL=0"


Could you try the change with a clean build directory?


Thanks,
-- 
kou

In <628FE2E400C207D600390001_0_60716@msllnjpmsgsv06>
  "Re:cpp build issue with gflags" on Thu, 26 May 2022 20:28:20 -,
  "Arkadiy Vertleyb (BLOOMBERG/ 120 PARK)"  wrote:

> This issue gets resolved if I run the second step twice.
> 
> From: user@arrow.apache.org At: 05/26/22 15:12:39 UTC-4:00To:  
> user@arrow.apache.org
> Subject: cpp build issue with gflags
> 
> Hi all,
> 
> Trying to build cpp Arrow with MSVC 2019, as per 
> https://arrow.apache.org/docs/developers/cpp/windows.html.
> 
> The first step - 
> 
> cmake .. -G "Visual Studio 16 2019" -A x64 -DARROW_BUILD_TESTS=ON
> 
> says it can't find gflags and will build them from source:
> 
> -- Building gflags from source
> -- Added static library dependency gflags::gflags_static: 
> C:/Users/avertleyb/git/arrow/cpp/build/gflags_ep-prefix/src/gflags_ep/lib/gflags_static.lib
> 
> The second step - 
> 
> cmake --build . --config Release
> 
> right away complains about this library:
> 
> LINK : fatal error LNK1181: cannot open input file 
> 'C:\Users\avertleyb\git\arrow\cpp\build\gflags_ep-prefix\src\gflags_ep\lib\gflags_static.lib'
>  
> [C:\Users\avertleyb\git\arrow\cpp\build\src\arrow\arrow_bundled_dependencies.vcxproj]
> C:\Program Files (x86)\Microsoft Visual 
> Studio\2019\Professional\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(241,5):
>  error MSB8066: Custom build for 
> 'C:\Users\avertleyb\git\arrow\cpp\build\CMakeFiles\b033194e6d32d6a2595cc88c82
> 72e4b2\arrow_bundled_dependencies.lib.rule;C:\Users\avertleyb\git\arrow\cpp\build\CMakeFiles\672df30e18a621ddf9c15292835268fd\arrow_bundled_dependencies.rule'
>  exited with code 1181. [C:\Users\avertleyb\git\arrow\cpp\build\src\arrow\arro
> w_bundled_dependencies.vcxproj]
> 
> However it proceeds with the build, and when the build ends, the library is 
> there:
> 
> C:\Users\avertleyb\git\arrow\cpp\build>dir 
> C:\Users\avertleyb\git\arrow\cpp\build\gflags_ep-prefix\src\gflags_ep\lib\gflags_static.lib
>  Volume in drive C is Windows
>  Volume Serial Number is 3E24-1FC6
> 
>  Directory of 
> C:\Users\avertleyb\git\arrow\cpp\build\gflags_ep-prefix\src\gflags_ep\lib
> 
> 05/26/2022  02:40 PM   672,310 gflags_static.lib
>1 File(s)672,310 bytes
>0 Dir(s)  288,920,072,192 bytes free
> 
> So what is wrong?  Any help is greatly appreciated.
> 
> Thanks,
> Arkadiy
> 
> 


Re: cpp Windows 32 build

2022-05-25 Thread Sutou Kouhei
Hi,

What problems do you have? Could you share build log or
something?

"-A Win32" is correct CMake option.

BTW, why do you want to build for 32 bit Windows? We
recommend building for 64 bit Windows for performance, and
32 bit Windows support is limited.


Thanks,
-- 
kou

In 
 

  "cpp Windows 32 build" on Wed, 25 May 2022 19:42:56 +,
  Arkadiy Vertleyb  wrote:

> Hi all.  I am having problems trying to build cpp for 32 bit windows.  Is 
> Win32 the right architecture to use with cmake?
> 
> I am using the MSVC method as described in 
> https://arrow.apache.org/docs/developers/cpp/windows.html.
> 
> Thanks for any help.
> 
> Arkadiy
> 
> Sent from Mail for Windows
> 


Re: arrow_flight_sql_static

2022-05-16 Thread Sutou Kouhei
Hi,

You'll be able to find the ArrowFlightSql CMake package by
adding the -DArrowFlightSql_DIR=${ARROW_PREFIX}/lib/cmake/arrow
CMake option.
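A minimal CMakeLists.txt sketch combining that hint with the find_package() call and static-library linking discussed in this thread (the target and source names are hypothetical):

```cmake
cmake_minimum_required(VERSION 3.16)
project(flight_sql_example CXX)

# Configure with:
#   cmake -DArrowFlightSql_DIR=${ARROW_PREFIX}/lib/cmake/arrow ..
find_package(ArrowFlightSql REQUIRED)

add_executable(flight_sql_server server.cc)  # hypothetical target/source
target_link_libraries(flight_sql_server
  PRIVATE
  arrow_flight_sql_static)
```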

I'll fix this by
https://issues.apache.org/jira/browse/ARROW-12175 .


Thanks,
-- 
kou

In 
  "Re: arrow_flight_sql_static" on Mon, 16 May 2022 15:24:11 +0800,
  Zmoey Zhang  wrote:

> Thank you Sutou. But it didn't work and I got an error like
> 
> By not providing "FindArrowFlightSql.cmake" in CMAKE_MODULE_PATH this
>   project has asked CMake to find a package configuration file provided by
>   "ArrowFlightSql", but CMake did not find one.
> 
> And finally I used this and it worked.
> 
> find_library(ARROW_FLIGHT_SQL_LIBRARY NAMES arrow_flight_sql REQUIRED)
> 
> target_link_libraries(${TARGET_NAME}
> PUBLIC
> ${ARROW_FLIGHT_SQL_LIBRARY} ...
> 
> 
> Best Regards,
> Zimo Zhang
> 
> 
> Sutou Kouhei  wrote on Saturday, 14 May 2022 at 04:41:
> 
>> Hi,
>>
>> It seems that you don't have
>>
>>   find_package(ArrowFlightSql REQUIRED)
>>
>> in your CMakeLists.txt. Could you confirm it?
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "arrow_flight_sql_static" on Fri, 13 May 2022 10:59:17 +0800,
>>   Zmoey Zhang  wrote:
>>
>> > Hi there,
>> >
>> > I'm recently trying to write a flight sql server following the sqlite
>> > example. However, it throws error "ld: library not found for
>> > -larrow_flight_sql_static" during compiling. My arrow's installed via
>> vcpkg
>> > and I'm sure libarrow_flight_sql.a is in both installed/x64-osx/debug/lib
>> > and installed/x64-osx/lib. And I have following code in the
>> CMakeLists.txt
>> >
>> > ...
>> > target_link_libraries(${TARGET_NAME}
>> > PUBLIC
>> > arrow_flight_sql_static gRPC::grpc++ protobuf::libprotobuf
>> > arrow_static ...
>> >
>> > I've checked arrow official doc but seems flight-sql-related content not
>> > there yet. Do you have any idea how to fix this? Many thanks in advance.
>> >
>> >
>> > Best Regards,
>> > Zimo Zhang
>>


Re: arrow_flight_sql_static

2022-05-13 Thread Sutou Kouhei
Hi,

It seems that you don't have

  find_package(ArrowFlightSql REQUIRED)

in your CMakeLists.txt. Could you confirm it?


Thanks,
-- 
kou

In 
  "arrow_flight_sql_static" on Fri, 13 May 2022 10:59:17 +0800,
  Zmoey Zhang  wrote:

> Hi there,
> 
> I'm recently trying to write a flight sql server following the sqlite
> example. However, it throws error "ld: library not found for
> -larrow_flight_sql_static" during compiling. My arrow's installed via vcpkg
> and I'm sure libarrow_flight_sql.a is in both installed/x64-osx/debug/lib
> and installed/x64-osx/lib. And I have following code in the CMakeLists.txt
> 
> ...
> target_link_libraries(${TARGET_NAME}
> PUBLIC
> arrow_flight_sql_static gRPC::grpc++ protobuf::libprotobuf
> arrow_static ...
> 
> I've checked arrow official doc but seems flight-sql-related content not
> there yet. Do you have any idea how to fix this? Many thanks in advance.
> 
> 
> Best Regards,
> Zimo Zhang


Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-22 Thread Sutou Kouhei
Hi,

> And now it works!

That's good to know. :-)

> I tried to see what has changed and found this which sounds related:
> https://github.com/Homebrew/homebrew-core/commit/0d53c4ddc79abc153967d0f903c5621eab59e422

Yes. This will replace `@rpath`.

> However, that change should already have been included in 7.0.0_3, so I'm
> confused...

https://github.com/Homebrew/homebrew-core/commits/master/Formula/apache-arrow.rb
shows a 7.0.0_3 bottle update after the commit but there is
also a 7.0.0_3 bottle update BEFORE the commit.

It seems that the commit missed a revision bump like
https://github.com/Homebrew/homebrew-core/commit/ede84dd46a3bf3ac745c0cbc3580a2d44d04ea76
. So some (or all?) users such as you were using the old 7.0.0_3
bottle that doesn't include the commit.


Thanks,
-- 
kou

In 
  "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Fri, 22 Apr 2022 11:12:37 
+0200,
  Sten Larsson  wrote:

> Hi
> 
> I noticed this when building apache-arrow that Homebrew released a new
> patch version
> 
> ...
> ==> Upgrading apache-arrow
>   7.0.0_3 -> 7.0.0_4
> ...
> 
> And now it works!
> 
> I tried to see what has changed and found this which sounds related:
> https://github.com/Homebrew/homebrew-core/commit/0d53c4ddc79abc153967d0f903c5621eab59e422
> 
> However, that change should already have been included in 7.0.0_3, so I'm
> confused...
> 
> In any case it seems this was a Homebrew issue and that it is now fixed. I
> appreciate all your help, and I learned a lot about macOS along the way.
> 
> Thanks
> /Sten


Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-22 Thread Sutou Kouhei
Hi,

Thanks for getting the log! I didn't know the "-b 100m" option!

>   This seems to be
> the real problem:
> 
> open("@rpath/libarrow.700.dylib\0", 0x0, 0x0) = -1 Err#2
> open("@rpath\0", 0x10, 0x0) = -1 Err#2

Yes.

It's caused by embedded information in libparquet.700.dylib:

> $ otool -L /opt/homebrew/lib/libparquet.700.dylib
> /opt/homebrew/lib/libparquet.700.dylib:
> /opt/homebrew/opt/apache-arrow/lib/libparquet.700.dylib (compatibility 
> version 700.0.0, current version 700.0.0)
> @rpath/libarrow.700.dylib (compatibility version 700.0.0, current version 
> 700.0.0)

I don't know why @rpath is used here. Note that the Homebrew
bottle on non-M1 Macs doesn't use @rpath:

$ otool -L /usr/local/lib/libparquet.dylib
/usr/local/lib/libparquet.dylib:
/usr/local/opt/apache-arrow/lib/libparquet.700.dylib (compatibility 
version 700.0.0, current version 700.0.0)
/usr/local/Cellar/apache-arrow/7.0.0_4/lib/libarrow.700.dylib 
(compatibility version 700.0.0, current version 700.0.0)

This may be a Homebrew problem or a problem in our CMake
scripts.

Could you try:

$ brew install --build-from-source --keep-tmp apache-arrow

and:

$ otool -L $(brew --prefix)/lib/libparquet.dylib

?

If this still includes @rpath, could you provide the build
directory that is kept by the --keep-tmp option?
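For anyone debugging the same symptom, here is a small diagnostic sketch (macOS-only; the library path is an assumption, and the block exits quietly on other systems):

```shell
# Check a dylib for @rpath-relative install names; dyld can fail to resolve
# these when no usable LC_RPATH entry is present in the loading chain.
LIB="$(brew --prefix 2>/dev/null)/lib/libparquet.dylib"
if command -v otool >/dev/null 2>&1 && [ -e "$LIB" ]; then
  otool -L "$LIB" | grep '@rpath' || echo "no @rpath references in $LIB"
  # One possible local fix (assumption: the real path is known) is to
  # rewrite the reference:
  #   install_name_tool -change @rpath/libarrow.700.dylib \
  #     "$(brew --prefix)/lib/libarrow.700.dylib" "$LIB"
else
  echo "skipping: otool or $LIB not available on this system"
fi
```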


Thanks,
-- 
kou


In 
  "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Thu, 21 Apr 2022 10:22:52 
+0200,
  Sten Larsson  wrote:

> Hi kou
> 
> I figured out that the reason the log was incomplete was that the dtruss
> buffer was too small. I ran the command with -b 100m and updated the gist.
> Note that the log file is truncated in the UI, so you need to download it
> to see the interesting parts:
> https://gist.github.com/stenlarsson/02d777e4c3b9e485b6e0f80f834ed8f5
> 
> It seems like the error message is lying, and that it is actually trying to
> load /opt/homebrew/lib/libparquet-glib.700.dylib as well. This seems to be
> the real problem:
> 
> open("@rpath/libarrow.700.dylib\0", 0x0, 0x0) = -1 Err#2
> open("@rpath\0", 0x10, 0x0) = -1 Err#2
> 
> Not sure what it means though?
> 
> Thanks
> /Sten
> 
> On Thu, 21 Apr 2022 at 07:30, Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> Thanks for the update! It seems that we need to know more
>> macOS tools to debug this case...
>>
>> If I have a M1 Mac, I can look into this more. But I don't
>> have it. Please use DYLD_FALLBACK_LIBRARY_PATH for now.
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Wed, 20 Apr 2022
>> 09:49:31 +0200,
>>   Sten Larsson  wrote:
>>
>> > Hi kou
>> >
>> > I rescued NameError instead of LoadError to catch the error, but
>> > unfortunately it still doesn't seem to show anything about loading
>> > libraries. I updated the gist with the output:
>> > https://gist.github.com/stenlarsson/02d777e4c3b9e485b6e0f80f834ed8f5
>> >
>> > Thanks
>> > /Sten
>> >
>> > On Wed, 20 Apr 2022 at 02:35, Sutou Kouhei  wrote:
>> >
>> >> Hi,
>> >>
>> >> Thanks for providing it!
>> >>
>> >> But it doesn't include information what I want to see (which
>> >> files are opened/stated)... It seems that dtruss log isn't
>> >> completed.
>> >>
>> >> Could you try again with the following arrow-test.rb
>> >> content?
>> >>
>> >> 
>> >> begin
>> >>   require 'parquet'
>> >> rescue LoadError
>> >> end
>> >> sleep(10)
>> >> puts("done")
>> >> ---
>> >>
>> >> Thanks,
>> >> --
>> >> kou
>> >>
>> >> In 
>> >>   "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Tue, 19 Apr 2022
>> >> 08:41:37 +0200,
>> >>   Sten Larsson  wrote:
>> >>
>> >> > Hi Kou
>> >> >
>> >> > I have uploaded the output of dtruss here:
>> >> > https://gist.github.com/stenlarsson/02d777e4c3b9e485b6e0f80f834ed8f5
>> >> >
>> >> > Thanks
>> >> > /Sten
>> >> >
>> >> > On Fri, 15 Apr 2022 at 02:50, Sutou Kouhei 
>> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> > I disabled SIP, but unfortunately dtrace didn't give anything
>> useful.
>> >> >>

Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-20 Thread Sutou Kouhei
Hi,

Thanks for the update! It seems that we need more familiarity
with macOS tools to debug this case...

If I had an M1 Mac, I could look into this more. But I don't
have one. Please use DYLD_FALLBACK_LIBRARY_PATH for now.


Thanks,
-- 
kou

In 
  "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Wed, 20 Apr 2022 09:49:31 
+0200,
  Sten Larsson  wrote:

> Hi kou
> 
> I rescued NameError instead of LoadError to catch the error, but
> unfortunately it still doesn't seem to show anything about loading
> libraries. I updated the gist with the output:
> https://gist.github.com/stenlarsson/02d777e4c3b9e485b6e0f80f834ed8f5
> 
> Thanks
> /Sten
> 
> On Wed, 20 Apr 2022 at 02:35, Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> Thanks for providing it!
>>
>> But it doesn't include information what I want to see (which
>> files are opened/stated)... It seems that dtruss log isn't
>> completed.
>>
>> Could you try again with the following arrow-test.rb
>> content?
>>
>> 
>> begin
>>   require 'parquet'
>> rescue LoadError
>> end
>> sleep(10)
>> puts("done")
>> ---
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Tue, 19 Apr 2022
>> 08:41:37 +0200,
>>   Sten Larsson  wrote:
>>
>> > Hi Kou
>> >
>> > I have uploaded the output of dtruss here:
>> > https://gist.github.com/stenlarsson/02d777e4c3b9e485b6e0f80f834ed8f5
>> >
>> > Thanks
>> > /Sten
>> >
>> > On Fri, 15 Apr 2022 at 02:50, Sutou Kouhei  wrote:
>> >
>> >> Hi,
>> >>
>> >> > I disabled SIP, but unfortunately dtrace didn't give anything useful.
>> >> >
>> >> > $ sudo dtrace $(rbenv which ruby) arrow-test.rb
>> >> > dtrace: no probes specified
>> >>
>> >> Sorry... I told wrong command. We should use dtruss not
>> >> dtrace.
>> >>
>> >> Could you try the following?
>> >>
>> >>   $ sudo dtruss $(rbenv which ruby) arrow-test.rb
>> >>
>> >>
>> >> Thanks,
>> >> --
>> >> kou
>> >>
>> >> In 
>> >>   "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Thu, 14 Apr 2022
>> >> 09:58:47 +0200,
>> >>   Sten Larsson  wrote:
>> >>
>> >> > Hi kou
>> >> >
>> >> > Thanks, rbenv was indeed the reason DYLD_FALLBACK_LIBRARY_PATH didn't
>> >> have
>> >> > any effect, so this command now works!
>> >> >
>> >> >   $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> >> > --prefix)/lib:/usr/local/lib:/usr/lib" \
>> >> >   $(rbenv which ruby) arrow-test.rb
>> >> >
>> >> > (It doesn't print anything since the script doesn't actually do
>> >> anything.)
>> >> >
>> >> > I disabled SIP, but unfortunately dtrace didn't give anything useful.
>> >> >
>> >> > $ sudo dtrace $(rbenv which ruby) arrow-test.rb
>> >> > dtrace: no probes specified
>> >> >
>> >> > I have not used dtrace before so I don't know what probes I should
>> >> specify,
>> >> > sorry.
>> >> >
>> >> > Thanks
>> >> > /Sten
>> >> >
>> >> > On Thu, 14 Apr 2022 at 09:29, Sutou Kouhei 
>> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> > $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> >> >> --prefix)/lib:/usr/local/lib:/usr/lib" \
>> >> >> > ruby arrow-test.rb
>> >> >>
>> >> >> Ah, I forgot that you use rbenv. rbenv runs ruby from
>> >> >> a wrapper script. DYLD_FALLBACK_LIBRARY_PATH isn't inherited
>> >> >> to a subprocess on macOS for security reason.
>> >> >>
>> >> >> Could you try the following?
>> >> >>
>> >> >>   $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> >> >> --prefix)/lib:/usr/local/lib:/usr/lib" \
>> >> >>   $(rbenv which ruby) arrow-test.rb
>> >> >>
>> >> >> If you can disable SIP (System Integrity Protection) on the
>> >> >> machine, could you provide dtrace log?
>> >> >>
>> >> >>   $ sudo dtrace $(rbenv 

Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-19 Thread Sutou Kouhei
Hi,

Thanks for providing it!

But it doesn't include the information I want to see (which
files are opened/stat-ed)... It seems that the dtruss log isn't
complete.

Could you try again with the following arrow-test.rb
content?


begin
  require 'parquet'
rescue LoadError
end
sleep(10)
puts("done")
---

Thanks,
-- 
kou

In 
  "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Tue, 19 Apr 2022 08:41:37 
+0200,
  Sten Larsson  wrote:

> Hi Kou
> 
> I have uploaded the output of dtruss here:
> https://gist.github.com/stenlarsson/02d777e4c3b9e485b6e0f80f834ed8f5
> 
> Thanks
> /Sten
> 
> On Fri, 15 Apr 2022 at 02:50, Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> > I disabled SIP, but unfortunately dtrace didn't give anything useful.
>> >
>> > $ sudo dtrace $(rbenv which ruby) arrow-test.rb
>> > dtrace: no probes specified
>>
>> Sorry... I told wrong command. We should use dtruss not
>> dtrace.
>>
>> Could you try the following?
>>
>>   $ sudo dtruss $(rbenv which ruby) arrow-test.rb
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Thu, 14 Apr 2022
>> 09:58:47 +0200,
>>   Sten Larsson  wrote:
>>
>> > Hi kou
>> >
>> > Thanks, rbenv was indeed the reason DYLD_FALLBACK_LIBRARY_PATH didn't
>> have
>> > any effect, so this command now works!
>> >
>> >   $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> > --prefix)/lib:/usr/local/lib:/usr/lib" \
>> >   $(rbenv which ruby) arrow-test.rb
>> >
>> > (It doesn't print anything since the script doesn't actually do
>> anything.)
>> >
>> > I disabled SIP, but unfortunately dtrace didn't give anything useful.
>> >
>> > $ sudo dtrace $(rbenv which ruby) arrow-test.rb
>> > dtrace: no probes specified
>> >
>> > I have not used dtrace before so I don't know what probes I should
>> specify,
>> > sorry.
>> >
>> > Thanks
>> > /Sten
>> >
>> > On Thu, 14 Apr 2022 at 09:29, Sutou Kouhei  wrote:
>> >
>> >> Hi,
>> >>
>> >> > $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> >> --prefix)/lib:/usr/local/lib:/usr/lib" \
>> >> > ruby arrow-test.rb
>> >>
>> >> Ah, I forgot that you use rbenv. rbenv runs ruby from
>> >> a wrapper script. DYLD_FALLBACK_LIBRARY_PATH isn't inherited
>> >> to a subprocess on macOS for security reason.
>> >>
>> >> Could you try the following?
>> >>
>> >>   $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> >> --prefix)/lib:/usr/local/lib:/usr/lib" \
>> >>   $(rbenv which ruby) arrow-test.rb
>> >>
>> >> If you can disable SIP (System Integrity Protection) on the
>> >> machine, could you provide dtrace log?
>> >>
>> >>   $ sudo dtrace $(rbenv which ruby) arrow-test.rb
>> >>
>> >>
>> >> Thanks,
>> >> --
>> >> kou
>> >>
>> >> In 
>> >>   "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Thu, 14 Apr 2022
>> >> 08:26:54 +0200,
>> >>   Sten Larsson  wrote:
>> >>
>> >> > Hi kou
>> >> >
>> >> > Thanks for trying to help me with this
>> >> >
>> >> >
>> >> > 1. Yes
>> >> >
>> >> >
>> >> > 2. Note that I skipped the -r flag to get the result.
>> >> >
>> >> > $ grep -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Arrow-1.0.gir
>> >> >   <namespace name="Arrow"
>> >> >  version="1.0"
>> >> >  shared-library="libarrow-glib.700.dylib"
>> >> >  c:identifier-prefixes="GArrow"
>> >> >  c:symbol-prefixes="garrow">
>> >> >
>> >> >
>> >> > 3. Same here
>> >> >
>> >> > $ grep -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Parquet-1.0.gir
>> >> >   <namespace name="Parquet"
>> >> >  version="1.0"
>> >> >  shared-library="libparquet-glib.700.dylib"
>> >> >  c:identifier-prefixes="GParquet"
>> >> >  c:symbol-prefixes="gparquet">
>> >> >
>> >> >
>> >> > 4. No matches
>> >> >
>> >> > $ env | grep LIBRARY

Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-14 Thread Sutou Kouhei
Hi,

> I disabled SIP, but unfortunately dtrace didn't give anything useful.
> 
> $ sudo dtrace $(rbenv which ruby) arrow-test.rb
> dtrace: no probes specified

Sorry... I gave you the wrong command. We should use dtruss,
not dtrace.

Could you try the following?

  $ sudo dtruss $(rbenv which ruby) arrow-test.rb


Thanks,
-- 
kou

In 
  "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Thu, 14 Apr 2022 09:58:47 
+0200,
  Sten Larsson  wrote:

> Hi kou
> 
> Thanks, rbenv was indeed the reason DYLD_FALLBACK_LIBRARY_PATH didn't have
> any effect, so this command now works!
> 
>   $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
> --prefix)/lib:/usr/local/lib:/usr/lib" \
>   $(rbenv which ruby) arrow-test.rb
> 
> (It doesn't print anything since the script doesn't actually do anything.)
> 
> I disabled SIP, but unfortunately dtrace didn't give anything useful.
> 
> $ sudo dtrace $(rbenv which ruby) arrow-test.rb
> dtrace: no probes specified
> 
> I have not used dtrace before so I don't know what probes I should specify,
> sorry.
> 
> Thanks
> /Sten
> 
> On Thu, 14 Apr 2022 at 09:29, Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> > $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> --prefix)/lib:/usr/local/lib:/usr/lib" \
>> > ruby arrow-test.rb
>>
>> Ah, I forgot that you use rbenv. rbenv runs ruby from
>> a wrapper script. DYLD_FALLBACK_LIBRARY_PATH isn't inherited
>> to a subprocess on macOS for security reason.
>>
>> Could you try the following?
>>
>>   $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> --prefix)/lib:/usr/local/lib:/usr/lib" \
>>   $(rbenv which ruby) arrow-test.rb
>>
>> If you can disable SIP (System Integrity Protection) on the
>> machine, could you provide dtrace log?
>>
>>   $ sudo dtrace $(rbenv which ruby) arrow-test.rb
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Thu, 14 Apr 2022
>> 08:26:54 +0200,
>>   Sten Larsson  wrote:
>>
>> > Hi kou
>> >
>> > Thanks for trying to help me with this
>> >
>> >
>> > 1. Yes
>> >
>> >
>> > 2. Note that I skipped the -r flag to get the result.
>> >
>> > $ grep -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Arrow-1.0.gir
>> >   <namespace name="Arrow"
>> >  version="1.0"
>> >  shared-library="libarrow-glib.700.dylib"
>> >  c:identifier-prefixes="GArrow"
>> >  c:symbol-prefixes="garrow">
>> >
>> >
>> > 3. Same here
>> >
>> > $ grep -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Parquet-1.0.gir
>> >   <namespace name="Parquet"
>> >  version="1.0"
>> >  shared-library="libparquet-glib.700.dylib"
>> >  c:identifier-prefixes="GParquet"
>> >  c:symbol-prefixes="gparquet">
>> >
>> >
>> > 4. No matches
>> >
>> > $ env | grep LIBRARY_PATH | sort
>> >
>> >
>> > 5. Nothing found
>> >
>> > $ ls /usr/local/lib/lib*-glib.*.dylib
>> > zsh: no matches found: /usr/local/lib/lib*-glib.*.dylib
>> >
>> >
>> > 6. Nothing found
>> >
>> > $ ls /usr/lib/lib*-glib.*.dylib
>> > zsh: no matches found: /usr/lib/lib*-glib.*.dylib
>> >
>> >
>> > DYLD_FALLBACK_LIBRARY_PATH doesn't seem to have any effect
>> >
>> > $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> --prefix)/lib:/usr/local/lib:/usr/lib"
>> > ruby arrow-test.rb
>> > (null)-WARNING **: Failed to load shared library
>> > 'libparquet-glib.700.dylib' referenced by the typelib:
>> > dlopen(libparquet-glib.700.dylib, 0x0009): tried:
>> > 'libparquet-glib.700.dylib' (no such file),
>> > '/usr/local/lib/libparquet-glib.700.dylib' (no such file),
>> > '/usr/lib/libparquet-glib.700.dylib' (no such file),
>> > '/Users/stenlarsson/Documents/src/arrow-test/libparquet-glib.700.dylib'
>> (no
>> > such file)
>> > [...]
>> >
>> > Unfortunately it is still a mystery.
>> >
>> > Thanks
>> > /Sten
>> >
>> >
>> > On Wed, 13 Apr 2022 at 23:47, Sutou Kouhei  wrote:
>> >
>> >> Hi,
>> >>
>> >> Could you tell the following?
>> >>
>> >> 1. Did you run the script in
>> >>/Users/stenlarsson/Documents/src/arrow-test/ ?
>> >>
>> >> 2. The output of
>> >>grep -r -A 4 '> --prefix)/share/gir-1.0/

Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-14 Thread Sutou Kouhei
Hi,

> $ DYLD_FALLBACK_LIBRARY_PATH="$(brew --prefix)/lib:/usr/local/lib:/usr/lib" \
> ruby arrow-test.rb

Ah, I forgot that you use rbenv. rbenv runs ruby through
a wrapper script, and DYLD_FALLBACK_LIBRARY_PATH isn't inherited
by subprocesses on macOS for security reasons.

Could you try the following?

  $ DYLD_FALLBACK_LIBRARY_PATH="$(brew --prefix)/lib:/usr/local/lib:/usr/lib" \
  $(rbenv which ruby) arrow-test.rb

If you can disable SIP (System Integrity Protection) on the
machine, could you provide dtrace log?

  $ sudo dtrace $(rbenv which ruby) arrow-test.rb


Thanks,
-- 
kou

In 
  "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Thu, 14 Apr 2022 08:26:54 
+0200,
  Sten Larsson  wrote:

> Hi kou
> 
> Thanks for trying to help me with this
> 
> 
> 1. Yes
> 
> 
> 2. Note that I skipped the -r flag to get the result.
> 
> $ grep -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Arrow-1.0.gir
>   <namespace name="Arrow"
>  version="1.0"
>  shared-library="libarrow-glib.700.dylib"
>  c:identifier-prefixes="GArrow"
>  c:symbol-prefixes="garrow">
> 
> 
> 3. Same here
> 
> $ grep -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Parquet-1.0.gir
>   <namespace name="Parquet"
>  version="1.0"
>  shared-library="libparquet-glib.700.dylib"
>  c:identifier-prefixes="GParquet"
>  c:symbol-prefixes="gparquet">
> 
> 
> 4. No matches
> 
> $ env | grep LIBRARY_PATH | sort
> 
> 
> 5. Nothing found
> 
> $ ls /usr/local/lib/lib*-glib.*.dylib
> zsh: no matches found: /usr/local/lib/lib*-glib.*.dylib
> 
> 
> 6. Nothing found
> 
> $ ls /usr/lib/lib*-glib.*.dylib
> zsh: no matches found: /usr/lib/lib*-glib.*.dylib
> 
> 
> DYLD_FALLBACK_LIBRARY_PATH doesn't seem to have any effect
> 
> $ DYLD_FALLBACK_LIBRARY_PATH="$(brew --prefix)/lib:/usr/local/lib:/usr/lib"
> ruby arrow-test.rb
> (null)-WARNING **: Failed to load shared library
> 'libparquet-glib.700.dylib' referenced by the typelib:
> dlopen(libparquet-glib.700.dylib, 0x0009): tried:
> 'libparquet-glib.700.dylib' (no such file),
> '/usr/local/lib/libparquet-glib.700.dylib' (no such file),
> '/usr/lib/libparquet-glib.700.dylib' (no such file),
> '/Users/stenlarsson/Documents/src/arrow-test/libparquet-glib.700.dylib' (no
> such file)
> [...]
> 
> Unfortunately it is still a mystery.
> 
> Thanks
> /Sten
> 
> 
> On Wed, 13 Apr 2022 at 23:47, Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> Could you tell the following?
>>
>> 1. Did you run the script in
>>/Users/stenlarsson/Documents/src/arrow-test/ ?
>>
>> 2. The output of
>>    grep -r -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Arrow-1.0.gir
>>
>> 3. The output of
>>    grep -r -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Parquet-1.0.gir
>>
>> 4. The output of
>>env | grep LIBRARY_PATH | sort
>>
>> 5. The output of
>>ls /usr/local/lib/lib*-glib.*.dylib
>>
>> 6. The output of
>>ls /usr/lib/lib*-glib.*.dylib
>>
>> The following command line may resolve this:
>>
>>   $ DYLD_FALLBACK_LIBRARY_PATH="$(brew
>> --prefix)/lib:/usr/local/lib:/usr/lib" \
>>   ruby arrow-test.rb
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "[Ruby] Cannot require 'parquet' on M1 Mac" on Wed, 13 Apr 2022 10:34:52
>> +0200,
>>   Sten Larsson  wrote:
>>
>> > Hi
>> >
>> > I'm struggling to get Arrow working on my M1 MacBook Pro. The test
>> program
>> > simply consists of
>> >
>> > require 'parquet'
>> >
>> > This fails with
>> >
>> > $ ruby arrow-test.rb
>> > (null)-WARNING **: Failed to load shared library
>> > 'libparquet-glib.700.dylib' referenced by the typelib:
>> > dlopen(libparquet-glib.700.dylib, 0x0009): tried:
>> > 'libparquet-glib.700.dylib' (no such file),
>> > '/usr/local/lib/libparquet-glib.700.dylib' (no such file),
>> > '/usr/lib/libparquet-glib.700.dylib' (no such file),
>> > '/Users/stenlarsson/Documents/src/arrow-test/libparquet-glib.700.dylib'
>> (no
>> > such file)
>> > from
>> >
>> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:234:in
>> > `load_object_info'
>> > from
>> >
>> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/red-parquet-7.0.0/lib/parquet/loader.rb:38:in
>> > `load_object_info'
>> > from
>> >
>> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:73:in
>> > `load_info'
>> > from

Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-13 Thread Sutou Kouhei
Hi,

Could you tell the following?

1. Did you run the script in
   /Users/stenlarsson/Documents/src/arrow-test/ ?

2. The output of
   grep -r -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Arrow-1.0.gir

3. The output of
   grep -r -A 4 '<namespace' $(brew --prefix)/share/gir-1.0/Parquet-1.0.gir

4. The output of
   env | grep LIBRARY_PATH | sort

5. The output of
   ls /usr/local/lib/lib*-glib.*.dylib

6. The output of
   ls /usr/lib/lib*-glib.*.dylib

The following command line may resolve this:

  $ DYLD_FALLBACK_LIBRARY_PATH="$(brew --prefix)/lib:/usr/local/lib:/usr/lib" \
  ruby arrow-test.rb

Thanks,
-- 
kou

In 
  "[Ruby] Cannot require 'parquet' on M1 Mac" on Wed, 13 Apr 2022 10:34:52 
+0200,
  Sten Larsson  wrote:

> Hi
> 
> I'm struggling to get Arrow working on my M1 MacBook Pro. The test program
> simply consists of
> 
> require 'parquet'
> 
> This fails with
> 
> $ ruby arrow-test.rb
> (null)-WARNING **: Failed to load shared library
> 'libparquet-glib.700.dylib' referenced by the typelib:
> dlopen(libparquet-glib.700.dylib, 0x0009): tried:
> 'libparquet-glib.700.dylib' (no such file),
> '/usr/local/lib/libparquet-glib.700.dylib' (no such file),
> '/usr/lib/libparquet-glib.700.dylib' (no such file),
> '/Users/stenlarsson/Documents/src/arrow-test/libparquet-glib.700.dylib' (no
> such file)
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:234:in
> `load_object_info'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/red-parquet-7.0.0/lib/parquet/loader.rb:38:in
> `load_object_info'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:73:in
> `load_info'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:47:in
> `block (2 levels) in load'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:34:in
> `block (2 levels) in each'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:33:in
> `times'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:33:in
> `block in each'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:32:in
> `each'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:32:in
> `each'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:46:in
> `block in load'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:622:in
> `prepare_class'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:41:in
> `load'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:25:in
> `load'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/red-parquet-7.0.0/lib/parquet/loader.rb:22:in
> `load'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/red-parquet-7.0.0/lib/parquet.rb:28:in
> `'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/red-parquet-7.0.0/lib/parquet.rb:24:in
> `'
> from
> :160:in
> `require'
> from
> :160:in
> `rescue in require'
> from
> :149:in
> `require'
> from arrow-test.rb:1:in `'
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/red-parquet-7.0.0/lib/parquet/loader.rb:40:in
> `load_object_info': uninitialized constant Parquet::ArrowFileReader
> (NameError)
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:73:in
> `load_info'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:47:in
> `block (2 levels) in load'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:34:in
> `block (2 levels) in each'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:33:in
> `times'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:33:in
> `block in each'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:32:in
> `each'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/repository.rb:32:in
> `each'
> from
> /Users/stenlarsson/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/gobject-introspection-3.5.1/lib/gobject-introspection/loader.rb:46:in
> `block in load'
> from
> 

Re: [Plasma] [Installation] Dependency conflict with latest nvidia driver on Ubuntu 20.04

2022-03-31 Thread Sutou Kouhei
Hi,

We may be able to resolve this by adding ${package:libcuda1},
as nvidia-cuda-dev does
  
https://salsa.debian.org/nvidia-team/nvidia-cuda-toolkit/-/blob/master/debian/control#L318
in our control:
  
https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/debian/control.in#L60-L63
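To make the suggestion concrete, the dependency stanza in debian/control.in might gain a substitution variable along these lines (a rough sketch only; the exact variable and package names must be checked against the nvidia-team packaging linked above):

```
Package: libarrow-cuda-dev
Depends:
  ${misc:Depends},
  libarrow-dev (= ${binary:Version}),
  ${package:libcuda1}
```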

Could you open a JIRA issue for this?
https://issues.apache.org/jira/browse/ARROW


If that change resolves this, you can use the nightly
packages as a workaround until the next version is released.


Thanks,
-- 
kou

In 
 

  "[Plasma] [Installation] Dependency conflict with latest nvidia driver on 
Ubuntu 20.04" on Thu, 31 Mar 2022 11:51:12 +,
  Bessman Alexander  wrote:

> Hi,
> 
> I'm trying to install libplasma-dev on Ubuntu Focal, but the installation 
> fails:
> 
> The following packages have unmet dependencies:
>  libplasma-dev : Depends: libarrow-cuda-dev (= 7.0.0-1) but it is not going 
> to be installed
>   Depends: libplasma700 (= 7.0.0-1) but it is not 
> going to be installed
> 
> The problem seems to be the following dependency tree:
> 
> libplasma700 -> libarrow-cuda700 -> libnvidia-compute-460-server -> 
> libnvidia-compute-470-server
> 
> libnvidia-compute-470-server conflicts with libnvidia-compute-510, on which 
> nvidia-driver-510 depends.
> 
> Do I need to downgrade my graphics driver in order to install libplasma, or 
> is there another way?
> 
> Thanks


Re: [c_glib] c data interface export / import

2022-01-09 Thread Sutou Kouhei
Hi,

In 
  "[c_glib] c data interface export / import" on Sun, 9 Jan 2022 13:51:27 -0800,
  James Van Alstine  wrote:

> Can you import / export from gobject library and the c data interface?

You can use the following functions:

  * garrow_record_batch_import()

https://arrow.apache.org/docs/c_glib/arrow-glib/record-batch.html#garrow-record-batch-import
  * garrow_record_batch_export()

https://arrow.apache.org/docs/c_glib/arrow-glib/record-batch.html#garrow-record-batch-export
  * garrow_record_batch_reader_import()

https://arrow.apache.org/docs/c_glib/arrow-glib/reader-classes.html#garrow-record-batch-reader-import
  * garrow_record_batch_reader_export()

https://arrow.apache.org/docs/c_glib/arrow-glib/reader-classes.html#garrow-record-batch-reader-export
  * garrow_schema_import()

https://arrow.apache.org/docs/c_glib/arrow-glib/GArrowSchema.html#garrow-schema-import
  * garrow_schema_export()

https://arrow.apache.org/docs/c_glib/arrow-glib/GArrowSchema.html#garrow-schema-export

The following functions may be needed:

  * garrow_array_import()

https://arrow.apache.org/docs/c_glib/arrow-glib/basic-array-classes.html#garrow-array-import
  * garrow_array_export()

https://arrow.apache.org/docs/c_glib/arrow-glib/basic-array-classes.html#garrow-array-export
  * garrow_data_type_export()

https://arrow.apache.org/docs/c_glib/arrow-glib/basic-data-type-classes.html#garrow-data-type-export
  * garrow_data_type_import()

https://arrow.apache.org/docs/c_glib/arrow-glib/basic-data-type-classes.html#garrow-data-type-import


Here is a use case for them:

  * Red Arrow DuckDB

https://github.com/red-data-tools/red-arrow-duckdb/blob/master/ext/arrow-duckdb/arrow-duckdb.cpp


Thanks,
-- 
kou


Re: [Java] C Data Interface artifact

2021-11-30 Thread Sutou Kouhei
Hi,

We will be able to release the binary package for it in
7.0.0. We couldn't prepare our release process in time to
include the binary package in 6.0.0.

Thanks,
-- 
kou

In 
  "[Java] C Data Interface artifact" on Tue, 30 Nov 2021 21:10:43 -0600,
  Paul Whalen  wrote:

> Is there a released version of the java C interface bindings, or do we have
> to build it ourselves to use?  I don't see it included in any of the
> artifacts on maven central, and I can't find it referenced in documentation
> anywhere.  Based on when it was merged, I would expect it to be a part of
> 6.0.0.  I do see build instructions, which I'd assumed were primarily for
> development:
> 
> https://github.com/apache/arrow/tree/master/java/c
> 
> Apologies if I'm missing something obvious.
> 
> Thanks,
> Paul


Re: Getting Parquet File Metadata in C/GLib interface

2021-09-24 Thread Sutou Kouhei
Hi,

Thanks. I've implemented: https://github.com/apache/arrow/pull/11215

-- 
kou

In 

  "Re: Getting Parquet File Metadata in C/GLib interface" on Wed, 22 Sep 2021 
15:41:19 +,
  "McDonald, Ben"  wrote:

> The number of rows is the only one that is causing some hiccups on my end due 
> to the performance bottleneck with large files, the others would be 
> nice-to-haves, but aren’t blocking in any way.
> 
> I opened up a Jira issue here: 
> https://issues.apache.org/jira/browse/ARROW-14072. Thank you.
> 
> Best,
> Ben McDonald
> 
> From: Sutou Kouhei 
> Date: Tuesday, September 21, 2021 at 5:20 PM
> To: user@arrow.apache.org 
> Subject: Re: Getting Parquet File Metadata in C/GLib interface
> Hi,
> 
> Unfortunately, Apache Arrow GLib doesn't provide an API to
> get the number of rows in Parquet without reading all row
> groups yet.
> 
> Could you open a JIRA issue that requests this feature?
>   https://issues.apache.org/jira<https://issues.apache.org/jira>
> 
> I'll implement it before the next release.
> 
> We can get the number of columns from the schema obtained
> via the following API:
> 
> GArrowSchema *
> gparquet_arrow_file_reader_get_schema(GParquetArrowFileReader *reader,
>   GError **error);
> 
> We can get the number of row groups by the following API:
> 
> gint
> gparquet_arrow_file_reader_get_n_row_groups(GParquetArrowFileReader *reader);
> 
> 
> We can't get "created_by", "format_version" and
> "serialized_size" yet. Do you want to get all of them?
> 
> 
> Thanks,
> --
> kou
> 
> In 
> 
>   "Getting Parquet File Metadata in C/GLib interface" on Tue, 21 Sep 2021 
> 22:13:09 +,
>   "McDonald, Ben"  wrote:
> 
>> Hello,
>>
>> I am working with the C/GLib Arrow interface to read Parquet files and I am 
>> having trouble accessing all of the file metadata.
>>
>> Reading my file into Python and printing the metadata like this:
>> ```
>> pq.ParquetFile('f1.parquet').metadata
>> ```
>>
>> Results in this metadata:
>> ```
>> 
>>   created_by: parquet-cpp-arrow version 5.0.0
>>   num_columns: 3
>>   num_rows: 10
>>   num_row_groups: 1
>>   format_version: 1.0
>>   serialized_size: 420
>> ```
>>
>> But reading the same file into the C/GLib interface and printing the 
>> metadata from this call (where the schema is from the same file):
>> ```
>> garrow_schema_to_string_metadata(schema, TRUE)
>> ```
>>
>> Results in this metadata, which is only the schema and doesn’t include any 
>> of the above metadata:
>> ```
>> first-int-col: int64
>> str-col: string
>> second-int-col: int64
>> ```
>>
>> My specific question is: is it possible to easily get the number of rows of 
>> a Parquet file in the C/GLib Arrow library? (i.e., without having to read in 
>> the whole table), but I would also be interested in getting the rest of the 
>> metadata that is shown in pyarrow. I wasn’t able to find a way to do this in 
>> the C/GLib documentation, but feel like I must be missing something. Thank 
>> you.
>>
>> Best,
>> Ben McDonald


Re: Getting Parquet File Metadata in C/GLib interface

2021-09-21 Thread Sutou Kouhei
Hi,

Unfortunately, Apache Arrow GLib doesn't provide an API to
get the number of rows in Parquet without reading all row
groups yet.

Could you open a JIRA issue that requests this feature?
  https://issues.apache.org/jira

I'll implement it before the next release.

We can get the number of columns from the schema obtained
via the following API:

GArrowSchema *
gparquet_arrow_file_reader_get_schema(GParquetArrowFileReader *reader,
  GError **error);

We can get the number of row groups by the following API:

gint
gparquet_arrow_file_reader_get_n_row_groups(GParquetArrowFileReader *reader);


We can't get "created_by", "format_version" and
"serialized_size" yet. Do you want to get all of them?


Thanks,
-- 
kou

In 

  "Getting Parquet File Metadata in C/GLib interface" on Tue, 21 Sep 2021 
22:13:09 +,
  "McDonald, Ben"  wrote:

> Hello,
> 
> I am working with the C/GLib Arrow interface to read Parquet files and I am 
> having trouble accessing all of the file metadata.
> 
> Reading my file into Python and printing the metadata like this:
> ```
> pq.ParquetFile('f1.parquet').metadata
> ```
> 
> Results in this metadata:
> ```
> 
>   created_by: parquet-cpp-arrow version 5.0.0
>   num_columns: 3
>   num_rows: 10
>   num_row_groups: 1
>   format_version: 1.0
>   serialized_size: 420
> ```
> 
> But reading the same file into the C/GLib interface and printing the metadata 
> from this call (where the schema is from the same file):
> ```
> garrow_schema_to_string_metadata(schema, TRUE)
> ```
> 
> Results in this metadata, which is only the schema and doesn’t include any of 
> the above metadata:
> ```
> first-int-col: int64
> str-col: string
> second-int-col: int64
> ```
> 
> My specific question is: is it possible to easily get the number of rows of a 
> Parquet file in the C/GLib Arrow library? (i.e., without having to read in 
> the whole table), but I would also be interested in getting the rest of the 
> metadata that is shown in pyarrow. I wasn’t able to find a way to do this in 
> the C/GLib documentation, but feel like I must be missing something. Thank 
> you.
> 
> Best,
> Ben McDonald


Re: [GLib] Call Plasma from GLib?

2021-04-13 Thread Sutou Kouhei
Hi,

Plasma support is provided by separated library: plasma-glib

https://arrow.apache.org/docs/c_glib/plasma-glib/api-index-full.html


const gchar *socket_name = "/tmp/plasma-store.sock";
GPlasmaClientOptions *options = NULL;
GError *error = NULL;
GPlasmaClient *client = gplasma_client_new(socket_name, options, &error);
// Check error

GPlasmaObjectID *id = gplasma_object_id_new("ID", 2, &error);
// Check error
gint64 timeout_ms = 1000;
GPlasmaReferredObject *object =
  gplasma_client_refer_object(client,
  id,
  timeout_ms,
  &error);
// Check error

GArrowBuffer *data;
g_object_get(object,
 "data", &data,
 NULL);
GArrowBufferInputStream *input = garrow_buffer_input_stream_new(data);
// Or StreamReader
GArrowRecordBatchFileReader *reader =
  garrow_record_batch_file_reader_new(GARROW_SEEKABLE_INPUT_STREAM(input),
  &error);
// Check error

guint n_record_batches =
  garrow_record_batch_file_reader_get_n_record_batches(reader);
guint i;
for (i = 0; i < n_record_batches; i++) {
  GArrowRecordBatch *record_batch =
garrow_record_batch_file_reader_read_record_batch(reader, i, &error);
  // Check error
  // Process record_batch
  g_object_unref(record_batch);
}
g_object_unref(reader);
g_object_unref(input);
g_object_unref(data);
g_object_unref(object);
g_object_unref(client);



Thanks,
--
kou

In 
  "[GLib] Call Plasma from GLib?" on Tue, 13 Apr 2021 05:48:18 +,
  "Xander Dunn"  wrote:

> I've been using Arrow's GLib library to write an Arrow library for the Swift 
> programming language. In Python I am using pyarrow.plasma to store 
> RecordBatch buffers, and I would like to retrieve those in Swift. However, I 
> just noticed that there is no mention of plasma in the GLib interface: 
> https://arrow.apache.org/docs/c_glib/arrow-glib/api-index-full.html. Is 
> Plasma not a part of GLib by design or is it planned to add it? In the 
> meantime, it looks like my option is to call the arrow C++ library from my 
> Swift layer? I see instructions here for using Plasma in C++: 
> https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md
> 
> And an unrelated, less important question:
> 
> Should I be able to call Plasma from the C++ library in Cython rather than 
> using the pyarrow Plasma interface in Cython? I think I will just need to 
> `cdef extern` declare all of the C++ interfaces I need to call.
> 
> Thanks,
> 
> Xander


Re: Installing Apache Arrow Amazon Linux 2

2021-04-12 Thread Sutou Kouhei
Hi,

This will be fixed tomorrow.
We're using Bintray to distribute our Amazon Linux 2
packages but Bintray will shut down soon:

https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/

Today (2021-04-12) is a known Bintray down day:

> April 12th, 26th, 2021
>
> We will have some short service brown-outs to remind users
> about the services that are going away on May
> 1st. (Specific hours will be advertised in the Bintray
> status page.)

We'll use Artifactory instead of Bintray starting with 4.0.0,
but it's not ready yet:

  
https://lists.apache.org/thread.html/r9200fed3fa812f8c7de07a2500425f258db3231baa8e05f288175e4a%40%3Cbuilds.apache.org%3E


I have a mirror of our Bintray packages at packages.groonga.org.

You can use the mirror temporarily by installing
groonga-release-latest instead of
apache-arrow-release-latest in
https://arrow.apache.org/install/ :

  sudo yum install -y 
https://packages.groonga.org/centos/7/groonga-release-latest.noarch.rpm


Note: It's recommended that you use the official package
repository provided by Apache Arrow. The mirror is a
temporary workaround.


Thanks,
--
kou

In <2d413a84-1d8b-4c0e-a138-39e454f82...@forwardpmx.com>
  "Installing Apache Arrow Amazon Linux 2" on Mon, 12 Apr 2021 20:08:29 +,
  David Lahn  wrote:

> Hi,
> 
> The instructions here seem to be failing now:
> https://arrow.apache.org/install/
> 
> Specifically, installing Arrow from Bintray
> 
> bash-4.2# yum install -y 
> https://apache.bintray.com/arrow/centos/7/apache-arrow-release-latest.rpm
> Loaded plugins: ovl, priorities
> Cannot open: 
> https://apache.bintray.com/arrow/centos/7/apache-arrow-release-latest.rpm. 
> Skipping.
> Error: Nothing to do
> 
> Has this moved somewhere else?
> 
> Best regards,
> Dave
> 
> David Lahn
> DevOps Lead
> Development
>
> ForwardPMX 
> Privacy Policy
> 
>  
>   
> 
> This e-mail is confidential to ForwardPMX intended for use by the recipient. 
> If you received this in error or are not the intended recipient, you are 
> hereby notified that any review, retransmission, copying or other use of, or 
> taking of any action in reliance upon this information is strictly prohibited.
> 


Re: Arrow Flight in C

2021-04-08 Thread Sutou Kouhei
Hi,

I hope that we can implement Apache Arrow Flight bindings
for Apache Arrow GLib by 5.0.0. Are you interested in
developing them together?

If you aren't interested in it, you can implement a glue
module in C++ yourself. The C interface in Apache Arrow C++
will help you: https://arrow.apache.org/docs/cpp/api/c_abi.html


Thanks,
--
kou

In <97908a2bf6f64395a25b58950ea9e...@tudelft.nl>
  "Arrow Flight in C" on Fri, 9 Apr 2021 02:29:37 +,
  Tanveer Ahmad - EWI  wrote:

> Hi,
> 
> 
> I need some help in sending Arrow RecordBatches over Arrow Flight inside a C 
> application, as there is no Arrow Flight interface available in Arrow 
> GLib. Does someone have a custom C interface or suggestions for using the C++ 
> Arrow Flight functions inside a C application? Thanks.
> 
> 
> 
> Regards,
> 
> Tanveer Ahmad
> 


Re: LLVM

2021-03-30 Thread Sutou Kouhei
Hi,

In <205432f7-fda5-ce5f-beba-bfe3424af...@airmettle.com>
  "LLVM" on Wed, 31 Mar 2021 12:21:17 +1100,
  Matt Youill  wrote:

> Is it possible to override the version of LLVM that arrow uses during
> a build? Seems to always pick the latest version it finds.

You want to specify LLVM version used by Gandiva, right?

If it's right, you can't do it for now. We have the LLVM
version list as ARROW_LLVM_VERSIONS in cpp/CMakeLists.txt. We
can change the detection order by changing the list, but we
don't provide a build option to do it.


We can improve the current behavior by adding a new build
option such as ARROW_LLVM_VERSION into
cpp/cmake_modules/DefineOptions.cmake and using it as the
desired LLVM version.


Thanks,
--
kou



Re: [python] Unable to build a python 3.8 wheel

2021-03-14 Thread Sutou Kouhei
Hi,

Could you try

  PYTHON=3.8 docker-compose run python-wheel-manylinux-2014

instead of "docker-compose run -e ..."?


Thanks,
--
kou

In 
  "Re: [python] Unable to build a python 3.8 wheel" on Sun, 14 Mar 2021 
21:24:02 +,
  Alina Valea  wrote:

> I did change the python version to 3.8 in
> https://github.com/apache/arrow/blob/ad4504e8e85eb8e5babe0f01ca8cf9947499fc40/ci/docker/python-wheel-manylinux-201x.dockerfile#L94-L97
> but the wheel was still built with python 3.6.
> 
> On Sun, Mar 14, 2021, 5:50 PM Ian Cook  wrote:
> 
>> If the image was built with python-wheel-manylinux-201x.dockerfile, these
>> lines might explain the behavior:
>>
>> https://github.com/apache/arrow/blob/ad4504e8e85eb8e5babe0f01ca8cf9947499fc40/ci/docker/python-wheel-manylinux-201x.dockerfile#L94-L97
>>
>> Ian
>>
>> On Sun, Mar 14, 2021 at 10:35 AM Alina Valea  wrote:
>>
>>> Hello,
>>>
>>> I am trying to build a stripped down python 3.8 wheel from tag apache-
>>> arrow-3 .0.0. I am
>>> building it using
>>>
>>> docker-compose run -e PYTHON=3.8 -e PYTHON_VERSION=3.8
>>> python-wheel-manylinux-2014
>>> however, it is being built using python 3.6.
>>>
>>> The final wheel gets tagged with
>>> pyarrow-3.0.0-cp36-cp36m-manylinux2014_x86_64.whl and looking through the
>>> build log I can see that cmake decides to use python 3.6 even though python
>>> 3.8 is available in the manylinux image.
>>>
>>> cmake -DARROW_BROTLI_USE_SHARED=OFF -DARROW_BUILD_SHARED=ON 
>>> -DARROW_BUILD_STATIC=OFF -DARROW_BUILD_TESTS=OFF -DARROW_DATASET=ON 
>>> -DARROW_DEPENDENCY_SOURCE=SYSTEM -DARROW_DEPENDENCY_USE_SHARED=OFF 
>>> -DARROW_FLIGHT==OFF -DARROW_GANDIVA=OFF -DARROW_HDFS=OFF 
>>> -DARROW_JEMALLOC=ON -DARROW_MIMALLOC=ON -DARROW_ORC=OFF 
>>> -DARROW_PACKAGE_KIND=manylinux2014 -DARROW_PARQUET=ON -DARROW_PLASMA=OFF 
>>> -DARROW_PYTHON=ON -DARROW_RPATH_ORIGIN=ON -DARROW_S3=OFF 
>>> -DARROW_TENSORFLOW=OFF -DARROW_USE_CCACHE=ON 
>>> -DARROW_UTF8PROC_USE_SHARED=OFF -DARROW_WITH_BROTLI=OFF 
>>> -DARROW_WITH_BZ2=OFF -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON 
>>> -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DCMAKE_BUILD_TYPE=release 
>>> -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_INSTALL_PREFIX=/tmp/arrow-dist 
>>> -DCMAKE_TOOLCHAIN_FILE=/opt/vcpkg/scripts/buildsystems/vcpkg.cmake 
>>> -DCMAKE_UNITY_BUILD=OFF -DOPENSSL_USE_STATIC_LIBS=ON 
>>> -DThrift_ROOT=/opt/vcpkg/installed/x64-linux/lib 
>>> -DVCPKG_TARGET_TRIPLET=x64-linux-static-release -G Ninja /arrow/cpp
>>>
>>> -- Building using CMake version: 3.19.2
>>>
>>> -- The C compiler identification is GNU 9.3.1
>>>
>>> -- The CXX compiler identification is GNU 9.3.1
>>>
>>> -- Detecting C compiler ABI info
>>>
>>> -- Detecting C compiler ABI info - done
>>>
>>> -- Check for working C compiler: /opt/rh/devtoolset-9/root/usr/bin/cc - 
>>> skipped
>>>
>>> -- Detecting C compile features
>>>
>>> -- Detecting C compile features - done
>>>
>>> -- Detecting CXX compiler ABI info
>>>
>>> -- Detecting CXX compiler ABI info - done
>>>
>>> -- Check for working CXX compiler: /opt/rh/devtoolset-9/root/usr/bin/c++ - 
>>> skipped
>>>
>>> -- Detecting CXX compile features
>>>
>>> -- Detecting CXX compile features - done
>>>
>>> -- Arrow version: 3.0.0 (full: '3.0.0')
>>>
>>> -- Arrow SO version: 300 (full: 300.0.0)
>>>
>>> -- clang-tidy not found
>>>
>>> -- clang-format not found
>>>
>>> -- Could NOT find ClangTools (missing: CLANG_FORMAT_BIN CLANG_TIDY_BIN)
>>>
>>> -- infer not found
>>>
>>> -- Found Python3: /opt/python/cp36-cp36m/bin/python3.6 (found version 
>>> "3.6.12") found components: Interpreter
>>>
>>> Am I missing some config to make it build a python 3.8 wheel?
>>>
>>> Thanks.
>>>
>>>
>>>
>>>


Re: c/c++ interop

2021-02-16 Thread Sutou Kouhei
Hi,

Could you provide a complete project that reproduces this
case? (For example, you create a GitHub repository,
prepare the repository to reproduces this case and share it
to us.)

I couldn't reproduce with the following:

test_stream.h:

#include <arrow-glib/arrow-glib.hpp>

class test_stream: public ::arrow::RecordBatchReader {
public:
  std::shared_ptr<::arrow::Schema> schema() const override {
return ::arrow::schema({::arrow::field("a", arrow::int32())});
  }

  arrow::Status ReadNext(std::shared_ptr<::arrow::RecordBatch> *batch)
override {
return arrow::Status::OK();
  }
};

extern "C" GArrowRecordBatchReader* make_test_stream(){
  std::shared_ptr<::arrow::RecordBatchReader> ts =
std::make_shared<test_stream>();
  return garrow_record_batch_reader_new_raw(&ts);
}

test.cpp:

#include "test_stream.h"

int
main(void)
{
  GArrowRecordBatchReader *ts = make_test_stream();
  GArrowSchema *schema = garrow_record_batch_reader_get_schema(ts);
  g_object_unref(schema);
  g_object_unref(ts);
  return 0;
}

Command lines:

$ g++ test.cpp $(pkg-config --cflags --libs arrow-glib)
$ ./a.out


Thanks,
--
kou

In 
  "c/c++ interop" on Wed, 17 Feb 2021 12:09:37 +1100,
  Matt Youill  wrote:

> Hi,
> 
> I'm not super familiar with glib and currently have an issue with the
> arrows cglib lib.
> 
> I have some C++ code that creates a subclass of RecordBatchReader
> (test_stream), and returns it to some C code as a
> GArrowRecordBatchReader *.
> 
> If I run the same code in C++ it runs fine, but in C I always get a
> segmentation fault when performing an operation on the returned
> GArrowRecordBatchReader *.
> 
> test_stream doesn't really do anything other than return a dummy
> schema (it's just a test).
> 
> 
> Sample code:
> 
> *test_stream.h*
> 
> 
> class test_stream: public ::arrow::RecordBatchReader {
> 
>  public:
>   std::shared_ptr<::arrow::Schema> schema() const override {
>     return ::arrow::schema({::arrow::field("a", arrow::int32())});
>   }
> 
>   arrow::Status ReadNext(std::shared_ptr<::arrow::RecordBatch> *batch)
> override {
>     return arrow::Status::OK();
>   }
> 
> };
> 
> extern "C" GArrowRecordBatchReader* make_test_stream(){
>   std::shared_ptr<::arrow::RecordBatchReader> ts =
> std::make_shared<test_stream>();
>   return garrow_record_batch_reader_new_raw(&ts);
> }
> 
> 
> *test.c (or test.cpp)*
> 
> GArrowRecordBatchReader *ts = make_test_stream();
> GArrowSchema *schema = garrow_record_batch_reader_get_schema(ts);
> < segfault here
> 
> 
> Underlying break happens in arrows reader.h on this piece of code...
> 
> #define GARROW_TYPE_RECORD_BATCH_READER \
>   (garrow_record_batch_reader_get_type())
> G_DECLARE_DERIVABLE_TYPE(GArrowRecordBatchReader,
>  garrow_record_batch_reader,
>  GARROW,
>  RECORD_BATCH_READER,
>  GObject)
> 
> It smells like its something to do with my sub classing of
> RecordBatchReader but I don't know enough about glib.
> 
> Thanks, Matt
> 
> 


Re: glib build error

2021-02-04 Thread Sutou Kouhei
Hi,

Could you try "LD_LIBRARY_PATH=<arrow-c++-build-dir>/lib
make" instead of "make"?


Thanks,
--
kou

In <1114d126-c61d-c298-4293-91597c35f...@airmettle.com>
  "glib build error" on Thu, 4 Feb 2021 18:56:46 +1100,
  Matt Youill  wrote:

> Hi,
> 
> I'm attempting to build arrow glib (v3) and running into a link issue
> (on Linux).
> 
> The commands I'm issuing (from the c_glib sub dir of the arrow repo)
> are:
> 
> ./autogen.sh
> ./configure PKG_CONFIG_PATH=<arrow-c++-install-dir>/lib/pkgconfig:${PKG_CONFIG_PATH} --prefix=<install-dir>
> make
> 
> All seems happy except for a link error:
> 
> /usr/bin/ld: warning: libarrow.so.300, needed by
> ./../arrow-glib/.libs/libarrow-glib.so, not found (try using -rpath or
> -rpath-link)
> 
> 
> The complication is perhaps that I haven't installed the arrow (c++)
> libs system wide. They're just in an ad-hoc build dir.
> 
> Thanks, Matt
> 
> 


Re: [C-Glib] - writing an extension array

2020-11-26 Thread Sutou Kouhei
Hi,

Thanks.

We need to implement garrow_field_get_metadata(),
garrow_field_with_metadata() and
garrow_field_with_merged_metadata(). I'll do it before
Apache Arrow 3.0.0.

So you can't read extension types for now. Sorry.


Thanks,
--
kou

In 
 

  "Re: [C-Glib] - writing an extension array" on Fri, 27 Nov 2020 05:15:56 
+,
  Ishan Anand  wrote:

> Hi Kou
> 
> Sure. Here are the code snippets as a github gist - 
> https://gist.github.com/ananis25/0b645ef94a70a0834fd23177e8721be9
> 
> Thank you for looking.
> 
> 
> ____
> From: Sutou Kouhei 
> Sent: Friday, November 27, 2020 8:41 AM
> To: user@arrow.apache.org 
> Subject: Re: [C-Glib] - writing an extension array
> 
> Hi,
> 
> Could you provide the Python script you used and the C
> program you used?
> 
> 
> Thanks,
> --
> kou
> 
> In
>  
> 
>   "[C-Glib] - writing an extension array" on Thu, 26 Nov 2020 18:11:53 +,
>   Ishan Anand  wrote:
> 
>> Hi
>>
>> How do you go about implementing an extension type through the C API for 
>> Arrow?
>>
>> Creating a record batch in python like the example in pyarrow tests 
>> [here](https://github.com/apache/arrow/blob/2a5f92455ec4f9788ee96fa209b38d76bd927196/python/pyarrow/tests/test_extension_type.py#L375),
>>  and reading the resulting schema using the C API, it correctly reads it as 
>> an array of the underlying storage type. The schema along with the metadata 
>> can be printed as expected.
>> ```
>> ext: int64
>> -- metadata --
>> ARROW:extension:metadata: freq=D
>> ARROW:extension:name: test.period
>> ```
>>
>> However, trying to access the metadata for the schema (obtained with 
>> `garrow_schema_get_metadata`) indicates its size to be 0, which indicates 
>> that metadata for the schema isn't the same as that for a field. Is it 
>> possible using the existing API to read/write the metadata for a field?
>>
>>
>> Thank you,
>> Ishan


Re: [C-Glib] - writing an extension array

2020-11-26 Thread Sutou Kouhei
Hi,

Could you provide the Python script you used and the C
program you used?


Thanks,
--
kou

In 
 

  "[C-Glib] - writing an extension array" on Thu, 26 Nov 2020 18:11:53 +,
  Ishan Anand  wrote:

> Hi
> 
> How do you go about implementing an extension type through the C API for 
> Arrow?
> 
> Creating a record batch in python like the example in pyarrow tests 
> [here](https://github.com/apache/arrow/blob/2a5f92455ec4f9788ee96fa209b38d76bd927196/python/pyarrow/tests/test_extension_type.py#L375),
>  and reading the resulting schema using the C API, it correctly reads it as 
> an array of the underlying storage type. The schema along with the metadata 
> can be printed as expected.
> ```
> ext: int64
> -- metadata --
> ARROW:extension:metadata: freq=D
> ARROW:extension:name: test.period
> ```
> 
> However, trying to access the metadata for the schema (obtained with 
> `garrow_schema_get_metadata`) indicates its size to be 0, which indicates 
> that metadata for the schema isn't the same as that for a field. Is it 
> possible using the existing API to read/write the metadata for a field?
> 
> 
> Thank you,
> Ishan


Re: [Python/C-Glib] writing IPC file format column-by-column

2020-09-10 Thread Sutou Kouhei
Hi,

I'm adding dev@ because this may require improving Apache Arrow C++.

It seems that we need the following new feature for this use
case (combining chunks with small memory to process large
data with pandas, mmap and small memory):

  * Writing the chunks in an arrow::Table as one large
arrow::RecordBatch without creating intermediate
combined chunks

The current arrow::ipc::RecordBatchWriter::WriteTable()
always splits the given arrow::Table into one or more
arrow::RecordBatch instances. We may be able to add a feature
that writes the given arrow::Table as one combined
arrow::RecordBatch without creating intermediate combined
chunks.


Do C++ developers have any opinion on this?


Thanks,
--
kou

In 
 

  "[Python/C-Glib] writing IPC file format column-by-column " on Wed, 9 Sep 
2020 10:11:54 +,
  Ishan Anand  wrote:

> Hi
> 
> I'm looking at using Arrow primarily on low-resource instances with out of 
> memory datasets. This is the workflow I'm trying to implement.
> 
> 
>   *   Write record batches in IPC streaming format to a file from a C runtime.
>   *   Consume it one row at a time from python/C by loading the file in 
> chunks.
>   *   If the schema is simple enough to support zero copy operations, make 
> the table readable from pandas. This needs me to,
>  *   convert it into a Table with a single chunk per column (since pandas 
> can't use mmap with chunked arrays).
>  *   write the table in IPC random access format to a file.
> 
> PyArrow provides a method `combine_chunks` to combine chunks into a single 
> chunk. However, it needs to create the entire table in memory (I suspect it 
> is 2x, since it loads both versions of the table in memory but that can be 
> avoided).
> 
> Since the Arrow layout is columnar, I'm curious if it is possible to write 
> the table one column at a time. And if the existing glib/python APIs support 
> it? The C++ file writer objects seem to go down to serializing a single 
> record batch at a time and not per column.
> 
> 
> Thank you,
> Ishan


Re: [C-GLib] reading values quickly from a list array

2020-09-07 Thread Sutou Kouhei
Hi,

I've merged it.

Note that you need to install Apache Arrow C++ (master) before you
install Apache Arrow GLib (master). Apache Arrow GLib
depends on Apache Arrow C++.

Thanks,
--
kou


In 
 

  "Re: [C-GLib] reading values quickly from a list array " on Mon, 7 Sep 2020 
04:54:24 +,
  Ishan Anand  wrote:

> Thank you very much for the commit Kouhei-san. I'd love to use it sooner so 
> I'll use the source code directly to build Arrow-glib once this PR is in.
> 
> 
> Thank you,
> Ishan
> ________
> From: Sutou Kouhei 
> Sent: Monday, September 7, 2020 6:44 AM
> To: user@arrow.apache.org 
> Subject: Re: [C-GLib] reading values quickly from a list array
> 
> Hi,
> 
> garrow_list_array_get_value() is a relatively high-cost function
> because it creates a sub list array. It doesn't copy the array
> data (the data is shared), but it does create a new sub array
> (a container for the data) at both the C++ and C levels.
> 
> Apache Arrow GLib 1.0.1 doesn't have low level APIs to access
> list array values. Sorry. I've implemented them:
> https://github.com/apache/arrow/pull/8119
> 
> It'll be included in Apache Arrow GLib 2.0.0 that will be
> released in a few months.
> 
> (Can you wait 2.0.0?)
> 
> With these APIs, you can write like the following:
> 
> 
> #include <stdlib.h>
> #include <arrow-glib/arrow-glib.h>
> 
> int
> main(void)
> {
>   GError *error = NULL;
> 
>   GArrowMemoryMappedInputStream *input;
>   input = garrow_memory_mapped_input_stream_new("/tmp/batch.arrow", &error);
>   if (!input) {
> g_print("failed to open file: %s\n", error->message);
> g_error_free(error);
> return EXIT_FAILURE;
>   }
> 
>   {
> GArrowRecordBatchFileReader *reader;
> reader =
>   garrow_record_batch_file_reader_new(GARROW_SEEKABLE_INPUT_STREAM(input),
>   &error);
> 
> if (!reader) {
>   g_print("failed to open file reader: %s\n", error->message);
>   g_error_free(error);
>   g_object_unref(input);
>   return EXIT_FAILURE;
> }
> 
> {
>   guint i;
>   guint num_batches = 100;
>   for (i = 0; i < num_batches; i++) {
> GArrowRecordBatch *record_batch;
> record_batch = 
> garrow_record_batch_file_reader_read_record_batch(reader, i, &error);
> 
> GArrowArray* column = 
> garrow_record_batch_get_column_data(record_batch, 1);
> guint length_list = garrow_array_get_length(column);
> 
> GArrowListArray* list_arr = (GArrowListArray*)column;
> 
> GArrowInt64Array *list_values =
>   GARROW_INT64_ARRAY(garrow_list_array_get_values(list_arr));
> gint64 n_list_values;
> const gint64 *raw_list_values =
>   garrow_int64_array_get_values(list_values, &n_list_values);
> gint64 n_value_offsets;
> const gint32 *value_offsets =
>   garrow_list_array_get_value_offsets(list_arr, &n_value_offsets);
> guint j;
> for (j = 0; j < n_value_offsets; ++j) {
>   gint32 value_offset = value_offsets[j];
>   gint32 value_length = value_offsets[j + 1] - value_offset;
>   gint32 k;
>   for (k = 0; k < value_length; ++k) {
> raw_list_values[value_offset + k];
>   }
> }
> g_object_unref(list_values);
> 
> g_object_unref(column);
> 
> g_object_unref(record_batch);
>   }
> }
> g_object_unref(reader);
>   }
> 
>   g_object_unref(input);
> 
>   return EXIT_SUCCESS;
> }
> 
> 
> It takes 0.5sec on my machine.
> 
> 
> Thanks,
> --
> kou
> 
> In
>  
> 
>   "[C-GLib] reading values quickly from a list array " on Sun, 6 Sep 2020 
> 07:40:06 +,
>   Ishan Anand  wrote:
> 
>> Hi
>>
>> I am trying to use the Arrow Glib API to read/write from C. Specifically, 
>> while Arrow is a columnar format, I'm really excited to be able to write a 
>> lot of rows from a C like runtime and access it from python for analytics as 
>> an array per column. And vice versa.
>>
>>  To get a quick example running, I created an Arrow table in python with 100 
>> million entries as follows:
>> ```py
>> import numpy as np
>> import pyarrow as pa
>>
>> foo = {
>> "colA": np.arange(0, 1000_000),
>> "colB": [np.arange(1, 5)] * 1000_000
>> }
>>
>> table = pa.table(foo)
>> with pa.RecordBatchFileWriter("/tmp/batch.arrow", table.schema) as writer:
>> for _ in range(100):
>> writer.write_table(table)
>> ```
>>
>> However, using the G

Re: Red-arrow Gem Hardcoding Paths in Compiled Library

2020-05-28 Thread Sutou Kouhei
Hi,

This is a feature request for Ext++.
Could you open an issue on
https://github.com/red-data-tools/extpp/issues ?

Thanks,
--
kou

In <3accd3fd-19ad-429a-a7c8-2373cc23e...@forwardpmx.com>
  "Red-arrow Gem Hardcoding Paths in Compiled Library" on Thu, 28 May 2020 
18:16:32 +,
  David Lahn  wrote:

> Hello,
> 
> We are noticing that when the red-arrow gem in Ruby is bundled, the resulting 
> arrow.so file has explicit paths for extpp, and thus, if the location changes 
> (after a deployment), those libraries can no longer be found:
> 
> Example:
> 
> bash-4.2# ldd bundle/ruby/2.7.0/gems/red-arrow-0.17.0/lib/arrow.so
> linux-vdso.so.1 (0x7ffd703a4000)
> libruby.so.2.7 => /var/lang/lib/libruby.so.2.7 
> (0x7f90ce6fe000)
> libarrow.so.17 => not found
> libgobject-2.0.so.0 => /lib64/libgobject-2.0.so.0 
> (0x7f90ce4ab000)
> libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 
> (0x7f90ce195000)
> libarrow-glib.so.17 => not found
> 
> /codebuild/output/src687471828/src/gsb/vendor/bundle/ruby/2.7.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so
>  => not found
> 
> /codebuild/output/src687471828/src/gsb/vendor/bundle/ruby/2.7.0/gems/extpp-0.0.8/lib/libruby-extpp.so
>  => not found
> libstdc++.so.6 => /lib64/libstdc++.so.6 (0x7f90cde13000)
> libm.so.6 => /lib64/libm.so.6 (0x7f90cdad3000)
> libc.so.6 => /lib64/libc.so.6 (0x7f90cd728000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f90cd512000)
> libz.so.1 => /lib64/libz.so.1 (0x7f90cd2fd000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f90cd0df000)
> librt.so.1 => /lib64/librt.so.1 (0x7f90cced7000)
> libdl.so.2 => /lib64/libdl.so.2 (0x7f90cccd3000)
> libcrypt.so.1 => /lib64/libcrypt.so.1 (0x7f90cca9c000)
> /lib64/ld-linux-x86-64.so.2 (0x7f90ceec4000)
> libpcre.so.1 => /lib64/libpcre.so.1 (0x7f90cc838000)
> libffi.so.6 => /lib64/libffi.so.6 (0x7f90cc63)
> 
> Note that these have the full path.
> 
> Any ideas of how to avoid this? We are trying to get red-arrow working on AWS 
> Lambda. When we build, the directory has to change from where it currently 
> is. The codebuild directory being specified is something out of our control. 
> When the code is deployed, it ends up in /var/task. This is where vendor 
> needs to be.
> 
> Dave
> 
> David Lahn
> DevOps Lead
> Development
>
> ForwardPMX 
> Privacy Policy
> 
>  
>   
> 
> This e-mail is confidential to ForwardPMX intended for use by the recipient. 
> If you received this in error or are not the intended recipient, you are 
> hereby notified that any review, retransmission, copying or other use of, or 
> taking of any action in reliance upon this information is strictly prohibited.
> 


Re: Builder.resize in c

2020-05-17 Thread Sutou Kouhei
Hi,

I've only re-implemented the server side with the
recommended API:
  https://gitlab.com/ktou/apache-arrow-glib-socket

I'll do the client implementation later.

> Also, let's say I want to read the gconstpointer from another process in a
> different language; all I would have to do is a byte-to-buffer
> transformation, right?

Right.
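
As an aside, the round trip can be illustrated without Arrow at all. The
sketch below (Python standard library only; the payload is a stand-in for
a serialized record batch stream) sends bytes through a socket and
reassembles them into a single buffer on the receiving side:

```python
import socket

# One end sends a serialized payload; the other collects the raw bytes
# back into a single buffer it could hand to a parser (with Arrow, the
# bytes would be an IPC stream fed to a record batch reader).
sender, receiver = socket.socketpair()

payload = b"\x01\x02\x03\x04" * 8  # stand-in for serialized data
sender.sendall(payload)
sender.close()

chunks = []
while True:
    chunk = receiver.recv(4096)
    if not chunk:  # peer closed: all bytes received
        break
    chunks.append(chunk)
receiver.close()

buffer = b"".join(chunks)  # the byte-to-buffer transformation
print(len(buffer))  # 32
```

The key point is that the receiver must keep reading until the stream
ends (or until a known length is reached); a single recv() is not
guaranteed to return the whole payload.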


Thanks,
--
kou

In 
  "Re: Builder.resize in c" on Wed, 6 May 2020 23:58:53 +0200,
  swizz one  wrote:

> This is the example I did. The idea would be to have one thread creating
> data and filling up the array; when it is full, another thread sends it
> via websocket. The example just creates the buffer and sends it.
> 
> Also, let's say I want to read the gconstpointer from another process in a
> different language; all I would have to do is a byte-to-buffer
> transformation, right?
> 
> Your's Faithfully,
> Stevedan Ogochukwu Omodolor
> 
> On Wed, May 6, 2020, 23:35 Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> > To prevent from allocating memory every time I have to append value which
>> > in real time process it slow down because the computer jas to find memory
>> > location everytime.
>>
>> Could you show your code? It seems that resize() isn't a
>> good use case for it.
>>
>> > Another question, the gconstpointer is the buffer of bytes righy?
>>
>> Right. gconstpointer is a typedef of "const void *":
>>
>>
>> https://developer.gnome.org/glib/stable/glib-Basic-Types.html#gconstpointer
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: Builder.resize in c" on Wed, 6 May 2020 23:11:32 +0200,
>>   swizz one  wrote:
>>
>> > To prevent from allocating memory every time I have to append value which
>> > in real time process it slow down because the computer jas to find memory
>> > location everytime.
>> > Another question, the gconstpointer is the buffer of bytes righy?
>> > Thank you very much.
>> >
>> > On Wed, May 6, 2020, 23:00 Sutou Kouhei  wrote:
>> >
>> >> Hi,
>> >>
>> >> In 
>> >>   "Builder.resize in c" on Wed, 6 May 2020 08:53:50 +0200,
>> >>   swizz one  wrote:
>> >>
>> >> > Is there an api similar to builder.resize in c to fix builder length?
>> >>
>> >> "build.resize" mentions arrow::ArrayBuilder::Resize() in
>> >> C++, right? Could you stop abbreviating words as much as
>> >> possible to avoid misunderstandings?
>> >>
>> >> We don't have a C binding of arrow::ArrayBuilder::Resize()
>> >> yet. I'll add it later but could you show your use case? Why
>> >> do you need to fix builder length?
>> >>
>> >>
>> >> Thanks,
>> >> --
>> >> kou
>> >>
>>


Re: Builder.resize in c

2020-05-06 Thread Sutou Kouhei
Hi,

> To prevent allocating memory every time I have to append a value, which
> in a real-time process slows things down because the computer has to find
> a memory location every time.

Could you show your code? It seems that resize() isn't a
good fit for this use case.

> Another question: the gconstpointer is the buffer of bytes, right?

Right. gconstpointer is a typedef of "const void *":

  https://developer.gnome.org/glib/stable/glib-Basic-Types.html#gconstpointer

Thanks,
--
kou

In 
  "Re: Builder.resize in c" on Wed, 6 May 2020 23:11:32 +0200,
  swizz one  wrote:

> To prevent allocating memory every time I have to append a value, which
> in a real-time process slows things down because the computer has to find
> a memory location every time.
> Another question: the gconstpointer is the buffer of bytes, right?
> Thank you very much.
> 
> On Wed, May 6, 2020, 23:00 Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> In 
>>   "Builder.resize in c" on Wed, 6 May 2020 08:53:50 +0200,
>>   swizz one  wrote:
>>
>> > Is there an api similar to builder.resize in c to fix builder length?
>>
>> "build.resize" mentions arrow::ArrayBuilder::Resize() in
>> C++, right? Could you stop abbreviating words as much as
>> possible to avoid misunderstandings?
>>
>> We don't have a C binding of arrow::ArrayBuilder::Resize()
>> yet. I'll add it later but could you show your use case? Why
>> do you need to fix builder length?
>>
>>
>> Thanks,
>> --
>> kou
>>


Re: Builder.resize in c

2020-05-06 Thread Sutou Kouhei
Hi,

In 
  "Builder.resize in c" on Wed, 6 May 2020 08:53:50 +0200,
  swizz one  wrote:

> Is there an API similar to builder.resize in C to fix the builder length?

"builder.resize" refers to arrow::ArrayBuilder::Resize() in
C++, right? Could you avoid abbreviating words, to prevent
misunderstandings?

We don't have a C binding of arrow::ArrayBuilder::Resize()
yet. I'll add it later but could you show your use case? Why
do you need to fix the builder length?


Thanks,
--
kou


Re: PyArrow building from source issue

2020-05-02 Thread Sutou Kouhei
Hi,

Is the other third-party library built with the Apache
Arrow binary you built? If so, it will work.

If the other third-party library is built against a
different Apache Arrow binary, it may not work. If that
Apache Arrow binary and your Apache Arrow binary use
different build options, it doesn't work. If they use
different revisions, it may not work.


Thanks,
--
kou

In 
  "Re: PyArrow building from source issue" on Sat, 2 May 2020 20:42:08 -0400,
  Vibhatha Abeykoon  wrote:

> I understand. My main question is the following: would building Arrow C++
> and pyarrow from the same source allow seamless integration with another
> third-party library which uses the Arrow C++ and Cython APIs?
> 
> On Sat, May 2, 2020 at 8:18 PM Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> > -DCMAKE_INSTALL_LIBDIR=lib \
>> >
>> > this path is relative to ARROW_HOME, correct me if I am wrong?
>>
>> It's relative to CMAKE_INSTALL_PREFIX not ARROW_HOME.
>>
>> > So with this source build, it doesn't necessarily copy files to /usr/lib
>> ?
>>
>> You don't need to copy.
>>
>>
>> BTW, do you still want to use custom CMAKE_INSTALL_LIBDIR?
>> If you want to do, you can't use the ARROW_HOME environment
>> variable.
>>
>> Again, we don't recommend that you use custom
>> CMAKE_INSTALL_LIBDIR if you aren't a CMake expert.
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: PyArrow building from source issue" on Sat, 2 May 2020 19:01:14
>> -0400,
>>   Vibhatha Abeykoon  wrote:
>>
>> > One thing I noticed,
>> >
>> > -DCMAKE_INSTALL_LIBDIR=lib \
>> >
>> > this path is relative to ARROW_HOME, correct me if I am wrong?
>> >
>> > So with this source build, it doesn't necessarily copy files to /usr/lib
>> ?
>> >
>> > With Regards,
>> > Vibhatha Abeykoon
>> >
>> >
>> > On Sat, May 2, 2020 at 6:12 PM Vibhatha Abeykoon 
>> wrote:
>> >
>> >> So far this is the updated steps,
>> >>
>> >> pwd => /home/vibhatha/sandbox/arrow/repos
>> >>
>> >> export
>> PYTHON_EXEC=/home/vibhatha/sandbox/arrow/repos/ENVARROW/bin/python3
>> >> export LIB_DIR=$(pwd)/libs
>> >>
>> >> mkdir $LIB_DIR
>> >>
>> >> ## ENV CONFIGS
>> >>
>> >> export ARROW_HOME=$(pwd)/dist
>> >> export LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH
>> >>
>> >> ## MAKE BUILD DIRS
>> >>
>> >> mkdir arrow/cpp/build
>> >> pushd arrow/cpp/build
>> >>
>> >> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>> >>   -DCMAKE_INSTALL_LIBDIR=$LIB_DIR \
>> >>   -DARROW_WITH_BZ2=OFF \
>> >>   -DARROW_WITH_ZLIB=OFF \
>> >>   -DARROW_WITH_ZSTD=OFF \
>> >>   -DARROW_WITH_LZ4=OFF \
>> >>   -DARROW_WITH_SNAPPY=OFF \
>> >>   -DARROW_WITH_BROTLI=OFF \
>> >>   -DARROW_PARQUET=OFF \
>> >>   -DARROW_PYTHON=ON \
>> >>   -DARROW_BUILD_TESTS=ON \
>> >>   -DPYTHON_EXECUTABLE=$PYTHON_EXEC \
>> >>   ..
>> >>
>> >> make -j4
>> >> make install
>> >>
>> >> popd
>> >>
>> >> export LD_LIBRARY_PATH=$LIB_DIR:$LD_LIBRARY_PATH
>> >>
>> >> pushd arrow/python
>> >> PYARROW_CMAKE_OPTIONS="-DCMAKE_MODULE_PATH="${LIB_DIR}"/cmake/arrow"
>> >> python3 setup.py install
>> >>
>> >>
>> >> *Getting an error *
>> >>
>> >> CMake Error at
>> >>
>> /usr/local/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146
>> >> (message):
>> >>   Could NOT find Arrow (missing: ARROW_LIB_DIR) (found version "0.16.0")
>> >> Call Stack (most recent call first):
>> >>
>> >>
>> /usr/local/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:393
>> >> (_FPHSA_FAILURE_MESSAGE)
>> >>
>>  /home/vibhatha/sandbox/arrow/repos/libs/cmake/arrow/FindArrow.cmake:420
>> >> (find_package_handle_standard_args)
>> >>
>> >>
>> /home/vibhatha/sandbox/arrow/repos/libs/cmake/arrow/FindArrowPython.cmake:46
>> >> (find_package)
>> >>   CMakeLists.txt:204 (find_package)
>> >>
>> >>

Re: PyArrow building from source issue

2020-05-02 Thread Sutou Kouhei
ython static library:
>> ARROW_PYTHON_static_lib-NOTFOUND
>> -- Configuring done
>> -- Generating done
>> -- Build files have been written to:
>> /home/vibhatha/sandbox/arrow/repos/arrow/python/build/temp.linux-x86_64-3.8
>> -- Finished cmake for pyarrow
>> -- Running cmake --build for pyarrow
>> cmake --build . --config release --
>> [  6%] Compiling Cython CXX source for _compute...
>> [  6%] Built target _compute_pyx
>> Scanning dependencies of target _compute
>> [ 13%] Building CXX object CMakeFiles/_compute.dir/_compute.cpp.o
>> [ 20%] Linking CXX shared module release/_
>> compute.cpython-38-x86_64-linux-gnu.so
>>
>>
>> /usr/bin/ld: cannot find -larrow_shared
>> /usr/bin/ld: cannot find -larrow_python_shared
>> collect2: error: ld returned 1 exit status
>> make[2]: *** [CMakeFiles/_compute.dir/build.make:84: release/_
>> compute.cpython-38-x86_64-linux-gnu.so] Error 1
>> make[1]: *** [CMakeFiles/Makefile2:89: CMakeFiles/_compute.dir/all] Error 2
>> make: *** [Makefile:84: all] Error 2
>> error: command 'cmake' failed with exit status 2
>>
>> The log also shows that the shared or static libraries have not been
>> found,
>>
>>
>> With Regards,
>> Vibhatha Abeykoon
>>
>>
>> On Fri, May 1, 2020 at 8:23 PM Vibhatha Abeykoon 
>> wrote:
>>
>>> Thank you for the clarification.
>>>
>>> On Fri, May 1, 2020 at 8:10 PM Sutou Kouhei  wrote:
>>>
>>>> Hi,
>>>>
>>>> It'll work.
>>>> Note that the LD_LIBRARY_PATH is the environment variable at
>>>> run time. You need to specify the correct ARROW_HOME or
>>>> -DCMAKE_MODULE_PATH at build time too.
>>>>
>>>>
>>>> Thanks,
>>>> --
>>>> kou
>>>>
>>>> In 
>>>>   "Re: PyArrow building from source issue" on Fri, 1 May 2020 18:13:54
>>>> -0400,
>>>>   Vibhatha Abeykoon  wrote:
>>>>
>>>> > I will elaborate a couple of reasons,
>>>> >
>>>> > When there are a couple of versions of Arrow, used for different
>>>> projects
>>>> > depending on various development choices, it is convenient for me to
>>>> keep
>>>> > them pointed towards a folder of my choice.
>>>> > Then refer to it and continue the work. Correct me if I am wrong, what
>>>> if I
>>>> > point to this folder of my choice and add it to the LD_LIBRARY_PATH.
>>>> > Will this cause issues?
>>>> >
>>>> > With Regards,
>>>> > Vibhatha Abeykoon
>>>> >
>>>> >
>>>> > On Fri, May 1, 2020 at 6:09 PM Vibhatha Abeykoon 
>>>> wrote:
>>>> >
>>>> >> Okay, thank you for your response.
>>>> >>
>>>> >> With Regards,
>>>> >> Vibhatha Abeykoon
>>>> >>
>>>> >>
>>>> >> On Fri, May 1, 2020 at 5:17 PM Sutou Kouhei 
>>>> wrote:
>>>> >>
>>>> >>> Hi,
>>>> >>>
>>>> >>> > I used these settings, I still want the libs to be in a custom
>>>> >>> directory,
>>>> >>> > not in lib/
>>>> >>>
>>> Why? We don't recommend it if you aren't a CMake expert.
>>>> >>>
>>>> >>> You can't do this if you use ARROW_HOME environment
>>>> >>> variable.
>>>> >>>
>>> You may be able to do this by removing the ARROW_HOME environment
>>>> >>> variable and adding
>>>> >>>
>>>> >>>
>>>> PYARROW_CMAKE_OPTIONS="-DCMAKE_MODULE_PATH=/home/vibhatha/sandbox/arrow/repos/arrow/cpp/dist/lib/cmake/arrow"
>>>> >>> environment variable or something.
>>>> >>>
>>>> >>>
>>>> >>> Thanks,
>>>> >>> --
>>>> >>> kou
>>>> >>>
>>>> >>> In >>> awrfqzgyi+qldyec_kvbn2njgtwjl...@mail.gmail.com>
>>>> >>>   "Re: PyArrow building from source issue" on Fri, 1 May 2020
>>>> 15:05:51
>>>> >>> -0400,
>>>> >>>   Vibhatha Abeykoon  wrote:
>>>> >>>
>>>> >>> > export ARROW_HOME=/home/vibhatha/sandbox/

Re: Sending arrow via socket

2020-05-02 Thread Sutou Kouhei
Hi,

>> Is it necessary to create a buffer to send data via socket? That is, for
>> example, if I create the schemas, do I have to create a buffer to store
>> the data and then send that data?

If your C program and JavaScript program run in the same
process, you don't need to do it. You can just pass the
memory address.
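
To illustrate "just pass the memory address": within one process both
sides can view the same bytes without copying. Below is a Python sketch
of the idea using memoryview (standard library only; the same zero-copy
principle underlies sharing Arrow buffers inside one process):

```python
import array

# Both names refer to the same underlying memory; no bytes are copied.
data = array.array("d", [1.0, 2.0, 3.0])
view = memoryview(data)  # zero-copy view: pointer + length + format

view[0] = 42.0  # a write through the view...
print(data[0])  # 42.0 -- ...is visible through the original
```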

If your C program and JavaScript program run in different
processes (which I assume they do), you need to do it.

>> Also, is this method more effective compared to writing data to an arrow
>> file and reading from that arrow file in the javascript program?

If you use a memory file system such as tmpfs on Linux for
writing and reading the Apache Arrow file, it may be faster
than sending data via socket.

>> Finally, what are the options available for passing data from one
>> programming language to another using arrow?

1. Sending data via socket
   (works across different processes and different hosts)
2. Writing data to a file and passing the file
   (works across different processes)
3. Passing a memory address
   (works only within the same process)
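
A small sketch of option 2, treating the payload as an opaque byte
string (the file name and contents here are hypothetical; on Linux you
could place the file on tmpfs, e.g. under /dev/shm, to keep the
exchange in memory):

```python
import os
import tempfile

payload = b"serialized arrow stream bytes"  # stand-in for real IPC data

fd, path = tempfile.mkstemp(suffix=".arrows")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(payload)        # producer side writes the file
    with open(path, "rb") as f:
        received = f.read()     # consumer side reads it back
    print(received == payload)  # True
finally:
    os.remove(path)
```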

> And also, instead of sending the pointer, can you send the buffer directly?

The example I introduced sends the buffer data itself, not a
pointer to the buffer data.


Thanks,
--
kou

In 
  "Re: Sending arrow via socket" on Sat, 2 May 2020 07:46:55 +0200,
  swizz one  wrote:

> And also, instead of sending the pointer, can you send the buffer directly?
> 
> El sáb., 2 may. 2020 a las 7:07, swizz one () escribió:
> 
>> Thank you very much, Your answer has been really helpful. Can I ask a
>> question?
>> Is it necessary to create a buffer to send data via socket, is, for
>> example I create the schemas, do I have to create a buffer to store the
>> data and then send he that data?,
>> Also is this method more effective compared to writing data to a arrow
>> file and reading from that arrow file in the javascript program?
>> Finally, what are the option that are available for passing data from one
>> program language to another using arrow?
>>
>> Sorry, for the questions, I am a bit new to programming.
>>
>> El vie., 1 may. 2020 a las 23:30, Sutou Kouhei ()
>> escribió:
>>
>>> Hi,
>>>
>>> Do you want to use Apache Arrow C GLib instead of Apache
>>> Arrow C++, right?
>>>
>>> We don't provide a memory pool in Apache Arrow C GLib API.
>>>
>>> You can create a resizable buffer and output serialize data
>>> to it:
>>>
>>> ---
>>> GError *error = NULL;
>>> GArrowResizableBuffer *buffer = garrow_resizable_buffer_new(1024, &error);
>>> if (!buffer) {
>>>   g_print("error: %s\n", error->message);
>>>   g_error_free(error);
>>>   return;
>>> }
>>>
>>> GArrowBufferOutputStream *output =
>>> garrow_buffer_output_stream_new(buffer);
>>>
>>> GArrowRecordBatchStreamWriter *writer =
>>>   garrow_record_batch_stream_writer_new(GARROW_OUTPUT_STREAM(output),
>>>                                         schema, /* You need to create this */
>>>                                         &error);
>>> if (!writer) {
>>>   g_print("error: %s\n", error->message);
>>>   g_error_free(error);
>>>   g_object_unref(output);
>>>   g_object_unref(buffer);
>>>   return;
>>> }
>>>
>>> if (!garrow_record_batch_writer_write_record_batch(
>>>        GARROW_RECORD_BATCH_WRITER(writer),
>>>        record_batch, /* You need to create this */
>>>        &error)) {
>>>   g_print("error: %s\n", error->message);
>>>   g_error_free(error);
>>>   g_object_unref(writer);
>>>   g_object_unref(output);
>>>   g_object_unref(buffer);
>>>   return;
>>> }
>>>
>>> if (!garrow_record_batch_writer_close(
>>>        GARROW_RECORD_BATCH_WRITER(writer),
>>>        &error)) {
>>>   g_print("error: %s\n", error->message);
>>>   g_error_free(error);
>>>   g_object_unref(writer);
>>>   g_object_unref(output);
>>>   g_object_unref(buffer);
>>>   return;
>>> }
>>>
>>> GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
>>> gsize data_size;
>>> gconstpointer data_raw = g_bytes_get_data(data, &data_size);
>>> write(websocket_fd, data_raw, data_size);
>>> g_bytes_unref(data);
>>>
>>> g_object_unref(writer);
>>> g_object_unref(output);
>>> g_object_unref(buffer);
>>> ---
>>>
>>>
>>> Thanks,
>>> --
>>> kou
>>>
>>> In 
>>>   "Sending arrow via socket" on Fri, 1 May 2020 18:12:34 +0200,
>>>   swizz one  wrote:
>>>
>>> > Please, I am currently working on a project that require sending data
>>> from
>>> > an c program to perpespective(javascript) via socket from c. Since
>>> > perpespective works with arrow, it was a perfect choice. Is it possible
>>> to
>>> > send table at x interval from c to javascript via websocket without
>>> > creating a an arrow binary format file?
>>> > How to you create a memory pool primotive with c, without having to
>>> size to
>>> > have a fixed sized on the array like the memory pool in c++.
>>> >
>>> > Your's Faithfully,
>>> > Thank you
>>>
>>


Re: PyArrow building from source issue

2020-05-01 Thread Sutou Kouhei
Hi,

It'll work.
Note that LD_LIBRARY_PATH is a run-time environment
variable. You need to specify the correct ARROW_HOME or
-DCMAKE_MODULE_PATH at build time too.


Thanks,
--
kou

In 
  "Re: PyArrow building from source issue" on Fri, 1 May 2020 18:13:54 -0400,
  Vibhatha Abeykoon  wrote:

> I will elaborate a couple of reasons,
> 
> When there are a couple of versions of Arrow, used for different projects
> depending on various development choices, it is convenient for me to keep
> them pointed towards a folder of my choice.
> Then refer to it and continue the work. Correct me if I am wrong, what if I
> point to this folder of my choice and add it to the LD_LIBRARY_PATH.
> Will this cause issues?
> 
> With Regards,
> Vibhatha Abeykoon
> 
> 
> On Fri, May 1, 2020 at 6:09 PM Vibhatha Abeykoon  wrote:
> 
>> Okay, thank you for your response.
>>
>> With Regards,
>> Vibhatha Abeykoon
>>
>>
>> On Fri, May 1, 2020 at 5:17 PM Sutou Kouhei  wrote:
>>
>>> Hi,
>>>
>>> > I used these settings, I still want the libs to be in a custom
>>> directory,
>>> > not in lib/
>>>
>>> Why? We don't recommend it if you aren't a CMake expert.
>>>
>>> You can't do this if you use ARROW_HOME environment
>>> variable.
>>>
>>> You may be able to do this by removing the ARROW_HOME environment
>>> variable and adding
>>>
>>> PYARROW_CMAKE_OPTIONS="-DCMAKE_MODULE_PATH=/home/vibhatha/sandbox/arrow/repos/arrow/cpp/dist/lib/cmake/arrow"
>>> environment variable or something.
>>>
>>>
>>> Thanks,
>>> --
>>> kou
>>>
>>> In 
>>>   "Re: PyArrow building from source issue" on Fri, 1 May 2020 15:05:51
>>> -0400,
>>>   Vibhatha Abeykoon  wrote:
>>>
>>> > export ARROW_HOME=/home/vibhatha/sandbox/arrow/repos/arrow/cpp/dist
>>> > export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
>>> >
>>> > export PYARROW_WITH_PARQUET=1
>>> > export PYARROW_WITH_PYTHON=1
>>> > export PYARROW_WITH_BUILD_TESTS=1
>>> >
>>> > cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>>> >
>>> >
>>> -DCMAKE_INSTALL_LIBDIR=/home/vibhatha/sandbox/arrow/repos/arrow/cpp/arrowmylibs
>>> > \
>>> >   -DARROW_WITH_BZ2=OFF \
>>> >   -DARROW_WITH_ZLIB=OFF \
>>> >   -DARROW_WITH_ZSTD=OFF \
>>> >   -DARROW_WITH_LZ4=OFF \
>>> >   -DARROW_WITH_SNAPPY=OFF \
>>> >   -DARROW_WITH_BROTLI=OFF \
>>> >   -DARROW_PARQUET=ON \
>>> >   -DARROW_PYTHON=ON \
>>> >   -DARROW_BUILD_TESTS=ON \
>>> >
>>> >
>>> -DPYTHON_EXECUTABLE=/home/vibhatha/sandbox/arrow/repos/arrow/ENVARROW/bin/python3
>>> > \
>>> >   ..
>>> >
>>> >
>>> > I used these settings, I still want the libs to be in a custom
>>> directory,
>>> > not in lib/
>>> >
>>> > Does it make things not work?
>>> >
>>> > Now I get the following error,
>>> >
>>> > python setup.py build_ext --inplace
>>> > WARNING: The wheel package is not available.
>>> > running build_ext
>>> > -- Running cmake for pyarrow
>>> > cmake
>>> -DPYTHON_EXECUTABLE=/home/vibhatha/sandbox/arrow/ENVARROW/bin/python
>>> >  -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_FLIGHT=off
>>> > -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_DATASET=off
>>> > -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=on
>>> > -DPYARROW_BUILD_PLASMA=off -DPYARROW_BUILD_S3=off
>>> -DPYARROW_BUILD_HDFS=off
>>> > -DPYARROW_USE_TENSORFLOW=off -DPYARROW_BUNDLE_ARROW_CPP=off
>>> > -DPYARROW_BUNDLE_BOOST=off -DPYARROW_GENERATE_COVERAGE=off
>>> > -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_PARQUET_USE_SHARED=on
>>> > -DCMAKE_BUILD_TYPE=release
>>> /home/vibhatha/sandbox/arrow/repos/arrow/python
>>> > -- System processor: x86_64
>>> > -- Arrow build warning level: PRODUCTION
>>> > Using ld linker
>>> > Configured for RELEASE build (set with cmake
>>> > -DCMAKE_BUILD_TYPE={release,debug,...})
>>> > -- Build Type: RELEASE
>>> > -- Build output directory:
>>> >
>>> /home/vibhatha/sandbox/arrow/repos/arrow/python/build/temp.linux-x86_64-3.8/release
>>> > -- Arrow version: 0.18.0 (HOME:
>>> > /h

Re: Sending arrow via socket

2020-05-01 Thread Sutou Kouhei
Hi,

You want to use Apache Arrow C GLib instead of Apache
Arrow C++, right?

We don't provide a memory pool in Apache Arrow C GLib API.

You can create a resizable buffer and write the serialized
data to it:

---
GError *error = NULL;
GArrowResizableBuffer *buffer = garrow_resizable_buffer_new(1024, &error);
if (!buffer) {
  g_print("error: %s\n", error->message);
  g_error_free(error);
  return;
}

GArrowBufferOutputStream *output = garrow_buffer_output_stream_new(buffer);

GArrowRecordBatchStreamWriter *writer =
  garrow_record_batch_stream_writer_new(GARROW_OUTPUT_STREAM(output),
                                        schema, /* You need to create this */
                                        &error);
if (!writer) {
  g_print("error: %s\n", error->message);
  g_error_free(error);
  g_object_unref(output);
  g_object_unref(buffer);
  return;
}

if (!garrow_record_batch_writer_write_record_batch(
      GARROW_RECORD_BATCH_WRITER(writer),
      record_batch, /* You need to create this */
      &error)) {
  g_print("error: %s\n", error->message);
  g_error_free(error);
  g_object_unref(writer);
  g_object_unref(output);
  g_object_unref(buffer);
  return;
}

if (!garrow_record_batch_writer_close(
      GARROW_RECORD_BATCH_WRITER(writer),
      &error)) {
  g_print("error: %s\n", error->message);
  g_error_free(error);
  g_object_unref(writer);
  g_object_unref(output);
  g_object_unref(buffer);
  return;
}

GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
gsize data_size;
gconstpointer data_raw = g_bytes_get_data(data, &data_size);
write(websocket_fd, data_raw, data_size);
g_bytes_unref(data);

g_object_unref(writer);
g_object_unref(output);
g_object_unref(buffer);
---


Thanks,
--
kou

In 
  "Sending arrow via socket" on Fri, 1 May 2020 18:12:34 +0200,
  swizz one  wrote:

> Please, I am currently working on a project that requires sending data
> from a C program to Perspective (JavaScript) via socket from C. Since
> Perspective works with arrow, it was a perfect choice. Is it possible to
> send a table at x intervals from C to JavaScript via websocket without
> creating an arrow binary format file?
> How do you create a memory pool primitive in C, without having to fix the
> size of the array like the memory pool in C++?
> 
> Your's Faithfully,
> Thank you


Re: Snappy Compression with red-parquet Ruby Gem

2020-04-23 Thread Sutou Kouhei
Hi,

Oh, we forgot to integrate the saver interface with the
Parquet compression option.

You can use the feature with the following code in 0.17.0:

--
require "parquet"

table = Arrow::Table.new({"count" => [1, 2, 3]})
Arrow::FileOutputStream.open("test.parquet", false) do |output|
  properties = Parquet::WriterProperties.new
  properties.set_compression(:snappy)
  Parquet::ArrowFileWriter.open(table.schema, output, properties) do |writer|
chunk_size = 1024
writer.write_table(table, chunk_size)
  end
end
--

You'll be able to write the following code with the next release:

--
require "parquet"

table = Arrow::Table.new({"count" => [1, 2, 3]})
table.save("test.parquet", compression: :snappy)
--


Thanks,
--
kou

In <78b1b196-4217-4526-b848-fe126edb2...@contoso.com>
  "Snappy Compression with red-parquet Ruby Gem" on Thu, 23 Apr 2020 20:13:25 
+,
  David Lahn  wrote:

> Hi,
> 
> Does anyone have any examples of how to output a Parquet file with Snappy 
> compression using the Ruby gem?
> 
> We have tested trying to set compression to “snappy” on the TableSaver, but 
> we get the following:
> 
> [compressed-output-stream][new]: NotImplemented: Streaming compression 
> unsupported with Snappy (Arrow::Error::NotImplemented)
> 
> Example:
> 
> Arrow::TableSaver.new(table, 'test.parquet', {compression: 'snappy'}).save
> 
> Or are we completely turned around on how to accomplish this?
> 
> Dave
> 
> David Lahn
> DevOps Lead
> Development
>
> 


Re: Mailing List Web Archives not updating

2020-02-24 Thread Sutou Kouhei
Hi,

In 
  "Mailing List Web Archives not updating" on Mon, 24 Feb 2020 16:03:44 -0500,
  Daniel Nugent  wrote:

> Is anyone else experiencing this issue? Neither the dev nor user mailing list 
> have updated since Sunday February 16
> 
> I’m not sure exactly who to contact about this.

Thanks for your report.
I hadn't noticed it either.

I also don't know the maintainer of the service. Sorry.

Please use the following service for now:

  https://lists.apache.org/list.html?user@arrow.apache.org



Thanks,
--
kou


Re: Pyarrow build/install from source in ubuntu not working

2020-01-23 Thread Sutou Kouhei
Hi,

Changing

> RUN python3 -c 'import pyarrow'

to

  RUN LD_LIBRARY_PATH=/usr/local/lib python3 -c 'import pyarrow'

works on my environment.

Other solution:

Adding

  ENV LD_LIBRARY_PATH=/usr/local/lib

before

  RUN python3 -c 'import pyarrow'

Dockerfile:

  ...
  RUN bash install_arrow.sh
  ENV LD_LIBRARY_PATH=/usr/local/lib
  RUN python3 -c 'import pyarrow'
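
For context on why the original Dockerfile failed: the LD_LIBRARY_PATH
assignment at the end of install_arrow.sh only lives inside that single
RUN step's shell, while an ENV instruction persists into every later
step and into containers built from the image. A sketch of the tail of
the Dockerfile (earlier steps unchanged):

```dockerfile
# ...FROM, git clone, COPY install_arrow.sh as before...
RUN bash install_arrow.sh

# ENV persists across RUN steps; a shell variable set inside
# install_arrow.sh does not.
ENV LD_LIBRARY_PATH=/usr/local/lib

RUN python3 -c 'import pyarrow'
```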


Thanks,
--
kou

In 
  "Pyarrow build/install from source in ubuntu not working" on Thu, 23 Jan 2020 
17:38:33 -0800,
  Anna Waldron  wrote:

> Hi,
> 
> I am trying to build and install pyarrow from source in an ubuntu 18.04
> docker image and getting the following error when attempting to import the
> module:
> 
> Traceback (most recent call last):
>>   File "", line 1, in 
>>   File
>> "/usr/local/lib/python3.6/dist-packages/pyarrow-0.14.0-py3.6-linux-x86_64.egg/pyarrow/__init__.py",
>> line 49, in 
>> from pyarrow.lib import cpu_count, set_cpu_count
>> ImportError: libarrow.so.14: cannot open shared object file: No such file
>> or directory
>>
> 
> Here is the Dockerfile I am using:
> 
> FROM ubuntu:18.04
>> RUN apt-get update
>> RUN apt-get install -y git
>> RUN mkdir /arrow
>> RUN git clone https://github.com/apache/arrow.git /arrow
>> WORKDIR /arrow/arrow
>> RUN git checkout apache-arrow-0.14.0
>> WORKDIR /
> 
> COPY install_arrow.sh /install_arrow.sh
> 
> RUN bash install_arrow.sh
> 
> 
>> RUN python3 -c 'import pyarrow'
> 
> 
> and the install_arrow.sh script copied into the image:
> 
> export ARROW_BUILD_TYPE=release
>> export ARROW_HOME=/usr/local \
>>PARQUET_HOME=/usr/local
>> export PYTHON_EXECUTABLE=/usr/bin/python3
>>
> 
> 
> # install requirements
>> export DEBIAN_FRONTEND="noninteractive"
>> apt-get update
>> apt-get install -y --no-install-recommends apt-utils
>> apt-get install -y git python3-minimal python3-pip autoconf libtool
>> apt-get install -y cmake \
>>python3-dev \
>>libjemalloc-dev libboost-dev \
>>build-essential \
>>libboost-filesystem-dev \
>>libboost-regex-dev \
>>libboost-system-dev \
>>flex \
>>bison
>> pip3 install --no-cache-dir six pytest numpy cython
> 
> 
>> mkdir -p /arrow/cpp/build \
>>   && cd /arrow/cpp/build \
>>   && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
>>-DOPENSSL_ROOT_DIR=/usr/local/ssl \
>>-DCMAKE_INSTALL_LIBDIR=lib \
>>-DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>>-DARROW_PARQUET=ON \
>>-DARROW_PYTHON=ON \
>>-DARROW_PLASMA=ON \
>>-DARROW_BUILD_TESTS=OFF \
>>-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
>>.. \
>>   && make -j$(nproc) \
>>   && make install \
>>   && cd /arrow/python \
>>   && python3 setup.py build_ext --build-type=$ARROW_BUILD_TYPE
>> --with-parquet \
>>   && python3 setup.py install
>>
> 
> 
> LD_LIBRARY_PATH=/usr/local/lib
>>
> 
> I'm using Docker 19.03.5 on Ubuntu 18.04.3 LTS to build the image.
> 
> Thanks in advance for any help.
> 
> Anna


Re: Using Pyarrow and C++ API

2020-01-05 Thread Sutou Kouhei
Hi,

How about installing pyarrow with "pip install --no-binary :all: pyarrow"?
Then you will be able to build your pyarrow against your
libarrow.so and libarrow_python.so.

Thanks,
--
kou

In 
 

  "Using Pyarrow and C++ API " on Sun, 5 Jan 2020 03:45:21 +,
  Raúl Bocanegra Algarra  wrote:

> Hi!
> 
> I am trying to use pyarrow with arrow C++ API in an application that embeds a 
> python3 interpreter and loads an extension module using pybind11. 
> Documentation says C++ headers and libraries are bundled with pyarrow but I 
> am having some segfaults when calling some API functions like the wrap/unwrap 
> ones. I am calling import_pyarrow and also import_numpy but segfaults still 
> happening. I feel the reason is that I compile and link with my own arrow and 
> arrow_python libs built with vcpkg so my app links with those, but the 
> extension module imported by the embedded python interpreter is loading the 
> arrow_python from the site-packages folder where pip installed pyarrow, and 
> that mismatch makes the segfault happen. So I was wondering if the correct 
> approach for a situation like this with an embedded interpreter and an 
> extension module that imports pyarrow is to use the headers and libs from the 
> pyarrow installation removing the ones from vcpkg or if you know another 
> option I haven't contemplated yet.
> 
> Thanks for your work.
> 
> Best regards,
> 
> Raúl Bocanegra Algarra. C++ Software Engineer.
> 


Re: Installing Apache Arrow RHEL 7

2019-10-25 Thread Sutou Kouhei
Hi,

You need to install epel-release. It's available by default
on CentOS. But it's not available by default on RHEL. You
need to install epel-release manually on RHEL:

  sudo -H yum install 
https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
  sudo -H subscription-manager repos --enable "rhel-*-optional-rpms" --enable 
"rhel-*-extras-rpms"  --enable "rhel-ha-for-rhel-*-server-rpms"

See https://fedoraproject.org/wiki/EPEL for details.


For "7Server", we should provide

  https://dl.bintray.com/apache/arrow/centos/7Server/

as a symbolic link of

  https://dl.bintray.com/apache/arrow/centos/7/

Then we can use the current repository without changing
the .repo file manually.

I opened an issue for this:
https://issues.apache.org/jira/browse/ARROW-6997


Thanks,
--
kou

In 
 

  "RE: Installing Apache Arrow RHEL 7" on Fri, 25 Oct 2019 18:32:42 +,
  Brian Klaassens  wrote:

> Yes if I change the URL https://dl.bintray.com/apache/arrow/centos/7/x86_64/ 
> the 404 error message goes away.
> But a new error says "Nothing to do". See below
> 
> 
> Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
> 
>  apache-arrow| 
> 2.9 kB  00:00:00
> 
>  No package epel-release available.
> 
>  Error: Nothing to do
> 
> 
> 
> 
> -Original Message-
> From: Wes McKinney [mailto:wesmck...@gmail.com] 
> Sent: Friday, October 25, 2019 2:23 PM
> To: user@arrow.apache.org; dev 
> Subject: Re: Installing Apache Arrow RHEL 7
> 
> Does
> 
> https://dl.bintray.com/apache/arrow/centos/7/x86_64/repodata/repomd.xml
> 
> work? I'm copying dev@ in case Kou is not subscribed to user@
> 
> On Fri, Oct 25, 2019 at 12:39 PM Brian Klaassens  
> wrote:
>>
>> According to https://arrow.apache.org/install/ for CentOS I can add a repo
>> and install from there.
>>
>>
>>
>> sudo tee /etc/yum.repos.d/Apache-Arrow.repo <<REPO
>> [apache-arrow]
>> [apache-arrow]
>>
>> name=Apache Arrow
>>
>> baseurl=https://dl.bintray.com/apache/arrow/centos/\$releasever/\$basearch/
>>
>>
>> gpgcheck=1
>>
>> enabled=1
>>
>> gpgkey=https://dl.bintray.com/apache/arrow/centos/RPM-GPG-KEY-apache-arrow
>>
>>
>> REPO
>>
>> yum install -y epel-release
>>
>>
>>
>> The only problem is that the base URL is not constructed correctly.
>>
>> https://dl.bintray.com/apache/arrow/centos/7Server/x86_64/repodata/repomd.xml
>>
>>
>>
>>
>> I tried to cheat and changed the base URL to.
>>
>> https://dl.bintray.com/apache/arrow/centos/7/x86_64/repodata/repomd.xml
>>
>>
>> The result was
>>
>>
>>
>> Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
>>
>> apache-arrow | 2.9 kB  00:00:00
>>
>> No package epel-release available.
>>
>> Error: Nothing to do
>>
>>
>>
>> Is there a way for me to get this to work?
>>
>>
>>
>> This is the OS I’m on.
>>
>> cat /etc/os-release
>>
>> NAME="Red Hat Enterprise Linux Server"
>>
>> VERSION="7.7 (Maipo)"
>>
>> ID="rhel"
>>
>> ID_LIKE="fedora"
>>
>> VARIANT="Server"
>>
>> VARIANT_ID="server"
>>
>> VERSION_ID="7.7"
>>
>> PRETTY_NAME="Red Hat Enterprise Linux Server 7.7 (Maipo)"
>>
>> ANSI_COLOR="0;31"
>>
>> CPE_NAME="cpe:/o:redhat:enterprise_linux:7.7:GA:server"
>>
>> HOME_URL="https://www.redhat.com/"
>>
>>