Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2024-04-26 Thread via GitHub


rok commented on code in PR #37298:
URL: https://github.com/apache/arrow/pull/37298#discussion_r1581448620


##
cpp/src/arrow/extension/CMakeLists.txt:
##
@@ -15,10 +15,16 @@
 # specific language governing permissions and limitations
 # under the License.
 
+set(CANONICAL_EXTENSION_TESTS uuid_test.cc)
+
+if(ARROW_JSON)

Review Comment:
   See my reply below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2024-04-26 Thread via GitHub


rok commented on code in PR #37298:
URL: https://github.com/apache/arrow/pull/37298#discussion_r1581445564


##
cpp/src/arrow/extension_type.cc:
##
@@ -145,12 +146,17 @@ static void CreateGlobalRegistry() {
   g_registry = std::make_shared();
 
 #ifdef ARROW_JSON

Review Comment:
   `fixed_shape_tensor` uses rapidjson to serialize and deserialize metadata 
(see `FixedShapeTensorType::Serialize` and `FixedShapeTensorType::Deserialize`) 
of the type so we need `ARROW_JSON` to build it. UUID has no metadata to 
serialize so we don't require JSON. It's annoying that for some builds we get 
all extensions and for some only a subset, but I don't have a better idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2024-04-26 Thread via GitHub


raulcd commented on code in PR #37298:
URL: https://github.com/apache/arrow/pull/37298#discussion_r1580820063


##
cpp/src/arrow/extension/CMakeLists.txt:
##
@@ -15,10 +15,16 @@
 # specific language governing permissions and limitations
 # under the License.
 
+set(CANONICAL_EXTENSION_TESTS uuid_test.cc)
+
+if(ARROW_JSON)

Review Comment:
   this shouldn't be required now that you are building extensions even without 
`ARROW_JSON`, right?



##
cpp/src/arrow/extension_type.cc:
##
@@ -145,12 +146,17 @@ static void CreateGlobalRegistry() {
   g_registry = std::make_shared();
 
 #ifdef ARROW_JSON

Review Comment:
   I think the `ARROW_JSON` was only required because we were having a bug on 
CMake that you are fixing right now. Why should `fixed_shape_tensor` has 
nothing to do with JSON?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2024-04-18 Thread via GitHub


rok commented on PR #37298:
URL: https://github.com/apache/arrow/pull/37298#issuecomment-2065031775

   I've opened a seperate PR for the [format 
change](https://github.com/apache/arrow/pull/41299). If there's no objections 
I'd like to call for a ML vote tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2024-04-18 Thread via GitHub


rok commented on PR #37298:
URL: https://github.com/apache/arrow/pull/37298#issuecomment-2064955949

   Integration failures seem unrelated. It would be good to check again after 
https://github.com/apache/arrow/pull/41264 merged to be sure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2023-12-11 Thread via GitHub


rok commented on code in PR #37298:
URL: https://github.com/apache/arrow/pull/37298#discussion_r1422708411


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -251,6 +251,14 @@ Variable shape tensor
Values inside each **data** tensor element are stored in 
row-major/C-contiguous
order according to the corresponding **shape**.
 
+UUID
+

Review Comment:
   Good point. Added:
   ```
   
   .. note::
  A specific UUID version is not required or guaranteed. This extension 
represents
  UUIDs as FixedSizeBinary(16) and does not interpret the bytes in any way.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2023-12-11 Thread via GitHub


rok commented on code in PR #37298:
URL: https://github.com/apache/arrow/pull/37298#discussion_r1422638468


##
cpp/src/arrow/extension/uuid_test.cc:
##
@@ -0,0 +1,51 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/extension/uuid.h"
+
+#include "arrow/testing/matchers.h"
+
+#include "arrow/array/array_nested.h"
+#include "arrow/array/array_primitive.h"
+#include "arrow/io/memory.h"
+#include "arrow/ipc/reader.h"
+#include "arrow/record_batch.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/util/key_value_metadata.h"
+
+#include "arrow/testing/extension_type.h"
+
+namespace arrow {
+
+using extension::uuid;
+
+class TestUuuidExtensionType : public ::testing::Test {};

Review Comment:
   Indeed, removing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2023-12-11 Thread via GitHub


rok commented on code in PR #37298:
URL: https://github.com/apache/arrow/pull/37298#discussion_r1422638020


##
cpp/src/arrow/acero/util_test.cc:
##
@@ -21,9 +21,12 @@
 #include "arrow/testing/gtest_util.h"
 #include "arrow/testing/matchers.h"
 
+#include "arrow/extension/uuid.h"

Review Comment:
   Removed.



##
cpp/src/arrow/extension/uuid.h:
##
@@ -0,0 +1,56 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "arrow/extension_type.h"
+
+namespace arrow {
+namespace extension {
+
+class ARROW_EXPORT UuidArray : public ExtensionArray {
+ public:
+  using ExtensionArray::ExtensionArray;
+};
+
+class ARROW_EXPORT UuidType : public ExtensionType {
+ public:
+  UuidType() : ExtensionType(fixed_size_binary(16)) {}
+
+  std::string extension_name() const override { return "arrow.uuid"; }
+
+  const std::shared_ptr value_type() const { return 
fixed_size_binary(16); }
+
+  bool ExtensionEquals(const ExtensionType& other) const override;
+
+  std::shared_ptr MakeArray(std::shared_ptr data) const 
override;
+
+  Result> Deserialize(
+  std::shared_ptr storage_type,
+  const std::string& serialized) const override;
+
+  std::string Serialize() const override { return "uuid-serialized"; }

Review Comment:
   Changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2023-12-11 Thread via GitHub


rok commented on code in PR #37298:
URL: https://github.com/apache/arrow/pull/37298#discussion_r1422635671


##
cpp/src/arrow/c/bridge_test.cc:
##
@@ -3694,7 +3695,10 @@ TEST_F(TestSchemaRoundtrip, Dictionary) {
 }
 
 TEST_F(TestSchemaRoundtrip, UnregisteredExtension) {
-  TestWithTypeFactory(uuid, []() { return fixed_size_binary(16); });
+  TestWithTypeFactory(complex128, []() {

Review Comment:
   Some of these were checking registering / deregistering mechanics for 
extension types and it wasn't practical to use UUID type for that so I either 
removed them or replaced with something equivalent. These tests typically 
assumed uuid was not globally available extension at the start.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2023-12-07 Thread via GitHub


westonpace commented on code in PR #37298:
URL: https://github.com/apache/arrow/pull/37298#discussion_r1419303692


##
cpp/src/arrow/acero/util_test.cc:
##
@@ -21,9 +21,12 @@
 #include "arrow/testing/gtest_util.h"
 #include "arrow/testing/matchers.h"
 
+#include "arrow/extension/uuid.h"

Review Comment:
   Nit: You can remove the `#include "arrow/testing/extension_type.h"` import 
now.



##
cpp/src/arrow/extension/uuid.h:
##
@@ -0,0 +1,56 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "arrow/extension_type.h"
+
+namespace arrow {
+namespace extension {
+
+class ARROW_EXPORT UuidArray : public ExtensionArray {
+ public:
+  using ExtensionArray::ExtensionArray;
+};
+
+class ARROW_EXPORT UuidType : public ExtensionType {
+ public:
+  UuidType() : ExtensionType(fixed_size_binary(16)) {}
+
+  std::string extension_name() const override { return "arrow.uuid"; }
+
+  const std::shared_ptr value_type() const { return 
fixed_size_binary(16); }
+
+  bool ExtensionEquals(const ExtensionType& other) const override;
+
+  std::shared_ptr MakeArray(std::shared_ptr data) const 
override;
+
+  Result> Deserialize(
+  std::shared_ptr storage_type,
+  const std::string& serialized) const override;
+
+  std::string Serialize() const override { return "uuid-serialized"; }

Review Comment:
   Why can't we just serialize this as an empty string?



##
cpp/src/arrow/c/bridge_test.cc:
##
@@ -3694,7 +3695,10 @@ TEST_F(TestSchemaRoundtrip, Dictionary) {
 }
 
 TEST_F(TestSchemaRoundtrip, UnregisteredExtension) {
-  TestWithTypeFactory(uuid, []() { return fixed_size_binary(16); });
+  TestWithTypeFactory(complex128, []() {

Review Comment:
   It looks like you're removing uuid from a number of tests here.  Why is that?



##
docs/source/format/CanonicalExtensions.rst:
##
@@ -251,6 +251,14 @@ Variable shape tensor
Values inside each **data** tensor element are stored in 
row-major/C-contiguous
order according to the corresponding **shape**.
 
+UUID
+

Review Comment:
   I wonder if we should add a small note that we do not require any specific 
version of UUID.  I.e. if a field is a UUID field then the user has no 
guarantee that it is UUIDv4 or v1 or it could even be mixed.



##
cpp/src/arrow/extension/uuid_test.cc:
##
@@ -0,0 +1,51 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/extension/uuid.h"
+
+#include "arrow/testing/matchers.h"
+
+#include "arrow/array/array_nested.h"
+#include "arrow/array/array_primitive.h"
+#include "arrow/io/memory.h"
+#include "arrow/ipc/reader.h"
+#include "arrow/record_batch.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/util/key_value_metadata.h"
+
+#include "arrow/testing/extension_type.h"
+
+namespace arrow {
+
+using extension::uuid;
+
+class TestUuuidExtensionType : public ::testing::Test {};

Review Comment:
   Do we need a test fixture?



##
cpp/src/arrow/extension/uuid_test.cc:
##
@@ -0,0 +1,51 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of 

Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2023-12-04 Thread via GitHub


rok commented on PR #37298:
URL: https://github.com/apache/arrow/pull/37298#issuecomment-1839526314

   @arogozhnikov I think it was mostly about lack of reviews :)
   I've rebased, let's see what current problems are.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-15058: [C++][Python] Native support for UUID [arrow]

2023-12-04 Thread via GitHub


arogozhnikov commented on PR #37298:
URL: https://github.com/apache/arrow/pull/37298#issuecomment-1839384913

   Hi @rok , can you please summarize where you're stuck at with UUIDs? I see 
some installation failures


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org