Re: [PR] feat(c++): kuzu extension to import/export graphar data [incubator-graphar]

2025-10-28 Thread via GitHub


yangxk1 merged PR #776:
URL: https://github.com/apache/incubator-graphar/pull/776


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat(c++): kuzu extension to import/export graphar data [incubator-graphar]

2025-10-28 Thread via GitHub


gary-cloud commented on PR #776:
URL: https://github.com/apache/incubator-graphar/pull/776#issuecomment-3457620110

   I’ve updated the PR according to the comments. It’s ready to be merged now. 
@Thespica @yangxk1 





Re: [PR] feat(c++): kuzu extension to import/export graphar data [incubator-graphar]

2025-10-28 Thread via GitHub


Thespica commented on code in PR #776:
URL: https://github.com/apache/incubator-graphar/pull/776#discussion_r2468784908


##
cpp/extensions/kuzu-extension/README.md:
##
@@ -0,0 +1,118 @@
+# Kuzu — GraphAr Extension
+
+**API:** C++
+**Status:** first-cut, pragmatic implementation (ingest, export, metadata inspection)
+**Repository branch:** [https://github.com/gary-cloud/kuzu/tree/graphar-extension](https://github.com/gary-cloud/kuzu/tree/graphar-extension)

Review Comment:
   Switch to https://github.com/apache/incubator-graphar/tree/kuzu-extension



##
cpp/extensions/kuzu-extension/test/graphar_extension_metadata_test.cpp:
##
@@ -0,0 +1,106 @@
+#include 
+#include 
+#include 
+#include 
+#include "kuzu.hpp"
+
+using namespace std;
+using namespace kuzu::main;   // Database, Connection, QueryResult
+using namespace kuzu::common; // Value, LogicalTypeID
+using namespace kuzu::processor;  // ResultSet, FlatTuple
+
+#include 
+#include 
+#include 
+
+using namespace std;
+using namespace kuzu;
+using namespace kuzu::common;
+
+#include 
+#include 
+#include 
+
+using namespace std;

Review Comment:
   Avoid `using namespace std`; qualify names explicitly as `std::xxx`, even in tests.
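
   For illustration only, here is a minimal sketch of the style this comment asks for: standard-library names are written with an explicit `std::` prefix and no `using namespace std` appears. The helper `splitOnComma` is hypothetical and is not taken from the PR's test files.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not part of the PR.
// Every standard-library name is qualified with std::, so the file
// needs no `using namespace std` directive.
std::vector<std::string> splitOnComma(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            // Last (or only) field: take everything that remains.
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

   Explicit qualification keeps names such as `std::vector` from colliding with identically named symbols pulled in by the `kuzu` namespaces that the tests also open.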



##
cpp/extensions/kuzu-extension/test/graphar_extension_copy_test.cpp:
##
@@ -0,0 +1,120 @@
+#include 
+#include 
+#include "kuzu.hpp"
+
+using namespace kuzu::main;
+using namespace std;

Review Comment:
   Same here: drop `using namespace std` and qualify names with `std::`.



##
cpp/extensions/kuzu-extension/test/graphar_extension_copy_to_test.cpp:
##
@@ -0,0 +1,140 @@
+#include 
+#include 
+#include "kuzu.hpp"
+
+using namespace kuzu::main;
+using namespace std;

Review Comment:
   Same here: drop `using namespace std` and qualify names with `std::`.



##
cpp/extensions/kuzu-extension/test/graphar_extension_load_test.cpp:
##
@@ -0,0 +1,67 @@
+#include 
+#include 
+#include "kuzu.hpp"
+
+using namespace kuzu::main;
+using namespace std;

Review Comment:
   Same here: drop `using namespace std` and qualify names with `std::`.






Re: [PR] feat(c++): kuzu extension to import/export graphar data [incubator-graphar]

2025-10-27 Thread via GitHub


gary-cloud commented on code in PR #776:
URL: https://github.com/apache/incubator-graphar/pull/776#discussion_r2467969172


##
cpp/extensions/kuzu-extension/src/function/graphar_export.cpp:
##
@@ -0,0 +1,403 @@
+#include "function/graphar_export.h"
+
+using namespace kuzu::function;
+using namespace kuzu::common;
+
+namespace kuzu {
+namespace graphar_extension {
+
+void initSharedState(ExportFuncSharedState& sharedState, main::ClientContext& context,
+const ExportFuncBindData& bindData) {
+sharedState.init(context, bindData);
+}
+
+std::shared_ptr createSharedStateFunc() {
+return std::make_shared();
+}
+
+std::unique_ptr initLocalState(main::ClientContext&,
+const ExportFuncBindData&, std::vector) {
+return std::make_unique();
+}
+
+void sinkFunc(ExportFuncSharedState&, ExportFuncLocalState& localState,
+[[maybe_unused]] const ExportFuncBindData& bindData,
+std::vector> inputVectors) {
+auto& grapharLocalState = localState.cast();
+
+// ATTENTION: postpone buffer creation until first sink call, because we can't
+// know the schema before that (It's hard for bindData to get the type info).
+if (!grapharLocalState.buffer) {
+// schema of input vectors
+std::vector schema_to_create;
+
+// fill the schema and create buffer in local state.
+KU_ASSERT(inputVectors.size() == bindData.columnNames.size());
+for (size_t i = 0; i < inputVectors.size(); i++) {
+schema_to_create.push_back(PropMeta{bindData.columnNames[i],
+kuzuTypeToGrapharType(inputVectors[i]->dataType), Cardinality::SINGLE});
+}
+
+grapharLocalState.buffer = std::make_shared(std::move(schema_to_create));
+}
+
+auto& buffer = grapharLocalState.buffer;
+const auto& schema = grapharLocalState.buffer->Schema();
+
+if (inputVectors.size() != schema.size()) {
+throw common::RuntimeException("inputVectors size != schema size");
+}
+
+// Compute the number of logical rows in the current batch (supporting mixed flat / unflat
+// vectors) A flat vector is treated as selSize = 1 (broadcasted), while an unflat vector uses
+// its own selSize.
+size_t num_rows = 1;
+for (size_t c = 0; c < inputVectors.size(); ++c) {
+auto& v = inputVectors[c];
+if (!v->state->isFlat()) {
+auto s = static_cast(v->state->getSelVector().getSelSize());
+if (s > num_rows) {
+num_rows = s;
+}
+}
+}
+
+// Optional: if prefer a fail-fast policy for inconsistent non-flat columns,
+// you can disable the “max” strategy above and instead check that all non-flat
+// columns have the same selSize, throwing an exception otherwise. Example:
+//   size_t expected = 0;
+//   for (...) if (!v->state->isFlat()) { if (expected==0) expected = s; else if (expected != s)
+//   throw ...; }
+// The current implementation adopts a pad-null strategy to improve robustness
+// and compatibility.
+
+for (size_t logicalRow = 0; logicalRow < num_rows; ++logicalRow) {
+size_t rid = buffer->NewRow();
+
+for (size_t col = 0; col < schema.size(); ++col) {
+const auto& meta = schema[col];
+auto& vecPtr = inputVectors[col];
+
+// If the vector is flat: always use sel[0].
+// If unflat: use sel[logicalRow] if logicalRow < selSize,
+// otherwise treat as missing (null).
+uint32_t physPos = 0;
+if (vecPtr->state->isFlat()) {
+physPos = vecPtr->state->getSelVector()[0];
+} else {
+auto selSize = static_cast(vecPtr->state->getSelVector().getSelSize());
+if (logicalRow >= selSize) {
+// pad-null: this column has no value for the current logicalRow,
+// leave as monostate (not written).
+continue;
+}
+physPos = vecPtr->state->getSelVector()[logicalRow];
+}
+
+// use physical position to check for null
+if (vecPtr->isNull(physPos)) {
+continue; // keep as monostate
+}
+
+switch (meta.type) {
+case Type::INT64:
+case Type::TIMESTAMP:
+buffer->SetProperty(rid, meta.name, Scalar(vecPtr->getValue(physPos)));
+break;
+case Type::INT32:
+case Type::DATE:
+buffer->SetProperty(rid, meta.name, Scalar(vecPtr->getValue(physPos)));
+break;
+case Type::DOUBLE:
+buffer->SetProperty(rid, meta.name, Scalar(vecPtr->getValue(physPos)));
+break;
+case Type::FLOAT:
+buffer->SetProperty(rid, meta.name, Scalar(vecPtr->getValue(physPos)));
+break;
+case Type::STRING:
+buffer->S
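
The quoted hunk is cut off at this point. As a side note on the fail-fast alternative that the code comment in `sinkFunc` describes (require every non-flat column to have the same selection size instead of padding with nulls), the check can be sketched in isolation roughly as follows. This is a sketch under assumptions, not the PR's code: `ColumnShape` and `consistentRowCount` are hypothetical names standing in for the per-vector state, and `std::runtime_error` stands in for the extension's own exception type.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for what the sink sees per input vector;
// this is not a Kuzu type.
struct ColumnShape {
    bool isFlat;          // flat vectors are broadcast to every row
    std::size_t selSize;  // number of selected positions (used when not flat)
};

// Returns the number of logical rows in the batch.
// Throws if two non-flat columns disagree on their selection size.
// A batch made only of flat columns counts as one row.
std::size_t consistentRowCount(const std::vector<ColumnShape>& columns) {
    std::size_t expected = 0;
    bool sawUnflat = false;
    for (const auto& column : columns) {
        if (column.isFlat) {
            continue;  // broadcast column: imposes no constraint
        }
        if (!sawUnflat) {
            expected = column.selSize;
            sawUnflat = true;
        } else if (column.selSize != expected) {
            throw std::runtime_error("non-flat columns have different selSize");
        }
    }
    return sawUnflat ? expected : 1;
}
```

Compared with the max-and-pad approach in the quoted code, this variant rejects a mismatched batch outright instead of silently writing missing values. It also differs in one edge case: a non-flat column with selection size 0 yields 0 rows here, whereas the quoted loop starts from 1.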

Re: [PR] feat(c++): kuzu extension to import/export graphar data [incubator-graphar]

2025-10-27 Thread via GitHub


yangxk1 commented on code in PR #776:
URL: https://github.com/apache/incubator-graphar/pull/776#discussion_r2467717827


##
cpp/extensions/kuzu-extension/src/utils/graphar_utils.cpp:
##
@@ -0,0 +1,159 @@
+#include "utils/graphar_utils.h"
+
+namespace kuzu {
+namespace graphar_extension {
+
+std::string getFirstToken(const std::string& input) {
+size_t pos = input.find('-');
+if (pos == std::string::npos) {
+return input;
+}
+return input.substr(0, pos);
+}
+
+void getYamlNameWithoutGrapharLabel(const std::string& filePath) {
+const std::string grapharLabel = DEFAULT_GRAPHAR_LABEL;
+if (ends_with(filePath, grapharLabel)) {
+// remove the trailing ".graphar"
+const_cast(filePath).erase(filePath.size() - grapharLabel.size());
+}
+}
+
+bool ends_with(const std::string& s, const std::string& suffix) {
+if (s.size() < suffix.size())
+return false;
+return s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
+}
+
+bool parse_is_edge(const std::string& path) {
+if (path.empty()) {
+throw std::invalid_argument("empty path");
+}
+
+// Extract the file name (substring after the last '/' or '\\')
+const size_t pos = path.find_last_of("/\\");
+const std::string filename = (pos == std::string::npos) ? path : path.substr(pos + 1);
+
+if (filename.empty()) {
+throw std::invalid_argument("path has no filename");
+}
+
+// Convert to uppercase for case-insensitive comparison
+const std::string up = common::StringUtils::getUpper(filename);

Review Comment:
   Suggestion: use more descriptive variable names (for example, instead of `up` and `pos`).
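
   As a rough illustration of the naming this suggestion points at, the first half of the quoted `parse_is_edge` could read as below. This is only a sketch: the function name `upperCaseFileName` and the variable names are hypothetical, and `std::transform` with `std::toupper` is swapped in for `common::StringUtils::getUpper` so that the snippet compiles without Kuzu headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper; mirrors the file-name extraction and upper-casing
// steps of the quoted parse_is_edge, with longer variable names.
std::string upperCaseFileName(const std::string& path) {
    if (path.empty()) {
        throw std::invalid_argument("empty path");
    }

    // File name = substring after the last '/' or '\\'.
    const std::size_t lastSeparatorPos = path.find_last_of("/\\");
    const std::string fileName = (lastSeparatorPos == std::string::npos)
                                     ? path
                                     : path.substr(lastSeparatorPos + 1);
    if (fileName.empty()) {
        throw std::invalid_argument("path has no filename");
    }

    // Upper-case copy for case-insensitive comparison.
    // Assumption: std::toupper replaces StringUtils::getUpper here.
    std::string upperFileName = fileName;
    std::transform(upperFileName.begin(), upperFileName.end(), upperFileName.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::toupper(ch)); });
    return upperFileName;
}
```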
   
   



##
cpp/extensions/kuzu-extension/src/function/graphar_scan_bindfunc.cpp:
##
@@ -0,0 +1,515 @@
+#include "function/graphar_scan.h"
+
+namespace kuzu {
+namespace graphar_extension {
+
+using namespace function;
+using namespace common;
+
+// Vertex setter maker for properties
+template
+VertexColumnSetter makeTypedVertexSetter(uint64_t fieldIdx, std::string colName) {
+return [fieldIdx, colName = std::move(colName)](graphar::VertexIter& it,
+   function::TableFuncOutput& output, kuzu::common::idx_t row) {
+auto res = it.property(colName);
+auto& vec = output.dataChunk.getValueVectorMutable(fieldIdx);
+vec.setValue(row, res.value());
+};
+}
+
+// Vertex setter maker for internal_id
+VertexColumnSetter makeInternalIdVertexSetter(uint64_t fieldIdx, std::string colName) {
+return [fieldIdx, colName = std::move(colName)](graphar::VertexIter& it,
+   function::TableFuncOutput& output, kuzu::common::idx_t row) {
+auto id = it.id();
+auto& vec = output.dataChunk.getValueVectorMutable(fieldIdx);
+vec.setValue(row, static_cast(id));
+};
+}
+
+template<>
+VertexColumnSetter makeTypedVertexSetter([[maybe_unused]] uint64_t fieldIdx,
+[[maybe_unused]] std::string colName) {
+throw NotImplementedException("List type is not supported in graphar scan.");
+}
+
+// Edge setter maker for properties
+template
+EdgeColumnSetter makeTypedEdgeSetter(uint64_t fieldIdx, std::string colName) {
+return [fieldIdx, colName = std::move(colName)](graphar::EdgeIter& it,
+   function::TableFuncOutput& output, kuzu::common::idx_t row,
+   [[maybe_unused]] std::shared_ptr unused) {
+auto res = it.property(colName);
+auto& vec = output.dataChunk.getValueVectorMutable(fieldIdx);
+vec.setValue(row, res.value());
+};
+}
+
+template<>
+EdgeColumnSetter makeTypedEdgeSetter([[maybe_unused]] uint64_t fieldIdx,
+[[maybe_unused]] std::string colName) {
+throw NotImplementedException("List type is not supported in graphar scan.");
+}
+
+// Edge setter for "from" (source) and "to" (destination)
+template
+EdgeColumnSetter makeFromSetter(uint64_t fieldIdx, std::string colName) {
+return [fieldIdx, colName = std::move(colName)](graphar::EdgeIter& it,
+   function::TableFuncOutput& output, kuzu::common::idx_t row,
+   std::shared_ptr from_vertices) {
+graphar::IdType src = it.source();
+auto vertex_it = from_vertices->find(src);
+auto res = vertex_it.property(colName);
+auto& vec = output.dataChunk.getValueVectorMutable(fieldIdx);
+vec.setValue(row, res.value());
+};
+}
+
+template
+EdgeColumnSetter makeToSetter(uint64_t fieldIdx, std::string colName) {
+return [fieldIdx, colName = std::move(colName)](graphar::EdgeIter& it,
+   function::TableFuncOutput& output, kuzu::common::idx_t row,
+   std::shared_ptr to_vertices) {
+graphar::IdType dst = it.destination();
+auto vertex_it = to_vertices->find(dst);
+auto res = vertex_it.property(colName);
+auto& vec = output.dataChunk.getValueVectorMutable(fieldIdx);
+vec.setValue(row, res.value());
+};
+}
+
+// Edge setter for "internal_from" (source) and "in