Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
wgtmac commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4428618092 1.17.1 is a patch release with bug fixes only. This feature is targeted for 1.18.0, for which I don't have an ETA yet. Perhaps you may want to reply to the relevant discussion thread in the [email protected]. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4428529641 Thanks @wgtmac. I see release 1.17.1 was published 5 hours ago; I just want to know when the next release will be out so that I can use this in Flink Parquet and the Confluent Kafka S3 sink connector.
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
wgtmac commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4428361470 Thanks @gaurav7261 for working on this and @steveloughran for the review!
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
wgtmac merged PR #3415: URL: https://github.com/apache/parquet-java/pull/3415
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4412848680 @wgtmac @Fokko can you please review and merge
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r3180327950
##
.gitignore:
##
@@ -20,4 +20,5 @@ target/
mvn_install.log
.vscode/*
.DS_Store
+.memsearch/
Review Comment:
Not as such, @wgtmac; if needed, I can remove it.
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r3179655291
##
parquet-variant/src/main/java/org/apache/parquet/variant/VariantJsonParser.java:
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.parquet.variant;
+
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonParseException;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.core.StreamReadConstraints;
+import com.fasterxml.jackson.core.exc.InputCoercionException;
+import java.io.IOException;
+import java.math.BigDecimal;
+
+/**
+ * Parses JSON into {@link Variant} values using Jackson streaming.
+ *
+ * This class isolates the Jackson dependency from {@link VariantBuilder},
+ * so that core variant construction does not require Jackson on the classpath.
+ *
+ * Ported from Apache Spark's {@code VariantBuilder.parseJson}.
+ */
+public final class VariantJsonParser {
+
+ private static final JsonFactory JSON_FACTORY = JsonFactory.builder()
+ .streamReadConstraints(StreamReadConstraints.builder()
+ .maxNestingDepth(500)
+ .maxStringLength(10_000_000)
+ .maxDocumentLength(50_000_000L)
+ .build())
+ .build();
+
+ private VariantJsonParser() {}
+
+ /**
+ * Parses a JSON string and returns the corresponding {@link Variant}.
+ *
+ * Uses Jackson streaming parser for single-pass conversion
+ * with no intermediate tree. Number handling preserves precision:
+ * integers use the smallest fitting type, floating-point numbers
+ * prefer decimal encoding (no scientific notation) and fall back
+ * to double.
+ *
+ * @param json the JSON string to parse
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(String json) throws IOException {
+try (JsonParser parser = JSON_FACTORY.createParser(json)) {
+ parser.nextToken();
+ return parseJson(parser);
+}
+ }
+
+ /**
+ * Parses a JSON value from an already-positioned {@link JsonParser}
+ * and returns the corresponding {@link Variant}. The parser must
+ * have its current token set (i.e., {@code parser.nextToken()}
+ * or equivalent must have been called).
+ *
+ * @param parser a positioned Jackson JsonParser
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(JsonParser parser) throws IOException {
+VariantBuilder builder = new VariantBuilder();
+buildJson(builder, parser);
+return builder.build();
+ }
+
+ /**
+ * Recursively builds a Variant value from the current position of a
+ * Jackson streaming parser. Handles objects, arrays, strings, numbers
+ * (int/long/decimal/double), booleans, and null.
+ */
+ private static void buildJson(VariantBuilder builder, JsonParser parser) throws IOException {
+JsonToken token = parser.currentToken();
+if (token == null) {
+ throw new JsonParseException(parser, "Unexpected null token");
+}
+switch (token) {
Review Comment:
I think Jackson will treat them as strings only.
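To illustrate the point about dates and timestamps: JSON has no date or timestamp literal, so a streaming parser reports such values as plain string tokens, which is why they would reach the `VALUE_STRING` branch. A caller that wants typed temporal values would need an opt-in step after parsing. The sketch below is one way that could look. It is not part of the PR; `TemporalStringSketch` and its `Kind` enum are hypothetical names, and it relies only on `java.time`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Hypothetical helper (not in the PR): classifies a JSON string value by its ISO-8601 shape. */
final class TemporalStringSketch {

  enum Kind { DATE, TIMESTAMP, STRING }

  private TemporalStringSketch() {}

  static Kind classify(String text) {
    try {
      LocalDate.parse(text); // accepts ISO local dates such as 2024-05-01
      return Kind.DATE;
    } catch (DateTimeParseException ignored) {
      // not a plain date; try the next shape
    }
    try {
      Instant.parse(text); // accepts ISO instants such as 2024-05-01T12:30:00Z
      return Kind.TIMESTAMP;
    } catch (DateTimeParseException ignored) {
      // not an instant either
    }
    return Kind.STRING; // default: keep the value as an ordinary string
  }
}
```

Detection like this is deliberately opt-in: a string that merely looks like a date may still be intended as text, so silently retyping it could surprise readers of the data.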
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
wgtmac commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r3173779835
##
.gitignore:
##
@@ -20,4 +20,5 @@ target/
mvn_install.log
.vscode/*
.DS_Store
+.memsearch/
Review Comment:
Do we really need it in this PR?
##
parquet-variant/src/main/java/org/apache/parquet/variant/VariantJsonParser.java:
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.parquet.variant;
+
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonParseException;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.core.StreamReadConstraints;
+import com.fasterxml.jackson.core.exc.InputCoercionException;
+import java.io.IOException;
+import java.math.BigDecimal;
+
+/**
+ * Parses JSON into {@link Variant} values using Jackson streaming.
+ *
+ * This class isolates the Jackson dependency from {@link VariantBuilder},
+ * so that core variant construction does not require Jackson on the classpath.
+ *
+ * Ported from Apache Spark's {@code VariantBuilder.parseJson}.
+ */
+public final class VariantJsonParser {
+
+ private static final JsonFactory JSON_FACTORY = JsonFactory.builder()
+ .streamReadConstraints(StreamReadConstraints.builder()
+ .maxNestingDepth(500)
+ .maxStringLength(10_000_000)
+ .maxDocumentLength(50_000_000L)
+ .build())
+ .build();
+
+ private VariantJsonParser() {}
+
+ /**
+ * Parses a JSON string and returns the corresponding {@link Variant}.
+ *
+ * Uses Jackson streaming parser for single-pass conversion
+ * with no intermediate tree. Number handling preserves precision:
+ * integers use the smallest fitting type, floating-point numbers
+ * prefer decimal encoding (no scientific notation) and fall back
+ * to double.
+ *
+ * @param json the JSON string to parse
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(String json) throws IOException {
+try (JsonParser parser = JSON_FACTORY.createParser(json)) {
+ parser.nextToken();
+ return parseJson(parser);
+}
+ }
+
+ /**
+ * Parses a JSON value from an already-positioned {@link JsonParser}
+ * and returns the corresponding {@link Variant}. The parser must
+ * have its current token set (i.e., {@code parser.nextToken()}
+ * or equivalent must have been called).
+ *
+ * @param parser a positioned Jackson JsonParser
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(JsonParser parser) throws IOException {
+VariantBuilder builder = new VariantBuilder();
+buildJson(builder, parser);
+return builder.build();
+ }
+
+ /**
+ * Recursively builds a Variant value from the current position of a
+ * Jackson streaming parser. Handles objects, arrays, strings, numbers
+ * (int/long/decimal/double), booleans, and null.
+ */
+ private static void buildJson(VariantBuilder builder, JsonParser parser) throws IOException {
+JsonToken token = parser.currentToken();
+if (token == null) {
+ throw new JsonParseException(parser, "Unexpected null token");
+}
+switch (token) {
Review Comment:
Is there any chance to parse types like date and timestamp?
##
parquet-variant/pom.xml:
##
@@ -46,6 +46,17 @@
parquet-column
${project.version}
+
+ ${jackson.groupId}
+ jackson-core
Review Comment:
Does this dependency change look good to you? @gszadovszky @Fokko
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
emkornfield commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4355523038 @gaurav7261 I think we need a committer to take a look. CC @wgtmac @Fokko @julienledem if you have time, otherwise I can take a look but I'm not super familiar with the java side of things.
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4355461973 @emkornfield CI passed, can we merge now please
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4350830874 @wgtmac can you please help run workflow
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4313854924 @wgtmac there was a Java version issue in my local setup; it is now fixed by running spotless with Java 11. Please rerun, thanks.
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4310616053 @wgtmac i have fixed mvn spotless, please rerun
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4301828946 @steveloughran can you please help here
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4269702657 @steveloughran i don't have access to email archive, @alamb created thread for me, @alamb can you please help here?
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
steveloughran commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4269283552 gaurav, you'll need to ask on the mail list for approval to run the workflows. Best to only do incremental changes on a PR
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r3079432669
##
parquet-variant/src/main/java/org/apache/parquet/variant/VariantJsonParser.java:
##
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.parquet.variant;
+
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonParseException;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.core.StreamReadConstraints;
+import com.fasterxml.jackson.core.exc.InputCoercionException;
+import java.io.IOException;
+import java.math.BigDecimal;
+
+/**
+ * Parses JSON into {@link Variant} values using Jackson streaming.
+ *
+ * This class isolates the Jackson dependency from {@link VariantBuilder},
+ * so that core variant construction does not require Jackson on the classpath.
+ *
+ * Ported from Apache Spark's {@code VariantBuilder.parseJson}.
+ */
+public final class VariantJsonParser {
+
+ private static final JsonFactory JSON_FACTORY = JsonFactory.builder()
+ .streamReadConstraints(StreamReadConstraints.builder()
+ .maxNestingDepth(500)
+ .maxStringLength(10_000_000)
+ .maxDocumentLength(50_000_000L)
+ .build())
+ .build();
+
+ private VariantJsonParser() {}
+
+ /**
+ * Parses a JSON string and returns the corresponding {@link Variant}.
+ *
+ * Uses Jackson streaming parser for single-pass conversion
+ * with no intermediate tree. Number handling preserves precision:
+ * integers use the smallest fitting type, floating-point numbers
+ * prefer decimal encoding (no scientific notation) and fall back
+ * to double.
+ *
+ * @param json the JSON string to parse
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(String json) throws IOException {
+try (JsonParser parser = JSON_FACTORY.createParser(json)) {
+ parser.nextToken();
+ return parseJson(parser);
+}
+ }
+
+ /**
+ * Parses a JSON value from an already-positioned {@link JsonParser}
+ * and returns the corresponding {@link Variant}. The parser must
+ * have its current token set (i.e., {@code parser.nextToken()}
+ * or equivalent must have been called).
+ *
+ * @param parser a positioned Jackson JsonParser
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(JsonParser parser) throws IOException {
+VariantBuilder builder = new VariantBuilder();
+buildJson(builder, parser);
+return builder.build();
+ }
+
+ /**
+ * Recursively builds a Variant value from the current position of a
+ * Jackson streaming parser. Handles objects, arrays, strings, numbers
+ * (int/long/decimal/double), booleans, and null.
+ */
+ private static void buildJson(VariantBuilder builder, JsonParser parser) throws IOException {
+JsonToken token = parser.currentToken();
+if (token == null) {
+ throw new JsonParseException(parser, "Unexpected null token");
+}
+switch (token) {
+ case START_OBJECT:
+buildJsonObject(builder, parser);
+break;
+ case START_ARRAY:
+buildJsonArray(builder, parser);
+break;
+ case VALUE_STRING:
+builder.appendString(parser.getText());
+break;
+ case VALUE_NUMBER_INT:
+buildJsonInteger(builder, parser);
+break;
+ case VALUE_NUMBER_FLOAT:
+buildJsonFloat(builder, parser);
+break;
+ case VALUE_TRUE:
+builder.appendBoolean(true);
+break;
+ case VALUE_FALSE:
+builder.appendBoolean(false);
+break;
+ case VALUE_NULL:
+builder.appendNull();
+break;
+ default:
+throw new JsonParseException(parser, "Unexpected token " + token);
+}
+ }
+
+ private static void buildJsonObject(VariantBuilder builder, JsonParser parser) throws IOException {
+VariantObjectBuilder obj = builder.startObject();
+while (parser.nextToken() != JsonToken.END_OBJECT)
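The number handling described in the `parseJson` Javadoc quoted above (smallest fitting integer type, then decimal, then double) can be sketched without Jackson. This is an illustration under assumptions, not the PR's code: `NumberEncodingSketch`, its string labels, and the 38-digit precision cap are hypothetical stand-ins chosen for the example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical sketch (not the PR's code): picks an encoding label for a JSON number literal. */
final class NumberEncodingSketch {

  // Assumption for this sketch: decimals are limited to 38 digits of precision and scale.
  static final int MAX_DECIMAL_PRECISION = 38;

  private NumberEncodingSketch() {}

  static String classify(String literal) {
    boolean hasExponent = literal.indexOf('e') >= 0 || literal.indexOf('E') >= 0;
    boolean integral = !hasExponent && literal.indexOf('.') < 0;

    if (integral) {
      BigInteger value = new BigInteger(literal);
      if (value.bitLength() <= 63) { // fits in a signed 64-bit long
        long l = value.longValueExact();
        if (l == (byte) l) return "INT8";
        if (l == (short) l) return "INT16";
        if (l == (int) l) return "INT32";
        return "INT64";
      }
      // Too large for a long: fall through and try decimal, then double.
    }

    if (!hasExponent) { // scientific notation is never encoded as decimal here
      BigDecimal decimal = new BigDecimal(literal);
      if (decimal.precision() <= MAX_DECIMAL_PRECISION
          && decimal.scale() <= MAX_DECIMAL_PRECISION) {
        return "DECIMAL";
      }
    }

    Double.parseDouble(literal); // validates the literal; throws NumberFormatException if malformed
    return "DOUBLE";
  }
}
```

The ordering matters: trying the exact representations first means a value only becomes a double when no lossless encoding fits.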
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
steveloughran commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r3073217710
##
parquet-variant/src/main/java/org/apache/parquet/variant/VariantJsonParser.java:
##
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.parquet.variant;
+
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonParseException;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.core.StreamReadConstraints;
+import com.fasterxml.jackson.core.exc.InputCoercionException;
+import java.io.IOException;
+import java.math.BigDecimal;
+
+/**
+ * Parses JSON into {@link Variant} values using Jackson streaming.
+ *
+ * This class isolates the Jackson dependency from {@link VariantBuilder},
+ * so that core variant construction does not require Jackson on the classpath.
+ *
+ * Ported from Apache Spark's {@code VariantBuilder.parseJson}.
+ */
+public final class VariantJsonParser {
+
+ private static final JsonFactory JSON_FACTORY = JsonFactory.builder()
+ .streamReadConstraints(StreamReadConstraints.builder()
+ .maxNestingDepth(500)
+ .maxStringLength(10_000_000)
+ .maxDocumentLength(50_000_000L)
+ .build())
+ .build();
+
+ private VariantJsonParser() {}
+
+ /**
+ * Parses a JSON string and returns the corresponding {@link Variant}.
+ *
+ * Uses Jackson streaming parser for single-pass conversion
+ * with no intermediate tree. Number handling preserves precision:
+ * integers use the smallest fitting type, floating-point numbers
+ * prefer decimal encoding (no scientific notation) and fall back
+ * to double.
+ *
+ * @param json the JSON string to parse
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(String json) throws IOException {
+try (JsonParser parser = JSON_FACTORY.createParser(json)) {
+ parser.nextToken();
+ return parseJson(parser);
+}
+ }
+
+ /**
+ * Parses a JSON value from an already-positioned {@link JsonParser}
+ * and returns the corresponding {@link Variant}. The parser must
+ * have its current token set (i.e., {@code parser.nextToken()}
+ * or equivalent must have been called).
+ *
+ * @param parser a positioned Jackson JsonParser
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(JsonParser parser) throws IOException {
+VariantBuilder builder = new VariantBuilder();
+buildJson(builder, parser);
+return builder.build();
+ }
+
+ /**
+ * Recursively builds a Variant value from the current position of a
+ * Jackson streaming parser. Handles objects, arrays, strings, numbers
+ * (int/long/decimal/double), booleans, and null.
+ */
+ private static void buildJson(VariantBuilder builder, JsonParser parser) throws IOException {
+JsonToken token = parser.currentToken();
+if (token == null) {
+ throw new JsonParseException(parser, "Unexpected null token");
+}
+switch (token) {
+ case START_OBJECT:
+buildJsonObject(builder, parser);
+break;
+ case START_ARRAY:
+buildJsonArray(builder, parser);
+break;
+ case VALUE_STRING:
+builder.appendString(parser.getText());
+break;
+ case VALUE_NUMBER_INT:
+buildJsonInteger(builder, parser);
+break;
+ case VALUE_NUMBER_FLOAT:
+buildJsonFloat(builder, parser);
+break;
+ case VALUE_TRUE:
+builder.appendBoolean(true);
+break;
+ case VALUE_FALSE:
+builder.appendBoolean(false);
+break;
+ case VALUE_NULL:
+builder.appendNull();
+break;
+ default:
+throw new JsonParseException(parser, "Unexpected token " + token);
+}
+ }
+
+ private static void buildJsonObject(VariantBuilder builder, JsonParser parser) throws IOException {
Review Comment:
nit: add javadoc explaining how this will co-recurse into buildJSON
##
parqu
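The co-recursion mentioned in the nit above is the usual shape of a recursive-descent builder: a dispatching method hands containers to object and array helpers, and those helpers call back into the dispatcher for each nested value. The sketch below shows that shape on a tiny JSON subset using only the JDK. It is an illustration, not the PR's implementation: `CoRecursionSketch` and its method names are hypothetical, and it handles only objects, arrays, unescaped strings, and non-negative integers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mini-parser for a JSON subset, written to show co-recursion only. */
final class CoRecursionSketch {
  private final String src;
  private int pos;

  private CoRecursionSketch(String src) {
    this.src = src;
  }

  static Object parse(String json) {
    CoRecursionSketch p = new CoRecursionSketch(json);
    Object value = p.buildValue();
    p.skipWhitespace();
    if (p.pos != json.length()) {
      throw new IllegalArgumentException("Trailing content at " + p.pos);
    }
    return value;
  }

  /** Dispatcher: containers are delegated to helpers that call back into this method. */
  private Object buildValue() {
    skipWhitespace();
    char c = peek();
    if (c == '{') return buildObject();
    if (c == '[') return buildArray();
    if (c == '"') return readString();
    if (Character.isDigit(c)) return readInteger();
    throw new IllegalArgumentException("Unexpected character '" + c + "' at " + pos);
  }

  private Map<String, Object> buildObject() {
    Map<String, Object> fields = new LinkedHashMap<>();
    expect('{');
    skipWhitespace();
    if (peek() == '}') { pos++; return fields; }
    while (true) {
      skipWhitespace();
      String name = readString();
      skipWhitespace();
      expect(':');
      fields.put(name, buildValue()); // co-recursion: field value may itself be a container
      skipWhitespace();
      if (peek() == ',') { pos++; continue; }
      expect('}');
      return fields;
    }
  }

  private List<Object> buildArray() {
    List<Object> items = new ArrayList<>();
    expect('[');
    skipWhitespace();
    if (peek() == ']') { pos++; return items; }
    while (true) {
      items.add(buildValue()); // co-recursion: element may itself be a container
      skipWhitespace();
      if (peek() == ',') { pos++; continue; }
      expect(']');
      return items;
    }
  }

  private String readString() {
    expect('"');
    int end = src.indexOf('"', pos); // no escape handling in this sketch
    if (end < 0) throw new IllegalArgumentException("Unterminated string");
    String s = src.substring(pos, end);
    pos = end + 1;
    return s;
  }

  private Long readInteger() {
    int start = pos;
    while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
    return Long.parseLong(src.substring(start, pos));
  }

  private char peek() {
    if (pos >= src.length()) throw new IllegalArgumentException("Unexpected end of input");
    return src.charAt(pos);
  }

  private void expect(char c) {
    if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
    pos++;
  }

  private void skipWhitespace() {
    while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
  }
}
```

Because recursion depth follows input nesting, a cap on nesting depth (such as the `maxNestingDepth(500)` setting in the quoted `JSON_FACTORY`) is what keeps deeply nested input from exhausting the stack.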
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r3061908525
##
parquet-variant/pom.xml:
##
@@ -46,6 +46,17 @@
parquet-column
${project.version}
+
+ ${jackson.groupId}
Review Comment:
Hi @steveloughran, I saw your reply by email, so does everything in the
above PR look good to you?
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r3030311558
##
parquet-variant/pom.xml:
##
@@ -46,6 +46,17 @@
parquet-column
${project.version}
+
+ ${jackson.groupId}
Review Comment:
Hi @steveloughran , email is sent to `[email protected]`, can you
please check there
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
steveloughran commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r3028120135
##
parquet-variant/pom.xml:
##
@@ -46,6 +46,17 @@
parquet-column
${project.version}
+
+ ${jackson.groupId}
Review Comment:
comes with the parquet-jackson module; no need to reimport at a different
scope
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4176053595 @steveloughran can you please review
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4093497083 @steveloughran can you please review again
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
steveloughran commented on code in PR #3415:
URL: https://github.com/apache/parquet-java/pull/3415#discussion_r2959013394
##
parquet-variant/src/main/java/org/apache/parquet/variant/VariantBuilder.java:
##
@@ -65,6 +73,152 @@ public VariantBuilder(Metadata metadata) {
this.metadata = metadata;
}
+ /**
+ * Parses a JSON string and returns the corresponding {@link Variant}.
+ *
+ * Uses Jackson streaming parser for single-pass conversion
+ * with no intermediate tree. Number handling preserves precision:
+ * integers use the smallest fitting type, floating-point numbers
+ * prefer decimal encoding (no scientific notation) and fall back
+ * to double.
+ *
+ * Ported from Apache Spark's {@code VariantBuilder.parseJson}.
+ *
+ * @param json the JSON string to parse
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(String json) throws IOException {
+try (JsonParser parser = JSON_FACTORY.createParser(json)) {
Review Comment:
can be quite slow
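For readers following the Javadoc above, here is a rough sketch of the number handling it describes: integers take the smallest fitting width, and floating-point literals prefer a decimal unless they use scientific notation or exceed the decimal limits. This is not the PR's code. The class name, the type labels, and the precision/scale limit of 38 are assumptions made for illustration.

```java
import java.math.BigDecimal;

// Hypothetical helper, for illustration only; not part of parquet-java.
final class NumberClassifier {
    // Assumption: the widest decimal encoding holds at most 38 digits.
    static final int MAX_DECIMAL_DIGITS = 38;

    // Pick the narrowest integer width that holds the value without loss.
    static String classifyInteger(long value) {
        if (value == (byte) value) return "INT8";
        if (value == (short) value) return "INT16";
        if (value == (int) value) return "INT32";
        return "INT64";
    }

    // Prefer a decimal for plain literals; fall back to double otherwise.
    static String classifyFloat(String literal) {
        boolean scientific = literal.indexOf('e') >= 0 || literal.indexOf('E') >= 0;
        if (!scientific) {
            BigDecimal d = new BigDecimal(literal);
            if (d.precision() <= MAX_DECIMAL_DIGITS && d.scale() <= MAX_DECIMAL_DIGITS) {
                return "DECIMAL";
            }
        }
        return "DOUBLE";
    }
}
```

With these rules, a literal such as 1.5 stays exact as a decimal, while 1e10 is treated as a double.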
##
parquet-variant/src/test/java/org/apache/parquet/variant/TestVariantParseJson.java:
##
@@ -0,0 +1,307 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.variant;
+
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonParser;
+import java.io.IOException;
+import java.math.BigDecimal;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestVariantParseJson {
Review Comment:
1. nested object parsing?
2. what about invalid json?
* empty file
* not a json file
* incomplete
* large json with many values
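The cases listed above could be driven by generated inputs rather than checked-in files. A minimal sketch of such generators follows. The class and method names are hypothetical and nothing here calls the PR's API; each string would be passed to parseJson in an actual test.

```java
// Hypothetical input generators for the suggested test cases.
final class JsonTestInputs {
    // "Empty file" and "not a json file" cases.
    static final String EMPTY = "";
    static final String NOT_JSON = "this is not json";

    // Deeply nested arrays, e.g. depth 3 gives [[[]]].
    static String nestedArrays(int depth) {
        StringBuilder sb = new StringBuilder(depth * 2);
        for (int i = 0; i < depth; i++) sb.append('[');
        for (int i = 0; i < depth; i++) sb.append(']');
        return sb.toString();
    }

    // One flat object with fieldCount integer fields named k0, k1, ...
    static String largeObject(int fieldCount) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < fieldCount; i++) {
            if (i > 0) sb.append(',');
            sb.append("\"k").append(i).append("\":").append(i);
        }
        return sb.append('}').toString();
    }

    // "Incomplete" case: keep only the first half of a valid document.
    static String truncate(String json) {
        return json.substring(0, json.length() / 2);
    }
}
```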
##
NOTICE:
##
@@ -73,3 +73,14 @@ notice:
| See the License for the specific language governing permissions and
| limitations under the License.
+
+
+This project includes code from Apache Spark with the following copyright
+notice:
+
+ Apache Spark
Review Comment:
Do cross-ASF projects need this credit? What I do think is good is to ensure
the original authors get credit in the final commit message.
##
parquet-variant/src/main/java/org/apache/parquet/variant/VariantBuilder.java:
##
@@ -30,6 +36,8 @@
*/
public class VariantBuilder {
+ private static final JsonFactory JSON_FACTORY = new JsonFactory();
Review Comment:
The factory should be built with some constraints so that a malicious
JSON file is rejected rather than triggering OOM problems, etc. From the
javadocs:
```
JsonFactory f = JsonFactory.builder()
.streamReadConstraints(
StreamReadConstraints.builder()
.maxNestingDepth(500)
.maxStringLength(10_000_000)
.maxDocumentLength(5_000_000)
.build()
)
.build();
```
##
parquet-variant/src/main/java/org/apache/parquet/variant/VariantBuilder.java:
##
@@ -65,6 +73,152 @@ public VariantBuilder(Metadata metadata) {
this.metadata = metadata;
}
+ /**
+ * Parses a JSON string and returns the corresponding {@link Variant}.
+ *
+ * Uses Jackson streaming parser for single-pass conversion
+ * with no intermediate tree. Number handling preserves precision:
+ * integers use the smallest fitting type, floating-point numbers
+ * prefer decimal encoding (no scientific notation) and fall back
+ * to double.
+ *
+ * Ported from Apache Spark's {@code VariantBuilder.parseJson}.
+ *
+ * @param json the JSON string to parse
+ * @return the parsed Variant
+ * @throws IOException if the JSON is malformed or an I/O error occurs
+ */
+ public static Variant parseJson(String json) throws IOException {
Review Comment:
I was going to suggest accepting a CharSequence so it could be fed from other
places (string fields within Avro, ...), but it looks like Jackson 2 doesn't
support that itself.
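One possible workaround, sketched here under assumptions, is to wrap the CharSequence in a java.io.Reader and hand that to Jackson's JsonFactory.createParser(Reader) overload, which avoids copying into a String first. The class name below is hypothetical and the PR does not include it.

```java
import java.io.Reader;
import java.util.Objects;

// Hypothetical adapter: exposes any CharSequence as a Reader without
// first materialising it as a String.
final class CharSequenceReader extends Reader {
    private final CharSequence source;
    private int position;

    CharSequenceReader(CharSequence source) {
        this.source = Objects.requireNonNull(source, "source");
    }

    @Override
    public int read(char[] buffer, int offset, int length) {
        if (length == 0) return 0;
        if (position >= source.length()) return -1; // end of input
        int count = Math.min(length, source.length() - position);
        for (int i = 0; i < count; i++) {
            buffer[offset + i] = source.charAt(position++);
        }
        return count;
    }

    @Override
    public void close() {
        // Nothing to release; the CharSequence is owned by the caller.
    }
}
```

A caller could then write something like JSON_FACTORY.createParser(new CharSequenceReader(seq)). Whether a CharSequence overload of parseJson is worth adding is a separate question for the PR.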
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
wgtmac commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4073504417 I'm not familiar with this code yet but I think it is worth adding. @emkornfield @gene-db @rdblue WDYT?
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4068883305 @gszadovszky can you please review as well
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4041011367 @julienledem thanks for the call, I have added notice, please review
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
alamb commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4040878302 FWIW we have a similar method in Rust in case that is interesting - https://github.com/apache/arrow-rs/blob/d3c79006f2595e144d539f56b3054fe916ab184b/parquet-variant-compute/src/from_json.rs#L47
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4025087776 @Fokko can you please review, is it looking good to you?
Re: [PR] GH-3414: Add parseJson to VariantBuilder for JSON-to-Variant conversion [parquet-java]
gaurav7261 commented on PR #3415: URL: https://github.com/apache/parquet-java/pull/3415#issuecomment-4015764645 @alamb @aihuaxu I read https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/ and checked the feasibility of having our S3 Sink connector write Variant. I found that parseJson would be a good fit here. WDYT? Does this make sense?
