[GitHub] [hudi] rahulpoptani commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.
rahulpoptani commented on issue #2180: URL: https://github.com/apache/hudi/issues/2180#issuecomment-712585912 I used a different environment, with Spark 2.4.5 and Scala 2.12, and was able to successfully perform Insert/Upsert/Delete and Read operations on a Merge-On-Read table. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] SteNicholas commented on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation
SteNicholas commented on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-712566489 > @SteNicholas @leesf : Does this essentially mean we no longer support small file handling for "inserts" ? > If user doesn't essentially care about duplicates, I agree that we need to have same behavior w/o small file handling. Instead of this approach, can we create a new type of Write Handle which looks like MergeHandle but does not merge but rather appends records and creates a new version of Parquet file. You can then use this Handle instead of UpdateHandle when pure insert operation type is used. > > cc @vinothchandar Yes, user doesn't essentially care about duplicates for small files and the same behavior w/o small file handling makes sense.
[GitHub] [hudi] bvaradar commented on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation
bvaradar commented on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-712564198 @SteNicholas @leesf : Does this essentially mean we no longer support small file handling for "inserts" ? If user doesn't essentially care about duplicates, I agree that we need to have same behavior w/o small file handling. Instead of this approach, can we create a new type of Write Handle which looks like MergeHandle but does not merge but rather appends records and creates a new version of Parquet file. You can then use this Handle instead of UpdateHandle when pure insert operation type is used. cc @vinothchandar
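The MergeHandle-vs-append idea discussed above can be illustrated without any Hudi APIs. The class and method names below are hypothetical, stdlib-only Java meant purely to show the behavioral difference: a merge-style handle deduplicates by record key, while the proposed insert-style handle appends records as-is into a new file version, keeping duplicates.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the two write behaviors being discussed; not Hudi code.
public class WriteHandleSketch {
    // Merge-style: later records with the same key replace earlier ones.
    static List<String> mergeWrite(List<String[]> records) { // each record is [key, value]
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String[] r : records) byKey.put(r[0], r[1]);
        return byKey.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    // Insert-style: append everything into a new file version; duplicates are kept.
    static List<String> insertWrite(List<String[]> records) {
        return records.stream().map(r -> r[0] + "=" + r[1]).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> incoming = Arrays.asList(
                new String[]{"k1", "a"}, new String[]{"k1", "b"}, new String[]{"k2", "c"});
        System.out.println(mergeWrite(incoming));  // [k1=b, k2=c]
        System.out.println(insertWrite(incoming)); // [k1=a, k1=b, k2=c]
    }
}
```

The trade-off in the thread is exactly this: if the user does not care about duplicates, the insert-style path can skip the merge entirely.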
[GitHub] [hudi] lw309637554 commented on pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution
lw309637554 commented on pull request #2127: URL: https://github.com/apache/hudi/pull/2127#issuecomment-712536085 > @lw309637554 looks like comments from @pratyakshsharma were addressed. sorry about the delay. merging now. Thank you @lw309637554 for adding the cases! Thanks
[GitHub] [hudi] vinothchandar commented on pull request #1760: [HUDI-1040] Update apis for spark3 compatibility
vinothchandar commented on pull request #1760: URL: https://github.com/apache/hudi/pull/1760#issuecomment-712505539 > we want to make Hudi compile with spark 2 and then run with spark3? this was the intention. but as @bschell pointed out some classes have changed and we need to make parts of `hudi-spark` modular and plug in spark version specific implementations.
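One hedged reading of "make parts of `hudi-spark` modular and plug in spark version specific implementations" is a Maven profile that selects a Spark-version-specific module at build time. The module names and versions below are illustrative assumptions, not the project's actual layout:

```xml
<!-- Hypothetical sketch: select a Spark-version-specific module via a profile. -->
<profiles>
  <profile>
    <id>spark2</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <modules><module>hudi-spark2</module></modules>
  </profile>
  <profile>
    <id>spark3</id>
    <properties>
      <spark.version>3.0.0</spark.version>
      <scala.version>2.12</scala.version>
    </properties>
    <modules><module>hudi-spark3</module></modules>
  </profile>
</profiles>
```

Code shared across versions would stay in the common module, with only the classes that changed between Spark 2 and 3 duplicated per version module.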
[jira] [Commented] (HUDI-303) Avro schema case sensitivity testing
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217177#comment-17217177 ] Vinoth Chandar commented on HUDI-303: - [~309637554] this task is about exploring all possibilities and making a call. IIUC you are making the case for retaining the lower casing. I think what you point out is why we lower cased this. I can't decide for myself until we paint the full picture. :) > Avro schema case sensitivity testing > > > Key: HUDI-303 > URL: https://issues.apache.org/jira/browse/HUDI-303 > Project: Apache Hudi > Issue Type: Test > Components: Spark Integration >Reporter: Udit Mehrotra >Assignee: liwei >Priority: Minor > Labels: bug-bash-0.6.0 > > As a fallout of [PR 956|https://github.com/apache/incubator-hudi/pull/956] we > would like to understand how Avro behaves with case sensitive column names. > Couple of action items: > * Test with different field names just differing in case. > * *AbstractRealtimeRecordReader* is one of the classes where we are > converting Avro Schema field names to lower case, to be able to verify them > against column names from Hive. We can consider removing the *lowercase* > conversion there if we verify it does not break anything. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
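The risk being weighed in HUDI-303 can be shown without Avro at all: lower-casing schema field names (as the quoted *AbstractRealtimeRecordReader* behavior does, to match Hive column names) is lossy when two field names differ only in case. A minimal stdlib-only sketch (class and method names are hypothetical):

```java
import java.util.*;

public class LowercaseCollision {
    // Map schema field names to lower case, as done for Hive name matching.
    static Map<String, String> toLowerCased(List<String> fieldNames) {
        Map<String, String> m = new LinkedHashMap<>();
        for (String f : fieldNames) m.put(f.toLowerCase(Locale.ROOT), f);
        return m;
    }

    public static void main(String[] args) {
        // Two Avro fields differing only in case collapse to one Hive name.
        Map<String, String> m = toLowerCased(Arrays.asList("userId", "userid"));
        System.out.println(m.size()); // 1: the name "userid" is now ambiguous
    }
}
```

Avro itself treats field names as case-sensitive, so the collision only appears once names are normalized for Hive; that is presumably the trade-off the testing task needs to pin down.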
[jira] [Updated] (HUDI-1321) Support properties for metadata table via a properties.file
[ https://issues.apache.org/jira/browse/HUDI-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-1321: - Status: Open (was: New) > Support properties for metadata table via a properties.file > --- > > Key: HUDI-1321 > URL: https://issues.apache.org/jira/browse/HUDI-1321 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Prashant Wason >Priority: Major > > metadata properties should be in its own namespace
[jira] [Updated] (HUDI-1321) Support properties for metadata table via a properties.file
[ https://issues.apache.org/jira/browse/HUDI-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-1321: - Status: In Progress (was: Open)
[GitHub] [hudi] umehrot2 commented on issue #2057: [SUPPORT] AWSDmsAvroPayload not processing Deletes correctly + IOException when reading log file
umehrot2 commented on issue #2057: URL: https://github.com/apache/hudi/issues/2057#issuecomment-712454567 > @umehrot2 It looks like for 0.6.0 where this issue is fixed, @WTa-hash is seeing the exception `java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile.` originating from `at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:142)`. Any ideas if this is wrong spark versions ? @n3nash this issue is because of the EMR environment. The jar @WTa-hash was building for 0.6.0 is not compiled against EMR's Spark version; EMR ships its own Spark with various modifications. This issue should not occur on emr-5.31.0, where Hudi 0.6.0 is officially supported, and there is no need to replace the jars there: the jars we provide are compiled against our own Spark, which fixes this issue.
[GitHub] [hudi] umehrot2 commented on issue #2057: [SUPPORT] AWSDmsAvroPayload not processing Deletes correctly + IOException when reading log file
umehrot2 commented on issue #2057: URL: https://github.com/apache/hudi/issues/2057#issuecomment-712453460 > > @umehrot2 Could the IOException be due to #2089 ? > > I'm not entirely sure if it's related to this issue as the steps to reproduce are different, but the thing I see in common is that both issues are referencing a MOR table. I don't get this issue when my table is COW. @n3nash @WTa-hash Apologies for the late response here. I think we need the full stack trace to be able to debug this. Looking at https://github.com/apache/hudi/blob/release-0.6.0/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java#L244 it seems that the full exception is only logged at the executors. So either the executor logs should be checked for the full trace, or that line should be changed to throw the actual exception back to the driver. If this is still happening, please open a jira with basic reproduction steps if possible. I do not personally think it is related to #2089, but we cannot be certain without seeing the full exception.
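The change suggested above (surfacing the full trace at the driver instead of only logging it on the executor) amounts to the following pattern. The class and methods here are illustrative only, not Hudi's actual code:

```java
public class ScanSketch {
    // Current behavior (illustrative): the exception detail stays in executor logs,
    // and the caller only sees a generic failure.
    static String scanLogging(Runnable step) {
        try { step.run(); return "ok"; }
        catch (Exception e) {
            System.err.println("Got exception when reading log file: " + e);
            return "failed";
        }
    }

    // Suggested behavior: propagate the cause so the driver sees the full trace.
    static String scanRethrowing(Runnable step) {
        try { step.run(); return "ok"; }
        catch (Exception e) {
            throw new RuntimeException("Error scanning log file", e);
        }
    }

    public static void main(String[] args) {
        Runnable failing = () -> { throw new IllegalStateException("corrupt block"); };
        System.out.println(scanLogging(failing)); // failed
        try { scanRethrowing(failing); }
        catch (RuntimeException e) {
            System.out.println(e.getCause().getMessage()); // corrupt block
        }
    }
}
```

With the rethrowing variant, the original exception arrives at the driver as the cause, so no executor-log spelunking is needed to see what actually failed.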
[GitHub] [hudi] prashantwason commented on pull request #2189: Some more updates to the rfc-15 implementation
prashantwason commented on pull request #2189: URL: https://github.com/apache/hudi/pull/2189#issuecomment-712449097 @vinothchandar Some more updates from my side. PTAL.
[GitHub] [hudi] prashantwason opened a new pull request #2189: Some more updates to the rfc-15 implementation
prashantwason opened a new pull request #2189: URL: https://github.com/apache/hudi/pull/2189

## Brief change log

Please see individual commits and the tagged JIRA items for details.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] umehrot2 merged pull request #2185: [HUDI-1345] Remove Hbase and htrace relocation from utilities bundle
umehrot2 merged pull request #2185: URL: https://github.com/apache/hudi/pull/2185
[hudi] branch master updated: [HUDI-1345] Remove Hbase and htrace relocation from utilities bundle (#2185)
This is an automated email from the ASF dual-hosted git repository. uditme pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 6490b02 [HUDI-1345] Remove Hbase and htrace relocation from utilities bundle (#2185) 6490b02 is described below

```
commit 6490b029dd05e5a3d704ebd5b314899a53dd76fb
Author: Bhavani Sudha Saktheeswaran
AuthorDate: Mon Oct 19 14:11:08 2020 -0700

    [HUDI-1345] Remove Hbase and htrace relocation from utilities bundle (#2185)

 packaging/hudi-utilities-bundle/pom.xml | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/packaging/hudi-utilities-bundle/pom.xml b/packaging/hudi-utilities-bundle/pom.xml
index 91ae5fd..39e48bb 100644
--- a/packaging/hudi-utilities-bundle/pom.xml
+++ b/packaging/hudi-utilities-bundle/pom.xml
@@ -168,14 +168,6 @@
               <shadedPattern>org.apache.hudi.org.apache.commons.codec.</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>org.apache.hadoop.hbase.</pattern>
-              <shadedPattern>org.apache.hudi.org.apache.hadoop.hbase.</shadedPattern>
-            </relocation>
-            <relocation>
-              <pattern>org.apache.htrace.</pattern>
-              <shadedPattern>org.apache.hudi.org.apache.htrace.</shadedPattern>
-            </relocation>
-            <relocation>
               <pattern>org.eclipse.jetty.</pattern>
               <shadedPattern>org.apache.hudi.org.eclipse.jetty.</shadedPattern>
```
[GitHub] [hudi] ashishmgofficial edited a comment on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer
ashishmgofficial edited a comment on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712404711 Not sure if this is going to be of any help, but attaching the latest logs. I can see these messages towards the end:

```
at scala.collection.Iterator$class.isEmpty(Iterator.scala:331)
at scala.collection.AbstractIterator.isEmpty(Iterator.scala:1334)
at org.apache.hudi.AvroConversionUtils$$anonfun$2.apply(AvroConversionUtils.scala:46)
at org.apache.hudi.AvroConversionUtils$$anonfun$2.apply(AvroConversionUtils.scala:45)
```

[o.log](https://github.com/apache/hudi/files/5404352/o.log)
[GitHub] [hudi] zhedoubushishi commented on pull request #1760: [HUDI-1040] Update apis for spark3 compatibility
zhedoubushishi commented on pull request #1760: URL: https://github.com/apache/hudi/pull/1760#issuecomment-712391147 @bschell @vinothchandar to be clear, just wondering what is the exact goal of this PR? Do we want to make Hudi both compile and run with Spark 3, or do we want to make Hudi compile with Spark 2 and then run with Spark 3? Ideally we should make Hudi both compile and run with Spark 3, but the current code change cannot compile with Spark 3. Running

```
mvn clean install -DskipTests -DskipITs -Dspark.version=3.0.0 -Pscala-2.12
```

returns

```
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile (default-testCompile) on project hudi-client: Compilation failure
[ERROR] /Users/wenningd/workplace/Aws157Hudi/src/Aws157Hudi/hudi-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java:[146,27] cannot find symbol
[ERROR]   symbol:   method toRow(org.apache.spark.sql.Row)
[ERROR]   location: variable encoder of type org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
[ERROR]
```
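For what it's worth, the missing `toRow` symbol matches a known Spark 3.0 breaking change: `ExpressionEncoder#toRow` was removed in favor of a serializer object obtained from the encoder. A Java-style pseudocode sketch of the likely migration (not compiled here since it needs Spark on the classpath, so treat it as an assumption to verify):

```
// Spark 2.4.x (as used in SparkDatasetTestUtils):
InternalRow internalRow = encoder.toRow(row);

// Spark 3.0.x equivalent:
InternalRow internalRow = encoder.createSerializer().apply(row);
```

Since the two calls are source-incompatible, supporting both Spark lines would need either reflection or the version-specific modules discussed earlier in the thread.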
[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712377184 @bvaradar I can provide all the SQL statements in Postgres which I'm using to reproduce this:

```
DROP TABLE public.motor_crash_violation_incidents;

CREATE TABLE public.motor_crash_violation_incidents (
    inc_id serial,
    "year" int4 NULL,
    violation_desc varchar(100) NULL,
    violation_code varchar(20) NULL,
    case_individual_id int4 NULL,
    flag varchar(1) NULL,
    last_modified_ts timestamp NOT NULL,
    CONSTRAINT motor_crash_violation_incidents_pkey PRIMARY KEY (inc_id)
);

ALTER TABLE public.motor_crash_violation_incidents REPLICA IDENTITY FULL;
```

Insert records:

```
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(1, 2016, 'DRIVING WHILE INTOXICATED', '11923', 17475366, 'I', '2020-09-24 11:03:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(3, 2016, 'AGGRAVATED UNLIC OPER 2ND/PREV CONV', '5112A1', 17475367, 'U', '2020-09-24 15:00:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(4, 2019, 'AGGRAVATED UNLIC OPER 2ND/PREV', '5112A2', 17475368, 'I', '2019-09-24 15:00:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(2, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '2180F', 17475569, 'U', '2020-09-29 11:00:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(9, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180E', 17475573, 'I', '2020-09-29 11:00:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(10, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180D', 17475574, 'I', '2020-09-29 11:10:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(11, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180D', 17475574, 'I', '2020-09-29 12:10:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(12, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180E', 17475574, 'I', '2020-09-29 13:10:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(13, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180E', 17475574, 'I', '2020-09-29 14:10:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(34, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180E', 17475574, 'I', '2020-09-29 15:10:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(35, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180E', 17475574, 'I', '2020-09-29 16:10:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(36, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180E', 17475574, 'I', '2020-09-29 17:00:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(37, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180E', 17475574, 'I', '2020-09-29 17:10:00.000'); commit;
INSERT INTO public.motor_crash_violation_incidents (inc_id, "year", violation_desc, violation_code, case_individual_id, flag, last_modified_ts) VALUES(38, 2020, 'UNREASONABLE SPEED/SPECIAL HAZARDS', '1180D', 17475574, 'I', '2020-09-29 18:00:00.000'); commit;
```

Issue delete:

```
DELETE FROM public.motor_crash_violation_incidents WHERE inc_id=3;
```

These changes are automatically picked up by the Confluent Kafka Postgres Debezium connector and written to the topic.
[GitHub] [hudi] bvaradar closed issue #2108: [SUPPORT]Submit rollback -->Pending job --> kill YARN --> lost data
bvaradar closed issue #2108: URL: https://github.com/apache/hudi/issues/2108
[jira] [Commented] (HUDI-1340) Not able to query real time table when rows contains nested elements
[ https://issues.apache.org/jira/browse/HUDI-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216913#comment-17216913 ] Balaji Varadarajan commented on HUDI-1340: -- [~bdighe]: Did you use --conf spark.sql.hive.convertMetastoreParquet=false when you started your spark-shell where you are running the query ? https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-Whydowehavetoset2differentwaysofconfiguringSparktoworkwithHudi? > Not able to query real time table when rows contains nested elements > > > Key: HUDI-1340 > URL: https://issues.apache.org/jira/browse/HUDI-1340 > Project: Apache Hudi > Issue Type: Bug >Reporter: Bharat Dighe >Priority: Major > Attachments: create_avro.py, user.avsc, users1.avro, users2.avro, > users3.avro, users4.avro, users5.avro > > > AVRO schema: Attached > Script to generate sample data: attached > Sample data attached > == > the schema as nested elements, here is the output from hive > {code:java} > CREATE EXTERNAL TABLE `users_mor_rt`( > `_hoodie_commit_time` string, > `_hoodie_commit_seqno` string, > `_hoodie_record_key` string, > `_hoodie_partition_path` string, > `_hoodie_file_name` string, > `name` string, > `userid` int, > `datehired` string, > `meta` struct, > `experience` > struct>>) > PARTITIONED BY ( > `role` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT > 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > LOCATION > 'hdfs://namenode:8020/tmp/hudi_repair_order_mor' > TBLPROPERTIES ( > 'last_commit_time_sync'='20201011190954', > 'transient_lastDdlTime'='1602442906') > {code} > scala code: > {code:java} > import java.io.File > import org.apache.hudi.QuickstartUtils._ > import org.apache.spark.sql.SaveMode._ > import org.apache.avro.Schema > import org.apache.hudi.DataSourceReadOptions._ > import 
org.apache.hudi.DataSourceWriteOptions._ > import org.apache.hudi.config.HoodieWriteConfig._ > val tableName = "users_mor" > // val basePath = "hdfs:///tmp/hudi_repair_order_mor" > val basePath = "hdfs:///tmp/hudi_repair_order_mor" > // Insert Data > /// local not hdfs !!! > //val schema = new Schema.Parser().parse(new > File("/var/hoodie/ws/docker/demo/data/user/user.avsc")) > def updateHudi( num:String, op:String) = { > val path = "hdfs:///var/demo/data/user/users" + num + ".avro" > println( path ); > val avdf2 = new org.apache.spark.sql.SQLContext(sc).read.format("avro"). > // option("avroSchema", schema.toString). > load(path) > avdf2.select("name").show(false) > avdf2.write.format("hudi"). > options(getQuickstartWriteConfigs). > option(OPERATION_OPT_KEY,op). > option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ"). // > default:COPY_ON_WRITE, MERGE_ON_READ > option(KEYGENERATOR_CLASS_OPT_KEY, > "org.apache.hudi.keygen.ComplexKeyGenerator"). > option(PRECOMBINE_FIELD_OPT_KEY, "meta.ingestTime"). // dedup > option(RECORDKEY_FIELD_OPT_KEY, "userId"). // key > option(PARTITIONPATH_FIELD_OPT_KEY, "role"). > option(TABLE_NAME, tableName). > option("hoodie.compact.inline", false). > option(HIVE_STYLE_PARTITIONING_OPT_KEY, "true"). > option(HIVE_SYNC_ENABLED_OPT_KEY, "true"). > option(HIVE_TABLE_OPT_KEY, tableName). > option(HIVE_USER_OPT_KEY, "hive"). > option(HIVE_PASS_OPT_KEY, "hive"). > option(HIVE_URL_OPT_KEY, "jdbc:hive2://hiveserver:1"). > option(HIVE_PARTITION_FIELDS_OPT_KEY, "role"). > option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, > "org.apache.hudi.hive.MultiPartKeysValueExtractor"). > option("hoodie.datasource.hive_sync.assume_date_partitioning", > "false"). > mode(Append). 
> save(basePath) > spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, > _hoodie_partition_path, experience.companies[0] from " + tableName + > "_rt").show() > spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, > _hoodie_partition_path, _hoodie_commit_seqno from " + tableName + > "_ro").show() > } > updateHudi("1", "bulkinsert") > updateHudi("2", "upsert") > updateHudi("3", "upsert") > updateHudi("4", "upsert") > {code} > If nested fields are not included, it works fine > {code} > scala> spark.sql("select name from users_mor_rt"); > res19: org.apache.spark.sql.DataFrame = [name: string] > scala> spark.sql("select name from users_mor_rt").show(); > +-+ > | name| > +-+ > |engg3| > |engg1_new| > |engg2_new| > | mgr1| > | mgr2| > |
[GitHub] [hudi] bvaradar commented on issue #2108: [SUPPORT]Submit rollback -->Pending job --> kill YARN --> lost data
bvaradar commented on issue #2108: URL: https://github.com/apache/hudi/issues/2108#issuecomment-712304151 Closing this due to inactivity.
[jira] [Updated] (HUDI-1340) Not able to query real time table when rows contains nested elements
[ https://issues.apache.org/jira/browse/HUDI-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1340: - Status: Open (was: New) > Not able to query real time table when rows contains nested elements > > > Key: HUDI-1340 > URL: https://issues.apache.org/jira/browse/HUDI-1340 > Project: Apache Hudi > Issue Type: Bug >Reporter: Bharat Dighe >Priority: Major > Attachments: create_avro.py, user.avsc, users1.avro, users2.avro, > users3.avro, users4.avro, users5.avro > > > AVRO schema: Attached > Script to generate sample data: attached > Sample data attached > == > the schema as nested elements, here is the output from hive > {code:java} > CREATE EXTERNAL TABLE `users_mor_rt`( > `_hoodie_commit_time` string, > `_hoodie_commit_seqno` string, > `_hoodie_record_key` string, > `_hoodie_partition_path` string, > `_hoodie_file_name` string, > `name` string, > `userid` int, > `datehired` string, > `meta` struct, > `experience` > struct>>) > PARTITIONED BY ( > `role` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT > 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > LOCATION > 'hdfs://namenode:8020/tmp/hudi_repair_order_mor' > TBLPROPERTIES ( > 'last_commit_time_sync'='20201011190954', > 'transient_lastDdlTime'='1602442906') > {code} > scala code: > {code:java} > import java.io.File > import org.apache.hudi.QuickstartUtils._ > import org.apache.spark.sql.SaveMode._ > import org.apache.avro.Schema > import org.apache.hudi.DataSourceReadOptions._ > import org.apache.hudi.DataSourceWriteOptions._ > import org.apache.hudi.config.HoodieWriteConfig._ > val tableName = "users_mor" > // val basePath = "hdfs:///tmp/hudi_repair_order_mor" > val basePath = "hdfs:///tmp/hudi_repair_order_mor" > // Insert Data > /// local not hdfs !!! 
> //val schema = new Schema.Parser().parse(new > File("/var/hoodie/ws/docker/demo/data/user/user.avsc")) > def updateHudi( num:String, op:String) = { > val path = "hdfs:///var/demo/data/user/users" + num + ".avro" > println( path ); > val avdf2 = new org.apache.spark.sql.SQLContext(sc).read.format("avro"). > // option("avroSchema", schema.toString). > load(path) > avdf2.select("name").show(false) > avdf2.write.format("hudi"). > options(getQuickstartWriteConfigs). > option(OPERATION_OPT_KEY,op). > option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ"). // > default:COPY_ON_WRITE, MERGE_ON_READ > option(KEYGENERATOR_CLASS_OPT_KEY, > "org.apache.hudi.keygen.ComplexKeyGenerator"). > option(PRECOMBINE_FIELD_OPT_KEY, "meta.ingestTime"). // dedup > option(RECORDKEY_FIELD_OPT_KEY, "userId"). // key > option(PARTITIONPATH_FIELD_OPT_KEY, "role"). > option(TABLE_NAME, tableName). > option("hoodie.compact.inline", false). > option(HIVE_STYLE_PARTITIONING_OPT_KEY, "true"). > option(HIVE_SYNC_ENABLED_OPT_KEY, "true"). > option(HIVE_TABLE_OPT_KEY, tableName). > option(HIVE_USER_OPT_KEY, "hive"). > option(HIVE_PASS_OPT_KEY, "hive"). > option(HIVE_URL_OPT_KEY, "jdbc:hive2://hiveserver:1"). > option(HIVE_PARTITION_FIELDS_OPT_KEY, "role"). > option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, > "org.apache.hudi.hive.MultiPartKeysValueExtractor"). > option("hoodie.datasource.hive_sync.assume_date_partitioning", > "false"). > mode(Append). 
> save(basePath) > spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, > _hoodie_partition_path, experience.companies[0] from " + tableName + > "_rt").show() > spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, > _hoodie_partition_path, _hoodie_commit_seqno from " + tableName + > "_ro").show() > } > updateHudi("1", "bulkinsert") > updateHudi("2", "upsert") > updateHudi("3", "upsert") > updateHudi("4", "upsert") > {code} > If nested fields are not included, it works fine > {code} > scala> spark.sql("select name from users_mor_rt"); > res19: org.apache.spark.sql.DataFrame = [name: string] > scala> spark.sql("select name from users_mor_rt").show(); > +-+ > | name| > +-+ > |engg3| > |engg1_new| > |engg2_new| > | mgr1| > | mgr2| > | devops1| > | devops2| > +-+ > {code} > But fails when I include nested field 'experience' > {code} > scala> spark.sql("select name, experience from users_mor_rt").show(); > 20/10/11 19:53:58 ERROR executor.Executor: Exception in task 0.0 in stage > 147.0 (TID 153) >
[GitHub] [hudi] bvaradar commented on issue #2162: [SUPPORT] Deltastreamer transform cannot add fields
bvaradar commented on issue #2162: URL: https://github.com/apache/hudi/issues/2162#issuecomment-712289095 @liujinhui1994 : Adding

> Can work, but if the default value is not null, it will not work
> { "name": "adnetDesc", "type": ["null", "long"], "default": -1 }
> @bvaradar

Let's discuss this in PR once you open it.
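Background on why a non-null default "will not work" here: the Avro specification requires a union field's default value to match the union's *first* branch. With `["null", "long"]` the only legal default is `null`; to default to `-1`, the union order must be reversed. Illustrative schema fragments (field name taken from the quoted comment; the `//` comments are annotations, not valid JSON):

```
// Valid: the first branch is "null", so the default must be null.
{"name": "adnetDesc", "type": ["null", "long"], "default": null}

// To default to -1, "long" must come first in the union.
{"name": "adnetDesc", "type": ["long", "null"], "default": -1}
```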
[GitHub] [hudi] xushiyan merged pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution
xushiyan merged pull request #2127: URL: https://github.com/apache/hudi/pull/2127
[hudi] branch master updated: [HUDI-284] add more test for UpdateSchemaEvolution (#2127)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 4d80e1e  [HUDI-284] add more test for UpdateSchemaEvolution (#2127)

commit 4d80e1e221b0ddfd542baeee9b6cbb3b28a88e68
Author: lw0090
AuthorDate: Mon Oct 19 22:38:04 2020 +0800

    [HUDI-284] add more test for UpdateSchemaEvolution (#2127)

    Unit test different schema evolution scenarios.
---
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  11 +-
 ...EvolvedSchema.txt => exampleEvolvedSchema.avsc} |   0
 ...ma.txt => exampleEvolvedSchemaChangeOrder.avsc} |   8 +-
 ...txt => exampleEvolvedSchemaColumnRequire.avsc}  |   2 +-
 ...ema.txt => exampleEvolvedSchemaColumnType.avsc} |   8 +-
 ...a.txt => exampleEvolvedSchemaDeleteColumn.avsc} |   8 +-
 .../{exampleSchema.txt => exampleSchema.avsc}      |   0
 .../hudi/client/TestUpdateSchemaEvolution.java     | 194 +++--
 .../org/apache/hudi/index/TestHoodieIndex.java     |   2 +-
 .../hudi/index/bloom/TestHoodieBloomIndex.java     |   2 +-
 .../index/bloom/TestHoodieGlobalBloomIndex.java    |   2 +-
 .../commit/TestCopyOnWriteActionExecutor.java      |   2 +-
 .../table/action/commit/TestUpsertPartitioner.java |   2 +-
 13 files changed, 159 insertions(+), 82 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
index 77fef5c..faa7ff6 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
@@ -197,7 +197,6 @@ public class HoodieMergeHandle extends H
       } else {
         recordsDeleted++;
       }
-      writeStatus.markSuccess(hoodieRecord, recordMetadata);
       // deflate record payload after recording success. This will help users access payload as a
       // part of marking
@@ -243,16 +242,14 @@ public class HoodieMergeHandle extends H
     if (copyOldRecord) {
       // this should work as it is, since this is an existing record
       String errMsg = "Failed to merge old record into new file for key " + key + " from old file " + getOldFilePath()
-          + " to new file " + newFilePath;
+          + " to new file " + newFilePath + " with writerSchema " + writerSchemaWithMetafields.toString(true);
       try {
         fileWriter.writeAvro(key, oldRecord);
       } catch (ClassCastException e) {
-        LOG.error("Schema mismatch when rewriting old record " + oldRecord + " from file " + getOldFilePath()
-            + " to file " + newFilePath + " with writerSchema " + writerSchemaWithMetafields.toString(true));
+        LOG.debug("Old record is " + oldRecord);
         throw new HoodieUpsertException(errMsg, e);
-      } catch (IOException e) {
-        LOG.error("Failed to merge old record into new file for key " + key + " from old file " + getOldFilePath()
-            + " to new file " + newFilePath, e);
+      } catch (IOException | RuntimeException e) {
+        LOG.debug("Old record is " + oldRecord);
         throw new HoodieUpsertException(errMsg, e);
       }
       recordsWritten++;
diff --git a/hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchema.txt b/hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchema.avsc
similarity index 100%
copy from hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchema.txt
copy to hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchema.avsc
diff --git a/hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchema.txt b/hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchemaChangeOrder.avsc
similarity index 94%
copy from hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchema.txt
copy to hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchemaChangeOrder.avsc
index c85c3ce..16844ff 100644
--- a/hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchema.txt
+++ b/hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchemaChangeOrder.avsc
@@ -21,10 +21,6 @@
   "name": "trip",
   "fields": [
     {
-      "name": "number",
-      "type": ["int", "null"]
-    },
-    {
       "name": "time",
       "type": "string"
     },
@@ -35,6 +31,10 @@
     {
       "name": "added_field",
       "type": ["int", "null"]
+    },
+    {
+      "name": "number",
+      "type": ["int", "null"]
     }
   ]
 }
diff --git a/hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchema.txt b/hudi-client/hudi-client-common/src/test/resources/exampleEvolvedSchemaColumnRequire.avsc
similarity index
[GitHub] [hudi] xushiyan commented on pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution
xushiyan commented on pull request #2127: URL: https://github.com/apache/hudi/pull/2127#issuecomment-712208378 @lw309637554 looks like comments from @pratyakshsharma were addressed. sorry about the delay. merging now. Thank you @lw309637554 for adding the cases!
[jira] [Commented] (HUDI-303) Avro schema case sensitivity testing
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216732#comment-17216732 ] liwei commented on HUDI-303: [~uditme], [~vinoth] what do you think about this? :D

> Avro schema case sensitivity testing
> ------------------------------------
>                 Key: HUDI-303
>                 URL: https://issues.apache.org/jira/browse/HUDI-303
>             Project: Apache Hudi
>          Issue Type: Test
>          Components: Spark Integration
>            Reporter: Udit Mehrotra
>            Assignee: liwei
>            Priority: Minor
>              Labels: bug-bash-0.6.0
>
> As a fallout of [PR 956|https://github.com/apache/incubator-hudi/pull/956] we would like to understand how Avro behaves with case sensitive column names.
> Couple of action items:
> * Test with different field names just differing in case.
> * *AbstractRealtimeRecordReader* is one of the classes where we are converting Avro Schema field names to lower case, to be able to verify them against column names from Hive. We can consider removing the *lowercase* conversion there if we verify it does not break anything.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HUDI-303) Avro schema case sensitivity testing
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reopened HUDI-303:
[jira] [Commented] (HUDI-303) Avro schema case sensitivity testing
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216728#comment-17216728 ] liwei commented on HUDI-303: I do not think this should be fixed, because Hive metastore column names are case insensitive. If we do not *lowercase* the Avro field names, they will not match the Hive meta schema — for example, the columns from hive_metastoreConstants.META_TABLE_COLUMNS are case insensitive:

{code}
Map<String, Schema.Field> schemaFieldsMap = HoodieRealtimeRecordReaderUtils.getNameToFieldMap(writerSchema);
hiveSchema = constructHiveOrderedSchema(writerSchema, schemaFieldsMap);
// Get all column names of hive table
String hiveColumnString = jobConf.get(hive_metastoreConstants.META_TABLE_COLUMNS);
LOG.info("Hive Columns : " + hiveColumnString);
String[] hiveColumns = hiveColumnString.split(",");
List<Schema.Field> hiveSchemaFields = new ArrayList<>();
for (String columnName : hiveColumns) {
  Field field = schemaFieldsMap.get(columnName.toLowerCase());
  if (field != null) {
    hiveSchemaFields.add(new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal()));
  } else {
    // Hive has some extra virtual columns like BLOCK__OFFSET__INSIDE__FILE which do not exist in table schema.
    // They will get skipped as they won't be found in the original schema.
    LOG.debug("Skipping Hive Column => " + columnName);
  }
}
{code}
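The lowercase lookup in the snippet above can be demonstrated outside Hudi. A minimal Python sketch (illustrative only, not Hudi code) of why Avro field names that differ only in case are a problem once Hive's case-insensitive column names enter the picture:

```python
# Avro field names are case sensitive, so "userId" and "userid" are two
# distinct, legal fields. Hive metastore column names are effectively
# lowercase, so the reader must look fields up by lowercased name --
# and two fields differing only in case then collide in the lookup map.
avro_fields = ["userId", "userid"]

name_to_field = {}
for f in avro_fields:
    name_to_field[f.lower()] = f  # the later entry silently overwrites the earlier one

hive_columns = ["userid"]  # what Hive reports
resolved = [name_to_field.get(c.lower()) for c in hive_columns]
print(resolved)  # ['userid'] -- the 'userId' field is shadowed
```

This is the scenario the JIRA's first action item ("field names just differing in case") is meant to exercise.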
[jira] [Assigned] (HUDI-303) Avro schema case sensitivity testing
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reassigned HUDI-303: Assignee: liwei (was: Udit Mehrotra)
[jira] [Resolved] (HUDI-303) Avro schema case sensitivity testing
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei resolved HUDI-303. Resolution: Fixed
[GitHub] [hudi] lw309637554 commented on pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution
lw309637554 commented on pull request #2127: URL: https://github.com/apache/hudi/pull/2127#issuecomment-712104176 @pratyakshsharma @xushiyan @vinothchandar Hello, is there anything that needs to be fixed?
[GitHub] [hudi] liujinhui1994 commented on issue #2162: [SUPPORT] Deltastreamer transform cannot add fields
liujinhui1994 commented on issue #2162: URL: https://github.com/apache/hudi/issues/2162#issuecomment-711759479 Sorry for the late reply. The change has passed verification in our production environment, and I am currently writing unit tests.
[GitHub] [hudi] liujinhui1994 commented on issue #2162: [SUPPORT] Deltastreamer transform cannot add fields
liujinhui1994 commented on issue #2162: URL: https://github.com/apache/hudi/issues/2162#issuecomment-711757869 Can work, but if the default value is not null, it will not work:

{
  "name": "adnetDesc",
  "type": ["null", "long"],
  "default": -1
}

@bvaradar
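The behavior described in this comment matches the Avro specification: a union field's default value must conform to the first branch of the union, so with `["null", "long"]` the only legal default is null. A small stdlib-only Python sketch of that rule (an editor's illustration, not the actual Avro validator):

```python
import json

def union_default_ok(field):
    """Check the Avro rule that a union field's default must match the
    FIRST branch of the union. Only the branches needed for this
    example ("null", "long") are handled in this sketch."""
    types = field["type"]
    if not isinstance(types, list) or "default" not in field:
        return True  # not a union with a default; nothing to check
    first, default = types[0], field["default"]
    if first == "null":
        return default is None
    if first == "long":
        return isinstance(default, int)
    return True  # other branch types omitted

ok = json.loads('{"name": "adnetDesc", "type": ["null", "long"], "default": null}')
bad = json.loads('{"name": "adnetDesc", "type": ["null", "long"], "default": -1}')
print(union_default_ok(ok))   # True
print(union_default_ok(bad))  # False
```

To keep `-1` as the default, the union would have to be written as `["long", "null"]` instead, putting the matching branch first.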
[GitHub] [hudi] KarthickAN commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced
KarthickAN commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-711645166 @nsivabalan @vinothchandar Thank you so much for all the explanations. Thinking about it, 10MB of index data may not be an issue as long as the file contains a considerable number of records. In my case there was a scenario where I had only 1000 records but still 10MB of index, so I switched to the dynamic bloom filter, which is really helpful here. We are dealing with two different types of data, one of which doesn't have much volume; that is where the overhead stood out, whereas for the other type, where we do have a good volume of data, it wasn't an issue since we'd already have around 110-120MB of data plus index. As of now I've configured it like below:

IndexBloomNumEntries = 35000
BloomIndexFilterType = DYNAMIC_V0
BloomIndexFilterDynamicMaxEntries = 1400000

starting off with 35k (1% of the max number of entries in a file) as a base and scaling out to 1.4M (40% of the max number of entries) as the file grows. That should solve the problem. We still need to test this with the volume we are seeing right now and tune further if required. @vinothchandar Yes, having a blog around this would definitely be very helpful. I feel Hudi has a lot of features that could be used efficiently with some more in-depth explanation than the current documentation provides.
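For anyone tuning these settings, the space cost of a static bloom filter follows from the standard sizing formula m = -n * ln(p) / (ln 2)^2 bits for n entries at false-positive probability p: a filter sized for the configured maximum number of entries costs the same whether the file holds 1,000 records or millions, which is exactly why a dynamic filter helps for low-volume data. A back-of-envelope Python sketch (illustrative numbers only; `fpp` is an assumed false-positive rate, and this is not Hudi's exact on-disk layout):

```python
import math

def bloom_filter_bytes(num_entries: int, fpp: float) -> int:
    """Bytes needed by a standard bloom filter for num_entries at the
    given false-positive probability: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_entries * math.log(fpp) / (math.log(2) ** 2)
    return math.ceil(bits / 8)

fpp = 1e-9  # assumed false-positive rate, for illustration
# Sized for the 35k base vs. the 1.4M dynamic ceiling discussed above:
small = bloom_filter_bytes(35_000, fpp)
large = bloom_filter_bytes(1_400_000, fpp)
print(f"35k entries : {small / 2**10:.0f} KiB")
print(f"1.4M entries: {large / 2**20:.1f} MiB")
# A static filter pays the full size up front even for a tiny file;
# a dynamic filter starts near the small size and grows with the file.
```

Since size scales linearly with the entry count, the 40x gap between the base (35k) and ceiling (1.4M) settings above is also the gap in wasted space a static filter would impose on a near-empty file.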