[jira] [Closed] (HUDI-358) Add Java-doc and importOrder checkstyle rule
[ https://issues.apache.org/jira/browse/HUDI-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-358. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via master: 212282c8aaf623f451e3f72674ed4d3ed550602d > Add Java-doc and importOrder checkstyle rule > > > Key: HUDI-358 > URL: https://issues.apache.org/jira/browse/HUDI-358 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Common Core >Reporter: lamber-ken >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > 1, Add Java-doc and importOrder checkstyle rules. > 2, Keep severity at info level until the issue is finished. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-359) Add hudi-env for hudi-cli module
[ https://issues.apache.org/jira/browse/HUDI-359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-359. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via master: a7e07cd910425b5cfe9886677e780bfb2ae96c52 > Add hudi-env for hudi-cli module > > > Key: HUDI-359 > URL: https://issues.apache.org/jira/browse/HUDI-359 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: CLI >Reporter: hong dongdong >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Add hudi-env.sh for hudi-cli module to set running environments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-362) Adds a check for the existence of field
[ https://issues.apache.org/jira/browse/HUDI-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-362. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via master: 44823041a37601fed8163502272a8fcb7a5be45d > Adds a check for the existence of field > --- > > Key: HUDI-362 > URL: https://issues.apache.org/jira/browse/HUDI-362 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: CLI >Reporter: hong dongdong >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Attachments: image-2019-11-25-15-32-14-057.png, > image-2019-11-25-15-33-21-610.png > > Time Spent: 20m > Remaining Estimate: 0h > > Use command > {code:java} > commits show --sortBy "Total Bytes Written" --desc true --limit 10{code} > When the sortBy field is not among the columns, it throws > !image-2019-11-25-15-32-14-057.png! > It is better to give a friendly hint such as: !image-2019-11-25-15-33-21-610.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
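The check the issue asks for amounts to validating the `--sortBy` argument against the table's known columns before sorting, so the user gets a hint instead of a raw exception. A minimal sketch, with a hypothetical helper class (the actual Hudi CLI code is structured differently):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: validate a --sortBy field against known columns
// before sorting, returning a friendly hint when the field is unknown.
public class SortFieldCheck {
  // Returns null when the field is valid, otherwise a user-facing message.
  static String validateSortField(String sortBy, List<String> columns) {
    if (sortBy != null && !sortBy.isEmpty() && !columns.contains(sortBy)) {
      return "Column '" + sortBy + "' not found; valid columns are " + columns;
    }
    return null;
  }
}
```

The CLI command would call this before building the sorted view and print the returned message instead of letting the sort throw.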
[jira] [Commented] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
[ https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984368#comment-16984368 ] leesf commented on HUDI-288: Thanks for your sharing. Looks very comprehensive. I have some thoughts. Regarding point 6, the target path was designed to _//_, as discussed above with vinoth, is it reasonable to _ `/`_ ? Regarding point 7, would we get rid of oozie, as introducing it to hudi might not be very reasonable? And are there any other considerations for not supporting continuous mode currently? Also, the wrapper seems to be able to replace the current DeltaStreamer? > Add support for ingesting multiple kafka streams in a single DeltaStreamer > deployment > - > > Key: HUDI-288 > URL: https://issues.apache.org/jira/browse/HUDI-288 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: deltastreamer >Reporter: Vinoth Chandar >Assignee: leesf >Priority: Major > > https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@ > has all the context -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-209) Implement JMX metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-209. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via master: 0b52ae3ac2685c5afa7821d663854b526b5a1cff > Implement JMX metrics reporter > -- > > Key: HUDI-209 > URL: https://issues.apache.org/jira/browse/HUDI-209 > Project: Apache Hudi (incubating) > Issue Type: New Feature >Reporter: vinoyang >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, there are only two reporters, {{MetricsGraphiteReporter}} and > {{InMemoryMetricsReporter}}, and {{InMemoryMetricsReporter}} is used for testing, > so we effectively have only one metrics reporter. Since JMX is a standard for > monitoring on the JVM platform, I propose to provide a JMX metrics reporter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
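At its core, a JMX metrics reporter exposes metric values as MBeans on the platform MBean server so any JMX client (e.g. jconsole) can read them. A self-contained sketch using only the JDK — the class names here are illustrative, not Hudi's actual reporter classes:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative sketch of what a JMX reporter boils down to: registering a
// metric as a standard MBean. Names are hypothetical, not Hudi's reporter API.
public class JmxSketch {
  // Standard MBean naming convention: interface <Impl>MBean next to Impl.
  public interface CounterMBean { long getValue(); }

  public static class Counter implements CounterMBean {
    private volatile long value;
    public void inc() { value++; }
    @Override public long getValue() { return value; }
  }

  // Registers a counter under an assumed "hudi.metrics" JMX domain.
  public static Counter register(String name) {
    try {
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      Counter counter = new Counter();
      server.registerMBean(counter, new ObjectName("hudi.metrics:name=" + name));
      return counter;
    } catch (Exception e) {
      throw new RuntimeException("Failed to register MBean " + name, e);
    }
  }
}
```

Once registered, the counter's `Value` attribute is readable from any JMX console attached to the JVM.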
[jira] [Closed] (HUDI-277) Translate Documentation -> Performance page
[ https://issues.apache.org/jira/browse/HUDI-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-277. -- Resolution: Fixed Fixed via asf-site: 747a1d4e21dd7900085b8cc0f695daa147727241 > Translate Documentation -> Performance page > --- > > Key: HUDI-277 > URL: https://issues.apache.org/jira/browse/HUDI-277 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: docs-chinese >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Translate this page into Chinese: > > [http://hudi.apache.org/performance.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-255) Translate Talks & Powered By page
[ https://issues.apache.org/jira/browse/HUDI-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-255. -- Resolution: Fixed Fixed via asf-site: 8cfe93700bba8cb3025babe9182e9ad63a7e1035 > Translate Talks & Powered By page > - > > Key: HUDI-255 > URL: https://issues.apache.org/jira/browse/HUDI-255 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: docs-chinese >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > The online HTML web page: [https://hudi.apache.org/powered_by.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-232) Implement sealing/unsealing for HoodieRecord class
[ https://issues.apache.org/jira/browse/HUDI-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937477#comment-16937477 ] leesf commented on HUDI-232: How about adding seal and unseal methods to HoodieRecord? An error would be thrown if HoodieRecord is modified after being sealed, and modification would be allowed after it is unsealed. cc [~vinoth] > Implement sealing/unsealing for HoodieRecord class > -- > > Key: HUDI-232 > URL: https://issues.apache.org/jira/browse/HUDI-232 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Write Client >Affects Versions: 0.5.0 >Reporter: Vinoth Chandar >Priority: Major > > The HoodieRecord class sometimes is modified to set the record location. We can > get into issues like HUDI-170 if the modification is misplaced. We need a > mechanism to seal the class and unseal it for modification explicitly. Trying to > modify it in the sealed state should throw an error -- This message was sent by Atlassian Jira (v8.3.4#803005)
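The seal/unseal proposal above can be sketched as a guard flag checked in every mutator. This is a hedged sketch with assumed field and method names, not the actual HoodieRecord implementation:

```java
// Minimal sketch of the proposed seal/unseal mechanism: mutations are
// rejected while the record is sealed. Field/method names are assumptions.
public class SealableRecord {
  private boolean sealed = false;
  private String currentLocation;

  public void seal()   { this.sealed = true; }
  public void unseal() { this.sealed = false; }

  public void setCurrentLocation(String location) {
    if (sealed) {
      throw new UnsupportedOperationException(
          "Record is sealed; call unseal() before modifying it");
    }
    this.currentLocation = location;
  }

  public String getCurrentLocation() { return currentLocation; }
}
```

A misplaced modification (like the one behind HUDI-170) would then fail fast with a clear error instead of silently corrupting state.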
[jira] [Created] (HUDI-278) Translate Administering page
leesf created HUDI-278: -- Summary: Translate Administering page Key: HUDI-278 URL: https://issues.apache.org/jira/browse/HUDI-278 Project: Apache Hudi (incubating) Issue Type: Sub-task Components: docs-chinese Reporter: leesf Assignee: leesf Fix For: 0.5.1 The online HTML web page: [http://hudi.apache.org/admin_guide.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-296) Explore use of spotless to auto fix formatting errors
[ https://issues.apache.org/jira/browse/HUDI-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-296: -- Assignee: leesf > Explore use of spotless to auto fix formatting errors > -- > > Key: HUDI-296 > URL: https://issues.apache.org/jira/browse/HUDI-296 > Project: Apache Hudi (incubating) > Issue Type: Test >Reporter: Vinoth Chandar >Assignee: leesf >Priority: Major > > https://github.com/diffplug/spotless/tree/master/plugin-maven -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-292) Consume more entries from kafka than specified sourceLimit.
[ https://issues.apache.org/jira/browse/HUDI-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946383#comment-16946383 ] leesf commented on HUDI-292: Using `long toOffset = Math.min(toOffsetMax, range.untilOffset() + eventsPerPartition);` to compute the offset is fine, but we should handle the case in which remainingEvents is less than `toOffset - range.untilOffset()`. Also, it may not matter much even if we consume more entries from some of the partitions, but we had better fix it. I would like to open a PR to fix it. CC [~vinoth] > Consume more entries from kafka than specified sourceLimit. > --- > > Key: HUDI-292 > URL: https://issues.apache.org/jira/browse/HUDI-292 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Utilities >Reporter: leesf >Assignee: leesf >Priority: Major > Fix For: 0.5.1 > > > When _CheckpointUtils#computeOffsetRanges_ is used for consuming kafka messages. > Given > topic = "test", > fromOffsets(partition -> offset pair) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> > 0), (4 -> 0), > toOffsets = (0, 100), (1, 1000), (2, 1000), (3, 1000), (4, 1000), > numEvents = 1001. > The output of _CheckpointUtils#computeOffsetRanges_ is > OffsetRange(topic: 'test', partition: 0, range: [0 -> 100]) > OffsetRange(topic: 'test', partition: 1, range: [0 -> 226]) > OffsetRange(topic: 'test', partition: 2, range: [0 -> 226]) > OffsetRange(topic: 'test', partition: 3, range: [0 -> 226]) > OffsetRange(topic: 'test', partition: 4, range: [0 -> 226]) > Total count is 1004 (100 + 226 * 4), more than 1001, and thus more entries are > consumed from kafka than the specified 1001. > CC [~vinoth] -- This message was sent by Atlassian Jira (v8.3.4#803005)
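The over-consumption above happens because the shortfall of the small partition (p0 only has 100 events) is redistributed to the others without checking the remaining event budget. A hedged sketch of the capped allocation the comment proposes — class and method names are hypothetical, not the actual CheckpointUtils code:

```java
// Sketch: distribute numEvents across partitions, never exceeding either a
// partition's available events or the overall budget. Names are assumptions.
public class OffsetRanges {
  // available[i] = toOffset - fromOffset for partition i; the returned
  // allocation sums to at most numEvents.
  static long[] allocate(long[] available, long numEvents) {
    int n = available.length;
    long[] alloc = new long[n];
    long remaining = numEvents;
    long share = (long) Math.ceil((double) numEvents / n);
    // first pass: an even share per partition, capped by what the partition
    // actually has and by the remaining overall budget
    for (int i = 0; i < n && remaining > 0; i++) {
      alloc[i] = Math.min(Math.min(share, available[i]), remaining);
      remaining -= alloc[i];
    }
    // second pass: spill leftover budget into partitions with spare events,
    // still bounded by the remaining budget
    for (int i = 0; i < n && remaining > 0; i++) {
      long extra = Math.min(available[i] - alloc[i], remaining);
      alloc[i] += extra;
      remaining -= extra;
    }
    return alloc;
  }
}
```

With the example from the issue (available = {100, 1000, 1000, 1000, 1000}, numEvents = 1001), this assigns exactly 1001 events in total instead of 1004.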
[jira] [Commented] (HUDI-295) Do one-time cleanup of Hudi git history
[ https://issues.apache.org/jira/browse/HUDI-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946399#comment-16946399 ] leesf commented on HUDI-295: In order to clean up the git history, it seems that we need to rebase and force push the master branch. Do others (contributors) have write access to the master branch? If not, I think only committers and PMC members who have access to the master branch could take the ticket and help clean up the git history. > Do one-time cleanup of Hudi git history > --- > > Key: HUDI-295 > URL: https://issues.apache.org/jira/browse/HUDI-295 > Project: Apache Hudi (incubating) > Issue Type: Task > Components: Docs >Reporter: Vinoth Chandar >Priority: Major > > https://lists.apache.org/thread.html/dc6eb516e248088dac1a2b5c9690383dfe2eb3912f76bbe9dd763c2b@ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-300) Explore use of spotbugs to find bugs
[ https://issues.apache.org/jira/browse/HUDI-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-300: --- Description: https://spotbugs.github.io/ https://github.com/apache/incubator-hudi/pull/945 was:https://spotbugs.github.io/ > Explore use of spotbugs to find bugs > > > Key: HUDI-300 > URL: https://issues.apache.org/jira/browse/HUDI-300 > Project: Apache Hudi (incubating) > Issue Type: Test > Reporter: leesf >Assignee: leesf >Priority: Major > > https://spotbugs.github.io/ > https://github.com/apache/incubator-hudi/pull/945 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-265) Failed to delete tmp dirs created in unit tests
[ https://issues.apache.org/jira/browse/HUDI-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-265. -- Resolution: Fixed Fixed via master: 3dedc7e5fdd5f885915e81e47e110b845a905dbf > Failed to delete tmp dirs created in unit tests > --- > > Key: HUDI-265 > URL: https://issues.apache.org/jira/browse/HUDI-265 > Project: Apache Hudi (incubating) > Issue Type: Test > Components: Testing >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > In some unit tests, such as TestHoodieSnapshotCopier and TestUpdateMapFunction, > the tmp dir created in _init (with before annotation)_ fails to be deleted after > clean (with after annotation), which causes too many folders to accumulate in /tmp. > We need to delete these dirs after the unit tests finish. > I will go through all the unit tests that do not properly delete the tmp dir > and send a patch. > > cc [~vinoth] [~vbalaji] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-292) Consume more entries from kafka than specified sourceLimit.
leesf created HUDI-292: -- Summary: Consume more entries from kafka than specified sourceLimit. Key: HUDI-292 URL: https://issues.apache.org/jira/browse/HUDI-292 Project: Apache Hudi (incubating) Issue Type: Improvement Components: Utilities Reporter: leesf Assignee: leesf Fix For: 0.5.1 When _CheckpointUtils#computeOffsetRanges_ is used for consuming kafka messages. Given topic = "test", fromOffsets(partition -> offset pair) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> 0), (4 -> 0), toOffsets = (0, 100), (1, 1000), (2, 1000), (3, 1000), (4, 1000), numEvents = 1001. The output of _CheckpointUtils#computeOffsetRanges_ is OffsetRange(topic: 'test', partition: 0, range: [0 -> 100]) OffsetRange(topic: 'test', partition: 1, range: [0 -> 226]) OffsetRange(topic: 'test', partition: 2, range: [0 -> 226]) OffsetRange(topic: 'test', partition: 3, range: [0 -> 226]) OffsetRange(topic: 'test', partition: 4, range: [0 -> 226]) Total count is 1004 (100 + 226 * 4), more than 1001, and thus more entries are consumed from kafka than the specified 1001. CC [~vinoth] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-285) Implement HoodieStorageWriter based on actual file type
[ https://issues.apache.org/jira/browse/HUDI-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-285. -- Resolution: Fixed Fixed via master: 7dd9c74b1bb28c3a934e46d560abbb4c5b6d4586 > Implement HoodieStorageWriter based on actual file type > --- > > Key: HUDI-285 > URL: https://issues.apache.org/jira/browse/HUDI-285 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Write Client >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently the _getStorageWriter_ method in HoodieStorageWriterFactory that gets a > HoodieStorageWriter is hard-coded to HoodieParquetWriter, since currently only > parquet is supported for HoodieStorageWriter. However, it is better to > create the HoodieStorageWriter based on the actual file type for extensibility. > cc [~vinoth] [~vbalaji] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-290) Normalize Test class name of HoodieWriteConfigTest
[ https://issues.apache.org/jira/browse/HUDI-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943558#comment-16943558 ] leesf commented on HUDI-290: +1 rename to TestHoodieWriteConfig, and I see many UT names already start with Test... Also please check other UT names not starting with Test in the project. Thanks. > Normalize Test class name of HoodieWriteConfigTest > -- > > Key: HUDI-290 > URL: https://issues.apache.org/jira/browse/HUDI-290 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Testing >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > > In general, a test class name starts with {{Test}}. It would be better to > rename {{HoodieWriteConfigTest}} to {{TestHoodieWriteConfig}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
[ https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-288: -- Assignee: leesf > Add support for ingesting multiple kafka streams in a single DeltaStreamer > deployment > - > > Key: HUDI-288 > URL: https://issues.apache.org/jira/browse/HUDI-288 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: deltastreamer >Reporter: Vinoth Chandar >Assignee: leesf >Priority: Major > > https://lists.apache.org/thread.html/3a69934657c48b1c0d85cba223d69cb18e18cd8aaa4817c9fd72cef6@ > has all the context -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HUDI-290) Normalize Test class name of HoodieWriteConfigTest
[ https://issues.apache.org/jira/browse/HUDI-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943558#comment-16943558 ] leesf edited comment on HUDI-290 at 10/3/19 2:53 PM: - +1 rename to TestHoodieWriteConfig, and I see many UT names already start with Test... Also please check other UT names not starting with Test in the project. Thanks. was (Author: xleesf): +1 rename to TestHoodieWriteConfig, and i see many UTs start already start with Test... Also please check other UTs name not started with Test in the project. Thanks. > Normalize Test class name of HoodieWriteConfigTest > -- > > Key: HUDI-290 > URL: https://issues.apache.org/jira/browse/HUDI-290 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Testing >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > > In general, a test class name starts with {{Test}}. It would be better to > rename {{HoodieWriteConfigTest}} to {{TestHoodieWriteConfig}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-292) Consume more entries from kafka than specified sourceLimit.
[ https://issues.apache.org/jira/browse/HUDI-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-292. -- Resolution: Fixed Fixed via master: e10e06918e4758917513c55f9bc02c35dad99128 > Consume more entries from kafka than specified sourceLimit. > --- > > Key: HUDI-292 > URL: https://issues.apache.org/jira/browse/HUDI-292 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Utilities >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > When _CheckpointUtils#computeOffsetRanges_ is used for consuming kafka messages. > Given > topic = "test", > fromOffsets(partition -> offset pair) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> > 0), (4 -> 0), > toOffsets = (0, 100), (1, 1000), (2, 1000), (3, 1000), (4, 1000), > numEvents = 1001. > The output of _CheckpointUtils#computeOffsetRanges_ is > OffsetRange(topic: 'test', partition: 0, range: [0 -> 100]) > OffsetRange(topic: 'test', partition: 1, range: [0 -> 226]) > OffsetRange(topic: 'test', partition: 2, range: [0 -> 226]) > OffsetRange(topic: 'test', partition: 3, range: [0 -> 226]) > OffsetRange(topic: 'test', partition: 4, range: [0 -> 226]) > Total count is 1004 (100 + 226 * 4), more than 1001, and thus more entries are > consumed from kafka than the specified 1001. > CC [~vinoth] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-437) Support user-defined index
leesf created HUDI-437: -- Summary: Support user-defined index Key: HUDI-437 URL: https://issues.apache.org/jira/browse/HUDI-437 Project: Apache Hudi (incubating) Issue Type: Improvement Reporter: leesf Assignee: leesf Fix For: 0.5.2 Currently, Hudi does not support a user-defined index, and it throws an exception if an index type other than HBASE/INMEMORY/BLOOM/GLOBAL_BLOOM is configured -- This message was sent by Atlassian Jira (v8.3.4#803005)
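A common way to support a user-defined index is to fall back to reflective instantiation when the configured type is not one of the built-in enum values. This is a hypothetical sketch of that pattern, not Hudi's eventual implementation:

```java
// Hypothetical sketch: instantiate a user-supplied index class by name when
// the configured index type is not a built-in one. Names are assumptions.
public class IndexFactory {
  static Object createIndex(String indexClassName) {
    try {
      return Class.forName(indexClassName).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException(
          "Could not instantiate user-defined index " + indexClassName, e);
    }
  }
}
```

The factory would try the built-in types first and only reach this reflective path for an unrecognized, fully-qualified class name.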
[jira] [Closed] (HUDI-386) Refactor hudi scala checkstyle rules
[ https://issues.apache.org/jira/browse/HUDI-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-386. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via master: b284091783af44341f20af11825ea9b6e3ba23da > Refactor hudi scala checkstyle rules > > > Key: HUDI-386 > URL: https://issues.apache.org/jira/browse/HUDI-386 > Project: Apache Hudi (incubating) > Issue Type: Sub-task >Reporter: lamber-ken >Assignee: lamber-ken >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Refactor hudi scala checkstyle rules -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-415) HoodieSparkSqlWriter Commit time not representing the Spark job starting time
[ https://issues.apache.org/jira/browse/HUDI-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-415: --- Fix Version/s: 0.5.1 > HoodieSparkSqlWriter Commit time not representing the Spark job starting time > - > > Key: HUDI-415 > URL: https://issues.apache.org/jira/browse/HUDI-415 > Project: Apache Hudi (incubating) > Issue Type: Bug >Reporter: Yanjia Gary Li >Assignee: Yanjia Gary Li >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 10m > Remaining Estimate: 0h > > Hudi records the commit time after the first action completes. If there is a > heavy transformation before isEmpty(), then the commit time could be > inaccurate. > {code:java} > if (hoodieRecords.isEmpty()) { > log.info("new batch has no new records, skipping...") > return (true, common.util.Option.empty()) > } > commitTime = client.startCommit() > writeStatuses = DataSourceUtils.doWriteOperation(client, hoodieRecords, > commitTime, operation) > {code} > For example, I start the spark job at 20190101, but *isEmpty()* ran for 2 > hours, so the commit time in the .hoodie folder will be 201901010200. If > I use that commit time to ingest data starting from 201901010200 (from HDFS, > not using deltastreamer), then I will miss 2 hours of data. > Is this setup intended? Can we move the commit time before isEmpty()? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-93) Enforce semantics on HoodieRecordPayload to allow for a consistent instantiation of custom payloads via reflection
[ https://issues.apache.org/jira/browse/HUDI-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-93: - Assignee: leesf > Enforce semantics on HoodieRecordPayload to allow for a consistent > instantiation of custom payloads via reflection > -- > > Key: HUDI-93 > URL: https://issues.apache.org/jira/browse/HUDI-93 > Project: Apache Hudi (incubating) > Issue Type: New Feature > Components: Common Core >Reporter: Nishith Agarwal >Assignee: leesf >Priority: Major > > At the moment, the expectation is that any implementation of > HoodieRecordPayload needs to have a constructor with Optional. > But this is not enforced in the HoodieRecordPayload interface. We require a > method to enforce a semantic that works consistently. -- This message was sent by Atlassian Jira (v8.3.4#803005)
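The enforcement the issue asks for can be approximated with a reflective check that a payload class exposes the expected constructor before it is ever instantiated. A sketch under the assumption (stated in the issue) that the convention is a single `Optional`-typed constructor argument; the helper and demo class names are hypothetical:

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

// Hypothetical sketch: verify reflectively that a payload class has the
// conventional Optional-argument constructor before using it.
public class PayloadContract {
  static boolean hasRequiredConstructor(Class<?> clazz) {
    try {
      Constructor<?> ctor = clazz.getConstructor(Optional.class);
      return ctor != null;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  // A conforming payload, for demonstration only.
  public static class DemoPayload {
    public DemoPayload(Optional<String> record) { }
  }
}
```

Running this check once at configuration time turns a late reflection failure into an early, explicit error message.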
[jira] [Assigned] (HUDI-248) CLI doesn't allow rolling back a Delta commit
[ https://issues.apache.org/jira/browse/HUDI-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-248: -- Assignee: leesf > CLI doesn't allow rolling back a Delta commit > - > > Key: HUDI-248 > URL: https://issues.apache.org/jira/browse/HUDI-248 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: CLI, Usability >Reporter: Rahul Bhartia >Assignee: leesf >Priority: Minor > Labels: aws-emr > Fix For: 0.5.1 > > > [https://github.com/apache/incubator-hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L128] > > When trying to find a match for the passed-in commit value, the "commit rollback" > command always defaults to using HoodieTimeline.COMMIT_ACTION - and hence > doesn't allow rolling back delta commits. > Note: Delta Commits can be rolled back using a HoodieWriteClient, so it seems > like it's just a matter of matching against both COMMIT_ACTION and > DELTA_COMMIT_ACTION in the CLI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
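The suggested fix boils down to matching the rollback target against both action types instead of only regular commits. A sketch with local constants standing in for the `HoodieTimeline` ones (treat the exact string values as assumptions):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the fix: a rollback target is eligible if its timeline action is
// either a regular commit or a delta commit. Constants mirror what
// HoodieTimeline is described to hold; values here are assumptions.
public class RollbackMatcher {
  static final String COMMIT_ACTION = "commit";
  static final String DELTA_COMMIT_ACTION = "deltacommit";
  static final List<String> ROLLBACK_ELIGIBLE =
      Arrays.asList(COMMIT_ACTION, DELTA_COMMIT_ACTION);

  static boolean canRollback(String action) {
    return ROLLBACK_ELIGIBLE.contains(action);
  }
}
```

The CLI command would then locate the instant by timestamp among all eligible actions rather than filtering to `COMMIT_ACTION` alone.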
[jira] [Closed] (HUDI-211) Maintain Chinese docs for Hudi
[ https://issues.apache.org/jira/browse/HUDI-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-211. -- Fix Version/s: 0.5.1 Resolution: Fixed > Maintain Chinese docs for Hudi > -- > > Key: HUDI-211 > URL: https://issues.apache.org/jira/browse/HUDI-211 > Project: Apache Hudi (incubating) > Issue Type: Task > Components: docs-chinese >Reporter: vinoyang >Priority: Major > Fix For: 0.5.1 > > > All the translation of docs should be held under this umbrella issue. The > best practice would be *one doc page one subtask*. Before releasing a new > version, we will align with the English docs. > The doc and website are held in the {{asf-site}} branch; more details please see: > [https://hudi.apache.org/contributing.html#website] > The Chinese docs are supported by the jekyll-multiple-languages plugin. More details > about this plugin, please see: [http://jekyll-langs-sample.liaohuqiu.net/] > Generally speaking, two basic steps: > * create a subtask issue of this umbrella issue for the page you want to > translate; > * copy the English markdown page and rename it to {{*.cn.md}}, then translate > it to Chinese -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-333) Improve page navigation using TOC
[ https://issues.apache.org/jira/browse/HUDI-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-333. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via asf-site: 1fd0439a84c65ae12e025a64b6e0a0087aa7295e > Improve page navigation using TOC > - > > Key: HUDI-333 > URL: https://issues.apache.org/jira/browse/HUDI-333 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Docs >Reporter: Bhavani Sudha Saktheeswaran >Assignee: Bhavani Sudha Saktheeswaran >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Add Table of Contents to all pages. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-380) Update IDE set up documentation for IDE related errors
[ https://issues.apache.org/jira/browse/HUDI-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-380. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via asf-site: 9e30add249bdadb6b94cf0ff0090c4eaac625d68 > Update IDE set up documentation for IDE related errors > -- > > Key: HUDI-380 > URL: https://issues.apache.org/jira/browse/HUDI-380 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Docs, newbie, Usability >Reporter: Pratyaksh Sharma >Assignee: Pratyaksh Sharma >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > The errors are generally caused by jetty version conflicts. > > Sample issues -> > [https://github.com/apache/incubator-hudi/issues/894] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-67) Tool to convert sequence file based archived commits to log format #224
[ https://issues.apache.org/jira/browse/HUDI-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-67: - Assignee: leesf > Tool to convert sequence file based archived commits to log format #224 > --- > > Key: HUDI-67 > URL: https://issues.apache.org/jira/browse/HUDI-67 > Project: Apache Hudi (incubating) > Issue Type: Wish > Components: CLI, Write Client >Reporter: Vinoth Chandar >Assignee: leesf >Priority: Major > > https://github.com/uber/hudi/issues/224 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-213) Add gem dependencies installation step for building doc description
[ https://issues.apache.org/jira/browse/HUDI-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-213. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via asf-site: 09525c1fb4e066e981047957e85bbefbd3b3ae91 > Add gem dependencies installation step for building doc description > --- > > Key: HUDI-213 > URL: https://issues.apache.org/jira/browse/HUDI-213 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Docs >Reporter: vinoyang >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > In the asf-site branch, following the building doc steps > [here|https://github.com/apache/incubator-hudi/tree/asf-site/docs#host-os] > under the "docs" folder, if we just invoke this command: > {code:java} > bundle exec jekyll serve > {code} > We will get an error: > {code:java} > Could not find concurrent-ruby-1.1.4 in any of the sources > Run `bundle install` to install missing gems. > {code} > The reason is that we do not install the gem dependencies. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-427) Implement CLI support for performing bootstrap
[ https://issues.apache.org/jira/browse/HUDI-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-427: -- Assignee: leesf > Implement CLI support for performing bootstrap > -- > > Key: HUDI-427 > URL: https://issues.apache.org/jira/browse/HUDI-427 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: CLI >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Major > Fix For: 0.5.1 > > > Need CLI to perform bootstrap as described in > [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-416) Improve hint information for Cli
[ https://issues.apache.org/jira/browse/HUDI-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-416. -- Fix Version/s: 0.5.1 Resolution: Fixed Fixed via master: 8affdf8bcbb4c7b236283e97c3afad186d5b6a3e > Improve hint information for Cli > > > Key: HUDI-416 > URL: https://issues.apache.org/jira/browse/HUDI-416 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: CLI >Reporter: hong dongdong >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Right now, the cli always gives the error information: > {code:java} > Command 'desc' was found but is not currently available (type 'help' then > ENTER to learn about this command) > {code} > but it is confusing to the user. We can give a clear hint like: > {code:java} > Command failed java.lang.NullPointerException: There is no hudi dataset. > Please use connect command to set dataset first > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-585) Optimize the steps of building with scala-2.12
[ https://issues.apache.org/jira/browse/HUDI-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-585. Fix Version/s: 0.5.2 Resolution: Fixed Fixed via master: 425e3e6c78b9be00fc3fecfc335c94e05a1c70e5 > Optimize the steps of building with scala-2.12 > --- > > Key: HUDI-585 > URL: https://issues.apache.org/jira/browse/HUDI-585 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Utilities >Reporter: lamber-ken >Assignee: lamber-ken >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Optimize the steps of building with scala-2.12. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-596) KafkaConsumer need to be closed
[ https://issues.apache.org/jira/browse/HUDI-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-596. Fix Version/s: 0.5.2 Resolution: Fixed Fixed via master: 347e297ac19ed55172e84e13075e19ce060954c6 > KafkaConsumer need to be closed > --- > > Key: HUDI-596 > URL: https://issues.apache.org/jira/browse/HUDI-596 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Utilities >Reporter: dengziming >Assignee: dengziming >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > `offsetGen.getNextOffsetRanges` is called periodically in the DeltaStreamer > application, and it creates a `new KafkaConsumer(kafkaParams)` without closing it, so > an exception is thrown after a while. > ``` > java.net.SocketException: Too many open files > at sun.nio.ch.Net.socket0(Native Method) > at sun.nio.ch.Net.socket(Net.java:411) > at sun.nio.ch.Net.socket(Net.java:404) > at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:105) > at > sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60) > at java.nio.channels.SocketChannel.open(SocketChannel.java:145) > at org.apache.kafka.common.network.Selector.connect(Selector.java:211) > at > org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864) > at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:485) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218) > at > org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274) > at > 
org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1774) > at > org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1742) > at > org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.getNextOffsetRanges(KafkaOffsetGen.java:177) > at > org.apache.hudi.utilities.sources.JsonKafkaSource.fetchNewData(JsonKafkaSource.java:56) > at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73) > at > org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:107) > at > org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:288) > at > org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226) > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-596) KafkaConsumer need to be closed
[ https://issues.apache.org/jira/browse/HUDI-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-596: --- Status: Open (was: New) > KafkaConsumer need to be closed > --- > > Key: HUDI-596 > URL: https://issues.apache.org/jira/browse/HUDI-596 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Utilities >Reporter: dengziming >Assignee: dengziming >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > `offsetGen.getNextOffsetRanges` is called periodically in the DeltaStreamer > application, and it creates a `new KafkaConsumer(kafkaParams)` without closing it, so > an exception is thrown after a while. > ``` > java.net.SocketException: Too many open files > at sun.nio.ch.Net.socket0(Native Method) > at sun.nio.ch.Net.socket(Net.java:411) > at sun.nio.ch.Net.socket(Net.java:404) > at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:105) > at > sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60) > at java.nio.channels.SocketChannel.open(SocketChannel.java:145) > at org.apache.kafka.common.network.Selector.connect(Selector.java:211) > at > org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864) > at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:485) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218) > at > org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274) > at > org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1774) > at > 
org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1742) > at > org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.getNextOffsetRanges(KafkaOffsetGen.java:177) > at > org.apache.hudi.utilities.sources.JsonKafkaSource.fetchNewData(JsonKafkaSource.java:56) > at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73) > at > org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:107) > at > org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:288) > at > org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226) > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)
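The leak reported above follows a common pattern: a closeable resource created on every periodic call but never released, until the process hits its file-descriptor limit. A minimal sketch of the fix pattern using try-with-resources — `LeakyConsumer` here is a hypothetical stand-in for `KafkaConsumer`, not Hudi's or Kafka's actual code:

```java
// Sketch of the resource-leak fix: every periodically created consumer must
// be closed, otherwise sockets accumulate until "Too many open files".
class ConsumerCloseDemo {
    // Stand-in for KafkaConsumer, which also implements AutoCloseable.
    static class LeakyConsumer implements AutoCloseable {
        boolean closed = false;
        java.util.List<String> partitionsFor(String topic) {
            return java.util.Arrays.asList(topic + "-0", topic + "-1");
        }
        @Override
        public void close() { closed = true; }
    }

    // The fix: try-with-resources guarantees close() runs on every call,
    // even when partitionsFor() throws.
    static java.util.List<String> getNextOffsetRanges(String topic) {
        try (LeakyConsumer consumer = new LeakyConsumer()) {
            return consumer.partitionsFor(topic);
        }
    }

    // Helper demonstrating that the consumer really is closed after use.
    static boolean consumerClosedAfterUse() {
        LeakyConsumer consumer = new LeakyConsumer();
        try (LeakyConsumer c = consumer) {
            c.partitionsFor("topic");
        }
        return consumer.closed;
    }

    public static void main(String[] args) {
        System.out.println(getNextOffsetRanges("impressions"));
        System.out.println(consumerClosedAfterUse());
    }
}
```

An equivalent alternative is an explicit `finally { consumer.close(); }` block; try-with-resources is simply harder to get wrong.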
[jira] [Resolved] (HUDI-617) Add support for data types convertible to String in TimestampBasedKeyGenerator
[ https://issues.apache.org/jira/browse/HUDI-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-617. Fix Version/s: 0.5.2 Resolution: Fixed Fixed via master: c2b08cdfc9b762801a63fee988f1c24cc17df4ce > Add support for data types convertible to String in TimestampBasedKeyGenerator > -- > > Key: HUDI-617 > URL: https://issues.apache.org/jira/browse/HUDI-617 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Utilities >Reporter: Amit Singh >Priority: Minor > Labels: easyfix, pull-request-available > Fix For: 0.5.2 > > Attachments: test_data.json, test_schema.avsc > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, TimestampBasedKeyGenerator only supports 4 data types for the > partition key. They are Double, Long, Float and String. However, if > `avro.java.string` is not specified in the schema provided, Hudi throws the > following error: > org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for > partition field: org.apache.avro.util.Utf8 > at > org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator.getKey(TimestampBasedKeyGenerator.java:111) > at > org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$f92c188c$1(DeltaSync.java:338) > > It would be better to generalise the support to include data types that > provide a method to convert them to String, such as `Utf8`, since all > these types implement the `CharSequence` interface. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-617) Add support for data types convertible to String in TimestampBasedKeyGenerator
[ https://issues.apache.org/jira/browse/HUDI-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-617: --- Status: Open (was: New) > Add support for data types convertible to String in TimestampBasedKeyGenerator > -- > > Key: HUDI-617 > URL: https://issues.apache.org/jira/browse/HUDI-617 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Utilities >Reporter: Amit Singh >Priority: Minor > Labels: easyfix, pull-request-available > Attachments: test_data.json, test_schema.avsc > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, TimestampBasedKeyGenerator only supports 4 data types for the > partition key. They are Double, Long, Float and String. However, if > `avro.java.string` is not specified in the schema provided, Hudi throws the > following error: > org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for > partition field: org.apache.avro.util.Utf8 > at > org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator.getKey(TimestampBasedKeyGenerator.java:111) > at > org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$f92c188c$1(DeltaSync.java:338) > > It would be better to generalise the support to include data types that > provide a method to convert them to String, such as `Utf8`, since all > these types implement the `CharSequence` interface. -- This message was sent by Atlassian Jira (v8.3.4#803005)
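The generalisation proposed in HUDI-617 is to branch on `CharSequence` rather than `String`, since Avro's `Utf8` implements `CharSequence`. A minimal sketch of the idea under that assumption — `PartitionFieldDemo` and `toEpochMillis` are hypothetical names, not Hudi's actual key-generator code, and `StringBuilder` stands in for `Utf8` so the example stays self-contained:

```java
// Sketch: accept any CharSequence for the partition field and normalise it
// to String before parsing, instead of supporting String alone.
class PartitionFieldDemo {
    static long toEpochMillis(Object partitionVal) {
        if (partitionVal instanceof Double) { return ((Double) partitionVal).longValue(); }
        if (partitionVal instanceof Float)  { return ((Float) partitionVal).longValue(); }
        if (partitionVal instanceof Long)   { return (Long) partitionVal; }
        // Generalised branch: covers String, org.apache.avro.util.Utf8, and
        // any other CharSequence, avoiding the "Unexpected type" exception.
        if (partitionVal instanceof CharSequence) {
            return Long.parseLong(partitionVal.toString());
        }
        throw new IllegalArgumentException(
            "Unexpected type for partition field: " + partitionVal.getClass().getName());
    }

    public static void main(String[] args) {
        // StringBuilder stands in for Utf8: a CharSequence that is not a String.
        System.out.println(toEpochMillis(new StringBuilder("1574297893839")));
    }
}
```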
[jira] [Closed] (HUDI-636) Fix could not get sources warnings while compiling
[ https://issues.apache.org/jira/browse/HUDI-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-636. -- > Fix could not get sources warnings while compiling > --- > > Key: HUDI-636 > URL: https://issues.apache.org/jira/browse/HUDI-636 > Project: Apache Hudi (incubating) > Issue Type: Improvement >Reporter: lamber-ken >Assignee: lamber-ken >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > During the voting process on the rc1 0.5.1-incubating release, Justin pointed out > that the mvn log displays "could not get sources" warnings > > [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E] > > {code:java} > [INFO] --- maven-shade-plugin:3.1.1:shade (default) @ hudi-hadoop-mr-bundle > --- > [INFO] Including org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT in the shaded > jar. > Downloading from aliyun: > http://maven.aliyun.com/nexus/content/groups/public/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from cloudera: > https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from confluent: > https://packages.confluent.io/maven/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from libs-milestone: > https://repo.spring.io/libs-milestone/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from libs-release: > https://repo.spring.io/libs-release/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from apache.snapshots: > https://repository.apache.org/snapshots/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > [WARNING] Could not get sources for > 
org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT:compile > [INFO] Excluding com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7 > from the shaded jar. > [INFO] Excluding com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1 from > the shaded jar. > [INFO] Excluding com.fasterxml.jackson.core:jackson-core:jar:2.6.7 from the > shaded jar. > [INFO] Excluding org.apache.httpcomponents:fluent-hc:jar:4.3.2 from the > shaded jar. > [INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded > jar. > [INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.3.6 from the > shaded jar. > [INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.3.2 from the shaded > jar. > [INFO] Excluding commons-codec:commons-codec:jar:1.6 from the shaded jar. > [INFO] Excluding org.rocksdb:rocksdbjni:jar:5.17.2 from the shaded jar. > [INFO] Including com.esotericsoftware:kryo-shaded:jar:4.0.2 in the shaded jar. > [INFO] Including com.esotericsoftware:minlog:jar:1.3.0 in the shaded jar. > [INFO] Including org.objenesis:objenesis:jar:2.5.1 in the shaded jar. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-636) Fix could not get sources warnings while compiling
[ https://issues.apache.org/jira/browse/HUDI-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-636. Fix Version/s: 0.5.2 Resolution: Fixed Fixed via master: cacd9a33222d28c905891362312545230b6d30b9 > Fix could not get sources warnings while compiling > --- > > Key: HUDI-636 > URL: https://issues.apache.org/jira/browse/HUDI-636 > Project: Apache Hudi (incubating) > Issue Type: Improvement >Reporter: lamber-ken >Assignee: lamber-ken >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > During the voting process on the rc1 0.5.1-incubating release, Justin pointed out > that the mvn log displays "could not get sources" warnings > > [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E] > > {code:java} > [INFO] --- maven-shade-plugin:3.1.1:shade (default) @ hudi-hadoop-mr-bundle > --- > [INFO] Including org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT in the shaded > jar. 
> Downloading from aliyun: > http://maven.aliyun.com/nexus/content/groups/public/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from cloudera: > https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from confluent: > https://packages.confluent.io/maven/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from libs-milestone: > https://repo.spring.io/libs-milestone/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from libs-release: > https://repo.spring.io/libs-release/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > Downloading from apache.snapshots: > https://repository.apache.org/snapshots/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar > [WARNING] Could not get sources for > org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT:compile > [INFO] Excluding com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7 > from the shaded jar. > [INFO] Excluding com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1 from > the shaded jar. > [INFO] Excluding com.fasterxml.jackson.core:jackson-core:jar:2.6.7 from the > shaded jar. > [INFO] Excluding org.apache.httpcomponents:fluent-hc:jar:4.3.2 from the > shaded jar. > [INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded > jar. > [INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.3.6 from the > shaded jar. > [INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.3.2 from the shaded > jar. > [INFO] Excluding commons-codec:commons-codec:jar:1.6 from the shaded jar. > [INFO] Excluding org.rocksdb:rocksdbjni:jar:5.17.2 from the shaded jar. > [INFO] Including com.esotericsoftware:kryo-shaded:jar:4.0.2 in the shaded jar. > [INFO] Including com.esotericsoftware:minlog:jar:1.3.0 in the shaded jar. 
> [INFO] Including org.objenesis:objenesis:jar:2.5.1 in the shaded jar. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-597) Enable incremental pulling from defined partitions
[ https://issues.apache.org/jira/browse/HUDI-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048544#comment-17048544 ] leesf commented on HUDI-597: [~garyli1019] I think we could update the DOC after cutting the 0.5.1 docs and merge it to 0.5.2 docs, FYI: [~bhasudha] > Enable incremental pulling from defined partitions > -- > > Key: HUDI-597 > URL: https://issues.apache.org/jira/browse/HUDI-597 > Project: Apache Hudi (incubating) > Issue Type: New Feature >Reporter: Yanjia Gary Li >Assignee: Yanjia Gary Li >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > For the use case where I only need to pull the incremental part of certain > partitions, I need to do the incremental pulling from the entire dataset > first and then filter in Spark. > If we could use the folder partitions directly as part of the input path, it > would run faster by only loading the relevant parquet files. > Example: > > {code:java} > spark.read.format("org.apache.hudi") > .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY,DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL) > .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000") > .load(path, "year=2020/*/*/*") > > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-627) Publish coverage to codecov.io
[ https://issues.apache.org/jira/browse/HUDI-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-627: --- Fix Version/s: 0.5.2 > Publish coverage to codecov.io > -- > > Key: HUDI-627 > URL: https://issues.apache.org/jira/browse/HUDI-627 > Project: Apache Hudi (incubating) > Issue Type: Sub-task >Reporter: Ramachandran M S >Assignee: Ramachandran M S >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > * Publish the coverage to codecov.io on every build > * Fix code coverage to pick up cross-module testing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-627) Publish coverage to codecov.io
[ https://issues.apache.org/jira/browse/HUDI-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-627. -- Fixed via master: acf359c834bc1d9b9c4ea64d362ea20c7410c70a > Publish coverage to codecov.io > -- > > Key: HUDI-627 > URL: https://issues.apache.org/jira/browse/HUDI-627 > Project: Apache Hudi (incubating) > Issue Type: Sub-task >Reporter: Ramachandran M S >Assignee: Ramachandran M S >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > * Publish the coverage to codecov.io on every build > * Fix code coverage to pick up cross-module testing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-554) Restructure code/packages to move more code back into hudi-writer-common
[ https://issues.apache.org/jira/browse/HUDI-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-554. Resolution: Fixed Fixed via master: 71170fafe77e11ea1a458a38e3395a471d94a047 > Restructure code/packages to move more code back into hudi-writer-common > - > > Key: HUDI-554 > URL: https://issues.apache.org/jira/browse/HUDI-554 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Code Cleanup >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-618) Improve unit test coverage for org.apache.hudi.common.table.view.PriorityBasedFileSystemView
[ https://issues.apache.org/jira/browse/HUDI-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-618: --- Fix Version/s: 0.5.2 > Improve unit test coverage for > org.apache.hudi.common.table.view.PriorityBasedFileSystemView > - > > Key: HUDI-618 > URL: https://issues.apache.org/jira/browse/HUDI-618 > Project: Apache Hudi (incubating) > Issue Type: Sub-task >Reporter: Ramachandran M S >Assignee: Ramachandran M S >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Add unit tests for all methods -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-618) Improve unit test coverage for org.apache.hudi.common.table.view.PriorityBasedFileSystemView
[ https://issues.apache.org/jira/browse/HUDI-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-618. -- > Improve unit test coverage for > org.apache.hudi.common.table.view.PriorityBasedFileSystemView > - > > Key: HUDI-618 > URL: https://issues.apache.org/jira/browse/HUDI-618 > Project: Apache Hudi (incubating) > Issue Type: Sub-task >Reporter: Ramachandran M S >Assignee: Ramachandran M S >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Add unit tests for all methods -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build
[ https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-599: --- Status: Open (was: New) > Update release guide & release scripts due to the change of scala 2.12 build > > > Key: HUDI-599 > URL: https://issues.apache.org/jira/browse/HUDI-599 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Update release guide due to the change of scala 2.12 build, PR link below > [https://github.com/apache/incubator-hudi/pull/1293] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build
[ https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-599. -- > Update release guide & release scripts due to the change of scala 2.12 build > > > Key: HUDI-599 > URL: https://issues.apache.org/jira/browse/HUDI-599 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Update release guide due to the change of scala 2.12 build, PR link below > [https://github.com/apache/incubator-hudi/pull/1293] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build
[ https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-599. Resolution: Fixed Fixed via master: 0cde27e63c2cf9b70f24f0ae6b63fad9259b28d3 and updated the release guide accordingly. > Update release guide & release scripts due to the change of scala 2.12 build > > > Key: HUDI-599 > URL: https://issues.apache.org/jira/browse/HUDI-599 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Update release guide due to the change of scala 2.12 build, PR link below > [https://github.com/apache/incubator-hudi/pull/1293] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-666) sync updated docs to chinese.
leesf created HUDI-666: -- Summary: sync updated docs to chinese. Key: HUDI-666 URL: https://issues.apache.org/jira/browse/HUDI-666 Project: Apache Hudi (incubating) Issue Type: Improvement Components: docs-chinese Reporter: leesf Assignee: vinoyang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-578) Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator
leesf created HUDI-578: -- Summary: Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator Key: HUDI-578 URL: https://issues.apache.org/jira/browse/HUDI-578 Project: Apache Hudi (incubating) Issue Type: Improvement Reporter: leesf Assignee: leesf Fix For: 0.5.2 When using ComplexKeyGenerator with the options below: {code:java} option("hoodie.datasource.write.recordkey.field", "name, age"). option("hoodie.datasource.write.keygenerator.class", ComplexKeyGenerator.class.getName()). option("hoodie.datasource.write.partitionpath.field", "location, age"). {code} and the data is {code:java} "{ \"name\": \"name1\", \"ts\": 1574297893839, \"age\": 15, \"location\": \"latitude\", \"sex\":\"male\"}" {code} the result is incorrect: age = null in the record key and age = default in the partition path. We should trim the partitionPathFields and recordKeyFields in ComplexKeyGenerator -- This message was sent by Atlassian Jira (v8.3.4#803005)
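The bug above comes from the space after each comma: `"name, age"` splits into `"name"` and `" age"`, and the second token no longer matches the record's `age` field. A minimal sketch of the proposed trimming, assuming nothing about ComplexKeyGenerator's internals (`KeyFieldTrimDemo` and `parseFields` are hypothetical names):

```java
// Sketch of the fix: split the comma-separated field config and trim each
// token, so "location, age" and "location,age" resolve to the same columns.
class KeyFieldTrimDemo {
    static java.util.List<String> parseFields(String config) {
        java.util.List<String> fields = new java.util.ArrayList<>();
        for (String token : config.split(",")) {
            fields.add(token.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Without trimming, the second field is looked up as " age", missing
        // the record's "age" column and producing null/default key parts.
        System.out.println(parseFields("location, age"));
    }
}
```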
[jira] [Resolved] (HUDI-587) Jacoco coverage report is not generated
[ https://issues.apache.org/jira/browse/HUDI-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-587. Fix Version/s: 0.5.2 Resolution: Fixed Fixed via master: d26dc0b229043afa5aefca239e72f40d80446917 > Jacoco coverage report is not generated > --- > > Key: HUDI-587 > URL: https://issues.apache.org/jira/browse/HUDI-587 > Project: Apache Hudi (incubating) > Issue Type: Bug >Reporter: Prashant Wason >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Original Estimate: 1h > Time Spent: 20m > Remaining Estimate: 40m > > When running tests, the jacoco coverage report is not generated. The jacoco > plugin is loaded and it sets the correct Java Agent line, but it fails to find > the execution data file after tests complete. > Example: > mvn test -Dtest=TestHoodieActiveTimeline > ... > 22:42:40 [INFO] — jacoco-maven-plugin:0.7.8:prepare-agent (pre-unit-test) @ > hudi-common — > 22:42:40 [INFO] *surefireArgLine set to > -javaagent:/home/pwason/.m2/repository/org/jacoco/org.jacoco.agent/0.7.8/org.jacoco.agent-0.7.8-runtime.jar=destfile=/home/pwason/work/java/incubator-hudi/hudi-common/target/coverage-reports/jacocout.exec* > *...* > 22:42:49 [INFO] — jacoco-maven-plugin:0.7.8:report (post-unit-test) @ > hudi-common — > 22:42:49 [INFO] *Skipping JaCoCo execution due to missing execution data > file.* > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-570) Improve unit test coverage FSUtils.java
[ https://issues.apache.org/jira/browse/HUDI-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-570: --- Status: Open (was: New) > Improve unit test coverage FSUtils.java > --- > > Key: HUDI-570 > URL: https://issues.apache.org/jira/browse/HUDI-570 > Project: Apache Hudi (incubating) > Issue Type: Sub-task >Reporter: Balajee Nagasubramaniam >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Add test cases for > - deleteOlderRollbackMetaFiles() > - deleteOlderCleanMetaFiles() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-570) Improve unit test coverage FSUtils.java
[ https://issues.apache.org/jira/browse/HUDI-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-570. Fix Version/s: 0.5.2 Resolution: Fixed Fixed via master: 1fb0b001a38ddc940995e45f5cd53701d0110c3b > Improve unit test coverage FSUtils.java > --- > > Key: HUDI-570 > URL: https://issues.apache.org/jira/browse/HUDI-570 > Project: Apache Hudi (incubating) > Issue Type: Sub-task >Reporter: Balajee Nagasubramaniam >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Add test cases for > - deleteOlderRollbackMetaFiles() > - deleteOlderCleanMetaFiles() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-571) Modify Hudi CLI to show archived commits
[ https://issues.apache.org/jira/browse/HUDI-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-571: --- Status: Closed (was: Patch Available) > Modify Hudi CLI to show archived commits > > > Key: HUDI-571 > URL: https://issues.apache.org/jira/browse/HUDI-571 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: CLI >Reporter: satish >Assignee: satish >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Hudi CLI has a 'show archived commits' command, which is not very helpful: > > {code:java} > ->show archived commits > ===> Showing only 10 archived commits <=== > > | CommitTime | CommitType| > |===| > | 2019033304| commit | > | 20190323220154| commit | > | 20190323220154| commit | > | 20190323224004| commit | > | 20190323224013| commit | > | 20190323224229| commit | > | 20190323224229| commit | > | 20190323232849| commit | > | 20190323233109| commit | > | 20190323233109| commit | > {code} > Modify it or introduce a new command to make debugging easier > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-571) Modify Hudi CLI to show archived commits
[ https://issues.apache.org/jira/browse/HUDI-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-571: --- Fix Version/s: 0.5.2 > Modify Hudi CLI to show archived commits > > > Key: HUDI-571 > URL: https://issues.apache.org/jira/browse/HUDI-571 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: CLI >Reporter: satish >Assignee: satish >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Hudi CLI has a 'show archived commits' command, which is not very helpful: > > {code:java} > ->show archived commits > ===> Showing only 10 archived commits <=== > > | CommitTime | CommitType| > |===| > | 2019033304| commit | > | 20190323220154| commit | > | 20190323220154| commit | > | 20190323224004| commit | > | 20190323224013| commit | > | 20190323224229| commit | > | 20190323224229| commit | > | 20190323232849| commit | > | 20190323233109| commit | > | 20190323233109| commit | > {code} > Modify it or introduce a new command to make debugging easier > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-564) Improve unit test coverage for org.apache.hudi.common.table.log.HoodieLogFormatVersion
[ https://issues.apache.org/jira/browse/HUDI-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-564. Fix Version/s: 0.5.2 Resolution: Fixed Fixed via master: f27c7a16c6d437efaa83e50a7117b83e5201ac49 > Improve unit test coverage for > org.apache.hudi.common.table.log.HoodieLogFormatVersion > -- > > Key: HUDI-564 > URL: https://issues.apache.org/jira/browse/HUDI-564 > Project: Apache Hudi (incubating) > Issue Type: Sub-task >Reporter: Prashant Wason >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-578) Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator
[ https://issues.apache.org/jira/browse/HUDI-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-578. Resolution: Fixed Fixed via master: 652224edc882c083ac46cff095324975e2457004 > Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator > --- > > Key: HUDI-578 > URL: https://issues.apache.org/jira/browse/HUDI-578 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > When using ComplexKeyGenerator > with the options below: > {code:java} > option("hoodie.datasource.write.recordkey.field", "name, age"). > option("hoodie.datasource.write.keygenerator.class", > ComplexKeyGenerator.class.getName()). > option("hoodie.datasource.write.partitionpath.field", "location, age"). > {code} > and the data is > {code:java} > "{ \"name\": \"name1\", \"ts\": 1574297893839, \"age\": 15, \"location\": > \"latitude\", \"sex\":\"male\"}" > {code} > the result is incorrect: age = null in the record key and age = default in > the partition path. > We should trim the partitionPathFields and recordKeyFields in ComplexKeyGenerator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-578) Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator
[ https://issues.apache.org/jira/browse/HUDI-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-578: --- Status: Open (was: New) > Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator > --- > > Key: HUDI-578 > URL: https://issues.apache.org/jira/browse/HUDI-578 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > when using ComplexKeyGenerator > with the options below. > {code:java} > option("hoodie.datasource.write.recordkey.field", "name, age"). > option("hoodie.datasource.write.keygenerator.class", > ComplexKeyGenerator.class.getName()). > option("hoodie.datasource.write.partitionpath.field", "location, age"). > {code} > and the data is > {code:java} > "{ \"name\": \"name1\", \"ts\": 1574297893839, \"age\": 15, \"location\": > \"latitude\", \"sex\":\"male\"}" > {code} > the result is incorrect with age = null in recordkey, and age = default in > partitionpath. > We would trim the partitions and recordkeys in ComplexKeyGenerator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-550) Add to Release Notes : Configuration Value change for Kafka Reset Offset Strategies
[ https://issues.apache.org/jira/browse/HUDI-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027253#comment-17027253 ] leesf commented on HUDI-550: Fixed via asf-site: 20ede76c4c79c0804518a4fe148b8fcd48391f5c > Add to Release Notes : Configuration Value change for Kafka Reset Offset > Strategies > --- > > Key: HUDI-550 > URL: https://issues.apache.org/jira/browse/HUDI-550 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Blocker > Fix For: 0.5.1 > > > Enum Values are changed for configuring kafka reset offset strategies in > deltastreamer > LARGEST -> LATEST > SMALLEST -> EARLIEST > -- This message was sent by Atlassian Jira (v8.3.4#803005)
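The HUDI-550 change above is a one-for-one rename of the accepted config values. A hedged sketch of what migrating an old config value looks like; the class name and helper are illustrative only and not part of Hudi's DeltaStreamer:

```java
// Illustrative helper (not Hudi code): map the pre-0.5.1 Kafka
// reset-offset config values to their renamed equivalents.
public class ResetOffsetRename {

    static String migrate(String oldValue) {
        switch (oldValue) {
            case "LARGEST":  return "LATEST";   // start from the newest offsets
            case "SMALLEST": return "EARLIEST"; // start from the oldest offsets
            default:         return oldValue;   // already using the new names
        }
    }

    public static void main(String[] args) {
        System.out.println(migrate("LARGEST"));  // LATEST
        System.out.println(migrate("SMALLEST")); // EARLIEST
    }
}
```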
[jira] [Updated] (HUDI-550) Add to Release Notes : Configuration Value change for Kafka Reset Offset Strategies
[ https://issues.apache.org/jira/browse/HUDI-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-550: --- Status: Closed (was: Patch Available) > Add to Release Notes : Configuration Value change for Kafka Reset Offset > Strategies > --- > > Key: HUDI-550 > URL: https://issues.apache.org/jira/browse/HUDI-550 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Blocker > Fix For: 0.5.1 > > > Enum Values are changed for configuring kafka reset offset strategies in > deltastreamer > LARGEST -> LATEST > SMALLEST -> EARLIEST > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-588) Sync latest docs to cn docs
leesf created HUDI-588: -- Summary: Sync latest docs to cn docs Key: HUDI-588 URL: https://issues.apache.org/jira/browse/HUDI-588 Project: Apache Hudi (incubating) Issue Type: Improvement Components: docs-chinese Reporter: leesf Assignee: vinoyang Sync latest website docs to cn docs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-543) Carefully draft release notes for 0.5.1 with all breaking/user impacting changes
[ https://issues.apache.org/jira/browse/HUDI-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-543. Resolution: Fixed Fixed via asf-site: 20ede76c4c79c0804518a4fe148b8fcd48391f5c > Carefully draft release notes for 0.5.1 with all breaking/user impacting > changes > > > Key: HUDI-543 > URL: https://issues.apache.org/jira/browse/HUDI-543 > Project: Apache Hudi (incubating) > Issue Type: Task > Components: Release Administrative >Reporter: Vinoth Chandar >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Call out all breaking changes : > * Spark 2.4 support drop, avro version change etc. "Hudi 0.5.1+ above needs > Spark 2.4+" > * Need for shading custom Payloads > * --packages for spark-shell > * key generator changes > * _ro suffix for read optimized views.. > * Delta streamer command line changes > * Scala version changes.. package names now have _2.11 > > Also need to call out major release highlights (quoting docs/blogs as > available) > * better delete support > * dynamic bloom filters > * DMS support > > > I am also linking the different jiras as subtasks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-547) Call out changes in package names due to scala cross compiling support
[ https://issues.apache.org/jira/browse/HUDI-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027254#comment-17027254 ] leesf commented on HUDI-547: Fixed via asf-site: 20ede76c4c79c0804518a4fe148b8fcd48391f5c > Call out changes in package names due to scala cross compiling support > -- > > Key: HUDI-547 > URL: https://issues.apache.org/jira/browse/HUDI-547 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Blocker > Fix For: 0.5.1 > > > Two versions of each of the below packages need to be built. > hudi-spark is hudi-spark_2.11 and hudi-spark_2.12 > hudi-utilities is hudi-utilities_2.11 and hudi-utilities_2.12 > hudi-spark-bundle is hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12 > hudi-utilities-bundle is hudi-utilities-bundle_2.11 and > hudi-utilities-bundle_2.12 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-547) Call out changes in package names due to scala cross compiling support
[ https://issues.apache.org/jira/browse/HUDI-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-547: --- Status: Closed (was: Patch Available) > Call out changes in package names due to scala cross compiling support > -- > > Key: HUDI-547 > URL: https://issues.apache.org/jira/browse/HUDI-547 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Blocker > Fix For: 0.5.1 > > > Two versions of each of the below packages need to be built. > hudi-spark is hudi-spark_2.11 and hudi-spark_2.12 > hudi-utilities is hudi-utilities_2.11 and hudi-utilities_2.12 > hudi-spark-bundle is hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12 > hudi-utilities-bundle is hudi-utilities-bundle_2.11 and > hudi-utilities-bundle_2.12 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
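For downstream users, the HUDI-547 package renames above mean dependency coordinates must carry an explicit Scala suffix. A minimal sketch of a Maven dependency declaration, assuming the 0.5.1-incubating release version (illustrative; check the actual release coordinates):

```xml
<!-- Illustrative only: depend on the Scala-2.11 build of hudi-spark-bundle.
     Swap the _2.11 suffix for _2.12 when building against Scala 2.12. -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark-bundle_2.11</artifactId>
  <version>0.5.1-incubating</version>
</dependency>
```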
[jira] [Created] (HUDI-586) Revisit the release guide
leesf created HUDI-586: -- Summary: Revisit the release guide Key: HUDI-586 URL: https://issues.apache.org/jira/browse/HUDI-586 Project: Apache Hudi (incubating) Issue Type: Improvement Components: Release Administrative Reporter: leesf Fix For: 0.5.2 Currently, the release guide is not very standard, mainly in the finalize-the-release step; we would refer to FLINK [https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release] , the main change might be not adding rc-\{RC_NUM} to the pom.xml. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-586) Revisit the release guide
[ https://issues.apache.org/jira/browse/HUDI-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027150#comment-17027150 ] leesf commented on HUDI-586: [~vinoth] [~vbalaji] please chime in to standardize the release guide. > Revisit the release guide > - > > Key: HUDI-586 > URL: https://issues.apache.org/jira/browse/HUDI-586 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative >Reporter: leesf >Priority: Major > Fix For: 0.5.2 > > > Currently, the release guide is not very standard, mainly in the > finalize-the-release step; we would refer to FLINK > [https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release] > , the main change might be not adding rc-\{RC_NUM} to the pom.xml. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-583) cleanup legacy code
[ https://issues.apache.org/jira/browse/HUDI-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-583: --- Status: Open (was: New) > cleanup legacy code > > > Key: HUDI-583 > URL: https://issues.apache.org/jira/browse/HUDI-583 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Cleaner >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 10m > Remaining Estimate: 0h > > See [https://github.com/apache/incubator-hudi/pull/1237] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-583) cleanup legacy code
[ https://issues.apache.org/jira/browse/HUDI-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-583. Resolution: Fixed Fixed via master: 5b7bb142dc6712c41fd8ada208ab3186369431f9 > cleanup legacy code > > > Key: HUDI-583 > URL: https://issues.apache.org/jira/browse/HUDI-583 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Cleaner >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 10m > Remaining Estimate: 0h > > See [https://github.com/apache/incubator-hudi/pull/1237] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12
[ https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-238: --- Fix Version/s: (was: 0.5.2) 0.5.1 > Make separate release for hudi spark/scala based packages for scala 2.12 > - > > Key: HUDI-238 > URL: https://issues.apache.org/jira/browse/HUDI-238 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative, Usability >Reporter: Balaji Varadarajan >Assignee: Tadas Sugintas >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > [https://github.com/apache/incubator-hudi/issues/881#issuecomment-528700749] > Suspects: > h3. Hudi utilities package > bringing in spark-streaming-kafka-0.8* > {code:java} > [INFO] Scanning for projects... > [INFO] > [INFO] ---< org.apache.hudi:hudi-utilities > >--- > [INFO] Building hudi-utilities 0.5.0-SNAPSHOT > [INFO] [ jar > ]- > [INFO] > [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ hudi-utilities > --- > [INFO] org.apache.hudi:hudi-utilities:jar:0.5.0-SNAPSHOT > [INFO] ... > [INFO] +- org.apache.hudi:hudi-client:jar:0.5.0-SNAPSHOT:compile >... > [INFO] > [INFO] +- org.apache.hudi:hudi-spark:jar:0.5.0-SNAPSHOT:compile > [INFO] | \- org.scala-lang:scala-library:jar:2.11.8:compile > [INFO] +- log4j:log4j:jar:1.2.17:compile >... 
> [INFO] +- org.apache.spark:spark-core_2.11:jar:2.1.0:provided > [INFO] | +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:provided > [INFO] | | +- org.apache.avro:avro-ipc:jar:1.7.7:provided > [INFO] | | \- org.apache.avro:avro-ipc:jar:tests:1.7.7:provided > [INFO] | +- com.twitter:chill_2.11:jar:0.8.0:provided > [INFO] | +- com.twitter:chill-java:jar:0.8.0:provided > [INFO] | +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:provided > [INFO] | +- org.apache.spark:spark-launcher_2.11:jar:2.1.0:provided > [INFO] | +- org.apache.spark:spark-network-common_2.11:jar:2.1.0:provided > [INFO] | +- org.apache.spark:spark-network-shuffle_2.11:jar:2.1.0:provided > [INFO] | +- org.apache.spark:spark-unsafe_2.11:jar:2.1.0:provided > [INFO] | +- net.java.dev.jets3t:jets3t:jar:0.7.1:provided > [INFO] | +- org.apache.curator:curator-recipes:jar:2.4.0:provided > [INFO] | +- org.apache.commons:commons-lang3:jar:3.5:provided > [INFO] | +- org.apache.commons:commons-math3:jar:3.4.1:provided > [INFO] | +- com.google.code.findbugs:jsr305:jar:1.3.9:provided > [INFO] | +- org.slf4j:slf4j-api:jar:1.7.16:compile > [INFO] | +- org.slf4j:jul-to-slf4j:jar:1.7.16:provided > [INFO] | +- org.slf4j:jcl-over-slf4j:jar:1.7.16:provided > [INFO] | +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile > [INFO] | +- com.ning:compress-lzf:jar:1.0.3:provided > [INFO] | +- org.xerial.snappy:snappy-java:jar:1.1.2.6:compile > [INFO] | +- net.jpountz.lz4:lz4:jar:1.3.0:compile > [INFO] | +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:provided > [INFO] | +- commons-net:commons-net:jar:2.2:provided > > [INFO] +- org.apache.spark:spark-sql_2.11:jar:2.1.0:provided > [INFO] | +- com.univocity:univocity-parsers:jar:2.2.1:provided > [INFO] | +- org.apache.spark:spark-sketch_2.11:jar:2.1.0:provided > [INFO] | \- org.apache.spark:spark-catalyst_2.11:jar:2.1.0:provided > [INFO] | +- org.codehaus.janino:janino:jar:3.0.0:provided > [INFO] | +- org.codehaus.janino:commons-compiler:jar:3.0.0:provided > [INFO] | \- 
org.antlr:antlr4-runtime:jar:4.5.3:provided > [INFO] +- com.databricks:spark-avro_2.11:jar:4.0.0:provided > [INFO] +- org.apache.spark:spark-streaming_2.11:jar:2.1.0:compile > [INFO] +- org.apache.spark:spark-streaming-kafka-0-8_2.11:jar:2.1.0:compile > [INFO] | \- org.apache.kafka:kafka_2.11:jar:0.8.2.1:compile > [INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile > [INFO] | +- > org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile > [INFO] | \- org.apache.kafka:kafka-clients:jar:0.8.2.1:compile > [INFO] +- io.dropwizard.metrics:metrics-core:jar:4.0.2:compile > [INFO] +- org.antlr:stringtemplate:jar:4.0.2:compile > [INFO] | \- org.antlr:antlr-runtime:jar:3.3:compile > [INFO] +- com.beust:jcommander:jar:1.72:compile > [INFO] +- com.twitter:bijection-avro_2.11:jar:0.9.2:compile > [INFO] | \- com.twitter:bijection-core_2.11:jar:0.9.2:compile > [INFO] +- io.confluent:ka
[jira] [Commented] (HUDI-590) Cut a new Doc version 0.5.1 explicitly
[ https://issues.apache.org/jira/browse/HUDI-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032262#comment-17032262 ] leesf commented on HUDI-590: [~bhavanisudha] Thanks. > Cut a new Doc version 0.5.1 explicitly > -- > > Key: HUDI-590 > URL: https://issues.apache.org/jira/browse/HUDI-590 > Project: Apache Hudi (incubating) > Issue Type: Task > Components: Docs, Release Administrative >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Major > > The latest version of docs needs to be tagged as 0.5.1 explicitly in the > site. Follow instructions in > [https://github.com/apache/incubator-hudi/blob/asf-site/README.md#updating-site] > to create a new dir 0.5.1 under docs/_docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-549) Update documentation to reflect changes in package names due to scala cross compiling support
[ https://issues.apache.org/jira/browse/HUDI-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-549. Resolution: Fixed Fixed via master: 1e79cbc259b92f75e5fd387c0271b163532aebb9 > Update documentation to reflect changes in package names due to scala cross > compiling support > - > > Key: HUDI-549 > URL: https://issues.apache.org/jira/browse/HUDI-549 > Project: Apache Hudi (incubating) > Issue Type: Task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: Bhavani Sudha Saktheeswaran >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Two versions of each of the below packages will be built. Please note the > change in package names and update documentation. > hudi-spark is hudi-spark_2.11 and hudi-spark_2.12 > hudi-utilities is hudi-utilities_2.11 and hudi-utilities_2.12 > hudi-spark-bundle is hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12 > hudi-utilities-bundle is hudi-utilities-bundle_2.11 and > hudi-utilities-bundle_2.12 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-550) Add to Release Notes : Configuration Value change for Kafka Reset Offset Strategies
[ https://issues.apache.org/jira/browse/HUDI-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-550: --- Status: Patch Available (was: In Progress) > Add to Release Notes : Configuration Value change for Kafka Reset Offset > Strategies > --- > > Key: HUDI-550 > URL: https://issues.apache.org/jira/browse/HUDI-550 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Blocker > Fix For: 0.5.1 > > > Enum Values are changed for configuring kafka reset offset strategies in > deltastreamer > LARGEST -> LATEST > SMALLEST -> EARLIEST > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-403) Publish a deployment guide talking about deployment options, upgrading etc
[ https://issues.apache.org/jira/browse/HUDI-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-403. Resolution: Fixed Fixed via asf-site: 41754bb31bb8656d0570371ba2283c987f9a8c22 > Publish a deployment guide talking about deployment options, upgrading etc > -- > > Key: HUDI-403 > URL: https://issues.apache.org/jira/browse/HUDI-403 > Project: Apache Hudi (incubating) > Issue Type: New Feature > Components: Docs >Reporter: Vinoth Chandar >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Things to cover > # Upgrade readers first, Upgrade writers next, Principles of compatibility > followed > # DeltaStreamer Deployment models > # Scheduling Compactions. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-547) Call out changes in package names due to scala cross compiling support
[ https://issues.apache.org/jira/browse/HUDI-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-547: --- Status: Patch Available (was: In Progress) > Call out changes in package names due to scala cross compiling support > -- > > Key: HUDI-547 > URL: https://issues.apache.org/jira/browse/HUDI-547 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Blocker > Fix For: 0.5.1 > > > Two versions of each of the below packages need to be built. > hudi-spark is hudi-spark_2.11 and hudi-spark_2.12 > hudi-utilities is hudi-utilities_2.11 and hudi-utilities_2.12 > hudi-spark-bundle is hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12 > hudi-utilities-bundle is hudi-utilities-bundle_2.11 and > hudi-utilities-bundle_2.12 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-536) Update release notes to include KeyGenerator package changes
[ https://issues.apache.org/jira/browse/HUDI-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-536: --- Status: In Progress (was: Open) > Update release notes to include KeyGenerator package changes > > > Key: HUDI-536 > URL: https://issues.apache.org/jira/browse/HUDI-536 > Project: Apache Hudi (incubating) > Issue Type: Bug >Reporter: Brandon Scheller >Priority: Major > Fix For: 0.5.1 > > > The change introduced here: > [https://github.com/apache/incubator-hudi/pull/1194] > Refactors hudi keygenerators into their own package. > We need to make this a backwards compatible change or update the release > notes to address this. > Specifically: > org.apache.hudi.ComplexKeyGenerator -> > org.apache.hudi.keygen.ComplexKeyGenerator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-580) Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh
leesf created HUDI-580: -- Summary: Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh Key: HUDI-580 URL: https://issues.apache.org/jira/browse/HUDI-580 Project: Apache Hudi (incubating) Issue Type: Improvement Components: newbie Reporter: leesf Fix For: 0.5.2 Issues pointed out in general@incubator ML, more context here: [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-580) Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh
[ https://issues.apache.org/jira/browse/HUDI-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-580: --- Description: Issues pointed out in general@incubator ML, more context here: [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E] Would get it fixed before next release. was:Issues pointed out in general@incubator ML, more context here: [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E] > Incorrect license header in docker/hoodie/hadoop/base/entrypoint.sh > --- > > Key: HUDI-580 > URL: https://issues.apache.org/jira/browse/HUDI-580 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: newbie >Reporter: leesf >Priority: Major > Fix For: 0.5.2 > > > Issues pointed out in general@incubator ML, more context here: > [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E] > > Would get it fixed before next release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-582) NOTICE year is incorrect
leesf created HUDI-582: -- Summary: NOTICE year is incorrect Key: HUDI-582 URL: https://issues.apache.org/jira/browse/HUDI-582 Project: Apache Hudi (incubating) Issue Type: Improvement Components: newbie Reporter: leesf Fix For: 0.5.2 Issues pointed out in general@incubator ML, more context here: [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E] Would get it fixed before next release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-581) NOTICE needs more work as it is missing content from included 3rd party ALv2 licensed NOTICE files
leesf created HUDI-581: -- Summary: NOTICE needs more work as it is missing content from included 3rd party ALv2 licensed NOTICE files Key: HUDI-581 URL: https://issues.apache.org/jira/browse/HUDI-581 Project: Apache Hudi (incubating) Issue Type: Improvement Reporter: leesf Issues pointed out in general@incubator ML, more context here: [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E] Would get it fixed before next release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-579) Add border to table on hudi website
[ https://issues.apache.org/jira/browse/HUDI-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-579. Resolution: Fixed Fixed via asf-site: 4670c026010b61d5bd591119902a19d64d2b8889 > Add border to table on hudi website > --- > > Key: HUDI-579 > URL: https://issues.apache.org/jira/browse/HUDI-579 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Docs >Reporter: lamber-ken >Assignee: lamber-ken >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Add border to table which on hudi website -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-595) code cleanup
[ https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-595: --- Status: Open (was: New) > code cleanup > - > > Key: HUDI-595 > URL: https://issues.apache.org/jira/browse/HUDI-595 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Code Cleanup >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Moving out the cleanup code from PR# 1159 into a separate PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-595) code cleanup
[ https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-595. Resolution: Fixed Fixed via master: 594da28fbf64fb20432e718a409577fd10516c4a > code cleanup > - > > Key: HUDI-595 > URL: https://issues.apache.org/jira/browse/HUDI-595 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Code Cleanup >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Moving out the cleanup code from PR# 1159 into a separate PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-586) Revisit the release guide
[ https://issues.apache.org/jira/browse/HUDI-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-586: -- Assignee: leesf > Revisit the release guide > - > > Key: HUDI-586 > URL: https://issues.apache.org/jira/browse/HUDI-586 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative >Reporter: leesf >Assignee: leesf >Priority: Major > Fix For: 0.6.0 > > > Currently, the release guide is not very standard, mainly in the > finalize-the-release step; we would refer to FLINK > [https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release] > , the main change might be not adding rc-\{RC_NUM} to the pom.xml. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-595) code cleanup
[ https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-595. -- > code cleanup > - > > Key: HUDI-595 > URL: https://issues.apache.org/jira/browse/HUDI-595 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Code Cleanup >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Moving out the cleanup code from PR# 1159 into a separate PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-599) Update release guide/release scripts due to the change of scala 2.12 build
[ https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-599: --- Summary: Update release guide/release scripts due to the change of scala 2.12 build (was: Update Release guide due to the change of scala 2.12 build) > Update release guide/release scripts due to the change of scala 2.12 build > -- > > Key: HUDI-599 > URL: https://issues.apache.org/jira/browse/HUDI-599 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative >Reporter: leesf >Assignee: leesf >Priority: Major > Fix For: 0.5.2 > > > Update release guide due to the change of scala 2.12 build, PR link below > [https://github.com/apache/incubator-hudi/pull/1293] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build
[ https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-599: --- Summary: Update release guide & release scripts due to the change of scala 2.12 build (was: Update release guide/release scripts due to the change of scala 2.12 build) > Update release guide & release scripts due to the change of scala 2.12 build > > > Key: HUDI-599 > URL: https://issues.apache.org/jira/browse/HUDI-599 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative >Reporter: leesf >Assignee: leesf >Priority: Major > Fix For: 0.5.2 > > > Update release guide due to the change of scala 2.12 build, PR link below > [https://github.com/apache/incubator-hudi/pull/1293] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-590) Cut a new Doc version 0.5.1 explicitly
[ https://issues.apache.org/jira/browse/HUDI-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030656#comment-17030656 ] leesf commented on HUDI-590: [~bhavanisudha] It would be better to create the 0.5.1 version sooner since there are some docs updates in existing PRs. WDYT? > Cut a new Doc version 0.5.1 explicitly > -- > > Key: HUDI-590 > URL: https://issues.apache.org/jira/browse/HUDI-590 > Project: Apache Hudi (incubating) > Issue Type: Task > Components: Docs, Release Administrative >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Major > > The latest version of docs needs to be tagged as 0.5.1 explicitly in the > site. Follow instructions in > [https://github.com/apache/incubator-hudi/blob/asf-site/README.md#updating-site] > to create a new dir 0.5.1 under docs/_docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-599) Update Release guide due to the change of scala 2.12 build
leesf created HUDI-599: -- Summary: Update Release guide due to the change of scala 2.12 build Key: HUDI-599 URL: https://issues.apache.org/jira/browse/HUDI-599 Project: Apache Hudi (incubating) Issue Type: Improvement Components: Release Administrative Reporter: leesf Assignee: leesf Fix For: 0.5.2 Update release guide due to the change of scala 2.12 build, PR link below [https://github.com/apache/incubator-hudi/pull/1293] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12
[ https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028623#comment-17028623 ] leesf commented on HUDI-238: Fixed via master: 292c1e2ff436a711cbbb53ad9b1f6232121d53ec > Make separate release for hudi spark/scala based packages for scala 2.12 > - > > Key: HUDI-238 > URL: https://issues.apache.org/jira/browse/HUDI-238 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Release Administrative, Usability >Reporter: Balaji Varadarajan >Assignee: Tadas Sugintas >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > [https://github.com/apache/incubator-hudi/issues/881#issuecomment-528700749] > Suspects: > h3. Hudi utilities package > bringing in spark-streaming-kafka-0.8* > {code:java} > [INFO] Scanning for projects... > [INFO] > [INFO] ---< org.apache.hudi:hudi-utilities > >--- > [INFO] Building hudi-utilities 0.5.0-SNAPSHOT > [INFO] [ jar > ]- > [INFO] > [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ hudi-utilities > --- > [INFO] org.apache.hudi:hudi-utilities:jar:0.5.0-SNAPSHOT > [INFO] ... > [INFO] +- org.apache.hudi:hudi-client:jar:0.5.0-SNAPSHOT:compile >... > [INFO] > [INFO] +- org.apache.hudi:hudi-spark:jar:0.5.0-SNAPSHOT:compile > [INFO] | \- org.scala-lang:scala-library:jar:2.11.8:compile > [INFO] +- log4j:log4j:jar:1.2.17:compile >... 
> [INFO] +- org.apache.spark:spark-core_2.11:jar:2.1.0:provided > [INFO] | +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:provided > [INFO] | | +- org.apache.avro:avro-ipc:jar:1.7.7:provided > [INFO] | | \- org.apache.avro:avro-ipc:jar:tests:1.7.7:provided > [INFO] | +- com.twitter:chill_2.11:jar:0.8.0:provided > [INFO] | +- com.twitter:chill-java:jar:0.8.0:provided > [INFO] | +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:provided > [INFO] | +- org.apache.spark:spark-launcher_2.11:jar:2.1.0:provided > [INFO] | +- org.apache.spark:spark-network-common_2.11:jar:2.1.0:provided > [INFO] | +- org.apache.spark:spark-network-shuffle_2.11:jar:2.1.0:provided > [INFO] | +- org.apache.spark:spark-unsafe_2.11:jar:2.1.0:provided > [INFO] | +- net.java.dev.jets3t:jets3t:jar:0.7.1:provided > [INFO] | +- org.apache.curator:curator-recipes:jar:2.4.0:provided > [INFO] | +- org.apache.commons:commons-lang3:jar:3.5:provided > [INFO] | +- org.apache.commons:commons-math3:jar:3.4.1:provided > [INFO] | +- com.google.code.findbugs:jsr305:jar:1.3.9:provided > [INFO] | +- org.slf4j:slf4j-api:jar:1.7.16:compile > [INFO] | +- org.slf4j:jul-to-slf4j:jar:1.7.16:provided > [INFO] | +- org.slf4j:jcl-over-slf4j:jar:1.7.16:provided > [INFO] | +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile > [INFO] | +- com.ning:compress-lzf:jar:1.0.3:provided > [INFO] | +- org.xerial.snappy:snappy-java:jar:1.1.2.6:compile > [INFO] | +- net.jpountz.lz4:lz4:jar:1.3.0:compile > [INFO] | +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:provided > [INFO] | +- commons-net:commons-net:jar:2.2:provided > > [INFO] +- org.apache.spark:spark-sql_2.11:jar:2.1.0:provided > [INFO] | +- com.univocity:univocity-parsers:jar:2.2.1:provided > [INFO] | +- org.apache.spark:spark-sketch_2.11:jar:2.1.0:provided > [INFO] | \- org.apache.spark:spark-catalyst_2.11:jar:2.1.0:provided > [INFO] | +- org.codehaus.janino:janino:jar:3.0.0:provided > [INFO] | +- org.codehaus.janino:commons-compiler:jar:3.0.0:provided > [INFO] | \- 
org.antlr:antlr4-runtime:jar:4.5.3:provided > [INFO] +- com.databricks:spark-avro_2.11:jar:4.0.0:provided > [INFO] +- org.apache.spark:spark-streaming_2.11:jar:2.1.0:compile > [INFO] +- org.apache.spark:spark-streaming-kafka-0-8_2.11:jar:2.1.0:compile > [INFO] | \- org.apache.kafka:kafka_2.11:jar:0.8.2.1:compile > [INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile > [INFO] | +- > org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile > [INFO] | \- org.apache.kafka:kafka-clients:jar:0.8.2.1:compile > [INFO] +- io.dropwizard.metrics:metrics-core:jar:4.0.2:compile > [INFO] +- org.antlr:stringtemplate:jar:4.0.2:compile > [INFO] | \- org.antlr:antlr-runtime:jar:3.3:compile > [INFO] +- com.beust:jcommander:jar:1.72:compile > [INFO] +- com.twitter:bijection-avro_2.11:jar:0.9.2:compile > [INFO] | \- com.twitter:bijection-core_2.11:jar:0.9
[jira] [Updated] (HUDI-550) Add to Release Notes : Configuration Value change for Kafka Reset Offset Strategies
[ https://issues.apache.org/jira/browse/HUDI-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-550: --- Fix Version/s: (was: 0.5.2) 0.5.1 > Add to Release Notes : Configuration Value change for Kafka Reset Offset > Strategies > --- > > Key: HUDI-550 > URL: https://issues.apache.org/jira/browse/HUDI-550 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Blocker > Fix For: 0.5.1 > > > Enum values have changed for configuring Kafka reset offset strategies in > DeltaStreamer > LARGEST -> LATEST > SMALLEST -> EARLIEST > -- This message was sent by Atlassian Jira (v8.3.4#803005)
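A config-level view of the HUDI-550 change. The property key shown is the standard Kafka consumer key; whether DeltaStreamer reads exactly this key, and the file name, are assumptions for illustration:

```properties
# kafka-source.properties (hypothetical file name)
# Before 0.5.1 the accepted values followed Kafka's old naming:
#   auto.offset.reset=SMALLEST   (or LARGEST)
# From 0.5.1 the enum values follow Kafka's newer naming:
auto.offset.reset=EARLIEST
# auto.offset.reset=LATEST
```

Existing jobs that pin SMALLEST or LARGEST in their source properties would need this one-line update when upgrading.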
[jira] [Updated] (HUDI-547) Call out changes in package names due to scala cross compiling support
[ https://issues.apache.org/jira/browse/HUDI-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-547: --- Fix Version/s: (was: 0.5.2) 0.5.1 > Call out changes in package names due to scala cross compiling support > -- > > Key: HUDI-547 > URL: https://issues.apache.org/jira/browse/HUDI-547 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Release Administrative >Reporter: Balaji Varadarajan >Assignee: leesf >Priority: Blocker > Fix For: 0.5.1 > > > Two versions of each of the below packages need to be built. > hudi-spark is hudi-spark_2.11 and hudi-spark_2.12 > hudi-utilities is hudi-utilities_2.11 and hudi-utilities_2.12 > hudi-spark-bundle is hudi-spark-bundle_2.11 and hudi-spark-bundle_2.12 > hudi-utilities-bundle is hudi-utilities-bundle_2.11 and > hudi-utilities-bundle_2.12 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
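The renaming in HUDI-547 means downstream build files must pin a Scala-suffix explicitly. A minimal Maven fragment showing the new coordinates (the version string here is an assumption for illustration):

```xml
<!-- Sketch: depending on the Scala-2.11 build of the renamed bundle.
     Swap the _2.11 suffix for _2.12 on a Scala 2.12 cluster. -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark-bundle_2.11</artifactId>
  <version>0.5.1-incubating</version>
</dependency>
```

Builds that referenced the old unsuffixed artifactId (hudi-spark-bundle) would fail to resolve against 0.5.1 and need this coordinate change.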
[jira] [Updated] (HUDI-12) Upgrade Hudi to Spark 2.4
[ https://issues.apache.org/jira/browse/HUDI-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-12: -- Fix Version/s: (was: 0.5.2) 0.5.1 > Upgrade Hudi to Spark 2.4 > - > > Key: HUDI-12 > URL: https://issues.apache.org/jira/browse/HUDI-12 > Project: Apache Hudi (incubating) > Issue Type: New Feature > Components: Usability, Writer Core >Reporter: Vinoth Chandar >Assignee: Udit Mehrotra >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > https://github.com/uber/hudi/issues/549 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-343) Create a DOAP File for Hudi
[ https://issues.apache.org/jira/browse/HUDI-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-343: --- Fix Version/s: (was: 0.5.2) 0.5.1 > Create a DOAP File for Hudi > --- > > Key: HUDI-343 > URL: https://issues.apache.org/jira/browse/HUDI-343 > Project: Apache Hudi (incubating) > Issue Type: New Feature > Components: Release Administrative >Reporter: Vinoth Chandar >Assignee: Suneel Marthi >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > But please create a DOAP file for Hudi, where you can also list the > release: https://projects.apache.org/create.html > <https://projects.apache.org/project.html?incubator-hudi> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-377) Add Delete() support to HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-377: --- Fix Version/s: (was: 0.5.2) 0.5.1 > Add Delete() support to HoodieDeltaStreamer > --- > > Key: HUDI-377 > URL: https://issues.apache.org/jira/browse/HUDI-377 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: DeltaStreamer >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Original Estimate: 72h > Time Spent: 20m > Remaining Estimate: 71h 40m > > Add Delete() support to HoodieDeltaStreamer -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-389) Updates sent to diff partition for a given key with Global Index
[ https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028630#comment-17028630 ] leesf commented on HUDI-389: Fixed via master: 9c4217a3e1b9b728690282c914db2067117f4cfb > Updates sent to diff partition for a given key with Global Index > - > > Key: HUDI-389 > URL: https://issues.apache.org/jira/browse/HUDI-389 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Index >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Original Estimate: 48h > Time Spent: 20m > Remaining Estimate: 47h 40m > > Updates sent to diff partition for a given key with Global Index should > succeed by updating the record under original partition. As of now, it throws > exception. > [https://github.com/apache/incubator-hudi/issues/1021] > > > error log: > {code:java} > 14738 [Executor task launch worker-0] INFO > com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants > java.util.stream.ReferencePipeline$Head@d02b1c7 > 14738 [Executor task launch worker-0] INFO > com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file > system view for partition (2016/04/15) > 14738 [Executor task launch worker-0] INFO > com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found > in partition (2016/04/15) =0, Time taken =0 > 14738 [Executor task launch worker-0] INFO > com.uber.hoodie.common.table.view.AbstractTableFileSystemView - > addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0 > 14738 [Executor task launch worker-0] INFO > com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding > file-groups for partition :2016/04/15, #FileGroups=0 > 14738 [Executor task launch worker-0] INFO > com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load > partition (2016/04/15) =0 > 14754 [Executor task launch worker-0] ERROR > 
com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType > UPDATE for partition :0 > java.util.NoSuchElementException: No value present > at com.uber.hoodie.common.util.Option.get(Option.java:112) > at com.uber.hoodie.io.HoodieMergeHandle.(HoodieMergeHandle.java:71) > at > com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226) > at > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180) > at > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263) > at > com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336) > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973) > at > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948) > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888) > at > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948) > at > 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694) > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apac
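The behavior HUDI-389 asks for can be sketched independently of Hudi's code: a toy global index that routes an update for a known key back to its original partition instead of failing with NoSuchElementException. All names here are hypothetical, not Hudi's API:

```java
import java.util.HashMap;
import java.util.Map;

class GlobalIndexSketch {
    // key -> partition where the record was first written (the "global index")
    final Map<String, String> index = new HashMap<>();

    // With a global index, an update whose key is already indexed is routed
    // to its original partition; only genuinely new keys land in the
    // incoming partition.
    String partitionFor(String key, String incomingPartition) {
        return index.getOrDefault(key, incomingPartition);
    }
}
```

In the issue's scenario, a record first written under 2016/04/15 and later sent with partition 2017/01/01 would be merged back into 2016/04/15 rather than triggering the exception above.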
[jira] [Updated] (HUDI-443) Add slides for Hadoop summit 2019, Bangalore to powered-by page
[ https://issues.apache.org/jira/browse/HUDI-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-443: --- Fix Version/s: (was: 0.5.2) 0.5.1 > Add slides for Hadoop summit 2019, Bangalore to powered-by page > --- > > Key: HUDI-443 > URL: https://issues.apache.org/jira/browse/HUDI-443 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Docs, newbie >Reporter: Pratyaksh Sharma >Assignee: Pratyaksh Sharma >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Add slides for the talk on Apache Hudi and debezium at Hadoop summit 2019, > Bangalore to powered-by page -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-389) Updates sent to diff partition for a given key with Global Index
[ https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-389: --- Fix Version/s: (was: 0.5.2) 0.5.1 > Updates sent to diff partition for a given key with Global Index > - > > Key: HUDI-389 > URL: https://issues.apache.org/jira/browse/HUDI-389 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Index >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Original Estimate: 48h > Time Spent: 20m > Remaining Estimate: 47h 40m >
[jira] [Commented] (HUDI-311) Support AWS DMS source on DeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028629#comment-17028629 ] leesf commented on HUDI-311: Fixed via master: 350b0ecb4d137411c6231a1568add585c6d7b7d5 > Support AWS DMS source on DeltaStreamer > --- > > Key: HUDI-311 > URL: https://issues.apache.org/jira/browse/HUDI-311 > Project: Apache Hudi (incubating) > Issue Type: New Feature > Components: DeltaStreamer >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > https://aws.amazon.com/dms/ seems like a one-stop shop for database change > logs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-415) HoodieSparkSqlWriter Commit time not representing the Spark job starting time
[ https://issues.apache.org/jira/browse/HUDI-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-415: --- Fix Version/s: (was: 0.5.2) 0.5.1 > HoodieSparkSqlWriter Commit time not representing the Spark job starting time > - > > Key: HUDI-415 > URL: https://issues.apache.org/jira/browse/HUDI-415 > Project: Apache Hudi (incubating) > Issue Type: Bug >Reporter: Yanjia Gary Li >Assignee: Yanjia Gary Li >Priority: Major > Labels: pull-request-available > Fix For: 0.5.1 > > Time Spent: 10m > Remaining Estimate: 0h > > Hudi records the commit time after the first action completes. If there is a > heavy transformation before isEmpty(), then the commit time could be > inaccurate. > {code:java} > if (hoodieRecords.isEmpty()) { > log.info("new batch has no new records, skipping...") > return (true, common.util.Option.empty()) > } > commitTime = client.startCommit() > writeStatuses = DataSourceUtils.doWriteOperation(client, hoodieRecords, > commitTime, operation) > {code} > For example, I start the spark job at 20190101, but *isEmpty()* ran for 2 > hours, then the commit time in the .hoodie folder will be 201901010200. If > I use the commit time to ingest data starting from 201901010200 (from HDFS, > not using deltastreamer), then I will miss 2 hours of data. > Is this setup intended? Can we move the commit time before isEmpty()? -- This message was sent by Atlassian Jira (v8.3.4#803005)
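The reordering the HUDI-415 reporter asks for can be shown with a minimal sketch: capture the commit time from the job start before any expensive action runs. Class and method names here are invented for illustration; this is not Hudi's actual writer:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;

class CommitTimeSketch {
    // Hudi-style commit timestamps: yyyyMMddHHmm (UTC assumed here)
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    // Hypothetical stand-in for startCommit(): stamps the job-start instant.
    static String startCommit(Instant jobStart) {
        return FMT.format(jobStart);
    }

    static String writeBatch(List<String> records, Instant jobStart) {
        // Take the commit time BEFORE the potentially slow isEmpty() check,
        // so a heavy lazy transformation cannot push the timestamp forward.
        String commitTime = startCommit(jobStart);
        if (records.isEmpty()) {
            return null; // nothing to write, no commit
        }
        return commitTime; // reflects job start, not first-action completion
    }
}
```

With this ordering, a job started at 2019-01-01 00:00 stamps 201901010000 even if the emptiness check takes two hours.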
[jira] [Updated] (HUDI-106) Dynamically tune bloom filter entries
[ https://issues.apache.org/jira/browse/HUDI-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-106: --- Fix Version/s: (was: 0.5.2) 0.5.1 > Dynamically tune bloom filter entries > - > > Key: HUDI-106 > URL: https://issues.apache.org/jira/browse/HUDI-106 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Index >Reporter: Vinoth Chandar >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available, realtime-data-lakes > Fix For: 0.5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Tuning bloom filters is currently based on a configuration that can be > cumbersome to tune per dataset to obtain good indexing performance. Let's add > support for Dynamic Bloom Filters that can automatically achieve a > configured false positive ratio depending on the number of entries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
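The sizing that a dynamic bloom filter automates follows the standard formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for n entries at target false-positive rate p. A small sketch of that arithmetic (not Hudi's implementation):

```java
class BloomSizing {
    // Bits needed for n entries at false-positive rate p:
    // m = ceil(-n * ln(p) / (ln 2)^2)
    static long bitsFor(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal hash-function count for m bits over n entries:
    // k = round(m / n * ln 2), at least 1
    static int hashesFor(long n, long m) {
        return (int) Math.max(1, Math.round((double) m / n * Math.log(2)));
    }
}
```

A filter sized dynamically would re-run this math as the entry count grows, instead of relying on one static per-dataset config value; for 1000 entries at p = 0.01 it lands near 9.6 K bits and 7 hashes.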