hughfdjackson commented on issue #2265:
URL: https://github.com/apache/hudi/issues/2265#issuecomment-754502750
@umehrot2
Thanks for looking into this - I'm taking a bit of hope from the error message
of the code you linked ;)
[
https://issues.apache.org/jira/browse/HUDI-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
vinoyang updated HUDI-913:
--
Status: Open (was: New)
> Update docs about KeyGenerator
> --
>
>
[
https://issues.apache.org/jira/browse/HUDI-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
vinoyang updated HUDI-913:
--
Fix Version/s: 0.7.0
> Update docs about KeyGenerator
> --
>
> Key:
[
https://issues.apache.org/jira/browse/HUDI-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
vinoyang closed HUDI-913.
-
Resolution: Done
> Update docs about KeyGenerator
> --
>
> Key:
codecov-io edited a comment on pull request #2374:
URL: https://github.com/apache/hudi/pull/2374#issuecomment-750782300
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=h1) Report
> Merging
[#2374](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=desc) (21792c6)
into
wangxianghu opened a new pull request #2405:
URL: https://github.com/apache/hudi/pull/2405
## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a
pull request.*
## What is the purpose of
[
https://issues.apache.org/jira/browse/HUDI-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wangxianghu updated HUDI-1506:
--
Description:
{code:java}
//
Caused by: org.apache.spark.SparkException: Job aborted due to stage
wangxianghu commented on pull request #2405:
URL: https://github.com/apache/hudi/pull/2405#issuecomment-754644106
@yanghua please take a look when free
This is an automated message from the Apache Git Service.
To respond to
wangxianghu commented on pull request #2404:
URL: https://github.com/apache/hudi/pull/2404#issuecomment-754643863
@yanghua please take a look when free
SureshK-T2S opened a new issue #2406:
URL: https://github.com/apache/hudi/issues/2406
I am attempting to create a hudi table using a parquet file on S3. The
motivation for this approach is based on this Hudi blog:
codecov-io commented on pull request #2404:
URL: https://github.com/apache/hudi/pull/2404#issuecomment-754638146
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2404?src=pr=h1) Report
> Merging
[#2404](https://codecov.io/gh/apache/hudi/pull/2404?src=pr=desc) (fdeb851)
into
codecov-io edited a comment on pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=h1) Report
> Merging
[#2379](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=desc) (70ffbba)
into
wangxianghu created HUDI-1506:
-
Summary: Fix wrong exception thrown in HoodieAvroUtils
Key: HUDI-1506
URL: https://issues.apache.org/jira/browse/HUDI-1506
Project: Apache Hudi
Issue Type: Bug
[
https://issues.apache.org/jira/browse/HUDI-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-1506:
-
Labels: pull-request-available (was: )
> Fix wrong exception thrown in HoodieAvroUtils
>
wangxianghu opened a new pull request #2404:
URL: https://github.com/apache/hudi/pull/2404
## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a
pull request.*
## What is the purpose of
yanghua merged pull request #2403:
URL: https://github.com/apache/hudi/pull/2403
This is an automated email from the ASF dual-hosted git repository.
vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new ee00bd6 [HUDI-913] Update docs about
liujinhui1994 closed pull request #2386:
URL: https://github.com/apache/hudi/pull/2386
This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new f7ca68a Travis CI build asf-site
f7ca68a is
codecov-io commented on pull request #2405:
URL: https://github.com/apache/hudi/pull/2405#issuecomment-754665459
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2405?src=pr=h1) Report
> Merging
[#2405](https://codecov.io/gh/apache/hudi/pull/2405?src=pr=desc) (b51e61e)
into
vinothchandar commented on a change in pull request #2359:
URL: https://github.com/apache/hudi/pull/2359#discussion_r551618729
##
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -232,17 +250,18 @@ void
[
https://issues.apache.org/jira/browse/HUDI-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-1479:
-
Description:
*Change #1*
{code:java}
public static List getAllPartitionPaths(FileSystem fs,
[
https://issues.apache.org/jira/browse/HUDI-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-1479:
-
Attachment: image-2021-01-05-10-00-35-187.png
> Replace FSUtils.getAllPartitionPaths() with
>
nsivabalan commented on a change in pull request #2400:
URL: https://github.com/apache/hudi/pull/2400#discussion_r552072132
##
File path: docker/demo/config/test-suite/complex-dag-cow.yaml
##
@@ -14,41 +14,47 @@
# See the License for the specific language governing
nsivabalan commented on pull request #2400:
URL: https://github.com/apache/hudi/pull/2400#issuecomment-754812328
@n3nash : Patch is ready for review.
@satishkotha : I have added clustering node. Do check it out.
[
https://issues.apache.org/jira/browse/HUDI-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259074#comment-17259074
]
Vinoth Chandar commented on HUDI-1459:
--
[~pwason] [~satishkotha]
several users reporting this when
[
https://issues.apache.org/jira/browse/HUDI-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-1479:
-
Description:
*Change #1*
{code:java}
public static List getAllPartitionPaths(FileSystem fs,
[
https://issues.apache.org/jira/browse/HUDI-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259072#comment-17259072
]
Vinoth Chandar commented on HUDI-1308:
--
More testing on S3 from [~vbalaji]
{code}
Caused by:
[
https://issues.apache.org/jira/browse/HUDI-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259090#comment-17259090
]
Vinoth Chandar commented on HUDI-1479:
--
[~uditme] I have updated the description with detailed steps
codecov-io edited a comment on pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130
yanghua commented on a change in pull request #2405:
URL: https://github.com/apache/hudi/pull/2405#discussion_r551988234
##
File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
##
@@ -428,10 +429,14 @@ public static Object
codecov-io edited a comment on pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=h1) Report
> Merging
[#2379](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=desc) (4cc4b35)
into
afilipchik commented on a change in pull request #2380:
URL: https://github.com/apache/hudi/pull/2380#discussion_r552037619
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java
##
@@ -0,0 +1,130 @@
+/*
+ *
sivabalan narayanan created HUDI-1507:
-
Summary: Hive sync having issues w/ Clustering
Key: HUDI-1507
URL: https://issues.apache.org/jira/browse/HUDI-1507
Project: Apache Hudi
Issue
afilipchik commented on a change in pull request #2380:
URL: https://github.com/apache/hudi/pull/2380#discussion_r552040233
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java
##
@@ -0,0 +1,130 @@
+/*
+ *
nsivabalan commented on a change in pull request #2402:
URL: https://github.com/apache/hudi/pull/2402#discussion_r552046190
##
File path:
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##
@@ -21,10 +21,10 @@
import
[
https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259040#comment-17259040
]
sivabalan narayanan commented on HUDI-1507:
---
CC : [~satish]
> Hive sync having issues w/
lw309637554 commented on a change in pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#discussion_r551984532
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache
lw309637554 commented on a change in pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#discussion_r551984722
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache
lw309637554 commented on a change in pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#discussion_r551984380
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache
yanghua commented on pull request #2405:
URL: https://github.com/apache/hudi/pull/2405#issuecomment-754694548
@wangxianghu And Travis failed, please check what's wrong...
nsivabalan commented on a change in pull request #2402:
URL: https://github.com/apache/hudi/pull/2402#discussion_r55204
##
File path:
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##
@@ -56,7 +56,7 @@
}
private static Iterable
lw309637554 commented on a change in pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#discussion_r551986565
##
File path:
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -682,6 +693,58 @@ public void
yanghua commented on pull request #2404:
URL: https://github.com/apache/hudi/pull/2404#issuecomment-754682021
@vinothchandar Do you agree with this change?
afilipchik commented on pull request #2380:
URL: https://github.com/apache/hudi/pull/2380#issuecomment-754744538
On making AbstractHoodieKafkaAvroDeserializer abstract - it looks like
a modified Confluent deserializer, so I believe it should be called like that.
If we want to support
Ryan Pifer created HUDI-1508:
Summary: Partition update with global index in MOR tables
resulting in duplicate values during read optimized queries
Key: HUDI-1508
URL: https://issues.apache.org/jira/browse/HUDI-1508
[
https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
satish reassigned HUDI-1507:
Assignee: satish
> Hive sync having issues w/ Clustering
> -
>
>
[
https://issues.apache.org/jira/browse/HUDI-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-1399:
-
Status: Patch Available (was: In Progress)
> support a independent clustering spark job to
codecov-io edited a comment on pull request #2400:
URL: https://github.com/apache/hudi/pull/2400#issuecomment-753557036
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2400?src=pr=h1) Report
> Merging
[#2400](https://codecov.io/gh/apache/hudi/pull/2400?src=pr=desc) (ab40bd6)
into
[
https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-1507:
-
Labels: pull-request-available (was: )
> Hive sync having issues w/ Clustering
>
satishkotha opened a new pull request #2407:
URL: https://github.com/apache/hudi/pull/2407
## What is the purpose of the pull request
Change timeline utils to support reading replacecommit metadata
## Brief change log
HiveSync uses TimelineUtils to get modified
WTa-hash commented on issue #2229:
URL: https://github.com/apache/hudi/issues/2229#issuecomment-754894794
@bvaradar - I would like to understand a little bit more about what's going
on here with the spark stage "Getting small files from partitions" from the
screenshot.
WTa-hash edited a comment on issue #2229:
URL: https://github.com/apache/hudi/issues/2229#issuecomment-754894794
@bvaradar - I would like to understand a little bit more about what's going
on here with the spark stage "Getting small files from partitions" from the
screenshot.
satishkotha commented on a change in pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#discussion_r552133598
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -109,6 +111,9 @@ public static void main(String[]
[
https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Wason updated HUDI-1509:
-
Description:
During the in-house testing for 0.5x to 0.6x release upgrade, I have detected a
[
https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259323#comment-17259323
]
Prashant Wason commented on HUDI-1509:
--
I timed the various code fragments involved in the above
Prashant Wason created HUDI-1509:
Summary: Major performance degradation due to rewriting records
with default values
Key: HUDI-1509
URL: https://issues.apache.org/jira/browse/HUDI-1509
Project:
[
https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259323#comment-17259323
]
Prashant Wason edited comment on HUDI-1509 at 1/6/21, 12:52 AM:
I timed
codecov-io commented on pull request #2407:
URL: https://github.com/apache/hudi/pull/2407#issuecomment-754918776
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2407?src=pr=h1) Report
> Merging
[#2407](https://codecov.io/gh/apache/hudi/pull/2407?src=pr=desc) (88ff431)
into
[
https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-1507:
-
Labels: pull-request-available release-blocker (was: release-blocker)
> Hive sync having issues
[
https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-1507:
--
Labels: release-blocker (was: pull-request-available)
> Hive sync having issues w/
jtmzheng opened a new issue #2408:
URL: https://github.com/apache/hudi/issues/2408
**Describe the problem you faced**
We have a Spark Streaming application running on EMR 5.31.0 that reads from
a Kinesis stream (batch interval of 30 minutes) and upserts to a MOR dataset
that is
[
https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Wason updated HUDI-1509:
-
Fix Version/s: 0.7.0
> Major performance degradation due to rewriting records with default values
[
https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259324#comment-17259324
]
Prashant Wason commented on HUDI-1509:
--
So calling getCombinedFieldsToWrite() is adding 275usec for
[
https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Wason updated HUDI-1509:
-
Affects Version/s: 0.7.0
0.6.1
0.6.0
> Major
[
https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259332#comment-17259332
]
Nishith Agarwal commented on HUDI-1509:
---
[~pwason] Thanks for digging into this and instrumenting
lw309637554 commented on a change in pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#discussion_r552329734
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -153,7 +161,12 @@ private int
wosow opened a new issue #2409:
URL: https://github.com/apache/hudi/issues/2409
Spark Structured Streaming writes to Hudi and synchronizes Hive to create
only read-optimized tables without creating real-time tables, no errors
happening
**Environment Description**
yanghua commented on pull request #2405:
URL: https://github.com/apache/hudi/pull/2405#issuecomment-754997547
@wangxianghu Please check Travis again.
lw309637554 commented on a change in pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#discussion_r552329605
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -109,6 +111,9 @@ public static void main(String[]
lw309637554 commented on a change in pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#discussion_r552318635
##
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -109,6 +111,9 @@ public static void main(String[]
yanghua commented on pull request #2375:
URL: https://github.com/apache/hudi/pull/2375#issuecomment-755056830
> > > Hi @garyli1019. Maybe I think the current implementation is OK.
Because even in a streaming job, we need to accumulate batch records in memory
during the check-point cycle
ivorzhou commented on pull request #2091:
URL: https://github.com/apache/hudi/pull/2091#issuecomment-755121964
> @ivorzhou : is the requirement to set default value or value from previous
version of the record? if previous version of the record, then guess we already
have another PR for
[
https://issues.apache.org/jira/browse/HUDI-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Udit Mehrotra updated HUDI-1510:
Component/s: (was: Writer Core)
(was: Common Core)
umehrot2 opened a new pull request #2410:
URL: https://github.com/apache/hudi/pull/2410
## What is the purpose of the pull request
Moves HoodieEngineContext class and its dependencies to hudi-common, so that
we can parallelize fetching of files and partitions in
[
https://issues.apache.org/jira/browse/HUDI-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-1510:
-
Labels: pull-request-available (was: )
> Move HoodieEngineContext to hudi-common module
>
codecov-io edited a comment on pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=h1) Report
> Merging
[#2379](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=desc) (d53595e)
into
Udit Mehrotra created HUDI-1510:
---
Summary: Move HoodieEngineContext to hudi-common module
Key: HUDI-1510
URL: https://issues.apache.org/jira/browse/HUDI-1510
Project: Apache Hudi
Issue Type:
[
https://issues.apache.org/jira/browse/HUDI-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Udit Mehrotra updated HUDI-1510:
Issue Type: Improvement (was: Bug)
> Move HoodieEngineContext to hudi-common module
>
codecov-io edited a comment on pull request #2405:
URL: https://github.com/apache/hudi/pull/2405#issuecomment-754665459
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2405?src=pr=h1) Report
> Merging
[#2405](https://codecov.io/gh/apache/hudi/pull/2405?src=pr=desc) (9bded33)
into
Nieal-Yang commented on pull request #2375:
URL: https://github.com/apache/hudi/pull/2375#issuecomment-755103932
> > > > Hi @garyli1019. Maybe I think the current implementation is OK.
Because even in a streaming job, we need to accumulate batch records in memory
during the check-point
codecov-io edited a comment on pull request #2405:
URL: https://github.com/apache/hudi/pull/2405#issuecomment-754665459