KarthickAN edited a comment on issue #2178:
URL: https://github.com/apache/hudi/issues/2178#issuecomment-710747888
@nsivabalan I tried out the dynamic bloom filter. It seems to be fine: it grows
along with the number of entries dynamically. That's a good feature. Thanks.
However what's
bvaradar commented on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-710713850
@ashishmgofficial : If I need to test with Kafka, I would need a way to
generate both the key and value payloads. Do you have a script to publish
records to Kafka? BTW, yeah, you are
Prashant Wason created HUDI-1346:
Summary: Fix clean and async clean when metadata table is enabled
Key: HUDI-1346
URL: https://issues.apache.org/jira/browse/HUDI-1346
Project: Apache Hudi
[
https://issues.apache.org/jira/browse/HUDI-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Wason updated HUDI-1346:
-
Status: Open (was: New)
> Fix clean and async clean when metadata table is enabled
>
[
https://issues.apache.org/jira/browse/HUDI-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Wason updated HUDI-1346:
-
Status: In Progress (was: Open)
> Fix clean and async clean when metadata table is enabled
>
bvaradar commented on issue #2162:
URL: https://github.com/apache/hudi/issues/2162#issuecomment-710699297
@liujinhui1994 : Did this work?
This is an automated message from the Apache Git Service.
To respond to the message,
bvaradar commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-710699046
@rahulpoptani : Would it be possible to test with the OSS Spark version and
read the snapshot to verify?
codecov-io commented on pull request #2185:
URL: https://github.com/apache/hudi/pull/2185#issuecomment-710695585
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2185?src=pr=h1) Report
> Merging
[#2185](https://codecov.io/gh/apache/hudi/pull/2185?src=pr=desc) into
[
https://issues.apache.org/jira/browse/HUDI-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-1345:
-
Labels: pull-request-available (was: )
> undo Hbase and htrace relocation in hudi-utilities
bhasudha opened a new pull request #2185:
URL: https://github.com/apache/hudi/pull/2185
## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a
pull request.*
## What is the purpose of the
[
https://issues.apache.org/jira/browse/HUDI-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bhavani Sudha updated HUDI-1345:
Summary: undo Hbase and htrace relocation in Hudi-utilities bundle as well
(was: undo base and
[
https://issues.apache.org/jira/browse/HUDI-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bhavani Sudha updated HUDI-1345:
Summary: undo Hbase and htrace relocation in hudi-utilities bundle as well
(was: undo Hbase and
Bhavani Sudha created HUDI-1345:
---
Summary: undo base and htrace relocation in Hudi-utilities bundle
as well
Key: HUDI-1345
URL: https://issues.apache.org/jira/browse/HUDI-1345
Project: Apache Hudi
vinothchandar commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-710520474
So to clarify, GLOBAL_SIMPLE helps when the workload is random writes
affecting every file in each write, for example. But it is indeed slow in the
sense that it'll join against the
nsivabalan commented on pull request #2092:
URL: https://github.com/apache/hudi/pull/2092#issuecomment-710060014
LGTM. Do fix the title and description, since you have fixed the rollback as
well.
Once you are done, let me know, or feel free to go ahead and merge it.
nsivabalan commented on a change in pull request #2092:
URL: https://github.com/apache/hudi/pull/2092#discussion_r506441500
##
File path: hudi-integ-test/src/test/resources/unit-test-cow-dag.yaml
##
@@ -17,23 +17,53 @@ first_insert:
config:
record_size: 7
nsivabalan edited a comment on issue #2178:
URL: https://github.com/apache/hudi/issues/2178#issuecomment-710031904
If you wish to scale the bloom filter size along with the number of entries,
you can try out the dynamic bloom filter.
Remember this is different from hoodie.index.type which
ashishmgofficial edited a comment on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-710034993
I followed these steps :
```
- Took a fresh clone of the release-0.6.0 branch
- Applied the patch provided
- Built and used the jar to run the below commands
ashishmgofficial commented on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-710034993
AvroKafkaSource :
```
spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4 --class
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
nsivabalan edited a comment on issue #2178:
URL: https://github.com/apache/hudi/issues/2178#issuecomment-710031904
If you wish to have dynamic bloom filter that scales its size as the number
of entries increase, you can try it out.
Remember this is different from hoodie.index.type
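(For readers following along: as I recall, the dynamic bloom filter is selected
through the bloom-filter-type write config, separately from hoodie.index.type.
The key names below are a sketch from memory of the 0.6.x line; please verify
them against the Hudi configuration reference for your version.)
```properties
# Hedged sketch: config keys as I recall them, not verified against every release.
hoodie.bloom.index.filter.type=DYNAMIC_V0
# initial sizing hint (same config the simple filter uses)
hoodie.index.bloom.num_entries=60000
# cap on how far the dynamic filter is allowed to grow
hoodie.bloom.index.filter.dynamic.max.entries=100000
```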
ashishmgofficial edited a comment on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-710023639
@bvaradar Isn't ``` --source-ordering-field _ts_ms ``` set? Then
precombine should be looking at _ts_ms for deletion, right?
Delete worked fine for me as
nsivabalan commented on issue #2178:
URL: https://github.com/apache/hudi/issues/2178#issuecomment-710029348
yes, you are right. The bit size used to initialize the bloom filter is a
function of both numEntries and fpp:
(int) Math.ceil(numEntries * (-Math.log(errorRate) / (Math.log(2) *
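The truncated expression above can be sketched as a small self-contained
program. The 1% error rate in the example is an illustrative value, not a
claim about Hudi's defaults:

```java
public class BloomBits {
    // Bit size for a simple bloom filter, per the formula quoted above:
    // bits = ceil(numEntries * -ln(errorRate) / ln(2)^2)
    static int bitSize(int numEntries, double errorRate) {
        return (int) Math.ceil(
            numEntries * (-Math.log(errorRate) / (Math.log(2) * Math.log(2))));
    }

    public static void main(String[] args) {
        // 150 entries at a 1% false-positive rate -> 1438 bits
        System.out.println(bitSize(150, 0.01));
    }
}
```

This is why bumping hoodie.index.bloom.num_entries alone grows the filter
linearly: the error rate only enters through the constant factor.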
ashishmgofficial edited a comment on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-710023639
@bvaradar Isn't ``` --source-ordering-field _ts_ms ``` set? Then
precombine should be looking at _ts_ms for deletion, right?
I checked the same scenario
ashishmgofficial commented on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-710023639
@bvaradar Isn't ``` --source-ordering-field _ts_ms ``` set? Then
precombine should be looking at _ts_ms for deletion, right?
spyzzz commented on issue #2175:
URL: https://github.com/apache/hudi/issues/2175#issuecomment-71914
@naka13 Yes I will, but I'd like to make something cleaner first. Yet it's
really Q
Still, my Avro deserialisation is taking 80% of my Spark jobs' time ...
Dunno yet if there is a way
lw309637554 commented on a change in pull request #2177:
URL: https://github.com/apache/hudi/pull/2177#discussion_r506287585
##
File path:
hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
##
@@ -194,4 +199,31 @@ class TestCOWDataSource extends
leesf commented on a change in pull request #2177:
URL: https://github.com/apache/hudi/pull/2177#discussion_r506276147
##
File path:
hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
##
@@ -194,4 +199,31 @@ class TestCOWDataSource extends
naka13 commented on issue #2175:
URL: https://github.com/apache/hudi/issues/2175#issuecomment-709882868
@spyzzz Would it be possible for you to share the complete code? It'll be
really helpful for others
LeoHsu0802 opened a new issue #2184:
URL: https://github.com/apache/hudi/issues/2184
**Describe the problem you faced**
partition value is duplicated after UPSERT
**Setting in Jupyter Notebook**
```
%%configure -f
{
"conf": {
"spark.jars":
codecov-io edited a comment on pull request #2111:
URL: https://github.com/apache/hudi/pull/2111#issuecomment-708984716
# [Codecov](https://codecov.io/gh/apache/hudi/pull/2111?src=pr=h1) Report
> Merging
[#2111](https://codecov.io/gh/apache/hudi/pull/2111?src=pr=desc) into
spyzzz commented on issue #2175:
URL: https://github.com/apache/hudi/issues/2175#issuecomment-709876261
After some deep research I finally found something. I first tried doing only a
read and write without any transformation and it was way faster (around 500K
in 30s), so I tried step by
KarthickAN edited a comment on issue #2178:
URL: https://github.com/apache/hudi/issues/2178#issuecomment-709875380
@bvaradar @nsivabalan I ran some tests around this issue. I ran the
job after setting the config hoodie.index.bloom.num_entries to 150 and
inspected the file
bvaradar commented on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-709869418
BTW, it looks like both the create and the delete have the same
last_modified_ts, which means that precombine would not have deleted the
records. Is this fake data? If so, can you set the
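The tie on last_modified_ts can be made concrete with a small sketch. Rec and
precombine below are hypothetical stand-ins for precombine semantics, not
Hudi's actual payload API:

```java
public class PrecombineSketch {
    // Hypothetical record: a key, an ordering value (e.g. last_modified_ts),
    // and a delete marker.
    static final class Rec {
        final String key; final long orderingVal; final boolean isDelete;
        Rec(String key, long orderingVal, boolean isDelete) {
            this.key = key; this.orderingVal = orderingVal; this.isDelete = isDelete;
        }
    }

    // Precombine keeps the record with the strictly greater ordering value.
    // With EQUAL ordering values the tie-break is arbitrary (here the first
    // argument wins), so a delete sharing its timestamp with the matching
    // create is not guaranteed to survive.
    static Rec precombine(Rec a, Rec b) {
        return b.orderingVal > a.orderingVal ? b : a;
    }

    public static void main(String[] args) {
        Rec create = new Rec("k1", 100L, false);
        Rec delete = new Rec("k1", 100L, true);
        // Equal timestamps: the create wins the tie here, the delete is lost.
        System.out.println(precombine(create, delete).isDelete); // prints false
    }
}
```

Bumping the delete's ordering value above the create's makes the delete win
deterministically, which is why distinct timestamps matter in this test.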
bvaradar commented on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-709868001
@ashishmgofficial : With your provided avro file, I am able to ingest
without any errors.
```
spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4 --class
bvaradar commented on issue #2174:
URL: https://github.com/apache/hudi/issues/2174#issuecomment-709864864
@halkar : Yes, https://issues.apache.org/jira/browse/HUDI-845 tracks it
bvaradar closed issue #2174:
URL: https://github.com/apache/hudi/issues/2174
[
https://issues.apache.org/jira/browse/HUDI-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215226#comment-17215226
]
Balaji Varadarajan commented on HUDI-845:
-
Yes [~309637554]. This ticket is for tracking general
halkar commented on issue #2174:
URL: https://github.com/apache/hudi/issues/2174#issuecomment-709830826
@bvaradar thanks for confirming. Are there any plans to support concurrent
writes? I'll try to change the logic to not do concurrent writes.
KarthickAN commented on issue #2178:
URL: https://github.com/apache/hudi/issues/2178#issuecomment-709817321
@nsivabalan Please find below my answers
1. That's the average record size. I inspected the parquet files produced
and calculated that based on the metrics I found there.