[GitHub] [incubator-hudi] thesuperzapper commented on issue #780: Cleanup Maven POM/Classpath
thesuperzapper commented on issue #780: Cleanup Maven POM/Classpath URL: https://github.com/apache/incubator-hudi/pull/780#issuecomment-511091893

> @thesuperzapper as for testing, if you can run the demo steps once and confirm there are no NoClassDefFound errors and such, it would be a good start.

After our discussion, I just ran through the demo, and it works properly. If you want, we can just finalise and merge yours, then I can rebase mine? (Unless you really want to make both changes at once)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] eisig commented on issue #779: HoodieDeltaStreamer may insert duplicate record?
eisig commented on issue #779: HoodieDeltaStreamer may insert duplicate record? URL: https://github.com/apache/incubator-hudi/issues/779#issuecomment-511091534

I am testing the master branch.

```
     0 Jul 12 20:01 20190712120106.deltacommit.inflight
   913 Jul 12 20:01 20190712120106.rollback
  1218 Jul 12 19:47 20190712114645.clean
 73378 Jul 12 19:47 20190712114645.deltacommit
  1218 Jul 12 19:46 20190712114551.clean
 62036 Jul 12 19:46 20190712114551.deltacommit
  1218 Jul 12 19:45 20190712114453.clean
 69016 Jul 12 19:45 20190712114453.deltacommit
  1218 Jul 12 19:44 20190712114357.clean
 75986 Jul 12 19:44 20190712114357.deltacommit
  1218 Jul 12 19:43 20190712114302.clean
 65526 Jul 12 19:43 20190712114302.deltacommit
  1218 Jul 12 19:42 20190712114214.clean
 69863 Jul 12 19:42 20190712114214.deltacommit
  1218 Jul 12 19:42 20190712114128.clean
 66374 Jul 12 19:42 20190712114128.deltacommit
  1218 Jul 12 19:41 20190712114049.clean
 60268 Jul 12 19:41 20190712114049.deltacommit
  1218 Jul 12 19:40 20190712114008.clean
 63753 Jul 12 19:40 20190712114008.deltacommit
  1218 Jul 12 19:40 20190712113920.clean
 64645 Jul 12 19:40 20190712113920.deltacommit
  1218 Jul 12 19:39 20190712113835.clean
 64632 Jul 12 19:39 20190712113835.deltacommit
  1218 Jul 12 19:38 20190712113748.clean
 69859 Jul 12 19:38 20190712113748.deltacommit
  1218 Jul 12 19:37 20190712113659.clean
 56778 Jul 12 19:37 20190712113659.deltacommit
  1218 Jul 12 19:36 20190712113616.clean
 67234 Jul 12 19:36 20190712113616.deltacommit
  1218 Jul 12 19:36 20190712113520.clean
 69874 Jul 12 19:36 20190712113520.deltacommit
  1218 Jul 12 19:35 20190712113427.clean
 68984 Jul 12 19:35 20190712113427.deltacommit
  1218 Jul 12 19:34 20190712113340.clean
 65494 Jul 12 19:34 20190712113340.deltacommit
  1218 Jul 12 19:33 20190712113220.clean
105746 Jul 12 19:33 20190712113220.deltacommit
  1218 Jul 12 19:32 20190712113129.clean
 69853 Jul 12 19:32 20190712113129.deltacommit
  1218 Jul 12 19:31 20190712113031.clean
 75100 Jul 12 19:31 20190712113031.deltacommit
  1218 Jul 12 19:30 20190712112927.clean
 70739 Jul 12 19:30 20190712112927.deltacommit
  1218 Jul 12 19:29 20190712112829.clean
 65504 Jul 12 19:29 20190712112829.deltacommit
  1218 Jul 12 19:28 20190712112737.clean
 67232 Jul 12 19:28 20190712112737.deltacommit
  1218 Jul 12 19:27 20190712112638.clean
 64629 Jul 12 19:27 20190712112638.deltacommit
  1218 Jul 12 19:26 20190712112547.clean
 67225 Jul 12 19:26 20190712112547.deltacommit
 61138 Jul 12 19:25 20190712112456.deltacommit
 64626 Jul 12 19:24 20190712112407.deltacommit
   913 Jul 12 16:54 20190712085450.rollback
   913 Jul 12 16:51 20190712085153.rollback
   173 Jul 11 11:36 hoodie.properties
```

order by time desc

```
   4442 Jul 12 19:47 .349fb959-6762-41dc-b657-c3ac2cb0581f-0_20190712082340.log.116_27-5532-38216
   7067 Jul 12 19:47 .927ac226-15f5-49f6-916c-c7789a59d722-0_20190712082937.log.110_24-5532-38213
   5292 Jul 12 19:47 .43d29590-5309-4fa5-9c00-b85fd8e2f23d-0_20190712084151.log.110_23-5532-38212
   5309 Jul 12 19:47 .00b2332d-ee2e-4b13-9081-338afe0688dd-0_20190712080919.log.101_15-5532-38204
   7999 Jul 12 19:47 .2d00af36-27b7-4312-955d-c86cc81291f7-0_20190712092457.log.86_14-5532-38203
   4422 Jul 12 19:47 .3fae4203-735c-462c-99e2-975d1e223bf0-0_20190712080712.log.103_12-5532-38201
   5304 Jul 12 19:47 .4ed26f22-7660-4c03-8c73-ccf9cdf5f35d-0_20190712082340.log.106_9-5532-38198
   4447 Jul 12 19:47 .b17cba40-e664-4e17-b3b0-b3e7e74b005a-0_20190712080712.log.98_10-5532-38199
   6188 Jul 12 19:47 .34cf303a-a4d7-4d8f-8a5b-3d11540535ed-0_20190712082340.log.115_7-5532-38196
   5318 Jul 12 19:47 .f2b1e4f2-4032-40fc-bd48-287a1f0b5b77-0_20190712081407.log.110_8-5532-38197
   4452 Jul 12 19:47 .eb9621b0-59c6-42d9-8b20-1c2a8d15b12b-0_20190712083508.log.102_5-5532-38194
1348740 Jul 12 19:47 5084ee47-9a81-4a21-8557-d1b250f7e16b-0_2-5532-38191_20190712114645.parquet
   5282 Jul 12 19:47 .c87d3580-86fe-40f9-8f6c-7c95cc91caa6-0_20190712084720.log.118_1-5532-38190
   4423 Jul 12 19:46 .00b2332d-ee2e-4b13-9081-338afe0688dd-0_20190712080919.log.100_15-5496-38025
   4423 Jul 12 19:46 .349fb959-6762-41dc-b657-c3ac2cb0581f-0_20190712082340.log.115_13-5496-38023
   4424 Jul 12 19:46 .4ed26f22-7660-4c03-8c73-ccf9cdf5f35d-0_20190712082340.log.105_10-5496-38020
   4442 Jul 12 19:46 .43d29590-5309-4fa5-9c00-b85fd8e2f23d-0_20190712084151.log.109_8-5496-38018
   5286 Jul 12 19:46 .927ac226-15f5-49f6-916c-c7789a59d722-0_20190712082937.log.109_9-5496-38019
   4422 Jul 12 19:46 .eb9621b0-59c6-42d9-8b20-1c2a8d15b12b-0_20190712083508.log.101_5-5496-38015
1348239 Jul 12 19:46 5084ee47-9a81-4a21-8557-d1b250f7e16b-0_2-5496-38012_20190712114551.parquet
   4421
```
[incubator-hudi] branch asf-site updated: Remove --key-generator-class CLI arg for DeltaStreamer.
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new e55816b  Remove --key-generator-class CLI arg for DeltaStreamer.

e55816b is described below

commit e55816bcf48ccb26612c978b0abc38e4c8df1063
Author: Ethan Guo
AuthorDate: Fri Jul 12 16:39:53 2019 -0700

    Remove --key-generator-class CLI arg for DeltaStreamer.
---
 docs/writing_data.md | 6 --
 1 file changed, 6 deletions(-)

```
diff --git a/docs/writing_data.md b/docs/writing_data.md
index c2d1df8..9f5eb2b 100644
--- a/docs/writing_data.md
+++ b/docs/writing_data.md
@@ -42,12 +42,6 @@ Usage: [options]
       parameter "--propsFilePath") can also be passed command line using this
       parameter
       Default: []
---key-generator-class
-      Subclass of com.uber.hoodie.KeyGenerator to generate a HoodieKey from
-      the given avro record. Built in: SimpleKeyGenerator (uses provided field
-      names as recordkey & partitionpath. Nested fields specified via dot
-      notation, e.g: a.b.c)
-      Default: com.uber.hoodie.SimpleKeyGenerator
 --op
       Takes one of these values : UPSERT (default), INSERT (use when input is
       purely new data/inserts to gain speed)
```
[GitHub] [incubator-hudi] n3nash merged pull request #785: Remove --key-generator-class CLI arg for DeltaStreamer
n3nash merged pull request #785: Remove --key-generator-class CLI arg for DeltaStreamer URL: https://github.com/apache/incubator-hudi/pull/785
[GitHub] [incubator-hudi] yihua commented on issue #781: [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties
yihua commented on issue #781: [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties URL: https://github.com/apache/incubator-hudi/pull/781#issuecomment-511049228 @vinothchandar yes, I'll send another PR on the docs change.
[GitHub] [incubator-hudi] vinothchandar commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present
vinothchandar commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-511030769 To summarize: @n3nash is looking into the Avro issue, and @bhasudha is going to try to reproduce the empty path exception as a ramp-up task.
[incubator-hudi] branch master updated: [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties. (#781)
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 621c246  [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties. (#781)

621c246 is described below

commit 621c246fa9ea607a3cd8f33fdc3d8b528315e327
Author: Yihua Guo
AuthorDate: Fri Jul 12 13:45:49 2019 -0700

    [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties. (#781)
---
 .../main/java/com/uber/hoodie/DataSourceUtils.java | 13 +++--
 .../com/uber/hoodie/HoodieSparkSqlWriter.scala     |  5 +-
 .../hoodie/utilities/deltastreamer/DeltaSync.java  |  2 +-
 .../deltastreamer/HoodieDeltaStreamer.java         |  6 ---
 .../hoodie/utilities/TestHoodieDeltaStreamer.java  | 58 +++---
 5 files changed, 62 insertions(+), 22 deletions(-)

```
diff --git a/hoodie-spark/src/main/java/com/uber/hoodie/DataSourceUtils.java b/hoodie-spark/src/main/java/com/uber/hoodie/DataSourceUtils.java
index e7b9494..d700ff6 100644
--- a/hoodie-spark/src/main/java/com/uber/hoodie/DataSourceUtils.java
+++ b/hoodie-spark/src/main/java/com/uber/hoodie/DataSourceUtils.java
@@ -90,10 +90,17 @@ public class DataSourceUtils {
   }

   /**
-   * Create a key generator class via reflection, passing in any configs needed
+   * Create a key generator class via reflection, passing in any configs needed.
+   *
+   * If the class name of key generator is configured through the properties file, i.e., {@code
+   * props}, use the corresponding key generator class; otherwise, use the default key generator
+   * class specified in {@code DataSourceWriteOptions}.
    */
-  public static KeyGenerator createKeyGenerator(String keyGeneratorClass,
-      TypedProperties props) throws IOException {
+  public static KeyGenerator createKeyGenerator(TypedProperties props) throws IOException {
+    String keyGeneratorClass = props.getString(
+        DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY(),
+        DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL()
+    );
     try {
       return (KeyGenerator) ReflectionUtils.loadClass(keyGeneratorClass, props);
     } catch (Throwable e) {
diff --git a/hoodie-spark/src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala b/hoodie-spark/src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala
index cf44e09..414cad4 100644
--- a/hoodie-spark/src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala
+++ b/hoodie-spark/src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala
@@ -84,10 +84,7 @@
     log.info(s"Registered avro schema : ${schema.toString(true)}")

     // Convert to RDD[HoodieRecord]
-    val keyGenerator = DataSourceUtils.createKeyGenerator(
-      parameters(KEYGENERATOR_CLASS_OPT_KEY),
-      toProperties(parameters)
-    )
+    val keyGenerator = DataSourceUtils.createKeyGenerator(toProperties(parameters))
     val genericRecords: RDD[GenericRecord] = AvroConversionUtils.createRdd(df, structName, nameSpace)
     val hoodieAllIncomingRecords = genericRecords.map(gr => {
       val orderingVal = DataSourceUtils.getNestedFieldValAsString(
diff --git a/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/DeltaSync.java b/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/DeltaSync.java
index 89e5c73..00d270b 100644
--- a/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/DeltaSync.java
+++ b/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/DeltaSync.java
@@ -171,7 +171,7 @@ public class DeltaSync implements Serializable {
     refreshTimeline();

     this.transformer = UtilHelpers.createTransformer(cfg.transformerClassName);
-    this.keyGenerator = DataSourceUtils.createKeyGenerator(cfg.keyGeneratorClass, props);
+    this.keyGenerator = DataSourceUtils.createKeyGenerator(props);

     this.formatAdapter = new SourceFormatAdapter(UtilHelpers.createSource(cfg.sourceClassName,
         props, jssc, sparkSession, schemaProvider));
diff --git a/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/HoodieDeltaStreamer.java b/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/HoodieDeltaStreamer.java
index c49f3f8..1951546 100644
--- a/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/HoodieDeltaStreamer.java
+++ b/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/HoodieDeltaStreamer.java
@@ -27,7 +27,6 @@
 import com.beust.jcommander.ParameterException;
 import com.google.common.base.Preconditions;
 import com.uber.hoodie.HoodieWriteClient;
 import com.uber.hoodie.OverwriteWithLatestAvroPayload;
-import com.uber.hoodie.SimpleKeyGenerator;
```
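The pattern this commit adopts — read a class name from the properties with a default fallback, then instantiate it via reflection — can be sketched in a few lines. The following is a minimal Python illustration of that pattern, not Hudi code; the property key and the stand-in classes from `collections` are illustrative only:

```python
import importlib

# Illustrative stand-ins for KEYGENERATOR_CLASS_OPT_KEY and its default value.
KEYGEN_CLASS_KEY = "hoodie.datasource.write.keygenerator.class"
DEFAULT_KEYGEN_CLASS = "collections.OrderedDict"

def create_key_generator(props: dict):
    """Load the configured class reflectively, falling back to a default
    when the property is absent (mirroring props.getString(key, default))."""
    class_name = props.get(KEYGEN_CLASS_KEY, DEFAULT_KEYGEN_CLASS)
    module_name, _, cls_name = class_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, cls_name)()

# With an empty props dict, the default class is instantiated.
gen = create_key_generator({})
```

The benefit over the removed CLI flag is that the same properties file now drives both the Spark datasource path and DeltaStreamer, so the key generator cannot be configured inconsistently between the two.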
[GitHub] [incubator-hudi] n3nash commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
n3nash commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-510983462 @cdmikechen Have you tried running the demo steps to ensure these changes work fine?
[GitHub] [incubator-hudi] n3nash commented on a change in pull request #771: fix error: java.lang.IllegalArgumentException: Can not create a Path from an empty string
n3nash commented on a change in pull request #771: fix error: java.lang.IllegalArgumentException: Can not create a Path from an empty string URL: https://github.com/apache/incubator-hudi/pull/771#discussion_r303093241

File path: hoodie-common/src/main/java/com/uber/hoodie/common/table/view/AbstractTableFileSystemView.java

```
@@ -216,7 +218,9 @@ private void ensurePartitionLoadedCorrectly(String partition) {
     log.info("Building file system view for partition (" + partitionPathStr + ")");

     // Create the path if it does not exist already
-    Path partitionPath = FSUtils.getPartitionPath(metaClient.getBasePath(), partitionPathStr);
```

Review comment: +1
[GitHub] [incubator-hudi] vinothchandar commented on issue #779: HoodieDeltaStreamer may insert duplicate record?
vinothchandar commented on issue #779: HoodieDeltaStreamer may insert duplicate record? URL: https://github.com/apache/incubator-hudi/issues/779#issuecomment-510967073 @eisig Are you testing on master? What version are you using? Given you are disabling compaction, I'm not sure how the ro and rt views could match, since records would not have made their way from the log to the parquet files without compaction. I suspect you somehow end up using `--op INSERT` or `--op BULK_INSERT` instead of UPSERT; I don't see this option in your command above. Can you list the .hoodie folder and a partition for me, so I can look at what files you have underneath?
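The suspicion above rests on the core difference between the two write operations: an upsert routes each record through an index lookup by record key and overwrites the existing copy, while a plain insert appends records without any key lookup, so re-delivered records accumulate as duplicates. A toy Python model of that difference (not Hudi code, just the key/append semantics):

```python
def upsert(table: dict, records):
    # Upsert: look each record up by key; existing keys are overwritten,
    # so replaying the same key never creates a second row.
    for key, value in records:
        table[key] = value
    return table

def insert(table: list, records):
    # Insert: records are appended as-is with no key lookup,
    # so a replayed key shows up as a duplicate row.
    table.extend(records)
    return table

batch = [("id1", "v1"), ("id2", "v1")]
replay = [("id1", "v2")]   # id1 delivered a second time

upserted = upsert(upsert({}, batch), replay)   # 2 rows, id1 now maps to v2
inserted = insert(insert([], batch), replay)   # 3 rows, id1 appears twice
```

This is why `select count(*)` diverging from `select count(distinct id)` on the same table is a strong hint that records bypassed the key-based upsert path.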
[GitHub] [incubator-hudi] vinothchandar commented on issue #779: HoodieDeltaStreamer may insert duplicate record?
vinothchandar commented on issue #779: HoodieDeltaStreamer may insert duplicate record? URL: https://github.com/apache/incubator-hudi/issues/779#issuecomment-510960909 @bvaradar is this related to #775 ?
[GitHub] [incubator-hudi] vinothchandar closed issue #784: Can Hudi delete records?
vinothchandar closed issue #784: Can Hudi delete records? URL: https://github.com/apache/incubator-hudi/issues/784
[GitHub] [incubator-hudi] vinothchandar commented on issue #784: Can Hudi delete records?
vinothchandar commented on issue #784: Can Hudi delete records? URL: https://github.com/apache/incubator-hudi/issues/784#issuecomment-510960314 Yes, you can use the `EmptyRecordPayload` as in #635 to perform hard deletes, or upsert with null for soft deletes.
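Both delete styles mentioned above come down to what the upserted payload merges into the table: a hard delete upserts an empty payload so the key disappears entirely, while a soft delete upserts a record whose fields are null so the row survives with nulled values. A schematic Python model of the distinction (Hudi's actual payload classes and merge logic differ; `TOMBSTONE` is an illustrative stand-in for an empty payload):

```python
TOMBSTONE = object()  # stand-in for an empty payload (hard delete)

def apply_upsert(table: dict, key, payload):
    """Merge one upserted payload into the table: an empty payload
    removes the key; any other payload replaces the stored record."""
    if payload is TOMBSTONE:
        table.pop(key, None)   # hard delete: the record is gone entirely
    else:
        table[key] = payload   # soft delete: row remains, fields nulled
    return table

table = {"id1": {"name": "a"}, "id2": {"name": "b"}}
apply_upsert(table, "id1", TOMBSTONE)         # hard delete id1
apply_upsert(table, "id2", {"name": None})    # soft delete id2
```

Soft deletes keep the key queryable (useful when downstream consumers need to observe the deletion), while hard deletes physically remove the record once the delete is merged.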
[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI
vinothchandar edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-510959002

This indicates general Spark shuffle failures. I'd suggest first running it on a larger cluster, say 20 executors, and then start shrinking.

> Stage 2 is showing that the input size is 1888.8 MB while stage 21 is showing 6.6 GB

That expansion is just for the index checking operation. The output table will not be 6.6 GB.
[GitHub] [incubator-hudi] vinothchandar merged pull request #778: Fixed TableNotFoundException when write with structured streaming
vinothchandar merged pull request #778: Fixed TableNotFoundException when write with structured streaming URL: https://github.com/apache/incubator-hudi/pull/778
[incubator-hudi] branch master updated: Fixed TableNotFoundException when write with structured streaming (#778)
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 11c4121  Fixed TableNotFoundException when write with structured streaming (#778)

11c4121 is described below

commit 11c4121f739d1d00a4ec66b4e243e47602d6ffb4
Author: Ho Tien Vu
AuthorDate: Sat Jul 13 00:17:16 2019 +0800

    Fixed TableNotFoundException when write with structured streaming (#778)

    - When write to a new hoodie table, if checkpoint dir is under target path, Spark will create the base path and thus skip initializing .hoodie which result in error
    - apply .hoodie existent check for all save mode
---
 .../src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

```
diff --git a/hoodie-spark/src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala b/hoodie-spark/src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala
index 35c19aa..cf44e09 100644
--- a/hoodie-spark/src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala
+++ b/hoodie-spark/src/main/scala/com/uber/hoodie/HoodieSparkSqlWriter.scala
@@ -100,23 +100,23 @@ private[hoodie] object HoodieSparkSqlWriter {
     val basePath = new Path(parameters("path"))
     val fs = basePath.getFileSystem(sparkContext.hadoopConfiguration)
-    var exists = fs.exists(basePath)
+    var exists = fs.exists(new Path(basePath, HoodieTableMetaClient.METAFOLDER_NAME))

     // Handle various save modes
     if (mode == SaveMode.ErrorIfExists && exists) {
-      throw new HoodieException(s"basePath ${basePath} already exists.")
+      throw new HoodieException(s"hoodie dataset at $basePath already exists.")
     }
     if (mode == SaveMode.Ignore && exists) {
-      log.warn(s" basePath ${basePath} already exists. Ignoring & not performing actual writes.")
+      log.warn(s"hoodie dataset at $basePath already exists. Ignoring & not performing actual writes.")
       return (true, None)
     }
     if (mode == SaveMode.Overwrite && exists) {
-      log.warn(s" basePath ${basePath} already exists. Deleting existing data & overwriting with new data.")
+      log.warn(s"hoodie dataset at $basePath already exists. Deleting existing data & overwriting with new data.")
       fs.delete(basePath, true)
       exists = false
     }

-    // Create the dataset if not present (APPEND mode)
+    // Create the dataset if not present
     if (!exists) {
       HoodieTableMetaClient.initTableType(sparkContext.hadoopConfiguration, path.get, storageType,
         tblName.get, "archived")
```
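The essence of the fix is that "does the dataset exist?" now means "does the `.hoodie` metadata folder exist?" rather than "does the base path exist?", so a base path pre-created by Spark (e.g. to hold a checkpoint dir) no longer masks an uninitialized table. The save-mode decision tree can be sketched as follows (a simplified Python model, not the actual Scala; the returned action strings are illustrative):

```python
METAFOLDER_NAME = ".hoodie"  # the dataset "exists" only if this folder does

def handle_save_mode(mode: str, metafolder_exists: bool) -> str:
    """Decide the write action from the save mode and whether the
    .hoodie metadata folder (not merely the base path) is present."""
    if mode == "ErrorIfExists" and metafolder_exists:
        raise RuntimeError("hoodie dataset already exists.")
    if mode == "Ignore" and metafolder_exists:
        return "skip"        # ignore: perform no actual writes
    if mode == "Overwrite" and metafolder_exists:
        return "overwrite"   # delete existing data, then re-initialize
    if not metafolder_exists:
        return "init"        # initialize the dataset before writing
    return "append"
```

Note that `init` is reached for any mode when the metadata folder is absent, which is exactly the behavior the commit generalizes from the Append-only case.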
[GitHub] [incubator-hudi] eisig edited a comment on issue #779: HoodieDeltaStreamer may insert duplicate record?
eisig edited a comment on issue #779: HoodieDeltaStreamer may insert duplicate record? URL: https://github.com/apache/incubator-hudi/issues/779#issuecomment-510828768

I have restarted the job several times, and added --disable-compaction. Other results seem wrong.

```
select count(*) count1, count(distinct id) count2 from hive200.test.t_order_mor03_rt
select count(*) count3, count(distinct id) count4 from hive200.test.t_order_mor03
```

count1 == count3, count2 == count4, but count1 != count2 and count3 != count4

```
select (select max(_hoodie_commit_time) from hive200.test.t_order_mor03),
       (select max(_hoodie_commit_time) from hive200.test.t_order_mor03_rt)
```

The _hoodie_commit_time values are always the same.

```
select count(*) count from hive200.test.t_order_mor03_rt rt
join hive200.test.t_order_mor03 ro on ro.id = rt.id
where rt.modify_date != ro.modify_date
```

The count keeps going up.
[GitHub] [incubator-hudi] hotienvu commented on issue #778: Fixed TableNotFoundException when write with structured streaming
hotienvu commented on issue #778: Fixed TableNotFoundException when write with structured streaming URL: https://github.com/apache/incubator-hudi/pull/778#issuecomment-510830795 @vinothchandar commits squashed. thanks for looking into this.
[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-510818215 The failures are: ``` org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 3 at org.apache.spark.MapOutputTracker$$anonfun$convertMapStatuses$2.apply(MapOutputTracker.scala:882) at org.apache.spark.MapOutputTracker$$anonfun$convertMapStatuses$2.apply(MapOutputTracker.scala:878) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at org.apache.spark.MapOutputTracker$.convertMapStatuses(MapOutputTracker.scala:878) at org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:691) at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:148) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:137) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:137) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337) at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335) at org.apache.spark.rdd.RDD.iterator(RDD.scala:286) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ``` In addition, stage 2 is showing that the input size is 1888.8 MB while stage 21 is showing 6.6 GB. Is this showing that a total of 6.6 GB is written as a hoodie modeled table? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] NetsanetGeb commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI
NetsanetGeb commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-510818215 The failures are: ``` org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 3 at org.apache.spark.MapOutputTracker$$anonfun$convertMapStatuses$2.apply(MapOutputTracker.scala:882) at org.apache.spark.MapOutputTracker$$anonfun$convertMapStatuses$2.apply(MapOutputTracker.scala:878) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at org.apache.spark.MapOutputTracker$.convertMapStatuses(MapOutputTracker.scala:878) at org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:691) at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:148) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:137) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:137) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337) at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335) at org.apache.spark.rdd.RDD.iterator(RDD.scala:286) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)``` In addition, stage 2 is showing that the input size is 1888.8 MB while stage 21 is showing 6.6 GB. Is this showing that a total of 6.6 GB is written as a hoodie modeled table? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
[GitHub] [incubator-hudi] zhangxinjian123 opened a new issue #784: Can Hudi delete records?
zhangxinjian123 opened a new issue #784: Can Hudi delete records? URL: https://github.com/apache/incubator-hudi/issues/784 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] vinothchandar opened a new pull request #783: Updating site with latest content from docs folder
vinothchandar opened a new pull request #783: Updating site with latest content from docs folder URL: https://github.com/apache/incubator-hudi/pull/783 - yotpo usage - hoodie-utilities-bundle jar replacement in deltastreamer commands This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services