Build failed in Jenkins: hudi-snapshot-deployment-0.5 #211

2020-03-08 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.38 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
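(For context, the pom pattern behind this warning looks roughly like the following hypothetical fragment — Maven wants a literal artifactId, but the module presumably parameterizes it to cross-build against Scala binary versions:)

```xml
<!-- Hypothetical fragment: an artifactId built from a property triggers the
     "'artifactId' contains an expression but should be a constant" warning -->
<artifactId>hudi-spark_${scala.binary.version}</artifactId>
```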
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
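(This warning, by contrast, comes from the same plugin appearing twice under one `<build><plugins>` section; a hypothetical fragment that would reproduce it:)

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.jacoco</groupId>
      <artifactId>jacoco-maven-plugin</artifactId>
    </plugin>
    <!-- a second declaration of the same (groupId:artifactId) is flagged as a duplicate -->
    <plugin>
      <groupId>org.jacoco</groupId>
      <artifactId>jacoco-maven-plugin</artifactId>
    </plugin>
  </plugins>
</build>
```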
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-08 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r389449973
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameWriter;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.execution.datasources.DataSource;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--source-base-path"}, description = "Base path for the source Hudi dataset to be snapshotted", required = true)
+    String sourceBasePath = null;
+
+    @Parameter(names = {"--target-base-path"}, description = "Base path for the target output files (snapshots)", required = true)
+    String targetOutputPath = null;
+
+    @Parameter(names = {"--snapshot-prefix"}, description = "Snapshot prefix or directory under the target base path in order to segregate different snapshots")
 
 Review comment:
   I see there are some gaps... this param `snapshot-prefix` is meant to let 
users export to a specific output directory. For example, with 
--target-base-path=/mytable/ and --snapshot-prefix=2020/03/03, the output data 
will reside in `/mytable/2020/03/03/`. After removing --snapshot-prefix, users 
will simply set --target-base-path=/mytable/2020/03/03/. This is a redundant 
param that should be removed from the RFC document too.
   
   It is not meant to be used with --source-base-path. Users will give the right 
base path to the Hudi dataset to export, like --source-base-path=/myhuditable/
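The path composition being described can be sketched with plain `java.nio.file` paths (a hypothetical illustration; the exporter itself works with Hadoop `Path`s):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SnapshotPrefixDemo {
    public static void main(String[] args) {
        // With the prefix: target base path plus --snapshot-prefix
        Path withPrefix = Paths.get("/mytable").resolve("2020/03/03");
        // Without it: the user passes the full target path directly
        Path withoutPrefix = Paths.get("/mytable/2020/03/03");
        // Both spellings land on the same directory, which is why the
        // prefix param is argued to be redundant
        System.out.println(withPrefix.equals(withoutPrefix));
    }
}
```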


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-08 Thread GitBox
OpenOpened commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot 
Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#issuecomment-596311930
 
 
   @xushiyan an `@Experimental` annotation doesn't seem to exist in JDK 8, so I 
just added comments for HoodieSnapshotExporter.
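For reference, projects that want such a marker usually declare their own annotation; a minimal sketch (hypothetical — not what this PR did, since it went with plain comments):

```java
import java.lang.annotation.*;

public class ExperimentalDemo {
    // A project-defined marker annotation; RUNTIME retention is used here
    // only so the demo below can observe it via reflection.
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Experimental {}

    @Experimental
    static class HoodieSnapshotExporterStub {}

    public static void main(String[] args) {
        System.out.println(
            HoodieSnapshotExporterStub.class.isAnnotationPresent(Experimental.class));
    }
}
```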


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1386: [HUDI-676] Address issues towards removing use of WIP Disclaimer

2020-03-08 Thread GitBox
codecov-io edited a comment on issue #1386: [HUDI-676] Address issues towards 
removing use of WIP Disclaimer
URL: https://github.com/apache/incubator-hudi/pull/1386#issuecomment-596163892
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1386?src=pr=h1) 
Report
   > Merging 
[#1386](https://codecov.io/gh/apache/incubator-hudi/pull/1386?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/415882f9023795994e9cc8a8294909bbec7ab191?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1386/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1386?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #1386     +/-  ##
   ===========================================
     Coverage     67.19%     67.19%          
     Complexity      223        223          
   ===========================================
     Files           335        335          
     Lines         16279      16279          
     Branches       1661       1661          
   ===========================================
     Hits          10939      10939          
   - Misses         4604       4605       +1 
   + Partials        736        735       -1 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1386?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1386/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `25% <0%> (-50%)` | `0% <0%> (ø)` | |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1386/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0%> (-5.56%)` | `0% <0%> (ø)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1386/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `76.92% <0%> (+0.96%)` | `0% <0%> (ø)` | :arrow_down: |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1386/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `80.8% <0%> (+1.01%)` | `8% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1386?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1386?src=pr=footer).
 Last update 
[415882f...ac2ff6d](https://codecov.io/gh/apache/incubator-hudi/pull/1386?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-08 Thread GitBox
OpenOpened commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r389444912
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameWriter;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.execution.datasources.DataSource;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--source-base-path"}, description = "Base path for the source Hudi dataset to be snapshotted", required = true)
+    String sourceBasePath = null;
+
+    @Parameter(names = {"--target-base-path"}, description = "Base path for the target output files (snapshots)", required = true)
+    String targetOutputPath = null;
+
+    @Parameter(names = {"--snapshot-prefix"}, description = "Snapshot prefix or directory under the target base path in order to segregate different snapshots")
 
 Review comment:
   I don't think we can delete this parameter. We rely on the `.hoodie` metadata 
in the root directory of the dataset to find things like the commit time, valid 
parquet files, etc. If you point directly at the folder that needs to be 
exported, such as ROOT/2015/03/16 in the test case, an exception will be thrown: 
`Hoodie table not found in path /tmp/junit4184871464195097137/2015/03/16/.hoodie`. 
I agree with @leesf, modify the comment of `--snapshot-prefix`.
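The constraint being described can be sketched as follows (assumption for illustration only: a Hudi table is recognized by a `.hoodie` directory directly under its base path):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class HoodieRootCheck {
    // Sketch: a base path only looks like a Hudi table if ".hoodie" sits
    // directly under it.
    static boolean looksLikeHoodieTable(Path basePath) {
        return Files.isDirectory(basePath.resolve(".hoodie"));
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("mytable");
        Files.createDirectory(root.resolve(".hoodie"));   // metadata at the table root
        Path partition = Files.createDirectories(root.resolve("2015/03/16"));

        System.out.println(looksLikeHoodieTable(root));      // the root is a valid source
        System.out.println(looksLikeHoodieTable(partition)); // a partition path is not
    }
}
```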


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054622#comment-17054622
 ] 

vinoyang commented on HUDI-662:
---

+1 too

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html is the comprehensive guide 
> here.
> Previously, we asked about some specific dependencies here
> https://issues.apache.org/jira/browse/LEGAL-461



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1386: [HUDI-676] Address issues towards removing use of WIP Disclaimer

2020-03-08 Thread GitBox
yanghua commented on a change in pull request #1386: [HUDI-676] Address issues 
towards removing use of WIP Disclaimer
URL: https://github.com/apache/incubator-hudi/pull/1386#discussion_r389437581
 
 

 ##
 File path: DISCLAIMER-STANDARD
 ##
 @@ -0,0 +1,10 @@
+Apache Hudi (incubating) is an effort undergoing incubation
 
 Review comment:
   Done, please review again cc @vinothchandar 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-681) Remove the dependency of EmbeddedTimelineService from HoodieReadClient

2020-03-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-681:

Labels: pull-request-available  (was: )

> Remove the dependency of EmbeddedTimelineService from HoodieReadClient
> --
>
> Key: HUDI-681
> URL: https://issues.apache.org/jira/browse/HUDI-681
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> After decoupling {{HoodieReadClient}} and {{AbstractHoodieClient}}, we can 
> remove the {{EmbeddedTimelineService}} from {{HoodieReadClient}} so that we 
> can move {{HoodieReadClient}} into the hudi-spark module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-08 Thread GitBox
lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset 
not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-596299950
 
 
   Will come back after #1373; it may need more time for debugging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] hddong opened a new pull request #1388: [HUDI-681]Remove embeddedTimelineService from HoodieReadClient

2020-03-08 Thread GitBox
hddong opened a new pull request #1388: [HUDI-681]Remove 
embeddedTimelineService from HoodieReadClient
URL: https://github.com/apache/incubator-hudi/pull/1388
 
 
   ## What is the purpose of the pull request
   
   *Remove the `EmbeddedTimelineService` from `HoodieReadClient`, so that we 
can move `HoodieReadClient` into the hudi-spark module*
   
   ## Brief change log
   
 - *Remove embeddedTimelineService from HoodieReadClient*
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #1387: [HUDI-674] Rename hudi-hadoop-mr-bundle to hudi-hive-bundle

2020-03-08 Thread GitBox
lamber-ken commented on issue #1387: [HUDI-674] Rename hudi-hadoop-mr-bundle to 
hudi-hive-bundle
URL: https://github.com/apache/incubator-hudi/pull/1387#issuecomment-596297205
 
 
   Hi @bvaradar, reasonable. Keep it WIP until the next release version.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread Suneel Marthi (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054575#comment-17054575
 ] 

Suneel Marthi commented on HUDI-662:


+1 to this. 

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html is the comprehensive guide 
> here.
> Previously, we asked about some specific dependencies here
> https://issues.apache.org/jira/browse/LEGAL-461



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054532#comment-17054532
 ] 

Vinoth Chandar edited comment on HUDI-662 at 3/8/20, 10:37 PM:
---

cc [~vinoyang] [~smarthi] This seems like a model we can emulate .. NOTICE is 
supposed to be for transitively carrying over all the NOTICEs from the 
dependencies..

Pasting from ASLV2 

{code}
  (d) If the Work includes a "NOTICE" text file as part of its
  distribution, then any Derivative Works that You distribute must
  include a readable copy of the attribution notices contained
  within such NOTICE file, excluding those notices that do not
  pertain to any part of the Derivative Works, in at least one
  of the following places: within a NOTICE text file distributed
  as part of the Derivative Works; within the Source form or
  documentation, if provided along with the Derivative Works; or,
  within a display generated by the Derivative Works, if and
  wherever such third-party notices normally appear. The contents
  of the NOTICE file are for informational purposes only and
  do not modify the License. You may add Your own attribution
  notices within Derivative Works that You distribute, alongside
  or as an addendum to the NOTICE text from the Work, provided
  that such additional attribution notices cannot be construed
  as modifying the License.
{code}

Is our NOTICE file complete in these respects? Previously, [~vbalaji] was 
trying to automate this by concatenating all the NOTICE files from dependencies 
(you may remember the super long NOTICE file that we kept trimming and 
expanding).

We have a scripts/releases/generate_notice.sh which should ideally produce a 
concatenated NOTICE from all the jars. Not sure if it's working as intended, 
but this seems like a good model to try.

In short:
- List all the bundled dependencies and source dependencies with licenses in 
LICENSE.
- Fix the script and generate a NOTICE based on the dependencies' NOTICE files.

Is it worth raising a LEGAL JIRA to confirm that Maven Central distribution of 
these bundles does count as a binary distribution and thus we need to attribute 
them?
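The concatenation idea can be sketched in a few lines (a self-contained toy: it builds a throwaway jar with a META-INF/NOTICE entry and reads it back; the real generate_notice.sh may work quite differently):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class NoticeConcat {
    // Extract META-INF/NOTICE from a jar, or null if the jar has none.
    static String readNotice(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            JarEntry entry = jar.getJarEntry("META-INF/NOTICE");
            if (entry == null) return null;
            try (InputStream in = jar.getInputStream(entry)) {
                return new String(in.readAllBytes());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway jar with a NOTICE entry so the sketch is self-contained.
        Path jarPath = Files.createTempFile("dep", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jarPath))) {
            out.putNextEntry(new JarEntry("META-INF/NOTICE"));
            out.write("Sample Dependency\nCopyright 2020 Example Corp\n".getBytes());
            out.closeEntry();
        }
        // A real tool would loop over every bundled jar and append each NOTICE.
        System.out.print(readNotice(jarPath));
    }
}
```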


was (Author: vc):
cc [~vinoyang] [~smarthi] This seems like a model we can emulate .. NOTICE is 
supposed to be for transitively carrying over all the NOTICEs from the 
dependencies..

Pasting from ASLV2 

{code}
  (d) If the Work includes a "NOTICE" text file as part of its
  distribution, then any Derivative Works that You distribute must
  include a readable copy of the attribution notices contained
  within such NOTICE file, excluding those notices that do not
  pertain to any part of the Derivative Works, in at least one
  of the following places: within a NOTICE text file distributed
  as part of the Derivative Works; within the Source form or
  documentation, if provided along with the Derivative Works; or,
  within a display generated by the Derivative Works, if and
  wherever such third-party notices normally appear. The contents
  of the NOTICE file are for informational purposes only and
  do not modify the License. You may add Your own attribution
  notices within Derivative Works that You distribute, alongside
  or as an addendum to the NOTICE text from the Work, provided
  that such additional attribution notices cannot be construed
  as modifying the License.
{code}

is our NOTICE file complete in these respects? Previously, [~vbalaji] was 
trying to automate this by concatenating all the NOTICE files from dependencies 
(you may remember the super long NOTICE file that we kept trimming and 
expanding)

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html is the comprehensive guide 
> here.
> Previously, we asked about some specific dependencies here
> https://issues.apache.org/jira/browse/LEGAL-461



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054532#comment-17054532
 ] 

Vinoth Chandar commented on HUDI-662:
-

cc [~vinoyang] [~smarthi] This seems like a model we can emulate .. NOTICE is 
supposed to be for transitively carrying over all the NOTICEs from the 
dependencies..

Pasting from ASLV2 

{code}
  (d) If the Work includes a "NOTICE" text file as part of its
  distribution, then any Derivative Works that You distribute must
  include a readable copy of the attribution notices contained
  within such NOTICE file, excluding those notices that do not
  pertain to any part of the Derivative Works, in at least one
  of the following places: within a NOTICE text file distributed
  as part of the Derivative Works; within the Source form or
  documentation, if provided along with the Derivative Works; or,
  within a display generated by the Derivative Works, if and
  wherever such third-party notices normally appear. The contents
  of the NOTICE file are for informational purposes only and
  do not modify the License. You may add Your own attribution
  notices within Derivative Works that You distribute, alongside
  or as an addendum to the NOTICE text from the Work, provided
  that such additional attribution notices cannot be construed
  as modifying the License.
{code}

is our NOTICE file complete in these respects? Previously, [~vbalaji] was 
trying to automate this by concatenating all the NOTICE files from dependencies 
(you may remember the super long NOTICE file that we kept trimming and 
expanding)

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html is the comprehensive guide 
> here.
> Previously, we asked about some specific dependencies here
> https://issues.apache.org/jira/browse/LEGAL-461



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054530#comment-17054530
 ] 

Vinoth Chandar commented on HUDI-662:
-

Spark follows a similar model

https://github.com/apache/spark/blob/master/LICENSE (calls out reused sources 
grouped by license)
https://github.com/apache/spark/blob/master/LICENSE-binary (calls out licenses 
of bundled dependencies grouped by license)

https://github.com/apache/spark/blob/master/NOTICE (seems to be more about 
calling out specific advisories .. ) 
https://github.com/apache/spark/blob/master/NOTICE-binary (seems to list 
homepage/license location of various bundled dependencies and if there are 
specific notices in those)



> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html is the comprehensive guide 
> here.
> Previously, we asked about some specific dependencies here
> https://issues.apache.org/jira/browse/LEGAL-461





[jira] [Updated] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-662:

Description: 
https://www.apache.org/legal/resolved.html is the comprehensive guide here.
http://www.apache.org/dev/licensing-howto.html is the comprehensive guide here.


Previously, we asked about some specific dependencies here
https://issues.apache.org/jira/browse/LEGAL-461



  was:
https://www.apache.org/legal/resolved.html is the comprehensive guide here.
http://www.apache.org/dev/licensing-howto.html





> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html is the comprehensive guide 
> here.
> Previously, we asked about some specific dependencies here
> https://issues.apache.org/jira/browse/LEGAL-461





[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054528#comment-17054528
 ] 

Vinoth Chandar commented on HUDI-662:
-

Skywalking also calls out source reuses in 
https://github.com/apache/skywalking/blob/master/LICENSE


> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html





[jira] [Updated] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-662:

Description: 
https://www.apache.org/legal/resolved.html is the comprehensive guide here.
http://www.apache.org/dev/licensing-howto.html




  was:
https://www.apache.org/legal/resolved.html is the comprehensive guide here.




> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.
> http://www.apache.org/dev/licensing-howto.html





[GitHub] [incubator-hudi] bvaradar commented on issue #1387: [HUDI-674] Rename hudi-hadoop-mr-bundle to hudi-hive-bundle

2020-03-08 Thread GitBox
bvaradar commented on issue #1387: [HUDI-674] Rename hudi-hadoop-mr-bundle to 
hudi-hive-bundle
URL: https://github.com/apache/incubator-hudi/pull/1387#issuecomment-596256048
 
 
   I went through the relevant DISCUSS thread. @lamber-ken : I strongly feel we 
should at least wait until the next release for this change. I agree with 
changing hudi-hive to hudi-hive-sync. But changing hudi-hadoop-mr to hudi-hive 
in the same release would be very confusing to users who are running previous 
versions of Hudi in production. Let me know your thoughts.
   
cc @vinothchandar  
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-08 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-662:

Description: 
https://www.apache.org/legal/resolved.html is the comprehensive guide here.



> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>
> https://www.apache.org/legal/resolved.html is the comprehensive guide here.





[GitHub] [incubator-hudi] smarthi commented on a change in pull request #1386: [HUDI-676] Address issues towards removing use of WIP Disclaimer

2020-03-08 Thread GitBox
smarthi commented on a change in pull request #1386: [HUDI-676] Address issues 
towards removing use of WIP Disclaimer
URL: https://github.com/apache/incubator-hudi/pull/1386#discussion_r389391341
 
 

 ##
 File path: DISCLAIMER-STANDARD
 ##
 @@ -0,0 +1,10 @@
+Apache Hudi (incubating) is an effort undergoing incubation
 
 Review comment:
   +1 - please change to just DISCLAIMER.




[jira] [Assigned] (HUDI-646) Re-enable TestUpdateSchemaEvolution after triaging weird CI issue

2020-03-08 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-646:
---

Assignee: lamber-ken  (was: Vinoth Chandar)

> Re-enable TestUpdateSchemaEvolution after triaging weird CI issue
> -
>
> Key: HUDI-646
> URL: https://issues.apache.org/jira/browse/HUDI-646
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Testing
>Reporter: Vinoth Chandar
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/incubator-hudi/pull/1346/commits/5b20891619380a66e2a62c9e57fb28c4f5ed948b
>  undo this
> {code}
> Job aborted due to stage failure: Task 7 in stage 1.0 failed 1 times, most 
> recent failure: Lost task 7.0 in stage 1.0 (TID 15, localhost, executor 
> driver): org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file 
> file:/tmp/junit3406952253616234024/2016/01/31/f1-0_7-0-7_100.parquet
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
>   at 
> org.apache.hudi.common.util.ParquetUtils.readAvroRecords(ParquetUtils.java:190)
>   at 
> org.apache.hudi.client.TestUpdateSchemaEvolution.lambda$testSchemaEvolutionOnUpdate$dfb2f24e$1(TestUpdateSchemaEvolution.java:123)
>   at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1334)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsupportedOperationException: Byte-buffer read 
> unsupported by input stream
>   at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:146)
>   at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
>   at 
> org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
>   at 
> org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
>   at 
> org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:127)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
>   ... 29 more
> {code}
> Only happens on Travis; succeeds locally.
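The `UnsupportedOperationException` in the cause chain above comes from Hadoop's `FSDataInputStream`, which forwards `read(ByteBuffer)` to the wrapped stream only when that stream implements `ByteBufferReadable`, and otherwise throws. Below is a self-contained sketch of that dispatch pattern; the type and class names here are simplified stand-ins for the real Hadoop classes, not their actual signatures:

```java
import java.io.FilterInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical stand-in for Hadoop's ByteBufferReadable capability interface.
interface ByteBufferReadable {
    int read(ByteBuffer buf);
}

// Hypothetical stand-in for FSDataInputStream's delegation behavior.
class WrappedDataInputStream extends FilterInputStream {
    WrappedDataInputStream(InputStream in) {
        super(in);
    }

    // Forward byte-buffer reads only when the underlying stream supports
    // them; otherwise fail, matching the exception in the stack trace above.
    public int read(ByteBuffer buf) {
        if (in instanceof ByteBufferReadable) {
            return ((ByteBufferReadable) in).read(buf);
        }
        throw new UnsupportedOperationException("Byte-buffer read unsupported by input stream");
    }
}
```

In other words, the failure depends on which concrete stream the filesystem hands Parquet, which is consistent with the test passing locally but failing in a different CI environment.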

[GitHub] [incubator-hudi] vinothchandar commented on issue #1384: [SUPPORT] Hudi datastore missing updates for many records

2020-03-08 Thread GitBox
vinothchandar commented on issue #1384: [SUPPORT] Hudi datastore missing 
updates for many records
URL: https://github.com/apache/incubator-hudi/issues/1384#issuecomment-596226463
 
 
   Hmm, the datasource does fail the commit if there are such errors:
   
   ```
} else {
  log.error(s"$operation failed with $errorCount errors :")
  if (log.isTraceEnabled) {
    log.trace("Printing out the top 100 errors")
    writeStatuses.rdd.filter(ws => ws.hasErrors)
      .take(100)
      .foreach(ws => {
        log.trace("Global error :", ws.getGlobalError)
        if (ws.getErrors.size() > 0) {
          ws.getErrors.foreach(kt =>
            log.trace(s"Error for key: ${kt._1}", kt._2))
        }
      })
  }
  false
}
   ``` 
   
   In any case, having some information on the workload, MOR vs. COW, and the % 
of missing records would help us debug further. Did you also have the issue on 
0.4.7, or only after you upgraded to 0.5.1? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1386: [HUDI-676] Address issues towards removing use of WIP Disclaimer

2020-03-08 Thread GitBox
vinothchandar commented on a change in pull request #1386: [HUDI-676] Address 
issues towards removing use of WIP Disclaimer
URL: https://github.com/apache/incubator-hudi/pull/1386#discussion_r389386255
 
 

 ##
 File path: DISCLAIMER-STANDARD
 ##
 @@ -0,0 +1,10 @@
+Apache Hudi (incubating) is an effort undergoing incubation
 
 Review comment:
   Should it be named just `DISCLAIMER`, or does a doc explicitly want it named 
`DISCLAIMER-STANDARD`?




[jira] [Commented] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-08 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054466#comment-17054466
 ] 

Vinoth Chandar commented on HUDI-677:
-

Sounds good. This is ultra critical for the project, so please keep me in the 
loop :)

 

 

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: hong dongdong
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Assigned] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-08 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-677:
---

Assignee: hong dongdong  (was: Vinoth Chandar)

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: hong dongdong
>Priority: Major
> Fix For: 0.6.0
>
>






[GitHub] [incubator-hudi] vinothchandar commented on issue #1387: [HUDI-674] Rename hudi-hadoop-mr-bundle to hudi-hive-bundle

2020-03-08 Thread GitBox
vinothchandar commented on issue #1387: [HUDI-674] Rename hudi-hadoop-mr-bundle 
to hudi-hive-bundle
URL: https://github.com/apache/incubator-hudi/pull/1387#issuecomment-596225503
 
 
   @bvaradar can you take a pass at this? 




[incubator-hudi] branch release-0.5.2 updated: [HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files (#1354)

2020-03-08 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch release-0.5.2
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/release-0.5.2 by this push:
 new e9f114f  [HUDI-581] NOTICE need more work as it missing content form 
included 3rd party ALv2 licensed NOTICE files (#1354)
e9f114f is described below

commit e9f114f3dd8673b8b66b024e62cc1e3e808e7ea9
Author: Suneel Marthi 
AuthorDate: Sat Mar 7 22:08:35 2020 -0500

[HUDI-581] NOTICE need more work as it missing content form included 3rd 
party ALv2 licensed NOTICE files (#1354)

* [HUDI-581] - Add 3rd party library NOTICE

* [HUDI-581]: NOTICE need more work as it missing content form included 3rd 
party ALv2 licensed NOTICE files
---
 NOTICE | 83 +-
 1 file changed, 82 insertions(+), 1 deletion(-)

diff --git a/NOTICE b/NOTICE
index c2961ac..c0469fa 100644
--- a/NOTICE
+++ b/NOTICE
@@ -1,5 +1,86 @@
 Apache Hudi (incubating)
-Copyright 2020 The Apache Software Foundation
+Copyright 2019 and onwards The Apache Software Foundation
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
+
+This project bundles the following dependencies
+
+
+Metrics
+Copyright 2010-2013 Coda Hale and Yammer, Inc.
+
+This product includes software developed by Coda Hale and Yammer, Inc.
+
+-
+Guava
+Copyright (C) 2007 The Guava Authors
+
+Licensed under the Apache License, Version 2.0
+
+-
+Kryo (https://github.com/EsotericSoftware/kryo)
+Copyright (c) 2008-2018, Nathan Sweet All rights reserved.
+
+Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the
+following conditions are met:
+
+Redistributions of source code must retain the above copyright notice, this 
list of conditions and the following disclaimer.
+Redistributions in binary form must reproduce the above copyright notice, this 
list of conditions and the following disclaimer in the documentation and/or 
other materials provided with the distribution.
+
+Neither the name of Esoteric Software nor the names of its contributors may be 
used to endorse or promote products derived from this software without specific 
prior written permission.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 
SERVICES; LOSS OF USE, DATA, OR PROF [...]
+
+
+Jackson JSON Processor
+
+This copy of Jackson JSON processor streaming parser/generator is licensed 
under the
+Apache (Software) License, version 2.0 ("the License").
+See the License for details about distribution rights, and the
+specific rights regarding derivate works.
+
+You may obtain a copy of the License at:
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+--
+
+Gson
+Copyright 2008 Google Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+
+= Apache Hadoop 2.8.5 =
+Apache Hadoop
+Copyright 2009-2017 The Apache Software Foundation
+
+= Apache Hive 2.3.1 =
+Apache Hive
+Copyright 2008-2017 The Apache Software Foundation
+
+= Apache Spark 2.4.4 =
+Apache Spark
+Copyright 2014 and onwards The Apache Software Foundation
+
+= Apache Kafka 2.0.0 =
+Apache Kafka
+Copyright 2020 The Apache Software Foundation.
+
+= Apache HBase 1.2.3 =
+Apache HBase
+Copyright 2007-2019 The Apache Software Foundation.
+
+= Apache Avro 1.8.2 =
+Apache Avro
+Copyright 2010-2019 The Apache Software Foundation.
\ No newline at end of file



[GitHub] [incubator-hudi] xushiyan commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-08 Thread GitBox
xushiyan commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot 
Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#issuecomment-596201573
 
 
   @OpenOpened To summarize from the review comments, could you make these 
changes as the last round of changes? Thanks!
   1. remove `--snapshot-prefix`
   2. remove `@deprecated` from javadoc
   3. add `@experimental` to the new class javadoc
   




[jira] [Commented] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-08 Thread hong dongdong (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054391#comment-17054391
 ] 

hong dongdong commented on HUDI-677:


Yep, I will give this issue a try.

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>






[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-08 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r389365583
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameWriter;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.execution.datasources.DataSource;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., 
plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = 
LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+@Parameter(names = {"--source-base-path"}, description = "Base path for 
the source Hudi dataset to be snapshotted", required = true)
+String sourceBasePath = null;
+
+@Parameter(names = {"--target-base-path"}, description = "Base path for 
the target output files (snapshots)", required = true)
+String targetOutputPath = null;
+
+@Parameter(names = {"--snapshot-prefix"}, description = "Snapshot prefix 
or directory under the target base path in order to segregate different 
snapshots")
 
 Review comment:
   @leesf thanks for catching this. I missed this one: this param is meant to 
segregate output, not input. It was meant for the case of multiple exports to 
the same target path that need to be kept separate from each other (e.g., due 
to different export dates). It actually overlaps with the target base path; 
users can simply change the target base path to achieve this. So in conclusion 
we can just remove this param. @OpenOpened sounds good?




[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-08 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r389365076
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameWriter;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.execution.datasources.DataSource;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., 
plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = 
LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+@Parameter(names = {"--source-base-path"}, description = "Base path for 
the source Hudi dataset to be snapshotted", required = true)
+String sourceBasePath = null;
+
+@Parameter(names = {"--target-base-path"}, description = "Base path for 
the target output files (snapshots)", required = true)
+String targetOutputPath = null;
+
+@Parameter(names = {"--snapshot-prefix"}, description = "Snapshot prefix 
or directory under the target base path in order to segregate different 
snapshots")
+String snapshotPrefix;
+
+@Parameter(names = {"--output-format"}, description = "e.g. Hudi or 
Parquet", required = true)
+String outputFormat;
+
+@Parameter(names = {"--output-partition-field"}, description = "A field to 
be used by Spark repartitioning")
+String outputPartitionField;
+  }
+
+  public int export(SparkSession spark, Config cfg) throws IOException {
+JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
+FileSystem fs = FSUtils.getFs(cfg.sourceBasePath, 
jsc.hadoopConfiguration());
+
+final SerializableConfiguration serConf = new 
SerializableConfiguration(jsc.hadoopConfiguration());
+final HoodieTableMetaClient tableMetadata = new 
HoodieTableMetaClient(fs.getConf(), cfg.sourceBasePath);
+final TableFileSystemView.BaseFileOnlyView fsView = new 
HoodieTableFileSystemView(tableMetadata,
+
tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants());
+// Get the latest commit
+Option<HoodieInstant> latestCommit =
+
tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+if (!latestCommit.isPresent()) {
+  LOG.error("No commits present. Nothing to snapshot");
+  return -1;
+}
+final String latestCommitTimestamp = latestCommit.get().getTimestamp();
 
 Review comment:
   Per the RFC, we aim to just get the latest commit time.
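The latest-commit lookup in the quoted snippet boils down to taking the last element of the completed-commits timeline. A dependency-free sketch of that logic, with a plain ordered list of commit timestamps standing in for Hudi's timeline API (the class and method names here are illustrative, not Hudi's):

```java
import java.util.List;
import java.util.Optional;

class LatestCommit {
    // Mirrors timeline.filterCompletedInstants().lastInstant(): the latest
    // completed commit is simply the last entry of the ordered timeline,
    // or empty when nothing has been committed yet.
    static Optional<String> latest(List<String> completedCommitTimes) {
        return completedCommitTimes.isEmpty()
            ? Optional.empty()
            : Optional.of(completedCommitTimes.get(completedCommitTimes.size() - 1));
    }
}
```

The empty case corresponds to the exporter's "No commits present. Nothing to snapshot" early return.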


This 

[jira] [Closed] (HUDI-409) Replace Log Magic header with a secure hash to avoid clashes with data

2020-03-08 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-409.
--
Resolution: Fixed

Fixed via master: 9d46ce380a3929605b3838238e8aa07a9918ab7a

> Replace Log Magic header with a secure hash to avoid clashes with data
> --
>
> Key: HUDI-409
> URL: https://issues.apache.org/jira/browse/HUDI-409
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: Ramachandran M S
>Priority: Major
> Fix For: 0.5.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-682) Move HoodieReadClient into hudi-spark module

2020-03-08 Thread vinoyang (Jira)
vinoyang created HUDI-682:
-

 Summary: Move HoodieReadClient into hudi-spark module
 Key: HUDI-682
 URL: https://issues.apache.org/jira/browse/HUDI-682
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Code Cleanup
Reporter: vinoyang
Assignee: vinoyang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-681) Remove the dependency of EmbeddedTimelineService from HoodieReadClient

2020-03-08 Thread vinoyang (Jira)
vinoyang created HUDI-681:
-

 Summary: Remove the dependency of EmbeddedTimelineService from 
HoodieReadClient
 Key: HUDI-681
 URL: https://issues.apache.org/jira/browse/HUDI-681
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Code Cleanup
Reporter: vinoyang
Assignee: hong dongdong


After decoupling {{HoodieReadClient}} and {{AbstractHoodieClient}}, we can remove the {{EmbeddedTimelineService}} dependency from {{HoodieReadClient}} so that we can move {{HoodieReadClient}} into the hudi-spark module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-08 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054339#comment-17054339
 ] 

vinoyang commented on HUDI-677:
---

[~vinoth], [~hongdongdong] wants to give this issue a try; I will discuss it with him.

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)