[GitHub] [incubator-hudi] yihua commented on issue #1165: [HUDI-76][WIP] Add CSV Source support for Hudi Delta Streamer

2019-12-31 Thread GitBox
yihua commented on issue #1165: [HUDI-76][WIP] Add CSV Source support for Hudi 
Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-570023155
 
 
   @vinothchandar I'm adding unit test cases.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1165: [HUDI-76][WIP] Add CSV Source support for Hudi Delta Streamer

2019-12-31 Thread GitBox
vinothchandar commented on issue #1165: [HUDI-76][WIP] Add CSV Source support 
for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-570022857
 
 
   Is this still WIP?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce 
configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362302100
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/table/UserDefinedBulkInsertPartitioner.java
 ##
 @@ -31,4 +31,6 @@
 public interface UserDefinedBulkInsertPartitioner<T extends HoodieRecordPayload> {
 
   JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> records, int outputSparkPartitions);
+
+  boolean arePartitionRecordsSorted();
 
 Review comment:
   Yeah, will do.
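   For illustration, a minimal sketch of what an implementation of the extended interface could look like once `arePartitionRecordsSorted()` is added; the class name and the (partition path, record key) sort key below are assumptions for the example, not code from this PR:

       import org.apache.hudi.common.model.HoodieRecord;
       import org.apache.hudi.common.model.HoodieRecordPayload;
       import org.apache.hudi.table.UserDefinedBulkInsertPartitioner;
       import org.apache.spark.api.java.JavaRDD;

       // Sketch only: a sorting partitioner that can honestly answer "true" to
       // arePartitionRecordsSorted(), because sortBy() orders records within each
       // output Spark partition.
       public class GlobalSortPartitionerSketch<T extends HoodieRecordPayload>
           implements UserDefinedBulkInsertPartitioner<T> {

         @Override
         public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> records,
             int outputSparkPartitions) {
           // Sort by (partitionPath, recordKey) into the requested number of partitions.
           return records.sortBy(
               record -> String.format("%s+%s", record.getPartitionPath(), record.getRecordKey()),
               true, outputSparkPartitions);
         }

         @Override
         public boolean arePartitionRecordsSorted() {
           return true;
         }
       }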


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce 
configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362302093
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/func/bulkinsert/NonSortPartitioner.java
 ##
 @@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.func.bulkinsert;
+
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.spark.api.java.JavaRDD;
+
+public class NonSortPartitioner
 
 Review comment:
   Yes, will rename.
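   For reference, the body of such a non-sorting partitioner is essentially a pass-through; a minimal sketch follows (not the PR's code, and using an illustrative name given the rename discussed here):

       import org.apache.hudi.common.model.HoodieRecord;
       import org.apache.hudi.common.model.HoodieRecordPayload;
       import org.apache.hudi.table.UserDefinedBulkInsertPartitioner;
       import org.apache.spark.api.java.JavaRDD;

       // Sketch only: no sorting is performed; the RDD is merely coalesced to the
       // requested number of partitions, and the partitioner reports records as unsorted.
       public class NonSortingPartitionerSketch<T extends HoodieRecordPayload>
           implements UserDefinedBulkInsertPartitioner<T> {

         @Override
         public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> records,
             int outputSparkPartitions) {
           return records.coalesce(outputSparkPartitions);
         }

         @Override
         public boolean arePartitionRecordsSorted() {
           return false;
         }
       }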


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce 
configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362302090
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/func/bulkinsert/BulkInsertInternalPartitioner.java
 ##
 @@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.func.bulkinsert;
+
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.table.UserDefinedBulkInsertPartitioner;
+
+public abstract class BulkInsertInternalPartitioner<T extends HoodieRecordPayload>
+    implements UserDefinedBulkInsertPartitioner<T> {
 
 Review comment:
   Yeah, I was also wondering about the custom implementation of `UserDefinedBulkInsertPartitioner` in my previous comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce 
configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362302051
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/func/CopyOnWriteInsertHandler.java
 ##
 @@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.func;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hudi.WriteStatus;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.func.CopyOnWriteLazyInsertIterable.HoodieInsertValueGenResult;
+import org.apache.hudi.io.HoodieCreateHandle;
+import org.apache.hudi.io.HoodieWriteHandle;
+import org.apache.hudi.table.HoodieTable;
+
+/**
+ * Consumes stream of hoodie records from in-memory queue and writes to one or 
more create-handles.
 
 Review comment:
   Will do.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce 
configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362302021
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##
 @@ -367,20 +370,30 @@ public static SparkConf registerClasses(SparkConf conf) {
 }
   }
 
+  private BulkInsertMapFunction<T> getBulkInsertMapFunction(
+      boolean isSorted, String commitTime, HoodieWriteConfig config, HoodieTable<T> hoodieTable,
+      List<String> fileIDPrefixes) {
+    if (isSorted) {
+      return new BulkInsertMapFunctionForSortedRecords(
+          commitTime, config, hoodieTable, fileIDPrefixes);
+    }
+    return new BulkInsertMapFunctionForNonSortedRecords(
+        commitTime, config, hoodieTable, fileIDPrefixes);
+  }
+
   private JavaRDD<WriteStatus> bulkInsertInternal(JavaRDD<HoodieRecord<T>> dedupedRecords, String commitTime,
       HoodieTable<T> table, Option<UserDefinedBulkInsertPartitioner> bulkInsertPartitioner) {
 
 Review comment:
   I thought `UserDefinedBulkInsertPartitioner` was user-facing, so I didn't touch the interface. Looking at the source code again, I find that `UserDefinedBulkInsertPartitioner` is only used here, so we can just keep a single one as `BulkInsertInternalPartitioner`.
   
   BTW, is this interface (`UserDefinedBulkInsertPartitioner`) intended for users to implement a custom partitioner? I don't see a config to pass in a custom implementation, so I'm wondering if we should provide such flexibility.
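   For illustration, the kind of flexibility being discussed could look roughly like the sketch below: a write-config key naming a user-supplied partitioner class that is instantiated reflectively. Both the config key and the helper are hypothetical here, not existing Hudi APIs at the time of this comment:

       import org.apache.hudi.common.model.HoodieRecordPayload;
       import org.apache.hudi.config.HoodieWriteConfig;
       import org.apache.hudi.table.UserDefinedBulkInsertPartitioner;

       public class BulkInsertPartitionerFactorySketch {

         // Illustrative key; not a real Hudi config at this point.
         private static final String USER_PARTITIONER_CLASS_PROP =
             "hoodie.bulkinsert.user.defined.partitioner.class";

         @SuppressWarnings("unchecked")
         public static <T extends HoodieRecordPayload> UserDefinedBulkInsertPartitioner<T> loadUserPartitioner(
             HoodieWriteConfig config) throws Exception {
           String clazz = config.getProps().getProperty(USER_PARTITIONER_CLASS_PROP, "");
           if (clazz.isEmpty()) {
             // No user-supplied class: fall back to the built-in internal partitioners.
             return null;
           }
           // Instantiate the user's implementation via its no-arg constructor.
           return (UserDefinedBulkInsertPartitioner<T>) Class.forName(clazz).newInstance();
         }
       }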


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce 
configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362301880
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -77,6 +77,10 @@
   private static final String DEFAULT_HOODIE_WRITE_STATUS_CLASS = WriteStatus.class.getName();
   private static final String FINALIZE_WRITE_PARALLELISM = "hoodie.finalize.write.parallelism";
   private static final String DEFAULT_FINALIZE_WRITE_PARALLELISM = DEFAULT_PARALLELISM;
+  private static final String BULKINSERT_SORT_ENABLED = "hoodie.bulkinsert.sort.enable";
 
 Review comment:
   Yes, one config is easier for the user to reason about than two. I'll make the change accordingly.
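   As a sketch of what a single knob could look like (the key and enum names below are illustrative, not from this PR): one `hoodie.bulkinsert.sort.mode` property with an enum of modes, instead of separate boolean flags.

       import java.util.Properties;

       // Sketch only: one sort-mode property covering "globally sorted",
       // "sorted within each output partition", and "not sorted at all".
       public class BulkInsertSortConfigSketch {

         public enum BulkInsertSortMode {
           GLOBAL_SORT,
           PARTITION_SORT,
           NONE
         }

         static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
         static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT.name();

         public static BulkInsertSortMode getSortMode(Properties props) {
           return BulkInsertSortMode.valueOf(
               props.getProperty(BULKINSERT_SORT_MODE, DEFAULT_BULKINSERT_SORT_MODE).toUpperCase());
         }
       }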


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299813
 
 

 ##
 File path: newui/content/docs/docs-versions.html
 ##
 @@ -0,0 +1,391 @@
 [quoted diff of newui/content/docs/docs-versions.html elided: the archive stripped the HTML markup, leaving only page-text fragments (title, top navigation, docs sidebar) down to the "Docs Versions" link this comment is anchored on]
 
 Review comment:
   This provides basic support; it is not automatic. We had talked about this on
   https://lists.apache.org/thread.html/ea4e73fbd6874e6d412a97eed8e581aceed58ed3e54f54c644d5cf27%40%3Cdev.hudi.apache.org%3E
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299813
 
 

 ##
 File path: newui/content/docs/docs-versions.html
 ##
 @@ -0,0 +1,391 @@
 [quoted diff of newui/content/docs/docs-versions.html elided: the archive stripped the HTML markup, leaving only page-text fragments (title, top navigation, docs sidebar) down to the "Docs Versions" link this comment is anchored on]
 
 Review comment:
   This is basic support; it is not automatic.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299813
 
 

 ##
 File path: newui/content/docs/docs-versions.html
 ##
 @@ -0,0 +1,391 @@
 [quoted diff of newui/content/docs/docs-versions.html elided: the archive stripped the HTML markup, leaving only page-text fragments (title, top navigation, docs sidebar) down to the "Docs Versions" link this comment is anchored on]
 
 Review comment:
   This is basic support; it is not automatic.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362301167
 
 

 ##
 File path: newui/docs/_data/navigation.yml
 ##
 @@ -0,0 +1,51 @@
+
+# main links
+main:
+  - title: "Documentation"
+url: /docs/quick-start-guide.html
+  - title: "Community"
+url: /community.html
+  - title: "Roadmap"
+url: /roadmap.html
+  - title: "Activities"
+url: /activity.html
+  - title: "FAQ"
+url: https://cwiki.apache.org/confluence/display/HUDI/FAQ
+  - title: "Releases"
+url: /releases.html
+
+# doc links
+docs:
+  - title: Getting Started
+children:
+  - title: "Quick Start"
+url: /docs/quick-start-guide.html
+  - title: "Structure"
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #146

2019-12-31 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.17 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362300955
 
 

 ##
 File path: newui/docs/_data/navigation.yml
 ##
 @@ -0,0 +1,51 @@
+
+# main links
+main:
+  - title: "Documentation"
+url: /docs/quick-start-guide.html
+  - title: "Community"
+url: /community.html
+  - title: "Roadmap"
+url: /roadmap.html
+  - title: "Activities"
+url: /activity.html
 
 Review comment:
   > can we show the apachehudi twitter stream here instead?
   
   Absolutely


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362300662
 
 

 ##
 File path: newui/content/index.html
 ##
 @@ -0,0 +1,233 @@
 [quoted diff of newui/content/index.html elided: the archive stripped the HTML markup, leaving only landing-page text fragments (the "Welcome to Apache Hudi" blurb, the "Get Starting" button, and the "Hudi Data Lakes" section whose image this comment is anchored on)]
 
 Review comment:
   I think we should keep it until the new image is created.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362300574
 
 

 ##
 File path: newui/content/index.html
 ##
 @@ -0,0 +1,233 @@
 [quoted diff of newui/content/index.html elided: the archive stripped the HTML markup; the comment is anchored on the landing page's "Get Starting" button]
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362300507
 
 

 ##
 File path: newui/content/releases.html
 ##
 @@ -0,0 +1,292 @@
 [quoted diff of newui/content/releases.html elided: the archive stripped the HTML markup, leaving only text fragments of the releases page and its Quick Links sidebar, ending at the "Report on Issues" link this comment is anchored on]
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362300487
 
 

 ##
 File path: newui/docs/_config.yml
 ##
 @@ -0,0 +1,218 @@
+# Welcome to Jekyll!
+#
+# This config file is meant for settings that affect your entire site, values
+# which you are expected to set up once and rarely need to edit after that.
+# For technical reasons, this file is *NOT* reloaded automatically when you use
+# `jekyll serve`. If you change this file, please restart the server process.
+
+#
+hudi_style_skin  : "hudi"
+
+version :  "0.5.1-SNAPSHOT"
+
+previous_docs:
+  latest: /docs/quick-start-guide.html
+  0.5.0-incubating: /versions/0.5.0-incubating/docs/quick-start-guide.html
+
+
+# Site Settings
+locale   : "en-US"
+title: "Apache Hudi"
+title_separator  : "-"
+subtitle : *version
+description  : "Apache Hudi Stands for Hadoop Upserts and 
Incrementals to manage the storage of large analytical datasets on HDFS."
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362300478
 
 

 ##
 File path: newui/docs/README.md
 ##
 @@ -0,0 +1,46 @@
+## Site Documentation
+
+This folder contains resources that build the [Apache Hudi 
website](https://hudi.apache.org)
+
+
+### Building docs
+
+The site is based on a [Jekyll](https://jekyllrb.com/) theme hosted 
[here](https://idratherbewriting.com/documentation-theme-jekyll/) with detailed 
instructions.
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362300491
 
 

 ##
 File path: newui/docs/_config.yml
 ##
 @@ -0,0 +1,218 @@
+# Welcome to Jekyll!
+#
+# This config file is meant for settings that affect your entire site, values
+# which you are expected to set up once and rarely need to edit after that.
+# For technical reasons, this file is *NOT* reloaded automatically when you use
+# `jekyll serve`. If you change this file, please restart the server process.
+
+#
+hudi_style_skin  : "hudi"
+
+version :  "0.5.1-SNAPSHOT"
+
+previous_docs:
+  latest: /docs/quick-start-guide.html
+  0.5.0-incubating: /versions/0.5.0-incubating/docs/quick-start-guide.html
+
+
+# Site Settings
+locale   : "en-US"
+title: "Apache Hudi"
+title_separator  : "-"
+subtitle : *version
+description  : "Apache Hudi Stands for Hadoop Upserts and 
Incrementals to manage the storage of large analytical datasets on HDFS."
+url  : https://hudi.apache.org # the base hostname & 
protocol for your site e.g. "https://mmistakes.github.io;
+repository   : "apache/incubator-hudi"
+teaser   : "/assets/images/500x300.png" # path of fallback 
teaser image, e.g. "/assets/images/500x300.png"
+logo : "/assets/images/hudi.png" # path of logo image to 
display in the masthead, e.g. "/assets/images/88x88.png"
+masthead_title   : # overrides the website title displayed in the 
masthead, use " " for no title
+host : 0.0.0.0
+site_url : https://hudi.apache.org
+
+# Site QuickLinks
+author:
+  name : "Quick Links"
+  bio  : "Apache Hudi stands for *Hadoop* *Upserts* *Deletes* and 
*Incrementals*."
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-472) Make sortBy() inside bulkInsertInternal() configurable for bulk_insert

2019-12-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-472:
---
Fix Version/s: 0.5.1

> Make sortBy() inside bulkInsertInternal() configurable for bulk_insert
> --
>
> Key: HUDI-472
> URL: https://issues.apache.org/jira/browse/HUDI-472
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Ethan Guo
>Assignee: He ZongPing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362300142
 
 

 ##
 File path: newui/docs/_data/navigation.yml
 ##
 @@ -0,0 +1,51 @@
+
+# main links
+main:
+  - title: "Documentation"
+url: /docs/quick-start-guide.html
+  - title: "Community"
+url: /community.html
+  - title: "Roadmap"
 
 Review comment:
   > High level, I think our goal here should be _strictly_ just match the 
current site as much as possible.. Land this first, have a link from current 
site to this new site.. left a few comments around this.
   > 
   > A week from now, switch to new site and link back to old site..
   > A week from then, retire/remove the old site..
   > Then we can make any content changes on top of the new site..
   
   Hi @vinothchandar, I have a question: where should I place the new site?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-76) CSV Source support for Hudi Delta Streamer

2019-12-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-76:
--
Fix Version/s: 0.5.1

> CSV Source support for Hudi Delta Streamer
> --
>
> Key: HUDI-76
> URL: https://issues.apache.org/jira/browse/HUDI-76
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, Incremental Pull
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> DeltaStreamer does not have support to pull CSV data from sources (HDFS log files/Kafka). This ticket is to provide support for CSV sources.
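
As an illustration of what such a source fundamentally has to do, here is a minimal sketch of reading a batch of CSV files from DFS into a Spark Dataset that the delta streamer could then write to Hudi. The class name and the way it would be wired into DeltaStreamer (schema provider, checkpointing, config keys) are assumptions, not the design of this ticket:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class CsvSourceSketch {

      // Read one batch of CSV data; a real source would also track a checkpoint
      // (e.g. the latest file modification time) and honor a source size limit.
      public static Dataset<Row> readCsvBatch(SparkSession spark, String pathGlob) {
        return spark.read()
            .format("csv")
            .option("header", "true")       // hypothetical: could be driven by source configs
            .option("inferSchema", "true")  // a real source would prefer the schema provider
            .load(pathGlob);
      }
    }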



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-488) Refactor Source classes in hudi-utilities

2019-12-31 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-488:
--

 Summary: Refactor Source classes in hudi-utilities 
 Key: HUDI-488
 URL: https://issues.apache.org/jira/browse/HUDI-488
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Code Cleanup
Reporter: Ethan Guo


There is copy-and-pasted code in some of the Source classes due to the current 
class inheritance structure. Refactoring this part should make it easier and 
more efficient to create new sources and formats.
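
As a rough sketch of the direction this describes (names are illustrative, not a design): a template-method style base class would hold the shared checkpoint/limit handling once, so a new source or format only implements the read step.

    import java.util.Optional;

    // Sketch only: common bookkeeping lives in the base class; format-specific
    // sources (CSV, JSON, Avro, ...) override just readBatch().
    abstract class AbstractSourceSketch<R> {

      public final Optional<R> fetchNext(Optional<String> lastCheckpoint, long sourceLimit) {
        // ...shared checkpoint/metrics handling would go here, once...
        return readBatch(lastCheckpoint, sourceLimit);
      }

      protected abstract Optional<R> readBatch(Optional<String> lastCheckpoint, long sourceLimit);
    }

    class JsonSourceSketch extends AbstractSourceSketch<String> {
      @Override
      protected Optional<String> readBatch(Optional<String> lastCheckpoint, long sourceLimit) {
        // Only the format-specific reading logic would live here.
        return Optional.empty();
      }
    }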



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1120: [HUDI-440] Rework the 
hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299813
 
 

 ##
 File path: newui/content/docs/docs-versions.html
 ##
 @@ -0,0 +1,391 @@
 [quoted diff of newui/content/docs/docs-versions.html elided: the archive stripped the HTML markup, leaving only page-text fragments (title, top navigation, docs sidebar) down to the "Docs Versions" link this comment is anchored on]
 
 Review comment:
   This is a basic framework; it is not automatic.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362298973
 
 

 ##
 File path: newui/docs/README.md
 ##
 @@ -0,0 +1,46 @@
+## Site Documentation
+
+This folder contains resources that build the [Apache Hudi 
website](https://hudi.apache.org)
+
+
+### Building docs
+
+The site is based on a [Jekyll](https://jekyllrb.com/) theme hosted 
[here](https://idratherbewriting.com/documentation-theme-jekyll/) with detailed 
instructions.
 
 Review comment:
   fix theme link?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299203
 
 

 ##
 File path: newui/docs/_data/navigation.yml
 ##
 @@ -0,0 +1,51 @@
+
+# main links
+main:
+  - title: "Documentation"
+url: /docs/quick-start-guide.html
+  - title: "Community"
+url: /community.html
+  - title: "Roadmap"
+url: /roadmap.html
+  - title: "Activities"
+url: /activity.html
+  - title: "FAQ"
+url: https://cwiki.apache.org/confluence/display/HUDI/FAQ
+  - title: "Releases"
+url: /releases.html
+
+# doc links
+docs:
+  - title: Getting Started
+children:
+  - title: "Quick Start"
+url: /docs/quick-start-guide.html
+  - title: "Structure"
+url: /docs/structure.html
+  - title: "User Cases"
 
 Review comment:
   typo: use cases


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299071
 
 

 ##
 File path: newui/docs/_config.yml
 ##
 @@ -0,0 +1,218 @@
+# Welcome to Jekyll!
+#
+# This config file is meant for settings that affect your entire site, values
+# which you are expected to set up once and rarely need to edit after that.
+# For technical reasons, this file is *NOT* reloaded automatically when you use
+# `jekyll serve`. If you change this file, please restart the server process.
+
+#
+hudi_style_skin  : "hudi"
+
+version :  "0.5.1-SNAPSHOT"
+
+previous_docs:
+  latest: /docs/quick-start-guide.html
+  0.5.0-incubating: /versions/0.5.0-incubating/docs/quick-start-guide.html
+
+
+# Site Settings
+locale   : "en-US"
+title: "Apache Hudi"
+title_separator  : "-"
+subtitle : *version
+description  : "Apache Hudi Stands for Hadoop Upserts and 
Incrementals to manage the storage of large analytical datasets on HDFS."
 
 Review comment:
   Reword to: Apache Hudi ingests & manages storage of large analytical datasets over DFS (HDFS or cloud stores).
   
   Please keep the landing page consistent with what we have. We can deal with content changes separately.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299144
 
 

 ##
 File path: newui/docs/_data/navigation.yml
 ##
 @@ -0,0 +1,51 @@
+
+# main links
+main:
+  - title: "Documentation"
+url: /docs/quick-start-guide.html
+  - title: "Community"
+url: /community.html
+  - title: "Roadmap"
 
 Review comment:
   Let's keep the roadmap on the wiki and remove it from the site.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299393
 
 

 ##
 File path: newui/content/releases.html
 ##
 @@ -0,0 +1,292 @@
 [quoted diff of newui/content/releases.html elided: the archive stripped the HTML markup, leaving only text fragments of the releases page and its Quick Links sidebar, ending at the "Report on Issues" link this comment is anchored on]
 
 Review comment:
   Report Issues? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299109
 
 

 ##
 File path: newui/docs/_config.yml
 ##
 @@ -0,0 +1,218 @@
+# Welcome to Jekyll!
+#
+# This config file is meant for settings that affect your entire site, values
+# which you are expected to set up once and rarely need to edit after that.
+# For technical reasons, this file is *NOT* reloaded automatically when you use
+# `jekyll serve`. If you change this file, please restart the server process.
+
+#
+hudi_style_skin  : "hudi"
+
+version :  "0.5.1-SNAPSHOT"
+
+previous_docs:
+  latest: /docs/quick-start-guide.html
+  0.5.0-incubating: /versions/0.5.0-incubating/docs/quick-start-guide.html
+
+
+# Site Settings
+locale   : "en-US"
+title: "Apache Hudi"
+title_separator  : "-"
+subtitle : *version
+description  : "Apache Hudi Stands for Hadoop Upserts and 
Incrementals to manage the storage of large analytical datasets on HDFS."
+url  : https://hudi.apache.org # the base hostname & 
protocol for your site e.g. "https://mmistakes.github.io;
+repository   : "apache/incubator-hudi"
+teaser   : "/assets/images/500x300.png" # path of fallback 
teaser image, e.g. "/assets/images/500x300.png"
+logo : "/assets/images/hudi.png" # path of logo image to 
display in the masthead, e.g. "/assets/images/88x88.png"
+masthead_title   : # overrides the website title displayed in the 
masthead, use " " for no title
+host : 0.0.0.0
+site_url : https://hudi.apache.org
+
+# Site QuickLinks
+author:
+  name : "Quick Links"
+  bio  : "Apache Hudi stands for *Hadoop* *Upserts* *Deletes* and 
*Incrementals*."
 
 Review comment:
   Since this is a large PR, it will be difficult to review line by line... Even within this file, what Hudi stands for is written differently :). Can you take a closer pass and match all content verbatim from the current site?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299309
 
 

 ##
 File path: newui/content/index.html
 ##
 @@ -0,0 +1,233 @@
 [quoted diff of newui/content/index.html elided: the archive stripped the HTML markup, leaving only landing-page text fragments; the comment is anchored on the image reference in the "Hudi Data Lakes" section]
 
 Review comment:
   hudi-delta-lake? please rename this file 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299232
 
 

 ##
 File path: newui/docs/_data/ui-text.yml
 ##
 @@ -0,0 +1,60 @@
+# User interface text and labels
+
+# English (default)
+# -
+en: _EN
+  skip_links : "Skip links"
+  skip_primary_nav   : "Skip to primary navigation"
+  skip_content   : "Skip to content"
+  skip_footer: "Skip to footer"
+  page   : "Page"
+  pagination_previous: "Previous"
+  pagination_next: "Next"
+  breadcrumb_home_label  : "Home"
+  breadcrumb_separator   : "/"
+  menu_label : "Toggle menu"
+  search_label   : "Toggle search"
+  toc_label  : "Section Nav"
 
 Review comment:
   Reword to: `In this page` 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362298994
 
 

 ##
 File path: newui/docs/README.md
 ##
 @@ -0,0 +1,46 @@
+## Site Documentation
 
 Review comment:
   Please make sure this file reflects the instructions for the newui and not the old UI.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299199
 
 

 ##
 File path: newui/docs/_data/navigation.yml
 ##
 @@ -0,0 +1,51 @@
+
+# main links
+main:
+  - title: "Documentation"
+url: /docs/quick-start-guide.html
+  - title: "Community"
+url: /community.html
+  - title: "Roadmap"
+url: /roadmap.html
+  - title: "Activities"
+url: /activity.html
+  - title: "FAQ"
+url: https://cwiki.apache.org/confluence/display/HUDI/FAQ
+  - title: "Releases"
+url: /releases.html
+
+# doc links
+docs:
+  - title: Getting Started
+children:
+  - title: "Quick Start"
+url: /docs/quick-start-guide.html
+  - title: "Structure"
 
 Review comment:
   Let's remove this page and incorporate its content into either the current home page, or skip it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299640
 
 

 ##
 File path: newui/content/docs/docs-versions.html
 ##
 @@ -0,0 +1,391 @@
 [quoted diff of newui/content/docs/docs-versions.html elided: the archive stripped the HTML markup, leaving only page-text fragments (title, top navigation, docs sidebar) down to the "Docs Versions" link this comment is anchored on]
 
 Review comment:
   What versioning support do we have now in this new model? There is an RFC already working towards doc versioning, so it would be good to sync with its author @yihua and also the next release manager @leesf to ensure the 0.5.1-incubating release docs get pushed out.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299161
 
 

 ##
 File path: newui/docs/_data/navigation.yml
 ##
 @@ -0,0 +1,51 @@
+
+# main links
+main:
+  - title: "Documentation"
+url: /docs/quick-start-guide.html
+  - title: "Community"
+url: /community.html
+  - title: "Roadmap"
+url: /roadmap.html
+  - title: "Activities"
+url: /activity.html
 
 Review comment:
   can we show the apachehudi twitter stream here instead? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299381
 
 

 ##
 File path: newui/content/index.html
 ##
 @@ -0,0 +1,233 @@
[ Quoted hunk: the generated HTML for the new landing page, with markup stripped by the mail archive. It contains the page head, the same top navigation, and the hero section: "Welcome to Apache Hudi", the "Apache Hudi Stands for Hadoop Upserts and Incrementals to manage the storage of large analytical datasets on HDFS" tagline with a link to the 0.5.0-incubating release, and a "Get Starting" button. The hunk continues with the "Hudi Data Lakes" section and its image, which the comment below refers to. ]
+  Hudi Data Lakes
+
+
+  Hudi brings stream processing to big data, providing fresh data 
while being an order of magnitude efficient over traditional batch 
processing.
+
+
+  
 
 Review comment:
   Also, this image is probably not the best/most accurate thing out there.. can 
you remove this section for now and file another JIRA to add it in later? 
@bhasudha is already tracking the landing page image redesign, I think.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299610
 
 

 ##
 File path: newui/content/docs/docs-versions.html
 ##
 @@ -0,0 +1,391 @@
[ Quoted hunk: the same generated docs-versions markup shown in the earlier comment (page head, top navigation, docs sidebar), with markup stripped by the mail archive. The hunk ends at the sidebar heading below, which the comment refers to. ]
+  Meta Info
 
 Review comment:
   just `INFO`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299403
 
 

 ##
 File path: newui/content/releases.html
 ##
 @@ -0,0 +1,292 @@
[ Quoted hunk: the generated HTML for the releases page, with markup stripped by the mail archive. It contains the page head, the top navigation, and the "Quick Links" sidebar ("Apache Hudi stands for Hadoop Upserts Deletes and Incrementals.", Documentation, Technical Wiki). The hunk ends at the sidebar link below, which the comment refers to. ]
+   Contribute 
Guide
 
 Review comment:
   Contribution Guide


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1120: [HUDI-440] Rework 
the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r362299432
 
 

 ##
 File path: newui/content/roadmap.html
 ##
 @@ -0,0 +1,253 @@
[ Quoted hunk: the generated HTML for the roadmap page, with markup stripped by the mail archive. It contains the page head and the same top navigation as the other pages. The hunk ends at the "Quick Links" sidebar and tagline below, which the comment refers to. ]
+  Quick Links
+
+
+  
+Apache Hudi stands for Hadoop Upserts 
Deletes and Incrementals.
 
 Review comment:
   I don't know if it is useful to re-iterate everywhere what the project stands 
for.. instead let's use what the project enables.. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1126: Fix Error: java.lang.IllegalArgumentException: Can not create a Path from an empty string

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1126: Fix Error: 
java.lang.IllegalArgumentException: Can not create a Path from an empty string
URL: https://github.com/apache/incubator-hudi/pull/1126#discussion_r362297795
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java
 ##
 @@ -109,7 +109,7 @@ public HoodieCopyOnWriteTable(HoodieWriteConfig config, 
JavaSparkContext jsc) {
 Tuple2 partitionDelFileTuple = iter.next();
 String partitionPath = partitionDelFileTuple._1();
 String delFileName = partitionDelFileTuple._2();
-Path deletePath = new Path(new Path(basePath, partitionPath), 
delFileName);
+Path deletePath = 
FSUtils.getPartitionPath(FSUtils.getPartitionPath(basePath, partitionPath), 
delFileName);
 
 Review comment:
   I think we can do a small follow-up fix here.. confirm `delFileName` cannot 
be null and then remove the outer `FSUtils.getPartitionPath`. This one can have 
a `[MINOR]` prefix IMO.
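
   For reference, the simplification hinted at above might look roughly like the 
following. This is only a sketch of one possible follow-up: it assumes 
`delFileName` is always a non-empty file name (exactly what the comment asks to 
confirm) and reuses the `FSUtils.getPartitionPath` and `Path` calls already 
visible in the quoted diff.

       // Hypothetical follow-up sketch: keep the empty-partition handling for
       // partitionPath only, and join the (assumed non-empty) file name directly.
       Path partitionDir = FSUtils.getPartitionPath(basePath, partitionPath);
       Path deletePath = new Path(partitionDir, delFileName);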


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-389) Updates sent to diff partition for a given key with Global Index

2019-12-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-389:

Status: Closed  (was: Patch Available)

> Updates sent to diff partition for a given key with Global Index 
> -
>
> Key: HUDI-389
> URL: https://issues.apache.org/jira/browse/HUDI-389
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Time Spent: 20m
>  Remaining Estimate: 47h 40m
>
> Updates sent to a diff partition for a given key with Global Index should 
> succeed by updating the record under the original partition. As of now, it 
> throws an exception. 
> [https://github.com/apache/incubator-hudi/issues/1021] 
>  
>  
> error log:
> {code:java}
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants 
> java.util.stream.ReferencePipeline$Head@d02b1c7
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file 
> system view for partition (2016/04/15)
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found 
> in partition (2016/04/15) =0, Time taken =0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - 
> addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding 
> file-groups for partition :2016/04/15, #FileGroups=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load 
> partition (2016/04/15) =0
>  14754 [Executor task launch worker-0] ERROR 
> com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType 
> UPDATE for partition :0
>  java.util.NoSuchElementException: No value present
>  at com.uber.hoodie.common.util.Option.get(Option.java:112)
>  at com.uber.hoodie.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:71)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
>  at 
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
>  at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at 

[GitHub] [incubator-hudi] vinothchandar commented on issue #1167: [HUDI-484] Fix NPE when reading IncrementalPull.sqltemplate in HiveIncrementalPuller

2019-12-31 Thread GitBox
vinothchandar commented on issue #1167: [HUDI-484] Fix NPE when reading 
IncrementalPull.sqltemplate in HiveIncrementalPuller
URL: https://github.com/apache/incubator-hudi/pull/1167#issuecomment-570012389
 
 
   Could we add a unit test for this tool? It's historically not been very 
popular.. but better to cover it for the future
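
   If it helps, a bare-bones starting point could simply pin down the classpath 
lookup that this PR fixes; anything more end-to-end would need a Hive setup. 
The test class and method names below are illustrative, not part of this PR:

       import static org.junit.Assert.assertNotNull;

       import org.apache.hudi.utilities.HiveIncrementalPuller;
       import org.junit.Test;

       public class TestHiveIncrementalPuller {

         @Test
         public void sqlTemplateShouldResolveFromTheBundle() {
           // Returns null (and later an NPE in the tool) if the leading '/' is dropped.
           assertNotNull(
               HiveIncrementalPuller.class.getResourceAsStream("/IncrementalPull.sqltemplate"));
         }
       }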


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-474) Delta Streamer is not able to read the commit files

2019-12-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-474:

Status: Open  (was: New)

> Delta Streamer is not able to read the commit files
> ---
>
> Key: HUDI-474
> URL: https://issues.apache.org/jira/browse/HUDI-474
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Shahida Khan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Gmail - Commit time issue in DeltaStreamer 
> (Real-Time).pdf
>
>
> DeltaStreamer is not able to read the correct commit files when the job is 
> deployed in real time.
> below is the stack trace: 
> {code:java}
> java.util.concurrent.ExecutionException:
>  org.apache.hudi.exception.HoodieException: Could not read commit
>  details from 
> hdfs:/user/hive/warehouse/hudi.db/tbltest/.hoodie/.aux/20191226153400.clean.requested
>       at
>  java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
>    at
>  java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at
>  
> org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:72)
>       at
>  
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:117)
>   at
>  
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:297)
>   at
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)     at
>  
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
>   at
>  
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>    at
>  java.lang.reflect.Method.invoke(Method.java:498)        at
>  
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)Caused
>  by: org.apache.hudi.exception.HoodieException: Could not read commit
>  details from 
> hdfs:/user/hive/warehouse/hudi.db/tbltest/.hoodie/.aux/20191226153400.clean.requested
>       at
>  
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:411)
>         at
>  
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>      at
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at
>  java.lang.Thread.run(Thread.java:748)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-484) NPE in HiveIncrementalPuller

2019-12-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-484:

Status: Open  (was: New)

> NPE in HiveIncrementalPuller
> 
>
> Key: HUDI-484
> URL: https://issues.apache.org/jira/browse/HUDI-484
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Incremental Pull
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
> Attachments: Screenshot 2019-12-30 at 4.43.51 PM.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we try to use HiveIncrementalPuller class to incrementally pull changes 
> from hive, it throws NPE as it is unable to find IncrementalPull.sqltemplate 
> in the bundled jar. 
> Screenshot attached which shows the exception. 
> The jar contains the template. 
> Steps to reproduce - 
>  # copy hive-jdbc-2.3.1.jar, log4j-1.2.17.jar to docker/demo/config folder
>  # run cd docker && ./setup_demo.sh
>  # cat docker/demo/data/batch_1.json | kafkacat -b kafkabroker -t stock_ticks 
> -P
>  #  {{docker exec -it adhoc-2 /bin/bash}}
>  #  {{spark-submit --class 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
> $HUDI_UTILITIES_BUNDLE --storage-type COPY_ON_WRITE --source-class 
> org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts 
> --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table 
> stock_ticks_cow --props /var/demo/config/kafka-source.properties 
> --schemaprovider-class 
> org.apache.hudi.utilities.schema.FilebasedSchemaProvider}}
>  #  {{/var/hoodie/ws/hudi-hive/run_sync_tool.sh --jdbc-url 
> jdbc:hive2://hiveserver:1 --user hive --pass hive --partitioned-by dt 
> --base-path /user/hive/warehouse/stock_ticks_cow --database default --table 
> stock_ticks_cow}}
>  # java -cp 
> /var/hoodie/ws/docker/demo/config/hive-jdbc-2.3.1.jar:/var/hoodie/ws/docker/demo/config/log4j-1.2.17.jar:$HUDI_UTILITIES_BUNDLE
>  org.apache.hudi.utilities.HiveIncrementalPuller --hiveUrl 
> jdbc:hive2://hiveserver:1 --hiveUser hive --hivePass hive 
> --extractSQLFile /var/hoodie/ws/docker/demo/config/incr_pull.txt --sourceDb 
> default --sourceTable stock_ticks_cow --targetDb tmp --targetTable tempTable 
> --fromCommitTime 0 --maxCommits 1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-486) Improve documentation for using HiveIncrementalPuller

2019-12-31 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006278#comment-17006278
 ] 

Vinoth Chandar commented on HUDI-486:
-

Hmmm.. could we replicate `hudi-hive-bundle`? Seems like this needs to be 
different from utilities-bundle (for obvious reasons).. How about moving the 
tool itself to hudi-hive and reusing hudi-hive-bundle? 

> Improve documentation for using HiveIncrementalPuller
> -
>
> Key: HUDI-486
> URL: https://issues.apache.org/jira/browse/HUDI-486
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> For using HiveIncrementalPuller, one needs to have a lot of jars in 
> classPath. These jars are not listed anywhere. As a result, one has to keep 
> on adding the jars incrementally to the classPath with every 
> NoClassDefFoundError coming up when executing. 
> We should list down the jars needed so that it becomes easy for a first-time 
> user to use the mentioned tool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-485) Check for where clause is wrong in HiveIncrementalPuller

2019-12-31 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006277#comment-17006277
 ] 

Vinoth Chandar commented on HUDI-485:
-

tbh, it's a simple enough utility.. feel free to change it as you see fit.. 

 

one thing to keep in mind: when you query from Hive, you need the 
`_hoodie_commit_time` > ... filter in the where clause. We can only filter out 
file(slice)s using the split-level filtering that Hive does.. Back in the day, 
we tried adding an extra filter pushdown to also only select rows matching the 
commit time ranges from the incrementally pulled files.. but ran into some 
issue.. if you are interested, we could also revisit that and fix it more nicely. 

> Check for where clause is wrong in HiveIncrementalPuller
> 
>
> Key: HUDI-485
> URL: https://issues.apache.org/jira/browse/HUDI-485
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Incremental Pull, newbie
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> HiveIncrementalPuller checks the clause in incrementalSqlFile like this -> 
> if (!incrementalSQL.contains("`_hoodie_commit_time` > '%targetBasePath'")) {
>  LOG.info("Incremental SQL : " + incrementalSQL
>  + " does not contain `_hoodie_commit_time` > %targetBasePath. Please add "
>  + "this clause for incremental to work properly.");
>  throw new HoodieIncrementalPullSQLException(
>  "Incremental SQL does not have clause `_hoodie_commit_time` > 
> '%targetBasePath', which "
>  + "means its not pulling incrementally");
> }
> Basically we are trying to add a placeholder here which is later replaced 
> with config.fromCommitTime here - 
> incrementalPullSQLtemplate.add("incrementalSQL", 
> String.format(incrementalSQL, config.fromCommitTime));
> Hence, the above check needs to be replaced with `_hoodie_commit_time` > 
> %targetBasePath (a corrected version of the check is sketched below).
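
A sketch of what the corrected check could look like (purely illustrative; it only 
swaps in the unquoted placeholder and reuses the log message and exception already 
shown above):

    // Hypothetical corrected check in HiveIncrementalPuller (sketch only):
    if (!incrementalSQL.contains("`_hoodie_commit_time` > %targetBasePath")) {
      LOG.info("Incremental SQL : " + incrementalSQL
          + " does not contain `_hoodie_commit_time` > %targetBasePath. Please add "
          + "this clause for incremental to work properly.");
      throw new HoodieIncrementalPullSQLException(
          "Incremental SQL does not have clause `_hoodie_commit_time` > %targetBasePath, "
          + "which means it is not pulling incrementally");
    }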



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
lamber-ken commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-570011511
 
 
   > > option("hoodie.datasource.hive_sync.assume_date_partitioning", true).
   > 
   > We don't expect this to be set by anyone except Uber.. `false` has been 
the default ever since.. So I think the right thing may be is to remove this 
config from DataSource if thats what causes this issue.. We should still keep 
this on the WriteClient level. 
http://hudi.apache.org/configurations.html#withAssumeDatePartitioning clearly 
calls out the version requirements..
   
   Yes, it causes this issue. I'll update the PR per your advice, removing this 
config from DataSource and keeping it at the WriteClient level. I'll ping you 
when it's finished.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-29) Patch to Hive-sync to enable stats on Hive tables #393

2019-12-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-29:
---
Fix Version/s: 0.5.1

> Patch to Hive-sync to enable stats on Hive tables #393
> --
>
> Key: HUDI-29
> URL: https://issues.apache.org/jira/browse/HUDI-29
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/393



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-29) Patch to Hive-sync to enable stats on Hive tables #393

2019-12-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-29:
--

Assignee: cdmikechen  (was: Vinoth Chandar)

> Patch to Hive-sync to enable stats on Hive tables #393
> --
>
> Key: HUDI-29
> URL: https://issues.apache.org/jira/browse/HUDI-29
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/393



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
vinothchandar commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-57005
 
 
   >option("hoodie.datasource.hive_sync.assume_date_partitioning", true).
   
   We don't expect this to be set by anyone except Uber.. `false` has been the 
default ever since.. So I think the right thing may be to remove this config 
from DataSource, if that's what causes this issue.. We should still keep this on 
the WriteClient level. 
http://hudi.apache.org/configurations.html#withAssumeDatePartitioning clearly 
calls out the version requirements.. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-281) HiveSync failure through Spark when useJdbc is set to false

2019-12-31 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006274#comment-17006274
 ] 

lamber-ken commented on HUDI-281:
-

hi [~uditme], I think this pr may solve this issue, please tracks 
[https://github.com/apache/incubator-hudi/pull/1125]

> HiveSync failure through Spark when useJdbc is set to false
> ---
>
> Key: HUDI-281
> URL: https://issues.apache.org/jira/browse/HUDI-281
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Hive Integration, Spark Integration, Usability
>Reporter: Udit Mehrotra
>Priority: Major
>
> Table creation with Hive sync through Spark fails when I set *useJdbc* to 
> *false*. Currently I had to modify the code to set *useJdbc* to *false*, as 
> there is no *DataSourceOption* through which I can specify this field when 
> running Hudi code.
> Here is the failure:
> {noformat}
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hive.ql.session.SessionState.start(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/session/SessionState;
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:527)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:517)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:507)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:272)
>   at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:132)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229){noformat}
> I was expecting this to fail through Spark, because *hive-exec* is not shaded 
> inside *hudi-spark-bundle*, while *HiveConf* is shaded and relocated. This 
> *SessionState* is coming from the spark-hive jar and obviously it does not 
> accept the relocated *HiveConf*.
> We in *EMR* are running into same problem when trying to integrate with Glue 
> Catalog. For this we have to create Hive metastore client through 
> *Hive.get(conf).getMsc()* instead of how it is being done now, so that 
> alternate implementations of metastore can get created. However, because 
> hive-exec is not shaded but HiveConf is relocated we run into same issues 
> there.
> It would not be recommended to shade *hive-exec* either because it itself is 
> an Uber jar that shades a lot of things, and all of them would end up in 
> *hudi-spark-bundle* jar. 

[GitHub] [incubator-hudi] vinothchandar commented on issue #1139: [MINOR]Optimize hudi-client module

2019-12-31 Thread GitBox
vinothchandar commented on issue #1139: [MINOR]Optimize hudi-client module
URL: https://github.com/apache/incubator-hudi/pull/1139#issuecomment-570010614
 
 
   @SteNicholas can you please rebase and resolve the conflicts? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] 
Introduce configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362295734
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/func/CopyOnWriteInsertHandler.java
 ##
 @@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.func;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hudi.WriteStatus;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer;
+import org.apache.hudi.config.HoodieWriteConfig;
+import 
org.apache.hudi.func.CopyOnWriteLazyInsertIterable.HoodieInsertValueGenResult;
+import org.apache.hudi.io.HoodieCreateHandle;
+import org.apache.hudi.io.HoodieWriteHandle;
+import org.apache.hudi.table.HoodieTable;
+
+/**
+ * Consumes stream of hoodie records from in-memory queue and writes to one or 
more create-handles.
 
 Review comment:
   Improve the javadocs? 
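
   One possible expansion of that one-liner, based only on the class name and the 
snippet above (the wording is a suggestion and should be checked against what the 
handler actually does):

       /**
        * Consumes a stream of {@link HoodieRecord}s from the in-memory bounded queue and writes
        * them out through one or more {@link HoodieCreateHandle}s, rolling over to a new
        * create-handle whenever the current one can no longer accept records (for example when
        * the data file is full or the partition path changes).
        */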


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] 
Introduce configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362295936
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/func/bulkinsert/NonSortPartitioner.java
 ##
 @@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.func.bulkinsert;
+
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.spark.api.java.JavaRDD;
+
+public class NonSortPartitioner
 
 Review comment:
   More like `NonShufflingPartitioner`? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] 
Introduce configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362295839
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/func/bulkinsert/BulkInsertInternalPartitioner.java
 ##
 @@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.func.bulkinsert;
+
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.table.UserDefinedBulkInsertPartitioner;
+
+public abstract class BulkInsertInternalPartitioner implements
+UserDefinedBulkInsertPartitioner {
 
 Review comment:
   RDD API clients i.e Uber/marmaray  may need to change the name of the 
interface implemented. It should be fine IMO. cc @bvaradar @n3nash 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] 
Introduce configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362277905
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -77,6 +77,10 @@
   private static final String DEFAULT_HOODIE_WRITE_STATUS_CLASS = 
WriteStatus.class.getName();
   private static final String FINALIZE_WRITE_PARALLELISM = 
"hoodie.finalize.write.parallelism";
   private static final String DEFAULT_FINALIZE_WRITE_PARALLELISM = 
DEFAULT_PARALLELISM;
+  private static final String BULKINSERT_SORT_ENABLED = 
"hoodie.bulkinsert.sort.enable";
 
 Review comment:
   instead of sorting vs non-sorting.. and having two configs.. Can we just 
have `hoodie.bulkinsert.write.mode` as the single config, which supports the 
following values:
   
   - `NONE` (no sorting/shuffling) (default)
   - `GLOBALLY_SORTED` 
   - `PARTITION_SORTED`
   
   might be much easier to reason about for the end user (a rough sketch follows below)
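
   A rough sketch of what that single-config shape could look like (the enum and 
accessor names are illustrative, not an agreed API):

       // Hypothetical single knob replacing the two boolean-style configs:
       public enum BulkInsertWriteMode {
         NONE,             // no sorting or shuffling (default)
         GLOBALLY_SORTED,  // global sort across the whole input
         PARTITION_SORTED  // sort only within each output Spark partition
       }

       // In HoodieWriteConfig (sketch, assuming the existing props field):
       private static final String BULKINSERT_WRITE_MODE = "hoodie.bulkinsert.write.mode";
       private static final String DEFAULT_BULKINSERT_WRITE_MODE = BulkInsertWriteMode.NONE.name();

       public BulkInsertWriteMode getBulkInsertWriteMode() {
         return BulkInsertWriteMode.valueOf(
             props.getProperty(BULKINSERT_WRITE_MODE, DEFAULT_BULKINSERT_WRITE_MODE));
       }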


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] 
Introduce configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362295790
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/func/bulkinsert/BulkInsertInternalPartitioner.java
 ##
 @@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.func.bulkinsert;
+
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.table.UserDefinedBulkInsertPartitioner;
+
+public abstract class BulkInsertInternalPartitioner implements
+UserDefinedBulkInsertPartitioner {
 
 Review comment:
   We can rename UserDefinedBulkInsertPartitioner to just 
`BulkInsertPartitioner` and reuse it I think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] 
Introduce configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362278533
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##
 @@ -367,20 +370,30 @@ public static SparkConf registerClasses(SparkConf conf) {
 }
   }
 
+  private BulkInsertMapFunction getBulkInsertMapFunction(
+  boolean isSorted, String commitTime, HoodieWriteConfig config, 
HoodieTable hoodieTable,
+  List fileIDPrefixes) {
+if (isSorted) {
+  return new BulkInsertMapFunctionForSortedRecords(
+  commitTime, config, hoodieTable, fileIDPrefixes);
+}
+return new BulkInsertMapFunctionForNonSortedRecords(
+commitTime, config, hoodieTable, fileIDPrefixes);
+  }
+
   private JavaRDD bulkInsertInternal(JavaRDD> 
dedupedRecords, String commitTime,
   HoodieTable table, Option 
bulkInsertPartitioner) {
 
 Review comment:
   does it make sense to still have a separate `UserDefinedBulkInsertPartitioner`? 
could we just combine this into `BulkInsertInternalPartitioner`? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] 
Introduce configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362295974
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/table/UserDefinedBulkInsertPartitioner.java
 ##
 @@ -31,4 +31,6 @@
 public interface UserDefinedBulkInsertPartitioner<T extends HoodieRecordPayload> {
 
   JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> records, int outputSparkPartitions);
+
+  boolean arePartitionRecordsSorted();
 
 Review comment:
   please improve javadocs of this interface while you are here :) 
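
   For instance, the documented interface could read roughly like this (the 
generics are reconstructed from the existing class; the wording is only a 
suggestion):

       /**
        * Pluggable repartitioner applied to the input records before bulk_insert writes them out.
        */
       public interface UserDefinedBulkInsertPartitioner<T extends HoodieRecordPayload> {

         /**
          * Repartitions the input records into the given number of output Spark partitions,
          * optionally sorting them (e.g. by partition path and record key).
          */
         JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> records,
             int outputSparkPartitions);

         /**
          * @return true if the records within each output Spark partition are sorted, so that
          *         the writer can rely on that ordering when creating file handles.
          */
         boolean arePartitionRecordsSorted();
       }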


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362295605
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/table/RollbackExecutor.java
 ##
 @@ -217,7 +216,7 @@ private HoodieRollbackStat 
mergeRollbackStat(HoodieRollbackStat stat1, HoodieRol
 
   private Map generateHeader(String commit) {
 // generate metadata
-Map header = Maps.newHashMap();
+Map header = new HashMap<>();
 
 Review comment:
   good call. Still moving this in some form to a common class would be good.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-484) NPE in HiveIncrementalPuller

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-484:

Labels: pull-request-available  (was: )

> NPE in HiveIncrementalPuller
> 
>
> Key: HUDI-484
> URL: https://issues.apache.org/jira/browse/HUDI-484
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Incremental Pull
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
> Attachments: Screenshot 2019-12-30 at 4.43.51 PM.png
>
>
> When we try to use HiveIncrementalPuller class to incrementally pull changes 
> from hive, it throws NPE as it is unable to find IncrementalPull.sqltemplate 
> in the bundled jar. 
> Screenshot attached which shows the exception. 
> The jar contains the template. 
> Steps to reproduce - 
>  # copy hive-jdbc-2.3.1.jar, log4j-1.2.17.jar to docker/demo/config folder
>  # run cd docker && ./setup_demo.sh
>  # cat docker/demo/data/batch_1.json | kafkacat -b kafkabroker -t stock_ticks 
> -P
>  #  {{docker exec -it adhoc-2 /bin/bash}}
>  #  {{spark-submit --class 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
> $HUDI_UTILITIES_BUNDLE --storage-type COPY_ON_WRITE --source-class 
> org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts 
> --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table 
> stock_ticks_cow --props /var/demo/config/kafka-source.properties 
> --schemaprovider-class 
> org.apache.hudi.utilities.schema.FilebasedSchemaProvider}}
>  #  {{/var/hoodie/ws/hudi-hive/run_sync_tool.sh --jdbc-url 
> jdbc:hive2://hiveserver:1 --user hive --pass hive --partitioned-by dt 
> --base-path /user/hive/warehouse/stock_ticks_cow --database default --table 
> stock_ticks_cow}}
>  # java -cp 
> /var/hoodie/ws/docker/demo/config/hive-jdbc-2.3.1.jar:/var/hoodie/ws/docker/demo/config/log4j-1.2.17.jar:$HUDI_UTILITIES_BUNDLE
>  org.apache.hudi.utilities.HiveIncrementalPuller --hiveUrl 
> jdbc:hive2://hiveserver:1 --hiveUser hive --hivePass hive 
> --extractSQLFile /var/hoodie/ws/docker/demo/config/incr_pull.txt --sourceDb 
> default --sourceTable stock_ticks_cow --targetDb tmp --targetTable tempTable 
> --fromCommitTime 0 --maxCommits 1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1167: [HUDI-484] Fix NPE when reading IncrementalPull.sqltemplate in HiveIncrementalPuller

2019-12-31 Thread GitBox
lamber-ken opened a new pull request #1167: [HUDI-484] Fix NPE when reading 
IncrementalPull.sqltemplate in HiveIncrementalPuller
URL: https://github.com/apache/incubator-hudi/pull/1167
 
 
   
   ## What is the purpose of the pull request
   
   When using the `Class.getResourceAsStream` method, the resource name should 
begin with a '/'.
   
   For more detail, please visit 
https://lists.apache.org/thread.html/9c7ab9b8cf63d8f1f7fd72a0aba870976e81d9c50ed53db80f082284%40%3Cdev.hudi.apache.org%3E
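
   In other words, this is the standard `Class.getResourceAsStream` lookup rule 
(a minimal illustration; the wrapper class below is hypothetical, the other names 
are the real ones from this PR):

       import java.io.InputStream;

       import org.apache.hudi.utilities.HiveIncrementalPuller;

       class ResourceLookupSketch {
         static void demo() {
           // Relative name: resolved against the class's own package
           // (org/apache/hudi/utilities/), where the template is not packaged,
           // so this returns null and later surfaces as the NPE.
           InputStream broken =
               HiveIncrementalPuller.class.getResourceAsStream("IncrementalPull.sqltemplate");

           // Leading '/': resolved from the classpath root, where the bundle ships the template.
           InputStream fixed =
               HiveIncrementalPuller.class.getResourceAsStream("/IncrementalPull.sqltemplate");
         }
       }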
   
   ## Brief change log
   
 - Change "IncrementalPull.sqltemplate" to "/IncrementalPull.sqltemplate".
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-486) Improve documentation for using HiveIncrementalPuller

2019-12-31 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006259#comment-17006259
 ] 

lamber-ken commented on HUDI-486:
-

(y) +1

> Improve documentation for using HiveIncrementalPuller
> -
>
> Key: HUDI-486
> URL: https://issues.apache.org/jira/browse/HUDI-486
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> For using HiveIncrementalPuller, one needs to have a lot of jars in 
> classPath. These jars are not listed anywhere. As a result, one has to keep 
> on adding the jars incrementally to the classPath with every 
> NoClassDefFoundError coming up when executing. 
> We should list down the jars needed so that it becomes easy for a first-time 
> user to use the mentioned tool. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-417) Refactor HoodieWriteClient so that commit logic can be shareable by both bootstrap and normal write operations

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-417:

Labels: pull-request-available  (was: )

> Refactor HoodieWriteClient so that commit logic can be shareable by both 
> bootstrap and normal write operations
> --
>
> Key: HUDI-417
> URL: https://issues.apache.org/jira/browse/HUDI-417
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>
>  
> Basic Code Changes are present in the fork : 
> [https://github.com/bvaradar/hudi/tree/vb_bootstrap]
>  
> The current implementation of HoodieBootstrapClient has duplicate code for 
> committing bootstrap. 
> [https://github.com/bvaradar/hudi/blob/vb_bootstrap/hudi-client/src/main/java/org/apache/hudi/bootstrap/HoodieBootstrapClient.java]
>  
>  
> We can have an independent PR which would move this commit functionality 
> from HoodieWriteClient to a new base class AbstractHoodieWriteClient, which 
> HoodieBootstrapClient can inherit.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bvaradar opened a new pull request #1166: [HUDI-417] Refactor HoodieWriteClient so that commit logic can be shareable by both bootstrap and normal write operations

2019-12-31 Thread GitBox
bvaradar opened a new pull request #1166: [HUDI-417] Refactor HoodieWriteClient 
so that commit logic can be shareable by both bootstrap and normal write 
operations
URL: https://github.com/apache/incubator-hudi/pull/1166
 
 
   
   ## What is the purpose of the pull request
   
   Refactor-only change, as a preparatory step for supporting efficient 
bootstrap of parquet tables to Hudi.
   
   ## Verify this pull request
   
   Refactor-only change for HoodieWriteClient
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1154: [HUDI-406] Added default partition path in TimestampBasedKeyGenerator

2019-12-31 Thread GitBox
bvaradar commented on a change in pull request #1154: [HUDI-406] Added default 
partition path in TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1154#discussion_r362285831
 
 

 ##
 File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
 ##
 @@ -64,6 +64,14 @@ public static String 
getNullableNestedFieldValAsString(GenericRecord record, Str
 }
   }
 
+  public static Object getNullableNestedFieldVal(GenericRecord record, String 
fieldName) {
+try {
+  return getNestedFieldVal(record, fieldName);
 
 Review comment:
   @pratyakshsharma : Yeah, the other function also needs to be fixed. Thank 
you.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-31 Thread GitBox
nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r362278320
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -74,14 +75,15 @@
   public static final String[] DEFAULT_PARTITION_PATHS =
   {DEFAULT_FIRST_PARTITION_PATH, DEFAULT_SECOND_PARTITION_PATH, 
DEFAULT_THIRD_PARTITION_PATH};
   public static final int DEFAULT_PARTITION_DEPTH = 3;
-  public static String TRIP_EXAMPLE_SCHEMA = "{\"type\": \"record\",\"name\": 
\"triprec\",\"fields\": [ "
-  + "{\"name\": \"timestamp\",\"type\": \"double\"},{\"name\": 
\"_row_key\", \"type\": \"string\"},"
-  + "{\"name\": \"rider\", \"type\": \"string\"},{\"name\": \"driver\", 
\"type\": \"string\"},"
-  + "{\"name\": \"begin_lat\", \"type\": \"double\"},{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
-  + "{\"name\": \"end_lat\", \"type\": \"double\"},{\"name\": \"end_lon\", 
\"type\": \"double\"},"
-  + "{\"name\":\"fare\",\"type\": \"double\"}]}";
+  public static String TRIP_EXAMPLE_SCHEMA = "{\"type\": \"record\"," + 
"\"name\": \"triprec\"," + "\"fields\": [ "
+  + "{\"name\": \"timestamp\",\"type\": \"double\"}," + "{\"name\": 
\"_row_key\", \"type\": \"string\"},"
+  + "{\"name\": \"rider\", \"type\": \"string\"}," + "{\"name\": 
\"driver\", \"type\": \"string\"},"
+  + "{\"name\": \"begin_lat\", \"type\": \"double\"}," + "{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
+  + "{\"name\": \"end_lat\", \"type\": \"double\"}," + "{\"name\": 
\"end_lon\", \"type\": \"double\"},"
+  + "{\"name\":\"fare\",\"type\": \"double\"},"
+  + "{\"name\": \"_hoodie_delete_marker\", \"type\": \"boolean\", 
\"default\": false} ]}";
   public static String NULL_SCHEMA = 
Schema.create(Schema.Type.NULL).toString();
-  public static String TRIP_HIVE_COLUMN_TYPES = 
"double,string,string,string,double,double,double,double,double";
+  public static String TRIP_HIVE_COLUMN_TYPES = 
"double,string,string,string,double,double,double,double,double,string";
 
 Review comment:
   The actual data type of the delete marker is boolean, as in line 84. I missed updating this list of columns. Will fix it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1143: [MINOR] Fix out of limits for results

2019-12-31 Thread GitBox
vinothchandar commented on issue #1143: [MINOR] Fix out of limits for results
URL: https://github.com/apache/incubator-hudi/pull/1143#issuecomment-569987256
 
 
   @smarthi please squash and merge so there is just one commit in the git history.. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-31 Thread GitBox
vinothchandar commented on issue #1128: [HUDI-453] Fix throw failed to archive 
commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569987171
 
 
   @bvaradar can you please squash and merge next time, so there is just one commit in the history? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] smarthi commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
smarthi commented on a change in pull request #1159: [HUDI-479] Eliminate or 
Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362275520
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/io/compact/HoodieRealtimeTableCompactor.java
 ##
 @@ -125,7 +125,7 @@
 config.getCompactionReverseLogReadEnabled(), 
config.getMaxDFSStreamBufferSize(),
 config.getSpillableMapBasePath());
 if (!scanner.iterator().hasNext()) {
-  return Lists.newArrayList();
+  return new ArrayList<>();
 
 Review comment:
   Agreed. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
lamber-ken commented on a change in pull request #1159: [HUDI-479] Eliminate or 
Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362276178
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/table/RollbackExecutor.java
 ##
 @@ -217,7 +216,7 @@ private HoodieRollbackStat 
mergeRollbackStat(HoodieRollbackStat stat1, HoodieRol
 
   private Map generateHeader(String commit) {
 // generate metadata
-Map header = Maps.newHashMap();
+Map header = new HashMap<>();
 
 Review comment:
   > Collections.emptyMap() works? Similarly, if we can pull this into our own 
`CollectionUtils` class, that would be useful to change later on?
   
   Hi @vinothchandar, `Collections.emptyMap()`, `Collections.emptyList()`, and `Collections.emptySet()` don't support put/add/remove operations.
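   
   A tiny illustration of the difference (assumed example, not from the PR): the `Collections.empty*()` methods return immutable singletons, so they fit as constants or return values but not as accumulators like the header map here.
   ```
   import java.util.Collections;
   import java.util.HashMap;
   import java.util.Map;
   
   public class EmptyCollectionsDemo {
     public static void main(String[] args) {
       // Mutable map, as in generateHeader(): entries can be added.
       Map<String, String> header = new HashMap<>();
       header.put("commit", "001");
   
       // Immutable singleton: any mutation throws.
       Map<String, String> empty = Collections.emptyMap();
       try {
         empty.put("commit", "001");
       } catch (UnsupportedOperationException e) {
         System.out.println("Collections.emptyMap() rejects put(): " + e);
       }
     }
   }
   ```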


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1150: [HUDI-288]: Add 
support for ingesting multiple kafka streams in a single DeltaStreamer 
deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r362275299
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -80,16 +79,27 @@
   + "{\"name\": \"begin_lat\", \"type\": \"double\"},{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
   + "{\"name\": \"end_lat\", \"type\": \"double\"},{\"name\": \"end_lon\", 
\"type\": \"double\"},"
   + "{\"name\":\"fare\",\"type\": \"double\"}]}";
+  public static String GROCERY_PURCHASE_SCHEMA = 
"{\"type\":\"record\",\"name\":\"purchaserec\",\"fields\":["
 
 Review comment:
   Could it be better to re-use the same schema? If you really want two different schemas, then it's time to modularize this class better.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1150: [HUDI-288]: Add 
support for ingesting multiple kafka streams in a single DeltaStreamer 
deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r362275428
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/TableConfig.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
+import com.fasterxml.jackson.annotation.JsonProperty;
+
+import java.util.Objects;
+
+/*
+Represents object with all the topic level overrides for multi table delta 
streamer execution
+ */
+@JsonIgnoreProperties(ignoreUnknown = true)
 
 Review comment:
   any reason this is in the `hudi-common` package? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce 
configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r362275524
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/func/bulkinsert/RDDPartitionLocalSortPartitioner.java
 ##
 @@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.func.bulkinsert;
+
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.spark.HashPartitioner;
+import org.apache.spark.api.java.JavaRDD;
+import scala.Tuple2;
+
+public class RDDPartitionLocalSortPartitioner
+extends BulkInsertInternalPartitioner {
+
+  @Override
+  public JavaRDD> repartitionRecords(JavaRDD> 
records,
+  int outputSparkPartitions) {
+return records.mapToPair(record ->
+new Tuple2<>(
+String.format("%s+%s", record.getPartitionPath(), 
record.getRecordKey()), record))
+.repartitionAndSortWithinPartitions(new 
HashPartitioner(outputSparkPartitions))
 
 Review comment:
   Yes, `RangePartitioner` is actually what I'm looking for.
   
   This specific Partitioner tries to avoid shuffling to speed up the bulk insert 
but may introduce overlapping ranges. Will check the side effects of this.
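   
   For reference, a rough sketch (assumed code, not the PR's implementation) contrasting the two strategies: partition-local sorting with a `HashPartitioner`, versus a global sort, which relies on Spark's `RangePartitioner` under the hood via `sortByKey`.
   ```
   import org.apache.spark.HashPartitioner;
   import org.apache.spark.api.java.JavaPairRDD;
   import org.apache.spark.api.java.JavaRDD;
   import scala.Tuple2;
   
   public class SortModeSketch {
     // Keys records by "partitionPath+recordKey" (same idea as the PR); keyOf is a
     // hypothetical stand-in for that key extraction.
     public static JavaRDD<String> contrast(JavaRDD<String> records, int outputSparkPartitions) {
       JavaPairRDD<String, String> keyed = records.mapToPair(r -> new Tuple2<>(keyOf(r), r));
   
       // Partition-local sort: hash-shuffle once, then sort within each partition only.
       // Skips the key sampling and global ordering, but partitions' key ranges may overlap.
       JavaRDD<String> locallySorted = keyed
           .repartitionAndSortWithinPartitions(new HashPartitioner(outputSparkPartitions))
           .values();
   
       // Global sort: sortByKey samples the keys and range-partitions them, giving
       // non-overlapping, globally ordered ranges at the cost of an extra pass.
       JavaRDD<String> globallySorted = keyed
           .sortByKey(true, outputSparkPartitions)
           .values();
   
       return globallySorted;
     }
   
     private static String keyOf(String record) {
       return record; // placeholder key extractor for the sketch
     }
   }
   ```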


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on issue #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2019-12-31 Thread GitBox
yihua commented on issue #1149: [WIP] [HUDI-472] Introduce configurations and 
new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#issuecomment-569985331
 
 
   @cdmikechen Thanks for the suggestion.  I'm actually working on benchmarking 
each mode with different types of workload.  Once I have some data points I'll 
share them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362271984
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/index/bloom/BloomIndexFileInfo.java
 ##
 @@ -78,23 +78,21 @@ public boolean equals(Object o) {
 }
 
 BloomIndexFileInfo that = (BloomIndexFileInfo) o;
-return Objects.equal(that.fileId, fileId) && 
Objects.equal(that.minRecordKey, minRecordKey)
-&& Objects.equal(that.maxRecordKey, maxRecordKey);
+return Objects.equals(that.fileId, fileId) && 
Objects.equals(that.minRecordKey, minRecordKey)
+&& Objects.equals(that.maxRecordKey, maxRecordKey);
 
   }
 
   @Override
   public int hashCode() {
-return Objects.hashCode(fileId, minRecordKey, maxRecordKey);
+return Objects.hash(fileId, minRecordKey, maxRecordKey);
   }
 
   @Override
   public String toString() {
-final StringBuilder sb = new StringBuilder("BloomIndexFileInfo {");
-sb.append(" fileId=").append(fileId);
-sb.append(" minRecordKey=").append(minRecordKey);
-sb.append(" maxRecordKey=").append(maxRecordKey);
-sb.append('}');
-return sb.toString();
+return "BloomIndexFileInfo {" + " fileId=" + fileId
 
 Review comment:
   why this change? StringBuilder reuses allocation right? 
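   
   For context, a small side-by-side (assumed values, not the PR's code): javac compiles a single concatenation expression into an equivalent `StringBuilder` chain (or an `invokedynamic`-based concat on Java 9+), so the rewrite is mostly about readability rather than allocations.
   ```
   public class ToStringStyles {
     private final String fileId = "file-1";
     private final String minRecordKey = "key-0";
     private final String maxRecordKey = "key-9";
   
     // Explicit StringBuilder, as in the original code.
     public String withBuilder() {
       StringBuilder sb = new StringBuilder("BloomIndexFileInfo {");
       sb.append(" fileId=").append(fileId);
       sb.append(" minRecordKey=").append(minRecordKey);
       sb.append(" maxRecordKey=").append(maxRecordKey);
       sb.append('}');
       return sb.toString();
     }
   
     // Single concatenation expression: the compiler emits equivalent code.
     public String withConcat() {
       return "BloomIndexFileInfo {" + " fileId=" + fileId
           + " minRecordKey=" + minRecordKey
           + " maxRecordKey=" + maxRecordKey + '}';
     }
   }
   ```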


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362274816
 
 

 ##
 File path: hudi-integ-test/pom.xml
 ##
 @@ -117,7 +117,7 @@
 
   com.google.guava
   guava
-  20.0
+  15.0
 
 Review comment:
   Why the downgrade? This should be fine for the integ test, if it's passing consistently, right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362274938
 
 

 ##
 File path: pom.xml
 ##
 @@ -67,9 +67,9 @@
   
 
   
-2.6
+3.2.0
 
 Review comment:
   does the guava change need these version changes? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362274606
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/table/TestHoodieTableMetaClient.java
 ##
 @@ -120,7 +120,7 @@ public void checkArchiveCommitTimeline() throws 
IOException {
 HoodieInstant instant2 = new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "2");
 HoodieInstant instant3 = new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "3");
 
-assertEquals(Lists.newArrayList(instant1, instant2, instant3),
+assertEquals(Stream.of(instant1, instant2, 
instant3).collect(Collectors.toList()),
 
 Review comment:
   Arrays.asList()  instead? 
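   
   For illustration (assumed values, not the test's actual instants), both forms build the same list for the assertion; `Arrays.asList` just skips the stream machinery and returns a fixed-size list, which is enough for `assertEquals`.
   ```
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;
   import java.util.stream.Stream;
   
   public class ListCreation {
     public static void main(String[] args) {
       // Current form in the test: stream + collector.
       List<String> viaStream = Stream.of("1", "2", "3").collect(Collectors.toList());
   
       // Suggested form: shorter, fixed-size list.
       List<String> viaArrays = Arrays.asList("1", "2", "3");
   
       System.out.println(viaStream.equals(viaArrays)); // prints true
     }
   }
   ```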


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362273413
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieKey.java
 ##
 @@ -58,20 +57,18 @@ public boolean equals(Object o) {
   return false;
 }
 HoodieKey otherKey = (HoodieKey) o;
-return Objects.equal(recordKey, otherKey.recordKey) && 
Objects.equal(partitionPath, otherKey.partitionPath);
+return Objects.equals(recordKey, otherKey.recordKey) && 
Objects.equals(partitionPath, otherKey.partitionPath);
   }
 
   @Override
   public int hashCode() {
-return Objects.hashCode(recordKey, partitionPath);
+return Objects.hash(recordKey, partitionPath);
   }
 
   @Override
   public String toString() {
-final StringBuilder sb = new StringBuilder("HoodieKey {");
 
 Review comment:
   Same comment on StringBuilder. If we print the key in the fast path (for debugging), would the extra allocations matter?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362273016
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/io/compact/HoodieRealtimeTableCompactor.java
 ##
 @@ -125,7 +125,7 @@
 config.getCompactionReverseLogReadEnabled(), 
config.getMaxDFSStreamBufferSize(),
 config.getSpillableMapBasePath());
 if (!scanner.iterator().hasNext()) {
-  return Lists.newArrayList();
+  return new ArrayList<>();
 
 Review comment:
   how about Collections.emptyList() for new list creations? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362273153
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/table/RollbackExecutor.java
 ##
 @@ -217,7 +216,7 @@ private HoodieRollbackStat 
mergeRollbackStat(HoodieRollbackStat stat1, HoodieRol
 
   private Map generateHeader(String commit) {
 // generate metadata
-Map header = Maps.newHashMap();
+Map header = new HashMap<>();
 
 Review comment:
   Collections.emptyMap() works? Similarly, if we can pull this into our own 
`CollectionUtils` class, that would be useful to change later on? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2019-12-31 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r362272832
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/io/compact/HoodieRealtimeTableCompactor.java
 ##
 @@ -112,8 +112,8 @@
 // loaded and load it using CompositeAvroLogReader
 // Since a DeltaCommit is not defined yet, reading all the records. 
revisit this soon.
 String maxInstantTime = metaClient
-
.getActiveTimeline().getTimelineOfActions(Sets.newHashSet(HoodieTimeline.COMMIT_ACTION,
-HoodieTimeline.ROLLBACK_ACTION, 
HoodieTimeline.DELTA_COMMIT_ACTION))
+
.getActiveTimeline().getTimelineOfActions(Stream.of(HoodieTimeline.COMMIT_ACTION,
 
 Review comment:
   Throw the set instantiation into a common `CollectionUtils#setOf()` in 
hudi-common ? This way we can change the implementation underneath later on if 
needed?
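   
   A minimal sketch of what such a helper could look like (the class and method names come from the suggestion above; the body is an assumption):
   ```
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.HashSet;
   import java.util.Set;
   
   public final class CollectionUtils {
     private CollectionUtils() {
       // static utility class
     }
   
     // Varargs factory so call sites read as setOf(COMMIT_ACTION, ROLLBACK_ACTION, ...);
     // the backing Set implementation can be swapped later without touching callers.
     @SafeVarargs
     public static <T> Set<T> setOf(T... elements) {
       return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(elements)));
     }
   }
   ```
   The call above would then presumably read `getTimelineOfActions(CollectionUtils.setOf(HoodieTimeline.COMMIT_ACTION, HoodieTimeline.ROLLBACK_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION))`.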


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (HUDI-402) Code clean up in DataSourceUtils.java and all test classes

2019-12-31 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed HUDI-402.
--

> Code clean up in DataSourceUtils.java and all test classes
> --
>
> Key: HUDI-402
> URL: https://issues.apache.org/jira/browse/HUDI-402
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DataSourceUtils.java - 
> In function getNestedFieldValAsString, we call getNestedFieldVal function to 
> get the value. Then we check if the object returned is null, which is always 
> false, since the called function throws an exception rather than returning 
> null. 
> Need to change the code accordingly.
> Test classes - 
>  # We have defined checked exceptions at a lot of places which are actually 
> not thrown anywhere. 
>  # Similarly, java stream code can be refactored at a lot of places. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-402) Code clean up in DataSourceUtils.java and all test classes

2019-12-31 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved HUDI-402.

Resolution: Fixed

> Code clean up in DataSourceUtils.java and all test classes
> --
>
> Key: HUDI-402
> URL: https://issues.apache.org/jira/browse/HUDI-402
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DataSourceUtils.java - 
> In function getNestedFieldValAsString, we call getNestedFieldVal function to 
> get the value. Then we check if the object returned is null, which is always 
> false, since the called function throws an exception rather than returning 
> null. 
> Need to change the code accordingly.
> Test classes - 
>  # We have defined checked exceptions at a lot of places which are actually 
> not thrown anywhere. 
>  # Similarly, java stream code can be refactored at a lot of places. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569981982
 
 
   > > So, modify assumeDatePartitioning to ! assumeDatePartitioning is the 
best way to fix this issue.
   > 
   > This is not going to fix this issue.. and we should not be changing this 
code.. It will cause side effects, like I mentioned before.
   > 
   > Can we reproduce this issue in a unit test or sample code first?
   
   Hi, following the steps below can reproduce this issue.
   
   1, Define the PartitionValueExtractor
   ```
   package org.apache.hudi.hive;
   
   import org.joda.time.DateTime;
   import org.joda.time.format.DateTimeFormat;
   import org.joda.time.format.DateTimeFormatter;
   
   import java.util.Collections;
   import java.util.List;
   
   public class DayPartitionValueExtractor implements PartitionValueExtractor {
   
   private transient DateTimeFormatter dtfOut;
   
   public DayPartitionValueExtractor() {
   this.dtfOut = DateTimeFormat.forPattern("yyyy-MM-dd");
   }
   
   private DateTimeFormatter getDtfOut() {
   if (dtfOut == null) {
   dtfOut = DateTimeFormat.forPattern("yyyy-MM-dd");
   }
   return dtfOut;
   }
   
   @Override
   public List<String> extractPartitionValuesInPath(String partitionPath) {
   String[] splits = partitionPath.split("-");
   if (splits.length != 3) {
   throw new IllegalArgumentException(
   "Partition path " + partitionPath + " is not in the form 
-mm-dd ");
   }
   int year = Integer.parseInt(splits[0]);
   int mm = Integer.parseInt(splits[1]);
   int dd = Integer.parseInt(splits[2]);
   DateTime dateTime = new DateTime(year, mm, dd, 0, 0);
   return Collections.singletonList(getDtfOut().print(dateTime));
   }
   }
   ```
   
   2, Write data by spark-shell
   ```
   export SPARK_HOME=/work/BigData/install/spark/spark-2.3.3-bin-hadoop2.6
   ${SPARK_HOME}/bin/spark-shell --packages 
org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   
   import org.apache.spark.sql.SaveMode
   
   val basePath = "/tmp/hoodie_test"
   var datas = List("""{ "key": "uuid", "event_time": 1574297893836, 
"part_date": "2019-11-12"}""")
   val df = spark.read.json(spark.sparkContext.parallelize(datas, 2))
   
   df.write.format("org.apache.hudi").
   option("hoodie.insert.shuffle.parallelism", "10").
   option("hoodie.upsert.shuffle.parallelism", "10").
   option("hoodie.delete.shuffle.parallelism", "10").
   option("hoodie.bulkinsert.shuffle.parallelism", "10").
   
   option("hoodie.datasource.hive_sync.enable", true).
   option("hoodie.datasource.hive_sync.jdbcurl", 
"jdbc:hive2://0.0.0.0:12326").
   option("hoodie.datasource.hive_sync.username", "dcadmin").
   option("hoodie.datasource.hive_sync.password", "dcadmin").
   option("hoodie.datasource.hive_sync.database", "default").
   option("hoodie.datasource.hive_sync.table", "hoodie_test").
   option("hoodie.datasource.hive_sync.partition_fields", "part_date").
   
   option("hoodie.datasource.hive_sync.assume_date_partitioning", true).
   option("hoodie.datasource.hive_sync.partition_extractor_class", 
"org.apache.hudi.hive.DayPartitionValueExtractor").
   
   option("hoodie.datasource.write.precombine.field", "event_time").
   option("hoodie.datasource.write.recordkey.field", "key").
   option("hoodie.datasource.write.partitionpath.field", "part_date").
   
   option("hoodie.table.name", "hoodie_test").
   mode(SaveMode.Overwrite).
   save(basePath);
   ```
   
   3, Query data from hive
   ```
   no data at first time
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-76) CSV Source support for Hudi Delta Streamer

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-76:
---
Labels: pull-request-available  (was: )

> CSV Source support for Hudi Delta Streamer
> --
>
> Key: HUDI-76
> URL: https://issues.apache.org/jira/browse/HUDI-76
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, Incremental Pull
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Minor
>  Labels: pull-request-available
>
> DeltaStreamer does not have support to pull CSV data from sources (hdfs log 
> files/kafka). THis ticket is to provide support for csv sources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569982714
 
 
   > ` If HIVE_ASSUME_DATE_PARTITION_OPT_KEY is set true` is pretty much a 
historical config, that could solely affect old tables as at uber.. thats all..
   > 
   > Nonetheless. your fix is efficient, since it avoids full listing 
anyway!.so lets do it
   
   As you said, it's a historical config; we can remove it from the configuration documentation. That's another way to fix this issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
lamber-ken commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569982714
 
 
   > ` If HIVE_ASSUME_DATE_PARTITION_OPT_KEY is set true` is pretty much a 
historical config, that could solely affect old tables as at uber.. thats all..
   > 
   > Nonetheless. your fix is efficient, since it avoids full listing 
anyway!.so lets do it
   
   As you said, it's a historical config; we can remove it from the configuration documentation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua opened a new pull request #1165: [HUDI-76][WIP] Add CSV Source support for Hudi Delta Streamer

2019-12-31 Thread GitBox
yihua opened a new pull request #1165: [HUDI-76][WIP] Add CSV Source support 
for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165
 
 
   ## What is the purpose of the pull request
   
   Add CSV Source support for Hudi Delta Streamer
   
   ## Brief change log
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569981982
 
 
   > > So, modify assumeDatePartitioning to ! assumeDatePartitioning is the 
best way to fix this issue.
   > 
   > This is not going to fix this issue.. and we should not be changing this 
code.. It will cause side effects, like I mentioned before.
   > 
   > Can we reproduce this issue in a unit test or sample code first?
   
   Hi, following the steps below can reproduce this issue.
   
   1, Define the PartitionValueExtractor
   ```
   package org.apache.hudi.hive;
   
   import org.joda.time.DateTime;
   import org.joda.time.format.DateTimeFormat;
   import org.joda.time.format.DateTimeFormatter;
   
   import java.util.Collections;
   import java.util.List;
   
   public class DayPartitionValueExtractor implements PartitionValueExtractor {
   
   private transient DateTimeFormatter dtfOut;
   
   public DayPartitionValueExtractor() {
   this.dtfOut = DateTimeFormat.forPattern("yyyy-MM-dd");
   }
   
   private DateTimeFormatter getDtfOut() {
   if (dtfOut == null) {
   dtfOut = DateTimeFormat.forPattern("yyyy-MM-dd");
   }
   return dtfOut;
   }
   
   @Override
   public List<String> extractPartitionValuesInPath(String partitionPath) {
   String[] splits = partitionPath.split("-");
   if (splits.length != 3) {
   throw new IllegalArgumentException(
   "Partition path " + partitionPath + " is not in the form 
-mm-dd ");
   }
   int year = Integer.parseInt(splits[0]);
   int mm = Integer.parseInt(splits[1]);
   int dd = Integer.parseInt(splits[2]);
   DateTime dateTime = new DateTime(year, mm, dd, 0, 0);
   return Collections.singletonList(getDtfOut().print(dateTime));
   }
   }
   ```
   
   2, Write data by spark-shell
   ```
   export SPARK_HOME=/work/BigData/install/spark/spark-2.3.3-bin-hadoop2.6
   ${SPARK_HOME}/bin/spark-shell --packages 
org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   
   import org.apache.spark.sql.SaveMode
   
   val basePath = "/tmp/hoodie_test"
   var datas = List("""{ "key": "uuid", "event_time": 1574297893836, 
"part_date": "2019-11-12"}""")
   val df = spark.read.json(spark.sparkContext.parallelize(datas, 2))
   
   df.write.format("org.apache.hudi").
   option("hoodie.insert.shuffle.parallelism", "10").
   option("hoodie.upsert.shuffle.parallelism", "10").
   option("hoodie.delete.shuffle.parallelism", "10").
   option("hoodie.bulkinsert.shuffle.parallelism", "10").
   
   option("hoodie.datasource.hive_sync.enable", true).
   option("hoodie.datasource.hive_sync.jdbcurl", 
"jdbc:hive2://0.0.0.0:12326").
   option("hoodie.datasource.hive_sync.username", "dcadmin").
   option("hoodie.datasource.hive_sync.password", "dcadmin").
   option("hoodie.datasource.hive_sync.database", "default").
   option("hoodie.datasource.hive_sync.table", "hoodie_test").
   option("hoodie.datasource.hive_sync.partition_fields", "part_date").
   
   option("hoodie.datasource.hive_sync.assume_date_partitioning", true).
   option("hoodie.datasource.hive_sync.partition_extractor_class", 
"org.apache.hudi.hive.DayPartitionValueExtractor").
   
   option("hoodie.datasource.write.precombine.field", "event_time").
   option("hoodie.datasource.write.recordkey.field", "key").
   option("hoodie.datasource.write.partitionpath.field", "part_date").
   
   option("hoodie.table.name", "hoodie_test").
   mode(SaveMode.Overwrite).
   save(basePath);
   ```
   
   3, Query data from hive
   ```
   no data
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
lamber-ken commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569981982
 
 
   > > So, modify assumeDatePartitioning to ! assumeDatePartitioning is the 
best way to fix this issue.
   > 
   > This is not going to fix this issue.. and we should not be changing this 
code.. It will cause side effects, like I mentioned before.
   > 
   > Can we reproduce this issue in a unit test or sample code first?
   
   Hi, following the steps below can reproduce this issue.
   
   1, Define the PartitionValueExtractor
   ```
   package org.apache.hudi.hive;
   
   import org.joda.time.DateTime;
   import org.joda.time.format.DateTimeFormat;
   import org.joda.time.format.DateTimeFormatter;
   
   import java.util.Collections;
   import java.util.List;
   
   public class DayPartitionValueExtractor implements PartitionValueExtractor {
   
   private transient DateTimeFormatter dtfOut;
   
   public DayPartitionValueExtractor() {
   this.dtfOut = DateTimeFormat.forPattern("yyyy-MM-dd");
   }
   
   private DateTimeFormatter getDtfOut() {
   if (dtfOut == null) {
   dtfOut = DateTimeFormat.forPattern("yyyy-MM-dd");
   }
   return dtfOut;
   }
   
   @Override
   public List<String> extractPartitionValuesInPath(String partitionPath) {
   String[] splits = partitionPath.split("-");
   if (splits.length != 3) {
   throw new IllegalArgumentException(
   "Partition path " + partitionPath + " is not in the form 
-mm-dd ");
   }
   int year = Integer.parseInt(splits[0]);
   int mm = Integer.parseInt(splits[1]);
   int dd = Integer.parseInt(splits[2]);
   DateTime dateTime = new DateTime(year, mm, dd, 0, 0);
   return Collections.singletonList(getDtfOut().print(dateTime));
   }
   }
   ```
   
   2, Write data by spark-shell
   ```
   export SPARK_HOME=/work/BigData/install/spark/spark-2.3.3-bin-hadoop2.6
   ${SPARK_HOME}/bin/spark-shell --packages 
org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   
   import org.apache.spark.sql.SaveMode
   
   val basePath = "/tmp/hoodie_test"
   var datas = List("""{ "key": "uuid", "event_time": 1574297893836, 
"part_date": "2019-11-12"}""")
   val df = spark.read.json(spark.sparkContext.parallelize(datas, 2))
   
   df.write.format("hudi").
   option("hoodie.insert.shuffle.parallelism", "10").
   option("hoodie.upsert.shuffle.parallelism", "10").
   option("hoodie.delete.shuffle.parallelism", "10").
   option("hoodie.bulkinsert.shuffle.parallelism", "10").
   
   option("hoodie.datasource.hive_sync.enable", true).
   option("hoodie.datasource.hive_sync.jdbcurl", 
"jdbc:hive2://0.0.0.0:12326").
   option("hoodie.datasource.hive_sync.username", "dcadmin").
   option("hoodie.datasource.hive_sync.password", "dcadmin").
   option("hoodie.datasource.hive_sync.database", "default").
   option("hoodie.datasource.hive_sync.table", "hoodie_test").
   option("hoodie.datasource.hive_sync.partition_fields", "part_date").
   
   option("hoodie.datasource.hive_sync.assume_date_partitioning", true).
   option("hoodie.datasource.hive_sync.partition_extractor_class", 
"org.apache.hudi.hive.DayPartitionValueExtractor").
   
   option("hoodie.datasource.write.precombine.field", "event_time").
   option("hoodie.datasource.write.recordkey.field", "key").
   option("hoodie.datasource.write.partitionpath.field", "part_date").
   
   option("hoodie.table.name", "hoodie_test").
   mode(SaveMode.Overwrite).
   save(basePath);
   ```
   
   3, Query data from hive
   ```
   no data
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-91) Replace Databricks spark-avro with native spark-avro #628

2019-12-31 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006219#comment-17006219
 ] 

Vinoth Chandar commented on HUDI-91:


>> I think not every user use Spark2.4

We ran a vote on this and everyone was fine with Spark 2.4 and above. I'd rather have Spark maintain the Avro -> Row conversion logic.

>>. But timestamp need to change some codes in

I think the AWS folks also opened some Hive issues. Not sure about Hive 3 myself. :)

 

> Replace Databricks spark-avro with native spark-avro #628
> -
>
> Key: HUDI-91
> URL: https://issues.apache.org/jira/browse/HUDI-91
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark Integration, Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/issues/628] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
vinothchandar commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569980467
 
 
   >So, modify assumeDatePartitioning to ! assumeDatePartitioning is the best 
way to fix this issue.
   
   This is not going to fix this issue..  and we should not be changing this 
code.. It will cause side effects, like I mentioned before.
   
   Can we reproduce this issue in a unit test or sample code first? 
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #828: Synchronizing to hive partition is incorrect

2019-12-31 Thread GitBox
lamber-ken commented on issue #828: Synchronizing to hive partition is incorrect
URL: https://github.com/apache/incubator-hudi/issues/828#issuecomment-569979828
 
 
   > In that case, I am not sure why the code would not find the partitions it 
just wrote? is this S3? may be an eventual consistency issue?
   
   We can discuss it here: 
https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569978944


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569978944
 
 
   > @lamber-ken I still don't fully understand the need for this fix.. Can you 
please summarize where we are?
   
   Hi @vinothchandar 
   
   As we know, Hudi's partition path supports the `yyyy/mm/dd` form. If the partition 
data is actually in the `yyyy-mm-dd` form, the user needs to implement a 
`PartitionValueExtractor`.
   
   From the definition of `HIVE_ASSUME_DATE_PARTITION_OPT_KEY`, it means the user 
needs to set it to `true` if they have customized the `PartitionValueExtractor`.
   
   But this variable is used incorrectly; it is a logic error. The right usage 
is `!assumeDatePartitioning`.
   
   So, modifying `assumeDatePartitioning` to `!assumeDatePartitioning` is the 
best way to fix this issue.
   
   
   
   **The definition of `HIVE_ASSUME_DATE_PARTITION_OPT_KEY`**
   Property: `hoodie.datasource.hive_sync.assume_date_partitioning`, Default: 
`false` 
   Assume partitioning is `yyyy/mm/dd`
   
   **FSUtils#getAllPartitionPaths**
   ```
   public static List<String> getAllPartitionPaths(FileSystem fs, String basePathStr,
       boolean assumeDatePartitioning) throws IOException {
     if (assumeDatePartitioning) {
       return getAllPartitionFoldersThreeLevelsDown(fs, basePathStr);
     } else {
       return getAllFoldersWithPartitionMetaFile(fs, basePathStr);
     }
   }
   ```
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time

2019-12-31 Thread GitBox
lamber-ken commented on issue #1105: [WIP] [HUDI-405] Fix sync no hive 
partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569979402
 
 
   IMO, changing `assumeDatePartitioning` to `!assumeDatePartitioning` is the best way to fix this issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

