[GitHub] [incubator-hudi] yanghua commented on a change in pull request #896: Updating site to reflect recent doc changes

2019-09-18 Thread GitBox
yanghua commented on a change in pull request #896: Updating site to reflect 
recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#discussion_r325538985
 
 

 ##
 File path: content/404.html
 ##
 @@ -6,25 +6,25 @@
 
 
 Page Not Found | Hudi
-
+
 
 Review comment:
  OK @bhasudha You can fix it: just remove the first `/` and verify with 
Jekyll to see whether it works. If there are any problems, please let me know.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on issue #898: Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time

2019-09-18 Thread GitBox
pratyakshsharma commented on issue #898: Caused by: 
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time
URL: https://github.com/apache/incubator-hudi/issues/898#issuecomment-532554181
 
 
   I am using Hadoop 3.1.0 with Hoodie-0.4.7; the constructor of the 
FSDataOutputStream class has changed in the Hadoop 3.x versions. This is a 
possible duplicate of - https://www.mail-archive.com/dev@hudi.apache.org/msg00286.html




[GitHub] [incubator-hudi] HariprasadAllaka1612 closed issue #888: Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path

2019-09-18 Thread GitBox
HariprasadAllaka1612 closed issue #888: Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path
URL: https://github.com/apache/incubator-hudi/issues/888
 
 
   




[GitHub] [incubator-hudi] HariprasadAllaka1612 opened a new issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
HariprasadAllaka1612 opened a new issue #905: S3 folder paths messed up when 
running from Windows
URL: https://github.com/apache/incubator-hudi/issues/905
 
 
   I am running my code from a Windows machine to push data to S3. When I try 
to write the data, I get an error where the stats cannot be found, since null 
is passed for stats in
   
   public SizeAwareFSDataOutputStream(FSDataOutputStream out, Runnable closeCallback)
       throws IOException {
     super(out, null);
     this.closeCallback = closeCallback;
   }
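For what it's worth, the failure mode described above can be sketched with a plain-Java analogue (the class below is hypothetical, not Hudi or Hadoop code; the long[] stands in for Hadoop's FileSystem.Statistics):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical analogue of the reported pattern -- not Hudi/Hadoop code.
// A wrapper stream that updates a statistics object on every write:
// the constructor happily accepts stats = null, but the first write fails.
class CountingOutputStream extends OutputStream {
    private final OutputStream out;
    private final long[] stats; // stand-in for Hadoop's FileSystem.Statistics

    CountingOutputStream(OutputStream out, long[] stats) {
        this.out = out;
        this.stats = stats;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        stats[0]++; // NullPointerException here when stats was null
    }

    public static void main(String[] args) throws IOException {
        long[] stats = new long[1];
        OutputStream ok = new CountingOutputStream(new ByteArrayOutputStream(), stats);
        ok.write(42);   // stats[0] is now 1
        OutputStream bad = new CountingOutputStream(new ByteArrayOutputStream(), null);
        bad.write(42);  // throws NullPointerException
    }
}
```

In the real stack, super(out, null) presumably defers the failure the same way, to whenever the wrapped stream first consults the null statistics object.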
   
   I worked around this by commenting out the metrics collection part of 
HoodieWriteClient.finalizemetrics.
   
   But due to this failure, the cleanFailedWrites method is failing, since it 
expects the path to be Linux-based.
   
   SLF4J: Class path contains multiple SLF4J bindings.
   SLF4J: Found binding in [jar:file:/C:/Users/HariprasadAllaka/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: Found binding in [jar:file:/C:/Users/HariprasadAllaka/.m2/repository/com/github/HariprasadAllaka1612/incubator-hudi/hudi-timeline-server-bundle/playngoplatform-hoodie-0.4.7-gcde16ad-114/hudi-timeline-server-bundle-playngoplatform-hoodie-0.4.7-gcde16ad-114.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
   SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
   Exception in thread "main" java.lang.reflect.InvocationTargetException
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
   Caused by: org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 20190918145332 due to finalize errors.
       at org.apache.hudi.HoodieWriteClient.finalizeWrite(HoodieWriteClient.java:1312)
       at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:529)
       at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
       at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:501)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
       at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
       at com.playngodataengg.scala.dao.DataAccessS3.writeDataToRefinedHudiS3(DataAccessS3.scala:38)
       at com.playngodataengg.scala.controller.GameAndProviderDataTransform.processData(GameAndProviderDataTransform.scala:48)
       at 

[jira] [Commented] (HUDI-257) Unit tests intermittently failing

2019-09-18 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932412#comment-16932412
 ] 

leesf commented on HUDI-257:


Fixed via master: 2c6da09d9d17f33ebc025c9ec9fa949605288bb7

> Unit tests intermittently failing 
> --
>
> Key: HUDI-257
> URL: https://issues.apache.org/jira/browse/HUDI-257
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestCopyOnWriteTable.testUpdateRecords:170 » HoodieIO Failed to read footer 
> fo...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] HariprasadAllaka1612 opened a new pull request #904: #898 - Changes made to fix the bug with the Hadoop version

2019-09-18 Thread GitBox
HariprasadAllaka1612 opened a new pull request #904: #898 - Changes made to fix 
the bug with the Hadoop version
URL: https://github.com/apache/incubator-hudi/pull/904
 
 
   





[GitHub] [incubator-hudi] HariprasadAllaka1612 closed pull request #904: #898 - Changes made to fix the bug with the Hadoop version

2019-09-18 Thread GitBox
HariprasadAllaka1612 closed pull request #904: #898 - Changes made to fix the 
bug with the Hadoop version
URL: https://github.com/apache/incubator-hudi/pull/904
 
 
   




[jira] [Created] (HUDI-262) Update Hudi website to reflect change in InputFormat Class name

2019-09-18 Thread BALAJI VARADARAJAN (Jira)
BALAJI VARADARAJAN created HUDI-262:
---

 Summary: Update Hudi website to reflect change in InputFormat 
Class name
 Key: HUDI-262
 URL: https://issues.apache.org/jira/browse/HUDI-262
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: asf-migration
Reporter: BALAJI VARADARAJAN








[jira] [Assigned] (HUDI-262) Update Hudi website to reflect change in InputFormat Class name

2019-09-18 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN reassigned HUDI-262:
---

Assignee: BALAJI VARADARAJAN

> Update Hudi website to reflect change in InputFormat Class name
> ---
>
> Key: HUDI-262
> URL: https://issues.apache.org/jira/browse/HUDI-262
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: asf-migration
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>






[GitHub] [incubator-hudi] vinothchandar commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
vinothchandar commented on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532739103
 
 
   I have a slightly different strategy. We can move to [spark 
2.4](https://github.com/apache/spark/blob/branch-2.4/pom.xml) and match its 
parquet (1.10.1) and avro (1.8.2) versions, and match parquet-avro to the 
parquet version. As long as Spark 2.3 can work with parquet-avro 1.10.1 (we 
bundle this), it should be fine?
   
   Also @umehrot2, is supporting 2.3 a must, or can we drop Hudi support for 
versions lower than 2.4? The Hudi community is OK per se with supporting just 
2.4. If so, then we can also drop com.databricks from the code and use 
org.apache.spark.avro (which is only in version 2.4).
   
   cc @bvaradar @bhasudha, who are looking into the Spark 2.4 move
   
   




[GitHub] [incubator-hudi] HariprasadAllaka1612 edited a comment on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
HariprasadAllaka1612 edited a comment on issue #905: S3 folder paths messed up 
when running from Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532742261
 
 
   @vinothchandar I think I confused you here. 
   
   When the data is being written to S3 in Hudi format, the file structure is 
totally fine:
   
   /gat-datalake-raw-dev/Games2/.hoodie/.temp
   
   But during clean-up, the .temp folder is being moved out of the 
above-mentioned directory and goes directly to the base path as 
/gat-datalake-raw-dev/Games2\.hoodie\.temp\20190918173239
   
   My understanding is that the code below
   
   public String getTempFolderPath() {
     return basePath + File.separator + TEMPFOLDER_NAME;
   }
   
   /**
    * Returns the marker folder path.
    * @param instantTs Instant Timestamp
    */
   public String getMarkerFolderPath(String instantTs) {
     return String.format("%s%s%s", getTempFolderPath(), File.separator, instantTs);
   }
   
   takes File.separator from the local machine rather than from the DFS, and 
so forms the separator \ on Windows.
   
   This might be the issue; I am now changing it to the code below to see if 
it works:
   
   public String getTempFolderPath() {
     return basePath + "/" + TEMPFOLDER_NAME;
   }
   
   /**
    * Returns the marker folder path.
    * @param instantTs Instant Timestamp
    */
   public String getMarkerFolderPath(String instantTs) {
     return String.format("%s%s%s", getTempFolderPath(), "/", instantTs);
   }
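To make the contrast concrete, here is a minimal self-contained sketch (the class name and paths are illustrative, not Hudi's actual code): java.io.File.separator is the separator of the *local* OS, so the same join produces \ segments on Windows, while S3/HDFS-style paths always expect /.

```java
import java.io.File;

// Hypothetical demo -- not Hudi code. Contrasts joining DFS path segments
// with File.separator (OS-dependent) versus a hardcoded "/" (what S3/HDFS
// style paths expect).
class SeparatorDemo {

    static final String TEMPFOLDER_NAME = ".temp";

    // Mirrors the original approach: on Windows, File.separator is "\",
    // so the S3 key degenerates into something like "base\.temp".
    static String tempFolderPathLocal(String basePath) {
        return basePath + File.separator + TEMPFOLDER_NAME;
    }

    // The proposed fix: always join DFS path segments with "/".
    static String tempFolderPathDfs(String basePath) {
        return basePath + "/" + TEMPFOLDER_NAME;
    }

    static String markerFolderPathDfs(String basePath, String instantTs) {
        return String.format("%s%s%s", tempFolderPathDfs(basePath), "/", instantTs);
    }

    public static void main(String[] args) {
        String base = "/gat-datalake-raw-dev/Games2";
        // prints /gat-datalake-raw-dev/Games2/.temp/20190918173239
        System.out.println(markerFolderPathDfs(base, "20190918173239"));
    }
}
```

On Linux/macOS both variants coincide, because File.separator is already "/"; that is why the bug only surfaces when running from Windows.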




[GitHub] [incubator-hudi] vinothchandar commented on issue #888: Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path

2019-09-18 Thread GitBox
vinothchandar commented on issue #888: Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path
URL: https://github.com/apache/incubator-hudi/issues/888#issuecomment-532733626
 
 
   @HariprasadAllaka1612 Were you able to resolve it? Curious to know what the 
issue was.




[incubator-hudi] branch asf-site updated (4072c83 -> 2d41bc1)

2019-09-18 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 4072c83  [DOCS] : Add ApacheCon talk to powered_by page
 add 2d41bc1  [docs][chinese] update permalink for translated 
pages(quickstart.cn.md, use_cases.cn.md)

No new revisions were added by this update.

Summary of changes:
 docs/quickstart.cn.md | 2 +-
 docs/use_cases.cn.md  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)



[GitHub] [incubator-hudi] vinothchandar merged pull request #900: [docs][chinese] update permalink for translated pages(quickstart.cn.md, use_cases.cn.md)

2019-09-18 Thread GitBox
vinothchandar merged pull request #900: [docs][chinese] update permalink for 
translated pages(quickstart.cn.md, use_cases.cn.md)
URL: https://github.com/apache/incubator-hudi/pull/900
 
 
   




[GitHub] [incubator-hudi] HariprasadAllaka1612 commented on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
HariprasadAllaka1612 commented on issue #905: S3 folder paths messed up when 
running from Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532742261
 
 
   @vinothchandar I think I confused you here. 
   
   When the data is being written to S3 in Hudi format, the file structure is 
totally fine:
   
   /gat-datalake-raw-dev/Games2/.hoodie/.temp
   
   But during clean-up, the .temp folder is being moved out of the 
above-mentioned directory and goes directly to the base path as 
/gat-datalake-raw-dev/Games2\.hoodie\.temp\20190918173239
   
   I don't understand why this is happening.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #899: [HUDI-228] Contributing page updated to include JIRA guidelines

2019-09-18 Thread GitBox
vinothchandar commented on a change in pull request #899: [HUDI-228] 
Contributing page updated to include JIRA guidelines
URL: https://github.com/apache/incubator-hudi/pull/899#discussion_r325730246
 
 

 ##
 File path: docs/contributing.md
 ##
 @@ -35,6 +35,10 @@ Here's a typical lifecycle of events to contribute to Hudi.
  - [Optional] If you want to get involved, but don't have a project in mind, 
please check JIRA for small, quick-starters.
  - [Optional] Familiarize yourself with internals of Hudi using content on 
this page, as well as [wiki](https://cwiki.apache.org/confluence/display/HUDI)
  - Once you finalize on a project/task, please open a new JIRA or assign an 
existing one to yourself. (If you don't have perms to do this, please email the 
dev mailing list with your JIRA id and a small intro for yourself. We'd be 
happy to add you as a contributor)
+ - While raising a new JIRA or updating an existing one, please make sure to 
do the following
+   - The issue type and versions (when resolving the ticket) are set correctly
+   - Summary should be descriptive enough to catch the essence of the problem/ 
feature
+   - Capture the version of Hoodie/Spark/Hive/Hadoop/Cloud environments in the 
ticket
 
 Review comment:
   can we also add 
` - Whenever possible, provide steps to reproduce via sample code or on the 
[docker setup](https://hudi.apache.org/docker_demo.html)`




[GitHub] [incubator-hudi] vinothchandar commented on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
vinothchandar commented on issue #905: S3 folder paths messed up when running 
from Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532734490
 
 
   Not following... the paths on S3, HDFS (DFS) would be Linux-like, right? 
No? 




[GitHub] [incubator-hudi] HariprasadAllaka1612 commented on issue #888: Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path

2019-09-18 Thread GitBox
HariprasadAllaka1612 commented on issue #888: Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path
URL: https://github.com/apache/incubator-hudi/issues/888#issuecomment-532736609
 
 
   @vinothchandar The issue was fixed using the Hadoop 3.x version. I followed 
the link here:
   
   https://github.com/apache/incubator-hudi/issues/898#issuecomment-532554181




[GitHub] [incubator-hudi] bvaradar commented on issue #891: Metrics only reported upon shutdown

2019-09-18 Thread GitBox
bvaradar commented on issue #891: Metrics only reported upon shutdown
URL: https://github.com/apache/incubator-hudi/issues/891#issuecomment-532781725
 
 
   @jaguarx : When you use the higher-level DeltaStreamer in continuous mode, 
you will notice metrics emission for every ingestion/compaction run. Let us 
know if you are interested in any metric that we are not currently capturing. 
   
   




[GitHub] [incubator-hudi] bvaradar commented on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
bvaradar commented on issue #905: S3 folder paths messed up when running from 
Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532788197
 
 
   @HariprasadAllaka1612 : We do not have a Windows setup ready, so it would 
be great if you could have a crack at it first and see if there are any simple 
fixes similar to the File.separator one you pointed out. Let us know if you 
need help debugging this further, though.




[jira] [Assigned] (HUDI-262) Update Hudi website to reflect change in InputFormat Class name

2019-09-18 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN reassigned HUDI-262:
---

Assignee: Bhavani Sudha Saktheeswaran  (was: BALAJI VARADARAJAN)

> Update Hudi website to reflect change in InputFormat Class name
> ---
>
> Key: HUDI-262
> URL: https://issues.apache.org/jira/browse/HUDI-262
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: asf-migration
>Reporter: BALAJI VARADARAJAN
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>






[GitHub] [incubator-hudi] HariprasadAllaka1612 commented on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
HariprasadAllaka1612 commented on issue #905: S3 folder paths messed up when 
running from Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532758920
 
 
   @vinothchandar Even this is failing. It sets the path properly now, but 
it's failing with the same error.




[GitHub] [incubator-hudi] vinothchandar commented on issue #891: Metrics only reported upon shutdown

2019-09-18 Thread GitBox
vinothchandar commented on issue #891: Metrics only reported upon shutdown
URL: https://github.com/apache/incubator-hudi/issues/891#issuecomment-532784301
 
 
   @bvaradar interesting.. so metrics are reported on close() of the 
writeClient? 




[GitHub] [incubator-hudi] HariprasadAllaka1612 commented on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
HariprasadAllaka1612 commented on issue #905: S3 folder paths messed up when 
running from Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532784878
 
 
   > @HariprasadAllaka1612 : Can you run the Hudi unit tests (mvn test) in 
your Windows setup (without S3) and see if all the tests pass? This way, it 
would be easier to catch a broader range of issues than going piecemeal
   
   Tests are failing on my local Windows machine. 




[GitHub] [incubator-hudi] bvaradar commented on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
bvaradar commented on issue #905: S3 folder paths messed up when running from 
Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532762952
 
 
   @HariprasadAllaka1612 : Can you run the Hudi unit tests (mvn test) in your 
Windows setup (without S3) and see if all the tests pass? This way, it would 
be easier to catch a broader range of issues than going piecemeal. 




[GitHub] [incubator-hudi] bvaradar commented on issue #891: Metrics only reported upon shutdown

2019-09-18 Thread GitBox
bvaradar commented on issue #891: Metrics only reported upon shutdown
URL: https://github.com/apache/incubator-hudi/issues/891#issuecomment-532785507
 
 
   @vinothchandar : Guess I was not clear. The metrics are reported for every 
commit, so we will see multiple reports when running in continuous mode.




[GitHub] [incubator-hudi] vinothchandar commented on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
vinothchandar commented on issue #905: S3 folder paths messed up when running 
from Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532800878
 
 
   And let's move the conversation to 
https://issues.apache.org/jira/browse/HUDI-263 , now that we know this is a 
legit gap. 




[GitHub] [incubator-hudi] vinothchandar closed issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
vinothchandar closed issue #905: S3 folder paths messed up when running from 
Windows
URL: https://github.com/apache/incubator-hudi/issues/905
 
 
   




[jira] [Created] (HUDI-263) Windows support for Hudi writing

2019-09-18 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-263:
---

 Summary: Windows support for Hudi writing
 Key: HUDI-263
 URL: https://issues.apache.org/jira/browse/HUDI-263
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Write Client
Reporter: Vinoth Chandar


https://github.com/apache/incubator-hudi/issues/905





[GitHub] [incubator-hudi] bhasudha commented on issue #896: Updating site to reflect recent doc changes

2019-09-18 Thread GitBox
bhasudha commented on issue #896: Updating site to reflect recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#issuecomment-532751477
 
 
   Sounds good. Thanks @yanghua. I'll reach out to you about testing Jekyll
   if I have any questions.
   
   On Wed, Sep 18, 2019 at 1:14 AM vinoyang  wrote:
   
   > *@yanghua* commented on this pull request.
   > --
   >
   > In content/404.html
   > :
   >
   > > @@ -6,25 +6,25 @@
   >  
   >  
   >  Page Not Found | Hudi
   > -
   > +
   >
   > OK @bhasudha  You can fix it: just remove
   > the first / and verify with Jekyll to see whether it works.
   > If there are any problems, please let me know.




[GitHub] [incubator-hudi] HariprasadAllaka1612 edited a comment on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
HariprasadAllaka1612 edited a comment on issue #905: S3 folder paths messed up 
when running from Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532758920
 
 
   @vinothchandar Even this is failing. It sets the path properly now, but 
it's failing with the same error. However, I now see the data stored in 
read-optimized format.
   
   I will need to dig further to see whether I need to change something in 
basePath as well.




[GitHub] [incubator-hudi] vinothchandar commented on issue #905: S3 folder paths messed up when running from Windows

2019-09-18 Thread GitBox
vinothchandar commented on issue #905: S3 folder paths messed up when running 
from Windows
URL: https://github.com/apache/incubator-hudi/issues/905#issuecomment-532783717
 
 
   +1. Then please open a JIRA for Windows support and we can continue there. I 
don't think we have ever tested on Windows. 




[GitHub] [incubator-hudi] vinothchandar commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
vinothchandar commented on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532809834
 
 
   @umehrot2 I think Balaji has his hands full with the release at the moment. Do 
you have the bandwidth to try moving to Spark 2.4 and make these changes on top? 




[GitHub] [incubator-hudi] vinothchandar merged pull request #906: [HUDI-262] Update website to reflect change in InputFormat class name

2019-09-18 Thread GitBox
vinothchandar merged pull request #906: [HUDI-262] Update website to reflect 
change in InputFormat class name
URL: https://github.com/apache/incubator-hudi/pull/906
 
 
   




[incubator-hudi] branch asf-site updated: [HUDI-262] Update website to reflect name change in InputFormat class name

2019-09-18 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new edc13c1  [HUDI-262] Update website to reflect name change in 
InputFormat class name
edc13c1 is described below

commit edc13c125f2bebb147fd21eb192b8a50ad568ec8
Author: Bhavani Sudha Saktheeswaran 
AuthorDate: Wed Sep 18 12:32:40 2019 -0700

[HUDI-262] Update website to reflect name change in InputFormat class name
---
 docs/README.md   | 2 +-
 docs/querying_data.cn.md | 6 +++---
 docs/querying_data.md| 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index 4307a6a..74c78e1 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -5,7 +5,7 @@ This folder contains resources that build the [Apache Hudi 
website](https://hudi
 
 ### Building docs
 
-The site is based on a [Jekyll](https://jekyllrb.com/) theme hosted 
[here](idratherbewriting.com/documentation-theme-jekyll/) with detailed 
instructions.
+The site is based on a [Jekyll](https://jekyllrb.com/) theme hosted 
[here](https://idratherbewriting.com/documentation-theme-jekyll/) with detailed 
instructions.
 
  Docker
 
diff --git a/docs/querying_data.cn.md b/docs/querying_data.cn.md
index 3a6fd0f..1653b08 100644
--- a/docs/querying_data.cn.md
+++ b/docs/querying_data.cn.md
@@ -14,8 +14,8 @@ bundle has been provided, the dataset can be queried by 
popular query engines li
 Specifically, there are two Hive tables named off [table 
name](configurations.html#TABLE_NAME_OPT_KEY) passed during write. 
 For e.g, if `table name = hudi_tbl`, then we get  
 
- - `hudi_tbl` realizes the read optimized view of the dataset backed by 
`HoodieInputFormat`, exposing purely columnar data.
- - `hudi_tbl_rt` realizes the real time view of the dataset  backed by 
`HoodieRealtimeInputFormat`, exposing merged view of base and log data.
+ - `hudi_tbl` realizes the read optimized view of the dataset backed by 
`HoodieParquetInputFormat`, exposing purely columnar data.
+ - `hudi_tbl_rt` realizes the real time view of the dataset  backed by 
`HoodieParquetRealtimeInputFormat`, exposing merged view of base and log data.
 
 As discussed in the concepts section, the one key primitive needed for 
[incrementally 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 is `incremental pulls` (to obtain a change stream/log from a dataset). Hudi 
datasets can be pulled incrementally, which means you can get ALL and ONLY the 
updated & new rows 
@@ -33,7 +33,7 @@ classes with its dependencies are available for query 
planning & execution.
 
 ### Read Optimized table {#hive-ro-view}
 In addition to setup above, for beeline cli access, the `hive.input.format` 
variable needs to be set to the  fully qualified path name of the 
-inputformat `org.apache.hudi.hadoop.HoodieInputFormat`. For Tez, additionally 
the `hive.tez.input.format` needs to be set 
+inputformat `org.apache.hudi.hadoop.HoodieParquetInputFormat`. For Tez, 
additionally the `hive.tez.input.format` needs to be set 
 to `org.apache.hadoop.hive.ql.io.HiveInputFormat`
 
 ### Real time table {#hive-rt-view}
diff --git a/docs/querying_data.md b/docs/querying_data.md
index 3a6fd0f..1653b08 100644
--- a/docs/querying_data.md
+++ b/docs/querying_data.md
@@ -14,8 +14,8 @@ bundle has been provided, the dataset can be queried by 
popular query engines li
 Specifically, there are two Hive tables named off [table 
name](configurations.html#TABLE_NAME_OPT_KEY) passed during write. 
 For e.g, if `table name = hudi_tbl`, then we get  
 
- - `hudi_tbl` realizes the read optimized view of the dataset backed by 
`HoodieInputFormat`, exposing purely columnar data.
- - `hudi_tbl_rt` realizes the real time view of the dataset  backed by 
`HoodieRealtimeInputFormat`, exposing merged view of base and log data.
+ - `hudi_tbl` realizes the read optimized view of the dataset backed by 
`HoodieParquetInputFormat`, exposing purely columnar data.
+ - `hudi_tbl_rt` realizes the real time view of the dataset  backed by 
`HoodieParquetRealtimeInputFormat`, exposing merged view of base and log data.
 
 As discussed in the concepts section, the one key primitive needed for 
[incrementally 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 is `incremental pulls` (to obtain a change stream/log from a dataset). Hudi 
datasets can be pulled incrementally, which means you can get ALL and ONLY the 
updated & new rows 
@@ -33,7 +33,7 @@ classes with its dependencies are available for query 
planning & execution.
 
 ### Read Optimized table {#hive-ro-view}
 In addition to setup above, for beeline cli access, the `hive.input.format` 
variable needs to be set to the  fully qualified path name of the 
-inputformat `org.apache.hudi.hadoop.HoodieInputFormat`. For 
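The InputFormat renames in the docs diff above translate into Hive session settings; the following is a sketch assembled purely from the doc text (the table names `hudi_tbl` and `hudi_tbl_rt` are the docs' running example, not real tables):

```sql
-- Read optimized table: point Hive at the renamed input format
SET hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat;
-- On Tez, additionally:
SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SELECT COUNT(*) FROM hudi_tbl;

-- Real time view: query the companion _rt table
SELECT COUNT(*) FROM hudi_tbl_rt;
```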

[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532808342
 
 
   @vinothchandar At EMR we do not have a use case to support Spark 2.3 or 
earlier. We will be offering Hudi starting with our latest release, which has 
Spark 2.4.3; we will not be supporting anything earlier than that.
   
   So, it might be a good idea to just move to 2.4 and drop support for earlier 
versions.
   
   




[GitHub] [incubator-hudi] bhasudha commented on issue #896: Updating site to reflect recent doc changes

2019-09-18 Thread GitBox
bhasudha commented on issue #896: Updating site to reflect recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#issuecomment-532819996
 
 
   @yanghua I bumped into some issues and may need your help here. Here are my 
observations.
   
   When I replace all the css, images and js paths with the first '/' removed, 
the 'en' website seems to work fine, but the 'cn' website styling is broken. On 
further inspection of the generated docs/_site, I see that the folder structure 
is a default folder holding all the pages for 'en' and a subfolder 'cn' holding 
the same pages in Chinese. However, the footer.html, head.html and topnav.html 
under docs/_includes used for site generation are shared across languages. So 
with an absolute path the cn pages work, because they refer to _site/css/*.css, 
whereas with a relative path the base url is appended, like _site/cn/css/*.css 
(but there is no /cn/css). This is the issue. 
   
   I don't know much about Jekyll. I was trying to figure out a similar folder 
structure for making changes. Something like this - 
http://chocanto.me/2016/04/16/jekyll-multilingual.html - sounds similar to how 
you are setting it up? Do you have any pointers on how to fix this? 
   



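The cn/en styling breakage described above is the standard Jekyll baseurl problem: shared includes emit one hard-coded asset path, while pages are served from two depths. One common fix — a sketch only, using hypothetical file and asset names from the comment, not the actual Hudi site source — is to build asset URLs with Liquid's `relative_url` filter so every page resolves them against the configured baseurl:

```html
<!-- docs/_includes/head.html (sketch): works for both _site/*.html
     and _site/cn/*.html because relative_url prepends site.baseurl -->
<link rel="stylesheet" href="{{ '/css/customstyles.css' | relative_url }}">
<script src="{{ '/js/jquery.min.js' | relative_url }}"></script>
```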

[jira] [Created] (HUDI-264) Ensure website for both english and chinese is rendered properly with styling scripts and images

2019-09-18 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-264:


 Summary: Ensure website for both english and chinese is rendered 
properly with styling scripts and images
 Key: HUDI-264
 URL: https://issues.apache.org/jira/browse/HUDI-264
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Docs, docs-chinese, Usability
Reporter: Bhavani Sudha Saktheeswaran
Assignee: vinoyang


Currently, we are seeing an issue with the paths to styling scripts behaving 
differently for the English and Chinese sites. This is mostly due to the folder 
structure for the multi-language setup and the shared header, footer and topnav 
scripts. 

 

This can be seen by opening the generated content locally, e.g. 
content/admin_guide.html or content/cn/admin_guide.html, from a browser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-262) Update Hudi website to reflect change in InputFormat Class name

2019-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-262:

Labels: pull-request-available  (was: )

> Update Hudi website to reflect change in InputFormat Class name
> ---
>
> Key: HUDI-262
> URL: https://issues.apache.org/jira/browse/HUDI-262
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: asf-migration
>Reporter: BALAJI VARADARAJAN
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532810711
 
 
   > @umehrot2 I think balaji has his hands full with the release atm. Do you 
have bandwidth to try moving to spark 2.4 and do these changes on top?
   
   Sure. I will take this up then.




[GitHub] [incubator-hudi] bhasudha commented on issue #896: Updating site to reflect recent doc changes

2019-09-18 Thread GitBox
bhasudha commented on issue #896: Updating site to reflect recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#issuecomment-532830128
 
 
   @yanghua I created a JIRA issue separately since this involves a little more 
work - https://issues.apache.org/jira/browse/HUDI-264. I assigned it to you 
since you may be able to look into it faster. If you don't have cycles, please 
assign it back to me. 




[GitHub] [incubator-hudi] bhasudha opened a new pull request #906: [HUDI-262] Update website to reflect change in InputFormat class name

2019-09-18 Thread GitBox
bhasudha opened a new pull request #906: [HUDI-262] Update website to reflect 
change in InputFormat class name
URL: https://github.com/apache/incubator-hudi/pull/906
 
 
   




[jira] [Commented] (HUDI-263) Windows support for Hudi writing

2019-09-18 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932899#comment-16932899
 ] 

Vinoth Chandar commented on HUDI-263:
-


[~msubbu] guessing you are also interested in this? 

> Windows support for Hudi writing
> 
>
> Key: HUDI-263
> URL: https://issues.apache.org/jira/browse/HUDI-263
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/apache/incubator-hudi/issues/905







[GitHub] [incubator-hudi] vinothchandar commented on issue #899: [HUDI-228] Contributing page updated to include JIRA guidelines

2019-09-18 Thread GitBox
vinothchandar commented on issue #899: [HUDI-228] Contributing page updated to 
include JIRA guidelines
URL: https://github.com/apache/incubator-hudi/pull/899#issuecomment-532892285
 
 
   Weird that the https://github.com/apache/incubator-hudi/pull/900 changes are not 
reflected on apache:asf-site. Merging. 




[GitHub] [incubator-hudi] vinothchandar merged pull request #899: [HUDI-228] Contributing page updated to include JIRA guidelines

2019-09-18 Thread GitBox
vinothchandar merged pull request #899: [HUDI-228] Contributing page updated to 
include JIRA guidelines
URL: https://github.com/apache/incubator-hudi/pull/899
 
 
   




[incubator-hudi] branch asf-site updated: [HUDI-228] contributing page updated to include jira guidelines (#899)

2019-09-18 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new c20ad5a  [HUDI-228] contributing page updated to include jira 
guidelines (#899)
c20ad5a is described below

commit c20ad5aeaa1994162c3c2461f742a2c9f3d7882e
Author: pratyakshsharma <30863489+pratyakshsha...@users.noreply.github.com>
AuthorDate: Thu Sep 19 04:03:25 2019 +0530

[HUDI-228] contributing page updated to include jira guidelines (#899)
---
 docs/contributing.md | 5 +
 1 file changed, 5 insertions(+)

diff --git a/docs/contributing.md b/docs/contributing.md
index f79ef03..4c01795 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -35,6 +35,11 @@ Here's a typical lifecycle of events to contribute to Hudi.
  - [Optional] If you want to get involved, but don't have a project in mind, 
please check JIRA for small, quick-starters.
  - [Optional] Familiarize yourself with internals of Hudi using content on 
this page, as well as [wiki](https://cwiki.apache.org/confluence/display/HUDI)
  - Once you finalize on a project/task, please open a new JIRA or assign an 
existing one to yourself. (If you don't have perms to do this, please email the 
dev mailing list with your JIRA id and a small intro for yourself. We'd be 
happy to add you as a contributor)
+ - While raising a new JIRA or updating an existing one, please make sure to 
do the following
+   - The issue type and versions (when resolving the ticket) are set correctly
+   - Summary should be descriptive enough to catch the essence of the problem/ 
feature
+   - Capture the version of Hoodie/Spark/Hive/Hadoop/Cloud environments in the 
ticket
+   - Whenever possible, provide steps to reproduce via sample code or on the 
[docker setup](https://hudi.apache.org/docker_demo.html)
  - Almost all PRs should be linked to a JIRA. Before you begin work, click 
"Start Progress" on the JIRA, which tells everyone that you are working on the 
issue actively.
  - Make your code change
- Every source file needs to include the Apache license header. Every new 
dependency needs to



[GitHub] [incubator-hudi] vinothchandar commented on issue #898: Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time

2019-09-18 Thread GitBox
vinothchandar commented on issue #898: Caused by: 
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time
URL: https://github.com/apache/incubator-hudi/issues/898#issuecomment-532896765
 
 
   Accumulating Hadoop 3 issues here: 
https://issues.apache.org/jira/browse/HUDI-259 




[jira] [Commented] (HUDI-259) Hadoop 3 support for Hudi writing

2019-09-18 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932902#comment-16932902
 ] 

Vinoth Chandar commented on HUDI-259:
-

Good first step would be ensuring Hudi can compile against all of 2.7, 2.8, 
2.9, 3.0 .. 

> Hadoop 3 support for Hudi writing
> -
>
> Key: HUDI-259
> URL: https://issues.apache.org/jira/browse/HUDI-259
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Vinoth Chandar
>Priority: Major
>
> Sample issues
>  
> [https://github.com/apache/incubator-hudi/issues/735]
> [https://github.com/apache/incubator-hudi/issues/877#issuecomment-528433568] 
> [https://github.com/apache/incubator-hudi/issues/898]
>  





[jira] [Comment Edited] (HUDI-259) Hadoop 3 support for Hudi writing

2019-09-18 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932902#comment-16932902
 ] 

Vinoth Chandar edited comment on HUDI-259 at 9/18/19 10:54 PM:
---

Good first step would be ensuring Hudi can compile against all of 2.7, 2.8, 
2.9, 3.0 and tests pass


was (Author: vc):
Good first step would be ensuring Hudi can compile against all of 2.7, 2.8, 
2.9, 3.0 .. 

> Hadoop 3 support for Hudi writing
> -
>
> Key: HUDI-259
> URL: https://issues.apache.org/jira/browse/HUDI-259
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Vinoth Chandar
>Priority: Major
>
> Sample issues
>  
> [https://github.com/apache/incubator-hudi/issues/735]
> [https://github.com/apache/incubator-hudi/issues/877#issuecomment-528433568] 
> [https://github.com/apache/incubator-hudi/issues/898]
>  





[jira] [Assigned] (HUDI-91) Replace Databricks spark-avro with native spark-avro #628

2019-09-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-91:
--

Assignee: Udit Mehrotra  (was: Vinoth Chandar)

> Replace Databricks spark-avro with native spark-avro #628
> -
>
> Key: HUDI-91
> URL: https://issues.apache.org/jira/browse/HUDI-91
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>
> [https://github.com/apache/incubator-hudi/issues/628] 





[jira] [Assigned] (HUDI-91) Replace Databricks spark-avro with native spark-avro #628

2019-09-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-91:
--

Assignee: Udit Mehrotra  (was: Mathew Wicks)

> Replace Databricks spark-avro with native spark-avro #628
> -
>
> Key: HUDI-91
> URL: https://issues.apache.org/jira/browse/HUDI-91
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>
> [https://github.com/apache/incubator-hudi/issues/628] 





[jira] [Assigned] (HUDI-91) Replace Databricks spark-avro with native spark-avro #628

2019-09-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-91:
--

Assignee: Vinoth Chandar  (was: Udit Mehrotra)

> Replace Databricks spark-avro with native spark-avro #628
> -
>
> Key: HUDI-91
> URL: https://issues.apache.org/jira/browse/HUDI-91
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, Usability
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://github.com/apache/incubator-hudi/issues/628] 





[GitHub] [incubator-hudi] vinothchandar commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
vinothchandar commented on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532898038
 
 
   sg. assigned you https://issues.apache.org/jira/browse/HUDI-91 . lets 
continue there 




[jira] [Assigned] (HUDI-263) Windows support for Hudi writing

2019-09-18 Thread Subramanian Mohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramanian Mohan reassigned HUDI-263:
--

Assignee: Subramanian Mohan

> Windows support for Hudi writing
> 
>
> Key: HUDI-263
> URL: https://issues.apache.org/jira/browse/HUDI-263
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: Vinoth Chandar
>Assignee: Subramanian Mohan
>Priority: Major
>
> https://github.com/apache/incubator-hudi/issues/905





[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532919914
 
 
   @vinothchandar @umehrot2 
   I've tried changing the avro version to 1.8.2 in the hudi pom.xml before. Spark 
2.2 and 2.3 don't use the avro 1.8.2 jars from the hoodie jar; avro 1.7.7 is 
picked up first, and I still hit the same error (missing logical type class, and 
so on).
   Maybe you can try the options below in the shell to test avro 1.8.2
   ```
   --conf spark.driver.userClassPathFirst=true 
   --conf spark.executor.userClassPathFirst=true
   ```
   OR relocate avro in the spark-bundle, the way the hive dependencies are handled
   ```
   <relocation>
     <pattern>org.apache.avro</pattern>
     <shadedPattern>com.apache.hudi.org.apache.avro</shadedPattern>
   </relocation>
   ```
   In this way, we can be compatible with spark 2.2, 2.3 and 2.4.
   



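For readers unfamiliar with the relocation mechanism mentioned in the comment above: it lives in the maven-shade-plugin configuration of the bundle pom. A minimal self-contained sketch follows — the plugin wiring is illustrative only; just the shaded pattern comes from the comment, and this is not the actual Hudi bundle pom:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- rewrite org.apache.avro.* references inside the bundle so
               the shaded avro 1.8.2 classes are always the ones loaded -->
          <relocation>
            <pattern>org.apache.avro</pattern>
            <shadedPattern>com.apache.hudi.org.apache.avro</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```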

[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532921049
 
 
   I will continue to discuss this issue on JIRA later. 
   
   The version I'm running in the production environment now is the Hudi 0.4.8 
version with this PR added. If there are new changes, I can also do some 
experiments in my test environment.
   
   




[jira] [Assigned] (HUDI-42) Fix table.getActionType for MergeOnRead table types #385

2019-09-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-42:
--

Assignee: Vinoth Chandar  (was: Nishith Agarwal)

> Fix table.getActionType for MergeOnRead table types #385
> 
>
> Key: HUDI-42
> URL: https://issues.apache.org/jira/browse/HUDI-42
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/385






[GitHub] [incubator-hudi] yanghua edited a comment on issue #896: Updating site to reflect recent doc changes

2019-09-18 Thread GitBox
yanghua edited a comment on issue #896: Updating site to reflect recent doc 
changes
URL: https://github.com/apache/incubator-hudi/pull/896#issuecomment-532974518
 
 
   > @yanghua I bumped into some issues and may need your help here. Here are 
my observations.
   > 
   > When I replace all the css, images and js paths with the first '/' 
removed, the 'en' website seems to be working fine but the 'cn' website styling 
is broken. On further inspection of the generated docs/_site I see that the 
structure of the folders is a default folder that has all the pages for 'en' 
and a sub folder 'cn' that has the same pages in Chinese. However, the 
footer.html, head.html and topnav.html under docs/_includes used for site 
generation are shared across languages. So with an absolute path 'cn' works 
because it refers to the _site/css/_.css files, whereas with a relative path 
the base url is prepended, like _site/cn/css/_.css (but there is no /cn/css). 
This is the issue.
   > 
   > I don't know much about Jekyll. I was trying to figure out a similar 
folder structure for making changes. Something like this - 
http://chocanto.me/2016/04/16/jekyll-multilingual.html sounds similar to how 
you are setting it up? Do you have any pointers on how to fix this?
   
   @bhasudha It seems that's the reason why I added the '/' char. Let me try 
to fix the problem.
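
One common Jekyll pattern for the problem described above (a sketch, not the fix actually applied in this PR; it assumes the standard `site.baseurl` setup, and the include file names are taken from the comment) is to resolve asset paths against the site root in the shared includes, so that both the root-level 'en' pages and the '/cn' sub-folder pages load the same _site/css files:

```html
<!-- docs/_includes/head.html (sketch) -->
<!-- relative_url prefixes only site.baseurl, never the current page's
     sub-folder, so /cn/* pages still resolve to /css/_.css -->
<link rel="stylesheet" href="{{ '/css/style.css' | relative_url }}">
<script src="{{ '/js/main.js' | relative_url }}"></script>
```

With `relative_url` the generated link does not pick up the current page's directory, which is exactly what breaks the 'cn' styling when plain relative paths are used.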




[GitHub] [incubator-hudi] leesf edited a comment on issue #900: [docs][chinese] update permalink for translated pages(quickstart.cn.md, use_cases.cn.md)

2019-09-18 Thread GitBox
leesf edited a comment on issue #900: [docs][chinese] update permalink for 
translated pages(quickstart.cn.md, use_cases.cn.md)
URL: https://github.com/apache/incubator-hudi/pull/900#issuecomment-532950448
 
 
   @vinothchandar you could check 
[code](https://github.com/leesf/leesf.github.io/blob/master/_posts/2019-07-22-Flink-On-Yarn.md)
 with "/" at the front (/tech/flink-on-yarn), and it works well 
[here](https://leesf.github.io/tech/flink-on-yarn.html). But if "/" is not 
placed at the front 
[here](https://github.com/leesf/leesf.github.io/blob/master/_posts/2019-07-19-Flink-rpc-analysis.md),
 it works well again 
[here](https://leesf.github.io/tech/flink-rpc-analysis.html).




[jira] [Comment Edited] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-09-18 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933015#comment-16933015
 ] 

Vinoth Chandar edited comment on HUDI-260 at 9/19/19 3:06 AM:
--

I tried to search around for this; it seems like any code that's used in a 
closure (i.e. code that Spark will serialize from the driver to the executors) 
needs to be in --jars and not extraClassPath. I figure they use different 
class loaders. Another theory: if we use the Spark Java APIs, the jars need to 
be placed under the `jars` folder and that's the way to go. My guess is the 
Java lambda -> Scala -> codegen path fails someplace when the jar is specified 
via extraClassPath.

I am still looking, but it does not seem like an issue with how we are 
bundling/shading (there are no Spark/Scala jars there).

Anyway, I am having trouble reproducing this error. For me, somehow the jar is 
not even getting picked up (spark.jars works):

{code}
root@adhoc-2:/opt# cat /opt/spark/conf/spark-defaults.conf
...
spark.driver.extraClassPath  
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar
spark.executor.extraClassPath
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar

root@adhoc-2:/opt# $SPARK_INSTALL/bin/spark-shell --master local[2] 
--driver-class-path $HADOOP_CONF_DIR --conf 
spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  
--driver-memory 1G --executor-memory 3G --num-executors 1
19/09/19 02:54:34 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Spark context Web UI available at http://adhoc-2:4040
Spark context available as 'sc' (master = local[2], app id = 
local-1568861680731).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.hudi.DataSourceReadOptions;
<console>:23: error: object hudi is not a member of package org.apache
       import org.apache.hudi.DataSourceReadOptions;
                         ^

scala>
{code}

Can you give me a reproducible setup on the demo containers?


was (Author: vc):
I tried to search around for this; it seems like any code that's used in a 
closure (i.e. code that Spark will serialize from the driver to the executors) 
needs to be in --jars and not extraClassPath. I figure they use different 
class loaders. I am still looking.

Anyway, I am having trouble reproducing this error. For me, somehow the jar is 
not even getting picked up (spark.jars works):

{code}
root@adhoc-2:/opt# cat /opt/spark/conf/spark-defaults.conf
...
spark.driver.extraClassPath  
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar
spark.executor.extraClassPath
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar

root@adhoc-2:/opt# $SPARK_INSTALL/bin/spark-shell --master local[2] 
--driver-class-path $HADOOP_CONF_DIR --conf 
spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  
--driver-memory 1G --executor-memory 3G --num-executors 1
19/09/19 02:54:34 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Spark context Web UI available at http://adhoc-2:4040
Spark context available as 'sc' (master = local[2], app id = 
local-1568861680731).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.hudi.DataSourceReadOptions;
<console>:23: error: object hudi is not a member of package org.apache
       import org.apache.hudi.DataSourceReadOptions;
                         ^

scala>
{code}

Can you give me a reproducible setup on the demo containers?

> Hudi Spark Bundle does not work when passed in extraClassPath option
> 
>
> Key: HUDI-260
> URL: https://issues.apache.org/jira/browse/HUDI-260
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> On EMR's 

[GitHub] [incubator-hudi] leesf edited a comment on issue #900: [docs][chinese] update permalink for translated pages(quickstart.cn.md, use_cases.cn.md)

2019-09-18 Thread GitBox
leesf edited a comment on issue #900: [docs][chinese] update permalink for 
translated pages(quickstart.cn.md, use_cases.cn.md)
URL: https://github.com/apache/incubator-hudi/pull/900#issuecomment-532950448
 
 
   @vinothchandar you could check 
[code](https://github.com/leesf/leesf.github.io/blob/master/_posts/2019-07-22-Flink-On-Yarn.md)
 with "/" at the front (/tech/flink-on-yarn), and it works well 
[here](https://leesf.github.io/tech/flink-on-yarn.html). But if "/" is not 
placed at the front 
[here](https://github.com/leesf/leesf.github.io/blob/master/_posts/2019-07-19-Flink-rpc-analysis.md),
 it still works well 
[here](https://leesf.github.io/tech/flink-rpc-analysis.html).




[jira] [Commented] (HUDI-83) Support for timestamp datatype in Hudi

2019-09-18 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933043#comment-16933043
 ] 

Vinoth Chandar commented on HUDI-83:


Also assigning the timestamp-related issues to you [~uditme].

> Support for timestamp datatype in Hudi
> --
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>
> [https://github.com/apache/incubator-hudi/issues/543] ; related issues 





[jira] [Assigned] (HUDI-83) Support for timestamp datatype in Hudi

2019-09-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-83:
--

Assignee: Udit Mehrotra  (was: BALAJI VARADARAJAN)

> Support for timestamp datatype in Hudi
> --
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>
> [https://github.com/apache/incubator-hudi/issues/543] ; related issues 





[jira] [Commented] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-09-18 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933015#comment-16933015
 ] 

Vinoth Chandar commented on HUDI-260:
-

I tried to search around for this; it seems like any code that's used in a 
closure (i.e. code that Spark will serialize from the driver to the executors) 
needs to be in --jars and not extraClassPath. I figure they use different 
class loaders. I am still looking.

Anyway, I am having trouble reproducing this error. For me, somehow the jar is 
not even getting picked up (spark.jars works):

{code}
root@adhoc-2:/opt# cat /opt/spark/conf/spark-defaults.conf
...
spark.driver.extraClassPath  
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar
spark.executor.extraClassPath
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar

root@adhoc-2:/opt# $SPARK_INSTALL/bin/spark-shell --master local[2] 
--driver-class-path $HADOOP_CONF_DIR --conf 
spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  
--driver-memory 1G --executor-memory 3G --num-executors 1
19/09/19 02:54:34 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Spark context Web UI available at http://adhoc-2:4040
Spark context available as 'sc' (master = local[2], app id = 
local-1568861680731).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.hudi.DataSourceReadOptions;
<console>:23: error: object hudi is not a member of package org.apache
       import org.apache.hudi.DataSourceReadOptions;
                         ^

scala>
{code}

Can you give me a reproducible setup on the demo containers?

> Hudi Spark Bundle does not work when passed in extraClassPath option
> 
>
> Key: HUDI-260
> URL: https://issues.apache.org/jira/browse/HUDI-260
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> On EMR's side we have the same findings. *a + b + c +d* work in the following 
> cases:
>  * The bundle jar (with databricks-avro shaded) is specified using *--jars* 
> or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark Home 
> jars folder i.e. */usr/lib/spark/jars* folder
> However, it does not work if the jar is specified using 
> *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options 
> which is what EMR uses to configure external dependencies. Although we can 
> drop the jar in the */usr/lib/spark/jars* folder, I am not sure if that is 
> recommended because that folder is supposed to contain the jars coming from 
> Spark. Extra dependencies from the user's side would be better off specified 
> through the *extraClassPath* option.
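
The working and non-working configurations described above can be sketched as follows (the bundle path is illustrative, copied from the demo-container layout shown earlier in this thread):

```
# Works: ship the bundle with the application, so both the driver and executor
# classloaders (and Spark's closure serialization) see the same jar
spark-shell --jars /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar

# Equivalent via spark-defaults.conf
spark.jars  /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar

# Reported NOT to work for closure code: extraClassPath only prepends to the
# JVM classpath of each process and is not handled by Spark's jar distribution
spark.driver.extraClassPath    /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar
spark.executor.extraClassPath  /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar
```
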





Build failed in Jenkins: hudi-snapshot-deployment-0.5 #42

2019-09-18 Thread Apache Jenkins Server
See 


--
[...truncated 2.21 KB...]
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle[jar]
[INFO] hudi-hadoop-docker [pom]
[INFO] 

[jira] [Assigned] (HUDI-137) Hudi cleaning state changes should be consistent with commit actions

2019-09-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-137:
---

Assignee: Vinoth Chandar  (was: Bhavani Sudha Saktheeswaran)

> Hudi cleaning state changes should be consistent with commit actions
> 
>
> Key: HUDI-137
> URL: https://issues.apache.org/jira/browse/HUDI-137
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Cleaner
>Reporter: BALAJI VARADARAJAN
>Assignee: Vinoth Chandar
>Priority: Minor
>
> Currently, the clean() action performs cleaning first and only at the end 
> creates the .clean file. The intention to clean (.clean.inflight) must be 
> saved before files are deleted.
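
The ordering change proposed above (persist the intent marker before any destructive work) can be sketched as follows; this is an illustrative sketch using local files, not Hudi's actual implementation, and the marker file names merely mirror the .clean.inflight / .clean convention:

```python
import os
import tempfile

def clean_with_intent(table_dir, files_to_delete):
    """Sketch: write a .clean.inflight marker *before* deleting files,
    then promote it to .clean once the deletion has completed."""
    inflight = os.path.join(table_dir, "001.clean.inflight")
    completed = os.path.join(table_dir, "001.clean")
    # 1. Record the intention to clean (so the plan survives a crash mid-delete)
    with open(inflight, "w") as f:
        f.write("\n".join(files_to_delete))
    # 2. Perform the actual deletion
    for path in files_to_delete:
        if os.path.exists(path):
            os.remove(path)
    # 3. Mark the clean action as committed
    os.rename(inflight, completed)
    return completed

# Usage
d = tempfile.mkdtemp()
victim = os.path.join(d, "old_file.parquet")
open(victim, "w").close()
marker = clean_with_intent(d, [victim])
print(os.path.basename(marker))  # -> 001.clean
```

A crash between steps 1 and 3 leaves the .clean.inflight marker behind, so a restarted process can detect and finish (or reconcile) the interrupted clean, which is the consistency property the issue asks for.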





[GitHub] [incubator-hudi] yanghua commented on issue #896: Updating site to reflect recent doc changes

2019-09-18 Thread GitBox
yanghua commented on issue #896: Updating site to reflect recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#issuecomment-532974518
 
 
   > @yanghua I bumped into some issues and may need your help here. Here are 
my observations.
   > 
   > When I replace all the css, images and js paths with the first '/' 
removed, the 'en' website seems to be working fine but the 'cn' website styling 
is broken. On further inspection of the generated docs/_site I see that the 
structure of the folders is a default folder that has all the pages for 'en' 
and a sub folder 'cn' that has the same pages in Chinese. However, the 
footer.html, head.html and topnav.html under docs/_includes used for site 
generation are shared across languages. So with an absolute path 'cn' works 
because it refers to the _site/css/_.css files, whereas with a relative path 
the base url is prepended, like _site/cn/css/_.css (but there is no /cn/css). 
This is the issue.
   > 
   > I don't know much about Jekyll. I was trying to figure out a similar 
folder structure for making changes. Something like this - 
http://chocanto.me/2016/04/16/jekyll-multilingual.html sounds similar to how 
you are setting it up? Do you have any pointers on how to fix this?
   
   @bhasudha It seems that's the reason why I added the '/' char. Let me try 
to fix the problem.

