[GitHub] [incubator-hudi] vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer

2019-10-08 Thread GitBox
vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/917#issuecomment-539612532
 
 
   Oh, good luck! And no need to apologize, I was just following up :) Take your time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer

2019-10-08 Thread GitBox
vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/917#issuecomment-539598995
 
 
   @taherk77 Just a bump to make sure you got the last messages :) 


[GitHub] [incubator-hudi] vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer

2019-09-24 Thread GitBox
vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/917#issuecomment-534684983
 
 
   > So here is the kind of algorithm that I think of implementing

   Steps 1 & 2 sound good to me.
   > but user mentions 5 mins interval that means jobs will keep on piling right?

   The next batch won't be scheduled until the previous one completes, so there is already backpressure there to prevent a pile-up.
   
   > `spark.read().jdbc("url", "table", "someTableColumn", 1, 10, numPartitions, connectionProps)`

   Doesn't Spark already parallelize the pull, i.e. allocate 1-2 to one executor, 2-3 to another, and so on?
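For reference, the column-range overload of `DataFrameReader.jdbc` takes `numPartitions` in addition to the partition column and the bounds, and the bounds only decide the partition stride; they do not filter rows. The following simplified Java sketch is modeled on how Spark 2.x's JDBC source builds one `WHERE` clause per partition. It omits Spark's own validation and adjustments (and newer Spark versions may compute the stride differently), so treat it as an approximation; `id` is a hypothetical column name.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch of range-partitioning a JDBC read; not Spark's actual code. */
final class JdbcRangePartitionSketch {

  /** Builds one WHERE clause per partition for the given column and bounds. */
  static List<String> whereClauses(String column, long lowerBound, long upperBound, int numPartitions) {
    if (numPartitions < 1 || upperBound - lowerBound < numPartitions) {
      throw new IllegalArgumentException("need numPartitions >= 1 and upperBound - lowerBound >= numPartitions");
    }
    if (numPartitions == 1) {
      return List.of("1 = 1"); // a single partition reads the whole table, no filter
    }
    // Stride with integer division, as in Spark 2.x's JDBC source.
    long stride = upperBound / numPartitions - lowerBound / numPartitions;
    List<String> clauses = new ArrayList<>();
    long current = lowerBound;
    for (int i = 0; i < numPartitions; i++) {
      String lower = (i == 0) ? null : column + " >= " + current;
      current += stride;
      String upper = (i == numPartitions - 1) ? null : column + " < " + current;
      if (lower == null) {
        // First partition also picks up rows below the lower bound and NULLs.
        clauses.add(upper + " or " + column + " is null");
      } else if (upper == null) {
        // Last partition picks up everything from its start, even past upperBound.
        clauses.add(lower);
      } else {
        clauses.add(lower + " AND " + upper);
      }
    }
    return clauses;
  }

  public static void main(String[] args) {
    whereClauses("id", 1, 10, 3).forEach(System.out::println);
  }
}
```

With `lowerBound = 1`, `upperBound = 10` and three partitions, `main` prints `id < 4 or id is null`, `id >= 4 AND id < 7` and `id >= 7`, so each task issues its own range query; the open-ended first and last clauses are why rows outside the bounds are still read.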


[GitHub] [incubator-hudi] vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer

2019-09-24 Thread GitBox
vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/917#issuecomment-534588368
 
 
   That seems like a flaky test? 
   ```
   Failed tests:
     TestMergeOnReadTable.testRollbackWithDeltaAndCompactionCommit:421 expected:<1> but was:<0>
   ```
   
   Hmm. For now, you can restart the build on Travis and it should go away. I have not seen Travis be flaky in a while, so if it persists, we can take it up separately.


[GitHub] [incubator-hudi] vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer

2019-09-24 Thread GitBox
vinothchandar commented on issue #917: [HUDI-251] JDBC incremental load to HUDI with DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/917#issuecomment-534587329
 
 
   > The other option is that when we set is_incremental as false then we pull all the data at once and then write it.

   Could this be implemented by passing `--checkpoint null` or a `--full-load` flag? This is actually a general issue for all sources, so it would be good to open a new JIRA for it and tackle it separately. For example, even if you have files on DFS, you probably want an option to do this. For this PR, can we just focus on incremental pulling, where the first run without a checkpoint pulls the entire table?
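A minimal sketch of "the first run without a checkpoint pulls the entire table", assuming a hypothetical numeric, monotonically increasing incremental column (for example an auto-increment `id`) whose largest pulled value becomes the next checkpoint. The class and method names are illustrative only and are not Hudi's JDBC source API.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical checkpoint handling for an incremental JDBC pull; not Hudi's API. */
final class IncrementalPullSketch {

  /**
   * Without a checkpoint, pull the whole table; otherwise pull only rows whose
   * incremental column is greater than the checkpoint. The '?' placeholder is
   * meant to be bound through a PreparedStatement rather than concatenated.
   */
  static String buildQuery(String table, String incrementalColumn, Optional<Long> checkpoint) {
    if (!checkpoint.isPresent()) {
      return "SELECT * FROM " + table;
    }
    return "SELECT * FROM " + table + " WHERE " + incrementalColumn + " > ?";
  }

  /** The next checkpoint is the largest value pulled, or the old one if nothing new arrived. */
  static Optional<Long> nextCheckpoint(Optional<Long> previous, List<Long> pulledValues) {
    Optional<Long> max = pulledValues.stream().max(Long::compare);
    return max.isPresent() ? max : previous;
  }

  public static void main(String[] args) {
    Optional<Long> checkpoint = Optional.empty();
    // First run: no checkpoint, so this is a full load.
    System.out.println(buildQuery("orders", "id", checkpoint));
    checkpoint = nextCheckpoint(checkpoint, List.of(3L, 9L, 5L));
    // Later runs: only rows beyond the checkpoint (bind 9 to the placeholder).
    System.out.println(buildQuery("orders", "id", checkpoint) + "  -- bind " + checkpoint.get());
  }
}
```

A strictly-greater-than filter is the simplest choice for a strictly increasing column; with a timestamp column, rows that share the checkpoint value but commit later could be skipped, so the column choice matters.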
   
   > the interval we should be pulling the data every interval.

   On the interval, maybe I was vague; apologies. What I meant was that the frequency at which we run DeltaStreamer is controlled by the user in non-continuous mode, and #921 just added a flag to control this in continuous mode. I don't think we need to worry about it in this PR?
   