[GitHub] [incubator-pinot] chenboat commented on issue #4914: [POC] By-passing deep-store requirement for Realtime segment completion

2020-04-06 Thread GitBox
chenboat commented on issue #4914: [POC] By-passing deep-store requirement for 
Realtime segment completion
URL: https://github.com/apache/incubator-pinot/pull/4914#issuecomment-610080673
 
 
   Make sense. I thought you were talking about controller side change. I think 
the second option (i.e., pluggable committer in config) looks clearer. I will 
add these to the design doc. Thanks. @mcvsubbu 
   > The config change in step 5 is because we want the segment split-committer 
to look like this:
   > 
   > ```
   > doCommitStart();
   > url = uploadSegmentToDeepStore();
   > if (url == null) { // upload failed, or no deep store configured
   >   if (tableConfig.commitEvenIfUploadFailed) {
   > segmentUrlForCommitEnd = PEER_URL
   >   } else {
   > throw CommitFailedException
   >   }
   > } else {
   >   segmentUrilForCommitEnd = url
   > }
   > doCommitEnd();
   > ```
   > 
   > The config is needed in order to decide the action when upload returns 
null url.
   > 
   > Another way to do it is to configure a new class in `StreamConfig` section 
and use that class to bypass deep-store. Them you can leave the main 
split-committer as is, and build another class that can be configured in.
   > 
   > Either way, you need a config.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] chenboat commented on issue #4914: [POC] By-passing deep-store requirement for Realtime segment completion

2020-04-06 Thread GitBox
chenboat commented on issue #4914: [POC] By-passing deep-store requirement for 
Realtime segment completion
URL: https://github.com/apache/incubator-pinot/pull/4914#issuecomment-610066213
 
 
   
   > In phase-2, we will do:
   > 
   > 1. Intoduce a table config (inside stream config? or maybe in indexing 
config?) a boolean on whether to abort split commit if upload to deep store 
fails. This config is not used anywhere as yet. Since this is an architectural 
change, be sure to email dev mailing list, and also post in slack. Get 
consensus on where the config should go, and what exactly it means, naming, 
etc. These things can take time, and are not changeable once introduced.
   
   What is the purpose of this new config? It looks like we can achieve the 
same goal by adding a new parameter to control if to upload segment to 
controller? This is what I have done in the POC. The new segment committer in 
2/3 above can invoke this new endpoint.  
https://github.com/apache/incubator-pinot/pull/4914/files#diff-f0bd4d0172521b3f6bd52d4ebe12da2b
   
   Shouldn't changing the LLC protocol endpoint be easier than adding a new 
table config?
   
   > 2. Change the controller to accept a new URL in the protocol (I suggest we 
insert the following when segment upload fails: 
"peer://peer/URI_INTRODUCED_IN_STEP_1". Or, simply the word in all caps "PEER". 
I like the previous option, because it is in URI format, and the controller can 
do one check with the URL decoder for the scheme. When the controller detects 
peer, it should not set the download URL in  the metadata (or, insert a null, 
but I prefer NO url  in metadata). Add unit tests to cover this case. Since the 
server never inserts PEER, we should be good so far.
   
   I see. So  Pinot servers will not return an empty segment location to 
controller, right?
   
   > 3. Change the server to include logic to fetch from peer (using http or 
https fetcher as configured) IF the fetch from deep store fails, include 
retries as discussed in phone conference. (i.e. try deep store, if. it fails, 
try peer, if that fails, try the whole sequence again, N times). Try not to 
hook up this logic as yet. Add unit tests
   > 4. Change realtime segment fixer in the controller to fix segments that 
are not there in deep store. Add integration tests.
   > 5. Hook up logic in 6, and change the split committer to use the table 
config, add an integration test
   > 
   > You are done at this point
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] chenboat commented on issue #4914: [POC] By-passing deep-store requirement for Realtime segment completion

2020-02-12 Thread GitBox
chenboat commented on issue #4914: [POC] By-passing deep-store requirement for 
Realtime segment completion
URL: https://github.com/apache/incubator-pinot/pull/4914#issuecomment-585435355
 
 
   > Is this POC going to be merged, or are we going to split things as per the 
design doc?
   
   The POC will remain but I will remove the non-controller changes from it. I 
will create separate PRs (one for server side and the other for 
validationManager ). Most of the reviews in this PR is about controller changes 
and I would like to keep them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] chenboat commented on issue #4914: [POC] By-passing deep-store requirement for Realtime segment completion

2019-12-20 Thread GitBox
chenboat commented on issue #4914: [POC] By-passing deep-store requirement for 
Realtime segment completion
URL: https://github.com/apache/incubator-pinot/pull/4914#issuecomment-568131589
 
 
   > Can you add some more details in your design doc? Specifically:
   I will add these details to the design doc.
   > 
   > 1. The algorithm on the server side during commit.  Correct me if I am 
wrong, but I think you want to commit the metadata, and try uploading the 
segment to its destnation (whether pinot-fs or controller) in the background. 
Am I right? Or, is it that you attempt to upload the segment, and only if it 
fails, you commit the metadata and move on? (I think we should think through 
this, the latter seems better to me).
   
   The codes tried the first approach: i.e., if the enableUploadToController is 
set as false, the server will first commit metadata to the controller, and if 
the commit is successful, do a background upload to the PinotFS.
   
   > 2. Document the error handling scenario. What happens when the segment 
upload fails but the commit succeeds ? (the use case for which we are building 
all this). What happens if another server gets a "download" but the committer 
has not gotten to ONLINE state yet? I think there are many states here that we 
need to design correctly.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] chenboat commented on issue #4914: [POC] By-passing deep-store requirement for Realtime segment completion

2019-12-20 Thread GitBox
chenboat commented on issue #4914: [POC] By-passing deep-store requirement for 
Realtime segment completion
URL: https://github.com/apache/incubator-pinot/pull/4914#issuecomment-568125831
 
 
   > Left some comments. My main concern is that in a lot of places we resort 
to WARN instead of aborting the workflow. Might be better to think about 
failure scenarios and how they will be handled / corrected ?
   
   The WARN message was misplaced to allow me to verify the behaviour during 
integration tests. I will change them to INFO. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] chenboat commented on issue #4914: [POC] By-passing deep-store requirement for Realtime segment completion

2019-12-20 Thread GitBox
chenboat commented on issue #4914: [POC] By-passing deep-store requirement for 
Realtime segment completion
URL: https://github.com/apache/incubator-pinot/pull/4914#issuecomment-568125409
 
 
   > Looking at the POC so far, I think we might need to start a doc to discuss 
what's the best approach to design the workflow for using peer storage, so far 
I have a few things that I feel we should chat a bit more:
   > 
   > 1. backward compatibility and abstract of logics
   >Right now looking at the code it seems that uploading to pinotFS or 
just using peer is controlled by a boolean value/hardcoded into pinot server 
logics. It may work for POC purpose, but maybe if we should think about how to 
structure the codes such that we can preserve the backward compatiliby of such 
logics. For example, we could implement these logics as a new pinotFS 
(local-disk/peer disk/etc) and add a new protocol (say, naming it as 
peer://..). With this change, we can move all of these extra logics to some 
separated class and plug them into pinot server if we desired to use this model 
while keeping the existing models work fine.
   
   The design doc 
[https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion](url)
 has a section discussing about back compatibility issues. In a high level, we 
will upgrade controller first with a new optional field to allow servers skip 
uploading segments. After that, we will perform the server upgrade.
   
   > 2. extending the logics of segment completion protocols
   >We used to have deep storage to guarantee the safety of our data 
(multiple-replications of data in deep storage once commit is done). the new 
model seems to only have one copy of data once the segment commit is done. Are 
we going to work on this part and enhance our existing segment completion 
protocol so we can have more than one replica of data available before we 
decide a segment commit is finished? we should evaluate a couple of possible 
approaches and think about how to extend the segment completion 
protocols/managers to make this work
   
   The current solution allows you either have 1 copy of segment data (if there 
is no configured deep storage) or more than 1 copy when a deep storage is 
configured.
   
   > 3. coupling between pinot server components and tableDataManager
   >In this new change, we are adding more coupling between 
tableDataManager and pinot server with the new member such as helixAdmin or 
cluster name. We probably want to abstract those members out to some other 
components such that we can avoid the explicit coupling between two components. 
For example, we can create a class to pass that information around make 
different tableDataManager take in different implementation of such wrapper 
class so we can decouple multiple components.
   
   Let me look into this abstraction issue.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org