[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776946#comment-13776946 ]

Roshan Naik commented on HIVE-5138:

Capturing API-related comments from [~ashutoshc], noted [here|https://issues.apache.org/jira/browse/HIVE-4196?focusedCommentId=13770314&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13770314] in HIVE-4196:

{quote}
We should try to eliminate the need for an intermediate staging area while rolling on new partitions. It seems there should not be any gotchas when moving data from the streaming dir to the partition dir directly.

We should make the Thrift APIs in the metastore forward compatible. One way to do that is to use a struct (which contains all parameters) instead of passing in a list of arguments.

We should try to leave the TBLS table untouched in the backend DB. That will simplify the upgrade story. One way to do that is to have all new columns in a new table and then add constraints for this new table.
{quote}

Streaming - Web HCat API
Key: HIVE-5138
URL: https://issues.apache.org/jira/browse/HIVE-5138
Project: Hive
Issue Type: Sub-task
Components: HCatalog, Metastore, WebHCat
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HIVE-4196.v2.patch, HIVE-5138.v1.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
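The struct-based suggestion quoted above can be illustrated with a minimal Python sketch. It is not the metastore Thrift API: `PartitionRollRequest`, its fields, and `partition_roll` are hypothetical names chosen for illustration. The point is only that a call taking one request object can gain a defaulted field later without breaking existing callers, which is what an optional field in a Thrift struct provides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartitionRollRequest:
    """Hypothetical request struct; field names are illustrative only."""
    db: str
    table: str
    partition_values: Tuple[str, ...]
    # Imagine this field was added in a later release. Callers written
    # against the older struct simply omit it and get the default.
    comment: Optional[str] = None


def partition_roll(request: PartitionRollRequest) -> str:
    """Stand-in for a metastore call whose signature never has to change."""
    target = f"{request.db}.{request.table}/" + "/".join(request.partition_values)
    if request.comment is None:
        return target
    return f"{target} ({request.comment})"


# An "old" caller that knows nothing about the new field still works.
old_caller = partition_roll(PartitionRollRequest("default", "events", ("2013", "09")))

# A "new" caller can use the added field through the same entry point.
new_caller = partition_roll(
    PartitionRollRequest("default", "events", ("2013", "09"), comment="hourly roll")
)
```

With a positional argument list, adding `comment` would have required a new method or a changed signature; with the struct, only the struct definition grows.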
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776954#comment-13776954 ]

Roshan Naik commented on HIVE-5138:

bq. We should try to eliminate the need for an intermediate staging area while rolling on new partitions. It seems there should not be any gotchas when moving data from the streaming dir to the partition dir directly.

Thanks. That change is already part of the patch.

bq. We should make the Thrift APIs in the metastore forward compatible. One way to do that is to use a struct (which contains all parameters) instead of passing in a list of arguments.

Hate it .. but OK. :-)

bq. We should try to leave the TBLS table untouched in the backend DB.

Sure. Will move the new columns to a new table.

Streaming - Web HCat API
Key: HIVE-5138
URL: https://issues.apache.org/jira/browse/HIVE-5138
Project: Hive
Issue Type: Sub-task
Components: HCatalog, Metastore, WebHCat
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HIVE-4196.v2.patch, HIVE-5138.v1.patch
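A rough sketch of the "new table" approach agreed to above, using Python's built-in `sqlite3` module as a stand-in for the metastore backend database. The `TBLS` definition here is heavily simplified, and `STREAMING_TBLS` with its columns is a hypothetical name, not the schema from the patch. The sketch shows the new columns living in a side table tied to `TBLS` by a foreign-key constraint, so `TBLS` itself is never altered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Simplified stand-in for the existing TBLS table; it is never altered below.
conn.execute("CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT NOT NULL)")

# Hypothetical side table holding the new streaming-related columns.
conn.execute(
    """
    CREATE TABLE STREAMING_TBLS (
        TBL_ID INTEGER PRIMARY KEY REFERENCES TBLS (TBL_ID) ON DELETE CASCADE,
        STREAMING_ENABLED INTEGER NOT NULL DEFAULT 0,
        CURRENT_PARTITION_DIR TEXT
    )
    """
)

conn.execute("INSERT INTO TBLS (TBL_ID, TBL_NAME) VALUES (1, 'events')")
conn.execute(
    "INSERT INTO STREAMING_TBLS (TBL_ID, STREAMING_ENABLED, CURRENT_PARTITION_DIR) "
    "VALUES (1, 1, '/streaming/events/current')"
)

# A LEFT JOIN keeps tables that have no side-table row visible to readers.
row = conn.execute(
    "SELECT t.TBL_NAME, s.STREAMING_ENABLED FROM TBLS t "
    "LEFT JOIN STREAMING_TBLS s ON s.TBL_ID = t.TBL_ID WHERE t.TBL_ID = 1"
).fetchone()

# The original table's column list is unchanged by the new feature.
tbls_columns = [col[1] for col in conn.execute("PRAGMA table_info(TBLS)")]
```

Because the upgrade only creates a table, a rollback under this layout amounts to dropping the side table.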
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769784#comment-13769784 ]

Eugene Koifman commented on HIVE-5138:

OK, makes sense. It would be useful to add some javadoc about concurrency (or rather why it's not an issue).

Streaming - Web HCat API
Key: HIVE-5138
URL: https://issues.apache.org/jira/browse/HIVE-5138
Project: Hive
Issue Type: Sub-task
Components: HCatalog, Metastore, WebHCat
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HIVE-4196.v2.patch
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769209#comment-13769209 ]

Roshan Naik commented on HIVE-5138:

Thanks [~ekoifman] for the comments:

h5. On Pt 1.
Thanks. I need to take a closer look at this.

h5. On Pt 2.
I think you mean 'safe to invoke concurrently' rather than 'atomic', since intermediate states will be visible whenever an operation spans both the file system and the metastore. Here is a summary of why each operation is concurrency safe:

- *streamingStatus* : Read-only metastore operation.
- *chunkGet* : This is an atomic metastore operation.
- *chunkAbort* : Just deletes a file, so there are no concurrency issues here.
- *chunkCommit* : Just renames a file, so only one of the concurrent operations will succeed.
- *disableStreaming* : This is an atomic metastore operation.
- *enableStreaming* : Does a couple of mkdirs (for setup) followed by an atomic metastore operation. mkdirs() is idempotent, so all concurrent calls succeed. All concurrent invocations enter a transaction to do the metastore update atomically; only one should update the metastore.
- *partitionRoll* : Creates an empty dir for the new current partition, then atomically updates the metastore as follows:
-# Make note of this new current partition dir.
-# Do an addPartition() on the previous current partition.
- If concurrent partitionRoll() invocations use the same arguments, the addPartition() step will fail on all but one. If the arguments differ across concurrent invocations, they all succeed, and the updates made by the last invocation to exit the metastore transaction override the others.

Streaming - Web HCat API
Key: HIVE-5138
URL: https://issues.apache.org/jira/browse/HIVE-5138
Project: Hive
Issue Type: Sub-task
Components: HCatalog, Metastore, WebHCat
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HIVE-4196.v2.patch
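The chunkCommit and enableStreaming reasoning above can be tried out with a small Python sketch. It uses the local file system as a stand-in for HDFS, so it only approximates the semantics the patch relies on. The names `commit_chunk` and `race_commits` and the file layout are assumptions made for illustration. It shows that repeated directory creation is harmless and that, when several threads race to rename the same chunk file, only one rename succeeds.

```python
import os
import tempfile
import threading


def commit_chunk(src: str, dst: str) -> bool:
    """Rename the chunk file; return True only for the caller that won."""
    try:
        os.rename(src, dst)
        return True
    except OSError:
        # The source is already gone (or the destination exists):
        # some other caller committed this chunk first.
        return False


def race_commits(workers: int = 8) -> int:
    """Run concurrent commits of one chunk and count the successful ones."""
    with tempfile.TemporaryDirectory() as root:
        streaming_dir = os.path.join(root, "streaming")
        # Idempotent setup: every caller may repeat this without error.
        for _ in range(workers):
            os.makedirs(streaming_dir, exist_ok=True)

        src = os.path.join(streaming_dir, "chunk_1.tmp")
        dst = os.path.join(streaming_dir, "chunk_1")
        with open(src, "w") as handle:
            handle.write("rows\n")

        results = []
        barrier = threading.Barrier(workers)  # release all threads together

        def worker() -> None:
            barrier.wait()
            results.append(commit_chunk(src, dst))

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return sum(results)


winners = race_commits()
```

The losing callers see an error rather than a silent no-op, which is what lets a client tell "my commit went through" from "someone else already committed this chunk".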
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762146#comment-13762146 ]

Eugene Koifman commented on HIVE-5138:

[~roshan_naik] A couple of comments on this patch:

1. All delegators in WebHCat take the 'user' as determined by Server.java and use that to make secure calls to JobTracker, HDFS, etc. HCatStreamingDelegator ignores it. Why is that?

2. Most operations in HCatStreamingDelegator do multiple things (like modify metadata, create some HDFS file, etc.). It sounds like every one of these operations should be atomic. For example, say for some reason 2 identical calls to partitionRoll() happen at the same time. How is this atomicity achieved?

Streaming - Web HCat API
Key: HIVE-5138
URL: https://issues.apache.org/jira/browse/HIVE-5138
Project: Hive
Issue Type: Sub-task
Components: HCatalog, Metastore
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HIVE-4196.v2.patch
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760863#comment-13760863 ]

Roshan Naik commented on HIVE-5138:

Patch v2 addresses the review comments from https://issues.apache.org/jira/browse/HIVE-4196?focusedCommentId=13714235&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13714235

Streaming - Web HCat API
Key: HIVE-5138
URL: https://issues.apache.org/jira/browse/HIVE-5138
Project: Hive
Issue Type: Sub-task
Components: HCatalog, Metastore
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HIVE-4196.v2.patch
[jira] [Commented] (HIVE-5138) Streaming- Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748165#comment-13748165 ]

Roshan Naik commented on HIVE-5138:

Implement WebHCat API to:
1) Enable and disable streaming on a table
2) Check streaming status
3) Transaction support:
- Get a chunk file
- Commit a chunk file
- Abort the chunk
4) Roll partition: roll the committed chunks from the streaming partition to a new standard partition

Streaming- Web HCat API
Key: HIVE-5138
URL: https://issues.apache.org/jira/browse/HIVE-5138
Project: Hive
Issue Type: Sub-task
Components: HCatalog, Metastore
Reporter: Roshan Naik
Assignee: Roshan Naik
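To make the intended flow of these operations easier to follow, here is a toy in-memory model in Python. It is not the WebHCat API and does not reflect the patch's endpoints, storage layout, or error handling. The class `StreamingTable` and all of its method and attribute names are hypothetical. It only traces the lifecycle described above: enable streaming, obtain chunks, commit or abort them, then roll the committed chunks into a standard partition.

```python
class StreamingTable:
    """Hypothetical in-memory model of the chunk lifecycle (illustration only)."""

    def __init__(self):
        self.enabled = False
        self.open_chunks = set()   # chunks handed out but not yet resolved
        self.committed = []        # committed chunks awaiting a roll
        self.partitions = {}       # partition name -> chunks rolled into it
        self._next_id = 0

    def enable_streaming(self):
        self.enabled = True

    def disable_streaming(self):
        self.enabled = False

    def streaming_status(self):
        return self.enabled

    def chunk_get(self):
        if not self.enabled:
            raise RuntimeError("streaming is disabled for this table")
        self._next_id += 1
        self.open_chunks.add(self._next_id)
        return self._next_id

    def chunk_commit(self, chunk_id):
        # Raises KeyError if the chunk is not open (already committed or aborted).
        self.open_chunks.remove(chunk_id)
        self.committed.append(chunk_id)

    def chunk_abort(self, chunk_id):
        # Aborting discards the chunk; it never reaches a partition.
        self.open_chunks.discard(chunk_id)

    def partition_roll(self, partition):
        # Move everything committed so far into a new standard partition.
        self.partitions[partition] = self.committed
        self.committed = []


table = StreamingTable()
table.enable_streaming()
first, second = table.chunk_get(), table.chunk_get()
table.chunk_commit(first)
table.chunk_abort(second)
table.partition_roll("dt=2013-08-22")
```

After the roll, only the committed chunk appears in the new partition; the aborted one is gone and the streaming area is empty again.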