[GitHub] [zeppelin] zhugezifang commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency
zhugezifang commented on PR #4611: URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1584017096

> > And the data cache is really meant for large-data databases like Hive (not for MySQL). As we know, Hive SQL is translated into MapReduce jobs, and complex Hive SQL involves shuffle and reduce stages, so it costs a lot of time.
>
> I agree that Zeppelin could help to reduce resource usage, but I, as one of the maintainers, believe that we shouldn't change any user behaviors and mechanisms that work as they are. Your approach might be better for some cases, but it could introduce hidden steps and other problems. If you really would like to do it, I believe that you can make your own or your company's own interpreter. It can be merged into this repository, but we, as reviewers, should make avoiding significant side effects our highest priority.

OK, thanks for your advice. Could you also help review this PR for SQL debug, https://github.com/apache/zeppelin/pull/4598? Is it suitable?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@zeppelin.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (ZEPPELIN-5927) Solve the concurrency calls to `saveNoteAuth`
yousj created ZEPPELIN-5927:
---

Summary: Solve the concurrency calls to `saveNoteAuth`
Key: ZEPPELIN-5927
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5927
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-zengine
Affects Versions: 0.10.1, 0.10.0, 0.9.0
Reporter: yousj
Fix For: 0.10.1, 0.10.0, 0.9.0

I have problems with concurrent calls to `saveNoteAuth`. [Related pull request|https://github.com/apache/zeppelin/pull/4563]: this pull request solves the concurrency problem caused by multiple concurrent calls to `org.apache.zeppelin.notebook.AuthorizationService#saveNoteAuth`, but this can result in concurrent modifications to `notebook-authorization.json`, which then throw java.nio.file.NoSuchFileException.

{code:java}
Caused by: java.nio.file.NoSuchFileException: /usr/local/zeppelin-0.10.1-bin-all/conf/notebook-authorization.json
  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
  at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:447)
  at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
  at java.nio.file.Files.move(Files.java:1395)
  at org.apache.zeppelin.util.FileUtils.atomicWriteToFile(FileUtils.java:60)
  at org.apache.zeppelin.util.FileUtils.atomicWriteToFile(FileUtils.java:71)
  at org.apache.zeppelin.storage.LocalConfigStorage.save(LocalConfigStorage.java:71)
  at org.apache.zeppelin.notebook.AuthorizationService.saveNoteAuth(AuthorizationService.java:109)
  at org.apache.zeppelin.notebook.Notebook.createNote(Notebook.java:258)
  at org.apache.zeppelin.service.NotebookService.createNote(NotebookService.java:168)
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [zeppelin] jongyoul commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency
jongyoul commented on PR #4611: URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582577943

> And the data cache is really meant for large-data databases like Hive (not for MySQL). As we know, Hive SQL is translated into MapReduce jobs, and complex Hive SQL involves shuffle and reduce stages, so it costs a lot of time.

I agree that Zeppelin could help to reduce resource usage, but I, as one of the maintainers, believe that we shouldn't change any user behaviors and mechanisms that work as they are. Your approach might be better for some cases, but it could introduce hidden steps and other problems. If you really would like to do it, I believe that you can make your own or your company's own interpreter. It can be merged into this repository, but we, as reviewers, should make avoiding significant side effects our highest priority.
[GitHub] [zeppelin] jongyoul commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency
jongyoul commented on PR #4611: URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582561003

Sorry for the late and short comment, but how about making the debug paragraph a new interpreter? I mean, for instance, `%jdbc.debug`. Debugging a JDBC paragraph is itself reasonable, but I'm worried about this feature being included in ZeppelinServer, because it is only for the JDBC interpreter and won't work for other interpreters. WDYT?
[GitHub] [zeppelin] youshaojun commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note
youshaojun commented on PR #4563: URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1582337500

Thanks for your reply. I understand.
[GitHub] [zeppelin] Reamer commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note
Reamer commented on PR #4563: URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1582193016

I don't think so, because this function is used by all save operations, e.g. when note files are created. These would then block each other.
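One way the blocking concern above might be reduced is to lock per target file instead of on the shared write method, so that saves to different files do not wait on each other. The following is only a minimal sketch under that assumption: `PerFileAtomicWriter` and its `write` method are hypothetical names for illustration and are not Zeppelin code, and it is not a claim about how `FileUtils#atomicWriteToFile` is implemented.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper for illustration only; not part of Zeppelin.
final class PerFileAtomicWriter {
  // One lock object per normalized target path, so writers of different
  // files never contend with each other.
  private static final ConcurrentMap<Path, Object> LOCKS = new ConcurrentHashMap<>();

  private PerFileAtomicWriter() {}

  static void write(Path target, String content) throws IOException {
    Path key = target.toAbsolutePath().normalize();
    Object lock = LOCKS.computeIfAbsent(key, k -> new Object());
    synchronized (lock) {
      // Write to a uniquely named temp file in the same directory, then
      // rename it over the target. Note that createTempFile may create the
      // file with restrictive permissions on POSIX systems.
      Path tmp = Files.createTempFile(key.getParent(), key.getFileName().toString(), ".tmp");
      try {
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // With ATOMIC_MOVE, whether an existing target is replaced is
        // implementation specific according to the Files.move Javadoc.
        Files.move(tmp, key, StandardCopyOption.ATOMIC_MOVE);
      } finally {
        // No-op after a successful move; cleans up if the write or move failed.
        Files.deleteIfExists(tmp);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("per-file-writer-demo");
    Path target = dir.resolve("notebook-authorization.json");
    write(target, "{\"version\":1}");
    write(target, "{\"version\":2}");
    System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
  }
}
```

The lock map in this sketch grows with the number of distinct paths written and is never pruned; for a small set of configuration files that is probably acceptable, but it would need bounding if applied to arbitrary note files.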
[GitHub] [zeppelin] zhugezifang commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency
zhugezifang commented on PR #4611: URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582091105

> A few thoughts about your work:
>
> * All functionality is located in the JDBC interpreter. All new dependencies and classes should also be located there.
> * It seems that in case of a complex query you modify the dataset with the JDBC interpreter at the source. With your current implementation, the user is not aware of this right away.
> * You want to cache complex queries with an LRUCache. Depending on the size of the cache, queries may not be removed from the cache and the user may see stale data. I think a time based cache would be more appropriate here.
> * Have you ever heard of PreparedStatements?
> * At first view your configuration options are only JDBC interpreter configuration, therefore the configuration should be stored only in the JDBC interpreter and not as global ZeppelinConfiguration.

And the data cache is really meant for large-data databases like Hive (not for MySQL). As we know, complex Hive SQL involves shuffle and reduce stages, so it costs a lot of time.
[GitHub] [zeppelin] zhugezifang commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency
zhugezifang commented on PR #4611: URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582067198

> A few thoughts about your work:
>
> * All functionality is located in the JDBC interpreter. All new dependencies and classes should also be located there.
> * It seems that in case of a complex query you modify the dataset with the JDBC interpreter at the source. With your current implementation, the user is not aware of this right away.
> * You want to cache complex queries with an LRUCache. Depending on the size of the cache, queries may not be removed from the cache and the user may see stale data. I think a time based cache would be more appropriate here.
> * Have you ever heard of PreparedStatements?
> * At first view your configuration options are only JDBC interpreter configuration, therefore the configuration should be stored only in the JDBC interpreter and not as global ZeppelinConfiguration.

And could you help me continue to complete the SQL debug feature? https://github.com/apache/zeppelin/pull/4598
[GitHub] [zeppelin] zhugezifang commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency
zhugezifang commented on PR #4611: URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582065412

@Reamer Hi, firstly, thanks for your advice.

"All functionality is located in the JDBC interpreter. All new dependencies and classes should also be located there"

It contains antlr4, which can also be used in the SQL debug feature, https://github.com/apache/zeppelin/pull/4598. I am very sorry that the feature has cost you a lot of time without being merged. If the SQL debug feature can be merged, its use of ANTLR can also be shared with this feature.

'queries may not be removed from the cache and the user may see stale data'

This is a really good question, and it is what I want to address in the next step. I create a temp table to hold the data, but if the original data changes or there are too many temp tables, that needs to be fixed. So I wrote the design document; its last step is to clean up the temp tables: https://docs.google.com/document/d/1wruK0ZZ0XiriYOraFa5WYSz531pcsCpJeIBmne57fJY/edit
[GitHub] [zeppelin] youshaojun commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note
youshaojun commented on PR #4563: URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1582052448

Maybe a `synchronized` on `org.apache.zeppelin.util.FileUtils#atomicWriteToFile(java.lang.String, java.io.File, java.util.Set)` is a better option.
[GitHub] [zeppelin] Reamer commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note
Reamer commented on PR #4563: URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1582001696

Feel free to prepare a PullRequest with JIRA ticket. Your StackTrace should be included in the JIRA ticket.
[GitHub] [zeppelin] Reamer commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency
Reamer commented on PR #4611: URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-158298

A few thoughts about your work:

- All functionality is located in the JDBC interpreter. All new dependencies and classes should also be located there.
- It seems that in case of a complex query you modify the dataset with the JDBC interpreter at the source. With your current implementation, the user is not aware of this right away.
- You want to cache complex queries with an LRUCache. Depending on the size of the cache, queries may not be removed from the cache and the user may see stale data. I think a time based cache would be more appropriate here.
- Have you ever heard of PreparedStatements?
- At first view your configuration options are only JDBC interpreter configuration, therefore the configuration should be stored only in the JDBC interpreter and not as global ZeppelinConfiguration.
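To illustrate the time-based cache suggested in the review above, here is a minimal sketch of a cache whose entries expire after a fixed time-to-live, so a stale result cannot be served indefinitely regardless of cache size. `TtlQueryCache`, its methods, and the injected clock are hypothetical names chosen for this example; they are not part of the PR or of the Zeppelin JDBC interpreter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

// Hypothetical illustration of a time-based (TTL) cache keyed by SQL text.
final class TtlQueryCache<V> {
  private static final class Entry<V> {
    final V value;
    final long expiresAtMillis;

    Entry(V value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final ConcurrentMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
  private final long ttlMillis;
  private final LongSupplier clock; // injectable so expiry can be tested

  TtlQueryCache(long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  void put(String sql, V result) {
    entries.put(sql, new Entry<>(result, clock.getAsLong() + ttlMillis));
  }

  // Returns the cached result, or null if it is absent or has expired.
  V get(String sql) {
    Entry<V> entry = entries.get(sql);
    if (entry == null) {
      return null;
    }
    if (clock.getAsLong() >= entry.expiresAtMillis) {
      // Remove only if the mapping is still this expired entry.
      entries.remove(sql, entry);
      return null;
    }
    return entry.value;
  }

  public static void main(String[] args) {
    TtlQueryCache<String> cache = new TtlQueryCache<>(60_000L, System::currentTimeMillis);
    cache.put("SELECT 1", "cached result");
    System.out.println(cache.get("SELECT 1"));
  }
}
```

This sketch expires entries lazily on read and has no size bound; a real cache would likely combine a TTL with a maximum size, and the appropriate TTL depends on how quickly the underlying tables change.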
[GitHub] [zeppelin] youshaojun commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note
youshaojun commented on PR #4563: URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1581997675

Yes, I think so too.