[GitHub] [zeppelin] zhugezifang commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency

2023-06-08 Thread via GitHub


zhugezifang commented on PR #4611:
URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1584017096

   > > and the data cache is really for large-data databases like Hive (not 
for MySQL). As we know, Hive SQL is translated into MapReduce jobs, and complex 
Hive SQL involves shuffle and reduce stages, so it costs a lot of time.
   > 
   > I agree that Zeppelin could help reduce resource usage, but I, as one of 
the maintainers, believe that we shouldn't change any user behavior or 
mechanisms that work as they are. Your approach might be better in some cases, 
but it introduces hidden steps and could cause other problems. If you really 
want to do this, I believe you can build your own (or your company's own) 
interpreter. It could be merged into this repository, but we, as reviewers, 
should make avoiding significant side effects our highest priority.
   
   OK, thanks for your advice. Could you also help review this PR for SQL 
debugging, https://github.com/apache/zeppelin/pull/4598 ?
   Is it suitable?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (ZEPPELIN-5927) Solve the concurrency calls to `saveNoteAuth`

2023-06-08 Thread yousj (Jira)
yousj created ZEPPELIN-5927:
---

             Summary: Solve the concurrency calls to `saveNoteAuth`
                 Key: ZEPPELIN-5927
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5927
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-zengine
    Affects Versions: 0.10.1, 0.10.0, 0.9.0
            Reporter: yousj
             Fix For: 0.10.1, 0.10.0, 0.9.0


I have a problem with concurrent calls to `saveNoteAuth`.
[related pull request|https://github.com/apache/zeppelin/pull/4563]: this 
pull request aims to solve the concurrency problem caused by multiple 
concurrent calls to 
`org.apache.zeppelin.notebook.AuthorizationService#saveNoteAuth`, but this can 
result in concurrent modifications to `notebook-authorization.json`, which then 
throw java.nio.file.NoSuchFileException.
{code:java}
Caused by: java.nio.file.NoSuchFileException: /usr/local/zeppelin-0.10.1-bin-all/conf/notebook-authorization.json
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:447)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.zeppelin.util.FileUtils.atomicWriteToFile(FileUtils.java:60)
    at org.apache.zeppelin.util.FileUtils.atomicWriteToFile(FileUtils.java:71)
    at org.apache.zeppelin.storage.LocalConfigStorage.save(LocalConfigStorage.java:71)
    at org.apache.zeppelin.notebook.AuthorizationService.saveNoteAuth(AuthorizationService.java:109)
    at org.apache.zeppelin.notebook.Notebook.createNote(Notebook.java:258)
    at org.apache.zeppelin.service.NotebookService.createNote(NotebookService.java:168)
 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [zeppelin] jongyoul commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency

2023-06-08 Thread via GitHub


jongyoul commented on PR #4611:
URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582577943

   > and the data cache is really for large-data databases like Hive (not 
for MySQL). As we know, Hive SQL is translated into MapReduce jobs, and complex 
Hive SQL involves shuffle and reduce stages, so it costs a lot of time.
   
   I agree that Zeppelin could help reduce resource usage, but I, as one of 
the maintainers, believe that we shouldn't change any user behavior or 
mechanisms that work as they are. Your approach might be better in some cases, 
but it introduces hidden steps and could cause other problems. If you really 
want to do this, I believe you can build your own (or your company's own) 
interpreter. It could be merged into this repository, but we, as reviewers, 
should make avoiding significant side effects our highest priority.





[GitHub] [zeppelin] jongyoul commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency

2023-06-08 Thread via GitHub


jongyoul commented on PR #4611:
URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582561003

   Sorry for the late and short comment, but how about making the debug 
paragraph a new interpreter, for instance `%jdbc.debug`? Debugging a JDBC 
paragraph is reasonable in itself, but I'm worried about this feature being 
included in ZeppelinServer, because it is only for the JDBC interpreter and 
won't work for other interpreters. WDYT?





[GitHub] [zeppelin] youshaojun commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note

2023-06-08 Thread via GitHub


youshaojun commented on PR #4563:
URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1582337500

   Thanks for your reply. I understand.





[GitHub] [zeppelin] Reamer commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note

2023-06-08 Thread via GitHub


Reamer commented on PR #4563:
URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1582193016

   I don't think so, because this function is used by all save operations, 
e.g. when note files are created. These would then block each other.
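   One way to picture the blocking concern: a single lock around the write 
helper serializes every save, whereas a lock per target file only serializes 
writers of the same file. The sketch below illustrates the second variant under 
stated assumptions. `PerPathAtomicWriter` and its `atomicWrite` method are 
hypothetical names for illustration and are not Zeppelin's `FileUtils`; the 
temp-file-then-move pattern is inferred only from the stack trace in 
ZEPPELIN-5927.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper for illustration only; not part of Zeppelin.
final class PerPathAtomicWriter {
    // One lock object per normalized absolute target path.
    // Assumption: the set of target files is small, so the map may grow unbounded.
    private static final ConcurrentMap<Path, Object> LOCKS = new ConcurrentHashMap<>();

    private PerPathAtomicWriter() {}

    static void atomicWrite(String content, Path target) {
        Path absolute = target.toAbsolutePath().normalize();
        Object lock = LOCKS.computeIfAbsent(absolute, p -> new Object());
        // Writers of the same file wait for each other; writers of
        // different files proceed in parallel.
        synchronized (lock) {
            try {
                Path dir = absolute.getParent();
                Files.createDirectories(dir);
                // Write to a temp file in the same directory, then move it over the target.
                Path tmp = Files.createTempFile(dir, absolute.getFileName().toString(), ".tmp");
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                Files.move(tmp, absolute,
                        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

   With this shape, saving a note file would not wait for a save of 
`notebook-authorization.json`, at the cost of keeping one lock object per path.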





[GitHub] [zeppelin] zhugezifang commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency

2023-06-08 Thread via GitHub


zhugezifang commented on PR #4611:
URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582091105

   > A few thoughts about your work:
   > 
   > * All functionality is located in the JDBC interpreter. All new 
dependencies and classes should also be located there.
   > * It seems that in case of a complex query you modify the dataset with the 
JDBC interpreter at the source. With your current implementation, the user is 
not aware of this right away.
   > * You want to cache complex queries with an LRUCache. Depending on the 
size of the cache, queries may not be removed from the cache and the user may 
see stale data. I think a time based cache would be more appropriate here.
   > * Have you ever heard of PreparedStatements?
   > * At first view your configuration options are only JDBC interpreter 
configuration, therefore the configuration should be stored only in the JDBC 
interpreter and not as global ZeppelinConfiguration.
   
   And the data cache is really for large-data databases like Hive (not 
for MySQL). As we know, complex Hive SQL involves shuffle and reduce stages, so 
it costs a lot of time.





[GitHub] [zeppelin] zhugezifang commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency

2023-06-08 Thread via GitHub


zhugezifang commented on PR #4611:
URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582067198

   > A few thoughts about your work:
   > 
   > * All functionality is located in the JDBC interpreter. All new 
dependencies and classes should also be located there.
   > * It seems that in case of a complex query you modify the dataset with the 
JDBC interpreter at the source. With your current implementation, the user is 
not aware of this right away.
   > * You want to cache complex queries with an LRUCache. Depending on the 
size of the cache, queries may not be removed from the cache and the user may 
see stale data. I think a time based cache would be more appropriate here.
   > * Have you ever heard of PreparedStatements?
   > * At first view your configuration options are only JDBC interpreter 
configuration, therefore the configuration should be stored only in the JDBC 
interpreter and not as global ZeppelinConfiguration.
   
   And could you help me continue and complete the SQL debug feature? 
https://github.com/apache/zeppelin/pull/4598





[GitHub] [zeppelin] zhugezifang commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency

2023-06-08 Thread via GitHub


zhugezifang commented on PR #4611:
URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-1582065412

   @Reamer Hi, first of all, thanks for your advice.
   
   "All functionality is located in the JDBC interpreter. All new dependencies 
and classes should also be located there": the change includes antlr4, which 
can also be used by the SQL debug feature in 
https://github.com/apache/zeppelin/pull/4598
   I am very sorry that the feature has cost you a lot of time and still 
cannot be merged.
   
   If the SQL debug feature can be merged, its use of ANTLR can also be shared 
with this feature.
   
   'queries may not be removed from the cache and the user may see stale data': 
this is a really good question, and it is what I want to address in the next 
step. I create temp tables to hold the data, but if the original data changes 
or there are too many temp tables, this needs fixing. So I wrote the design 
document, whose last step is to clean up the temp tables:
   
   
https://docs.google.com/document/d/1wruK0ZZ0XiriYOraFa5WYSz531pcsCpJeIBmne57fJY/edit
   





[GitHub] [zeppelin] youshaojun commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note

2023-06-08 Thread via GitHub


youshaojun commented on PR #4563:
URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1582052448

   Maybe making 
`org.apache.zeppelin.util.FileUtils#atomicWriteToFile(java.lang.String, 
java.io.File, java.util.Set)` synchronized is a better option.
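   A minimal sketch of what a synchronized write helper could look like, 
assuming the helper writes a temp file and then moves it over the target (the 
pattern suggested by the `Files.move` frame in the ZEPPELIN-5927 stack trace). 
`SynchronizedAtomicWriter` and its simplified two-argument `atomicWrite` are 
hypothetical names for illustration, not the actual Zeppelin method, and the 
permissions argument is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not Zeppelin's FileUtils.
final class SynchronizedAtomicWriter {
    private SynchronizedAtomicWriter() {}

    // A single class-level lock: only one thread at a time may write,
    // regardless of which file is the target.
    static synchronized void atomicWrite(String content, Path target) {
        try {
            Path absolute = target.toAbsolutePath();
            Path dir = absolute.getParent();
            Files.createDirectories(dir);
            // Each call gets its own uniquely named temp file next to the target.
            Path tmp = Files.createTempFile(dir, absolute.getFileName().toString(), ".tmp");
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // Replace the target in one step so readers never see a partial file.
            Files.move(tmp, absolute,
                    StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

   The trade-off is simplicity against throughput: every caller of the helper 
queues behind the same lock, even when the target files differ.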





[GitHub] [zeppelin] Reamer commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note

2023-06-08 Thread via GitHub


Reamer commented on PR #4563:
URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1582001696

   Feel free to prepare a pull request with a JIRA ticket. Your stack trace 
should be included in the JIRA ticket.





[GitHub] [zeppelin] Reamer commented on pull request #4611: [ZEPPELIN-5915]improve query efficiency

2023-06-08 Thread via GitHub


Reamer commented on PR #4611:
URL: https://github.com/apache/zeppelin/pull/4611#issuecomment-158298

   A few thoughts about your work:
- All functionality is located in the JDBC interpreter. All new 
dependencies and classes should also be located there.
- It seems that in case of a complex query you modify the dataset with the 
JDBC interpreter at the source. With your current implementation, the user is 
not aware of this right away.
- You want to cache complex queries with an LRUCache. Depending on the size 
of the cache, queries may not be removed from the cache and the user may see 
stale data. I think a time based cache would be more appropriate here.
- Have you ever heard of PreparedStatements?
- At first view your configuration options are only JDBC interpreter 
configuration, therefore the configuration should be stored only in the JDBC 
interpreter and not as global ZeppelinConfiguration.
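
   The time-based cache mentioned in the third point can be sketched as 
follows. This is only an illustration under stated assumptions: `TtlCache`, its 
methods, and the injectable millisecond clock are hypothetical names that do 
not come from the PR or from Zeppelin, and a real implementation would also 
need a size bound and background eviction.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical time-based cache for illustration only; not part of Zeppelin.
final class TtlCache<K, V> {
    // A cached value together with the instant at which it stops being valid.
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = Objects.requireNonNull(clock);
    }

    void put(K key, V value) {
        Objects.requireNonNull(value, "null values are not supported");
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns the value only while it is younger than the TTL;
    // an expired entry is dropped lazily on access.
    Optional<V> get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry);
            return Optional.empty();
        }
        return Optional.of(entry.value);
    }
}
```

   Unlike a purely size-bounded LRU cache, an entry here stops being served 
once its time-to-live has elapsed, which bounds how stale a cached query result 
can be.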





[GitHub] [zeppelin] youshaojun commented on pull request #4563: [ZEPPELIN-5885] Solve the concurrency clone note

2023-06-08 Thread via GitHub


youshaojun commented on PR #4563:
URL: https://github.com/apache/zeppelin/pull/4563#issuecomment-1581997675

   Yes, I think so too.

