[GitHub] zeppelin issue #1371: [ZEPPELIN-1372]Automatically Detect the data type in t...
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1371 @Peilin-Yang Right, let's not think about it for now then. We can do that when the next release is ready. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1371 @Peilin-Yang Thanks, I will take a look.
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project zeppelin-interpreter: Compilation failure
[ERROR] /Users/yizhang/zeppelin/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterServer.java:[35,43] package org.apache.zeppelin.interpreter.dev does not exist
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project zeppelin-interpreter: Compilation failure
/Users/yizhang/zeppelin/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterServer.java:[35,43] package org.apache.zeppelin.interpreter.dev does not exist
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure
/Users/yizhang/zeppelin/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterServer.java:[35,43] package org.apache.zeppelin.interpreter.dev does not exist
    at org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:858)
    at org.apache.maven.plugin.compiler.CompilerMojo.execute(CompilerMojo.java:129)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
    ... 20 more
Not sure whether this is a problem with my own settings?
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 @corneadoug I fetched branch-0.6, but the compilation failed.
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 @corneadoug Sure, I can do that once I have time :)
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1371 @Peilin-Yang Could you eventually open a PR with those changes targeting branch-0.6?
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1371 I wanted to merge this into 0.6 as well, but there were some conflicts and something went wrong with the cherry-pick. Now I can't push to a different branch with our merge tool.
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1371 @Peilin-Yang Tested again, merging it.
Github user bzz commented on the issue: https://github.com/apache/zeppelin/pull/1371 Looks great to me, thank you @Peilin-Yang! @corneadoug what do you think? Let's merge to master if there is no further discussion.
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 I have changed the logic of the code a bit. Since it is practically impossible to maintain a list of date formats for parsing purposes, I now take whatever is in the cell (once the value has been shown not to be a number) and let moment parse it. As @corneadoug said, in the future we can let the user (as per #1363) decide what each column contains, so we simply don't put ourselves on the dangerous side. I tested locally to get rid of the compilation errors.
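A minimal sketch of the detection order described in this comment: a cell is tested as a number first, and only cells proven non-numeric are handed to a date parser. The PR used moment.js for that step; to keep the sketch self-contained it swaps in the built-in `Date.parse`, which is more lenient and engine-dependent than moment (V8, for example, accepts some strings containing stray words), so treat it as an illustration rather than a drop-in replacement. `detectCellType` is a hypothetical name.

```javascript
// Returns 'number', 'date' or 'string' for a single raw table cell.
// Sketch only: Date.parse stands in for the moment.js call used in the PR.
function detectCellType(cell) {
  const text = String(cell).trim();
  if (text === '') {
    return 'string';
  }
  // Numbers are checked first, so numeric cells never reach the date parser.
  if (Number.isFinite(Number(text))) {
    return 'number';
  }
  // Only cells already proven non-numeric are handed to the date parser.
  if (!Number.isNaN(Date.parse(text))) {
    return 'date';
  }
  return 'string';
}

detectCellType('3.14');       // 'number'
detectCellType('2016-08-30'); // 'date'
detectCellType('hello');      // 'string'
```

Checking numbers before dates matters because most date parsers will happily read a bare integer as some date.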
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1371 Tested the latest state of the PR and the ordering of Numbers and Dates. LGTM, except for the comments and the CI status (which should be fixed once my comments are addressed).
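To see why the detected type matters for ordering, compare a plain string sort with a type-aware comparator. `compareCells` is a hypothetical helper, not the PR's code; it assumes every cell in the column already matches the detected type, and it uses `Date.parse` as a stand-in for moment.js when ordering dates.

```javascript
// Hypothetical comparator: orders two raw cell strings according to the
// type detected for their column ('number', 'date' or 'string').
function compareCells(a, b, type) {
  if (type === 'number') {
    return Number(a) - Number(b);
  }
  if (type === 'date') {
    return Date.parse(a) - Date.parse(b);
  }
  return String(a).localeCompare(String(b));
}

const cells = ['10', '9', '100'];
const lexical = [...cells].sort(); // ['10', '100', '9']
const numeric = [...cells].sort((a, b) => compareCells(a, b, 'number')); // ['9', '10', '100']
const dates = ['2016-09-01', '2016-08-30'].sort((a, b) => compareCells(a, b, 'date'));
```

Without type detection every cell is a string, which produces the `lexical` order above.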
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 @corneadoug Thanks for all your feedback!
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1371 I like that the code changes avoid additional loops. I haven't tested the new changes yet, but I will.
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 BTW, for date detection I made a predefined list of supported formats and used the strict mode of moment.js parsing. The format "x" or "X", which corresponds to the UNIX timestamp, was intentionally excluded from this list because it can match almost any number.
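A sketch of strict matching against a predefined format whitelist, using plain regular expressions in place of moment.js strict mode (an assumption made to keep the example dependency-free). The format list and `matchDateFormat` are hypothetical. In moment.js, `X` and `x` are the Unix timestamp tokens (seconds and milliseconds); a shape like `/^\d+$/` is exactly what is left out here, since it would turn most integer columns into dates.

```javascript
// Hypothetical whitelist of accepted date shapes. Each regex must match the
// whole cell (strict), mirroring the idea behind moment.js strict parsing.
const DATE_FORMATS = [
  { name: 'YYYY-MM-DD', re: /^(\d{4})-(\d{2})-(\d{2})$/, order: ['y', 'm', 'd'] },
  { name: 'YYYY-MM-DD HH:mm:ss', re: /^(\d{4})-(\d{2})-(\d{2}) \d{2}:\d{2}:\d{2}$/, order: ['y', 'm', 'd'] },
  { name: 'MM/DD/YYYY', re: /^(\d{2})\/(\d{2})\/(\d{4})$/, order: ['m', 'd', 'y'] },
];
// Deliberately absent: a UNIX-timestamp shape such as /^\d+$/.

// Returns the name of the first matching format, or null if none matches.
function matchDateFormat(cell) {
  for (const fmt of DATE_FORMATS) {
    const m = fmt.re.exec(cell);
    if (!m) continue;
    const parts = {};
    fmt.order.forEach((key, i) => {
      parts[key] = Number(m[i + 1]);
    });
    // Coarse range check (ignores month lengths) to reject impossible values.
    if (parts.m >= 1 && parts.m <= 12 && parts.d >= 1 && parts.d <= 31) {
      return fmt.name;
    }
  }
  return null;
}

matchDateFormat('08/30/2016'); // 'MM/DD/YYYY'
matchDateFormat('1472515200'); // null: bare integers are never treated as dates
```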
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 @corneadoug The ideal case is to fetch the type from the back end, but that is pretty hard. There are several types of interpreters (SQL, Spark, AngularJS, Shell, etc.) in the back end, and creating a unified interface for all of them needs a ton of work. A simple example: if a user submits a SQL query that applies the DATE function to a string field to turn that column into a date, I do not think we can correctly get the type of the column. To me, (again) the data type in the front-end table is just a utility that lets the user sort/format the data on the fly. Does this make more sense? In the latest commit I added date cell detection. I also run the detection at the time the data is loaded into the table, which saves some time (each cell is processed and loaded anyway; see the function $scope.loadTableData). I tried to follow the code convention and expose the change as an API, but the current code doesn't seem to be structured that way, so I kept my changes with the other parts of the code that deal with table display.
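A minimal sketch of running detection during the load pass instead of in an extra loop, assuming the result arrives as tab-separated text with a header line (the shape of Zeppelin's `%table` output). `loadTableWithTypes` is a hypothetical stand-in for `$scope.loadTableData`, not its actual code, and it only distinguishes numbers from strings; dates could be added as a third state.

```javascript
// Returns true when a raw cell parses as a finite number.
function isNumeric(cell) {
  return cell.trim() !== '' && Number.isFinite(Number(cell));
}

// Hypothetical loader: every cell is visited once while rows are built, and
// column types are narrowed during that same pass.
function loadTableWithTypes(text) {
  const lines = text.split('\n').filter((line) => line.length > 0);
  if (lines.length === 0) {
    return { columnNames: [], columnTypes: [], rows: [] };
  }
  const columnNames = lines[0].split('\t');
  // Each column starts as 'number' and is demoted to 'string' on its first
  // non-numeric cell; empty cells are treated as missing and skipped.
  const columnTypes = columnNames.map(() => 'number');
  const rows = [];
  for (const line of lines.slice(1)) {
    const cells = line.split('\t');
    cells.forEach((cell, col) => {
      if (cell !== '' && columnTypes[col] === 'number' && !isNumeric(cell)) {
        columnTypes[col] = 'string';
      }
    });
    rows.push(cells);
  }
  return { columnNames, columnTypes, rows };
}

const table = loadTableWithTypes('name\tcount\nfoo\t3\nbar\t12\n');
// table.columnTypes is ['string', 'number']
```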
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1371 I think data type discovery should be done on the back end and sent to the front end. Parsing every column and row can become expensive, and a good data type discovery structure would also let the APIs benefit from it. Although it only applies to the table, #1363 is more beneficial since it lets the user set the type themselves (the user usually knows the type better). We could also extend it to handle more types, and why not the graph pivot too.
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 @echarles Yes, the table can only hold a limited number of rows (1000) so as not to overload the client. I think the limit is set by the interpreter, not the front-end client. That said, there is no way for the front-end client to see the whole picture of the data. But what I meant is that sorting the table is about the data currently in the table, so this is fine as long as it correctly sorts the table shown on the page. Does this make any sense?
Github user echarles commented on the issue: https://github.com/apache/zeppelin/pull/1371 @Peilin-Yang Say your result set is 50,000 lines; clients only get 1,000 lines (if I remember well). There is a risk that your type guess is not correct.
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 @bzz Yes, this will basically touch every cell in the table. I do not think visiting just a few rows would work, as the sorting is done by Handsontable and it loads all the data for sorting. But you are right. I will see whether it can be done at the time the data is loaded; that would make re-parsing the data unnecessary, but it might affect the graphs as well. @echarles Why is sorting on the client misleading? The table is generated by the user, and the purpose of sorting by column is to sort the current table on the fly. Your other suggestion makes more sense: exposing the type get/set as APIs.
Github user echarles commented on the issue: https://github.com/apache/zeppelin/pull/1371 Beyond numeric values, automatic detection of dates... would be great. Detection on the client (JavaScript) side could be misleading, as you only get a subset of the data. On the other hand, detection on the server (Scala) side may be more time-consuming for huge result sets. Still, it could add real value, as you could expose the types, units... via the API.
Github user bzz commented on the issue: https://github.com/apache/zeppelin/pull/1371 Thank you @Peilin-Yang! It looks like it's `O(N)` in the table size; do you think there might be a performance implication here? Just curious, but since it's a table, shouldn't analysing just a few rows be enough? @arunsoman That sounds interesting, but I think the dev@ mailing list is a much better place to discuss such changes, while keeping this PR to the scope of ZEPPELIN-1372. cc @corneadoug for review
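The trade-off in this question (and in the 1,000-of-50,000-rows concern raised above) can be illustrated with a hypothetical `inferColumnType` that either scans the whole column or only a prefix sample. A full scan costs O(N) but sees every value; a sample is cheaper but can miss a non-numeric value further down.

```javascript
// Returns true when a raw cell parses as a finite number.
function isNumericCell(cell) {
  return cell.trim() !== '' && Number.isFinite(Number(cell));
}

// Infers a column type from its first `sampleSize` cells (all cells by default).
function inferColumnType(values, sampleSize = values.length) {
  const sample = values.slice(0, sampleSize);
  return sample.every(isNumericCell) ? 'number' : 'string';
}

const column = ['1', '2', '3', 'n/a'];
inferColumnType(column, 3); // 'number' (the sample misses 'n/a')
inferColumnType(column);    // 'string'
```

The same effect applies one level up: the client only ever sees the rows it was sent, so even a full client-side scan is a sample of the full result set.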
Github user Peilin-Yang commented on the issue: https://github.com/apache/zeppelin/pull/1371 @arunsoman Yes. In this PR the code can automatically detect the type of the cells, but it is restricted to strings and numbers. Dates could be added as well, and that is a good idea. In my other PR #1363, the drop-down menu is intended to make the user aware of the data type of each column. Please let me know if you have a better idea (ideally with detailed suggestions).
Github user arunsoman commented on the issue: https://github.com/apache/zeppelin/pull/1371 It would be even more beneficial if Zeppelin could detect the column type, e.g. (date, number, string), and also whether the field behaves as a measure or a dimension. If Zeppelin could dispatch such details, the display system could work out which charts to show the user, or the user could make a more informed choice when creating a report.
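One simple way to picture the measure/dimension idea, sketched under assumptions: numeric columns are treated as measures and everything else as dimensions, and a chart kind is suggested from that split. `classifyColumns`, `suggestChart` and the chart labels are hypothetical, not Zeppelin APIs.

```javascript
// Hypothetical heuristic: numeric columns are measures (values you aggregate),
// everything else is a dimension (values you group by or plot against).
function classifyColumns(columnTypes) {
  const measures = [];
  const dimensions = [];
  columnTypes.forEach((type, index) => {
    (type === 'number' ? measures : dimensions).push(index);
  });
  return { measures, dimensions };
}

// Hypothetical chart suggestion built on top of the classification.
function suggestChart(columnTypes) {
  const { measures, dimensions } = classifyColumns(columnTypes);
  if (measures.length === 0 || dimensions.length === 0) return 'table';
  if (dimensions.some((i) => columnTypes[i] === 'date')) return 'line';
  return 'bar';
}

suggestChart(['date', 'number']);   // 'line'
suggestChart(['string', 'number']); // 'bar'
```

The heuristic breaks on numeric identifiers such as zip codes, which would be classified as measures; that is one reason a user override like the one in #1363 is useful.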