[jira] [Commented] (FLINK-5413) Convert TableEnvironmentITCases to unit tests
[ https://issues.apache.org/jira/browse/FLINK-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822775#comment-15822775 ] GaoLun commented on FLINK-5413: --- Hi Timo, is it proper to just validate the correctness of the table schema that is created by the given conditions? > Convert TableEnvironmentITCases to unit tests > - > > Key: FLINK-5413 > URL: https://issues.apache.org/jira/browse/FLINK-5413 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: Timo Walther >Assignee: GaoLun > > The following IT cases could be converted into unit tests: > - {{org.apache.flink.table.api.scala.batch.TableEnvironmentITCase}} > - {{org.apache.flink.table.api.java.batch.TableEnvironmentITCase}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-5413) Convert TableEnvironmentITCases to unit tests
[ https://issues.apache.org/jira/browse/FLINK-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun reassigned FLINK-5413: - Assignee: GaoLun > Convert TableEnvironmentITCases to unit tests > - > > Key: FLINK-5413 > URL: https://issues.apache.org/jira/browse/FLINK-5413 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: Timo Walther >Assignee: GaoLun > > The following IT cases could be converted into unit tests: > - {{org.apache.flink.table.api.scala.batch.TableEnvironmentITCase}} > - {{org.apache.flink.table.api.java.batch.TableEnvironmentITCase}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5434) Remove unsupported project() transformation from Scala DataStream docs
[ https://issues.apache.org/jira/browse/FLINK-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822045#comment-15822045 ] GaoLun commented on FLINK-5434: --- Just removed it from the doc with the following PR. Should we create another JIRA to support this operator in the Scala DataStream API? > Remove unsupported project() transformation from Scala DataStream docs > -- > > Key: FLINK-5434 > URL: https://issues.apache.org/jira/browse/FLINK-5434 > Project: Flink > Issue Type: Bug > Components: Documentation >Reporter: Vasia Kalavri > > The Scala DataStream does not have a project() transformation, yet the docs > include it as a supported operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4263) SQL's VALUES does not work properly
[ https://issues.apache.org/jira/browse/FLINK-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395370#comment-15395370 ] GaoLun commented on FLINK-4263: --- [~jark], [~twalthr] I debugged this issue and found that when using {{getExecutionEnvironment}} (default: {{LocalEnvironment}}) this exception is thrown, while with {{CollectionEnvironment}} it works correctly. The reason is that in the implementation of {{DataSetValues}}, the row serializer is not written into {{ValuesInputFormat}}. And in {{InstantiationUtil}}: {code} public static byte[] serializeObject(Object o) throws IOException { try (ByteArrayOutputStream baos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(baos)) { oos.writeObject(o); return baos.toByteArray(); } } {code} {{oos}} can't find a serializer for the row and throws the exception. But {{CollectionEnvironment}} creates a serializer for the row type ({{GenericDataSourceBase.225}}): {code} InputSplit[] splits = inputFormat.createInputSplits(1); TypeSerializer serializer = getOperatorInfo().getOutputType().createSerializer(executionConfig); {code} We can create the row serializer in {{DataSetValues}} and override {{writeObject}} in {{ValuesInputFormat}} to fix this bug. 
> SQL's VALUES does not work properly > --- > > Key: FLINK-4263 > URL: https://issues.apache.org/jira/browse/FLINK-4263 > Project: Flink > Issue Type: Bug > Components: Table API & SQL >Affects Versions: 1.1.0 >Reporter: Timo Walther >Assignee: Jark Wu > > Executing the following SQL leads to very strange output: > {code} > SELECT * > FROM( > VALUES > (1, 2), > (3, 4) > ) AS q (col1, col2)" > {code} > {code} > org.apache.flink.optimizer.CompilerException: Error translating node 'Data > Source "at translateToPlan(DataSetValues.scala:88) > (org.apache.flink.api.table.runtime.ValuesInputFormat)" : NONE [[ > GlobalProperties [partitioning=RANDOM_PARTITIONED] ]] [[ LocalProperties > [ordering=null, grouped=null, unique=null] ]]': Could not write the user code > wrapper class > org.apache.flink.api.common.operators.util.UserCodeObjectWrapper : > java.io.NotSerializableException: org.apache.flink.api.table.Row > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:381) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:106) > at > org.apache.flink.optimizer.plan.SourcePlanNode.accept(SourcePlanNode.java:86) > at > org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199) > at > org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199) > at > org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:128) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:192) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:170) > at > org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896) > at > org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:637) > at 
org.apache.flink.api.scala.DataSet.collect(DataSet.scala:547) > at > org.apache.flink.api.scala.batch.sql.SortITCase.testOrderByMultipleFieldsWithSql(SortITCase.scala:56) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > Caused by: > org.apache.flink.runtime.operators.util.CorruptConfigurationException: Could > not write the user code wrapper class > org.apache.flink.api.common.operators.util.UserCodeObjectWrapper : > java.io.NotSerializableException: org.apache.flink.api.table.Row > at > org.apache.flink.runtime.operators.util.TaskConfig.setStubWrapper(TaskConfig.java:279) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.createDataSourceVertex(JobGraphGenerator.java:888) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:281) > ... 51 more > Caused by: java.io.NotSerializableException: org.apache.flink.api.table.Row > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at java.io.O
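The fix proposed in the comment above can be illustrated with plain JDK serialization. This is a hypothetical, self-contained sketch ({{FakeRow}} and {{FakeValuesInputFormat}} are stand-ins, not Flink's actual classes): the non-serializable payload is held in a transient field and encoded by hand in an overridden {{writeObject}}, which is the role the row serializer would play inside {{ValuesInputFormat}}:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stands in for org.apache.flink.api.table.Row: deliberately NOT Serializable.
class FakeRow {
    String[] fields;
    FakeRow(String... fields) { this.fields = fields; }
}

// A wrapper that survives Java serialization even though its payload type is
// not Serializable, by overriding writeObject/readObject to encode the payload
// manually instead of letting ObjectOutputStream walk the object graph.
class FakeValuesInputFormat implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: default serialization must not try to write the payload itself
    private transient FakeRow row;

    FakeValuesInputFormat(FakeRow row) { this.row = row; }

    FakeRow getRow() { return row; }

    // Manually encode the payload, playing the role of the row serializer.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(row.fields.length);
        for (String f : row.fields) {
            out.writeUTF(f);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        String[] fields = new String[in.readInt()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = in.readUTF();
        }
        this.row = new FakeRow(fields);
    }
}

public class ManualSerializationSketch {

    static FakeValuesInputFormat roundTrip(FakeValuesInputFormat format)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            // with a non-transient FakeRow field and no writeObject override,
            // this call would throw java.io.NotSerializableException
            oos.writeObject(format);
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
            return (FakeValuesInputFormat) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeValuesInputFormat copy = roundTrip(new FakeValuesInputFormat(new FakeRow("1", "2")));
        System.out.println(copy.getRow().fields.length); // prints 2
    }
}
```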
[jira] [Commented] (FLINK-4242) Improve validation exception messages
[ https://issues.apache.org/jira/browse/FLINK-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390749#comment-15390749 ] GaoLun commented on FLINK-4242: --- Yes, currently some of the Table API's exception messages aren't readable and clear for users. I will open a PR for this issue. > Improve validation exception messages > - > > Key: FLINK-4242 > URL: https://issues.apache.org/jira/browse/FLINK-4242 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Priority: Minor > > The Table API's validation exceptions could be improved to be more meaningful > for users. For example, the following code snippet: > {code} > Table inputTable = tableEnv.fromDataStream(env.fromElements( > Tuple3.of(1, "a", 1.0), > Tuple3.of(2, "b", 2.0), > Tuple3.of(3, "c", 3.0)), "a, b, c"); > inputTable.select("a").where("!a"); > {code} > fails correctly. However, the validation exception message says "Expression > !('a) failed on input check: Not only accepts child of Boolean Type, get > Integer". I think it could be changed such that it says: "The not operator > requires a boolean input but "a" is of type integer." or something similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-3940) Add support for ORDER BY OFFSET FETCH
[ https://issues.apache.org/jira/browse/FLINK-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun reassigned FLINK-3940: - Assignee: GaoLun > Add support for ORDER BY OFFSET FETCH > - > > Key: FLINK-3940 > URL: https://issues.apache.org/jira/browse/FLINK-3940 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 1.1.0 >Reporter: Fabian Hueske >Assignee: GaoLun >Priority: Minor > > Currently only ORDER BY without OFFSET and FETCH are supported. > This issue tracks the effort to add support for OFFSET and FETCH and involves: > - Implementing the execution strategy in `DataSetSort` > - adapting the `DataSetSortRule` to support OFFSET and FETCH > - extending the Table API and validation to support OFFSET and FETCH and > generate a corresponding RelNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3940) Add support for ORDER BY OFFSET FETCH
[ https://issues.apache.org/jira/browse/FLINK-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388752#comment-15388752 ] GaoLun commented on FLINK-3940: --- Hello, I would like to work on this issue. :) > Add support for ORDER BY OFFSET FETCH > - > > Key: FLINK-3940 > URL: https://issues.apache.org/jira/browse/FLINK-3940 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 1.1.0 >Reporter: Fabian Hueske >Priority: Minor > > Currently only ORDER BY without OFFSET and FETCH are supported. > This issue tracks the effort to add support for OFFSET and FETCH and involves: > - Implementing the execution strategy in `DataSetSort` > - adapting the `DataSetSortRule` to support OFFSET and FETCH > - extending the Table API and validation to support OFFSET and FETCH and > generate a corresponding RelNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-2985) Allow different field names for unionAll() in Table API
[ https://issues.apache.org/jira/browse/FLINK-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317673#comment-15317673 ] GaoLun edited comment on FLINK-2985 at 6/7/16 2:12 AM: --- The refactoring work has been finished. If nobody is working on this, I will go on with it. :) was (Author: gallenvara_bg): The refactoring work has been finished and i will go on with this issue. :) > Allow different field names for unionAll() in Table API > --- > > Key: FLINK-2985 > URL: https://issues.apache.org/jira/browse/FLINK-2985 > Project: Flink > Issue Type: Improvement > Components: Table API >Reporter: Timo Walther >Priority: Minor > > The recently merged `unionAll` operator checks if the field names of the left > and right side are equal. Actually, this is not necessary. The union operator > in SQL checks only the types and uses the names of left side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2985) Allow different field names for unionAll() in Table API
[ https://issues.apache.org/jira/browse/FLINK-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317673#comment-15317673 ] GaoLun commented on FLINK-2985: --- The refactoring work has been finished and I will go on with this issue. :) > Allow different field names for unionAll() in Table API > --- > > Key: FLINK-2985 > URL: https://issues.apache.org/jira/browse/FLINK-2985 > Project: Flink > Issue Type: Improvement > Components: Table API >Reporter: Timo Walther >Priority: Minor > > The recently merged `unionAll` operator checks if the field names of the left > and right side are equal. Actually, this is not necessary. The union operator > in SQL checks only the types and uses the names of left side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3971) Aggregates handle null values incorrectly.
[ https://issues.apache.org/jira/browse/FLINK-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301660#comment-15301660 ] GaoLun commented on FLINK-3971: --- [~fhueske] If no one is working on this issue, I would like to have a try. > Aggregates handle null values incorrectly. > -- > > Key: FLINK-3971 > URL: https://issues.apache.org/jira/browse/FLINK-3971 > Project: Flink > Issue Type: Bug > Components: Table API >Affects Versions: 1.1.0 >Reporter: Fabian Hueske >Priority: Critical > Fix For: 1.1.0 > > > Table API and SQL aggregates are supposed to ignore null values, e.g., > {{sum(1,2,null,4)}} is supposed to return {{7}}. > The current implementation is correct if at least one valid value is > present; however, it is incorrect if only null values are aggregated. {{sum(null, > null, null)}} should return {{null}} instead of {{0}}. > Currently only the Count aggregate handles the case of null-values-only > correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
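The null semantics described in the issue can be sketched with a small, hypothetical helper (not Flink's actual aggregate code): null inputs are skipped, but if every input is null the aggregate itself is null rather than 0.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of SQL-style null handling for SUM: nulls are ignored,
// and an all-null input yields null instead of 0.
public class NullAwareSum {

    static Integer sum(List<Integer> values) {
        Integer acc = null; // stays null until the first non-null value arrives
        for (Integer v : values) {
            if (v == null) {
                continue; // ignore null values, as SQL aggregates do
            }
            acc = (acc == null) ? v : acc + v;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, null, 4))); // prints 7
        System.out.println(sum(Arrays.asList((Integer) null, null, null))); // prints null
    }
}
```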
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281600#comment-15281600 ] GaoLun commented on FLINK-3879: --- [~greghogan][~vkalavri] Hi, I have done some work and FLINK-2044 supports a convergence threshold now. :) > Native implementation of HITS algorithm > --- > > Key: FLINK-3879 > URL: https://issues.apache.org/jira/browse/FLINK-3879 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is > presented in [0] and described in [1]. > "[HITS] is a very popular and effective algorithm to rank documents based on > the link information among a set of documents. The algorithm presumes that a > good hub is a document that points to many others, and a good authority is a > document that many documents point to." > [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf] > This implementation differs from FLINK-2044 by providing for convergence, > outputting both hub and authority scores, and completing in half the number > of iterations. > [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf > [1] https://en.wikipedia.org/wiki/HITS_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277514#comment-15277514 ] GaoLun commented on FLINK-3879: --- The scatter-gather model divides the original iteration into two parts. For performance, 3879 is better. Maybe it's good to keep both, with different paths: one for performance, the other for consistency with the other algorithm implementations that use the same iteration model. :) > Native implementation of HITS algorithm > --- > > Key: FLINK-3879 > URL: https://issues.apache.org/jira/browse/FLINK-3879 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is > presented in [0] and described in [1]. > "[HITS] is a very popular and effective algorithm to rank documents based on > the link information among a set of documents. The algorithm presumes that a > good hub is a document that points to many others, and a good authority is a > document that many documents point to." > [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf] > This implementation differs from FLINK-2044 by providing for convergence, > outputting both hub and authority scores, and completing in half the number > of iterations. > [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf > [1] https://en.wikipedia.org/wiki/HITS_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2184) Cannot get last element with maxBy/minBy
[ https://issues.apache.org/jira/browse/FLINK-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276262#comment-15276262 ] GaoLun commented on FLINK-2184: --- I will go on with this issue. :) > Cannot get last element with maxBy/minBy > > > Key: FLINK-2184 > URL: https://issues.apache.org/jira/browse/FLINK-2184 > Project: Flink > Issue Type: Improvement > Components: Scala API, Streaming >Reporter: Gábor Hermann >Priority: Minor > > In the streaming Scala API there is no method > {{maxBy(int positionToMaxBy, boolean first)}} > nor > {{minBy(int positionToMinBy, boolean first)}} > like in the Java API, where _first_ set to {{true}} indicates that the latest > found element will return. > These methods should be added to the Scala API too, in order to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275265#comment-15275265 ] GaoLun commented on FLINK-3879: --- Yes, that makes sense. > Native implementation of HITS algorithm > --- > > Key: FLINK-3879 > URL: https://issues.apache.org/jira/browse/FLINK-3879 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is > presented in [0] and described in [1]. > "[HITS] is a very popular and effective algorithm to rank documents based on > the link information among a set of documents. The algorithm presumes that a > good hub is a document that points to many others, and a good authority is a > document that many documents point to." > [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf] > This implementation differs from FLINK-2044 by providing for convergence, > outputting both hub and authority scores, and completing in half the number > of iterations. > [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf > [1] https://en.wikipedia.org/wiki/HITS_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275242#comment-15275242 ] GaoLun commented on FLINK-3879: --- Hub values are the same, but the authority values differ slightly. > Native implementation of HITS algorithm > --- > > Key: FLINK-3879 > URL: https://issues.apache.org/jira/browse/FLINK-3879 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is > presented in [0] and described in [1]. > "[HITS] is a very popular and effective algorithm to rank documents based on > the link information among a set of documents. The algorithm presumes that a > good hub is a document that points to many others, and a good authority is a > document that many documents point to." > [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf] > This implementation differs from FLINK-2044 by providing for convergence, > outputting both hub and authority scores, and completing in half the number > of iterations. > [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf > [1] https://en.wikipedia.org/wiki/HITS_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275240#comment-15275240 ] GaoLun commented on FLINK-3879: --- My: (1,0.847998304005088,0.0), (2,0.5299989400031799,0.5144957554275266), (3,0.0,0.8574929257125442) Yours: (1,0.8479983040050879,0.0), (2,0.5299989400031799,0.5240974256643347), (3,0.0,0.8516583167045438) > Native implementation of HITS algorithm > --- > > Key: FLINK-3879 > URL: https://issues.apache.org/jira/browse/FLINK-3879 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is > presented in [0] and described in [1]. > "[HITS] is a very popular and effective algorithm to rank documents based on > the link information among a set of documents. The algorithm presumes that a > good hub is a document that points to many others, and a good authority is a > document that many documents point to." > [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf] > This implementation differs from FLINK-2044 by providing for convergence, > outputting both hub and authority scores, and completing in half the number > of iterations. > [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf > [1] https://en.wikipedia.org/wiki/HITS_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275160#comment-15275160 ] GaoLun commented on FLINK-3879: --- Hi [~greghogan], the PR of FLINK-2044 has been updated and supports returning both values now. I have also changed the normalization method from sum to square sum. I wrote a simple test for your implementation to compare its result with mine, and the results are different. For a simple graph {{1->2, 1->3, 2->3}}, the result after one iteration is: Mine: {{(1,0.8320502943378436,0.0), (2,0.554700196225229,0.4472135954999579), (3,0.0,0.894427190159)}} Yours: {{(1,0.8320502943378437,0.0), (2,0.5547001962252291,0.5144957554275265), (3,0.0,0.8574929257125441)}} We can calculate the hub/authority values manually; the result should be {{(1, sqrt(9/13), 0.0), (2, sqrt(4/13), 1/sqrt(5)), (3, 0.0, 2/sqrt(5))}}, which is a little different from yours. > Native implementation of HITS algorithm > --- > > Key: FLINK-3879 > URL: https://issues.apache.org/jira/browse/FLINK-3879 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is > presented in [0] and described in [1]. > "[HITS] is a very popular and effective algorithm to rank documents based on > the link information among a set of documents. The algorithm presumes that a > good hub is a document that points to many others, and a good authority is a > document that many documents point to." > [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf] > This implementation differs from FLINK-2044 by providing for convergence, > outputting both hub and authority scores, and completing in half the number > of iterations. 
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf > [1] https://en.wikipedia.org/wiki/HITS_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
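The hand calculation in the comment above can be checked mechanically. This self-contained sketch (not the Gelly implementation) runs one HITS iteration on the graph {{1->2, 1->3, 2->3}} with square-sum (L2) normalization, starting from all-ones scores, and reproduces the quoted values hub(1)=sqrt(9/13) and auth(3)=2/sqrt(5):

```java
// One HITS iteration on the three-node graph from the comment (1->2, 1->3, 2->3),
// with scores normalized by the square root of the sum of squares (L2 norm).
public class HitsOneIteration {

    // edges[i] = {source, target}, vertex ids 1..3
    static final int[][] EDGES = {{1, 2}, {1, 3}, {2, 3}};
    static final int N = 3;

    /** Returns {hub[], auth[]} after one update plus L2 normalization, starting from all-ones. */
    static double[][] iterate() {
        double[] hub = {1, 1, 1};
        double[] auth = new double[N];
        // authority update: auth(v) = sum of hub(u) over edges u->v
        for (int[] e : EDGES) auth[e[1] - 1] += hub[e[0] - 1];
        // hub update: hub(u) = sum of the new auth(v) over edges u->v
        double[] newHub = new double[N];
        for (int[] e : EDGES) newHub[e[0] - 1] += auth[e[1] - 1];
        normalize(newHub);
        normalize(auth);
        return new double[][] {newHub, auth};
    }

    static void normalize(double[] v) {
        double sumSq = 0;
        for (double x : v) sumSq += x * x;
        double norm = Math.sqrt(sumSq);
        for (int i = 0; i < v.length; i++) v[i] /= norm;
    }

    public static void main(String[] args) {
        double[][] scores = iterate();
        for (int i = 0; i < N; i++) {
            System.out.printf("(%d,%f,%f)%n", i + 1, scores[0][i], scores[1][i]);
        }
    }
}
```

The printed hub scores are 3/sqrt(13) and 2/sqrt(13) for vertices 1 and 2, and the authority scores are 1/sqrt(5) and 2/sqrt(5) for vertices 2 and 3, matching the "Mine" values in the comment.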
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265783#comment-15265783 ] GaoLun commented on FLINK-2044: --- I will go on with this issue. :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2220) Join on Pojo without hashCode() silently fails
[ https://issues.apache.org/jira/browse/FLINK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259367#comment-15259367 ] GaoLun commented on FLINK-2220: --- I wrote a test for GenericType and the problem still arose. The small fix PR adds a log warning if a POJO is used as a key but does not override the two methods. > Join on Pojo without hashCode() silently fails > -- > > Key: FLINK-2220 > URL: https://issues.apache.org/jira/browse/FLINK-2220 > Project: Flink > Issue Type: Bug >Affects Versions: 0.9, 0.8.1 >Reporter: Marcus Leich > > I need to perform a join using a complete Pojo as join key. > With DOP > 1 this only works if the Pojo comes with a meaningful hashCode() > implementation, as otherwise equal objects will get hashed to different > partitions based on their memory address and not on the content. > I guess it's fine if users are required to implement hashCode() themselves, > but it would be nice if documentation or, better yet, Flink itself could alert > users that this is a requirement, similar to how Comparable is required for > keys. 
> Use the following code to reproduce the issue: > public class Pojo implements Comparable<Pojo> { > public byte[] data; > public Pojo () { > } > public Pojo (byte[] data) { > this.data = data; > } > @Override > public int compareTo(Pojo o) { > return UnsignedBytes.lexicographicalComparator().compare(data, > o.data); > } > // uncomment me for making the join work > /* @Override > public int hashCode() { > return Arrays.hashCode(data); > }*/ > } > public void testJoin () throws Exception { > final ExecutionEnvironment env = > ExecutionEnvironment.createLocalEnvironment(); > env.setParallelism(4); > DataSet<Tuple2<Pojo, String>> left = env.fromElements( > new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "black"), > new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), "red"), > new Tuple2<>(new Pojo(new byte[] {1}), "Spark"), > new Tuple2<>(new Pojo(new byte[] {2}), "good"), > new Tuple2<>(new Pojo(new byte[] {5}), "bug")); > DataSet<Tuple2<Pojo, String>> right = env.fromElements( > new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "white"), > new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), > "green"), > new Tuple2<>(new Pojo(new byte[] {1}), "Flink"), > new Tuple2<>(new Pojo(new byte[] {2}), "evil"), > new Tuple2<>(new Pojo(new byte[] {5}), "fix")); > // will not print anything unless Pojo has a real hashCode() > implementation > > left.join(right).where(0).equalTo(0).projectFirst(1).projectSecond(1).print(); > } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
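The root cause can be shown without Flink at all: two byte arrays with equal content get different default (identity) hash codes, so hash partitioning sends equal keys to different partitions, while a content-based hash via {{Arrays.hashCode}} keeps them together. A minimal demonstration:

```java
import java.util.Arrays;

// Identity hashing vs. content hashing for byte[] join keys.
public class ByteArrayHashDemo {

    static int identityStyleHash(byte[] data) {
        return data.hashCode(); // Object.hashCode(): based on identity, not content
    }

    static int contentHash(byte[] data) {
        return Arrays.hashCode(data); // equal content implies equal hash
    }

    public static void main(String[] args) {
        byte[] a = {0, 24, 23, 1, 3};
        byte[] b = {0, 24, 23, 1, 3};
        // distinct objects: identity hashes typically differ even for equal content
        System.out.println(identityStyleHash(a) == identityStyleHash(b));
        System.out.println(contentHash(a) == contentHash(b)); // prints true
    }
}
```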
[jira] [Created] (FLINK-3783) Support weighted random sampling with reservoir
GaoLun created FLINK-3783: - Summary: Support weighted random sampling with reservoir Key: FLINK-3783 URL: https://issues.apache.org/jira/browse/FLINK-3783 Project: Flink Issue Type: Improvement Components: Core Reporter: GaoLun Assignee: GaoLun Priority: Minor In default random sampling, all items have the same probability to be selected. But in weighted random sampling, the probability of each item to be selected is determined by its weight with respect to the weights of the other items. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
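One common way to implement this is the Efraimidis-Spirakis "A-Res" scheme; this is an assumption, since the ticket does not name an algorithm. Each item draws a key u^(1/w) for uniform u in (0,1), and the k items with the largest keys form the sample, so heavier items are more likely to be selected. A plain-JDK sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Weighted reservoir sampling, A-Res style: keep the k items with the largest
// keys u^(1/w); a min-heap on the key makes eviction of the weakest candidate cheap.
public class WeightedReservoirSampling {

    static final class Keyed<T> {
        final double key;
        final T value;
        Keyed(double key, T value) { this.key = key; this.value = value; }
    }

    static <T> List<T> sample(List<T> items, double[] weights, int k, Random rnd) {
        PriorityQueue<Keyed<T>> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Keyed<T> e) -> e.key));
        for (int i = 0; i < items.size(); i++) {
            // heavier items tend to draw keys closer to 1
            double key = Math.pow(rnd.nextDouble(), 1.0 / weights[i]);
            if (heap.size() < k) {
                heap.add(new Keyed<>(key, items.get(i)));
            } else if (key > heap.peek().key) {
                heap.poll(); // evict the current smallest key
                heap.add(new Keyed<>(key, items.get(i)));
            }
        }
        List<T> result = new ArrayList<>();
        for (Keyed<T> e : heap) {
            result.add(e.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e");
        double[] weights = {1, 1, 1, 1, 50};
        // "e" carries most of the weight, so it usually appears in the sample
        System.out.println(sample(items, weights, 2, new Random()));
    }
}
```

With uniform weights this degenerates to ordinary (unweighted) reservoir sampling, which matches the ticket's description of the default behavior.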
[jira] [Commented] (FLINK-3729) Several SQL tests fail on Windows OS
[ https://issues.apache.org/jira/browse/FLINK-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237523#comment-15237523 ] GaoLun commented on FLINK-3729: --- The problem is that the {{ast}} uses \r\n in a Windows environment. The {{ast}} is created by {{RelOptUtil.toString(relNode)}}, which uses different line-endings depending on the OS. We can solve this issue as you mentioned: 1: modify the ast {code} val ast = RelOptUtil.toString(relNode).replaceAll("\r\n", "\n") {code} 2: for every test file, add another one with a Windows version. > Several SQL tests fail on Windows OS > > > Key: FLINK-3729 > URL: https://issues.apache.org/jira/browse/FLINK-3729 > Project: Flink > Issue Type: Bug > Components: Table API >Affects Versions: 1.0.1 >Reporter: Chesnay Schepler > > The Table API SqlExplain(Test/ITCase) fail categorically on Windows due to > different line-endings. These tests generate a string representation of an > abstract syntax tree; the problem is there is a difference in line-endings. > The expected ones contain LF, the actual ones CRLF. > The tests should be changed to either > * include CRLF line-endings in the expected string when run on windows > * always use LF line-endings regardless of OS > * use a compare method that is aware of this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
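The first option can be sketched in plain JDK code (in the real fix the input string would come from {{RelOptUtil.toString(relNode)}}): normalizing CRLF to LF once lets the expected test fixtures always use \n regardless of the OS the tests run on.

```java
// Normalize Windows line endings so plan strings compare equal across OSes.
public class LineEndingNormalization {

    static String normalize(String ast) {
        return ast.replaceAll("\r\n", "\n"); // CRLF (Windows) -> LF
    }

    public static void main(String[] args) {
        // hypothetical plan text, as it might look when produced on Windows
        String windowsAst = "LogicalProject(a=[$0])\r\n  LogicalTableScan\r\n";
        System.out.println(normalize(windowsAst).contains("\r")); // prints false
    }
}
```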
[jira] [Updated] (FLINK-3192) Add explain support to print ast and sql physical execution plan.
[ https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun updated FLINK-3192: -- Description: Table API doesn't support sql-explanation now. Add the explain support to print ast (abstract syntax tree) and the physical execution plan of sql. (was: Table API doesn't support sql-explanation now. Add the explain support to print ast (abstract syntax tree) and the physical execution of sql.) > Add explain support to print ast and sql physical execution plan. > -- > > Key: FLINK-3192 > URL: https://issues.apache.org/jira/browse/FLINK-3192 > Project: Flink > Issue Type: New Feature > Components: Table API >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: features > > Table API doesn't support sql-explanation now. Add the explain support to > print ast (abstract syntax tree) and the physical execution plan of sql. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3192) Add explain support to print ast and sql physical execution plan.
[ https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun updated FLINK-3192: -- Summary: Add explain support to print ast and sql physical execution plan. (was: Add explain support to print ast and sql physical execution. ) > Add explain support to print ast and sql physical execution plan. > -- > > Key: FLINK-3192 > URL: https://issues.apache.org/jira/browse/FLINK-3192 > Project: Flink > Issue Type: New Feature > Components: Table API >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: features > > Table API doesn't support sql-explanation now. Add the explain support to > print ast (abstract syntax tree) and the physical execution of sql. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3192) Add explain support to print ast and sql physical execution.
GaoLun created FLINK-3192: - Summary: Add explain support to print ast and sql physical execution. Key: FLINK-3192 URL: https://issues.apache.org/jira/browse/FLINK-3192 Project: Flink Issue Type: New Feature Components: Table API Reporter: GaoLun Assignee: GaoLun Priority: Minor Table API doesn't support sql-explanation now. Add the explain support to print ast (abstract syntax tree) and the physical execution of sql. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-2988) Cannot load DataSet[Row] from CSV file
[ https://issues.apache.org/jira/browse/FLINK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023998#comment-15023998 ] GaoLun edited comment on FLINK-2988 at 11/25/15 1:45 AM: - Hi [~jkovacs], is this problem solved? IMO, readCsvFile can only support the *tuple* and *POJO* classes, which don't support nullable fields. While reading the csv file, env creates the tupleSerializer. RowSerializer can create the appropriate serializer according to the field type. Row can also support null values, which tuple can't. was (Author: gallenvara_bg): Hi, [~jkovacs] . Is this problem solved? IMO, *readCsvFile* can only support the *tuple* and *POJO* class which wouldn't support nullable fields. While reading the csv file, *env* created the *tupleSerializer*. *RowSerializer* can create certain *serializer* according to the field type. *Row* can also support null value which *tuple* can't. > Cannot load DataSet[Row] from CSV file > -- > > Key: FLINK-2988 > URL: https://issues.apache.org/jira/browse/FLINK-2988 > Project: Flink > Issue Type: Improvement > Components: DataSet API, Table API >Affects Versions: 0.10.0 >Reporter: Johann Kovacs >Priority: Minor > > Tuple classes (Java/Scala both) only have arity up to 25, meaning I cannot > load a CSV file with more than 25 columns directly as a > DataSet\[TupleX\[...\]\]. > An alternative to using Tuples is using the Table API's Row class, which > allows for arbitrary-length, arbitrary-type, runtime-supplied schemata (using > RowTypeInfo) and index-based access. 
> However, trying to load a CSV file as a DataSet\[Row\] yields an exception: > {code} > val env = ExecutionEnvironment.createLocalEnvironment() > val filePath = "../someCsv.csv" > val typeInfo = new RowTypeInfo(Seq(BasicTypeInfo.STRING_TYPE_INFO, > BasicTypeInfo.INT_TYPE_INFO), Seq("word", "number")) > val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo) > println(source.collect()) > {code} > with someCsv.csv containing: > {code} > one,1 > two,2 > {code} > yields > {code} > Exception in thread "main" java.lang.ClassCastException: > org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to > org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase > at > org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46) > at > org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282) > {code} > As a user I would like to be able to load a CSV file into a DataSet\[Row\], > preferably having a convenience method to specify the schema (RowTypeInfo), > without having to use the "explicit implicit parameters" syntax and > specifying the ClassTag. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2988) Cannot load DataSet[Row] from CSV file
[ https://issues.apache.org/jira/browse/FLINK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023998#comment-15023998 ] GaoLun commented on FLINK-2988: --- Hi [~jkovacs], has this problem been solved? IMO, *readCsvFile* only supports the *tuple* and *POJO* classes, which do not support nullable fields. While reading the CSV file, *env* creates the *tupleSerializer*. *RowSerializer* can create the appropriate *serializer* for each field type, and *Row* also supports null values, which *tuple* cannot. > Cannot load DataSet[Row] from CSV file > -- > > Key: FLINK-2988 > URL: https://issues.apache.org/jira/browse/FLINK-2988 > Project: Flink > Issue Type: Improvement > Components: DataSet API, Table API >Affects Versions: 0.10.0 >Reporter: Johann Kovacs >Priority: Minor > > Tuple classes (Java/Scala both) only have arity up to 25, meaning I cannot > load a CSV file with more than 25 columns directly as a > DataSet\[TupleX\[...\]\]. > An alternative to using Tuples is using the Table API's Row class, which > allows for arbitrary-length, arbitrary-type, runtime-supplied schemata (using > RowTypeInfo) and index-based access. 
> However, trying to load a CSV file as a DataSet\[Row\] yields an exception: > {code} > val env = ExecutionEnvironment.createLocalEnvironment() > val filePath = "../someCsv.csv" > val typeInfo = new RowTypeInfo(Seq(BasicTypeInfo.STRING_TYPE_INFO, > BasicTypeInfo.INT_TYPE_INFO), Seq("word", "number")) > val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo) > println(source.collect()) > {code} > with someCsv.csv containing: > {code} > one,1 > two,2 > {code} > yields > {code} > Exception in thread "main" java.lang.ClassCastException: > org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to > org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase > at > org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46) > at > org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282) > {code} > As a user I would like to be able to load a CSV file into a DataSet\[Row\], > preferably having a convenience method to specify the schema (RowTypeInfo), > without having to use the "explicit implicit parameters" syntax and > specifying the ClassTag. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
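The nullability distinction above (Row tolerating null fields where tuple serializers do not) can be illustrated with a minimal, stdlib-only sketch. The NullableRowCodec class and its null-mask layout are illustrative assumptions, not Flink's actual RowSerializer wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of a row codec that tolerates null fields by writing a null
// mask ahead of the field data. A tuple-style codec that serializes
// every field unconditionally has no way to represent an absent field.
public class NullableRowCodec {

    public static byte[] write(String[] row) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(row.length);
            for (String field : row) {
                out.writeBoolean(field == null);   // null mask, one flag per field
            }
            for (String field : row) {
                if (field != null) {
                    out.writeUTF(field);           // only non-null payloads are written
                }
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static String[] read(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int n = in.readInt();
            boolean[] isNull = new boolean[n];
            for (int i = 0; i < n; i++) {
                isNull[i] = in.readBoolean();
            }
            String[] row = new String[n];
            for (int i = 0; i < n; i++) {
                if (!isNull[i]) {
                    row[i] = in.readUTF();
                }
            }
            return row;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing one boolean per field ahead of the payload is the common trick in row-oriented serializers; a fixed-arity tuple serializer that writes each field unconditionally would simply fail on the null entry.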
[jira] [Commented] (FLINK-2884) Apply JMH on HashVsSortMiniBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980348#comment-14980348 ] GaoLun commented on FLINK-2884: --- This class has some unknown faults when run. It would be better to do some checking before refactoring it with JMH. I will reopen this issue in a few days. > Apply JMH on HashVsSortMiniBenchmark class. > --- > > Key: FLINK-2884 > URL: https://issues.apache.org/jira/browse/FLINK-2884 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > HashVsSortMiniBenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2884) Apply JMH on HashVsSortMiniBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun closed FLINK-2884. - Resolution: Later > Apply JMH on HashVsSortMiniBenchmark class. > --- > > Key: FLINK-2884 > URL: https://issues.apache.org/jira/browse/FLINK-2884 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > HashVsSortMiniBenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2892) Apply JMH on MemorySegmentSpeedBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun closed FLINK-2892. - Resolution: Not A Problem I think keeping the original benchmark method in the MemorySegmentSpeedBenchmark class is a good choice. Its output is much clearer and easier to understand. > Apply JMH on MemorySegmentSpeedBenchmark class. > --- > > Key: FLINK-2892 > URL: https://issues.apache.org/jira/browse/FLINK-2892 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > MemorySegmentSpeedBenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2892) Apply JMH on MemorySegmentSpeedBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977598#comment-14977598 ] GaoLun commented on FLINK-2892: --- IMO, if I modify this class with JMH, the result will be much more complex, because I would have to rewrite every test as a small benchmark method; there would be almost 96 methods! That would be too long to read and understand, so I don't think it's a good idea to modify this class with JMH. Also, there is no rule that every benchmark in flink-benchmark needs to be ported to JMH, so I will close the issue. > Apply JMH on MemorySegmentSpeedBenchmark class. > --- > > Key: FLINK-2892 > URL: https://issues.apache.org/jira/browse/FLINK-2892 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > MemorySegmentSpeedBenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2920) Apply JMH on KryoVersusAvroMinibenchmark class.
GaoLun created FLINK-2920: - Summary: Apply JMH on KryoVersusAvroMinibenchmark class. Key: FLINK-2920 URL: https://issues.apache.org/jira/browse/FLINK-2920 Project: Flink Issue Type: Sub-task Components: Tests Reporter: GaoLun Assignee: GaoLun Priority: Minor JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. Modify the KryoVersusAvroMinibenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.
GaoLun created FLINK-2919: - Summary: Apply JMH on FieldAccessMinibenchmark class. Key: FLINK-2919 URL: https://issues.apache.org/jira/browse/FLINK-2919 Project: Flink Issue Type: Sub-task Components: Tests Reporter: GaoLun Assignee: GaoLun Priority: Minor JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. Modify the FieldAccessMinibenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2892) Apply JMH on MemorySegmentSpeedBenchmark class.
GaoLun created FLINK-2892: - Summary: Apply JMH on MemorySegmentSpeedBenchmark class. Key: FLINK-2892 URL: https://issues.apache.org/jira/browse/FLINK-2892 Project: Flink Issue Type: Sub-task Components: Tests Reporter: GaoLun Assignee: GaoLun Priority: Minor JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. Modify the MemorySegmentSpeedBenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2848) Refactor Flink benchmarks with JMH and move to flink-benchmark module
[ https://issues.apache.org/jira/browse/FLINK-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968753#comment-14968753 ] GaoLun commented on FLINK-2848: --- Hi Chesnay, there are eight different benchmark classes to modify. IMO, dividing the task into eight sub-tasks can make the review work more efficient. > Refactor Flink benchmarks with JMH and move to flink-benchmark module > - > > Key: FLINK-2848 > URL: https://issues.apache.org/jira/browse/FLINK-2848 > Project: Flink > Issue Type: Test > Components: Tests >Reporter: Chengxiang Li >Assignee: Chengxiang Li >Priority: Minor > > There are many Flink internal micro benchmarks in different modules, which > are coarsely measured (by System.currentNanoTime()...) with no warmup or > multi-iteration testing. This is an umbrella JIRA to refactor these micro > benchmarks and move them to the flink-benchmark module for central management. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
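The motivation behind this refactoring, as the quoted description says, is that one-shot nanosecond timing has no warmup and no repeated iterations. Here is a minimal sketch of the warmup-then-measure pattern that JMH automates; it is plain stdlib Java, not JMH itself, and the class and method names are made up for illustration:

```java
import java.util.function.LongSupplier;

// Plain-Java sketch of the discipline JMH provides out of the box:
// run warmup iterations whose timings are discarded (so the JIT reaches
// steady state), then average several measured iterations. A single
// System.nanoTime() measurement mostly times class loading and JIT work.
public class MiniHarness {

    static long sink; // consume results so the JIT cannot eliminate the workload

    public static double averageNanos(LongSupplier benchmark,
                                      int warmupIters, int measureIters) {
        for (int i = 0; i < warmupIters; i++) {
            sink += benchmark.getAsLong();          // warmup: timing discarded
        }
        long total = 0;
        for (int i = 0; i < measureIters; i++) {
            long start = System.nanoTime();
            sink += benchmark.getAsLong();
            total += System.nanoTime() - start;
        }
        return total / (double) measureIters;       // mean time per iteration
    }

    // Toy workload standing in for a real benchmark body.
    public static long sumTo(long n) {
        long s = 0;
        for (long i = 0; i <= n; i++) {
            s += i;
        }
        return s;
    }
}
```

JMH adds much more on top of this (forked JVMs, blackholes, statistical reporting), which is why porting each benchmark class is a non-trivial sub-task.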
[jira] [Created] (FLINK-2890) Apply JMH on StringSerializationSpeedBenchmark class.
GaoLun created FLINK-2890: - Summary: Apply JMH on StringSerializationSpeedBenchmark class. Key: FLINK-2890 URL: https://issues.apache.org/jira/browse/FLINK-2890 Project: Flink Issue Type: Sub-task Components: Tests Reporter: GaoLun Assignee: GaoLun Priority: Minor JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. Modify the StringSerializationSpeedBenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class
GaoLun created FLINK-2889: - Summary: Apply JMH on LongSerializationSpeedBenchmark class Key: FLINK-2889 URL: https://issues.apache.org/jira/browse/FLINK-2889 Project: Flink Issue Type: Sub-task Components: Tests Reporter: GaoLun Assignee: GaoLun Priority: Minor JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. Modify the LongSerializationSpeedBenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2884) Apply JMH on HashVsSortMiniBenchmark class.
GaoLun created FLINK-2884: - Summary: Apply JMH on HashVsSortMiniBenchmark class. Key: FLINK-2884 URL: https://issues.apache.org/jira/browse/FLINK-2884 Project: Flink Issue Type: Sub-task Components: Tests Reporter: GaoLun Assignee: GaoLun Priority: Minor JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. Modify the HashVsSortMiniBenchmark class and move it to flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2869) Apply JMH on IOManagerPerformanceBenchmark class.
GaoLun created FLINK-2869: - Summary: Apply JMH on IOManagerPerformanceBenchmark class. Key: FLINK-2869 URL: https://issues.apache.org/jira/browse/FLINK-2869 Project: Flink Issue Type: Sub-task Components: Tests Reporter: GaoLun Assignee: GaoLun Priority: Minor JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2853) Apply JMH on MutableHashTablePerformanceBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun updated FLINK-2853: -- Summary: Apply JMH on MutableHashTablePerformanceBenchmark class. (was: Apply JMH on Flink benchmarks) > Apply JMH on MutableHashTablePerformanceBenchmark class. > > > Key: FLINK-2853 > URL: https://issues.apache.org/jira/browse/FLINK-2853 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2853) Apply JMH on Flink benchmarks
GaoLun created FLINK-2853: - Summary: Apply JMH on Flink benchmarks Key: FLINK-2853 URL: https://issues.apache.org/jira/browse/FLINK-2853 Project: Flink Issue Type: Sub-task Components: Tests Reporter: GaoLun Assignee: GaoLun Priority: Minor JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2535) Fixed size sample algorithm optimization
[ https://issues.apache.org/jira/browse/FLINK-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun updated FLINK-2535: -- Attachment: sampling.png Statistical data on the number of rejected items with SRS & SSRS. > Fixed size sample algorithm optimization > > > Key: FLINK-2535 > URL: https://issues.apache.org/jira/browse/FLINK-2535 > Project: Flink > Issue Type: Improvement > Components: Core >Reporter: Chengxiang Li >Priority: Minor > Attachments: sampling.png > > > Fixed size sample algorithm is known to be less efficient than sample > algorithms with fraction, but sometimes it's necessary. Some optimization > could significantly reduce the storage size and computation cost, such as the > algorithm described in [this > paper|http://machinelearning.wustl.edu/mlpapers/papers/icml2013_meng13a]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2535) Fixed size sample algorithm optimization
[ https://issues.apache.org/jira/browse/FLINK-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726870#comment-14726870 ] GaoLun commented on FLINK-2535: --- Hi, I have replaced the sampling algorithm with scalable simple random sampling based on the paper, and I have done some tests to compare the performance of the two sampling methods. Here are some statistics on the number of rejected items (source_size = 1000):
{code}
SampleSize      SRS        SSRS
100             9998727    9998488
100             9998755    9998493
500             9994502    9994081
500             9994624    9994029
1000            9989781    9989061
1000            9989657    9989132
5000            9956780    9956061
5000            9956812    9956018
10000           9921601    9919174
10000           9920866    9919396
50000           9685085    9682182
50000           9685203    9681611
100000          9439345    9435887
100000          9440469    9435521
500000          8001535    7998046
500000          8000370    7996807
1000000         6699752    6690612
1000000         6696175    6692111
5000000         1534814    1530409
5000000         1534088    1529784
{code}
From these statistics, we can see that SRS rejects slightly more items than SSRS, but the difference is not significant. > Fixed size sample algorithm optimization > > > Key: FLINK-2535 > URL: https://issues.apache.org/jira/browse/FLINK-2535 > Project: Flink > Issue Type: Improvement > Components: Core >Reporter: Chengxiang Li >Priority: Minor > > Fixed size sample algorithm is known to be less efficient than sample > algorithms with fraction, but sometimes it's necessary. Some optimization > could significantly reduce the storage size and computation cost, such as the > algorithm described in [this > paper|http://machinelearning.wustl.edu/mlpapers/papers/icml2013_meng13a]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
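For context on rejected-item counts in fixed-size sampling, here is a hedged sketch of classic selection sampling (Knuth's Algorithm S), deliberately not the ScaSRS algorithm from the paper: the sampler scans the source once and every item it does not keep is counted as a rejection. The SelectionSampler class is a made-up name for illustration.

```java
import java.util.Random;

// Sketch of classic selection sampling (Knuth's Algorithm S): scan a
// source of known size n and keep exactly k items. Each item is kept
// with probability needed/remaining, which guarantees exactly k
// selections; everything scanned but not kept counts as rejected.
public class SelectionSampler {

    /** Returns {selectedCount, rejectedCount}; the seed makes runs repeatable. */
    public static long[] sample(int n, int k, long seed) {
        Random rnd = new Random(seed);
        long selected = 0;
        long rejected = 0;
        int remaining = n;   // items not yet scanned
        int needed = k;      // items still to be selected
        for (int i = 0; i < n; i++) {
            // keep the current item with probability needed / remaining
            if (rnd.nextInt(remaining) < needed) {
                selected++;
                needed--;
            } else {
                rejected++;
            }
            remaining--;
        }
        return new long[] { selected, rejected };
    }
}
```

Here "rejected" simply means scanned but not kept, so it is always n - k for this algorithm; the rejection notion measured for SRS and SSRS above (items discarded by probabilistic thresholds) differs, so this sketch only illustrates the general idea of a single-pass fixed-size sampler.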
[jira] [Commented] (FLINK-2077) Rework Path class and add extend support for Windows paths
[ https://issues.apache.org/jira/browse/FLINK-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693036#comment-14693036 ] GaoLun commented on FLINK-2077: --- Hi Fabian, what do you mean by a 'path like //host/dir1/dir2'? Within dir1 or dir2 there may be several '/' characters. How do we pick out dir1 and dir2 using the slash '/'? > Rework Path class and add extend support for Windows paths > -- > > Key: FLINK-2077 > URL: https://issues.apache.org/jira/browse/FLINK-2077 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: GaoLun >Priority: Minor > Labels: starter > > The class {{org.apache.flink.core.fs.Path}} handles paths for Flink's > {{FileInputFormat}} and {{FileOutputFormat}}. Over time, this class has > become quite hard to read and modify. > It would benefit from some cleaning and refactoring. Along with the > refactoring, support for Windows paths like {{//host/dir1/dir2}} could be > added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
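As a sketch of how such a path could be handled (this is not Flink's actual Path implementation, and UncPathParser is a made-up name): the leading '//' introduces the host authority, and only the single slashes after it act as component separators, so dir1 and dir2 still split cleanly.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: parse a Windows UNC-style path like //host/dir1/dir2.
// The leading "//" marks the authority (host); the remaining single
// slashes are ordinary separators, so the directory names fall out
// of a plain split once the authority has been peeled off.
public class UncPathParser {

    public static List<String> parts(String path) {
        List<String> out = new ArrayList<>();
        String rest = path;
        if (rest.startsWith("//")) {
            rest = rest.substring(2);
            int slash = rest.indexOf('/');
            String host = slash < 0 ? rest : rest.substring(0, slash);
            out.add("//" + host);                  // authority component
            rest = slash < 0 ? "" : rest.substring(slash + 1);
        }
        for (String p : rest.split("/")) {
            if (!p.isEmpty()) out.add(p);          // skip empty segments
        }
        return out;
    }
}
```

So "//host/dir1/dir2" yields the components "//host", "dir1", "dir2": the double slash is consumed once as the authority marker and never confused with the single-slash separators that follow.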
[jira] [Commented] (FLINK-2077) Rework Path class and add extend support for Windows paths
[ https://issues.apache.org/jira/browse/FLINK-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681157#comment-14681157 ] GaoLun commented on FLINK-2077: --- If no one is working on this issue, I will take it. > Rework Path class and add extend support for Windows paths > -- > > Key: FLINK-2077 > URL: https://issues.apache.org/jira/browse/FLINK-2077 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 0.9 >Reporter: Fabian Hueske >Priority: Minor > Labels: starter > > The class {{org.apache.flink.core.fs.Path}} handles paths for Flink's > {{FileInputFormat}} and {{FileOutputFormat}}. Over time, this class has > become quite hard to read and modify. > It would benefit from some cleaning and refactoring. Along with the > refactoring, support for Windows paths like {{//host/dir1/dir2}} could be > added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)