[jira] [Commented] (FLINK-5413) Convert TableEnvironmentITCases to unit tests

2017-01-14 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822775#comment-15822775
 ] 

GaoLun commented on FLINK-5413:
---

Hi Timo, is it sufficient to just validate the correctness of the table schema 
that is created by the given conditions?

> Convert TableEnvironmentITCases to unit tests
> -
>
> Key: FLINK-5413
> URL: https://issues.apache.org/jira/browse/FLINK-5413
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: GaoLun
>
> The following IT cases could be converted into unit tests:
> - {{org.apache.flink.table.api.scala.batch.TableEnvironmentITCase}}
> - {{org.apache.flink.table.api.java.batch.TableEnvironmentITCase}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-5413) Convert TableEnvironmentITCases to unit tests

2017-01-13 Thread GaoLun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GaoLun reassigned FLINK-5413:
-

Assignee: GaoLun

> Convert TableEnvironmentITCases to unit tests
> -
>
> Key: FLINK-5413
> URL: https://issues.apache.org/jira/browse/FLINK-5413
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: GaoLun
>
> The following IT cases could be converted into unit tests:
> - {{org.apache.flink.table.api.scala.batch.TableEnvironmentITCase}}
> - {{org.apache.flink.table.api.java.batch.TableEnvironmentITCase}}





[jira] [Commented] (FLINK-5434) Remove unsupported project() transformation from Scala DataStream docs

2017-01-13 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822045#comment-15822045
 ] 

GaoLun commented on FLINK-5434:
---

I just removed it from the docs in the following PR. Should we create another 
JIRA to support this operator in the Scala DataStream API?

> Remove unsupported project() transformation from Scala DataStream docs
> --
>
> Key: FLINK-5434
> URL: https://issues.apache.org/jira/browse/FLINK-5434
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Vasia Kalavri
>
> The Scala DataStream does not have a project() transformation, yet the docs 
> include it as a supported operation.





[jira] [Commented] (FLINK-4263) SQL's VALUES does not work properly

2016-07-27 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395370#comment-15395370
 ] 

GaoLun commented on FLINK-4263:
---

[~jark], [~twalthr] I debugged this issue and found that the exception is thrown 
when using {{getExecutionEnvironment}} (default: {{LocalEnvironment}}), while 
with {{CollectionEnvironment}} it works correctly. The reason is that in the 
implementation of {{DataSetValues}}, the row serializer is not written into 
{{ValuesInputFormat}}. In {{InstantiationUtil}}:
{code}
public static byte[] serializeObject(Object o) throws IOException {
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(baos)) {
        oos.writeObject(o);
{code}
{{oos}} cannot find a serializer for the row and throws the exception. 
{{CollectionEnvironment}}, in contrast, creates a serializer for the row type 
({{GenericDataSourceBase}}, line 225):
{code}
InputSplit[] splits = inputFormat.createInputSplits(1);
TypeSerializer serializer =
    getOperatorInfo().getOutputType().createSerializer(executionConfig);
{code}
We can fix this bug by creating the row serializer in {{DataSetValues}} and 
overriding {{writeObject}} in {{ValuesInputFormat}}.
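The proposed fix follows the standard Java custom-serialization pattern. A hedged, self-contained sketch (illustrative class names, not Flink's actual classes): the non-serializable element is marked {{transient}} and written field by field in a {{writeObject}}/{{readObject}} override, the same shape as writing the row through a row serializer inside {{ValuesInputFormat}}.

```java
import java.io.*;

// Stands in for the non-serializable Row type from the stack trace.
class FakeRow {
    final int f0;
    final int f1;
    FakeRow(int f0, int f1) { this.f0 = f0; this.f1 = f1; }
}

class ValuesFormatSketch implements Serializable {
    // transient: skipped by default Java serialization, which would otherwise
    // throw NotSerializableException for FakeRow.
    private transient FakeRow row;

    ValuesFormatSketch(FakeRow row) { this.row = row; }

    FakeRow getRow() { return row; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(row.f0);   // serialize field by field instead of
        out.writeInt(row.f1);   // relying on Java serialization
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        row = new FakeRow(in.readInt(), in.readInt());
    }

    // Serialize and deserialize through byte[] to demonstrate the round trip.
    static ValuesFormatSketch roundTrip(ValuesFormatSketch v) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(v);
        }
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
            return (ValuesFormatSketch) ois.readObject();
        }
    }
}
```

Without the two private hooks, the {{oos.writeObject(v)}} call would fail exactly like the reported {{NotSerializableException}}.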

> SQL's VALUES does not work properly
> ---
>
> Key: FLINK-4263
> URL: https://issues.apache.org/jira/browse/FLINK-4263
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.1.0
>Reporter: Timo Walther
>Assignee: Jark Wu
>
> Executing the following SQL leads to very strange output:
> {code}
> SELECT *
> FROM (
>     VALUES
>         (1, 2),
>         (3, 4)
> ) AS q (col1, col2)
> {code}
> {code}
> org.apache.flink.optimizer.CompilerException: Error translating node 'Data 
> Source "at translateToPlan(DataSetValues.scala:88) 
> (org.apache.flink.api.table.runtime.ValuesInputFormat)" : NONE [[ 
> GlobalProperties [partitioning=RANDOM_PARTITIONED] ]] [[ LocalProperties 
> [ordering=null, grouped=null, unique=null] ]]': Could not write the user code 
> wrapper class 
> org.apache.flink.api.common.operators.util.UserCodeObjectWrapper : 
> java.io.NotSerializableException: org.apache.flink.api.table.Row
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:381)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:106)
>   at 
> org.apache.flink.optimizer.plan.SourcePlanNode.accept(SourcePlanNode.java:86)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
>   at 
> org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:128)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:192)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:170)
>   at 
> org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)
>   at 
> org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:637)
>   at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:547)
>   at 
> org.apache.flink.api.scala.batch.sql.SortITCase.testOrderByMultipleFieldsWithSql(SortITCase.scala:56)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
> Caused by: 
> org.apache.flink.runtime.operators.util.CorruptConfigurationException: Could 
> not write the user code wrapper class 
> org.apache.flink.api.common.operators.util.UserCodeObjectWrapper : 
> java.io.NotSerializableException: org.apache.flink.api.table.Row
>   at 
> org.apache.flink.runtime.operators.util.TaskConfig.setStubWrapper(TaskConfig.java:279)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.createDataSourceVertex(JobGraphGenerator.java:888)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:281)
>   ... 51 more
> Caused by: java.io.NotSerializableException: org.apache.flink.api.table.Row
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
>   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
>   at java.io.O

[jira] [Commented] (FLINK-4242) Improve validation exception messages

2016-07-23 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390749#comment-15390749
 ] 

GaoLun commented on FLINK-4242:
---

Yes, some of the current Table API exception messages are not clear and 
readable for users. 
I will open a PR for this issue.

> Improve validation exception messages
> -
>
> Key: FLINK-4242
> URL: https://issues.apache.org/jira/browse/FLINK-4242
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Minor
>
> The Table API's validation exceptions could be improved to be more meaningful 
> for users. For example, the following code snippet:
> {code}
> Table inputTable = tableEnv.fromDataStream(env.fromElements(
> Tuple3.of(1, "a", 1.0),
> Tuple3.of(2, "b", 2.0),
> Tuple3.of(3, "c", 3.0)), "a, b, c");
> inputTable.select("a").where("!a");
> {code}
> fails correctly. However, the validation exception message says "Expression 
> !('a) failed on input check: Not only accepts child of Boolean Type, get 
> Integer". I think it could be changed such that it says: "The not operator 
> requires a boolean input but "a" is of type integer." or something similar.





[jira] [Assigned] (FLINK-3940) Add support for ORDER BY OFFSET FETCH

2016-07-22 Thread GaoLun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GaoLun reassigned FLINK-3940:
-

Assignee: GaoLun

> Add support for ORDER BY OFFSET FETCH
> -
>
> Key: FLINK-3940
> URL: https://issues.apache.org/jira/browse/FLINK-3940
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Assignee: GaoLun
>Priority: Minor
>
> Currently only ORDER BY without OFFSET and FETCH is supported.
> This issue tracks the effort to add support for OFFSET and FETCH and involves:
> - Implementing the execution strategy in `DataSetSort`
> - adapting the `DataSetSortRule` to support OFFSET and FETCH
> - extending the Table API and validation to support OFFSET and FETCH and 
> generate a corresponding RelNode.
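For illustration, the target syntax is the standard SQL:2008 OFFSET/FETCH clause (table and column names below are made up):

```sql
SELECT user_id, amount
FROM orders
ORDER BY amount DESC
OFFSET 5 ROWS
FETCH NEXT 10 ROWS ONLY
```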





[jira] [Commented] (FLINK-3940) Add support for ORDER BY OFFSET FETCH

2016-07-21 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388752#comment-15388752
 ] 

GaoLun commented on FLINK-3940:
---

Hello, I would like to work on this issue. :)

> Add support for ORDER BY OFFSET FETCH
> -
>
> Key: FLINK-3940
> URL: https://issues.apache.org/jira/browse/FLINK-3940
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Priority: Minor
>
> Currently only ORDER BY without OFFSET and FETCH is supported.
> This issue tracks the effort to add support for OFFSET and FETCH and involves:
> - Implementing the execution strategy in `DataSetSort`
> - adapting the `DataSetSortRule` to support OFFSET and FETCH
> - extending the Table API and validation to support OFFSET and FETCH and 
> generate a corresponding RelNode.





[jira] [Comment Edited] (FLINK-2985) Allow different field names for unionAll() in Table API

2016-06-06 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317673#comment-15317673
 ] 

GaoLun edited comment on FLINK-2985 at 6/7/16 2:12 AM:
---

The refactoring work has been finished. If nobody is working on this, I will go 
on with it. :)


was (Author: gallenvara_bg):
The refactoring work has been finished and i will go on with this issue. :)

> Allow different field names for unionAll() in Table API
> ---
>
> Key: FLINK-2985
> URL: https://issues.apache.org/jira/browse/FLINK-2985
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Reporter: Timo Walther
>Priority: Minor
>
> The recently merged `unionAll` operator checks if the field names of the left 
> and right side are equal. Actually, this is not necessary: the union operator 
> in SQL checks only the types and uses the field names of the left side.





[jira] [Commented] (FLINK-2985) Allow different field names for unionAll() in Table API

2016-06-06 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317673#comment-15317673
 ] 

GaoLun commented on FLINK-2985:
---

The refactoring work has been finished and I will go on with this issue. :)

> Allow different field names for unionAll() in Table API
> ---
>
> Key: FLINK-2985
> URL: https://issues.apache.org/jira/browse/FLINK-2985
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Reporter: Timo Walther
>Priority: Minor
>
> The recently merged `unionAll` operator checks if the field names of the left 
> and right side are equal. Actually, this is not necessary: the union operator 
> in SQL checks only the types and uses the field names of the left side.





[jira] [Commented] (FLINK-3971) Aggregates handle null values incorrectly.

2016-05-26 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301660#comment-15301660
 ] 

GaoLun commented on FLINK-3971:
---

[~fhueske] If no one is working on this issue, I would like to have a try.

> Aggregates handle null values incorrectly.
> --
>
> Key: FLINK-3971
> URL: https://issues.apache.org/jira/browse/FLINK-3971
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Priority: Critical
> Fix For: 1.1.0
>
>
> Table API and SQL aggregates are supposed to ignore null values, e.g., 
> {{sum(1,2,null,4)}} is supposed to return {{7}}. 
> The current implementation is correct if at least one valid value is 
> present; however, it is incorrect if only null values are aggregated: 
> {{sum(null, null, null)}} should return {{null}} instead of {{0}}.
> Currently only the Count aggregate handles the case of null-values-only 
> correctly.





[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-12 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281600#comment-15281600
 ] 

GaoLun commented on FLINK-3879:
---

[~greghogan] [~vkalavri] Hi, I have done some work, and FLINK-2044 supports a 
convergence threshold now. :) 

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm





[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-09 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277514#comment-15277514
 ] 

GaoLun commented on FLINK-3879:
---

The scatter-gather model divides the original iteration into two parts, so for 
performance, FLINK-3879 is better. Maybe it's good to keep both, on different 
paths: one for performance, the other to stay consistent with the other 
algorithm implementations that use the same iteration model. :)

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm





[jira] [Commented] (FLINK-2184) Cannot get last element with maxBy/minBy

2016-05-09 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276262#comment-15276262
 ] 

GaoLun commented on FLINK-2184:
---

I will go on with this issue. :)

> Cannot get last element with maxBy/minBy
> 
>
> Key: FLINK-2184
> URL: https://issues.apache.org/jira/browse/FLINK-2184
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala API, Streaming
>Reporter: Gábor Hermann
>Priority: Minor
>
> In the streaming Scala API there is no method
> {{maxBy(int positionToMaxBy, boolean first)}}
> nor
> {{minBy(int positionToMinBy, boolean first)}}
> like in the Java API, where _first_ set to {{true}} indicates that the latest 
> found element will be returned.
> These methods should be added to the Scala API too, in order to be consistent.
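A hedged sketch of the semantics described above, as a hypothetical standalone helper rather than the actual DataStream operator: the boolean flag decides which of several equally maximal elements is returned (the issue text says the flag set to {{true}} returns the latest found element, and this sketch follows that wording).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

class MaxBySketch {
    // returnLatest = true keeps the latest element among ties; false keeps
    // the earliest. Illustrative only, not Flink's maxBy implementation.
    static <T> T maxBy(List<T> items, ToIntFunction<T> key, boolean returnLatest) {
        T best = null;
        for (T t : items) {
            if (best == null
                    || key.applyAsInt(t) > key.applyAsInt(best)
                    || (returnLatest && key.applyAsInt(t) == key.applyAsInt(best))) {
                best = t;
            }
        }
        return best;
    }
}
```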





[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275265#comment-15275265
 ] 

GaoLun commented on FLINK-3879:
---

Yes, that makes sense.

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm





[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275242#comment-15275242
 ] 

GaoLun commented on FLINK-3879:
---

The hub values are the same, but the authority values differ slightly.

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm





[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275240#comment-15275240
 ] 

GaoLun commented on FLINK-3879:
---

Mine:  {{(1,0.847998304005088,0.0), (2,0.5299989400031799,0.5144957554275266), 
(3,0.0,0.8574929257125442)}}
Yours: {{(1,0.8479983040050879,0.0), (2,0.5299989400031799,0.5240974256643347), 
(3,0.0,0.8516583167045438)}}

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm





[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275160#comment-15275160
 ] 

GaoLun commented on FLINK-3879:
---

Hi [~greghogan], the PR for FLINK-2044 has been updated and supports returning 
both values now. I have also changed the normalization method from sum to 
square sum.
I wrote a simple test for your implementation to compare its result with mine, 
and found that the results differ.
For the simple graph {{1->2, 1->3, 2->3}} with one iteration, the results are:
Mine:  {{(1,0.8320502943378436,0.0), (2,0.554700196225229,0.4472135954999579), 
(3,0.0,0.8944271909999159)}}
Yours: {{(1,0.8320502943378437,0.0), (2,0.5547001962252291,0.5144957554275265), 
(3,0.0,0.8574929257125441)}}
We can calculate the hub/authority values manually; the result should be:
{{(1, sqrt(9/13), 0.0), (2, sqrt(4/13), 1/sqrt(5)), (3, 0.0, 2/sqrt(5))}}
which differs slightly from yours.
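The manual calculation can be checked with a short, self-contained Java sketch of one synchronous HITS iteration with L2 normalization (all scores starting at 1; this is an illustration of the arithmetic, not Gelly's implementation):

```java
// One HITS iteration on the graph {1->2, 1->3, 2->3}, normalized by the
// square root of the sum of squares (L2 norm).
class HitsCheck {
    static double[][] oneIteration() {
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}};   // index i stands for vertex i+1
        double[] hub = {1, 1, 1};
        double[] auth = new double[3];
        for (int[] e : edges) auth[e[1]] += hub[e[0]];    // authority update
        double[] newHub = new double[3];
        for (int[] e : edges) newHub[e[0]] += auth[e[1]]; // hub update
        normalize(newHub);
        normalize(auth);
        return new double[][] {newHub, auth};             // {hubs, authorities}
    }

    static void normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        for (int i = 0; i < v.length; i++) v[i] /= norm;
    }
}
```

This yields hubs {{(sqrt(9/13), sqrt(4/13), 0)}} and authorities {{(0, 1/sqrt(5), 2/sqrt(5))}}, matching the manual values above.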

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm





[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-01 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265783#comment-15265783
 ] 

GaoLun commented on FLINK-2044:
---

I will go on with this issue. :)

> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[jira] [Commented] (FLINK-2220) Join on Pojo without hashCode() silently fails

2016-04-26 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259367#comment-15259367
 ] 

GaoLun commented on FLINK-2220:
---

I wrote a test for a generic type and the problem still arose. The small fix in 
the PR adds a log warning if a POJO is used as a key but does not override the 
two methods.

> Join on Pojo without hashCode() silently fails
> --
>
> Key: FLINK-2220
> URL: https://issues.apache.org/jira/browse/FLINK-2220
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.9, 0.8.1
>Reporter: Marcus Leich
>
> I need to perform a join using a complete Pojo as the join key.
> With DOP > 1, this only works if the Pojo comes with a meaningful hashCode() 
> implementation, as otherwise equal objects will get hashed to different 
> partitions based on their memory address and not on their content.
> I guess it's fine if users are required to implement hashCode() themselves, 
> but it would be nice if the documentation, or better yet Flink itself, could 
> alert users that this is a requirement, similar to how Comparable is required 
> for keys.
> Use the following code to reproduce the issue:
> {code}
> public class Pojo implements Comparable<Pojo> {
>     public byte[] data;
>
>     public Pojo() {
>     }
>
>     public Pojo(byte[] data) {
>         this.data = data;
>     }
>
>     @Override
>     public int compareTo(Pojo o) {
>         return UnsignedBytes.lexicographicalComparator().compare(data, o.data);
>     }
>
>     // uncomment me for making the join work
>     /* @Override
>     public int hashCode() {
>         return Arrays.hashCode(data);
>     } */
> }
>
> public void testJoin() throws Exception {
>     final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
>     env.setParallelism(4);
>     DataSet<Tuple2<Pojo, String>> left = env.fromElements(
>             new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "black"),
>             new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), "red"),
>             new Tuple2<>(new Pojo(new byte[] {1}), "Spark"),
>             new Tuple2<>(new Pojo(new byte[] {2}), "good"),
>             new Tuple2<>(new Pojo(new byte[] {5}), "bug"));
>     DataSet<Tuple2<Pojo, String>> right = env.fromElements(
>             new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "white"),
>             new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), "green"),
>             new Tuple2<>(new Pojo(new byte[] {1}), "Flink"),
>             new Tuple2<>(new Pojo(new byte[] {2}), "evil"),
>             new Tuple2<>(new Pojo(new byte[] {5}), "fix"));
>     // will not print anything unless Pojo has a real hashCode() implementation
>     left.join(right).where(0).equalTo(0).projectFirst(1).projectSecond(1).print();
> }
> {code}





[jira] [Created] (FLINK-3783) Support weighted random sampling with reservoir

2016-04-18 Thread GaoLun (JIRA)
GaoLun created FLINK-3783:
-

 Summary: Support weighted random sampling with reservoir
 Key: FLINK-3783
 URL: https://issues.apache.org/jira/browse/FLINK-3783
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


In default random sampling, all items have the same probability of being 
selected. In weighted random sampling, however, the probability of each item 
being selected is determined by its weight relative to the weights of the other 
items.
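One way to implement this is the A-Res scheme of Efraimidis and Spirtes: assign each item the key u^(1/w), with u drawn uniformly from (0,1), and keep the k items with the largest keys. A hedged Java sketch of that idea (illustrative, not the eventual Flink implementation):

```java
import java.util.*;

// Weighted reservoir sampling (A-Res): heavier items tend to draw larger
// keys u^(1/w), so they are more likely to stay in the size-k reservoir.
class WeightedReservoirSampler<T> {
    private final int k;
    private final Random rnd;
    // min-heap on the key, so the smallest key is the eviction candidate
    private final PriorityQueue<AbstractMap.SimpleEntry<Double, T>> heap;

    WeightedReservoirSampler(int k, long seed) {
        this.k = k;
        this.rnd = new Random(seed);
        this.heap = new PriorityQueue<>(k,
                (a, b) -> Double.compare(a.getKey(), b.getKey()));
    }

    void collect(T item, double weight) {
        double key = Math.pow(rnd.nextDouble(), 1.0 / weight);
        if (heap.size() < k) {
            heap.add(new AbstractMap.SimpleEntry<>(key, item));
        } else if (key > heap.peek().getKey()) {
            heap.poll();                                   // evict smallest key
            heap.add(new AbstractMap.SimpleEntry<>(key, item));
        }
    }

    List<T> sample() {
        List<T> out = new ArrayList<>();
        for (AbstractMap.SimpleEntry<Double, T> e : heap) {
            out.add(e.getValue());
        }
        return out;
    }
}
```

This processes the stream in one pass with O(n log k) time and O(k) space, which is the property that makes reservoir-style sampling attractive for DataSets.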





[jira] [Commented] (FLINK-3729) Several SQL tests fail on Windows OS

2016-04-12 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237523#comment-15237523
 ] 

GaoLun commented on FLINK-3729:
---

The problem is that the {{ast}} string uses \r\n in a Windows environment. The 
{{ast}} is created by {{RelOptUtil.toString(relNode)}}, which uses different 
line endings depending on the OS. We can solve this issue as you mentioned, 
either:
1. Normalize the ast:
{code}
val ast = RelOptUtil.toString(relNode).replaceAll("\r\n", "\n")
{code}
2. For every test file, add another, Windows-specific version.
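The issue also mentions a third alternative, a compare method that is aware of the line-ending difference. A hedged sketch of that option (hypothetical helper, not from the Flink code base):

```java
// Normalize both strings before comparing so CRLF vs LF (and lone CR)
// differences no longer affect the result.
class LineEndings {
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    static boolean equalIgnoringLineEndings(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```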

> Several SQL tests fail on Windows OS
> 
>
> Key: FLINK-3729
> URL: https://issues.apache.org/jira/browse/FLINK-3729
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 1.0.1
>Reporter: Chesnay Schepler
>
> The Table API SqlExplain(Test/ITCase) fail categorically on Windows due to 
> different line-endings. These tests generate an string representation of an 
> abstract syntax tree; problem is there is a difference in line-endings.
> The expected ones contain LF, the actual one CRLF.
> The tests should be either changed to either
> * include CRLF line-endings in the expected string when run on windows
> * always use LF line-endings regardless of OS
> * use a compare method that is aware of this issue.





[jira] [Updated] (FLINK-3192) Add explain support to print ast and sql physical execution plan.

2015-12-25 Thread GaoLun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GaoLun updated FLINK-3192:
--
Description: Table API doesn't support sql-explanation now. Add the explain 
support to print ast (abstract syntax tree) and the physical execution plan of 
sql.  (was: Table API doesn't support sql-explanation now. Add the explain 
support to print ast (abstract syntax tree) and the physical execution of sql.)

> Add explain support to print ast and sql physical execution plan. 
> --
>
> Key: FLINK-3192
> URL: https://issues.apache.org/jira/browse/FLINK-3192
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: features
>
> Table API doesn't support sql-explanation now. Add the explain support to 
> print ast (abstract syntax tree) and the physical execution plan of sql.





[jira] [Updated] (FLINK-3192) Add explain support to print ast and sql physical execution plan.

2015-12-25 Thread GaoLun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GaoLun updated FLINK-3192:
--
Summary: Add explain support to print ast and sql physical execution plan.  
 (was: Add explain support to print ast and sql physical execution. )

> Add explain support to print ast and sql physical execution plan. 
> --
>
> Key: FLINK-3192
> URL: https://issues.apache.org/jira/browse/FLINK-3192
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: features
>
> Table API doesn't support sql-explanation now. Add the explain support to 
> print ast (abstract syntax tree) and the physical execution of sql.





[jira] [Created] (FLINK-3192) Add explain support to print ast and sql physical execution.

2015-12-25 Thread GaoLun (JIRA)
GaoLun created FLINK-3192:
-

 Summary: Add explain support to print ast and sql physical 
execution. 
 Key: FLINK-3192
 URL: https://issues.apache.org/jira/browse/FLINK-3192
 Project: Flink
  Issue Type: New Feature
  Components: Table API
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


Table API doesn't support sql-explanation now. Add the explain support to print 
ast (abstract syntax tree) and the physical execution of sql.





[jira] [Comment Edited] (FLINK-2988) Cannot load DataSet[Row] from CSV file

2015-11-24 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023998#comment-15023998
 ] 

GaoLun edited comment on FLINK-2988 at 11/25/15 1:45 AM:
-

Hi, [~jkovacs]. Is this problem solved?
IMO, readCsvFile can only support tuple and POJO classes, which don't 
support nullable fields. While reading the CSV file, env creates the 
tupleSerializer. RowSerializer can create the appropriate serializer for each 
field type. Row can also support null values, which tuple can't.


was (Author: gallenvara_bg):
Hi, [~jkovacs] . Is this problem solved?
IMO, *readCsvFile* can only support the *tuple* and *POJO* class which wouldn't 
support nullable fields. While reading the csv file, *env* created the 
*tupleSerializer*. *RowSerializer* can create certain *serializer* according to 
the field type. *Row* can also support null value which *tuple* can't.

> Cannot load DataSet[Row] from CSV file
> --
>
> Key: FLINK-2988
> URL: https://issues.apache.org/jira/browse/FLINK-2988
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API, Table API
>Affects Versions: 0.10.0
>Reporter: Johann Kovacs
>Priority: Minor
>
> Tuple classes (Java/Scala both) only have arity up to 25, meaning I cannot 
> load a CSV file with more than 25 columns directly as a 
> DataSet\[TupleX\[...\]\].
> An alternative to using Tuples is using the Table API's Row class, which 
> allows for arbitrary-length, arbitrary-type, runtime-supplied schemata (using 
> RowTypeInfo) and index-based access.
> However, trying to load a CSV file as a DataSet\[Row\] yields an exception:
> {code}
> val env = ExecutionEnvironment.createLocalEnvironment()
> val filePath = "../someCsv.csv"
> val typeInfo = new RowTypeInfo(Seq(BasicTypeInfo.STRING_TYPE_INFO, 
> BasicTypeInfo.INT_TYPE_INFO), Seq("word", "number"))
> val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
> println(source.collect())
> {code}
> with someCsv.csv containing:
> {code}
> one,1
> two,2
> {code}
> yields
> {code}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to 
> org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
>   at 
> org.apache.flink.api.scala.operators.ScalaCsvInputFormat.(ScalaCsvInputFormat.java:46)
>   at 
> org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
> {code}
> As a user I would like to be able to load a CSV file into a DataSet\[Row\], 
> preferably having a convenience method to specify the schema (RowTypeInfo), 
> without having to use the "explicit implicit parameters" syntax and 
> specifying the ClassTag.





[jira] [Commented] (FLINK-2988) Cannot load DataSet[Row] from CSV file

2015-11-24 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023998#comment-15023998
 ] 

GaoLun commented on FLINK-2988:
---

Hi, [~jkovacs]. Is this problem solved?
IMO, *readCsvFile* can only support *tuple* and *POJO* classes, which don't 
support nullable fields. While reading the CSV file, *env* creates the 
*tupleSerializer*. *RowSerializer* can create the appropriate *serializer* for 
each field type. *Row* can also support null values, which *tuple* can't.
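The distinction drawn in this comment can be illustrated with a minimal Row-like container. This is a hypothetical sketch, not Flink's actual Row or RowSerializer: because fields are held in an untyped array, arity and field types are runtime data and any position may be null, whereas a fixed-arity typed tuple serializer has its field types baked in.

```java
import java.util.Arrays;

// Hypothetical illustration of why a Row can carry nulls while a fixed-arity
// typed tuple cannot: Row stores fields as Object[], so the schema is runtime
// data and any field may be null.
public class RowSketch {

    static class Row {
        private final Object[] fields;
        Row(int arity) { fields = new Object[arity]; }
        void setField(int i, Object value) { fields[i] = value; }
        Object getField(int i) { return fields[i]; }
        int arity() { return fields.length; }
        @Override public String toString() { return Arrays.toString(fields); }
    }

    public static void main(String[] args) {
        Row row = new Row(2);    // schema: (word: String, number: Integer)
        row.setField(0, "one");
        row.setField(1, null);   // null is fine here; a typed tuple serializer
                                 // would fail when serializing a null field
        System.out.println(row); // prints "[one, null]"
    }
}
```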

> Cannot load DataSet[Row] from CSV file
> --
>
> Key: FLINK-2988
> URL: https://issues.apache.org/jira/browse/FLINK-2988
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API, Table API
>Affects Versions: 0.10.0
>Reporter: Johann Kovacs
>Priority: Minor
>
> Tuple classes (Java/Scala both) only have arity up to 25, meaning I cannot 
> load a CSV file with more than 25 columns directly as a 
> DataSet\[TupleX\[...\]\].
> An alternative to using Tuples is using the Table API's Row class, which 
> allows for arbitrary-length, arbitrary-type, runtime-supplied schemata (using 
> RowTypeInfo) and index-based access.
> However, trying to load a CSV file as a DataSet\[Row\] yields an exception:
> {code}
> val env = ExecutionEnvironment.createLocalEnvironment()
> val filePath = "../someCsv.csv"
> val typeInfo = new RowTypeInfo(Seq(BasicTypeInfo.STRING_TYPE_INFO, 
> BasicTypeInfo.INT_TYPE_INFO), Seq("word", "number"))
> val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
> println(source.collect())
> {code}
> with someCsv.csv containing:
> {code}
> one,1
> two,2
> {code}
> yields
> {code}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to 
> org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
>   at 
> org.apache.flink.api.scala.operators.ScalaCsvInputFormat.(ScalaCsvInputFormat.java:46)
>   at 
> org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
> {code}
> As a user I would like to be able to load a CSV file into a DataSet\[Row\], 
> preferably having a convenience method to specify the schema (RowTypeInfo), 
> without having to use the "explicit implicit parameters" syntax and 
> specifying the ClassTag.





[jira] [Commented] (FLINK-2884) Apply JMH on HashVsSortMiniBenchmark class.

2015-10-29 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980348#comment-14980348
 ] 

GaoLun commented on FLINK-2884:
---

This class has some unknown faults when run. It will be better to do some 
checking before refactoring it with JMH. I will reopen this issue in a few 
days.

> Apply JMH on HashVsSortMiniBenchmark class.
> ---
>
> Key: FLINK-2884
> URL: https://issues.apache.org/jira/browse/FLINK-2884
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> methods in order to get much more accurate results. Modify the 
> HashVsSortMiniBenchmark class and move it to the flink-benchmark module.





[jira] [Closed] (FLINK-2884) Apply JMH on HashVsSortMiniBenchmark class.

2015-10-29 Thread GaoLun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GaoLun closed FLINK-2884.
-
Resolution: Later

> Apply JMH on HashVsSortMiniBenchmark class.
> ---
>
> Key: FLINK-2884
> URL: https://issues.apache.org/jira/browse/FLINK-2884
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> methods in order to get much more accurate results. Modify the 
> HashVsSortMiniBenchmark class and move it to the flink-benchmark module.





[jira] [Closed] (FLINK-2892) Apply JMH on MemorySegmentSpeedBenchmark class.

2015-10-27 Thread GaoLun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GaoLun closed FLINK-2892.
-
Resolution: Not A Problem

I think keeping the original benchmark method in the MemorySegmentSpeedBenchmark 
class is a good choice. Its output is much clearer and more understandable.

> Apply JMH on MemorySegmentSpeedBenchmark class.
> ---
>
> Key: FLINK-2892
> URL: https://issues.apache.org/jira/browse/FLINK-2892
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> methods in order to get much more accurate results. Modify the 
> MemorySegmentSpeedBenchmark class and move it to the flink-benchmark module.





[jira] [Commented] (FLINK-2892) Apply JMH on MemorySegmentSpeedBenchmark class.

2015-10-27 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977598#comment-14977598
 ] 

GaoLun commented on FLINK-2892:
---

IMO, if I modify this class with JMH, the result will be much more complex, 
because I would have to rewrite all the tests as small benchmark methods; there 
would be almost 96 of them, which would be too long to read and understand. I 
don't think it's a good idea to modify this class with JMH. Also, there is no 
rule that every benchmark in flink-benchmark needs to be ported to JMH, so I 
will close the issue.

> Apply JMH on MemorySegmentSpeedBenchmark class.
> ---
>
> Key: FLINK-2892
> URL: https://issues.apache.org/jira/browse/FLINK-2892
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> methods in order to get much more accurate results. Modify the 
> MemorySegmentSpeedBenchmark class and move it to the flink-benchmark module.





[jira] [Created] (FLINK-2920) Apply JMH on KryoVersusAvroMinibenchmark class.

2015-10-26 Thread GaoLun (JIRA)
GaoLun created FLINK-2920:
-

 Summary: Apply JMH on KryoVersusAvroMinibenchmark class.
 Key: FLINK-2920
 URL: https://issues.apache.org/jira/browse/FLINK-2920
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
methods in order to get much more accurate results. Modify the 
KryoVersusAvroMinibenchmark class and move it to the flink-benchmark module.





[jira] [Created] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.

2015-10-25 Thread GaoLun (JIRA)
GaoLun created FLINK-2919:
-

 Summary: Apply JMH on FieldAccessMinibenchmark class.
 Key: FLINK-2919
 URL: https://issues.apache.org/jira/browse/FLINK-2919
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
methods in order to get much more accurate results. Modify the 
FieldAccessMinibenchmark class and move it to the flink-benchmark module.





[jira] [Created] (FLINK-2892) Apply JMH on MemorySegmentSpeedBenchmark class.

2015-10-22 Thread GaoLun (JIRA)
GaoLun created FLINK-2892:
-

 Summary: Apply JMH on MemorySegmentSpeedBenchmark class.
 Key: FLINK-2892
 URL: https://issues.apache.org/jira/browse/FLINK-2892
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
methods in order to get much more accurate results. Modify the 
MemorySegmentSpeedBenchmark class and move it to the flink-benchmark module.





[jira] [Commented] (FLINK-2848) Refactor Flink benchmarks with JMH and move to flink-benchmark module

2015-10-22 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968753#comment-14968753
 ] 

GaoLun commented on FLINK-2848:
---

Hi, Chesnay. There are eight different benchmark classes to modify. IMO, 
dividing the task into eight sub-tasks can make the review work more efficient.

> Refactor Flink benchmarks with JMH and move to flink-benchmark module
> -
>
> Key: FLINK-2848
> URL: https://issues.apache.org/jira/browse/FLINK-2848
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
>
> There are many Flink internal micro benchmarks in different modules, which 
> are coarsely measured (via System.nanoTime(), ...), with no warmup or 
> multi-iteration runs. This is an umbrella JIRA to refactor these micro 
> benchmarks and move them to the flink-benchmark module for central management.
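The measurement problems described in this issue (no warmup, a single coarse timing pass) are exactly what JMH automates. A hand-rolled sketch of the idea, for illustration only; JMH additionally does forking, blackholes, and statistical reporting, none of which is shown here:

```java
// Minimal illustration of why warmup + repeated iterations matter for
// micro-benchmarks. Hand-rolled for illustration; JMH does this properly.
public class NaiveHarness {

    interface Workload { long run(); }

    static double measureMeanNanos(Workload w, int warmupIters, int measureIters) {
        for (int i = 0; i < warmupIters; i++) {
            w.run();                        // let the JIT compile the hot path first
        }
        long total = 0;
        long sink = 0;
        for (int i = 0; i < measureIters; i++) {
            long t0 = System.nanoTime();
            sink += w.run();                // keep the result live so the loop
            total += System.nanoTime() - t0; // body is not dead-code eliminated
        }
        if (sink == 123456789L) System.out.println(sink); // consume the sink
        return total / (double) measureIters; // mean nanoseconds per iteration
    }

    public static void main(String[] args) {
        Workload sum = () -> {
            long s = 0;
            for (int i = 0; i < 1_000_000; i++) s += i;
            return s;
        };
        System.out.printf("mean ns/iter: %.0f%n", measureMeanNanos(sum, 20, 50));
    }
}
```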





[jira] [Created] (FLINK-2890) Apply JMH on StringSerializationSpeedBenchmark class.

2015-10-21 Thread GaoLun (JIRA)
GaoLun created FLINK-2890:
-

 Summary: Apply JMH on StringSerializationSpeedBenchmark class.
 Key: FLINK-2890
 URL: https://issues.apache.org/jira/browse/FLINK-2890
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
methods in order to get much more accurate results. Modify the 
StringSerializationSpeedBenchmark class and move it to the flink-benchmark module.





[jira] [Created] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class

2015-10-21 Thread GaoLun (JIRA)
GaoLun created FLINK-2889:
-

 Summary: Apply JMH on LongSerializationSpeedBenchmark class
 Key: FLINK-2889
 URL: https://issues.apache.org/jira/browse/FLINK-2889
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
methods in order to get much more accurate results. Modify the 
LongSerializationSpeedBenchmark class and move it to the flink-benchmark module.





[jira] [Created] (FLINK-2884) Apply JMH on HashVsSortMiniBenchmark class.

2015-10-20 Thread GaoLun (JIRA)
GaoLun created FLINK-2884:
-

 Summary: Apply JMH on HashVsSortMiniBenchmark class.
 Key: FLINK-2884
 URL: https://issues.apache.org/jira/browse/FLINK-2884
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
methods in order to get much more accurate results. Modify the 
HashVsSortMiniBenchmark class and move it to the flink-benchmark module.





[jira] [Created] (FLINK-2869) Apply JMH on IOManagerPerformanceBenchmark class.

2015-10-18 Thread GaoLun (JIRA)
GaoLun created FLINK-2869:
-

 Summary: Apply JMH on IOManagerPerformanceBenchmark class.
 Key: FLINK-2869
 URL: https://issues.apache.org/jira/browse/FLINK-2869
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
methods in order to get much more accurate results.





[jira] [Updated] (FLINK-2853) Apply JMH on MutableHashTablePerformanceBenchmark class.

2015-10-15 Thread GaoLun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GaoLun updated FLINK-2853:
--
Summary: Apply JMH on MutableHashTablePerformanceBenchmark class.  (was: 
Apply JMH on Flink benchmarks)

> Apply JMH on MutableHashTablePerformanceBenchmark class.
> 
>
> Key: FLINK-2853
> URL: https://issues.apache.org/jira/browse/FLINK-2853
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> methods in order to get much more accurate results.





[jira] [Created] (FLINK-2853) Apply JMH on Flink benchmarks

2015-10-14 Thread GaoLun (JIRA)
GaoLun created FLINK-2853:
-

 Summary: Apply JMH on Flink benchmarks
 Key: FLINK-2853
 URL: https://issues.apache.org/jira/browse/FLINK-2853
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
methods in order to get much more accurate results.





[jira] [Updated] (FLINK-2535) Fixed size sample algorithm optimization

2015-09-02 Thread GaoLun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GaoLun updated FLINK-2535:
--
Attachment: sampling.png

Statistical data on the number of rejected items with SRS & SSRS.

> Fixed size sample algorithm optimization
> 
>
> Key: FLINK-2535
> URL: https://issues.apache.org/jira/browse/FLINK-2535
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Priority: Minor
> Attachments: sampling.png
>
>
> Fixed size sample algorithm is known to be less efficient than sample 
> algorithms with fraction, but sometime it's necessary. Some optimization 
> could significantly reduce the storage size and computation cost, such as the 
> algorithm described in [this 
> paper|http://machinelearning.wustl.edu/mlpapers/papers/icml2013_meng13a].





[jira] [Commented] (FLINK-2535) Fixed size sample algorithm optimization

2015-09-02 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726870#comment-14726870
 ] 

GaoLun commented on FLINK-2535:
---

Hi. I have replaced the sampling algorithm with scalable simple random sampling 
based on the paper, and I have run some tests to compare the performance of the 
two sampling methods. Here is some statistical data on the number of rejected 
items (source size = 10000000):

SampleSize       SRS       SSRS
       100   9998727    9998488
       100   9998755    9998493
       500   9994502    9994081
       500   9994624    9994029
      1000   9989781    9989061
      1000   9989657    9989132
      5000   9956780    9956061
      5000   9956812    9956018
     10000   9921601    9919174
     10000   9920866    9919396
     50000   9685085    9682182
     50000   9685203    9681611
    100000   9439345    9435887
    100000   9440469    9435521
    500000   8001535    7998046
    500000   8000370    7996807
   1000000   6699752    6690612
   1000000   6696175    6692111
   5000000   1534814    1530409
   5000000   1534088    1529784

From this data we can see that SRS rejects more items than SSRS, but the 
difference is not significant.
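For context, a "rejected items" count of this shape can be reproduced with a plain fixed-size reservoir sampler. This is a generic Algorithm-R-style sketch standing in for the SRS baseline, not Flink's actual sampler, and the SSRS variant from the paper is not shown:

```java
import java.util.Random;

// Counts how many source items a fixed-size reservoir sampler rejects.
// Generic Algorithm-R sketch, not Flink's actual sampling implementation.
public class ReservoirRejections {

    static long countRejections(long sourceSize, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        long rejected = 0;
        Object[] reservoir = new Object[sampleSize];
        for (long i = 0; i < sourceSize; i++) {
            if (i < sampleSize) {
                reservoir[(int) i] = i;        // first k items are always accepted
            } else {
                long j = (long) (rnd.nextDouble() * (i + 1)); // uniform in [0, i]
                if (j < sampleSize) {
                    reservoir[(int) j] = i;    // accepted: replaces a reservoir slot
                } else {
                    rejected++;                // rejected outright
                }
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Expected rejections ≈ N - k*(1 + ln(N/k)); for N = 10000000 and
        // k = 100 that is on the order of 9998700, in line with the SRS
        // figures in the table above.
        System.out.println(countRejections(10_000_000L, 100, 42));
    }
}
```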

> Fixed size sample algorithm optimization
> 
>
> Key: FLINK-2535
> URL: https://issues.apache.org/jira/browse/FLINK-2535
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Priority: Minor
>
> Fixed size sample algorithm is known to be less efficient than sample 
> algorithms with fraction, but sometime it's necessary. Some optimization 
> could significantly reduce the storage size and computation cost, such as the 
> algorithm described in [this 
> paper|http://machinelearning.wustl.edu/mlpapers/papers/icml2013_meng13a].





[jira] [Commented] (FLINK-2077) Rework Path class and add extend support for Windows paths

2015-08-12 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693036#comment-14693036
 ] 

GaoLun commented on FLINK-2077:
---

Hi, Fabian.
What do you mean by a path like '//host/dir1/dir2'?
Such a path contains several '/'. How should we pick out dir1 and dir2 
using the slash '/'?
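One way to read a UNC-style path is that the double slash introduces the host and every later single slash separates directory segments, so `dir1` and `dir2` are unambiguous. A hypothetical parsing sketch (not the actual Path class rework):

```java
import java.util.Arrays;

// Hypothetical sketch of splitting a UNC-style path "//host/dir1/dir2":
// the leading double slash marks the host; subsequent single slashes
// separate directory segments.
public class UncPathSketch {

    static String host(String path) {
        if (!path.startsWith("//")) {
            throw new IllegalArgumentException("not a UNC-style path: " + path);
        }
        String rest = path.substring(2);
        int slash = rest.indexOf('/');
        return slash < 0 ? rest : rest.substring(0, slash);
    }

    static String[] segments(String path) {
        String rest = path.substring(2);
        int slash = rest.indexOf('/');
        if (slash < 0) return new String[0];
        // split on slashes after the host; empty segments (from "//") are dropped
        return Arrays.stream(rest.substring(slash + 1).split("/"))
                     .filter(s -> !s.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String p = "//host/dir1/dir2";
        System.out.println(host(p));                      // prints "host"
        System.out.println(Arrays.toString(segments(p))); // prints "[dir1, dir2]"
    }
}
```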

> Rework Path class and add extend support for Windows paths
> --
>
> Key: FLINK-2077
> URL: https://issues.apache.org/jira/browse/FLINK-2077
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: GaoLun
>Priority: Minor
>  Labels: starter
>
> The class {{org.apache.flink.core.fs.Path}} handles paths for Flink's 
> {{FileInputFormat}} and {{FileOutputFormat}}. Over time, this class has 
> become quite hard to read and modify. 
> It would benefit from some cleaning and refactoring. Along with the 
> refactoring, support for Windows paths like {{//host/dir1/dir2}} could be 
> added.





[jira] [Commented] (FLINK-2077) Rework Path class and add extend support for Windows paths

2015-08-10 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681157#comment-14681157
 ] 

GaoLun commented on FLINK-2077:
---

If no one is working on this issue, I will work on it.

> Rework Path class and add extend support for Windows paths
> --
>
> Key: FLINK-2077
> URL: https://issues.apache.org/jira/browse/FLINK-2077
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Priority: Minor
>  Labels: starter
>
> The class {{org.apache.flink.core.fs.Path}} handles paths for Flink's 
> {{FileInputFormat}} and {{FileOutputFormat}}. Over time, this class has 
> become quite hard to read and modify. 
> It would benefit from some cleaning and refactoring. Along with the 
> refactoring, support for Windows paths like {{//host/dir1/dir2}} could be 
> added.


