[jira] [Commented] (FLINK-3850) Add forward field annotations to DataSet operators generated by the Table API

2016-12-07 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728834#comment-15728834
 ] 

Nikolay Vasilishin commented on FLINK-3850:
---

Hi, [~fhueske]!

I've read the topic about forward field annotations; it's clear to me.

But where in the code should these annotations be added? I'm thinking about 
{{CodeGenerator}}, which generates the various Map and other functions.
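For context, the effect of a forward-field hint can be sketched in plain Java (a toy illustration, not Flink code; the `Row`, `partitionOf`, and `MAPPER` names here are hypothetical stand-ins): a function that copies input field 0 unchanged lets the optimizer keep an existing hash-partitioning on that field instead of re-shuffling.

```java
import java.util.List;
import java.util.function.Function;

public class ForwardedFieldsDemo {
    // Toy two-field row; f0 plays the role of the forwarded field.
    static final class Row {
        final int f0; final String f1;
        Row(int f0, String f1) { this.f0 = f0; this.f1 = f1; }
    }

    // The physical property a forward-field hint protects:
    // which partition a row is assigned to, based on a key field.
    static int partitionOf(int key, int numPartitions) {
        return Math.floorMod(Integer.hashCode(key), numPartitions);
    }

    // A user function that only rewrites f1 -- semantically "f0 is forwarded".
    static final Function<Row, Row> MAPPER = r -> new Row(r.f0, r.f1.toUpperCase());

    public static void main(String[] args) {
        List<Row> input = List.of(new Row(1, "a"), new Row(2, "b"), new Row(7, "c"));
        for (Row r : input) {
            Row out = MAPPER.apply(r);
            // Because f0 is copied unchanged, a hash-partitioning on f0 computed
            // before the map is still valid afterwards: no re-shuffle is needed.
            if (partitionOf(r.f0, 4) != partitionOf(out.f0, 4)) {
                throw new AssertionError("partitioning on f0 destroyed");
            }
        }
        System.out.println("partitioning on f0 preserved");
    }
}
```

The Table API knows at plan time that its generated functions behave like `MAPPER` here, which is why it can attach such hints automatically.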

> Add forward field annotations to DataSet operators generated by the Table API
> -
>
> Key: FLINK-3850
> URL: https://issues.apache.org/jira/browse/FLINK-3850
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Fabian Hueske
>Assignee: Nikolay Vasilishin
>
> The DataSet API features semantic annotations [1] to hint the optimizer which 
> input fields an operator copies. This information is valuable for the 
> optimizer because it can infer that certain physical properties such as 
> partitioning or sorting are not destroyed by user functions and thus generate 
> more efficient execution plans.
> The Table API is built on top of the DataSet API and generates DataSet 
> programs and code for user-defined functions. Hence, it knows exactly which 
> fields are modified and which not. We should use this information to 
> automatically generate forward field annotations and attach them to the 
> operators. This can help to significantly improve the performance of certain 
> jobs.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/index.html#semantic-annotations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3850) Add forward field annotations to DataSet operators generated by the Table API

2016-12-07 Thread Nikolay Vasilishin (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Vasilishin reassigned FLINK-3850:
-

Assignee: Nikolay Vasilishin

> Add forward field annotations to DataSet operators generated by the Table API
> -
>
> Key: FLINK-3850
> URL: https://issues.apache.org/jira/browse/FLINK-3850
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Fabian Hueske
>Assignee: Nikolay Vasilishin
>
> The DataSet API features semantic annotations [1] to hint the optimizer which 
> input fields an operator copies. This information is valuable for the 
> optimizer because it can infer that certain physical properties such as 
> partitioning or sorting are not destroyed by user functions and thus generate 
> more efficient execution plans.
> The Table API is built on top of the DataSet API and generates DataSet 
> programs and code for user-defined functions. Hence, it knows exactly which 
> fields are modified and which not. We should use this information to 
> automatically generate forward field annotations and attach them to the 
> operators. This can help to significantly improve the performance of certain 
> jobs.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/index.html#semantic-annotations





[jira] [Closed] (FLINK-4492) Cleanup files from canceled snapshots

2016-12-06 Thread Nikolay Vasilishin (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Vasilishin closed FLINK-4492.
-
Resolution: Cannot Reproduce

> Cleanup files from canceled snapshots
> -
>
> Key: FLINK-4492
> URL: https://issues.apache.org/jira/browse/FLINK-4492
> Project: Flink
>  Issue Type: Bug
>Reporter: Stefan Richter
>Assignee: Nikolay Vasilishin
>Priority: Minor
>
> Current checkpointing only closes CheckpointStateOutputStreams on cancel, but 
> incomplete files are not properly deleted from the filesystem.





[jira] [Commented] (FLINK-4565) Support for SQL IN operator

2016-11-28 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701964#comment-15701964
 ] 

Nikolay Vasilishin commented on FLINK-4565:
---

The problem above can be solved by constructing the RexNode properly. I've asked 
the Calcite community how to avoid this problem and temporarily applied a 
workaround using the Reflection API. 
So now I'm getting into CodeGenerator#visitSubQuery with a RexNode that stands 
for the column on the left side of IN and a RelNode that stands for the table on 
the right side of IN. 
If I'm not mistaken, I have to implement a RelShuttle, which will deal with the 
subquery and delegate it to Table methods.
But how to use the results of the subquery as a predicate is still a question for me.
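A RelShuttle is essentially a visitor that walks a relational-expression tree and may return rewritten nodes. A minimal stand-in (toy node classes invented for this sketch, not Calcite's API) that rewrites an `In` node into a `Join` node could look like:

```java
public class ShuttleDemo {
    interface Node { Node accept(Shuttle s); }

    static final class Scan implements Node {
        final String table;
        Scan(String table) { this.table = table; }
        public Node accept(Shuttle s) { return s.visit(this); }
        public String toString() { return "Scan(" + table + ")"; }
    }

    static final class In implements Node {
        final Node left, subquery;
        In(Node left, Node subquery) { this.left = left; this.subquery = subquery; }
        public Node accept(Shuttle s) { return s.visit(this); }
        public String toString() { return "In(" + left + ", " + subquery + ")"; }
    }

    static final class Join implements Node {
        final Node left, right;
        Join(Node left, Node right) { this.left = left; this.right = right; }
        public Node accept(Shuttle s) { return s.visit(this); }
        public String toString() { return "Join(" + left + ", " + right + ")"; }
    }

    // The shuttle visits children first, then rewrites In -> Join,
    // mirroring how an uncorrelated IN predicate can be decorrelated.
    static class Shuttle {
        Node visit(Scan scan) { return scan; }
        Node visit(Join join) { return new Join(join.left.accept(this), join.right.accept(this)); }
        Node visit(In in)     { return new Join(in.left.accept(this), in.subquery.accept(this)); }
    }

    static String rewrite(Node plan) { return plan.accept(new Shuttle()).toString(); }

    public static void main(String[] args) {
        Node plan = new In(new Scan("t"), new Scan("sub"));
        System.out.println(rewrite(plan)); // Join(Scan(t), Scan(sub))
    }
}
```

The real RelShuttle works the same way over Calcite's RelNode hierarchy: each node dispatches to a `visit` overload, and the shuttle returns the (possibly replaced) subtree.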

> Support for SQL IN operator
> ---
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But 
> it should also be available in the Table API and tested.





[jira] [Comment Edited] (FLINK-4565) Support for SQL IN operator

2016-11-25 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696107#comment-15696107
 ] 

Nikolay Vasilishin edited comment on FLINK-4565 at 11/25/16 4:38 PM:
-

[~fhueske], I've finally opened the PR, 
[https://github.com/apache/flink/pull/2870].

Now, about subqueries.
As mentioned above, there are a couple of ways to implement them: 
 - create a table with an inner join somewhere at the beginning
 - try to use Calcite's IN operator when constructing the RexNode.

I tried to implement the second approach. It looks the way it should 
([example|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-7c7c5dd5a5723b84b8d45424f04b2be5R68]).
In that case we need to [overload the in 
method|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-04d1bca648d7ee47ab9ce787c8d944a6R108]
 with a Table argument, then somehow construct a proper RexNode in the [toRexNode 
method|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-1b0f642e7f9b75bde5062b89b0b873e8R28]
 [1] (for now it constructs a non-working plan, shown below). Then it will be 
passed to the 
[CodeGenerator|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-c0ee691580cf6752e6cb186ca4f0260dR980],
 where I have to implement the visitSubquery method. Actually, it's not hard to 
generate code for all of the subquery's nodes, but I don't know what code should 
be generated to link the subquery with the rest of the query.

A RexNode constructed this way has this logical plan:
{noformat}
LogicalFilter(condition=[IN(IN($2, {
LogicalProject(c=[$2])
  LogicalFilter(condition=[=($1, 6)])
LogicalTableScan(table=[[_DataSetTable_0]])
}))])
  LogicalTableScan(table=[[_DataSetTable_0]])
{noformat}
There is a duplicated IN call.

The logical plan generated for a similar query in the SQL API looks like this:
{noformat}
LogicalProject(a=[$0], c=[$2])
  LogicalJoin(condition=[=($1, $3)], joinType=[inner])
LogicalTableScan(table=[[_DataSetTable_0]])
LogicalAggregate(group=[{0}])
  LogicalProject(b=[$1])
LogicalFilter(condition=[AND(>=($1, 6), <=($1, 9))])
  LogicalTableScan(table=[[_DataSetTable_0]])
{noformat}
So, it's implemented via the first approach.
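The join-plus-distinct-aggregate plan above computes the same result as the IN predicate. In plain Java terms (a sketch over made-up data, not Table API code; the class and method names are hypothetical), the equivalence looks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class InAsJoinDemo {
    // rows of a toy table t: (a, b, c)
    static final int[][] ROWS = { {1, 5, 10}, {2, 6, 20}, {3, 9, 30}, {4, 12, 40} };

    // Subquery: SELECT b FROM t WHERE b >= 6 AND b <= 9
    static Set<Integer> subquery() {
        return Arrays.stream(ROWS).map(r -> r[1])
                .filter(b -> b >= 6 && b <= 9).collect(Collectors.toSet());
    }

    // Direct IN semantics: keep rows whose b is in the subquery result.
    static List<Integer> viaIn() {
        Set<Integer> sub = subquery();
        return Arrays.stream(ROWS).filter(r -> sub.contains(r[1]))
                .map(r -> r[0]).collect(Collectors.toList());
    }

    // First-approach semantics: distinct-aggregate the subquery
    // (LogicalAggregate(group=[{0}])), then inner-join on b.
    static List<Integer> viaJoin() {
        Set<Integer> grouped = new TreeSet<>(subquery());
        List<Integer> out = new ArrayList<>();
        for (int[] r : ROWS)
            for (int key : grouped)
                if (r[1] == key) out.add(r[0]); // join condition: b = key
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaIn());   // [2, 3]
        System.out.println(viaJoin()); // [2, 3]
    }
}
```

The distinct aggregation is what makes the join safe: without it, duplicate values in the subquery would duplicate output rows, which plain IN never does.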

In the first approach it's not clear to me where we will get a reference to the 
first (left) table, as we invoke the IN method on expressions like 'column. But I 
haven't thought it through well yet.

[1] I'm sorry, I forgot to change the code to this:
{noformat}
val in: RexSubQuery = RexSubQuery.in(table.getRelNode, new 
ImmutableList.Builder[RexNode]().add(children.map(_.toRexNode): _*).build())
relBuilder.call(SqlStdOperatorTable.IN, in)
{noformat}
In this case the plan shown above will be generated.



[jira] [Comment Edited] (FLINK-4565) Support for SQL IN operator

2016-11-25 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696107#comment-15696107
 ] 

Nikolay Vasilishin edited comment on FLINK-4565 at 11/25/16 4:10 PM:
-

[~fhueske], I've finally opened the PR, 
[https://github.com/apache/flink/pull/2870].

Now, about subqueries.
As mentioned above, there are a couple of ways to implement them: 
 - create a table with an inner join somewhere at the beginning
 - try to use Calcite's IN operator when constructing the RexNode.

I tried to implement the second approach. It looks the way it should 
([example|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-7c7c5dd5a5723b84b8d45424f04b2be5R68]).
In that case we need to [overload the in 
method|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-04d1bca648d7ee47ab9ce787c8d944a6R108]
 with a Table argument, then somehow construct a proper RexNode in the [toRexNode 
method|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-1b0f642e7f9b75bde5062b89b0b873e8R28]
 [1] (for now it constructs a non-working plan, shown below). Then it will be 
passed to the 
[CodeGenerator|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-c0ee691580cf6752e6cb186ca4f0260dR980],
 where I have to implement the visitSubquery method. Actually, it's not hard to 
generate code for all of the subquery's nodes, but I don't know what code should 
be generated to link the subquery with the rest of the query.

A RexNode constructed this way has this logical plan:
{noformat}
LogicalFilter(condition=[IN(IN($2, {
LogicalProject(c=[$2])
  LogicalFilter(condition=[=($1, 6)])
LogicalTableScan(table=[[_DataSetTable_0]])
}))])
  LogicalTableScan(table=[[_DataSetTable_0]])
{noformat}
There is a duplicated IN call.

The logical plan generated for a similar query in the SQL API looks like this:
{noformat}
LogicalProject(a=[$0], c=[$2])
  LogicalJoin(condition=[=($1, $3)], joinType=[inner])
LogicalTableScan(table=[[_DataSetTable_0]])
LogicalAggregate(group=[{0}])
  LogicalProject(b=[$1])
LogicalFilter(condition=[AND(>=($1, 6), <=($1, 9))])
  LogicalTableScan(table=[[_DataSetTable_0]])
{noformat}
So, it's implemented via the first approach.

In the first approach it's not clear to me where we will get a reference to the 
first (left) table, as we invoke the IN method on expressions like 'column. But I 
haven't thought it through well yet.

[1] I'm sorry, I forgot to change the code to this:
{noformat}
val in: RexSubQuery = RexSubQuery.in(table.getRelNode, new 
ImmutableList.Builder[RexNode]().add(children.map(_.toRexNode): _*).build())
relBuilder.call(SqlStdOperatorTable.IN, in)
{noformat}
In this case the plan shown above will be generated.



[jira] [Commented] (FLINK-4565) Support for SQL IN operator

2016-11-25 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696107#comment-15696107
 ] 

Nikolay Vasilishin commented on FLINK-4565:
---

[~fhueske], I've finally opened the PR, 
[https://github.com/apache/flink/pull/2870].

Now, about subqueries.
As mentioned above, there are a couple of ways to implement them: 
 - create a table with an inner join somewhere at the beginning
 - try to use Calcite's IN operator when constructing the RexNode.

I tried to implement the second approach. It looks the way it should 
([example|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-7c7c5dd5a5723b84b8d45424f04b2be5R68]).
In that case we need to [overload the in 
method|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-04d1bca648d7ee47ab9ce787c8d944a6R108]
 with a Table argument, then somehow construct a proper RexNode in the [toRexNode 
method|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-04d1bca648d7ee47ab9ce787c8d944a6R108]
 (for now it constructs a non-working plan, shown below). Then it will be 
passed to the 
[CodeGenerator|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-c0ee691580cf6752e6cb186ca4f0260dR980],
 where I have to implement the visitSubquery method. Actually, it's not hard to 
generate code for all of the subquery's nodes, but I don't know what code should 
be generated to link the subquery with the rest of the query.

A RexNode constructed this way has this logical plan:
{noformat}
LogicalFilter(condition=[IN(IN($2, {
LogicalProject(c=[$2])
  LogicalFilter(condition=[=($1, 6)])
LogicalTableScan(table=[[_DataSetTable_0]])
}))])
  LogicalTableScan(table=[[_DataSetTable_0]])
{noformat}
There is a duplicated IN call.

The logical plan generated for a similar query in the SQL API looks like this:
{noformat}
LogicalProject(a=[$0], c=[$2])
  LogicalJoin(condition=[=($1, $3)], joinType=[inner])
LogicalTableScan(table=[[_DataSetTable_0]])
LogicalAggregate(group=[{0}])
  LogicalProject(b=[$1])
LogicalFilter(condition=[AND(>=($1, 6), <=($1, 9))])
  LogicalTableScan(table=[[_DataSetTable_0]])
{noformat}
So, it's implemented via the first approach.

In the first approach it's not clear to me where we will get a reference to the 
first (left) table, as we invoke the IN method on expressions like 'column. But I 
haven't thought it through well yet.

> Support for SQL IN operator
> ---
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But 
> it should also be available in the Table API and tested.





[jira] [Commented] (FLINK-4565) Support for SQL IN operator

2016-11-24 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692849#comment-15692849
 ] 

Nikolay Vasilishin commented on FLINK-4565:
---

Hi, [~fhueske].

We decided just yesterday to open a pull request to get some feedback on 
what's been done and to show the way I'm working with subqueries. 
So, it's almost done.

> Support for SQL IN operator
> ---
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But 
> it should also be available in the Table API and tested.





[jira] [Commented] (FLINK-4565) Support for SQL IN operator

2016-11-15 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667527#comment-15667527
 ] 

Nikolay Vasilishin commented on FLINK-4565:
---

Thanks, guys.
I've implemented the first two cases: disjunctive equality predicates for fewer 
than 20 entries and a reusable HashSet for >= 20 entries.

Can you give any advice on where I can find anything in the code related to the 
third sub-issue? Where is the point at which I can access the subquery results 
and use them for the IN operator?

> Support for SQL IN operator
> ---
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But 
> it should also be available in the Table API and tested.





[jira] [Commented] (FLINK-4565) Support for SQL IN operator

2016-11-10 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653728#comment-15653728
 ] 

Nikolay Vasilishin commented on FLINK-4565:
---

Hi, [~jark],

1. Thanks, I'll look for it in the code.

2. In that case, can the function call be delegated to the [suffixFunctionCall => Call 
methods|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L226]?
Can you explain by which criteria some functions are implemented in 
their own "lazy val functions" (such as 
[SUM|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L169],
 
[IF|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L207],
 
[AVG|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L181])
 while others are called through the [suffixFunctionCall => Call 
methods|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L226]?

> Support for SQL IN operator
> ---
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But 
> it should also be available in the Table API and tested.





[jira] [Comment Edited] (FLINK-4565) Support for SQL IN operator

2016-11-08 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648140#comment-15648140
 ] 

Nikolay Vasilishin edited comment on FLINK-4565 at 11/8/16 5:11 PM:


Hi guys, I've run into some problems.
I now have the IN operator for literals; subqueries are not supported yet.
You can find my code [on my 
github|https://github.com/NickolayVasilishin/flink/tree/FLINK-4565].
The problems are:
#   I’m using a HashSet to check membership. The code is generated in 
[ScalarOperators.scala|https://github.com/apache/flink/compare/master...NickolayVasilishin:FLINK-4565#diff-423fbbd7967ec8e9feee7c1b7062b884R106].
 But creating the HashSet object and adding elements to it is placed in the 
body of the public void flatMap(..) method, which is invoked for every row, as I 
understand. The comment above 
[CodeGenerator#generateResultExpression|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/codegen/CodeGenerator.scala#L305]
 says that reusable code will be reused internally, but how can I check that it 
works properly?
#   The problem is in 
[ExpressionParser.scala|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala].
 Since I’ve implemented the matching pattern for the IN operator, it conflicts 
with the initCap() function ([in this 
test|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/test/scala/org/apache/flink/api/table/expressions/ScalarFunctionsTest.scala#L156]).
 During expression parsing it goes through the 
[ExpressionParser#functionIdent|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L79]
 method (where ‘not’-checks occur on operators such as AS, COUNT, IF and “my” 
IN), then gets into my [suffixIn 
method|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L194]
 and fails with an exception: 
{noformat}
Could not parse expression at column 6: `(' expected but `i' found f0.initCap().
{noformat}
I expected that the expression would fall through to the next check if the 
current one fails. 
Also, my check cannot be the last check in this chain.
So what are the ways to solve this problem? Maybe there is a way to make the 
matcher less greedy? The easiest solution, I think, would be to rename the IN 
operator to ISIN, as it is implemented in Spark.


I'd appreciate any help. Thanks in advance.
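The clash can be reproduced outside Flink: if a parser commits to the keyword `in` on a bare prefix match and then demands `(`, the identifier `initCap` fails at its second character. Requiring that the keyword not continue into a longer identifier resolves it. A toy sketch (hypothetical helper methods, not the actual ExpressionParser):

```java
public class KeywordPrefixDemo {
    // Greedy check: commits to the keyword on a bare prefix match,
    // so "initCap()" is wrongly claimed by "in" and parsing fails.
    static boolean greedyMatches(String expr, String keyword) {
        return expr.startsWith(keyword);
    }

    // Delimited check: after the keyword, the next character must not be able
    // to continue an identifier, so "initCap" falls through to other rules.
    static boolean delimitedMatches(String expr, String keyword) {
        if (!expr.startsWith(keyword)) return false;
        return expr.length() == keyword.length()
                || !Character.isJavaIdentifierPart(expr.charAt(keyword.length()));
    }

    public static void main(String[] args) {
        System.out.println(greedyMatches("initCap()", "in"));    // true: the conflict
        System.out.println(delimitedMatches("initCap()", "in")); // false: no conflict
        System.out.println(delimitedMatches("in(1,2,3)", "in")); // true
    }
}
```

In parser-combinator terms, the same effect is achieved by making the keyword rule fail (and backtrack) unless a non-identifier character follows, or by trying the longer alternative first.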




[jira] [Commented] (FLINK-4565) Support for SQL IN operator

2016-11-08 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648140#comment-15648140
 ] 

Nikolay Vasilishin commented on FLINK-4565:
---

Hi guys, I've run into some problems.
I now have the IN operator for literals; subqueries are not supported yet.
You can find my code [on my 
github|https://github.com/NickolayVasilishin/flink/tree/FLINK-4565].
The problems are:
#   I’m using a HashSet to check membership. The code is generated in 
[ScalarOperators.scala|https://github.com/apache/flink/compare/master...NickolayVasilishin:FLINK-4565#diff-423fbbd7967ec8e9feee7c1b7062b884R106].
 But creating the HashSet object and adding elements to it is placed in the 
body of the public void flatMap(..) method, which is invoked for every row, as I 
understand. The comment above 
[CodeGenerator#generateResultExpression|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/codegen/CodeGenerator.scala#L305]
 says that reusable code will be reused internally, but how can I check that it 
works properly?
#   The problem is in 
[ExpressionParser.scala|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala].
 Since I’ve implemented the matching pattern for the IN operator, it conflicts 
with the initCap() function ([in this 
test|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/test/scala/org/apache/flink/api/table/expressions/ScalarFunctionsTest.scala#L156]).
 During expression parsing it goes through the 
[ExpressionParser#functionIdent|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L79]
 method (where ‘not’-checks occur on operators such as AS, COUNT, IF and “my” 
IN), then gets into my [suffixIn 
method|https://github.com/NickolayVasilishin/flink/blob/FLINK-4565/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala#L194]
 and fails with an exception: 
{noformat}
Could not parse expression at column 6: `(' expected but `i' found f0.initCap().
{noformat}
I expected that the expression would fall through to the next check if the 
current one fails. 
Also, my check cannot be the last check in this chain.
So what are the ways to solve this problem? Maybe there is a way to make the 
matcher less greedy? The easiest solution, I think, would be to rename the IN 
operator to ISIN, as it is implemented in Spark.


I'd appreciate any help. Thanks in advance.


> Support for SQL IN operator
> ---
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But 
> it should also be available in the Table API and tested.





[jira] [Commented] (FLINK-4492) Cleanup files from canceled snapshots

2016-11-07 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644006#comment-15644006
 ] 

Nikolay Vasilishin commented on FLINK-4492:
---

I propose closing this issue, as it seems to have been resolved already.

> Cleanup files from canceled snapshots
> -
>
> Key: FLINK-4492
> URL: https://issues.apache.org/jira/browse/FLINK-4492
> Project: Flink
>  Issue Type: Bug
>Reporter: Stefan Richter
>Assignee: Nikolay Vasilishin
>Priority: Minor
>
> Current checkpointing only closes CheckpointStateOutputStreams on cancel, but 
> incomplete files are not properly deleted from the filesystem.





[jira] [Commented] (FLINK-4565) Support for SQL IN operator

2016-10-24 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601503#comment-15601503
 ] 

Nikolay Vasilishin commented on FLINK-4565:
---

Hi [~jark].
Yes, I'm working on it right now.

> Support for SQL IN operator
> ---
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But 
> it should also be available in the Table API and tested.





[jira] [Commented] (FLINK-4492) Cleanup files from canceled snapshots

2016-10-17 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581831#comment-15581831
 ] 

Nikolay Vasilishin commented on FLINK-4492:
---

[~srichter] Does this issue still persist? 
I've run some tests to reproduce this bug, but the checkpoint directory was 
empty afterwards. 
If I'm not mistaken, 
{noformat}
/**
 * If the stream is only closed, we remove the produced file (cleanup through
 * the auto close feature, for example). This method throws no exception if
 * the deletion fails, but only logs the error.
 */
@Override
public void close() {
    if (!closed) {
        closed = true;
        if (outStream != null) {
            try {
                outStream.close();
                fs.delete(statePath, false);

                // attempt to delete the parent (will fail and be ignored if the parent has more files)
                try {
                    fs.delete(basePath, false);
                } catch (IOException ignored) {}
            }
            catch (Exception e) {
                LOG.warn("Cannot delete closed and discarded state stream for " + statePath, e);
            }
        }
    }
}
{noformat}
this method (in 
{{org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory}}) properly 
deletes all checkpoint files. 
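To illustrate the cleanup-on-close pattern discussed above in isolation, here is a hypothetical, self-contained sketch (not Flink's actual class; the names {{CleanupOnCloseStream}} and {{closeAndGetPath}} are invented for illustration): a state output stream deletes its backing file when it is closed without the caller having taken the result, so a canceled snapshot leaves no file behind.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of cleanup-on-close: a plain close() (e.g. via
// try-with-resources when a snapshot is canceled) removes the produced file,
// while closeAndGetPath() marks the file as a completed snapshot and keeps it.
public class CleanupOnCloseStream implements AutoCloseable {

    private final Path statePath;
    private final OutputStream out;
    private boolean handleTaken = false;
    private boolean closed = false;

    public CleanupOnCloseStream(Path statePath) throws IOException {
        this.statePath = statePath;
        this.out = Files.newOutputStream(statePath);
    }

    public void write(byte[] data) throws IOException {
        out.write(data);
    }

    /** Marks the written file as a completed snapshot; close() will keep it. */
    public Path closeAndGetPath() throws IOException {
        handleTaken = true;
        out.close();
        return statePath;
    }

    /** Plain close removes the file unless the handle was taken. */
    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true;
            out.close();
            if (!handleTaken) {
                // Best-effort cleanup; a real implementation would only log failures.
                Files.deleteIfExists(statePath);
            }
        }
    }
}
```

Under this pattern, canceling a snapshot is simply closing the stream without taking the handle, which matches the cleanup behavior of the quoted {{close()}} method.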

> Cleanup files from canceled snapshots
> -
>
> Key: FLINK-4492
> URL: https://issues.apache.org/jira/browse/FLINK-4492
> Project: Flink
>  Issue Type: Bug
>Reporter: Stefan Richter
>Assignee: Nikolay Vasilishin
>Priority: Minor
>
> Current checkpointing only closes CheckpointStateOutputStreams on cancel, but 
> incomplete files are not properly deleted from the filesystem.





[jira] [Commented] (FLINK-4492) Cleanup files from canceled snapshots

2016-10-12 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568220#comment-15568220
 ] 

Nikolay Vasilishin commented on FLINK-4492:
---

This issue has been resolved since [FLINK-3761] (commit 
4809f5367b08a9734fc1bd4875be51a9f3bb65aa). 
Canceling a job now cleans up the checkpoint directory (via the 
FsCheckpointStateOutputStream#close() method).

> Cleanup files from canceled snapshots
> -
>
> Key: FLINK-4492
> URL: https://issues.apache.org/jira/browse/FLINK-4492
> Project: Flink
>  Issue Type: Bug
>Reporter: Stefan Richter
>Assignee: Nikolay Vasilishin
>Priority: Minor
>
> Current checkpointing only closes CheckpointStateOutputStreams on cancel, but 
> incomplete files are not properly deleted from the filesystem.





[jira] [Assigned] (FLINK-4492) Cleanup files from canceled snapshots

2016-10-06 Thread Nikolay Vasilishin (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Vasilishin reassigned FLINK-4492:
-

Assignee: Nikolay Vasilishin

> Cleanup files from canceled snapshots
> -
>
> Key: FLINK-4492
> URL: https://issues.apache.org/jira/browse/FLINK-4492
> Project: Flink
>  Issue Type: Bug
>Reporter: Stefan Richter
>Assignee: Nikolay Vasilishin
>Priority: Minor
>
> Current checkpointing only closes CheckpointStateOutputStreams on cancel, but 
> incomplete files are not properly deleted from the filesystem.





[jira] [Commented] (FLINK-3204) TaskManagers are not shutting down properly on YARN

2016-10-05 Thread Nikolay Vasilishin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548643#comment-15548643
 ] 

Nikolay Vasilishin commented on FLINK-3204:
---

[~rmetzger], could you provide some additional information about the experiments 
you were running? Also, is this bug still reproducible? As far as I can see, 
the tests are now passing successfully, and all my attempts to break them to 
reproduce the issue have failed.

> TaskManagers are not shutting down properly on YARN
> ---
>
> Key: FLINK-3204
> URL: https://issues.apache.org/jira/browse/FLINK-3204
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Robert Metzger
>Assignee: Nikolay Vasilishin
>  Labels: test-stability
>
> While running some experiments on a YARN cluster, I saw the following error
> {code}
> 10:15:24,741 INFO  org.apache.flink.yarn.YarnJobManager   
>- Stopping YARN JobManager with status SUCCEEDED and diagnostic Flink YARN 
> Client requested shutdown.
> 10:15:24,748 INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  
>- Waiting for application to be successfully unregistered.
> 10:15:24,852 INFO  
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl  - 
> Interrupted while waiting for queue
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275)
> 10:15:24,875 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_10when 
> stopping NMClientImpl
> 10:15:24,899 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_07when 
> stopping NMClientImpl
> 10:15:24,954 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_06when 
> stopping NMClientImpl
> 10:15:24,982 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_09when 
> stopping NMClientImpl
> 10:15:25,013 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_11when 
> stopping NMClientImpl
> 10:15:25,037 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_08when 
> stopping NMClientImpl
> 10:15:25,041 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_12when 
> stopping NMClientImpl
> 10:15:25,072 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_05when 
> stopping NMClientImpl
> 10:15:25,075 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_03when 
> stopping NMClientImpl
> 10:15:25,077 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_04when 
> stopping NMClientImpl
> 10:15:25,079 ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl
>- Failed to stop Container container_1452019681933_0002_01_02when 
> stopping NMClientImpl
> 10:15:25,080 INFO  
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
> Closing proxy : cdh544-worker-0.c.astral-sorter-757.internal:8041
> 10:15:25,080 INFO  
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
> Closing proxy : cdh544-worker-1.c.astral-sorter-757.internal:8041
> 10:15:25,080 INFO  
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
> Closing proxy : cdh544-master.c.astral-sorter-757.internal:8041
> 10:15:25,080 INFO  
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
> Closing proxy : cdh544-worker-4.c.astral-sorter-757.internal:8041
> 10:15:25,081 INFO  
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
> Closing proxy : cdh544-worker-2.c.astral-sorter-757.internal:8041
> 10:15:25,081 INFO  
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
> Closing