[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-12 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319085#comment-17319085
 ] 

Jark Wu commented on FLINK-22201:
-

[~jamii], {{execution.runtime-mode=BATCH}} is a SQL Client configuration and 
takes effect only in the SQL Client. Flink doesn't pass this configuration through 
{{flink run}}.

> Incorrect output for simple sql query
> -
>
> Key: FLINK-22201
> URL: https://issues.apache.org/jira/browse/FLINK-22201
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / API
> Affects Versions: 1.12.2
> Environment: {code:bash}
> [nix-shell:~/streaming-consistency/flink]$ java -version
> openjdk version "1.8.0_265"
> OpenJDK Runtime Environment (build 1.8.0_265-ga)
> OpenJDK 64-Bit Server VM (build 25.265-bga, mixed mode)
> [nix-shell:~/streaming-consistency/flink]$ flink --version
> Version: 1.12.2, Commit ID: 4dedee0
> [nix-shell:~/streaming-consistency/flink]$ nix-info
> system: "x86_64-linux", multi-user?: yes, version: nix-env (Nix) 2.3.10, 
> channels(jamie): "", channels(root): "nixos-20.09.3554.f8929dce13e", nixpkgs: 
> /nix/var/nix/profiles/per-user/root/channels/nixos
> {code}
> Reporter: Jamie Brandon
> Priority: Major
> Attachments: flink-total-timeseries.png
>
>
> I'm running this simple query:
> {code:sql}
> CREATE VIEW credits AS
> SELECT
>     to_account AS account,
>     sum(amount) AS credits
> FROM
>     transactions
> GROUP BY
>     to_account;
> CREATE VIEW debits AS
> SELECT
>     from_account AS account,
>     sum(amount) AS debits
> FROM
>     transactions
> GROUP BY
>     from_account;
> CREATE VIEW balance AS
> SELECT
>     credits.account AS account,
>     credits - debits AS balance
> FROM
>     credits,
>     debits
> WHERE
>     credits.account = debits.account;
> CREATE VIEW total AS
> SELECT
>     sum(balance)
> FROM
>     balance;
> {code}
> The `total` view is a sanity check: its value should always be 0 because 
> money is only moved from one account to another, never created or destroyed.
> In streaming mode (code 
> [here|https://github.com/jamii/streaming-consistency/tree/a0f3b9d7ba178a7e184e6cb60e597a302dc3dd86/flink-table])
>  only about 0.04% of the output values are 0. The absolute error in the 
> outputs increases roughly linearly with the number of input transactions. 
> But after the inputs are finished it does return to 0.
> In batch mode (code 
> [here|https://github.com/jamii/streaming-consistency/tree/d3288e27649174c7463829c726be514610bbd056/flink])
>  it produces 0 for a while but then has large jumps to incorrect outputs and 
> never returns to 0. In this run, the first ~44% of the outputs are correct 
> but the final answer is -48811 which amounts to miscounting ~5% of the inputs.
> I also run a variant of that query which joins on event time. In streaming 
> mode it produces similar results to the original. In batch mode only 2 out of 
> 1718375 outputs were correct and the final error was similar to the original 
> query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Jamie Brandon (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319037#comment-17319037
 ] 

Jamie Brandon commented on FLINK-22201:
---

Ok, thanks.

Should I open a separate issue for the behavior with the mixed settings, where 
it does not finish at 0?



[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319034#comment-17319034
 ] 

Jark Wu commented on FLINK-22201:
-

I think this behavior is expected, because the total transaction amount is 
increasing. Because the records of credits and debits are independent records, 
the streaming output will either subtract or add the total transaction amount 
first. If the input stream stops, the final result should be 0. 
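
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Flink's actual operator pipeline: each transaction is split into one credit update and one debit update, and the two reach the final sum independently and in arbitrary order (modelled here with a seeded shuffle). The function name `running_totals` and the synthetic transactions are hypothetical.

```python
import random


def running_totals(transactions, seed=0):
    """Toy model: return the running total after every independent update."""
    updates = []
    for _from_account, _to_account, amount in transactions:
        updates.append(+amount)  # credit side of the transaction
        updates.append(-amount)  # debit side of the same transaction
    # Assumption: the two sides are processed independently, in no fixed order.
    random.Random(seed).shuffle(updates)

    total = 0
    totals = []
    for delta in updates:
        total += delta
        totals.append(total)
    return totals


# Synthetic input: 1000 transfers of amount 1 between five accounts.
transactions = [(i % 5, (i + 1) % 5, 1) for i in range(1000)]
totals = running_totals(transactions)

zero_share = sum(t == 0 for t in totals) / len(totals)
print("final total:", totals[-1])
print("share of intermediate outputs equal to 0:", zero_share)
```

In this model the intermediate totals are usually non-zero, but the last one is always 0, because by then every credit has been matched by its debit.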



[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Jamie Brandon (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319030#comment-17319030
 ] 

Jamie Brandon commented on FLINK-22201:
---

In the branch where I believed I was running in batch mode, but had mistakenly 
hardcoded streaming mode, the output does not finish at 0.

{code:bash}
jamie@machine:~/streaming-consistency/flink$ tail tmp/total
delete -49042.0
insert -48813.0
delete -48813.0
insert -49042.0
delete -49042.0
insert -48812.0
delete -48812.0
insert -49042.0
delete -49042.0
insert -48811.0
{code}
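
The {{delete}}/{{insert}} pairs above read like a retraction changelog: each update withdraws the previous value and emits a new one. As a minimal sketch (the helper name `replay` is hypothetical, and the line format is assumed to be exactly `<op> <value>` as in the excerpt), the rows can be replayed to find the value that is still live at the end:

```python
from collections import Counter


def replay(changelog_lines):
    """Apply insert/delete rows to a multiset and return the values still live."""
    live = Counter()
    for line in changelog_lines:
        op, value = line.split()
        if op == "insert":
            live[float(value)] += 1
        elif op == "delete":
            live[float(value)] -= 1
        else:
            raise ValueError(f"unknown changelog op: {op!r}")
    # A negative count means the matching insert happened before this excerpt.
    return sorted(value for value, count in live.items() if count > 0)


tail = """\
delete -49042.0
insert -48813.0
delete -48813.0
insert -49042.0
delete -49042.0
insert -48812.0
delete -48812.0
insert -49042.0
delete -49042.0
insert -48811.0
""".splitlines()

print(replay(tail))  # [-48811.0]
```

The only value left after replaying the excerpt is -48811.0, the final answer quoted in the issue description.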

If I remove the `-Dexecution.runtime-mode=BATCH` then it does finish at 0.

Perhaps I should file a different bug for this? The outcome might just be that 
this combination of settings should produce an error instead of running.

---

For the original streaming version, it does finish at 0 if the input stops, but 
if the inputs are unbounded then the error increases over time:

 !flink-total-timeseries.png! 

Am I right in understanding that this behavior is expected?



[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319023#comment-17319023
 ] 

Kurt Young commented on FLINK-22201:


Yes, for streaming mode, we expect the final result to finish at 0 when all 
the inputs are finished and stopped. For batch mode, we expect one single 
output of 0 after the sources are finished. They are *eventually consistent*. 



[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Jamie Brandon (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319021#comment-17319021
 ] 

Jamie Brandon commented on FLINK-22201:
---

Ah, I misunderstood the documentation there. I will try setting it correctly.

In the meantime though, if this version is still in streaming mode then it's a 
problem that it doesn't finish at 0? Aside from the batch mode setting in 
'flink run', the only difference from the original streaming version is that 
the input topic is filled before 'flink run' rather than after. Let me try and 
see if that is enough to reproduce the incorrect result.



[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319020#comment-17319020
 ] 

Kurt Young commented on FLINK-22201:


[~jamii] You didn't really enable batch execution mode, because you hard-coded 
the table environment with streaming mode in

[https://github.com/jamii/streaming-consistency/blob/d3288e27649174c7463829c726be514610bbd056/flink/src/main/java/Demo.java#L22]



[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Jamie Brandon (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319016#comment-17319016
 ] 

Jamie Brandon commented on FLINK-22201:
---

You can see the jar is run in batch mode 
[here|https://github.com/jamii/streaming-consistency/blob/d3288e27649174c7463829c726be514610bbd056/flink/run.sh#L106].
 The output is produced by this sink 
[here|https://github.com/jamii/streaming-consistency/blob/d3288e27649174c7463829c726be514610bbd056/flink/src/main/java/Demo.java#L157-L174].
 The inputs are produced by [this 
code|https://github.com/jamii/streaming-consistency/blob/d3288e27649174c7463829c726be514610bbd056/transactions.py]
 and look like this:

{code:bash}
jamie@machine:~/streaming-consistency/flink$ head tmp/transactions
{"id": 12584, "from_account": 0, "to_account": 7, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 12219, "from_account": 2, "to_account": 2, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 16318, "from_account": 3, "to_account": 7, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 8891, "from_account": 5, "to_account": 3, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 13892, "from_account": 3, "to_account": 1, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 5316, "from_account": 1, "to_account": 2, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 5271, "from_account": 7, "to_account": 8, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 13410, "from_account": 2, "to_account": 0, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 16192, "from_account": 8, "to_account": 1, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{"id": 7090, "from_account": 1, "to_account": 3, "amount": 1, "ts": "2021-01-01 00:00:00.000"}
{code}
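
For reference, a single batch-style evaluation of the four views can be sketched in plain Python. This is an illustration of the query's semantics as I read them, not Flink code; the function name `total_balance` is hypothetical, and the rows are the ten sample transactions above reduced to the three fields the query uses.

```python
from collections import defaultdict


def total_balance(transactions):
    """Evaluate credits, debits, balance and total once over a finished input."""
    credits = defaultdict(int)
    debits = defaultdict(int)
    for t in transactions:
        credits[t["to_account"]] += t["amount"]    # view: credits
        debits[t["from_account"]] += t["amount"]   # view: debits
    # view: balance (inner join, so only accounts present on both sides survive)
    joined = credits.keys() & debits.keys()
    balance = {account: credits[account] - debits[account] for account in joined}
    # view: total
    return sum(balance.values())


sample = [
    {"from_account": 0, "to_account": 7, "amount": 1},
    {"from_account": 2, "to_account": 2, "amount": 1},
    {"from_account": 3, "to_account": 7, "amount": 1},
    {"from_account": 5, "to_account": 3, "amount": 1},
    {"from_account": 3, "to_account": 1, "amount": 1},
    {"from_account": 1, "to_account": 2, "amount": 1},
    {"from_account": 7, "to_account": 8, "amount": 1},
    {"from_account": 2, "to_account": 0, "amount": 1},
    {"from_account": 8, "to_account": 1, "amount": 1},
    {"from_account": 1, "to_account": 3, "amount": 1},
]

print(total_balance(sample))  # 1
```

On these ten rows alone the result is 1 rather than 0, because account 5 has a debit but no credit yet and the inner join drops it. The zero-sum check only holds once every account appears on both sides of the join.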

If you follow the instructions in the 
[readme|https://github.com/jamii/streaming-consistency/blob/d3288e27649174c7463829c726be514610bbd056/flink/README.md]
 you should be able to reproduce these results.




[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319009#comment-17319009
 ] 

Jark Wu commented on FLINK-22201:
-

Hi [~jamii], according to your description, in streaming mode the final result 
is 0. That means streaming mode is correct, because Flink streaming mode 
can only provide eventual consistency. 

Regarding batch mode, sorry, I don't fully understand this. From my 
understanding, batch mode should produce output only once, and that output should 
contain only one record. However, according to your description, batch mode 
outputs many records, which sounds like streaming mode. 



[jira] [Commented] (FLINK-22201) Incorrect output for simple sql query

2021-04-11 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318974#comment-17318974
 ] 

Kurt Young commented on FLINK-22201:


[~jamii] Thanks for the report. Could you provide some example data that can 
help us find the bug? The query is so simple that I can't recall any 
potential bug around it. 
