[GitHub] flink pull request: FLINK-3197: InputStream not closed in BinaryIn...
Github user ramkrish86 commented on a diff in the pull request: https://github.com/apache/flink/pull/1494#discussion_r49168115

--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java ---

```
@@ -213,9 +213,10 @@ protected SequentialStatistics createStatistics(List files, FileBase
 			FSDataInputStream fdis = file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
 			fdis.seek(file.getLen() - blockInfo.getInfoSize());
-
+
 			blockInfo.read(new DataInputViewStreamWrapper(fdis));
 			totalCount += blockInfo.getAccumulatedRecordCount();
+			fdis.close();
```

--- End diff --

Generally it is expected that these types of IO streams/files are enclosed in a try/finally block.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
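The try/finally pattern the reviewer suggests is nowadays usually written with try-with-resources (Java 7+). A minimal, self-contained sketch of the pattern, using plain `java.io` stand-ins rather than the Flink classes (class and method names here are illustrative, not Flink API):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TryWithResourcesSketch {

	// Reads the trailing 4-byte int of a buffer, loosely analogous to seeking
	// to a block-info footer. The try-with-resources block guarantees the
	// stream is closed even if readInt() throws.
	static int readLastInt(byte[] data) {
		try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
			in.skipBytes(data.length - 4);
			return in.readInt();
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		} // in.close() runs automatically here, on both paths
	}

	public static void main(String[] args) {
		byte[] buf = {0, 0, 0, 0, 0, 0, 0, 42}; // last 4 bytes encode 42
		System.out.println(readLastInt(buf));
	}
}
```

The same shape applies to `FSDataInputStream`, which also implements `Closeable`.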
[jira] [Commented] (FLINK-3197) InputStream not closed in BinaryInputFormat#createStatistics
[ https://issues.apache.org/jira/browse/FLINK-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088905#comment-15088905 ]

ASF GitHub Bot commented on FLINK-3197:
---------------------------------------

Github user ramkrish86 commented on a diff in the pull request: https://github.com/apache/flink/pull/1494#discussion_r49168115

Generally it is expected that these types of IO streams/files are enclosed in a try/finally block.

> InputStream not closed in BinaryInputFormat#createStatistics
> ------------------------------------------------------------
>
>                 Key: FLINK-3197
>                 URL: https://issues.apache.org/jira/browse/FLINK-3197
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Minor
>
> Here is related code:
> {code}
> FSDataInputStream fdis = file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
> fdis.seek(file.getLen() - blockInfo.getInfoSize());
> blockInfo.read(new DataInputViewStreamWrapper(fdis));
> totalCount += blockInfo.getAccumulatedRecordCount();
> {code}
> fdis / wrapper should be closed upon leaving the method

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-1870) Reintroduce Indexed reader functionality to streaming
[ https://issues.apache.org/jira/browse/FLINK-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089649#comment-15089649 ]

ASF GitHub Bot commented on FLINK-1870:
---------------------------------------

Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170083592

I am a big-time skeptic when it comes to putting this into the `RuntimeContext`. If we really need to expose that beyond the `StreamInputProcessor`, it would probably be best to pass this into the operator, as part of the "processRecord()" method. Since Storm bolts are implemented as custom operators, they should be able to pick it up from there.

> Reintroduce Indexed reader functionality to streaming
> -----------------------------------------------------
>
>                 Key: FLINK-1870
>                 URL: https://issues.apache.org/jira/browse/FLINK-1870
>             Project: Flink
>          Issue Type: Task
>          Components: Streaming
>            Reporter: Gyula Fora
>            Assignee: Matthias J. Sax
>            Priority: Minor
>
> The indexed record reader classes (IndexedReaderIterator, IndexedMutableReader) were introduced to allow the streaming operators to access the index of the last read channel from the input gate. This was a necessary step toward future input sorting operators.
> Unfortunately this untested feature was patched away by the following commit: https://github.com/apache/flink/commit/5232c56b13b7e13e2cf10dbe818c221f5557426d
> At some point we need to reimplement these features with proper tests.
[jira] [Commented] (FLINK-1870) Reintroduce Indexed reader functionality to streaming
[ https://issues.apache.org/jira/browse/FLINK-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089600#comment-15089600 ]

ASF GitHub Bot commented on FLINK-1870:
---------------------------------------

Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170073172

If you implement a Bolt you can retrieve this information from Storm. Thus, the compatibility layer should provide this information, too.
[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170073172

If you implement a Bolt you can retrieve this information from Storm. Thus, the compatibility layer should provide this information, too.
[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170083592

I am a big-time skeptic when it comes to putting this into the `RuntimeContext`. If we really need to expose that beyond the `StreamInputProcessor`, it would probably be best to pass this into the operator, as part of the "processRecord()" method. Since Storm bolts are implemented as custom operators, they should be able to pick it up from there.
[jira] [Updated] (FLINK-2765) Upgrade hbase version for hadoop-2 to 1.1 release
[ https://issues.apache.org/jira/browse/FLINK-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated FLINK-2765:
--------------------------

> Upgrade hbase version for hadoop-2 to 1.1 release
> -------------------------------------------------
>
>                 Key: FLINK-2765
>                 URL: https://issues.apache.org/jira/browse/FLINK-2765
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Ted Yu
>            Priority: Minor
>
> Currently 0.98.11 is used:
> {code}
> 0.98.11-hadoop2
> {code}
> The stable release line for hadoop-2 is 1.1.x.
> We should upgrade to 1.2 (soon to be released)
[GitHub] flink pull request: [FLINK-3058] Add support for Kafka 0.9.0.0
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1489#issuecomment-170103132

There are some build instabilities with the new Kafka 0.9 code. I'll look into it soon.
[jira] [Commented] (FLINK-3058) Add Kafka consumer for new 0.9.0.0 Kafka API
[ https://issues.apache.org/jira/browse/FLINK-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089798#comment-15089798 ]

ASF GitHub Bot commented on FLINK-3058:
---------------------------------------

Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1489#issuecomment-170103132

There are some build instabilities with the new Kafka 0.9 code. I'll look into it soon.

> Add Kafka consumer for new 0.9.0.0 Kafka API
> --------------------------------------------
>
>                 Key: FLINK-3058
>                 URL: https://issues.apache.org/jira/browse/FLINK-3058
>             Project: Flink
>          Issue Type: New Feature
>          Components: Kafka Connector
>    Affects Versions: 1.0.0
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>
> The Apache Kafka project is about to release a new consumer API. They also changed their internal protocol, so Kafka 0.9.0.0 users will need an updated consumer from Flink.
> Also, I would like Flink to be among the first stream processors supporting Kafka 0.9.0.0.
[GitHub] flink pull request: [FLINK-1994] [ml] Add different gain calculati...
Github user rawkintrevo commented on the pull request: https://github.com/apache/flink/pull/1397#issuecomment-170162685

On your second point: I agree that an Enum would be better for human-readable variable setting, but if one were doing a grid parameter search it would be easier to search over a range of integers than to establish an array of specifically named variables. Not significantly harder, but still slightly more tedious. Further, since I'm just learning Scala, limited research/understanding leads me to believe that 1) Scala doesn't have an enum that works just like Java's, and 2) changing the cases to more informative string names would achieve the desired goal. In the meantime I'm working on documentation and reverting the indentation (which I thought I had already done); thanks for bearing with me as I am still learning.
[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD
[ https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090246#comment-15090246 ]

ASF GitHub Bot commented on FLINK-1994:
---------------------------------------

Github user rawkintrevo commented on the pull request: https://github.com/apache/flink/pull/1397#issuecomment-170162685

On your second point: I agree that an Enum would be better for human-readable variable setting, but if one were doing a grid parameter search it would be easier to search over a range of integers than to establish an array of specifically named variables. Not significantly harder, but still slightly more tedious. Further, since I'm just learning Scala, limited research/understanding leads me to believe that 1) Scala doesn't have an enum that works just like Java's, and 2) changing the cases to more informative string names would achieve the desired goal. In the meantime I'm working on documentation and reverting the indentation (which I thought I had already done); thanks for bearing with me as I am still learning.

> Add different gain calculation schemes to SGD
> ---------------------------------------------
>
>                 Key: FLINK-1994
>                 URL: https://issues.apache.org/jira/browse/FLINK-1994
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Trevor Grant
>            Priority: Minor
>              Labels: ML, Starter
>
> The current SGD implementation uses as gain for the weight updates the formula {{stepsize/sqrt(iterationNumber)}}. It would be good to make the gain calculation configurable and to provide different strategies for that. For example:
> * stepsize/(1 + iterationNumber)
> * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4)
> See also how to properly select the gains [1].
> Resources:
> [1] http://arxiv.org/pdf/1107.2490.pdf
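The gain schedules discussed in the issue description are simple closed-form functions of the iteration number. A sketch of the three formulas (the method names are illustrative; this is not the FlinkML implementation):

```java
public class GainSchedules {

	// Current default: stepsize / sqrt(t)
	static double defaultGain(double stepsize, long t) {
		return stepsize / Math.sqrt(t);
	}

	// Proposed alternative: stepsize / (1 + t)
	static double inverseGain(double stepsize, long t) {
		return stepsize / (1 + t);
	}

	// Proposed alternative: stepsize * (1 + regularization * stepsize * t)^(-3/4),
	// as suggested in the referenced paper (arXiv:1107.2490)
	static double decayingGain(double stepsize, double regularization, long t) {
		return stepsize * Math.pow(1 + regularization * stepsize * t, -0.75);
	}
}
```

Making the schedule configurable would then amount to selecting one of these functions per weight update.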
[GitHub] flink pull request: [FLINK-3192] Add explain support to print ast ...
Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1477#issuecomment-169967055

@ChengXiangLi @fhueske @rmetzger, thanks a lot for your suggestions. The code has been modified.
[jira] [Commented] (FLINK-3192) Add explain support to print ast and sql physical execution plan.
[ https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089075#comment-15089075 ]

ASF GitHub Bot commented on FLINK-3192:
---------------------------------------

Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1477#issuecomment-169967055

@ChengXiangLi @fhueske @rmetzger, thanks a lot for your suggestions. The code has been modified.

> Add explain support to print ast and sql physical execution plan.
> ------------------------------------------------------------------
>
>                 Key: FLINK-3192
>                 URL: https://issues.apache.org/jira/browse/FLINK-3192
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API
>            Reporter: GaoLun
>            Assignee: GaoLun
>            Priority: Minor
>              Labels: features
>
> Table API doesn't support sql-explanation now. Add the explain support to print the ast (abstract syntax tree) and the physical execution plan of sql.
[jira] [Created] (FLINK-3211) Add AWS Kinesis streaming connector
Tzu-Li (Gordon) Tai created FLINK-3211:
---------------------------------------

             Summary: Add AWS Kinesis streaming connector
                 Key: FLINK-3211
                 URL: https://issues.apache.org/jira/browse/FLINK-3211
             Project: Flink
          Issue Type: New Feature
          Components: Streaming Connectors
            Reporter: Tzu-Li (Gordon) Tai
             Fix For: 1.0.0

AWS Kinesis is a widely adopted message queue used by AWS users, much like a cloud service version of Apache Kafka. Support for AWS Kinesis will be a great addition to the handful of Flink's streaming connectors to external systems and a great way to reach out to the AWS community.

After a first look at the AWS KCL (Kinesis Client Library), KCL already supports stream reads beginning from a specific offset (or "record sequence number" in Kinesis terminology). For external checkpointing, KCL is designed to use AWS DynamoDB to checkpoint application state, where each partition's progress (or "shard" in Kinesis terminology) corresponds to a single row in the KCL-managed DynamoDB table.

So, implementing the AWS Kinesis connector will very much resemble the work done on the Kafka connector, with a few different tweaks as follows (I'm mainly just rewording [~StephanEwen]'s original description [1]):

1. Determine the KCL shard worker to Flink source task mapping. KCL already offers worker tasks per shard, so we will need to do a mapping much like [2].
2. Let the Flink connector also maintain a local copy of application state, accessed using the KCL API, for the distributed snapshot checkpointing.
3. Restart the KCL at the last Flink local checkpointed record sequence upon failure. However, when KCL restarts after failure, it is originally designed to reference the external DynamoDB table. This needs a further look at how to keep the Flink checkpoint and the external checkpoint in DynamoDB properly synced.

Most of the details regarding KCL's state checkpointing, sharding, shard workers, and failure recovery can be found here [3].

As for the Kinesis sink connector, it should be fairly straightforward and almost, if not completely, identical to the Kafka sink.

References:
[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-td2872.html
[2] http://data-artisans.com/kafka-flink-a-practical-how-to/
[3] http://docs.aws.amazon.com/kinesis/latest/dev/advanced-consumers.html
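The shard-to-source-task mapping in point 1 could follow the same deterministic modulo scheme the Kafka connector uses for partitions, so that no coordination between subtasks is needed. A rough sketch with purely illustrative names (not Flink or KCL API):

```java
import java.util.ArrayList;
import java.util.List;

public class ShardAssignmentSketch {

	// Each parallel source subtask deterministically picks the shards whose
	// index maps to it; every subtask computes the same global assignment.
	static List<Integer> assignShards(int totalShards, int numSubtasks, int subtaskIndex) {
		List<Integer> assigned = new ArrayList<>();
		for (int shard = 0; shard < totalShards; shard++) {
			if (shard % numSubtasks == subtaskIndex) {
				assigned.add(shard);
			}
		}
		return assigned;
	}
}
```

For example, with 5 shards and 2 subtasks, subtask 0 would read shards 0, 2, and 4, and subtask 1 would read shards 1 and 3. Kinesis shard splits/merges would complicate this, which is one of the open questions above.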
[GitHub] flink pull request: [FLINK-1994] [ml] Add different gain calculati...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/1397#issuecomment-170200015

Hi @rawkintrevo, you can convert a Scala Enumeration value to an Int like the following:

```scala
object Parameter extends Enumeration {
  type Parameter = Value
  val Param1, Param2, Param3, Param4 = Value
}

val convertedInt: Int = Parameter.Param1.id
```

Also, these values can be caught by pattern matching:

```scala
val param = getSomeEnumValues()

param match {
  case Parameter.Param1 => // blahblah
  case Parameter.Param2 => // blahblah
  case Parameter.Param3 => // blahblah
  case Parameter.Param4 => // blahblah
}
```
[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD
[ https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090429#comment-15090429 ]

ASF GitHub Bot commented on FLINK-1994:
---------------------------------------

Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/1397#issuecomment-170200015

Hi @rawkintrevo, you can convert a Scala Enumeration value to an Int like the following:

```scala
object Parameter extends Enumeration {
  type Parameter = Value
  val Param1, Param2, Param3, Param4 = Value
}

val convertedInt: Int = Parameter.Param1.id
```

Also, these values can be caught by pattern matching:

```scala
val param = getSomeEnumValues()

param match {
  case Parameter.Param1 => // blahblah
  case Parameter.Param2 => // blahblah
  case Parameter.Param3 => // blahblah
  case Parameter.Param4 => // blahblah
}
```
[GitHub] flink pull request: [FLINK-3192] Add explain support to print ast ...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1477#discussion_r49183864

--- Diff: flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/explain/PlanJsonParser.java ---

```java
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.table.explain;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.DeserializationFeature;
+
+import java.io.PrintWriter;
+import java.io.StringWriter;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+public class PlanJsonParser {
+
+	public static String getSqlExecutionPlan(String t, boolean extended) throws Exception{
+		StringWriter sw = new StringWriter();
+		PrintWriter pw = new PrintWriter(sw);
+		buildSqlExecutionPlan(t, extended, pw);
+		pw.close();
+		return sw.toString();
+	}
+
+	private static void printTab(int tabCount, PrintWriter pw) {
+		for (int i = 0; i < tabCount; i++)
+			pw.print("\t");
+	}
+
+	private static void buildSqlExecutionPlan(String t, Boolean extended, PrintWriter pw) throws Exception {
```

--- End diff --

I think we can merge `buildSqlExecutionPlan` and `getSqlExecutionPlan` into a single method.
[GitHub] flink pull request: [FLINK-3192] Add explain support to print ast ...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1477#discussion_r49183832

--- Diff: flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/explain/PlanJsonParser.java ---

```java
+package org.apache.flink.api.table.explain;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.DeserializationFeature;
+
+import java.io.PrintWriter;
+import java.io.StringWriter;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+public class PlanJsonParser {
+
+	public static String getSqlExecutionPlan(String t, boolean extended) throws Exception{
```

--- End diff --

Add space after `Exception`.
[jira] [Commented] (FLINK-3192) Add explain support to print ast and sql physical execution plan.
[ https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089141#comment-15089141 ]

ASF GitHub Bot commented on FLINK-3192:
---------------------------------------

Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1477#discussion_r49183864

--- Diff: flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/explain/PlanJsonParser.java ---

I think we can merge `buildSqlExecutionPlan` and `getSqlExecutionPlan` into a single method.
[jira] [Commented] (FLINK-3192) Add explain support to print ast and sql physical execution plan.
[ https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089138#comment-15089138 ]

ASF GitHub Bot commented on FLINK-3192:
---------------------------------------

Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1477#discussion_r49183832

--- Diff: flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/explain/PlanJsonParser.java ---

Add space after `Exception`.
[GitHub] flink pull request: [FLINK-3192] Add explain support to print ast ...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1477#issuecomment-169987537

I added two small comments in-line. But there is another thing that came to my mind. A `Table` can be constructed from any type of `DataSet`, not just a `DataSource`. So, it is possible that the input of a query is a partial DataSet program. In that case, the Explain output would include the DataSet program as well as the Table query. I am not sure if that is actually desirable. Do you think it is possible to limit the Explain output to the current query only? Or would you be in favor of showing the whole plan?
[jira] [Commented] (FLINK-3192) Add explain support to print ast and sql physical execution plan.
[ https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089155#comment-15089155 ]

ASF GitHub Bot commented on FLINK-3192:
---------------------------------------

Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1477#issuecomment-169987537

I added two small comments in-line. But there is another thing that came to my mind. A `Table` can be constructed from any type of `DataSet`, not just a `DataSource`. So, it is possible that the input of a query is a partial DataSet program. In that case, the Explain output would include the DataSet program as well as the Table query. I am not sure if that is actually desirable. Do you think it is possible to limit the Explain output to the current query only? Or would you be in favor of showing the whole plan?
[GitHub] flink pull request: [FLINK-2125][streaming] Delimiter change from ...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1491#discussion_r49204456

--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java ---

```java
@@ -84,25 +84,25 @@ public SocketTextStreamFunction(String hostname, int port, char delimiter, long
 	public void run(SourceContext ctx) throws Exception {
 		final StringBuilder buffer = new StringBuilder();
 		long attempt = 0;
-
+
 		while (isRunning) {
-
+
 			try (Socket socket = new Socket()) {
 				currentSocket = socket;
-
+
 				LOG.info("Connecting to server socket " + hostname + ':' + port);
 				socket.connect(new InetSocketAddress(hostname, port), CONNECTION_TIMEOUT_TIME);
 				BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));

-				int data;
-				while (isRunning && (data = reader.read()) != -1) {
-					// check if the string is complete
-					if (data != delimiter) {
-						buffer.append((char) data);
-					}
-					else {
-						// truncate trailing carriage return
-						if (delimiter == '\n' && buffer.length() > 0 && buffer.charAt(buffer.length() - 1) == '\r') {
+				char[] charBuffer = new char[Math.max(8192,delimiter.length()*2)];
+				int bytesRead;
+				while (isRunning && (bytesRead = reader.read(charBuffer)) != -1) {
+					String input = new String(charBuffer, 0, bytesRead);
+					int start = 0,pos;
+					while ((pos = input.indexOf(delimiter, start)) > -1) {
```

--- End diff --

What happens if `input` does not contain `delimiter`? And what happens with the input part after the last delimiter?
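The reviewer's concern is that a chunked read can end mid-record, so text after the last delimiter must survive until the next read completes it. One way to handle this (a standalone sketch of the carry-over idea, not the actual Flink fix) is to append each chunk to a buffer and only emit complete records:

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitSketch {

	private final StringBuilder buffer = new StringBuilder();
	private final String delimiter;

	DelimiterSplitSketch(String delimiter) {
		this.delimiter = delimiter;
	}

	// Accepts one chunk as read from the socket and returns the complete
	// records it closes; a trailing partial record stays in the buffer
	// until a later chunk supplies the next delimiter.
	List<String> feed(String chunk) {
		buffer.append(chunk);
		List<String> records = new ArrayList<>();
		int start = 0;
		int pos;
		while ((pos = buffer.indexOf(delimiter, start)) > -1) {
			records.add(buffer.substring(start, pos));
			start = pos + delimiter.length();
		}
		buffer.delete(0, start); // keep only the unfinished tail
		return records;
	}
}
```

For example, feeding `"foo\nba"` with delimiter `"\n"` yields only `"foo"`, and `"bar"` is completed by the next chunk `"r\n"`. A chunk with no delimiter at all simply grows the buffer and yields nothing, which answers the first question in the comment.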
[jira] [Commented] (FLINK-2125) String delimiter for SocketTextStream
[ https://issues.apache.org/jira/browse/FLINK-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089426#comment-15089426 ] ASF GitHub Bot commented on FLINK-2125: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1491#discussion_r49204456 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java --- @@ -84,25 +84,25 @@ public SocketTextStreamFunction(String hostname, int port, char delimiter, long public void run(SourceContext ctx) throws Exception { final StringBuilder buffer = new StringBuilder(); long attempt = 0; - + while (isRunning) { - + try (Socket socket = new Socket()) { currentSocket = socket; - + LOG.info("Connecting to server socket " + hostname + ':' + port); socket.connect(new InetSocketAddress(hostname, port), CONNECTION_TIMEOUT_TIME); BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream())); - int data; - while (isRunning && (data = reader.read()) != -1) { - // check if the string is complete - if (data != delimiter) { - buffer.append((char) data); - } - else { - // truncate trailing carriage return - if (delimiter == '\n' && buffer.length() > 0 && buffer.charAt(buffer.length() - 1) == '\r') { + char[] charBuffer = new char[Math.max(8192,delimiter.length()*2)]; + int bytesRead; + while (isRunning && (bytesRead = reader.read(charBuffer)) != -1) { + String input = new String(charBuffer, 0, bytesRead); + int start = 0,pos; + while ((pos = input.indexOf(delimiter, start)) > -1) { --- End diff -- What happens if `input` does not contain `delimiter`? And what happens with the input part after the last delimiter? 
> String delimiter for SocketTextStream > - > > Key: FLINK-2125 > URL: https://issues.apache.org/jira/browse/FLINK-2125 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 0.9 >Reporter: Márton Balassi >Priority: Minor > Labels: starter > > The SocketTextStreamFunction uses a character delimiter, despite other parts > of the API using String delimiter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
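Till's question above — what happens to the text after the last delimiter — is the classic chunked-read pitfall. A minimal, self-contained sketch (a hypothetical helper, not the PR's actual code) of the carry-over pattern that avoids discarding the trailing partial record:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the concern raised in the review:
// text after the last delimiter must be kept in a carry-over buffer,
// not discarded, so the next read can complete it.
public class DelimiterSplitter {
    private final StringBuilder buffer = new StringBuilder();

    /** Appends a newly read chunk and returns all complete records. */
    public List<String> feed(String chunk, String delimiter) {
        buffer.append(chunk);
        List<String> records = new ArrayList<>();
        int pos;
        while ((pos = buffer.indexOf(delimiter)) != -1) {
            records.add(buffer.substring(0, pos));
            buffer.delete(0, pos + delimiter.length());
        }
        // whatever remains in 'buffer' is the partial record after the
        // last delimiter; it survives until the next feed() call
        return records;
    }

    public String remainder() {
        return buffer.toString();
    }
}
```

Feeding `"ab##cd##ef"` with delimiter `"##"` emits `ab` and `cd`, while `ef` stays buffered until a later chunk completes it.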
[jira] [Commented] (FLINK-3197) InputStream not closed in BinaryInputFormat#createStatistics
[ https://issues.apache.org/jira/browse/FLINK-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089430#comment-15089430 ] ASF GitHub Bot commented on FLINK-3197: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1494#discussion_r49205022 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java --- @@ -213,9 +213,10 @@ protected SequentialStatistics createStatistics(List files, FileBase FSDataInputStream fdis = file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize()); fdis.seek(file.getLen() - blockInfo.getInfoSize()); - + blockInfo.read(new DataInputViewStreamWrapper(fdis)); totalCount += blockInfo.getAccumulatedRecordCount(); + fdis.close(); --- End diff -- +1, or to use the try with resources construct. > InputStream not closed in BinaryInputFormat#createStatistics > > > Key: FLINK-3197 > URL: https://issues.apache.org/jira/browse/FLINK-3197 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > Here is related code: > {code} > FSDataInputStream fdis = > file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize()); > fdis.seek(file.getLen() - blockInfo.getInfoSize()); > blockInfo.read(new DataInputViewStreamWrapper(fdis)); > totalCount += blockInfo.getAccumulatedRecordCount(); > {code} > fdis / wrapper should be closed upon leaving the method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
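The fix the reviewers converge on — closing the stream on every path — is exactly what Java's try-with-resources gives for free. A generic `java.io` sketch of the pattern (using `RandomAccessFile` as a stand-in; this is not Flink's `FSDataInputStream` code):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Minimal sketch (not Flink's actual code) of the pattern suggested by the
// reviewers: open the stream in a try-with-resources block so it is closed
// even if seek() or the read throws.
public class StatsReader {
    public static long readTrailingLong(String path, int infoSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(file.length() - infoSize); // analogous to fdis.seek(...)
            return file.readLong();              // analogous to blockInfo.read(...)
        } // file.close() runs here automatically, even on exception
    }
}
```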
[jira] [Commented] (FLINK-3189) Error while parsing job arguments passed by CLI
[ https://issues.apache.org/jira/browse/FLINK-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089368#comment-15089368 ] ASF GitHub Bot commented on FLINK-3189: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1493#issuecomment-170035218 Sure. Will do that. > Error while parsing job arguments passed by CLI > --- > > Key: FLINK-3189 > URL: https://issues.apache.org/jira/browse/FLINK-3189 > Project: Flink > Issue Type: Bug > Components: Command-line client >Affects Versions: 0.10.1 >Reporter: Filip Leczycki >Assignee: Matthias J. Sax >Priority: Minor > > Flink CLI treats job arguments provided in format "-" as its own > parameters, which results in errors in execution. > Example 1: > call: >bin/flink info myJarFile.jar -f flink -i -m 1 > error: Unrecognized option: -f > Example 2: > Job myJarFile.jar is uploaded to web submission client, flink parameter box > is empty > program arguments box: -f flink -i -m 1 > error: > An unexpected error occurred: > Unrecognized option: -f > org.apache.flink.client.cli.CliArgsException: Unrecognized option: -f > at > org.apache.flink.client.cli.CliFrontendParser.parseInfoCommand(CliFrontendParser.java:296) > at org.apache.flink.client.CliFrontend.info(CliFrontend.java:376) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:983) > at > org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:171) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965) > at > 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) > at > org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113) > at org.eclipse.jetty.server.Server.handle(Server.java:348) > at > org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596) > at > org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549) > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211) > at > org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425) > at > org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436) > at java.lang.Thread.run(Thread.java:745) > Execution of > >bin/flink run myJarFile.jar -f flink -i -m 1 > works perfectly fine -- This message was sent by Atlassian JIRA (v6.3.4#6332)
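The root cause described in the report is that `flink info` keeps parsing its own options past the user jar, so job arguments like `-f` collide with the CLI's parser. A toy sketch (not Flink's actual CLI code) of the stop-at-first-non-option convention that avoids this:

```java
import java.util.Arrays;

// Illustrative sketch of the fix direction implied by the report: everything
// after the user jar belongs to the job, so the CLI stops parsing its own
// options at the first non-option token (the jar file).
public class ArgSplitter {
    /** Returns {cliArgs, jobArgs}; the jar token itself is consumed. */
    static String[][] split(String[] args) {
        int jarIdx = 0;
        while (jarIdx < args.length && args[jarIdx].startsWith("-")) {
            jarIdx++; // skip the CLI's own options
        }
        String[] cliArgs = Arrays.copyOfRange(args, 0, jarIdx);
        String[] jobArgs = jarIdx < args.length
                ? Arrays.copyOfRange(args, jarIdx + 1, args.length)
                : new String[0];
        return new String[][]{cliArgs, jobArgs};
    }
}
```

With this convention, `info -v myJarFile.jar -f flink -i -m 1` hands `-v` to the CLI and the rest untouched to the job — mirroring why `run` already "works perfectly fine".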
[GitHub] flink pull request: [FLINK-2125][streaming] Delimiter change from ...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1491#issuecomment-170043300 Thanks for your contribution @ajaybhat. The PR contains many whitespace changes. It would be good to avoid them. I had a question concerning the implementation: it may be the case that the text after the last delimiter is discarded.
[GitHub] flink pull request: FLINK-1737: Kronecker product
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1078#issuecomment-170050950 Looks really good :-) Thanks for your contribution @daniel-pape. Will merge it.
[jira] [Created] (FLINK-3210) Unnecessary call to deserializer#deserialize() in LegacyFetcher#SimpleConsumerThread#run()
Ted Yu created FLINK-3210: - Summary: Unnecessary call to deserializer#deserialize() in LegacyFetcher#SimpleConsumerThread#run() Key: FLINK-3210 URL: https://issues.apache.org/jira/browse/FLINK-3210 Project: Flink Issue Type: Bug Reporter: Ted Yu Priority: Minor Here is related code: {code} byte[] valueBytes; if (payload == null) { deletedMessages++; valueBytes = null; } else { ... final T value = deserializer.deserialize(keyBytes, valueBytes, fp.topic, offset); {code} When valueBytes is null, there is no need to call deserializer#deserialize() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-1737: Kronecker product
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1078
[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089525#comment-15089525 ] ASF GitHub Bot commented on FLINK-1737: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1078 > Add statistical whitening transformation to machine learning library > > > Key: FLINK-1737 > URL: https://issues.apache.org/jira/browse/FLINK-1737 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library >Reporter: Till Rohrmann >Assignee: Daniel Pape > Labels: ML, Starter > > The statistical whitening transformation [1] is a preprocessing step for > different ML algorithms. It decorrelates the individual dimensions and sets > its variance to 1. > Statistical whitening should be implemented as a {{Transfomer}}. > Resources: > [1] [http://en.wikipedia.org/wiki/Whitening_transformation] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3197) InputStream not closed in BinaryInputFormat#createStatistics
[ https://issues.apache.org/jira/browse/FLINK-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089343#comment-15089343 ] ASF GitHub Bot commented on FLINK-3197: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/1494#discussion_r49199370 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java --- @@ -213,9 +213,10 @@ protected SequentialStatistics createStatistics(List files, FileBase FSDataInputStream fdis = file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize()); fdis.seek(file.getLen() - blockInfo.getInfoSize()); - + blockInfo.read(new DataInputViewStreamWrapper(fdis)); totalCount += blockInfo.getAccumulatedRecordCount(); + fdis.close(); --- End diff -- Agreed, we should try and consistently use a "try-with-resource" clause for streams. > InputStream not closed in BinaryInputFormat#createStatistics > > > Key: FLINK-3197 > URL: https://issues.apache.org/jira/browse/FLINK-3197 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > Here is related code: > {code} > FSDataInputStream fdis = > file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize()); > fdis.seek(file.getLen() - blockInfo.getInfoSize()); > blockInfo.read(new DataInputViewStreamWrapper(fdis)); > totalCount += blockInfo.getAccumulatedRecordCount(); > {code} > fdis / wrapper should be closed upon leaving the method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2125) String delimiter for SocketTextStream
[ https://issues.apache.org/jira/browse/FLINK-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089428#comment-15089428 ] ASF GitHub Bot commented on FLINK-2125: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1491#issuecomment-170043300 Thanks for your contribution @ajaybhat. The PR contains many whitespace changes. It would be good to avoid them. I had a question concerning the implementation: it may be the case that the text after the last delimiter is discarded. > String delimiter for SocketTextStream > - > > Key: FLINK-2125 > URL: https://issues.apache.org/jira/browse/FLINK-2125 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 0.9 >Reporter: Márton Balassi >Priority: Minor > Labels: starter > > The SocketTextStreamFunction uses a character delimiter, despite other parts > of the API using String delimiter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089481#comment-15089481 ] ASF GitHub Bot commented on FLINK-1737: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1078#issuecomment-170050950 Looks really good :-) Thanks for your contribution @daniel-pape. Will merge it. > Add statistical whitening transformation to machine learning library > > > Key: FLINK-1737 > URL: https://issues.apache.org/jira/browse/FLINK-1737 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library >Reporter: Till Rohrmann >Assignee: Daniel Pape > Labels: ML, Starter > > The statistical whitening transformation [1] is a preprocessing step for > different ML algorithms. It decorrelates the individual dimensions and sets > its variance to 1. > Statistical whitening should be implemented as a {{Transfomer}}. > Resources: > [1] [http://en.wikipedia.org/wiki/Whitening_transformation] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3189) Error while parsing job arguments passed by CLI
[ https://issues.apache.org/jira/browse/FLINK-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089344#comment-15089344 ] ASF GitHub Bot commented on FLINK-3189: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1493#issuecomment-170032469 Can you add a test case that guards this change? > Error while parsing job arguments passed by CLI > --- > > Key: FLINK-3189 > URL: https://issues.apache.org/jira/browse/FLINK-3189 > Project: Flink > Issue Type: Bug > Components: Command-line client >Affects Versions: 0.10.1 >Reporter: Filip Leczycki >Assignee: Matthias J. Sax >Priority: Minor > > Flink CLI treats job arguments provided in format "-" as its own > parameters, which results in errors in execution. > Example 1: > call: >bin/flink info myJarFile.jar -f flink -i -m 1 > error: Unrecognized option: -f > Example 2: > Job myJarFile.jar is uploaded to web submission client, flink parameter box > is empty > program arguments box: -f flink -i -m 1 > error: > An unexpected error occurred: > Unrecognized option: -f > org.apache.flink.client.cli.CliArgsException: Unrecognized option: -f > at > org.apache.flink.client.cli.CliFrontendParser.parseInfoCommand(CliFrontendParser.java:296) > at org.apache.flink.client.CliFrontend.info(CliFrontend.java:376) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:983) > at > org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:171) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965) > at > 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) > at > org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113) > at org.eclipse.jetty.server.Server.handle(Server.java:348) > at > org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596) > at > org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549) > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211) > at > org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425) > at > org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436) > at java.lang.Thread.run(Thread.java:745) > Execution of > >bin/flink run myJarFile.jar -f flink -i -m 1 > works perfectly fine -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3189] Error while parsing job arguments...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1493#issuecomment-170032469 Can you add a test case that guards this change?
[GitHub] flink pull request: FLINK-3197: InputStream not closed in BinaryIn...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1494#discussion_r49205022 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java --- @@ -213,9 +213,10 @@ protected SequentialStatistics createStatistics(List files, FileBase FSDataInputStream fdis = file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize()); fdis.seek(file.getLen() - blockInfo.getInfoSize()); - + blockInfo.read(new DataInputViewStreamWrapper(fdis)); totalCount += blockInfo.getAccumulatedRecordCount(); + fdis.close(); --- End diff -- +1, or to use the try-with-resources construct.
[GitHub] flink pull request: [FLINK-3058] Add support for Kafka 0.9.0.0
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1489#issuecomment-170044651 Test cases `Kafka09ITCase.testMultipleSourcesOnePartition` and `Kafka08ITCase.testOffsetInZookeeper` are failing in the Travis build.
[jira] [Commented] (FLINK-3058) Add Kafka consumer for new 0.9.0.0 Kafka API
[ https://issues.apache.org/jira/browse/FLINK-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089436#comment-15089436 ] ASF GitHub Bot commented on FLINK-3058: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1489#issuecomment-170044651 Test cases `Kafka09ITCase.testMultipleSourcesOnePartition` and `Kafka08ITCase.testOffsetInZookeeper` are failing in Travis build. > Add Kafka consumer for new 0.9.0.0 Kafka API > > > Key: FLINK-3058 > URL: https://issues.apache.org/jira/browse/FLINK-3058 > Project: Flink > Issue Type: New Feature > Components: Kafka Connector >Affects Versions: 1.0.0 >Reporter: Robert Metzger >Assignee: Robert Metzger > > The Apache Kafka project is about to release a new consumer API . They also > changed their internal protocol so Kafka 0.9.0.0 users will need an updated > consumer from Flink. > Also, I would like to let Flink be among the first stream processors > supporting Kafka 0.9.0.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1994] [ml] Add different gain calculati...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1397#discussion_r49211433 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala --- @@ -54,14 +54,15 @@ abstract class GradientDescent extends IterativeSolver { */ override def optimize( data: DataSet[LabeledVector], -initialWeights: Option[DataSet[WeightVector]]): DataSet[WeightVector] = { +initialWeights: Option[DataSet[WeightVector]] + ): DataSet[WeightVector] = { --- End diff -- Indentation: ``` override def optimize( data: ... initialWeights: ...) : DataSet[WeightVector] = { } ```
[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD
[ https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089529#comment-15089529 ] ASF GitHub Bot commented on FLINK-1994: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1397#discussion_r49211433 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala --- @@ -54,14 +54,15 @@ abstract class GradientDescent extends IterativeSolver { */ override def optimize( data: DataSet[LabeledVector], -initialWeights: Option[DataSet[WeightVector]]): DataSet[WeightVector] = { +initialWeights: Option[DataSet[WeightVector]] + ): DataSet[WeightVector] = { --- End diff -- Indentation: ``` override def optimize( data: ... initialWeights: ...) : DataSet[WeightVector] = { } ``` > Add different gain calculation schemes to SGD > - > > Key: FLINK-1994 > URL: https://issues.apache.org/jira/browse/FLINK-1994 > Project: Flink > Issue Type: Improvement > Components: Machine Learning Library >Reporter: Till Rohrmann >Assignee: Trevor Grant >Priority: Minor > Labels: ML, Starter > > The current SGD implementation uses as gain for the weight updates the > formula {{stepsize/sqrt(iterationNumber)}}. It would be good to make the gain > calculation configurable and to provide different strategies for that. For > example: > * stepsize/(1 + iterationNumber) > * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4) > See also how to properly select the gains [1]. > Resources: > [1] http://arxiv.org/pdf/1107.2490.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
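The gain schedules listed in the FLINK-1994 ticket can be written down directly; `stepsize` and `regularization` are the free parameters and `t` is the iteration number (starting at 1). A small sketch of the three candidates:

```java
// Sketch of the gain (learning-rate) schedules listed in FLINK-1994:
// the current default stepsize/sqrt(t), the proposed stepsize/(1 + t),
// and the Xu-style schedule stepsize * (1 + reg * stepsize * t)^(-3/4).
public class GainSchedules {
    static double invSqrt(double stepsize, long t) {
        return stepsize / Math.sqrt(t);
    }

    static double invLinear(double stepsize, long t) {
        return stepsize / (1 + t);
    }

    static double xu(double stepsize, double reg, long t) {
        return stepsize * Math.pow(1 + reg * stepsize * t, -0.75);
    }
}
```

Making the schedule pluggable (e.g. a `(stepsize, t) -> gain` strategy interface) is what "configurable gain calculation" in the ticket amounts to.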
[jira] [Commented] (FLINK-1870) Reintroduce Indexed reader functionality to streaming
[ https://issues.apache.org/jira/browse/FLINK-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089432#comment-15089432 ] ASF GitHub Bot commented on FLINK-1870: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170044112 The user code should not deal with what channel an element came from. It wires assumptions about the parallelism of the predecessor into the user code. Adjustments of the parallelism become impossible that way. Why does the storm compatibility need this? > Reintroduce Indexed reader functionality to streaming > - > > Key: FLINK-1870 > URL: https://issues.apache.org/jira/browse/FLINK-1870 > Project: Flink > Issue Type: Task > Components: Streaming >Reporter: Gyula Fora >Assignee: Matthias J. Sax >Priority: Minor > > The Indexed record reader classes (IndexedReaderIterator, > IndexedMutableReader) were introduced to allow the streaming operators to > access the index of the last read channel from the input gate. This was a > necessary step toward future input sorting operators. > Unfortunately this untested feature was patched away by the following commit: > https://github.com/apache/flink/commit/5232c56b13b7e13e2cf10dbe818c221f5557426d > At some point we need to reimplement these features with proper tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3210) Unnecessary call to deserializer#deserialize() in LegacyFetcher#SimpleConsumerThread#run()
[ https://issues.apache.org/jira/browse/FLINK-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089572#comment-15089572 ] Ted Yu commented on FLINK-3210: --- bq. The key can still contain something I agree. But why deserialize the value? > Unnecessary call to deserializer#deserialize() in > LegacyFetcher#SimpleConsumerThread#run() > -- > > Key: FLINK-3210 > URL: https://issues.apache.org/jira/browse/FLINK-3210 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > Here is related code: > {code} > byte[] valueBytes; > if (payload == null) { > deletedMessages++; > valueBytes = null; > } else { > ... > final T value = deserializer.deserialize(keyBytes, > valueBytes, fp.topic, offset); > {code} > When valueBytes is null, there is no need to call deserializer#deserialize() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-3197: InputStream not closed in BinaryIn...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/1494#discussion_r49199370 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java --- @@ -213,9 +213,10 @@ protected SequentialStatistics createStatistics(List files, FileBase FSDataInputStream fdis = file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize()); fdis.seek(file.getLen() - blockInfo.getInfoSize()); - + blockInfo.read(new DataInputViewStreamWrapper(fdis)); totalCount += blockInfo.getAccumulatedRecordCount(); + fdis.close(); --- End diff -- Agreed, we should try and consistently use a "try-with-resource" clause for streams.
[jira] [Commented] (FLINK-1712) Restructure Maven Projects
[ https://issues.apache.org/jira/browse/FLINK-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089354#comment-15089354 ] ASF GitHub Bot commented on FLINK-1712: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1492#issuecomment-170033507 Can you summarize what the changes from this pull request are? The set of changes is too large for github to display. > Restructure Maven Projects > -- > > Key: FLINK-1712 > URL: https://issues.apache.org/jira/browse/FLINK-1712 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 0.9 >Reporter: Stephan Ewen >Priority: Minor > > Project consolidation > - flink-hadoop (shaded fat jar) > - Core (Core and Java) > - (core-scala) > - Streaming (core + java) > - (streaming-scala) > - Runtime > - Optimizer (may also be merged with Client) > - Client (or Client + Optimizer) > > - Examples (Java + Scala + Streaming Java + Streaming Scala) > - Tests (test-utils (compile) and tests (test)) > > - Quickstarts >- Quickstart Java >- Quickstart Scala > > - connectors / Input/Output Formats >- Avro >- HBase >- HadoopCompartibility >- HCatalogue >- JDBC >- kafka >- rabbit >- ... > > - staging >- Gelly >- ML >- spargel (deprecated) >- expression API > > - contrib > > - yarn > > - dist > > - yarn tests > > - java 8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1712] Remove "flink-staging" module
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1492#issuecomment-170033507 Can you summarize what the changes from this pull request are? The set of changes is too large for github to display.
[GitHub] flink pull request: [FLINK-3189] Error while parsing job arguments...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1493#issuecomment-170035218 Sure. Will do that.
[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170044112 The user code should not deal with what channel an element came from. It wires assumptions about the parallelism of the predecessor into the user code. Adjustments of the parallelism become impossible that way. Why does the storm compatibility need this?
[jira] [Commented] (FLINK-3210) Unnecessary call to deserializer#deserialize() in LegacyFetcher#SimpleConsumerThread#run()
[ https://issues.apache.org/jira/browse/FLINK-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089566#comment-15089566 ] Robert Metzger commented on FLINK-3210: --- I think this issue is invalid. A null value indicates a deleted Kafka message. The key can still contain something, and somebody explicitly opened a JIRA for this feature. > Unnecessary call to deserializer#deserialize() in > LegacyFetcher#SimpleConsumerThread#run() > -- > > Key: FLINK-3210 > URL: https://issues.apache.org/jira/browse/FLINK-3210 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > Here is related code: > {code} > byte[] valueBytes; > if (payload == null) { > deletedMessages++; > valueBytes = null; > } else { > ... > final T value = deserializer.deserialize(keyBytes, > valueBytes, fp.topic, offset); > {code} > When valueBytes is null, there is no need to call deserializer#deserialize() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
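Robert's point is that a null payload marks a Kafka "tombstone" record whose key may still carry information. An illustrative sketch (hypothetical names, not Flink's actual API) that skips the value-side work — addressing Ted's concern — without dropping the key:

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustrative sketch of the trade-off discussed above: a Kafka tombstone
// record has a null value but may still carry a meaningful key, so the
// consumer surfaces the key downstream instead of dropping the record.
public class TombstoneAwareDecoder {
    /** Returns (key, value); the value is null for tombstones. */
    static Map.Entry<String, String> decode(byte[] keyBytes, byte[] valueBytes) {
        String key = keyBytes == null ? null : new String(keyBytes, StandardCharsets.UTF_8);
        // Value-side deserialization is skipped for tombstones (null payload),
        // while the key still reaches the application.
        String value = valueBytes == null ? null : new String(valueBytes, StandardCharsets.UTF_8);
        return new SimpleEntry<>(key, value);
    }
}
```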