[GitHub] flink pull request: FLINK-3197: InputStream not closed in BinaryIn...

2016-01-08 Thread ramkrish86
Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1494#discussion_r49168115
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java 
---
@@ -213,9 +213,10 @@ protected SequentialStatistics 
createStatistics(List files, FileBase
 
FSDataInputStream fdis = 
file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
fdis.seek(file.getLen() - blockInfo.getInfoSize());
-   
+
blockInfo.read(new DataInputViewStreamWrapper(fdis));
totalCount += blockInfo.getAccumulatedRecordCount();
+   fdis.close();
--- End diff --

Generally, these types of IO streams/files are expected to be enclosed in a 
try/finally block.
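
For illustration, a minimal sketch of the suggested try/finally guard, reusing 
the names from the hunk above (`file`, `blockInfo`, `totalCount`); a sketch 
only, not the actual patch:

```java
// Open the stream, then guard everything that can throw so the stream is
// closed even on failure (try-with-resources is the alternative discussed
// later in this thread).
FSDataInputStream fdis =
    file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
try {
    fdis.seek(file.getLen() - blockInfo.getInfoSize());
    blockInfo.read(new DataInputViewStreamWrapper(fdis));
    totalCount += blockInfo.getAccumulatedRecordCount();
} finally {
    fdis.close();
}
```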




[jira] [Commented] (FLINK-3197) InputStream not closed in BinaryInputFormat#createStatistics

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088905#comment-15088905
 ] 

ASF GitHub Bot commented on FLINK-3197:
---

Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1494#discussion_r49168115
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java 
---
@@ -213,9 +213,10 @@ protected SequentialStatistics 
createStatistics(List files, FileBase
 
FSDataInputStream fdis = 
file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
fdis.seek(file.getLen() - blockInfo.getInfoSize());
-   
+
blockInfo.read(new DataInputViewStreamWrapper(fdis));
totalCount += blockInfo.getAccumulatedRecordCount();
+   fdis.close();
--- End diff --

Generally, these types of IO streams/files are expected to be enclosed in a 
try/finally block.


> InputStream not closed in BinaryInputFormat#createStatistics
> 
>
> Key: FLINK-3197
> URL: https://issues.apache.org/jira/browse/FLINK-3197
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
>   FSDataInputStream fdis = 
> file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
>   fdis.seek(file.getLen() - blockInfo.getInfoSize());
>   blockInfo.read(new DataInputViewStreamWrapper(fdis));
>   totalCount += blockInfo.getAccumulatedRecordCount();
> {code}
> fdis / wrapper should be closed upon leaving the method





[jira] [Commented] (FLINK-1870) Reintroduce Indexed reader functionality to streaming

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089649#comment-15089649
 ] 

ASF GitHub Bot commented on FLINK-1870:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170083592
  
I am a big-time skeptic when it comes to putting this into the 
`RuntimeContext`.

If we really need to expose that beyond the `StreamInputProcessor`, it 
would probably be best to pass this into the operator, as part of the 
"processRecord()" method.

Since StormBolts are implemented as custom operators, they should be able 
to pick it up from there.
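
For illustration only, a sketch of what that could look like (hypothetical 
interface, not actual Flink API):

```java
// Hypothetical: the input-gate channel index travels with each record into
// the operator, so only operators that care (e.g. the Storm compatibility
// bolts) ever see it, and nothing is added to RuntimeContext.
interface IndexedOneInputOperator<IN> {
    void processRecord(IN record, int inputChannelIndex) throws Exception;
}
```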


> Reintroduce Indexed reader functionality to streaming
> -
>
> Key: FLINK-1870
> URL: https://issues.apache.org/jira/browse/FLINK-1870
> Project: Flink
>  Issue Type: Task
>  Components: Streaming
>Reporter: Gyula Fora
>Assignee: Matthias J. Sax
>Priority: Minor
>
> The Indexed record reader classes (IndexedReaderIterator, 
> IndexedMutableReader) were introduced to allow the streaming operators to 
> access the index of the last read channel from the input gate. This was a 
> necessary step toward future input sorting operators.
> Unfortunately this untested feature was patched away by the following commit:
> https://github.com/apache/flink/commit/5232c56b13b7e13e2cf10dbe818c221f5557426d
> At some point we need to reimplement these features with proper tests.





[jira] [Commented] (FLINK-1870) Reintroduce Indexed reader functionality to streaming

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089600#comment-15089600
 ] 

ASF GitHub Bot commented on FLINK-1870:
---

Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170073172
  
If you implement a Bolt you can retrieve this information from Storm. Thus, 
the compatibility layer should provide this information, too.


> Reintroduce Indexed reader functionality to streaming
> -
>
> Key: FLINK-1870
> URL: https://issues.apache.org/jira/browse/FLINK-1870
> Project: Flink
>  Issue Type: Task
>  Components: Streaming
>Reporter: Gyula Fora
>Assignee: Matthias J. Sax
>Priority: Minor
>
> The Indexed record reader classes (IndexedReaderIterator, 
> IndexedMutableReader) were introduced to allow the streaming operators to 
> access the index of the last read channel from the input gate. This was a 
> necessary step toward future input sorting operators.
> Unfortunately this untested feature was patched away by the following commit:
> https://github.com/apache/flink/commit/5232c56b13b7e13e2cf10dbe818c221f5557426d
> At some point we need to reimplement these features with proper tests.





[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-08 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170073172
  
If you implement a Bolt you can retrieve this information from Storm. Thus, 
the compatibility layer should provide this information, too.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-08 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170083592
  
I am a big-time skeptic when it comes to putting this into the 
`RuntimeContext`.

If we really need to expose that beyond the `StreamInputProcessor`, it 
would probably be best to pass this into the operator, as part of the 
"processRecord()" method.

Since StormBolts are implemented as custom operators, they should be able 
to pick it up from there.




[jira] [Updated] (FLINK-2765) Upgrade hbase version for hadoop-2 to 1.1 release

2016-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-2765:
--
Description: 
Currently 0.98.11 is used:
{code}
0.98.11-hadoop2
{code}
The stable release line for hadoop-2 is 1.1.x.
We should upgrade to 1.2 (soon to be released)

  was:
Currently 0.98.11 is used:
{code}
0.98.11-hadoop2
{code}

The stable release line for hadoop-2 is 1.1.x.
We should upgrade to 1.2 (soon to be released)


> Upgrade hbase version for hadoop-2 to 1.1 release
> -
>
> Key: FLINK-2765
> URL: https://issues.apache.org/jira/browse/FLINK-2765
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> Currently 0.98.11 is used:
> {code}
> 0.98.11-hadoop2
> {code}
> The stable release line for hadoop-2 is 1.1.x.
> We should upgrade to 1.2 (soon to be released)





[GitHub] flink pull request: [FLINK-3058] Add support for Kafka 0.9.0.0

2016-01-08 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1489#issuecomment-170103132
  
There are some build instabilities with the new Kafka 0.9 code. I'll look 
into it soon.




[jira] [Commented] (FLINK-3058) Add Kafka consumer for new 0.9.0.0 Kafka API

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089798#comment-15089798
 ] 

ASF GitHub Bot commented on FLINK-3058:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1489#issuecomment-170103132
  
There are some build instabilities with the new Kafka 0.9 code. I'll look 
into it soon.


> Add Kafka consumer for new 0.9.0.0 Kafka API
> 
>
> Key: FLINK-3058
> URL: https://issues.apache.org/jira/browse/FLINK-3058
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> The Apache Kafka project is about to release a new consumer API. They also 
> changed their internal protocol, so Kafka 0.9.0.0 users will need an updated 
> consumer from Flink.
> Also, I would like to let Flink be among the first stream processors 
> supporting Kafka 0.9.0.0.





[GitHub] flink pull request: [FLINK-1994] [ml] Add different gain calculati...

2016-01-08 Thread rawkintrevo
Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/1397#issuecomment-170162685
  
On your second point: I agree that an Enum would be better for human-readable 
variable setting, but if one were doing a grid parameter search it would be 
easier to search over a range of integers than to establish an array of 
specifically named variables. Not significantly harder, but still slightly more 
tedious. 

Further, since I'm just learning Scala, limited research/understanding 
leads me to believe that 1) Scala doesn't have an enum that works just like 
Java's, and 2) changing the cases to more informative string names would 
achieve the desired goal? 

In the meantime I am working on documentation and reverting the indentation 
(which I thought I had already done); thanks for bearing with me as I am 
still learning. 




[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090246#comment-15090246
 ] 

ASF GitHub Bot commented on FLINK-1994:
---

Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/1397#issuecomment-170162685
  
On your second point: I agree that an Enum would be better for human-readable 
variable setting, but if one were doing a grid parameter search it would be 
easier to search over a range of integers than to establish an array of 
specifically named variables. Not significantly harder, but still slightly more 
tedious. 

Further, since I'm just learning Scala, limited research/understanding 
leads me to believe that 1) Scala doesn't have an enum that works just like 
Java's, and 2) changing the cases to more informative string names would 
achieve the desired goal? 

In the meantime I am working on documentation and reverting the indentation 
(which I thought I had already done); thanks for bearing with me as I am 
still learning. 


> Add different gain calculation schemes to SGD
> -
>
> Key: FLINK-1994
> URL: https://issues.apache.org/jira/browse/FLINK-1994
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Trevor Grant
>Priority: Minor
>  Labels: ML, Starter
>
> The current SGD implementation uses the formula 
> {{stepsize/sqrt(iterationNumber)}} as the gain for the weight updates. It would 
> be good to make the gain calculation configurable and to provide different 
> strategies for that. For example:
> * stepsize/(1 + iterationNumber)
> * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4)
> See also how to properly select the gains [1].
> Resources:
> [1] http://arxiv.org/pdf/1107.2490.pdf
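
For illustration, the schemes above could be modeled as pluggable strategies; 
a minimal sketch with hypothetical names (not the FlinkML API):

```java
// Sketch of configurable gain (learning-rate) schedules for SGD.
public class GainSchedules {
    interface GainStrategy {
        double gain(double stepsize, double regularization, int iteration);
    }

    // current behavior: stepsize / sqrt(iterationNumber)
    static final GainStrategy SQRT_DECAY = (s, r, t) -> s / Math.sqrt(t);
    // proposed: stepsize / (1 + iterationNumber)
    static final GainStrategy INVERSE_DECAY = (s, r, t) -> s / (1 + t);
    // proposed: stepsize * (1 + regularization * stepsize * iterationNumber)^(-3/4)
    static final GainStrategy POLYNOMIAL_DECAY =
        (s, r, t) -> s * Math.pow(1 + r * s * t, -0.75);

    public static void main(String[] args) {
        System.out.println(SQRT_DECAY.gain(0.1, 0.01, 4)); // 0.1 / 2 = 0.05
    }
}
```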





[GitHub] flink pull request: [FLINK-3192] Add explain support to print ast ...

2016-01-08 Thread gallenvara
Github user gallenvara commented on the pull request:

https://github.com/apache/flink/pull/1477#issuecomment-169967055
  
@ChengXiangLi @fhueske @rmetzger, thanks a lot for your suggestions. The 
code has been modified.




[jira] [Commented] (FLINK-3192) Add explain support to print ast and sql physical execution plan.

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089075#comment-15089075
 ] 

ASF GitHub Bot commented on FLINK-3192:
---

Github user gallenvara commented on the pull request:

https://github.com/apache/flink/pull/1477#issuecomment-169967055
  
@ChengXiangLi @fhueske @rmetzger, thanks a lot for your suggestions. The 
code has been modified.


> Add explain support to print ast and sql physical execution plan. 
> --
>
> Key: FLINK-3192
> URL: https://issues.apache.org/jira/browse/FLINK-3192
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: features
>
> The Table API doesn't support SQL explanation yet. Add explain support to 
> print the AST (abstract syntax tree) and the physical execution plan of the SQL query.





[jira] [Created] (FLINK-3211) Add AWS Kinesis streaming connector

2016-01-08 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-3211:
--

 Summary: Add AWS Kinesis streaming connector
 Key: FLINK-3211
 URL: https://issues.apache.org/jira/browse/FLINK-3211
 Project: Flink
  Issue Type: New Feature
  Components: Streaming Connectors
Reporter: Tzu-Li (Gordon) Tai
 Fix For: 1.0.0


AWS Kinesis is a widely adopted message queue used by AWS users, much like a 
cloud-service version of Apache Kafka. Support for AWS Kinesis would be a great 
addition to Flink's handful of streaming connectors to external systems and 
great outreach to the AWS community.

After a first look at the AWS KCL (Kinesis Client Library): KCL already 
supports reading a stream beginning from a specific offset (or "record sequence 
number" in Kinesis terminology). For external checkpointing, KCL is designed 
to use AWS DynamoDB to checkpoint application state, where each partition's 
progress (or "shard" in Kinesis terminology) corresponds to a single row in the 
KCL-managed DynamoDB table.

So, implementing the AWS Kinesis connector will very much resemble the work 
done on the Kafka connector, with a few tweaks, as follows (I'm 
mainly just rewording [~StephanEwen]'s original description [1]):

1. Determine the KCL shard worker to Flink source task mapping. KCL already offers 
worker tasks per shard, so we will need to do a mapping much like [2] (see the 
sketch after this list).

2. Let the Flink connector also maintain a local copy of application state, 
accessed using the KCL API, for the distributed snapshot checkpointing.

3. Restart the KCL at the last locally checkpointed record sequence number upon 
failure. However, when the KCL restarts after a failure, it is designed to 
reference the external DynamoDB table. We need a further look at how to work with 
this so that the Flink checkpoint and the external checkpoint in DynamoDB stay 
properly synced.
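
To make point 1 concrete, here is a sketch of a static round-robin 
shard-to-subtask assignment in the spirit of the Kafka connector; names are 
illustrative only, not connector API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShardAssignmentSketch {
    // Each parallel source subtask takes every parallelism-th shard.
    static List<String> shardsForSubtask(List<String> shardIds,
                                         int subtaskIndex, int parallelism) {
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < shardIds.size(); i++) {
            if (i % parallelism == subtaskIndex) {
                assigned.add(shardIds.get(i));
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shardId-0", "shardId-1", "shardId-2", "shardId-3");
        System.out.println(shardsForSubtask(shards, 0, 2)); // [shardId-0, shardId-2]
    }
}
```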

Most of the details regarding KCL's state checkpointing, sharding, shard 
workers, and failure recovery can be found here [3].

As for the Kinesis sink connector, it should be fairly straightforward and 
almost, if not completely, identical to the Kafka sink.

References:
[1] 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-td2872.html
[2] http://data-artisans.com/kafka-flink-a-practical-how-to/
[3] http://docs.aws.amazon.com/kinesis/latest/dev/advanced-consumers.html





[GitHub] flink pull request: [FLINK-1994] [ml] Add different gain calculati...

2016-01-08 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1397#issuecomment-170200015
  
Hi @rawkintrevo, you can convert a Scala Enum to an Int like the following:

```scala
object Parameter extends Enumeration {
  type Parameter = Value
  val Param1, Param2, Param3, Param4 = Value
}

val convertedInt: Int = Parameter.Param1.id
```

Also, these values can be caught by pattern matching:

```scala
val param = getSomeEnumValues()

param match {
  case Parameter.Param1 => // blahblah
  case Parameter.Param2 => // blahblah
  case Parameter.Param3 => // blahblah
  case Parameter.Param4 => // blahblah
}
```




[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090429#comment-15090429
 ] 

ASF GitHub Bot commented on FLINK-1994:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1397#issuecomment-170200015
  
Hi @rawkintrevo, you can convert a Scala Enum to an Int like the following:

```scala
object Parameter extends Enumeration {
  type Parameter = Value
  val Param1, Param2, Param3, Param4 = Value
}

val convertedInt: Int = Parameter.Param1.id
```

Also, these values can be caught by pattern matching:

```scala
val param = getSomeEnumValues()

param match {
  case Parameter.Param1 => // blahblah
  case Parameter.Param2 => // blahblah
  case Parameter.Param3 => // blahblah
  case Parameter.Param4 => // blahblah
}
```


> Add different gain calculation schemes to SGD
> -
>
> Key: FLINK-1994
> URL: https://issues.apache.org/jira/browse/FLINK-1994
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Trevor Grant
>Priority: Minor
>  Labels: ML, Starter
>
> The current SGD implementation uses the formula 
> {{stepsize/sqrt(iterationNumber)}} as the gain for the weight updates. It would 
> be good to make the gain calculation configurable and to provide different 
> strategies for that. For example:
> * stepsize/(1 + iterationNumber)
> * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4)
> See also how to properly select the gains [1].
> Resources:
> [1] http://arxiv.org/pdf/1107.2490.pdf





[GitHub] flink pull request: [FLINK-3192] Add explain support to print ast ...

2016-01-08 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1477#discussion_r49183864
  
--- Diff: 
flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/explain/PlanJsonParser.java
 ---
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.table.explain;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.DeserializationFeature;
+
+import java.io.PrintWriter;
+import java.io.StringWriter;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+public class PlanJsonParser {
+   
+   public static String getSqlExecutionPlan(String t, boolean extended) 
throws Exception{
+   StringWriter sw = new StringWriter();
+   PrintWriter pw = new PrintWriter(sw);
+   buildSqlExecutionPlan(t, extended, pw);
+   pw.close();
+   return sw.toString();
+   }
+
+   private static void printTab(int tabCount, PrintWriter pw) {
+   for (int i = 0; i < tabCount; i++)
+   pw.print("\t");
+   }
+
+   private static void buildSqlExecutionPlan(String t, Boolean extended, 
PrintWriter pw) throws Exception {
--- End diff --

I think we can merge `buildSqlExecutionPlan` and `getSqlExecutionPlan` into 
a single method.
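
For illustration, the merged method could take this shape (the placeholder 
line stands in for the actual plan-building logic; a sketch only):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PlanJsonParserSketch {
    public static String getSqlExecutionPlan(String t, boolean extended) throws Exception {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        pw.print(t); // placeholder: inline the former buildSqlExecutionPlan body here
        pw.close();
        return sw.toString();
    }
}
```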




[GitHub] flink pull request: [FLINK-3192] Add explain support to print ast ...

2016-01-08 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1477#discussion_r49183832
  
--- Diff: 
flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/explain/PlanJsonParser.java
 ---
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.table.explain;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.DeserializationFeature;
+
+import java.io.PrintWriter;
+import java.io.StringWriter;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+public class PlanJsonParser {
+   
+   public static String getSqlExecutionPlan(String t, boolean extended) 
throws Exception{
--- End diff --

Add space after `Exception`




[jira] [Commented] (FLINK-3192) Add explain support to print ast and sql physical execution plan.

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089141#comment-15089141
 ] 

ASF GitHub Bot commented on FLINK-3192:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1477#discussion_r49183864
  
--- Diff: 
flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/explain/PlanJsonParser.java
 ---
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.table.explain;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.DeserializationFeature;
+
+import java.io.PrintWriter;
+import java.io.StringWriter;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+public class PlanJsonParser {
+   
+   public static String getSqlExecutionPlan(String t, boolean extended) 
throws Exception{
+   StringWriter sw = new StringWriter();
+   PrintWriter pw = new PrintWriter(sw);
+   buildSqlExecutionPlan(t, extended, pw);
+   pw.close();
+   return sw.toString();
+   }
+
+   private static void printTab(int tabCount, PrintWriter pw) {
+   for (int i = 0; i < tabCount; i++)
+   pw.print("\t");
+   }
+
+   private static void buildSqlExecutionPlan(String t, Boolean extended, 
PrintWriter pw) throws Exception {
--- End diff --

I think we can merge `buildSqlExecutionPlan` and `getSqlExecutionPlan` into 
a single method.


> Add explain support to print ast and sql physical execution plan. 
> --
>
> Key: FLINK-3192
> URL: https://issues.apache.org/jira/browse/FLINK-3192
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: features
>
> The Table API doesn't support SQL explanation yet. Add explain support to 
> print the AST (abstract syntax tree) and the physical execution plan of the SQL query.





[jira] [Commented] (FLINK-3192) Add explain support to print ast and sql physical execution plan.

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089138#comment-15089138
 ] 

ASF GitHub Bot commented on FLINK-3192:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1477#discussion_r49183832
  
--- Diff: 
flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/explain/PlanJsonParser.java
 ---
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.table.explain;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.DeserializationFeature;
+
+import java.io.PrintWriter;
+import java.io.StringWriter;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+public class PlanJsonParser {
+   
+   public static String getSqlExecutionPlan(String t, boolean extended) 
throws Exception{
--- End diff --

Add space after `Exception`


> Add explain support to print ast and sql physical execution plan. 
> --
>
> Key: FLINK-3192
> URL: https://issues.apache.org/jira/browse/FLINK-3192
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: features
>
> The Table API doesn't support SQL explanation yet. Add explain support to 
> print the AST (abstract syntax tree) and the physical execution plan of the SQL query.





[GitHub] flink pull request: [FLINK-3192] Add explain support to print ast ...

2016-01-08 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1477#issuecomment-169987537
  
I added two small comments in-line. 
But there is another thing that came to mind. A `Table` can be 
constructed from any type of `DataSet`, not just a `DataSource`. So, it is 
possible that the input of a query is a partial DataSet program. In that case, 
the Explain output would include the DataSet program as well as the Table 
query. I am not sure if that is actually desirable. Do you think it is possible 
to limit the Explain output to the current query only? Or would you be in favor 
of showing the whole plan?




[jira] [Commented] (FLINK-3192) Add explain support to print ast and sql physical execution plan.

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089155#comment-15089155
 ] 

ASF GitHub Bot commented on FLINK-3192:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1477#issuecomment-169987537
  
I added two small comments in-line. 
But there is another thing that came to mind. A `Table` can be 
constructed from any type of `DataSet`, not just a `DataSource`. So, it is 
possible that the input of a query is a partial DataSet program. In that case, 
the Explain output would include the DataSet program as well as the Table 
query. I am not sure if that is actually desirable. Do you think it is possible 
to limit the Explain output to the current query only? Or would you be in favor 
of showing the whole plan?


> Add explain support to print ast and sql physical execution plan. 
> --
>
> Key: FLINK-3192
> URL: https://issues.apache.org/jira/browse/FLINK-3192
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: features
>
> The Table API doesn't support SQL explanation yet. Add explain support to 
> print the AST (abstract syntax tree) and the physical execution plan of the SQL query.





[GitHub] flink pull request: [FLINK-2125][streaming] Delimiter change from ...

2016-01-08 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1491#discussion_r49204456
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java
 ---
@@ -84,25 +84,25 @@ public SocketTextStreamFunction(String hostname, int 
port, char delimiter, long
public void run(SourceContext ctx) throws Exception {
final StringBuilder buffer = new StringBuilder();
long attempt = 0;
-   
+
while (isRunning) {
-   
+
try (Socket socket = new Socket()) {
currentSocket = socket;
-   
+
LOG.info("Connecting to server socket " + 
hostname + ':' + port);
socket.connect(new InetSocketAddress(hostname, 
port), CONNECTION_TIMEOUT_TIME);
BufferedReader reader = new BufferedReader(new 
InputStreamReader(socket.getInputStream()));
 
-   int data;
-   while (isRunning && (data = reader.read()) != 
-1) {
-   // check if the string is complete
-   if (data != delimiter) {
-   buffer.append((char) data);
-   }
-   else {
-   // truncate trailing carriage 
return
-   if (delimiter == '\n' && 
buffer.length() > 0 && buffer.charAt(buffer.length() - 1) == '\r') {
+   char[] charBuffer = new 
char[Math.max(8192,delimiter.length()*2)];
+   int bytesRead;
+   while (isRunning && (bytesRead = 
reader.read(charBuffer)) != -1) {
+   String input = new String(charBuffer, 
0, bytesRead);
+   int start = 0,pos;
+   while ((pos = input.indexOf(delimiter, 
start)) > -1) {
--- End diff --

What happens if `input` does not contain `delimiter`? And what happens to 
the input part after the last delimiter?
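
For reference, a standalone sketch of string-delimiter splitting that handles 
both cases by buffering across reads (illustrative, not the PR's code):

```java
public class DelimiterSplitSketch {
    public static void main(String[] args) {
        String delimiter = "\n";
        String[] chunks = {"foo\nba", "r\nbaz"}; // simulated socket reads
        StringBuilder buffer = new StringBuilder();
        for (String chunk : chunks) {
            buffer.append(chunk);
            int pos;
            // A chunk without the delimiter simply stays buffered; text after
            // the last delimiter is kept as the start of the next record.
            while ((pos = buffer.indexOf(delimiter)) != -1) {
                System.out.println("record: " + buffer.substring(0, pos));
                buffer.delete(0, pos + delimiter.length());
            }
        }
        System.out.println("remainder: " + buffer); // emit or flush on close
    }
}
```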




[jira] [Commented] (FLINK-2125) String delimiter for SocketTextStream

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089426#comment-15089426
 ] 

ASF GitHub Bot commented on FLINK-2125:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1491#discussion_r49204456
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java
 ---
@@ -84,25 +84,25 @@ public SocketTextStreamFunction(String hostname, int 
port, char delimiter, long
public void run(SourceContext ctx) throws Exception {
final StringBuilder buffer = new StringBuilder();
long attempt = 0;
-   
+
while (isRunning) {
-   
+
try (Socket socket = new Socket()) {
currentSocket = socket;
-   
+
LOG.info("Connecting to server socket " + 
hostname + ':' + port);
socket.connect(new InetSocketAddress(hostname, 
port), CONNECTION_TIMEOUT_TIME);
BufferedReader reader = new BufferedReader(new 
InputStreamReader(socket.getInputStream()));
 
-   int data;
-   while (isRunning && (data = reader.read()) != 
-1) {
-   // check if the string is complete
-   if (data != delimiter) {
-   buffer.append((char) data);
-   }
-   else {
-   // truncate trailing carriage 
return
-   if (delimiter == '\n' && 
buffer.length() > 0 && buffer.charAt(buffer.length() - 1) == '\r') {
+   char[] charBuffer = new 
char[Math.max(8192,delimiter.length()*2)];
+   int bytesRead;
+   while (isRunning && (bytesRead = 
reader.read(charBuffer)) != -1) {
+   String input = new String(charBuffer, 
0, bytesRead);
+   int start = 0,pos;
+   while ((pos = input.indexOf(delimiter, 
start)) > -1) {
--- End diff --

What happens if `input` does not contain `delimiter`? And what happens to 
the input part after the last delimiter?


> String delimiter for SocketTextStream
> -
>
> Key: FLINK-2125
> URL: https://issues.apache.org/jira/browse/FLINK-2125
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Márton Balassi
>Priority: Minor
>  Labels: starter
>
> The SocketTextStreamFunction uses a character delimiter, despite other parts 
> of the API using a String delimiter.





[jira] [Commented] (FLINK-3197) InputStream not closed in BinaryInputFormat#createStatistics

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089430#comment-15089430
 ] 

ASF GitHub Bot commented on FLINK-3197:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1494#discussion_r49205022
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java 
---
@@ -213,9 +213,10 @@ protected SequentialStatistics 
createStatistics(List files, FileBase
 
FSDataInputStream fdis = 
file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
fdis.seek(file.getLen() - blockInfo.getInfoSize());
-   
+
blockInfo.read(new DataInputViewStreamWrapper(fdis));
totalCount += blockInfo.getAccumulatedRecordCount();
+   fdis.close();
--- End diff --

+1, or use the try-with-resources construct.


> InputStream not closed in BinaryInputFormat#createStatistics
> 
>
> Key: FLINK-3197
> URL: https://issues.apache.org/jira/browse/FLINK-3197
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
>   FSDataInputStream fdis = 
> file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
>   fdis.seek(file.getLen() - blockInfo.getInfoSize());
>   blockInfo.read(new DataInputViewStreamWrapper(fdis));
>   totalCount += blockInfo.getAccumulatedRecordCount();
> {code}
> fdis / wrapper should be closed upon leaving the method





[jira] [Commented] (FLINK-3189) Error while parsing job arguments passed by CLI

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089368#comment-15089368
 ] 

ASF GitHub Bot commented on FLINK-3189:
---

Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1493#issuecomment-170035218
  
Sure. Will do that.


> Error while parsing job arguments passed by CLI
> ---
>
> Key: FLINK-3189
> URL: https://issues.apache.org/jira/browse/FLINK-3189
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.10.1
>Reporter: Filip Leczycki
>Assignee: Matthias J. Sax
>Priority: Minor
>
> The Flink CLI treats job arguments provided in the format "-" as its own 
> parameters, which results in errors during execution.
> Example 1:
> call: >bin/flink info myJarFile.jar -f flink -i  -m 1
> error: Unrecognized option: -f
> Example 2:
> Job myJarFile.jar is uploaded to web submission client, flink parameter box 
> is empty
> program arguments box: -f flink -i  -m 1
> error: 
> An unexpected error occurred:
> Unrecognized option: -f
> org.apache.flink.client.cli.CliArgsException: Unrecognized option: -f
>   at 
> org.apache.flink.client.cli.CliFrontendParser.parseInfoCommand(CliFrontendParser.java:296)
>   at org.apache.flink.client.CliFrontend.info(CliFrontend.java:376)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:983)
>   at 
> org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:171)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>   at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
>   at org.eclipse.jetty.server.Server.handle(Server.java:348)
>   at 
> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
>   at 
> org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
>   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
>   at 
> org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
>   at 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
>   at java.lang.Thread.run(Thread.java:745)
> Execution of 
> >bin/flink run myJarFile.jar -f flink -i  -m 1  
> works perfectly fine
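
For background, one common way to keep a CLI parser from rejecting job 
arguments is Commons CLI's stopAtNonOption flag; a minimal sketch assuming 
commons-cli 1.x (not the actual CliFrontend code):

```java
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.PosixParser;

public class StopAtNonOptionSketch {
    public static void main(String[] args) throws Exception {
        Options options = new Options(); // the CLI's own options would be registered here
        // stopAtNonOption=true: parsing stops at the first unrecognized token,
        // so "-f flink -i ... -m 1" is passed through as plain job arguments.
        CommandLine line = new PosixParser().parse(options, args, true);
        String[] jobArgs = line.getArgs();
        System.out.println(String.join(" ", jobArgs));
    }
}
```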





[GitHub] flink pull request: [FLINK-2125][streaming] Delimiter change from ...

2016-01-08 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1491#issuecomment-170043300
  
Thanks for your contribution @ajaybhat.

The PR contains many whitespace changes. It would be good to avoid them.

I had a question concerning the implementation. It might be the case that 
the text after the last delimiter is discarded.




[GitHub] flink pull request: FLINK-1737: Kronecker product

2016-01-08 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1078#issuecomment-170050950
  
Looks really good :-) Thanks for your contribution @daniel-pape. Will merge 
it.




[jira] [Created] (FLINK-3210) Unnecessary call to deserializer#deserialize() in LegacyFetcher#SimpleConsumerThread#run()

2016-01-08 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3210:
-

 Summary: Unnecessary call to deserializer#deserialize() in 
LegacyFetcher#SimpleConsumerThread#run()
 Key: FLINK-3210
 URL: https://issues.apache.org/jira/browse/FLINK-3210
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Here is related code:
{code}
byte[] valueBytes;
if (payload == null) {
  deletedMessages++;
  valueBytes = null;
} else {
...
final T value = deserializer.deserialize(keyBytes, valueBytes, 
fp.topic, offset);
{code}
When valueBytes is null, there is no need to call deserializer#deserialize()
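
A minimal sketch of the suggested guard, reusing the names from the snippet 
above (illustrative only):

```java
// Skip deserialization entirely for deleted (null-payload) messages.
final T value;
if (valueBytes == null) {
    deletedMessages++;
    value = null;
} else {
    value = deserializer.deserialize(keyBytes, valueBytes, fp.topic, offset);
}
```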





[GitHub] flink pull request: FLINK-1737: Kronecker product

2016-01-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1078




[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089525#comment-15089525
 ] 

ASF GitHub Bot commented on FLINK-1737:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1078


> Add statistical whitening transformation to machine learning library
> 
>
> Key: FLINK-1737
> URL: https://issues.apache.org/jira/browse/FLINK-1737
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Pape
>  Labels: ML, Starter
>
> The statistical whitening transformation [1] is a preprocessing step for 
> different ML algorithms. It decorrelates the individual dimensions and sets 
> their variances to 1.
> Statistical whitening should be implemented as a {{Transformer}}.
> Resources:
> [1] [http://en.wikipedia.org/wiki/Whitening_transformation]
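
For reference, the transformation in [1] can be written as follows (our 
notation, not from the issue):

```latex
% Eigendecomposition of the data covariance: \Sigma = U \Lambda U^{\top}.
% The whitened sample is
y = \Lambda^{-1/2} \, U^{\top} (x - \mu)
% which yields \mathrm{Cov}(y) = I: decorrelated dimensions with unit variance.
```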





[jira] [Commented] (FLINK-3197) InputStream not closed in BinaryInputFormat#createStatistics

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089343#comment-15089343
 ] 

ASF GitHub Bot commented on FLINK-3197:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/1494#discussion_r49199370
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java 
---
@@ -213,9 +213,10 @@ protected SequentialStatistics 
createStatistics(List files, FileBase
 
FSDataInputStream fdis = 
file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
fdis.seek(file.getLen() - blockInfo.getInfoSize());
-   
+
blockInfo.read(new DataInputViewStreamWrapper(fdis));
totalCount += blockInfo.getAccumulatedRecordCount();
+   fdis.close();
--- End diff --

Agreed, we should try to consistently use a "try-with-resources" statement for 
streams.


> InputStream not closed in BinaryInputFormat#createStatistics
> 
>
> Key: FLINK-3197
> URL: https://issues.apache.org/jira/browse/FLINK-3197
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
>   FSDataInputStream fdis = 
> file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
>   fdis.seek(file.getLen() - blockInfo.getInfoSize());
>   blockInfo.read(new DataInputViewStreamWrapper(fdis));
>   totalCount += blockInfo.getAccumulatedRecordCount();
> {code}
> fdis / wrapper should be closed upon leaving the method





[jira] [Commented] (FLINK-2125) String delimiter for SocketTextStream

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089428#comment-15089428
 ] 

ASF GitHub Bot commented on FLINK-2125:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1491#issuecomment-170043300
  
Thanks for your contribution @ajaybhat.

The PR contains many whitespace changes. It would be good to avoid them.

I had a question concerning the implementation. It might be the case that 
the text after the last delimiter is discarded.


> String delimiter for SocketTextStream
> -
>
> Key: FLINK-2125
> URL: https://issues.apache.org/jira/browse/FLINK-2125
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Márton Balassi
>Priority: Minor
>  Labels: starter
>
> The SocketTextStreamFunction uses a character delimiter, despite other parts 
> of the API using a String delimiter.





[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089481#comment-15089481
 ] 

ASF GitHub Bot commented on FLINK-1737:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1078#issuecomment-170050950
  
Looks really good :-) Thanks for your contribution @daniel-pape. Will merge 
it.


> Add statistical whitening transformation to machine learning library
> 
>
> Key: FLINK-1737
> URL: https://issues.apache.org/jira/browse/FLINK-1737
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Pape
>  Labels: ML, Starter
>
> The statistical whitening transformation [1] is a preprocessing step for 
> different ML algorithms. It decorrelates the individual dimensions and sets 
> their variances to 1.
> Statistical whitening should be implemented as a {{Transformer}}.
> Resources:
> [1] [http://en.wikipedia.org/wiki/Whitening_transformation]





[jira] [Commented] (FLINK-3189) Error while parsing job arguments passed by CLI

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089344#comment-15089344
 ] 

ASF GitHub Bot commented on FLINK-3189:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1493#issuecomment-170032469
  
Can you add a test case that guards this change?


> Error while parsing job arguments passed by CLI
> ---
>
> Key: FLINK-3189
> URL: https://issues.apache.org/jira/browse/FLINK-3189
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.10.1
>Reporter: Filip Leczycki
>Assignee: Matthias J. Sax
>Priority: Minor
>
> The Flink CLI treats job arguments provided in the format "-" as its own 
> parameters, which results in errors during execution.
> Example 1:
> call: >bin/flink info myJarFile.jar -f flink -i  -m 1
> error: Unrecognized option: -f
> Example 2:
> Job myJarFile.jar is uploaded to web submission client, flink parameter box 
> is empty
> program arguments box: -f flink -i  -m 1
> error: 
> An unexpected error occurred:
> Unrecognized option: -f
> org.apache.flink.client.cli.CliArgsException: Unrecognized option: -f
>   at 
> org.apache.flink.client.cli.CliFrontendParser.parseInfoCommand(CliFrontendParser.java:296)
>   at org.apache.flink.client.CliFrontend.info(CliFrontend.java:376)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:983)
>   at 
> org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:171)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>   at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
>   at org.eclipse.jetty.server.Server.handle(Server.java:348)
>   at 
> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
>   at 
> org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
>   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
>   at 
> org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
>   at 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
>   at java.lang.Thread.run(Thread.java:745)
> Execution of 
> >bin/flink run myJarFile.jar -f flink -i  -m 1  
> works perfectly fine





[GitHub] flink pull request: [FLINK-3189] Error while parsing job arguments...

2016-01-08 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1493#issuecomment-170032469
  
Can you add a test case that guards this change?




[GitHub] flink pull request: FLINK-3197: InputStream not closed in BinaryIn...

2016-01-08 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1494#discussion_r49205022
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java 
---
@@ -213,9 +213,10 @@ protected SequentialStatistics 
createStatistics(List files, FileBase
 
FSDataInputStream fdis = 
file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
fdis.seek(file.getLen() - blockInfo.getInfoSize());
-   
+
blockInfo.read(new DataInputViewStreamWrapper(fdis));
totalCount += blockInfo.getAccumulatedRecordCount();
+   fdis.close();
--- End diff --

+1, or use the try-with-resources construct.




[GitHub] flink pull request: [FLINK-3058] Add support for Kafka 0.9.0.0

2016-01-08 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1489#issuecomment-170044651
  
Test cases `Kafka09ITCase.testMultipleSourcesOnePartition` and 
`Kafka08ITCase.testOffsetInZookeeper` are failing in Travis build.




[jira] [Commented] (FLINK-3058) Add Kafka consumer for new 0.9.0.0 Kafka API

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089436#comment-15089436
 ] 

ASF GitHub Bot commented on FLINK-3058:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1489#issuecomment-170044651
  
Test cases `Kafka09ITCase.testMultipleSourcesOnePartition` and 
`Kafka08ITCase.testOffsetInZookeeper` are failing in Travis build.


> Add Kafka consumer for new 0.9.0.0 Kafka API
> 
>
> Key: FLINK-3058
> URL: https://issues.apache.org/jira/browse/FLINK-3058
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> The Apache Kafka project is about to release a new consumer API. They also 
> changed their internal protocol so Kafka 0.9.0.0 users will need an updated 
> consumer from Flink.
> Also, I would like to let Flink be among the first stream processors 
> supporting Kafka 0.9.0.0.





[GitHub] flink pull request: [FLINK-1994] [ml] Add different gain calculati...

2016-01-08 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1397#discussion_r49211433
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala
 ---
@@ -54,14 +54,15 @@ abstract class GradientDescent extends IterativeSolver {
 */
   override def optimize(
 data: DataSet[LabeledVector],
-initialWeights: Option[DataSet[WeightVector]]): DataSet[WeightVector] 
= {
+initialWeights: Option[DataSet[WeightVector]]
+ ): DataSet[WeightVector] = {
--- End diff --

Indentation:
```
override def optimize(
data: ...
initialWeights: ...)
  : DataSet[WeightVector] = {

}
```
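
The same signature with the parameter and return types from the diff filled 
in (a hedged sketch; the method body is assumed unchanged):

```scala
override def optimize(
    data: DataSet[LabeledVector],
    initialWeights: Option[DataSet[WeightVector]])
  : DataSet[WeightVector] = {
  // ... run the gradient descent iterations as before ...
}
```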




[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089529#comment-15089529
 ] 

ASF GitHub Bot commented on FLINK-1994:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1397#discussion_r49211433
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala
 ---
@@ -54,14 +54,15 @@ abstract class GradientDescent extends IterativeSolver {
 */
   override def optimize(
 data: DataSet[LabeledVector],
-initialWeights: Option[DataSet[WeightVector]]): DataSet[WeightVector] 
= {
+initialWeights: Option[DataSet[WeightVector]]
+ ): DataSet[WeightVector] = {
--- End diff --

Indentation:
```
override def optimize(
data: ...
initialWeights: ...)
  : DataSet[WeightVector] = {

}
```


> Add different gain calculation schemes to SGD
> -
>
> Key: FLINK-1994
> URL: https://issues.apache.org/jira/browse/FLINK-1994
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Trevor Grant
>Priority: Minor
>  Labels: ML, Starter
>
> The current SGD implementation uses as gain for the weight updates the 
> formula {{stepsize/sqrt(iterationNumber)}}. It would be good to make the gain 
> calculation configurable and to provide different strategies for that. For 
> example:
> * stepsize/(1 + iterationNumber)
> * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4)
> See also how to properly select the gains [1].
> Resources:
> [1] http://arxiv.org/pdf/1107.2490.pdf





[jira] [Commented] (FLINK-1870) Reintroduce Indexed reader functionality to streaming

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089432#comment-15089432
 ] 

ASF GitHub Bot commented on FLINK-1870:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170044112
  
The user code should not deal with which channel an element came from. That 
wires assumptions about the predecessor's parallelism into the user code, 
making it impossible to adjust the parallelism later.

Why does the Storm compatibility layer need this?


> Reintroduce Indexed reader functionality to streaming
> -
>
> Key: FLINK-1870
> URL: https://issues.apache.org/jira/browse/FLINK-1870
> Project: Flink
>  Issue Type: Task
>  Components: Streaming
>Reporter: Gyula Fora
>Assignee: Matthias J. Sax
>Priority: Minor
>
> The Indexed record reader classes (IndexedReaderIterator, 
> IndexedMutableReader) were introduced to allow the streaming operators to 
> access the index of the last read channel from the input gate. This was a 
> necessary step toward future input sorting operators.
> Unfortunately this untested feature was patched away by the following commit:
> https://github.com/apache/flink/commit/5232c56b13b7e13e2cf10dbe818c221f5557426d
> At some point we need to reimplement these features with proper tests.





[jira] [Commented] (FLINK-3210) Unnecessary call to deserializer#deserialize() in LegacyFetcher#SimpleConsumerThread#run()

2016-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089572#comment-15089572
 ] 

Ted Yu commented on FLINK-3210:
---

bq. The key can still contain something

I agree.
But why deserialize the value?

> Unnecessary call to deserializer#deserialize() in 
> LegacyFetcher#SimpleConsumerThread#run()
> --
>
> Key: FLINK-3210
> URL: https://issues.apache.org/jira/browse/FLINK-3210
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> byte[] valueBytes;
> if (payload == null) {
>   deletedMessages++;
>   valueBytes = null;
> } else {
> ...
> final T value = deserializer.deserialize(keyBytes, 
> valueBytes, fp.topic, offset);
> {code}
> When valueBytes is null, there is no need to call deserializer#deserialize()
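
A hedged sketch of the short-circuit being suggested (illustration only, not 
a committed fix; note the objection elsewhere in this thread that the key of 
a deleted message may still carry data a key-aware schema needs):

{code}
// Hedged sketch: only invoke the deserializer when a payload exists;
// a tombstone (deleted) message yields a null record instead.
final T value;
if (valueBytes == null) {
    value = null; // deleted message: skip value deserialization
} else {
    value = deserializer.deserialize(keyBytes, valueBytes, fp.topic, offset);
}
{code}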





[GitHub] flink pull request: FLINK-3197: InputStream not closed in BinaryIn...

2016-01-08 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/1494#discussion_r49199370
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/BinaryInputFormat.java 
---
@@ -213,9 +213,10 @@ protected SequentialStatistics createStatistics(List<FileStatus> files, FileBase
 
FSDataInputStream fdis = 
file.getPath().getFileSystem().open(file.getPath(), blockInfo.getInfoSize());
fdis.seek(file.getLen() - blockInfo.getInfoSize());
-   
+
blockInfo.read(new DataInputViewStreamWrapper(fdis));
totalCount += blockInfo.getAccumulatedRecordCount();
+   fdis.close();
--- End diff --

Agreed, we should consistently use a "try-with-resources" statement for 
streams.




[jira] [Commented] (FLINK-1712) Restructure Maven Projects

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089354#comment-15089354
 ] 

ASF GitHub Bot commented on FLINK-1712:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1492#issuecomment-170033507
  
Can you summarize the changes in this pull request? The set of changes is 
too large for GitHub to display.


> Restructure Maven Projects
> --
>
> Key: FLINK-1712
> URL: https://issues.apache.org/jira/browse/FLINK-1712
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Priority: Minor
>
> Project consolidation
>  - flink-hadoop (shaded fat jar)
>  - Core (Core and Java)
>  - (core-scala)
>  - Streaming (core + java)
>  - (streaming-scala)
>  - Runtime
>  - Optimizer (may also be merged with Client)
>  - Client (or Client + Optimizer)
>  
>  - Examples (Java + Scala + Streaming Java + Streaming Scala)
>  - Tests (test-utils (compile) and tests (test))
>  
>  - Quickstarts
>- Quickstart Java
>- Quickstart Scala
>  
>  - connectors / Input/Output Formats
>- Avro
>- HBase
>- HadoopCompartibility
>- HCatalogue
>- JDBC
>- kafka
>- rabbit
>- ...
>  
>  - staging
>- Gelly
>- ML
>- spargel (deprecated)
>- expression API
>  
>  - contrib
>  
>  - yarn
>  
>  - dist
>  
>  - yarn tests
>  
>  - java 8





[GitHub] flink pull request: [FLINK-1712] Remove "flink-staging" module

2016-01-08 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1492#issuecomment-170033507
  
Can you summarize the changes in this pull request? The set of changes is 
too large for GitHub to display.




[GitHub] flink pull request: [FLINK-3189] Error while parsing job arguments...

2016-01-08 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1493#issuecomment-170035218
  
Sure. Will do that.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-08 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170044112
  
The user code should not deal with which channel an element came from. That 
wires assumptions about the predecessor's parallelism into the user code, 
making it impossible to adjust the parallelism later.

Why does the Storm compatibility layer need this?




[jira] [Commented] (FLINK-3210) Unnecessary call to deserializer#deserialize() in LegacyFetcher#SimpleConsumerThread#run()

2016-01-08 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089566#comment-15089566
 ] 

Robert Metzger commented on FLINK-3210:
---

I think this issue is invalid. A null value indicates a deleted Kafka message. 
The key can still contain something, and somebody explicitly opened a JIRA for 
this feature.

> Unnecessary call to deserializer#deserialize() in 
> LegacyFetcher#SimpleConsumerThread#run()
> --
>
> Key: FLINK-3210
> URL: https://issues.apache.org/jira/browse/FLINK-3210
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> byte[] valueBytes;
> if (payload == null) {
>   deletedMessages++;
>   valueBytes = null;
> } else {
> ...
> final T value = deserializer.deserialize(keyBytes, 
> valueBytes, fp.topic, offset);
> {code}
> When valueBytes is null, there is no need to call deserializer#deserialize()


