[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066596#comment-16066596 ]
ASF GitHub Bot commented on FLINK-2627:
---
Github user coveralls commented on the issue:
    https://github.com/apache/flink/pull/1099
[![Coverage Status](https://coveralls.io/builds/12169841/badge)](https://coveralls.io/builds/12169841)
Changes Unknown when pulling **784cbc1a7901c65719f92919a2f584b5636105bf on sachingoel0101:scala_utils_fix** into ** on apache:master**.
> Make Scala Data Set utils easier to access
> --
>
> Key: FLINK-2627
> URL: https://issues.apache.org/jira/browse/FLINK-2627
> Project: Flink
> Issue Type: Improvement
> Components: Scala API
> Reporter: Sachin Goel
> Assignee: Sachin Goel
> Priority: Trivial
> Fix For: 0.10.0
>
> Currently, to use the Scala Data Set utility functions, one needs to import
> {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}}
> This is counter-intuitive, extra complicated and should be more in sync with
> how Java utils are imported. I propose a package object which can allow
> importing utils like
> {{import org.apache.flink.api.scala.utils._}}
--
This message was sent by Atlassian JIRA (v6.4.14#64029)
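For readers unfamiliar with the pattern the issue proposes, here is a minimal, self-contained sketch of how an implicit class inside an importable object makes utility methods available through one wildcard import. It does not use Flink: `MiniDataSet`, `MiniDataSetUtils` and `PackageObjectDemo` are hypothetical stand-ins, and a plain `object utils` takes the place of the proposed `package object utils` so that the sketch compiles as a single file.

```scala
// Hypothetical stand-in for Flink's DataSet; it only wraps a local Seq.
final case class MiniDataSet[T](elements: Seq[T])

// In the proposal this would be `package object utils` inside the Scala API package;
// a plain object is used here so the sketch compiles as one file or script.
object utils {
  // Once `utils._` is imported, every MiniDataSet gains the methods defined here.
  implicit class MiniDataSetUtils[T](val self: MiniDataSet[T]) {
    // Pairs each element with its position, as (index, element).
    def zipWithIndex(): MiniDataSet[(Long, T)] =
      MiniDataSet(self.elements.zipWithIndex.map { case (e, i) => (i.toLong, e) })
  }
}

object PackageObjectDemo {
  import utils._ // one wildcard import brings the extension methods into scope

  val indexed: MiniDataSet[(Long, String)] =
    MiniDataSet(Seq("a", "b", "c")).zipWithIndex()
}
```

The caller imports the whole `utils` namespace and never has to name the conversion itself, which is the ergonomic gain the issue asks for.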
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14847300#comment-14847300 ] ASF GitHub Bot commented on FLINK-2627: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1099
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805312#comment-14805312 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-141404362 I've already removed the line break. :)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805310#comment-14805310 ] ASF GitHub Bot commented on FLINK-2627: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-141403984 Will address Till's comment and merge this...
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805304#comment-14805304 ] ASF GitHub Bot commented on FLINK-2627: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-141402933 LGTM. +1 for merging :-)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805300#comment-14805300 ]
ASF GitHub Bot commented on FLINK-2627:
---
Github user tillrohrmann commented on a diff in the pull request:
    https://github.com/apache/flink/pull/1099#discussion_r39838739
--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/utils/package.scala ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.api.scala
    +
    +import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
    +import org.apache.flink.api.java.Utils
    +import org.apache.flink.api.java.utils.{DataSetUtils => jutils}
    +
    +import _root_.scala.language.implicitConversions
    +import _root_.scala.reflect.ClassTag
    +
    +package object utils {
    +
    +  /**
    +   * This class provides simple utility methods for zipping elements in a data set with an index
    +   * or with a unique identifier, sampling elements from a data set.
    +   *
    +   * @param self Data Set
    +   */
    +
--- End diff --
line break
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805277#comment-14805277 ] ASF GitHub Bot commented on FLINK-2627: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-141395162 Looks good to me. +1 to merge
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805266#comment-14805266 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-141391475 Unrelated failures. Already filed jiras for those. 2700 and 2612.
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803222#comment-14803222 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-141148749 Hey @StephanEwen, apologies for being too eager but is it possible to get this in soon?
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738315#comment-14738315 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-139137139 @StephanEwen , can you look this over again?
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734690#comment-14734690 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138530591 Travis passes successfully. I've squashed the commits. This should be mergeable now.
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734563#comment-14734563 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138504819 Ah. Thank you @aljoscha. Travis should pass. I've already pushed a fix.
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734557#comment-14734557 ] ASF GitHub Bot commented on FLINK-2627: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138503770 I took a stab at removing the implicit parameters here: https://github.com/aljoscha/flink/commit/e197ea4aba4005400bc80a5693f17ec2617bfae5 The tests are still running on Travis but I think it should work.
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734553#comment-14734553 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138502471 I think I agree with that. I wasn't too happy about using implicit arguments here; we're constructing the type information explicitly anyway. Will push a commit in a while to change this.
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734467#comment-14734467 ] ASF GitHub Bot commented on FLINK-2627: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138488251 I think this looks good. I personally like explicit arguments, even if it means sometimes explicitly summoning an implicit bound. It just makes it clearer where the type infos flow, which is not that easy to figure out ;-) This is subject to debate, though. @tillrohrmann and @aljoscha may be of a different opinion here...
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734461#comment-14734461 ]
ASF GitHub Bot commented on FLINK-2627:
---
Github user StephanEwen commented on a diff in the pull request:
    https://github.com/apache/flink/pull/1099#discussion_r38903902
--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/UnfinishedCoGroupOperation.scala ---
    @@ -61,33 +59,14 @@ class UnfinishedCoGroupOperation[L: ClassTag, R: ClassTag](
         // We have to use this hack, for some reason classOf[Array[T]] does not work.
         // Maybe because ObjectArrayTypeInfo does not accept the Scala Array as an array class.
    -    val leftArrayType =
    +    implicit val leftArrayType =
--- End diff --
These values should be explicitly provided, not implicitly. It makes the code much more understandable
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734458#comment-14734458 ]
ASF GitHub Bot commented on FLINK-2627:
---
Github user StephanEwen commented on a diff in the pull request:
    https://github.com/apache/flink/pull/1099#discussion_r38903737
--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/package.scala ---
    @@ -70,4 +73,27 @@ package object scala {
         }
         st(depth).toString
       }
    +
    +  def createTuple2TypeInformation[T1, T2]
    +  (implicit t1: TypeInformation[T1], t2: TypeInformation[T2])
--- End diff --
Making this implicit seems dangerous; it should be explicitly provided.
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734455#comment-14734455 ] ASF GitHub Bot commented on FLINK-2627: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138485125 The type information generation works as a macro on the abstract syntax tree; that's why it cannot work on its own code (or any code in the same project, which is the same compilation unit).
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734101#comment-14734101 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138401226 @StephanEwen , I have created a separate function to create type information for a 2-tuple. One question though. Why is there a need to generate type information explicitly here? The `TypeInformationGen` class does have a case analysis for `Product` types. I may be very wrong, but the `createTypeInformation` macro cannot be used anywhere inside the module itself but only after the module's been compiled. This is perhaps why `createTypeInformation` works in, say, `flink-ml`.
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733866#comment-14733866 ]
ASF GitHub Bot commented on FLINK-2627:
---
Github user StephanEwen commented on the pull request:
    https://github.com/apache/flink/pull/1099#issuecomment-138329755
I think the way to go is not to put the type info into the class, but into the methods, and create it as follows:
```scala
def zipWithIndex[T : TypeInformation : ClassTag](): DataSet[(Long, T)] = {
  // Summon the implicit type info for T so it can be passed on explicitly.
  val tInfo = implicitly[TypeInformation[T]]
  implicit val tupleTypeInformation = new CaseClassTypeInfo[(Long, T)](
    classOf[(Long, T)],
    Array(BasicTypeInfo.LONG_TYPE_INFO, tInfo),
    Seq(BasicTypeInfo.LONG_TYPE_INFO, tInfo),
    Array("_1", "_2"))
  wrap(jutils.zipWithIndex(self.javaSet)).map { t => (t.f0.toLong, t.f1) }
}
```
All the methods in the utils class should have parentheses; they are not side-effect-free getters after all. Also, some tooling around creating Scala Tuple type information would be nice. I can see that there are more places where one would do that.
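The mechanics behind this suggestion (a context bound on the method, `implicitly` to summon the evidence, and an explicit hand-off to a tuple helper) can be tried out without Flink. The sketch below is only an illustration under assumptions: `TypeInfo`, `tuple2Info`, `indexedInfo` and `ContextBoundDemo` are hypothetical names, and `TypeInfo` is a toy type class standing in for `TypeInformation`, not the real API.

```scala
// Hypothetical stand-in for Flink's TypeInformation; it only carries a printable name.
trait TypeInfo[T] { def describe: String }

object TypeInfo {
  implicit val longInfo: TypeInfo[Long] = new TypeInfo[Long] { val describe = "Long" }
  implicit val stringInfo: TypeInfo[String] = new TypeInfo[String] { val describe = "String" }
}

object ContextBoundDemo {
  // Explicit helper: both element infos are ordinary arguments, so it is
  // visible at the call site which type infos flow into the tuple info.
  def tuple2Info[A, B](a: TypeInfo[A], b: TypeInfo[B]): TypeInfo[(A, B)] =
    new TypeInfo[(A, B)] { val describe = s"(${a.describe}, ${b.describe})" }

  // The context bound `T: TypeInfo` demands an implicit TypeInfo[T] from the caller;
  // `implicitly` retrieves it so that it can be passed on explicitly.
  def indexedInfo[T: TypeInfo](): TypeInfo[(Long, T)] = {
    val tInfo = implicitly[TypeInfo[T]]
    tuple2Info(TypeInfo.longInfo, tInfo)
  }

  val result: String = indexedInfo[String]().describe
}
```

Only the outer method relies on implicit resolution; the helper takes its inputs as plain parameters, which matches the preference for explicit arguments voiced elsewhere in this thread.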
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733840#comment-14733840 ]
ASF GitHub Bot commented on FLINK-2627:
---
Github user sachingoel0101 commented on a diff in the pull request:
    https://github.com/apache/flink/pull/1099#discussion_r38869305
--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/utils/package.scala ---
    @@ -0,0 +1,124 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.api.scala
    +
    +import org.apache.flink.api.common.ExecutionConfig
    +import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
    +import org.apache.flink.api.common.typeutils.TypeSerializer
    +import org.apache.flink.api.java.Utils
    +import org.apache.flink.api.java.utils.{DataSetUtils => jutils}
    +import org.apache.flink.api.scala.typeutils.{CaseClassSerializer, CaseClassTypeInfo}
    +
    +import _root_.scala.language.implicitConversions
    +import _root_.scala.reflect.ClassTag
    +
    +package object utils {
    +
    +  /**
    +   * This class provides simple utility methods for zipping elements in a data set with an index
    +   * or with a unique identifier, sampling elements from a data set.
    +   *
    +   * @param self Data Set
    +   */
    +
    +  implicit class DataSetUtils[T: TypeInformation: ClassTag](val self: DataSet[T]) {
    +
    +    implicit val tupleTypeInformation = new CaseClassTypeInfo[(Long, T)](
    +      classOf[(Long, T)],
--- End diff --
No. It always led to the error I mentioned above.
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733836#comment-14733836 ]
ASF GitHub Bot commented on FLINK-2627:
---
Github user tillrohrmann commented on a diff in the pull request:
    https://github.com/apache/flink/pull/1099#discussion_r38869114
--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/utils/package.scala ---
    @@ -0,0 +1,124 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.api.scala
    +
    +import org.apache.flink.api.common.ExecutionConfig
    +import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
    +import org.apache.flink.api.common.typeutils.TypeSerializer
    +import org.apache.flink.api.java.Utils
    +import org.apache.flink.api.java.utils.{DataSetUtils => jutils}
    +import org.apache.flink.api.scala.typeutils.{CaseClassSerializer, CaseClassTypeInfo}
    +
    +import _root_.scala.language.implicitConversions
    +import _root_.scala.reflect.ClassTag
    +
    +package object utils {
    +
    +  /**
    +   * This class provides simple utility methods for zipping elements in a data set with an index
    +   * or with a unique identifier, sampling elements from a data set.
    +   *
    +   * @param self Data Set
    +   */
    +
    +  implicit class DataSetUtils[T: TypeInformation: ClassTag](val self: DataSet[T]) {
    +
    +    implicit val tupleTypeInformation = new CaseClassTypeInfo[(Long, T)](
    +      classOf[(Long, T)],
--- End diff --
Didn't `createTypeInformation[(Long, T)]` work?
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733827#comment-14733827 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/1099#discussion_r38868511 --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/utils/package.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.scala + +import org.apache.flink.api.common.ExecutionConfig +import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation} +import org.apache.flink.api.common.typeutils.TypeSerializer +import org.apache.flink.api.java.Utils +import org.apache.flink.api.java.utils.{DataSetUtils => jutils} +import org.apache.flink.api.scala.typeutils.{CaseClassSerializer, CaseClassTypeInfo} + +import _root_.scala.language.implicitConversions +import _root_.scala.reflect.ClassTag + +package object utils { + + /** + * This class provides simple utility methods for zipping elements in a data set with an index + * or with a unique identifier, sampling elements from a data set. 
+ * + * @param self Data Set + */ + + implicit class DataSetUtils[T: TypeInformation: ClassTag](val self: DataSet[T]) { + +implicit val tupleTypeInformation = new CaseClassTypeInfo[(Long, T)]( + classOf[(Long, T)], --- End diff -- @StephanEwen , is this what you had in mind? Thanks a lot. Figuring this out cleared up a lot of things for me. :) > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733725#comment-14733725 ] ASF GitHub Bot commented on FLINK-2627: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138303761 To be precise, directly before the `map` call. And you have to declare the value as an implicit value. Otherwise, the `map` call won't find it. > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
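The advice in this part of the thread (build the evidence for `(Long, T)` by hand from the evidence for `T`, and declare it as an implicit value directly before the call that needs it) can be sketched in plain Scala. This is only an illustration under stated assumptions: `TypeInfo`, `tupleInfo` and `mapTyped` are hypothetical stand-ins, not Flink's `TypeInformation`, `CaseClassTypeInfo` or `DataSet.map`.

```scala
object LocalImplicitSketch {
  // Hypothetical stand-in for a type-information type class (NOT Flink's TypeInformation).
  trait TypeInfo[A] { def name: String }
  object TypeInfo {
    implicit val stringInfo: TypeInfo[String] = new TypeInfo[String] { def name = "String" }
  }

  // Builds the evidence for (Long, T) by hand from the evidence for T.
  def tupleInfo[T](elem: TypeInfo[T]): TypeInfo[(Long, T)] =
    new TypeInfo[(Long, T)] { def name = s"(Long, ${elem.name})" }

  // Hypothetical stand-in for a `map` that demands evidence for its result type.
  def mapTyped[A, B](items: Seq[A])(f: A => B)(implicit ev: TypeInfo[B]): (String, Seq[B]) =
    (ev.name, items.map(f))

  def zipWithIndex[T: TypeInfo](items: Seq[T]): (String, Seq[(Long, T)]) = {
    // Declared implicit, directly before the call that needs it;
    // without the `implicit` modifier, mapTyped would not find it.
    implicit val pairInfo: TypeInfo[(Long, T)] = tupleInfo(implicitly[TypeInfo[T]])
    mapTyped(items.zipWithIndex) { case (value, index) => (index.toLong, value) }
  }

  def main(args: Array[String]): Unit =
    println(zipWithIndex(Seq("a", "b")))
}
```

Removing the `implicit` modifier from `pairInfo` makes the `mapTyped` call fail to compile, which is the situation the comment above warns about.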
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733722#comment-14733722 ] ASF GitHub Bot commented on FLINK-2627: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138302904 Ah, sorry, my bad, you should take the TypeInformation for `T` from the call site. You may need to manually create the type info for the tuple, from the `T` type info, by creating a case class type info for `Tuple2` with `Long` and `T`. > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733719#comment-14733719 ] ASF GitHub Bot commented on FLINK-2627: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138302608 I think for this, the type information should be passed from the call site, you should not need to create it explicitly. > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733575#comment-14733575 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138277563 I get the following error on using that: `macro implementation not found: createTypeInformation (the most common reason is that you cannot use macro implementations in the same compilation run that defines them)` The correct place to use that would be just before the `wrap` call though, right? > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733541#comment-14733541 ] ASF GitHub Bot commented on FLINK-2627: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138269532 Have you tried constructing the type information explicitly `createTypeInformation[(Long, T)]`? > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733501#comment-14733501 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1099#issuecomment-138258862 I am unable to get rid of the implicit type information for the `zip` functions, presumably because the type information for `(Long,T)` isn't found. > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733394#comment-14733394 ] ASF GitHub Bot commented on FLINK-2627: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1099#discussion_r38842611 --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/utils/package.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.scala + +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.Utils +import org.apache.flink.api.java.utils.{DataSetUtils => jutils} + +import _root_.scala.language.implicitConversions +import _root_.scala.reflect.ClassTag + +package object utils { + + /** + * This class provides simple utility methods for zipping elements in a data set with an index + * or with a unique identifier, sampling elements from a data set. + * + * @param self Data Set + */ + + implicit class DataSetUtils[T](val self: DataSet[T]) { + +/** + * Method that takes a set of subtask index, total number of elements mappings + * and assigns ids to all the elements from the input data set. 
+ * + * @return a data set of tuple 2 consisting of consecutive ids and initial values. + */ +def zipWithIndex(implicit ti: TypeInformation[(Long, T)], --- End diff -- Stephan suggested removing the implicit parameter lists from all methods and writing `implicit class DataSetUtils[T: TypeInformation: ClassTag](val self: DataSet[T])` instead. +1 for his suggestion. > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
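The shape of that suggestion, with the evidence requested once on the implicit class instead of in a parameter list on every method, might look roughly like the sketch below. It uses the real `scala.reflect.ClassTag`, but `TypeInfo` and `Box` are hypothetical stand-ins for Flink's `TypeInformation` and `DataSet`, so treat it as an illustration rather than the code from the pull request.

```scala
import scala.reflect.ClassTag

object ClassLevelBoundsSketch {
  // Hypothetical stand-in for a type-information type class (NOT Flink's TypeInformation).
  trait TypeInfo[A] { def name: String }
  object TypeInfo {
    implicit val intInfo: TypeInfo[Int] = new TypeInfo[Int] { def name = "Int" }
  }

  // Hypothetical stand-in for DataSet[T].
  final case class Box[T](items: Seq[T])

  // Both context bounds sit on the class, so no method needs its own implicit parameter list.
  implicit class BoxUtils[T: TypeInfo: ClassTag](val self: Box[T]) {
    // Uses the TypeInfo[T] evidence captured by the class.
    def elementTypeName: String = implicitly[TypeInfo[T]].name

    // Building an Array[T] requires the ClassTag[T] evidence captured by the class.
    def toArray: Array[T] = self.items.toArray

    // Pairs every element with its position, index first.
    def zipWithIndex: Box[(Long, T)] =
      Box(self.items.zipWithIndex.map { case (value, index) => (index.toLong, value) })
  }

  def main(args: Array[String]): Unit = {
    println(Box(Seq(10, 20)).elementTypeName)
    println(Box(Seq(10, 20)).zipWithIndex)
  }
}
```

The trade-off is that the conversion itself only applies when evidence for `T` is available at the call site, which is the behaviour the reviewers ask for here.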
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733385#comment-14733385 ] ASF GitHub Bot commented on FLINK-2627: --- Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/1099#discussion_r38842188 --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/utils/package.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.scala + +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.Utils +import org.apache.flink.api.java.utils.{DataSetUtils => jutils} + +import _root_.scala.language.implicitConversions +import _root_.scala.reflect.ClassTag + +package object utils { + + /** + * This class provides simple utility methods for zipping elements in a data set with an index + * or with a unique identifier, sampling elements from a data set. 
+ * + * @param self Data Set + */ + + implicit class DataSetUtils[T](val self: DataSet[T]) { + +/** + * Method that takes a set of subtask index, total number of elements mappings + * and assigns ids to all the elements from the input data set. + * + * @return a data set of tuple 2 consisting of consecutive ids and initial values. + */ +def zipWithIndex(implicit ti: TypeInformation[(Long, T)], --- End diff -- I'm not sure I understand. I'm not familiar with implicit values or with the type information systems of Scala and Flink. > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733374#comment-14733374 ] ASF GitHub Bot commented on FLINK-2627: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/1099#discussion_r38841668 --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/utils/package.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.scala + +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.Utils +import org.apache.flink.api.java.utils.{DataSetUtils => jutils} + +import _root_.scala.language.implicitConversions +import _root_.scala.reflect.ClassTag + +package object utils { + + /** + * This class provides simple utility methods for zipping elements in a data set with an index + * or with a unique identifier, sampling elements from a data set. + * + * @param self Data Set + */ + + implicit class DataSetUtils[T](val self: DataSet[T]) { + +/** + * Method that takes a set of subtask index, total number of elements mappings + * and assigns ids to all the elements from the input data set. 
+ * + * @return a data set of tuple 2 consisting of consecutive ids and initial values. + */ +def zipWithIndex(implicit ti: TypeInformation[(Long, T)], --- End diff -- Quick question: In most other parts of the Scala API, the TypeInformation is passed via context bounds. Even though that de-sugars to an implicit parameter, why not keep the style consistent over all functions? > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
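For readers less familiar with the point about de-sugaring: a context bound `[A: F]` is shorthand for an extra implicit parameter of type `F[A]`. A minimal sketch follows, assuming a hypothetical `TypeInfo` type class as a stand-in (it is not Flink's `TypeInformation`).

```scala
object ContextBoundSketch {
  // Hypothetical stand-in for a type-information type class (NOT Flink's TypeInformation).
  trait TypeInfo[A] { def name: String }
  object TypeInfo {
    implicit val longInfo: TypeInfo[Long] = new TypeInfo[Long] { def name = "Long" }
    implicit val stringInfo: TypeInfo[String] = new TypeInfo[String] { def name = "String" }
  }

  // Context-bound style: the evidence is anonymous and fetched with `implicitly`.
  def describeBound[A: TypeInfo](value: A): String =
    s"${implicitly[TypeInfo[A]].name}: $value"

  // Explicit style: the same signature after de-sugaring, with a named implicit parameter.
  def describeImplicit[A](value: A)(implicit ti: TypeInfo[A]): String =
    s"${ti.name}: $value"

  def main(args: Array[String]): Unit = {
    println(describeBound(42L))
    println(describeImplicit("x"))
  }
}
```

Both methods resolve the same evidence at the call site, so choosing between them is a matter of style and consistency, as the comment above says.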
[jira] [Commented] (FLINK-2627) Make Scala Data Set utils easier to access
[ https://issues.apache.org/jira/browse/FLINK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1478#comment-1478 ] ASF GitHub Bot commented on FLINK-2627: --- GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/1099 [FLINK-2627][utils]Make Scala Data Set utils easier to access Introduces a package object for Scala data set utils to simplify usage. New usage: `import org.apache.flink.api.scala.utils._` You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink scala_utils_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1099.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1099 commit 7df1766278111583eac24c375ec7996f1854705b Author: Sachin Goel Date: 2015-09-05T13:12:38Z Makes scala utils access easier, in sync with Java utils accessor > Make Scala Data Set utils easier to access > -- > > Key: FLINK-2627 > URL: https://issues.apache.org/jira/browse/FLINK-2627 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Sachin Goel >Assignee: Sachin Goel >Priority: Trivial > > Currently, to use the Scala Data Set utility functions, one needs to import > {{import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet}} > This is counter-intuitive, extra complicated and should be more in sync with > how Java utils are imported. I propose a package object which can allow > importing utils like > {{import org.apache.flink.api.scala.utils._}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
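The import style proposed in this pull request relies on an implicit class living inside an object whose members are pulled in with one wildcard import. A rough, self-contained sketch of that mechanism is below; `Box`, `BoxUtils` and `countElements` are hypothetical names, and a plain nested `object utils` plays the role of the `package object utils` so that the example fits in a single file.

```scala
object PackageObjectSketch {
  // Hypothetical stand-in for DataSet[T].
  final case class Box[T](items: Seq[T])

  // Plays the role of `package object utils`; a plain object keeps the sketch in one file.
  object utils {
    // The implicit class adds extension methods to Box once it is imported.
    implicit class BoxUtils[T](val self: Box[T]) {
      def countElements: Long = self.items.size.toLong
    }
  }

  def main(args: Array[String]): Unit = {
    // A single wildcard import brings the extension methods into scope.
    import utils._
    println(Box(Seq("a", "b", "c")).countElements)
  }
}
```

With a real package object the call site would read the same way; only the import path changes.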