[jira] [Commented] (FLINK-992) Create CollectionDataSets by reading (client) local files.

2017-06-21 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058224#comment-16058224
 ] 

niraj rai commented on FLINK-992:
-

Please go ahead

On Jun 21, 2017 2:00 PM, "Neelesh Srinivas Salian (JIRA)" 



> Create CollectionDataSets by reading (client) local files.
> --
>
> Key: FLINK-992
> URL: https://issues.apache.org/jira/browse/FLINK-992
> Project: Flink
>  Issue Type: New Feature
>  Components: DataSet API, Python API
>Reporter: Fabian Hueske
>Assignee: niraj rai
>Priority: Minor
>  Labels: starter
>
> {{CollectionDataSets}} are a nice way to feed data into programs.
> We could add support to read a client-local file at program construction time 
> using a FileInputFormat, put its data into a CollectionDataSet, and ship its 
> data together with the program.
> This would remove the need to upload small files into DFS which are used 
> together with some large input (stored in DFS).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-992) Create CollectionDataSets by reading (client) local files.

2016-01-15 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102283#comment-15102283
 ] 

niraj rai commented on FLINK-992:
-

Hi Stefano, I will submit the patch by next week. is it ok with you?
Thanks
Niraj

On Fri, Jan 15, 2016 at 8:53 AM, Stefano Baghino (JIRA) 



> Create CollectionDataSets by reading (client) local files.
> --
>
> Key: FLINK-992
> URL: https://issues.apache.org/jira/browse/FLINK-992
> Project: Flink
>  Issue Type: New Feature
>  Components: DataSet API, Python API
>Reporter: Fabian Hueske
>Assignee: niraj rai
>Priority: Minor
>  Labels: starter
>
> {{CollectionDataSets}} are a nice way to feed data into programs.
> We could add support to read a client-local file at program construction time 
> using a FileInputFormat, put its data into a CollectionDataSet, and ship its 
> data together with the program.
> This would remove the need to upload small files into DFS which are used 
> together with some large input (stored in DFS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-992) Create CollectionDataSets by reading (client) local files.

2015-07-28 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644819#comment-14644819
 ] 

niraj rai commented on FLINK-992:
-

Hi Fabian,
Can you please provide more details about this feature? My understanding is, if 
we need to read the data from local file system.
Are you suggesting, we should read the data from local file system and and pass 
it to collection data sets? 
Thanks again.
Niraj

 Create CollectionDataSets by reading (client) local files.
 --

 Key: FLINK-992
 URL: https://issues.apache.org/jira/browse/FLINK-992
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Python API, Scala API
Reporter: Fabian Hueske
Assignee: niraj rai
Priority: Minor
  Labels: starter

 {{CollectionDataSets}} are a nice way to feed data into programs.
 We could add support to read a client-local file at program construction time 
 using a FileInputFormat, put its data into a CollectionDataSet, and ship its 
 data together with the program.
 This would remove the need to upload small files into DFS which are used 
 together with some large input (stored in DFS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1818) Provide API to cancel running job

2015-07-22 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637385#comment-14637385
 ] 

niraj rai commented on FLINK-1818:
--

Thanks Max for mentoring me.. Really appreciate your help.. Looking forward to 
contribute more ..

 Provide API to cancel running job
 -

 Key: FLINK-1818
 URL: https://issues.apache.org/jira/browse/FLINK-1818
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: niraj rai
  Labels: starter

 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Canceling-a-Cluster-Job-from-Java-td4897.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1818) Provide API to cancel running job

2015-06-13 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584852#comment-14584852
 ] 

niraj rai commented on FLINK-1818:
--

Hi [~mjsax] Yes, I will submit another pull request in couple of days. Please 
wait. 
Thanks
Niraj

 Provide API to cancel running job
 -

 Key: FLINK-1818
 URL: https://issues.apache.org/jira/browse/FLINK-1818
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: niraj rai
  Labels: starter

 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Canceling-a-Cluster-Job-from-Java-td4897.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1818) Provide API to cancel running job

2015-04-17 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499050#comment-14499050
 ] 

niraj rai commented on FLINK-1818:
--

Should we cancel just one job or the multiple job at a time? Should we also 
look to have an option to cancel all the jobs?

 Provide API to cancel running job
 -

 Key: FLINK-1818
 URL: https://issues.apache.org/jira/browse/FLINK-1818
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: niraj rai
  Labels: starter

 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Canceling-a-Cluster-Job-from-Java-td4897.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-1320) Add an off-heap variant of the managed memory

2015-04-17 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai reassigned FLINK-1320:


Assignee: niraj rai  (was: Maximilian Michels)

 Add an off-heap variant of the managed memory
 -

 Key: FLINK-1320
 URL: https://issues.apache.org/jira/browse/FLINK-1320
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Stephan Ewen
Assignee: niraj rai
Priority: Minor

 For (nearly) all memory that Flink accumulates (in the form of sort buffers, 
 hash tables, caching), we use a special way of representing data serialized 
 across a set of memory pages. The big work lies in the way the algorithms are 
 implemented to operate on pages, rather than on objects.
 The core class for the memory is the {{MemorySegment}}, which has all methods 
 to set and get primitives values efficiently. It is a somewhat simpler (and 
 faster) variant of a HeapByteBuffer.
 As such, it should be straightforward to create a version where the memory 
 segment is not backed by a heap byte[], but by memory allocated outside the 
 JVM, in a similar way as the NIO DirectByteBuffers, or the Netty direct 
 buffers do it.
 This may have multiple advantages:
   - We reduce the size of the JVM heap (garbage collected) and the number and 
 size of long living alive objects. For large JVM sizes, this may improve 
 performance quite a bit. Utilmately, we would in many cases reduce JVM size 
 to 1/3 to 1/2 and keep the remaining memory outside the JVM.
   - We save copies when we move memory pages to disk (spilling) or through 
 the network (shuffling / broadcasting / forward piping)
 The changes required to implement this are
   - Add a {{UnmanagedMemorySegment}} that only stores the memory adress as a 
 long, and the segment size. It is initialized from a DirectByteBuffer.
   - Allow the MemoryManager to allocate these MemorySegments, instead of the 
 current ones.
   - Make sure that the startup script pick up the mode and configure the heap 
 size and the max direct memory properly.
 Since the MemorySegment is probably the most performance critical class in 
 Flink, we must take care that we do this right. The following are critical 
 considerations:
   - If we want both solutions (heap and off-heap) to exist side-by-side 
 (configurable), we must make the base MemorySegment abstract and implement 
 two versions (heap and off-heap).
   - To get the best performance, we need to make sure that only one class 
 gets loaded (or at least ever used), to ensure optimal JIT de-virtualization 
 and inlining.
   - We should carefully measure the performance of both variants. From 
 previous micro benchmarks, I remember that individual byte accesses in 
 DirectByteBuffers (off-heap) were slightly slower than on-heap, any larger 
 accesses were equally good or slightly better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1320) Add an off-heap variant of the managed memory

2015-04-17 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500113#comment-14500113
 ] 

niraj rai commented on FLINK-1320:
--

Hi Henry,

Sorry for breaking protocol of the project. Will keep it in mind in future.
Niraj

 Add an off-heap variant of the managed memory
 -

 Key: FLINK-1320
 URL: https://issues.apache.org/jira/browse/FLINK-1320
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Stephan Ewen
Assignee: niraj rai
Priority: Minor

 For (nearly) all memory that Flink accumulates (in the form of sort buffers, 
 hash tables, caching), we use a special way of representing data serialized 
 across a set of memory pages. The big work lies in the way the algorithms are 
 implemented to operate on pages, rather than on objects.
 The core class for the memory is the {{MemorySegment}}, which has all methods 
 to set and get primitives values efficiently. It is a somewhat simpler (and 
 faster) variant of a HeapByteBuffer.
 As such, it should be straightforward to create a version where the memory 
 segment is not backed by a heap byte[], but by memory allocated outside the 
 JVM, in a similar way as the NIO DirectByteBuffers, or the Netty direct 
 buffers do it.
 This may have multiple advantages:
   - We reduce the size of the JVM heap (garbage collected) and the number and 
 size of long living alive objects. For large JVM sizes, this may improve 
 performance quite a bit. Utilmately, we would in many cases reduce JVM size 
 to 1/3 to 1/2 and keep the remaining memory outside the JVM.
   - We save copies when we move memory pages to disk (spilling) or through 
 the network (shuffling / broadcasting / forward piping)
 The changes required to implement this are
   - Add a {{UnmanagedMemorySegment}} that only stores the memory adress as a 
 long, and the segment size. It is initialized from a DirectByteBuffer.
   - Allow the MemoryManager to allocate these MemorySegments, instead of the 
 current ones.
   - Make sure that the startup script pick up the mode and configure the heap 
 size and the max direct memory properly.
 Since the MemorySegment is probably the most performance critical class in 
 Flink, we must take care that we do this right. The following are critical 
 considerations:
   - If we want both solutions (heap and off-heap) to exist side-by-side 
 (configurable), we must make the base MemorySegment abstract and implement 
 two versions (heap and off-heap).
   - To get the best performance, we need to make sure that only one class 
 gets loaded (or at least ever used), to ensure optimal JIT de-virtualization 
 and inlining.
   - We should carefully measure the performance of both variants. From 
 previous micro benchmarks, I remember that individual byte accesses in 
 DirectByteBuffers (off-heap) were slightly slower than on-heap, any larger 
 accesses were equally good or slightly better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-992) Create CollectionDataSets by reading (client) local files.

2015-04-15 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496378#comment-14496378
 ] 

niraj rai commented on FLINK-992:
-

Hi, if no one is working, I can work on this.

 Create CollectionDataSets by reading (client) local files.
 --

 Key: FLINK-992
 URL: https://issues.apache.org/jira/browse/FLINK-992
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Python API, Scala API
Reporter: Fabian Hueske
Assignee: Henry Saputra
Priority: Minor
  Labels: starter

 {{CollectionDataSets}} are a nice way to feed data into programs.
 We could add support to read a client-local file at program construction time 
 using a FileInputFormat, put its data into a CollectionDataSet, and ship its 
 data together with the program.
 This would remove the need to upload small files into DFS which are used 
 together with some large input (stored in DFS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-1818) Provide API to cancel running job

2015-04-15 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai reassigned FLINK-1818:


Assignee: niraj rai

 Provide API to cancel running job
 -

 Key: FLINK-1818
 URL: https://issues.apache.org/jira/browse/FLINK-1818
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: niraj rai
  Labels: starter

 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Canceling-a-Cluster-Job-from-Java-td4897.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-992) Create CollectionDataSets by reading (client) local files.

2015-04-15 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai reassigned FLINK-992:
---

Assignee: niraj rai  (was: Henry Saputra)

 Create CollectionDataSets by reading (client) local files.
 --

 Key: FLINK-992
 URL: https://issues.apache.org/jira/browse/FLINK-992
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Python API, Scala API
Reporter: Fabian Hueske
Assignee: niraj rai
Priority: Minor
  Labels: starter

 {{CollectionDataSets}} are a nice way to feed data into programs.
 We could add support to read a client-local file at program construction time 
 using a FileInputFormat, put its data into a CollectionDataSet, and ship its 
 data together with the program.
 This would remove the need to upload small files into DFS which are used 
 together with some large input (stored in DFS).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)