Re: [DISCUSS] Mnemonic incubator proposal

2016-02-29 Thread Henry Saputra
It looks like the discussion has calmed down with no objections, so I think we
can proceed with the VOTE thread.

- Henry



RE: [DISCUSS] Mnemonic incubator proposal

2016-02-24 Thread Wang, Yanping
In general, Mnemonic can be integrated into many projects. First, projects can
use Mnemonic to move data off the Java heap, which greatly reduces GC work, and
use GET/SET to access data fields directly, which eliminates SerDes.
Later we can expand Mnemonic to exercise persistent/non-volatile programming on
large-scale distributed systems with TB-sized fast persistent memory devices.
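
To make the GET/SET idea above concrete, here is a minimal sketch in plain Java. It is not Mnemonic's API: the class OffHeapPerson, its two fields, and the 12-byte record layout are assumptions invented for illustration, and a direct ByteBuffer stands in for whatever memory service Mnemonic would actually use. Fields are read and written in place at fixed offsets, so the record data never becomes a heap object that has to be serialized or deserialized.

```java
import java.nio.ByteBuffer;

/**
 * Illustrative only: a fixed-layout record stored outside the Java heap.
 * OffHeapPerson and its layout are hypothetical, not Mnemonic classes.
 */
final class OffHeapPerson {
    private static final int AGE_OFFSET = 0;      // int, 4 bytes
    private static final int BALANCE_OFFSET = 4;  // long, 8 bytes
    static final int RECORD_SIZE = 12;

    private final ByteBuffer buf;
    private final int base;

    /** Creates a lightweight view over record number 'index' in the shared region. */
    OffHeapPerson(ByteBuffer buf, int index) {
        this.buf = buf;
        this.base = index * RECORD_SIZE;
    }

    // GET/SET go straight to the off-heap bytes; nothing is encoded or decoded.
    int getAge() { return buf.getInt(base + AGE_OFFSET); }
    void setAge(int age) { buf.putInt(base + AGE_OFFSET, age); }

    long getBalance() { return buf.getLong(base + BALANCE_OFFSET); }
    void setBalance(long balance) { buf.putLong(base + BALANCE_OFFSET, balance); }
}

final class OffHeapDemo {
    public static void main(String[] args) {
        // One direct buffer holds 1000 records; the GC tracks a single small
        // ByteBuffer object instead of 1000 separate heap objects.
        ByteBuffer region = ByteBuffer.allocateDirect(1000 * OffHeapPerson.RECORD_SIZE);
        OffHeapPerson p = new OffHeapPerson(region, 42);
        p.setAge(37);
        p.setBalance(1_000_000L);
        System.out.println(p.getAge() + " " + p.getBalance());
    }
}
```

The accessor object itself is a small, short-lived view; the record data lives only in the direct buffer, so a second view over the same index sees the same values.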

Regarding Hadoop Namenode pressure in large-scale cluster scenarios: this
issue originates in HDFS.
Last year we found that the use of FileInputStream in HDFS causes unpredictably
long garbage collection pauses, due to the overhead of finalizers, and
significantly impacts HDFS performance and scalability. We recorded the issue in
https://issues.apache.org/jira/browse/HDFS-8562
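
The finalizer overhead mentioned above comes from java.io.FileInputStream, which on the Java versions in use at the time (Java 7/8) overrides finalize(), so every instance adds finalization work for the GC. A minimal sketch of one commonly recommended way to sidestep that is to open files through java.nio.file.Files.newInputStream, which returns a channel-backed stream without a finalizer. This is an illustration only and not necessarily the change made for HDFS-8562; the class FinalizerFreeRead and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FinalizerFreeRead {

    /** Reads a whole file through Files.newInputStream rather than new FileInputStream. */
    static byte[] readAll(Path file) {
        // try-with-resources closes the stream promptly, so nothing is left
        // for the garbage collector to clean up later.
        try (InputStream in = Files.newInputStream(file)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes text to a temporary file, reads it back, and deletes the file. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("finalizer-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(readAll(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```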

Uma explained how Mnemonic could be used to improve HDFS performance and
scalability. One big advantage is that Mnemonic does not need to hold a file
system cache for random access, which will benefit large-scale clusters.

Thanks
yanping





-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org





RE: [DISCUSS] Mnemonic incubator proposal

2016-02-23 Thread Wang, Gang1
Hi Liang

   Please find my response inline below.

Best Regards
Gary (Wang, Gang), PMP®, CMMI® Appraiser, ITIL® Foundation
NRDC: Donate (Natural Resources Defense Council)


-----Original Message-----
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] 
Sent: Tuesday, February 23, 2016 8:06 PM
To: general@incubator.apache.org
Subject: Re: [DISCUSS] Mnemonic incubator proposal

Hi Liang,

Thank you for your interest. Sure, we will consider adding you to the list of
interested contributors.

>Mnemonic is trying to solve performance issues associated
>with serialization/deserialization of Java objects when dealing with JVM & disk
>directly as well as GC pressure caused by caching?

Yes.

>whether Mnemonic could solve Hadoop Namenode pressure in large-scale
>cluster scenarios, or not?
Yes, we are also thinking about memory and GC overheads in the Namenode.
For example, there are already a couple of JIRAs in HDFS about moving some data
structures off heap. So we plan to take the standard data structures from this
library and make use of them.
We could also take advantage of persistence here.


@Yanping/Gary, maybe you could add more points if you have any?

[Gary] Thanks Uma. In addition, you can plug in your own allocators, optimized
for Namenode usage patterns; this way, performance could be better and more
predictable. Thanks.
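
As a rough picture of what a pluggable allocator could look like, here is a minimal sketch in plain Java. Every name in it (RegionAllocator, BumpAllocator, InodeStore) is hypothetical and chosen for illustration; Mnemonic's real allocator interfaces may look quite different. The point is only that client code depends on an interface, so an allocator tuned to a particular usage pattern can be swapped in without touching that code.

```java
import java.nio.ByteBuffer;

/** Hypothetical allocator contract; Mnemonic's real interfaces may differ. */
interface RegionAllocator {
    /** Returns the offset of a free block of the given size, or -1 if none is available. */
    int allocate(int size);

    /** The off-heap region that the returned offsets refer to. */
    ByteBuffer region();
}

/** Bump allocator: cheap and predictable for many small records that are freed together. */
final class BumpAllocator implements RegionAllocator {
    private final ByteBuffer region;
    private int next = 0;

    BumpAllocator(int capacity) {
        this.region = ByteBuffer.allocateDirect(capacity);
    }

    public int allocate(int size) {
        if (size <= 0 || size > region.capacity() - next) {
            return -1; // invalid request or region exhausted
        }
        int offset = next;
        next += size;
        return offset;
    }

    public ByteBuffer region() {
        return region;
    }
}

/** Client code sees only the interface, so the allocation strategy is pluggable. */
final class InodeStore {
    private final RegionAllocator allocator;

    InodeStore(RegionAllocator allocator) {
        this.allocator = allocator;
    }

    /** Stores one id off-heap and returns its offset, or -1 when out of space. */
    int put(long id) {
        int offset = allocator.allocate(Long.BYTES);
        if (offset >= 0) {
            allocator.region().putLong(offset, id);
        }
        return offset;
    }

    long get(int offset) {
        return allocator.region().getLong(offset);
    }

    public static void main(String[] args) {
        InodeStore store = new InodeStore(new BumpAllocator(1024));
        int offset = store.put(12345L);
        System.out.println(store.get(offset));
    }
}
```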

Regards,
Uma




Re: [DISCUSS] Mnemonic incubator proposal

2016-02-23 Thread Gangumalla, Uma
Hi Liang,

Thank you for your interest. Sure, we will consider adding you to the list of
interested contributors.

>Mnemonic is trying to solve performance issues associated
>with serialization/deserialization of Java objects when dealing with JVM &
>disk directly as well as GC pressure caused by caching?

Yes.

>whether Mnemonic could solve Hadoop Namenode pressure in large-scale
>cluster scenarios, or not?
Yes, we are also thinking about memory and GC overheads in the Namenode.
For example, there are already a couple of JIRAs in HDFS about moving some
data structures off heap. So we plan to take the standard data structures
from this library and make use of them.
We could also take advantage of persistence here.


@Yanping/Gary, maybe you could add more points if you have any?

Regards,
Uma




Re: [DISCUSS] Mnemonic incubator proposal

2016-02-23 Thread Liang Chen
Interesting, I would love to become a contributor.

My understanding: Mnemonic is trying to solve performance issues associated
with serialization/deserialization of Java objects when dealing with the JVM
and disk directly, as well as GC pressure caused by caching?

One question: could Mnemonic solve Hadoop Namenode pressure in large-scale
cluster scenarios, or not?



--
View this message in context: 
http://apache-incubator-general.996316.n3.nabble.com/DISCUSS-Mnemonic-incubator-proposal-tp48502p48533.html
Sent from the Apache Incubator - General mailing list archive at Nabble.com.




RE: [DISCUSS] Mnemonic incubator proposal

2016-02-22 Thread Wang, Gang1
Hey Ted, 

   I'm working with Yanping in the same Big Data Technologies U.S. R&D team.
Let me try to explain a possible way to leverage the power of Arrow for
Mnemonic.

   Our initial thought about the connection with Arrow is that Mnemonic could
directly provide Arrow-based collections for generic non-volatile objects, so
developers could apply SIMD operations to those collections for
high-performance processing.

   In another possible case, customized object graphs could benefit from Arrow
if Mnemonic provides Arrow-specific tags as hints to pluggable Arrow-aware
allocators. An Arrow-aware allocator could then aggregate non-volatile objects
of the same type from customized object graphs according to those tags.

   We think the Arrow tag could also be applied to selected non-volatile fields
of different types of objects to enable SIMD-friendly operations. Thanks.
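
To illustrate why grouping same-typed fields together helps, here is a minimal sketch of a columnar container in plain Java. It does not use the Arrow library or any Mnemonic API; the class TradeColumns and its fields are hypothetical. Each field of a logical record is written to its own contiguous array, so an operation over one field scans sequential memory, which is the access pattern that CPU caches and SIMD units favour.

```java
/** Hypothetical columnar container: one contiguous array per field. */
final class TradeColumns {
    private final long[] ids;
    private final double[] prices;
    private int size = 0;

    TradeColumns(int capacity) {
        ids = new long[capacity];
        prices = new double[capacity];
    }

    /** Appends one logical record by writing each field to its own column. */
    void add(long id, double price) {
        if (size == ids.length) {
            throw new IllegalStateException("container is full");
        }
        ids[size] = id;
        prices[size] = price;
        size++;
    }

    /** Scans only the price column; the id column is never touched. */
    double totalPrice() {
        double sum = 0.0;
        for (int i = 0; i < size; i++) {
            sum += prices[i];
        }
        return sum;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TradeColumns trades = new TradeColumns(4);
        trades.add(1L, 1.5);
        trades.add(2L, 2.5);
        System.out.println(trades.totalPrice());
    }
}
```

With an object-per-record layout, the same sum would chase a pointer to each record and pull unrelated fields into the cache along the way.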

Best Regards
Gary (Wang, Gang), PMP®, CMMI® Appraiser, ITIL® Foundation
NRDC: Donate (Natural Resources Defense Council)



RE: [DISCUSS] Mnemonic incubator proposal

2016-02-22 Thread Wang, Yanping
Hi, All

I uploaded a PDF presentation that describes Project Mnemonic with some nice
pictures.
Click the attachment link below to see the presentation.

Attachment name: Project_Mnemonic_Pub1.0.pdf
Attachment size: 1493317
Attachment link:
https://wiki.apache.org/incubator/MnemonicProposal?action=AttachFile&do=get&target=Project_Mnemonic_Pub1.0.pdf
 

Page link: https://wiki.apache.org/incubator/MnemonicProposal 

Thanks
Yanping

-----Original Message-----
From: Wang, Yanping [mailto:yanping.w...@intel.com] 
Sent: Sunday, February 21, 2016 11:47 AM
To: general@incubator.apache.org
Subject: [DISCUSS] Mnemonic incubator proposal 

Hi all 

We'd like to start a discussion regarding a proposal to submit Mnemonic to the 
Apache Incubator.

The proposal text is available on the Wiki here:
https://wiki.apache.org/incubator/MnemonicProposal

and pasted below for convenience.

We are excited to make this proposal, and look forward to the community's input!

Best,
Yanping


= Mnemonic Proposal =
=== Abstract ===
Mnemonic is a Java-based non-volatile memory library for in-place structured
data processing and computing. It is a solution for generic object and block
persistence on heterogeneous block and byte-addressable devices, such as DRAM,
persistent memory, NVMe, SSD, and cloud network storage.

=== Proposal ===
Mnemonic is an in-memory, in-place structured data persistence library for
Java-based applications and frameworks. It provides unified interfaces for data
manipulation on heterogeneous block/byte-addressable devices, such as DRAM,
persistent memory, NVMe, SSD, and cloud network devices.

The design motivation for this project is to create a non-volatile programming
paradigm for in-memory data object persistence, in-memory data object caching,
and JNI-less IPC.
Mnemonic simplifies the use of data object caching, persistence, and JNI-less
IPC for massive object-oriented structured datasets.

Mnemonic defines Non-Volatile Java objects that store data fields in persistent
memory and storage. At program runtime, only methods and volatile fields are
instantiated in the Java heap; Non-Volatile data fields are accessed directly
via GET/SET operations to and from persistent memory and storage.
Mnemonic avoids SerDes and significantly reduces the amount of garbage in the
Java heap.

Major features of Mnemonic:
* Provides an abstract view that treats heterogeneous block/byte-addressable
devices as a whole (e.g., DRAM, persistent memory, NVMe, SSD, HD, cloud network
storage).
* Seamlessly supports object-oriented design and programming without the burden
of converting object data to a different form.
* Avoids object data serialization/deserialization for data retrieval, caching,
and storage.
* Reduces on-heap memory consumption, which in turn reduces and stabilizes
Java Garbage Collection (GC) pauses for latency-sensitive applications.
* Overcomes current limitations of Java GC to manage much larger memory
resources for massive dataset processing and computing.
* Eases migration of the data usage model from traditional NVMe/SSD/HD to
non-volatile memory.
* Uses a lazy loading mechanism to avoid unnecessary memory consumption when
some data is not needed for computing immediately.
* Bypasses JNI calls for interaction between a Java runtime application and
its native code.
* Provides an allocation-aware auto-reclaim mechanism to prevent external
memory resource leaks.
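
As an illustration of the lazy loading item in the list above, here is a minimal sketch in plain Java. The Lazy class is a hypothetical helper written for this example, not part of Mnemonic: the loader runs only on first access, so data that is never needed for computing is never brought into memory.

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder: the loader runs at most once, on first access. */
final class Lazy<T> {
    private Supplier<T> loader;
    private T value;
    private boolean loaded = false;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Loads the value on the first call and returns the cached value afterwards. */
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null; // let the loader and anything it captured be collected
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        Lazy<byte[]> block = new Lazy<>(() -> new byte[1024]);
        System.out.println(block.isLoaded()); // nothing allocated yet
        System.out.println(block.get().length);
    }
}
```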


=== Background ===
Big Data and Cloud applications increasingly require both high throughput and 
low latency processing. Java-based applications targeting the Big Data and 
Cloud space should be tuned for better throughput, lower latency, and more 
predictable response time.
Typically, there are some issues that impact Big Data applications' performance
and scalability:

1) The Complexity of Data Transformation/Organization: In most cases, during
data processing, applications use their own complicated caching mechanisms to
SerDes data objects, spill to different storage, and evict large amounts of
data. Some data objects contain complex values and structures that make data
organization much more difficult. Loading and then parsing/decoding datasets
from storage consumes substantial system resources and computation power.

2) Lack of Caching, Burst Temporary Object Creation/Destruction Causes Frequent
Long GC Pauses: Big Data computing/syntax generates large numbers of temporary
objects during processing, e.g. lambdas, SerDes, copying, etc. This triggers
frequent, long Java GC pauses to scan references, update reference lists, and
blindly copy live objects from one memory location to another.

3) The Unpredictable GC Pause: Latency-sensitive applications, such as
databases, search engines, web query, and real-time/streaming computing, require
latency/request-response to be kept under control. But current Java GC does not
provide predictable GC activities with large on-heap

RE: [DISCUSS] Mnemonic incubator proposal

2016-02-22 Thread Wang, Yanping
Yes, Jacques, it is exciting to see that Arrow and Mnemonic can leverage each
other.
I looked at Apache Drill today. I think Drill can use Mnemonic to optimize
scalable data sources.

So the idea is that Mnemonic takes Arrow as a columnar data construct or
collection that is optimized from memory to CPU cache. Drill can then use
Arrow-integrated Mnemonic to access storage media across distributed systems
for scalable data sources.

Drill + (Mnemonic (Arrow)) integration => optimizes the entire data access
chain from distributed storage media to CPU cache.

Definitely looking forward to working together.

Best,
Yanping



Re: [DISCUSS] Mnemonic incubator proposal

2016-02-22 Thread Ted Dunning
Yanping,

Could you explain some of the ways that you see that Arrow might be useful
to Mnemonic and how Mnemonic extends the generality of Arrow (probably with
performance consequences)?




Re: [DISCUSS] Mnemonic incubator proposal

2016-02-22 Thread Jacques Nadeau
Hey YanPing,

This addition is nice to see. I agree that there is great opportunity for
the Arrow and Mnemonic communities to collaborate. I look forward to
working together.

Jacques

On Mon, Feb 22, 2016 at 3:01 PM, Wang, Yanping 
wrote:

> Hi, All
>
> Based on feedback, we added the following to the Mnemonic proposal:
>
> Relationships with Other Apache Products
> + Relationship with Apache™ Arrow:
> + Arrow's columnar data layout makes good use of CPU caches & SIMD. It
> places all data relevant to a column operation in a compact format in
> memory.
> +
> + Mnemonic puts whole business object graphs directly on external
> heterogeneous storage media, e.g. off-heap, SSD. Developers do not need to
> normalize the structures of their object graphs for caching, checkpointing,
> or storing. Compared to traditional approaches, Mnemonic applications can
> avoid indexing and joining datasets.
> +
> + Mnemonic can leverage Arrow to transparently re-layout qualified data
> objects or create special containers that can efficiently hold those data
> records in columnar form, as one of its major performance optimization
> constructs.
> +
>
> Thanks
> Yanping
>
> -Original Message-
> From: Wang, Yanping [mailto:yanping.w...@intel.com]
> Sent: Sunday, February 21, 2016 11:47 AM
> To: general@incubator.apache.org
> Subject: [DISCUSS] Mnemonic incubator proposal
>
> Hi all
>
> We'd like to start a discussion regarding a proposal to submit Mnemonic to
> the Apache Incubator.
>
> The proposal text is available on the Wiki here:
> https://wiki.apache.org/incubator/MnemonicProposal
>
> and pasted below for convenience.
>
> We are excited to make this proposal, and look forward to the community's
> input!
>
> Best,
> Yanping
>
>
> = Mnemonic Proposal =
> === Abstract ===
> Mnemonic is a Java-based non-volatile memory library for in-place
> structured data processing and computing. It is a solution for generic
> object and block persistence on heterogeneous block and byte-addressable
> devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network
> storage.
>
> === Proposal ===
> Mnemonic is an in-memory, in-place structured data persistence library for
> Java-based applications and frameworks. It provides unified interfaces for
> data manipulation on heterogeneous block/byte-addressable devices, such as
> DRAM, persistent memory, NVMe, SSD, and cloud network devices.
>
> The design motivation for this project is to create a non-volatile
> programming paradigm for in-memory data object persistence, in-memory data
> object caching, and JNI-less IPC.
> Mnemonic simplifies the usage of data object caching, persistence, and
> JNI-less IPC for massive object-oriented structured datasets.
>
> Mnemonic defines Non-Volatile Java objects that store data fields in
> persistent memory and storage. At runtime, only methods and volatile
> fields are instantiated on the Java heap; Non-Volatile data fields are
> accessed directly via GET/SET operations to and from persistent memory and
> storage. Mnemonic avoids SerDes and significantly reduces the amount of
> garbage on the Java heap.
>
> Major features of Mnemonic:
> * Provides an abstract view for using heterogeneous
> block/byte-addressable devices as a whole (e.g., DRAM, persistent memory,
> NVMe, SSD, HD, cloud network storage).
> * Provides seamless support for object-oriented design and programming
> without the added burden of transforming object data into a different form.
> * Avoids object data serialization/de-serialization for data
> retrieval, caching, and storage.
> * Reduces on-heap memory consumption, which in turn reduces and
> stabilizes Java Garbage Collection (GC) pauses for latency-sensitive
> applications.
> * Overcomes current limitations of Java GC to manage much larger memory
> resources for massive dataset processing and computing.
> * Supports migrating the data usage model from traditional NVMe/SSD/HD to
> non-volatile memory with ease.
> * Uses a lazy loading mechanism to avoid unnecessary memory consumption
> when some data is not needed for computing immediately.
> * Bypasses JNI calls for the interaction between a Java runtime application
> and its native code.
> * Provides an allocation-aware auto-reclaim mechanism to prevent external
> memory resource leaks.
>
>
> === Background ===
> Big Data and Cloud applications increasingly require both high throughput
> and low latency processing. Java-based applications targeting the Big Data
> and Cloud space should be tuned for better throughput, lower latency, and
> more predictable response time.
> Typically, several issues impact Big Data applications' performance and
> scalability:
>
> 1) The Complexity of Data Transformation/Organization: In most cases,
> during data processing, applications use their own complicated data caching
> mechanism for SerDes data objects, spilling to 

RE: [DISCUSS] Mnemonic incubator proposal

2016-02-22 Thread Wang, Yanping
Hi, All

Based on feedback, we added the following to the Mnemonic proposal:

Relationships with Other Apache Products
+ Relationship with Apache™ Arrow:
+ Arrow's columnar data layout makes good use of CPU caches & SIMD. It places
all data relevant to a column operation in a compact format in memory.
+
+ Mnemonic puts whole business object graphs directly on external
heterogeneous storage media, e.g. off-heap, SSD. Developers do not need to
normalize the structures of their object graphs for caching, checkpointing, or
storing. Compared to traditional approaches, Mnemonic applications can avoid
indexing and joining datasets.
+
+ Mnemonic can leverage Arrow to transparently re-layout qualified data objects
or create special containers that can efficiently hold those data records in
columnar form, as one of its major performance optimization constructs.
+
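
To make the layout contrast above concrete, here is a minimal Java sketch. It
assumes a made-up record type (Trade) and a made-up container (TradeColumns);
neither is part of Arrow or Mnemonic, and the arrays only mimic the idea of a
columnar layout, not Arrow's actual memory format.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical record type, used only for this illustration.
final class Trade {
    final long id;
    final double price;

    Trade(long id, double price) {
        this.id = id;
        this.price = price;
    }
}

// Column-oriented container: one primitive array per field.
// This mimics the layout idea only; it is not Arrow's API or format.
final class TradeColumns {
    final long[] ids;
    final double[] prices;

    private TradeColumns(long[] ids, double[] prices) {
        this.ids = ids;
        this.prices = prices;
    }

    // Re-lay out row objects into per-field arrays.
    static TradeColumns fromRows(List<Trade> rows) {
        long[] ids = new long[rows.size()];
        double[] prices = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            Trade t = rows.get(i);
            ids[i] = t.id;
            prices[i] = t.price;
        }
        return new TradeColumns(ids, prices);
    }

    // A column operation touches a single contiguous array.
    double sumPrices() {
        double sum = 0.0;
        for (double p : prices) {
            sum += p;
        }
        return sum;
    }
}

final class ColumnarDemo {
    public static void main(String[] args) {
        List<Trade> rows = Arrays.asList(new Trade(1, 1.5), new Trade(2, 2.5));
        TradeColumns cols = TradeColumns.fromRows(rows);
        System.out.println(cols.sumPrices()); // prints 4.0
    }
}
```

The row form keeps each record as its own object, which is closer to the object
graphs Mnemonic stores; the column form is the kind of container a re-layout
step could produce for scan-heavy operations.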

Thanks
Yanping

-Original Message-
From: Wang, Yanping [mailto:yanping.w...@intel.com] 
Sent: Sunday, February 21, 2016 11:47 AM
To: general@incubator.apache.org
Subject: [DISCUSS] Mnemonic incubator proposal 

Hi all 

We'd like to start a discussion regarding a proposal to submit Mnemonic to the 
Apache Incubator.

The proposal text is available on the Wiki here:
https://wiki.apache.org/incubator/MnemonicProposal

and pasted below for convenience.

We are excited to make this proposal, and look forward to the community's input!

Best,
Yanping


= Mnemonic Proposal =
=== Abstract ===
Mnemonic is a Java-based non-volatile memory library for in-place structured
data processing and computing. It is a solution for generic object and block 
persistence on heterogeneous block and byte-addressable devices, such as DRAM, 
persistent memory, NVMe, SSD, and cloud network storage.

=== Proposal ===
Mnemonic is an in-memory, in-place structured data persistence library for
Java-based applications and frameworks. It provides unified interfaces for data
manipulation on heterogeneous block/byte-addressable devices, such as DRAM,
persistent memory, NVMe, SSD, and cloud network devices.

The design motivation for this project is to create a non-volatile programming
paradigm for in-memory data object persistence, in-memory data object caching,
and JNI-less IPC.
Mnemonic simplifies the usage of data object caching, persistence, and JNI-less
IPC for massive object-oriented structured datasets.

Mnemonic defines Non-Volatile Java objects that store data fields in persistent
memory and storage. At runtime, only methods and volatile fields are
instantiated on the Java heap; Non-Volatile data fields are accessed directly
via GET/SET operations to and from persistent memory and storage. Mnemonic
avoids SerDes and significantly reduces the amount of garbage on the Java heap.
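
As a rough illustration of the GET/SET idea, the sketch below keeps an object's
data fields in a direct ByteBuffer instead of in Java heap fields. The class
name OffHeapPoint and its field layout are assumptions made for this example
and are not Mnemonic's API. A direct buffer is off-heap but not persistent; a
MappedByteBuffer over a file would be the closest standard-library analogue of
a persistent region.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; this is not Mnemonic's API.
// The object holds no data fields on the heap, only a handle to the region.
final class OffHeapPoint {
    private static final int X_OFFSET = 0;          // long, 8 bytes
    private static final int Y_OFFSET = Long.BYTES; // long, 8 bytes
    static final int SIZE = 2 * Long.BYTES;

    private final ByteBuffer region; // storage lives outside the Java heap

    OffHeapPoint(ByteBuffer region) {
        if (region.capacity() < SIZE) {
            throw new IllegalArgumentException("region too small");
        }
        this.region = region;
    }

    // GET/SET read and write the backing region directly, with no SerDes step.
    long getX() { return region.getLong(X_OFFSET); }
    void setX(long value) { region.putLong(X_OFFSET, value); }
    long getY() { return region.getLong(Y_OFFSET); }
    void setY(long value) { region.putLong(Y_OFFSET, value); }
}

final class OffHeapPointDemo {
    public static void main(String[] args) {
        ByteBuffer region = ByteBuffer.allocateDirect(OffHeapPoint.SIZE);
        OffHeapPoint p = new OffHeapPoint(region);
        p.setX(3L);
        p.setY(4L);

        // A second view over the same region sees the same field values
        // without copying or deserializing anything.
        OffHeapPoint q = new OffHeapPoint(region);
        System.out.println(q.getX() + "," + q.getY()); // prints 3,4
    }
}
```

Only the small wrapper objects are visible to the garbage collector here; the
field data itself sits in the buffer.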

Major features of Mnemonic:
* Provides an abstract view for using heterogeneous
block/byte-addressable devices as a whole (e.g., DRAM, persistent memory, NVMe,
SSD, HD, cloud network storage).
* Provides seamless support for object-oriented design and programming without
the added burden of transforming object data into a different form.
* Avoids object data serialization/de-serialization for data retrieval,
caching, and storage.
* Reduces on-heap memory consumption, which in turn reduces and stabilizes
Java Garbage Collection (GC) pauses for latency-sensitive applications.
* Overcomes current limitations of Java GC to manage much larger memory
resources for massive dataset processing and computing.
* Supports migrating the data usage model from traditional NVMe/SSD/HD to
non-volatile memory with ease.
* Uses a lazy loading mechanism to avoid unnecessary memory consumption when
some data is not needed for computing immediately.
* Bypasses JNI calls for the interaction between a Java runtime application and
its native code.
* Provides an allocation-aware auto-reclaim mechanism to prevent external
memory resource leaks.
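
The lazy loading item in the list above can be sketched in plain Java as
follows. The Lazy helper is a hypothetical name introduced for this example; it
shows the general technique of deferring a load until first access and is not
how Mnemonic itself implements the feature.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Mnemonic: defers loading until first use.
final class Lazy<T> {
    private Supplier<T> loader; // dropped after the first load
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    // Loads the value on the first call and returns the cached value afterwards.
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}

final class LazyDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        Lazy<byte[]> payload = new Lazy<>(() -> {
            loads[0]++;
            return new byte[1024]; // stands in for reading from external storage
        });

        System.out.println(payload.isLoaded()); // prints false
        payload.get();
        payload.get();
        System.out.println(loads[0]); // prints 1
    }
}
```

Until get() is called, no memory is spent on the payload; repeated calls reuse
the value that was loaded the first time.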


=== Background ===
Big Data and Cloud applications increasingly require both high throughput and 
low latency processing. Java-based applications targeting the Big Data and 
Cloud space should be tuned for better throughput, lower latency, and more 
predictable response time.
Typically, several issues impact Big Data applications' performance and
scalability:

1) The Complexity of Data Transformation/Organization: In most cases, during
data processing, applications use their own complicated data caching mechanisms
for SerDes of data objects, spilling to different storage, and evicting large
amounts of data. Some data objects contain complex values and structures that
make data organization much more difficult. Loading and then parsing/decoding
datasets from storage consumes substantial system resources and computation
power.

2) Lack of Caching, Burst Temporary Object Creation/Destruction Causes Frequent 
Long GC Pauses: Big Data computing/syntax generates 

RE: [DISCUSS] Mnemonic incubator proposal

2016-02-22 Thread Wang, Yanping
That's great, thanks Debo, I will add you as an additional interested contributor.

Thanks
Yanping

-Original Message-
From: Debo Dutta (dedutta) [mailto:dedu...@cisco.com] 
Sent: Sunday, February 21, 2016 3:26 PM
To: general@incubator.apache.org
Subject: Re: [DISCUSS] Mnemonic incubator proposal 

Hi Yanping

This is very interesting and timely. Would love to contribute, participate
etc. 

thx
debo

On 2/21/16, 11:47 AM, "Wang, Yanping" <yanping.w...@intel.com> wrote:

>Hi all 
>
>We'd like to start a discussion regarding a proposal to submit Mnemonic
>to the Apache Incubator.
>
>The proposal text is available on the Wiki here:
>https://wiki.apache.org/incubator/MnemonicProposal
>
>and pasted below for convenience.
>
>We are excited to make this proposal, and look forward to the community's
>input!
>
>Best,
>Yanping
>
>
>= Mnemonic Proposal =
>=== Abstract ===
>Mnemonic is a Java-based non-volatile memory library for in-place
>structured data processing and computing. It is a solution for generic
>object and block persistence on heterogeneous block and byte-addressable
>devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network
>storage.
>
>=== Proposal ===
>Mnemonic is an in-memory, in-place structured data persistence library for
>Java-based applications and frameworks. It provides unified interfaces
>for data manipulation on heterogeneous block/byte-addressable devices,
>such as DRAM, persistent memory, NVMe, SSD, and cloud network devices.
>
>The design motivation for this project is to create a non-volatile
>programming paradigm for in-memory data object persistence, in-memory
>data object caching, and JNI-less IPC.
>Mnemonic simplifies the usage of data object caching, persistence, and
>JNI-less IPC for massive object-oriented structured datasets.
>
>Mnemonic defines Non-Volatile Java objects that store data fields in
>persistent memory and storage. At runtime, only methods and volatile
>fields are instantiated on the Java heap; Non-Volatile data fields are
>accessed directly via GET/SET operations to and from persistent memory
>and storage. Mnemonic avoids SerDes and significantly reduces the amount
>of garbage on the Java heap.
>
>Major features of Mnemonic:
>* Provides an abstract view for using heterogeneous
>block/byte-addressable devices as a whole (e.g., DRAM, persistent memory,
>NVMe, SSD, HD, cloud network storage).
>* Provides seamless support for object-oriented design and programming
>without the added burden of transforming object data into a different form.
>* Avoids object data serialization/de-serialization for data
>retrieval, caching, and storage.
>* Reduces on-heap memory consumption, which in turn reduces and
>stabilizes Java Garbage Collection (GC) pauses for latency-sensitive
>applications.
>* Overcomes current limitations of Java GC to manage much larger memory
>resources for massive dataset processing and computing.
>* Supports migrating the data usage model from traditional NVMe/SSD/HD to
>non-volatile memory with ease.
>* Uses a lazy loading mechanism to avoid unnecessary memory consumption
>when some data is not needed for computing immediately.
>* Bypasses JNI calls for the interaction between a Java runtime application
>and its native code.
>* Provides an allocation-aware auto-reclaim mechanism to prevent external
>memory resource leaks.
>
>
>=== Background ===
>Big Data and Cloud applications increasingly require both high throughput
>and low latency processing. Java-based applications targeting the Big
>Data and Cloud space should be tuned for better throughput, lower
>latency, and more predictable response time.
>Typically, several issues impact Big Data applications' performance and
>scalability:
>
>1) The Complexity of Data Transformation/Organization: In most cases,
>during data processing, applications use their own complicated data
>caching mechanisms for SerDes of data objects, spilling to different
>storage, and evicting large amounts of data. Some data objects contain
>complex values and structures that make data organization much more
>difficult. Loading and then parsing/decoding datasets from storage
>consumes substantial system resources and computation power.
>
>2) Lack of Caching, Burst Temporary Object Creation/Destruction Causes
>Frequent Long GC Pauses: Big Data computing/syntax generates large amounts
>of temporary objects during processing, e.g. lambdas, SerDes, copying,
>etc. This triggers frequent long Java GC pauses to scan references,
>update reference lists, and blindly copy live objects from one memory
>location to another.
>
>3) The Unpredictable GC Pause: For latency sensitive applications, such
>as datab

Re: [DISCUSS] Mnemonic incubator proposal

2016-02-21 Thread Debo Dutta (dedutta)
Hi Yanping

This is very interesting and timely. Would love to contribute, participate
etc. 

thx
debo

On 2/21/16, 11:47 AM, "Wang, Yanping"  wrote:

>Hi all 
>
>We'd like to start a discussion regarding a proposal to submit Mnemonic
>to the Apache Incubator.
>
>The proposal text is available on the Wiki here:
>https://wiki.apache.org/incubator/MnemonicProposal
>
>and pasted below for convenience.
>
>We are excited to make this proposal, and look forward to the community's
>input!
>
>Best,
>Yanping
>
>
>= Mnemonic Proposal =
>=== Abstract ===
>Mnemonic is a Java-based non-volatile memory library for in-place
>structured data processing and computing. It is a solution for generic
>object and block persistence on heterogeneous block and byte-addressable
>devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network
>storage.
>
>=== Proposal ===
>Mnemonic is an in-memory, in-place structured data persistence library for
>Java-based applications and frameworks. It provides unified interfaces
>for data manipulation on heterogeneous block/byte-addressable devices,
>such as DRAM, persistent memory, NVMe, SSD, and cloud network devices.
>
>The design motivation for this project is to create a non-volatile
>programming paradigm for in-memory data object persistence, in-memory
>data object caching, and JNI-less IPC.
>Mnemonic simplifies the usage of data object caching, persistence, and
>JNI-less IPC for massive object-oriented structured datasets.
>
>Mnemonic defines Non-Volatile Java objects that store data fields in
>persistent memory and storage. At runtime, only methods and volatile
>fields are instantiated on the Java heap; Non-Volatile data fields are
>accessed directly via GET/SET operations to and from persistent memory
>and storage. Mnemonic avoids SerDes and significantly reduces the amount
>of garbage on the Java heap.
>
>Major features of Mnemonic:
>* Provides an abstract view for using heterogeneous
>block/byte-addressable devices as a whole (e.g., DRAM, persistent memory,
>NVMe, SSD, HD, cloud network storage).
>* Provides seamless support for object-oriented design and programming
>without the added burden of transforming object data into a different form.
>* Avoids object data serialization/de-serialization for data
>retrieval, caching, and storage.
>* Reduces on-heap memory consumption, which in turn reduces and
>stabilizes Java Garbage Collection (GC) pauses for latency-sensitive
>applications.
>* Overcomes current limitations of Java GC to manage much larger memory
>resources for massive dataset processing and computing.
>* Supports migrating the data usage model from traditional NVMe/SSD/HD to
>non-volatile memory with ease.
>* Uses a lazy loading mechanism to avoid unnecessary memory consumption
>when some data is not needed for computing immediately.
>* Bypasses JNI calls for the interaction between a Java runtime application
>and its native code.
>* Provides an allocation-aware auto-reclaim mechanism to prevent external
>memory resource leaks.
>
>
>=== Background ===
>Big Data and Cloud applications increasingly require both high throughput
>and low latency processing. Java-based applications targeting the Big
>Data and Cloud space should be tuned for better throughput, lower
>latency, and more predictable response time.
>Typically, several issues impact Big Data applications' performance and
>scalability:
>
>1) The Complexity of Data Transformation/Organization: In most cases,
>during data processing, applications use their own complicated data
>caching mechanisms for SerDes of data objects, spilling to different
>storage, and evicting large amounts of data. Some data objects contain
>complex values and structures that make data organization much more
>difficult. Loading and then parsing/decoding datasets from storage
>consumes substantial system resources and computation power.
>
>2) Lack of Caching, Burst Temporary Object Creation/Destruction Causes
>Frequent Long GC Pauses: Big Data computing/syntax generates large amounts
>of temporary objects during processing, e.g. lambdas, SerDes, copying,
>etc. This triggers frequent long Java GC pauses to scan references,
>update reference lists, and blindly copy live objects from one memory
>location to another.
>
>3) The Unpredictable GC Pause: Latency-sensitive applications, such as
>databases, search engines, web queries, and real-time/streaming computing,
>need request-response latency kept under control. But current Java GC
>does not provide predictable GC activity when managing large on-heap
>memory.
>
>4) High JNI Invocation Cost: JNI calls are expensive, yet
>high-performance applications usually try to leverage native code to
>improve performance. JNI calls need to convert Java objects into something
>that C/C++ can understand. In addition, some comprehensive native code
>needs to communicate with the Java-based application, which causes
>frequent JNI calls along with