Re: [DISCUSS] Mnemonic incubator proposal
Looks like the discussion has calmed down with no objections, so I think we can proceed with the VOTE thread.

- Henry

To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
RE: [DISCUSS] Mnemonic incubator proposal
In general, Mnemonic can be integrated into many projects. First, projects can use Mnemonic to move data off the Java heap, which greatly reduces GC work, and use GET/SET to access data fields, which eliminates SerDes. Later we can expand Mnemonic to exercise persistent/non-volatile programming on large-scale distributed systems with TB-sized fast persistent memory devices.

Regarding Hadoop NameNode pressure in large-cluster scenarios: this issue comes from HDFS. Last year we found that the use of FileInputStream in HDFS causes unpredictably long garbage collection pauses due to the overhead of finalizers, which significantly impacted HDFS performance and scalability. We recorded the issue in https://issues.apache.org/jira/browse/HDFS-8562

Uma explained how Mnemonic can be used to improve HDFS performance and scalability. One big advantage is that Mnemonic does not need to hold file system cache for random access, which will benefit large clusters.

Thanks
Yanping
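To make the "take data off the Java heap and use GET/SET" point concrete, here is a minimal sketch using only java.nio.ByteBuffer from the JDK. It is not Mnemonic's API: the class OffHeapPoints, its accessors, and the two-double record layout are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;

// Illustration only: OffHeapPoints is a hypothetical class, not part of Mnemonic.
class OffHeapPointsDemo {
    public static void main(String[] args) {
        OffHeapPoints points = new OffHeapPoints(1_000);
        points.setX(42, 3.5);
        points.setY(42, -1.25);
        System.out.println(points.getX(42) + ", " + points.getY(42));
    }
}

/** Fixed-size records of two doubles (x, y) kept in a direct buffer. */
class OffHeapPoints {
    private static final int X_OFFSET = 0;
    private static final int Y_OFFSET = Double.BYTES;
    private static final int RECORD_BYTES = 2 * Double.BYTES;

    private final ByteBuffer buffer;
    private final int capacity;

    OffHeapPoints(int capacity) {
        this.capacity = capacity;
        // The contents of a direct buffer typically reside outside the
        // garbage-collected heap; only this small wrapper is a heap object.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    // GET/SET read and write a field in place: no per-record object is
    // created and nothing is serialized or deserialized.
    double getX(int index) { return buffer.getDouble(base(index) + X_OFFSET); }
    double getY(int index) { return buffer.getDouble(base(index) + Y_OFFSET); }
    void setX(int index, double value) { buffer.putDouble(base(index) + X_OFFSET, value); }
    void setY(int index, double value) { buffer.putDouble(base(index) + Y_OFFSET, value); }

    // Translates a record index into a byte offset, with bounds checking.
    private int base(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * RECORD_BYTES;
    }
}
```

A thousand records here cost the collector one wrapper object instead of a thousand small ones, which is the effect on GC that the message describes. Persistence and heterogeneous devices are outside the scope of this sketch.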
RE: [DISCUSS] Mnemonic incubator proposal
Hi Liang,

Please find my response inline below.

Best Regards
Gary (Wang, Gang), PMP®, CMMI® Appraiser, ITIL® Foundation

-----Original Message-----
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
Sent: Tuesday, February 23, 2016 8:06 PM
To: general@incubator.apache.org
Subject: Re: [DISCUSS] Mnemonic incubator proposal

Hi Liang,

Thank you for your interest. Sure, we will consider adding you to the list of interested contributors.

>Mnemonic is trying to solve performance issues associated with serialization/deserialization of Java objects when dealing with the JVM & disk directly, as well as GC pressure caused by caching?

Yes.

>whether Mnemonic could solve Hadoop NameNode pressure in large-scale cluster scenarios, or not?

Yes, we are looking at some aspects of memory and GC overhead in the NameNode too. For example, a couple of JIRAs already exist in HDFS to move some data structures off heap. So we plan to take the standard data structures from this library and make use of them. We could also take advantage of persistence here.

@Yanping/Gary, maybe you could add more points if you have any?

[Gary] Thanks Uma. In addition, you can plug in your own special allocators optimized for NameNode usage patterns; this way, performance could be better and more predictable. Thanks.

Regards,
Uma
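As a rough picture of what "plugging in a special allocator" could look like, the sketch below defines a small allocator interface with two interchangeable implementations. Everything here is an assumption for illustration: BufferAllocator, DirectAllocator, and PooledAllocator are hypothetical names, and fixed-size pooling is just one example of tuning for a usage pattern, not something the thread says Mnemonic does.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustration only: these types are hypothetical, not Mnemonic's allocator API.
class AllocatorDemo {
    public static void main(String[] args) {
        BufferAllocator allocator = new PooledAllocator(64);
        ByteBuffer first = allocator.allocate(64);
        allocator.release(first);
        ByteBuffer second = allocator.allocate(64);
        System.out.println("buffer reused: " + (first == second));
    }
}

/** The plug-in point: callers depend on this interface only. */
interface BufferAllocator {
    ByteBuffer allocate(int size);
    void release(ByteBuffer buffer);
}

/** Baseline strategy: every request gets a fresh direct buffer. */
class DirectAllocator implements BufferAllocator {
    public ByteBuffer allocate(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    public void release(ByteBuffer buffer) {
        // Nothing to do here; the JVM reclaims the buffer eventually.
    }
}

/** Strategy tuned for workloads dominated by one block size. */
class PooledAllocator implements BufferAllocator {
    private final int blockSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    PooledAllocator(int blockSize) {
        this.blockSize = blockSize;
    }

    public ByteBuffer allocate(int size) {
        if (size != blockSize) {
            return ByteBuffer.allocateDirect(size); // uncommon sizes bypass the pool
        }
        ByteBuffer pooled = free.pollFirst();
        if (pooled == null) {
            return ByteBuffer.allocateDirect(blockSize);
        }
        pooled.clear(); // resets position and limit; old contents are not zeroed
        return pooled;
    }

    public void release(ByteBuffer buffer) {
        if (buffer.capacity() == blockSize) {
            free.addFirst(buffer); // keep for the next same-size request
        }
    }
}
```

Because callers see only BufferAllocator, swapping DirectAllocator for PooledAllocator changes allocation behavior without touching the data structures built on top, which is the kind of predictability gain the reply alludes to.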
Re: [DISCUSS] Mnemonic incubator proposal
Hi Liang,

Thank you for your interest. Sure, we will consider adding you to the list of interested contributors.

>Mnemonic is trying to solve performance issues associated with serialization/deserialization of Java objects when dealing with the JVM & disk directly, as well as GC pressure caused by caching?

Yes.

>whether Mnemonic could solve Hadoop NameNode pressure in large-scale cluster scenarios, or not?

Yes, we are looking at some aspects of memory and GC overhead in the NameNode too. For example, a couple of JIRAs already exist in HDFS to move some data structures off heap. So we plan to take the standard data structures from this library and make use of them. We could also take advantage of persistence here.

@Yanping/Gary, maybe you could add more points if you have any?

Regards,
Uma
Re: [DISCUSS] Mnemonic incubator proposal
Interesting, I would love to become a contributor.

My understanding: Mnemonic is trying to solve performance issues associated with serialization/deserialization of Java objects when dealing with the JVM & disk directly, as well as GC pressure caused by caching?

One question: could Mnemonic relieve Hadoop NameNode pressure in large-scale cluster scenarios, or not?

--
View this message in context: http://apache-incubator-general.996316.n3.nabble.com/DISCUSS-Mnemonic-incubator-proposal-tp48502p48533.html
Sent from the Apache Incubator - General mailing list archive at Nabble.com.
RE: [DISCUSS] Mnemonic incubator proposal
Hey Ted,

I work with Yanping on the same Big Data Technologies U.S. R&D team. Let me try to explain possible ways for Mnemonic to leverage the power of Arrow.

Our initial thought about the connection with Arrow is that Mnemonic could directly provide Arrow-based collections for generic non-volatile objects, so developers could apply SIMD operations to those collections for high-performance processing.

In another possible case, customized object graphs could benefit from Arrow if Mnemonic provides Arrow-specific tags to hint pluggable Arrow-featured allocators. An Arrow-featured allocator could aggregate non-volatile objects of the same type from customized object graphs according to the Arrow tags. We think the Arrow tag could also be placed on non-volatile fields of different types of objects for SIMD-friendly operations.

Thanks.

Best Regards
Gary (Wang, Gang), PMP®, CMMI® Appraiser, ITIL® Foundation
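The idea of aggregating same-typed fields from an object graph into a SIMD-friendly layout can be pictured with plain Java arrays. The sketch below does not use Arrow or Mnemonic at all; Trade and TradeColumns are hypothetical types, and the contiguous primitive arrays merely stand in for a columnar container.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: plain arrays stand in for a columnar container.
class ColumnarDemo {
    public static void main(String[] args) {
        List<Trade> trades = Arrays.asList(new Trade(1L, 10.0), new Trade(2L, 12.5));
        TradeColumns columns = TradeColumns.from(trades);
        System.out.println("total price: " + columns.sumPrices());
    }
}

/** Row-oriented form: one heap object per record. */
class Trade {
    final long id;
    final double price;

    Trade(long id, double price) {
        this.id = id;
        this.price = price;
    }
}

/** Column-oriented form: each field gathered into one contiguous array. */
class TradeColumns {
    final long[] ids;
    final double[] prices;

    private TradeColumns(long[] ids, double[] prices) {
        this.ids = ids;
        this.prices = prices;
    }

    // Gathers each field of the row objects into its own array.
    static TradeColumns from(List<Trade> rows) {
        long[] ids = new long[rows.size()];
        double[] prices = new double[rows.size()];
        int i = 0;
        for (Trade row : rows) {
            ids[i] = row.id;
            prices[i] = row.price;
            i++;
        }
        return new TradeColumns(ids, prices);
    }

    // A column operation touches only the bytes it needs, laid out
    // sequentially, which is friendly to CPU caches and vectorization.
    double sumPrices() {
        double total = 0.0;
        for (double price : prices) {
            total += price;
        }
        return total;
    }
}
```

Whether the re-layout happens through tags, allocators, or dedicated collections is exactly the open design question in the message above; the sketch only shows the before and after shapes of the data.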
RE: [DISCUSS] Mnemonic incubator proposal
Hi, All

I uploaded a PDF presentation that describes Project Mnemonic with some nice pictures. Click the attachment link below to see the presentation.

Attachment name: Project_Mnemonic_Pub1.0.pdf
Attachment size: 1493317
Attachment link: https://wiki.apache.org/incubator/MnemonicProposal?action=AttachFile=get=Project_Mnemonic_Pub1.0.pdf
Page link: https://wiki.apache.org/incubator/MnemonicProposal

Thanks
Yanping

-----Original Message-----
From: Wang, Yanping [mailto:yanping.w...@intel.com]
Sent: Sunday, February 21, 2016 11:47 AM
To: general@incubator.apache.org
Subject: [DISCUSS] Mnemonic incubator proposal

Hi all

We'd like to start a discussion regarding a proposal to submit Mnemonic to the Apache Incubator.

The proposal text is available on the Wiki here:
https://wiki.apache.org/incubator/MnemonicProposal

and pasted below for convenience.

We are excited to make this proposal, and look forward to the community's input!

Best,
Yanping

= Mnemonic Proposal =
=== Abstract ===
Mnemonic is a Java-based non-volatile memory library for in-place structured data processing and computing. It is a solution for generic object and block persistence on heterogeneous block and byte-addressable devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network storage.

=== Proposal ===
Mnemonic is a structured data persistence in-memory in-place library for Java-based applications and frameworks. It provides unified interfaces for data manipulation on heterogeneous block/byte-addressable devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network devices.

The design motivation for this project is to create a non-volatile programming paradigm for in-memory data object persistence, in-memory data object caching, and JNI-less IPC. Mnemonic simplifies the usage of data object caching, persistence, and JNI-less IPC for massive object-oriented structural datasets.

Mnemonic defines Non-Volatile Java objects that store data fields in persistent memory and storage. During program runtime, only methods and volatile fields are instantiated in the Java heap; Non-Volatile data fields are accessed directly via GET/SET operations to and from persistent memory and storage. Mnemonic avoids SerDes and significantly reduces the amount of garbage in the Java heap.

Major features of Mnemonic:
 * Provides an abstract view that treats heterogeneous block/byte-addressable devices as a whole (e.g., DRAM, persistent memory, NVMe, SSD, HD, cloud network storage).
 * Provides seamless support for object-oriented design and programming without adding the burden of transforming object data into a different form.
 * Avoids object data serialization/deserialization for data retrieval, caching, and storage.
 * Reduces the consumption of on-heap memory, which in turn reduces and stabilizes Java Garbage Collection (GC) pauses for latency-sensitive applications.
 * Overcomes current limitations of Java GC to manage much larger memory resources for massive dataset processing and computing.
 * Supports migrating the data usage model from traditional NVMe/SSD/HD to non-volatile memory with ease.
 * Uses a lazy loading mechanism to avoid unnecessary memory consumption when some data is not needed for computing immediately.
 * Bypasses JNI calls for the interaction between a Java runtime application and its native code.
 * Provides an allocation-aware auto-reclaim mechanism to prevent external memory resource leaks.

=== Background ===
Big Data and Cloud applications increasingly require both high throughput and low latency processing. Java-based applications targeting the Big Data and Cloud space should be tuned for better throughput, lower latency, and more predictable response time.

Typically, there are some issues that impact Big Data applications' performance and scalability:

1) The Complexity of Data Transformation/Organization: In most cases, during data processing, applications use their own complicated data caching mechanisms to SerDes data objects, spill to different storage, and evict large amounts of data. Some data objects contain complex values and structures that make data organization much more difficult. Loading and then parsing/decoding datasets from storage consumes substantial system resources and computation power.

2) Lack of Caching, Burst Temporary Object Creation/Destruction Causes Frequent Long GC Pauses: Big Data computing/syntax generates a large number of temporary objects during processing, e.g. lambda, SerDes, copying, etc. This triggers frequent long Java GC pauses to scan references, update reference lists, and copy live objects from one memory location to another blindly.

3) The Unpredictable GC Pause: Latency-sensitive applications, such as databases, search engines, web queries, and real-time/streaming computing, require latency/request-response to be kept under control. But current Java GC does not provide predictable GC activities with large on-heap
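The lazy loading feature listed in the proposal can be illustrated with a small wrapper that defers an expensive load until first use. This is only a sketch of the general technique: LazyField and its loader argument are hypothetical names, and the proposal text does not say how Mnemonic implements lazy loading internally.

```java
import java.util.function.Supplier;

// Illustration only: LazyField is a hypothetical helper, not part of Mnemonic.
class LazyFieldDemo {
    public static void main(String[] args) {
        LazyField<byte[]> block = new LazyField<>(() -> new byte[1 << 20]);
        System.out.println("loaded before use: " + block.isLoaded());
        int length = block.get().length; // first access triggers the load
        System.out.println("loaded after use: " + block.isLoaded() + ", bytes: " + length);
    }
}

/** Holds a value that is produced on first access and then cached. */
class LazyField<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazyField(Supplier<T> loader) {
        this.loader = loader;
    }

    // Runs the loader at most once; later calls return the cached value.
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}
```

Until get() is called, the only memory held is the wrapper and the loader, so data that is never used for computing never occupies memory.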
RE: [DISCUSS] Mnemonic incubator proposal
Yes, Jacques, it is exciting to see that Arrow and Mnemonic can leverage each other. I looked at Apache Drill today. I think Drill can use Mnemonic to optimize scalable data sources.

So the idea is: Mnemonic takes Arrow as a columnar data construct or collection that is optimized from memory to CPU cache. Drill can then use Arrow-integrated Mnemonic to access storage media across distributed systems for scalable data sources.

Drill + (Mnemonic (Arrow)) Integration => Optimize entire data access chains from distributed storage media to CPU cache.

Definitely looking forward to working together.

Best,
Yanping
Re: [DISCUSS] Mnemonic incubator proposal
Yanping,

Could you explain some of the ways that you see that Arrow might be useful to Mnemonic and how Mnemonic extends the generality of Arrow (probably with performance consequences)?
Re: [DISCUSS] Mnemonic incubator proposal
Hey YanPing,

This addition is nice to see. I agree that there is great opportunity for the Arrow and Mnemonic communities to collaborate. I look forward to working together.

Jacques

On Mon, Feb 22, 2016 at 3:01 PM, Wang, Yanping wrote:
> Hi, All
>
> Based on feedback, we added following into Mnemonic proposal:
>
> Relationships with Other Apache Product
> + Relationship with Apache™ Arrow:
> + Arrow's columnar data layout allows great use of CPU caches & SIMD. It places all data that relevant to a column operation in a compact format in memory.
> +
> + Mnemonic directly puts the whole business object graphs on external heterogeneous storage media, e.g. off-heap, SSD. It is not necessary to normalize the structures of object graphs for caching, checkpoint or storing. It doesn’t require developers to normalize their data object graphs. Mnemonic applications can avoid indexing & join datasets compared to traditional approaches.
> +
> + Mnemonic can leverage Arrow to transparently re-layout qualified data objects or create special containers that is able to efficiently hold those data records in columnar form as one of major performance optimization constructs.
> +
>
> Thanks
> Yanping
RE: [DISCUSS] Mnemonic incubator proposal
Hi, All

Based on feedback, we added following into Mnemonic proposal:

Relationships with Other Apache Product
+ Relationship with Apache™ Arrow:
+ Arrow's columnar data layout allows great use of CPU caches & SIMD. It places all data that relevant to a column operation in a compact format in memory.
+
+ Mnemonic directly puts the whole business object graphs on external heterogeneous storage media, e.g. off-heap, SSD. It is not necessary to normalize the structures of object graphs for caching, checkpoint or storing. It doesn’t require developers to normalize their data object graphs. Mnemonic applications can avoid indexing & join datasets compared to traditional approaches.
+
+ Mnemonic can leverage Arrow to transparently re-layout qualified data objects or create special containers that is able to efficiently hold those data records in columnar form as one of major performance optimization constructs.
+

Thanks
Yanping
RE: [DISCUSS] Mnemonic incubator proposal
That's great, thanks Debo, I will add you as additional interested contributors.

Thanks
Yanping

-Original Message-
From: Debo Dutta (dedutta) [mailto:dedu...@cisco.com]
Sent: Sunday, February 21, 2016 3:26 PM
To: general@incubator.apache.org
Subject: Re: [DISCUSS] Mnemonic incubator proposal

Hi Yanping

This is very interesting and timely. Would love to contribute, participate etc.

thx
debo
Re: [DISCUSS] Mnemonic incubator proposal
Hi Yanping

This is very interesting and timely. Would love to contribute, participate etc.

thx
debo

On 2/21/16, 11:47 AM, "Wang, Yanping" wrote:

>Hi all
>
>We'd like to start a discussion regarding a proposal to submit Mnemonic to the Apache Incubator.
>
>The proposal text is available on the Wiki here:
>https://wiki.apache.org/incubator/MnemonicProposal
>
>and pasted below for convenience.
>
>We are excited to make this proposal, and look forward to the community's input!
>
>Best,
>Yanping
>
>
>= Mnemonic Proposal =
>=== Abstract ===
>Mnemonic is a Java-based non-volatile memory library for in-place structured data processing and computing. It is a solution for generic object and block persistence on heterogeneous block and byte-addressable devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network storage.
>
>=== Proposal ===
>Mnemonic is a structured data persistence in-memory in-place library for Java-based applications and frameworks. It provides unified interfaces for data manipulation on heterogeneous block/byte-addressable devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network devices.
>
>The design motivation for this project is to create a non-volatile programming paradigm for in-memory data object persistence, in-memory data object caching, and JNI-less IPC.
>Mnemonic simplifies the usage of data object caching, persistence, and JNI-less IPC for massive object-oriented structural datasets.
>
>Mnemonic defines Non-Volatile Java objects that store data fields in persistent memory and storage. During program runtime, only methods and volatile fields are instantiated in the Java heap; Non-Volatile data fields are directly accessed via GET/SET operations to and from persistent memory and storage. Mnemonic avoids SerDes and significantly reduces the amount of garbage in the Java heap.
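The GET/SET access model described in the Proposal section can be illustrated with plain JDK classes. The sketch below is only an approximation under stated assumptions: it does not use Mnemonic's API, and the class name MappedPerson, its fields, and the fixed 16-byte layout are hypothetical. It keeps two data fields in a memory-mapped file, so getters and setters read and write the backing storage in place, with no serialization step and no per-field objects on the Java heap.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only: this is NOT Mnemonic's API.
// Data fields live in a memory-mapped file instead of the Java heap.
final class MappedPerson {
    private static final int AGE_OFFSET = 0;      // int, 4 bytes
    private static final int BALANCE_OFFSET = 8;  // long, 8 bytes
    static final int RECORD_SIZE = 16;            // assumed fixed layout

    private final MappedByteBuffer region;

    private MappedPerson(MappedByteBuffer region) {
        this.region = region;
    }

    // Maps the record stored in the given file, creating and padding it if needed.
    static MappedPerson open(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            long size = ch.size();
            if (size < RECORD_SIZE) {
                // Pad with zeros so the mapped region lies entirely inside the file.
                ByteBuffer pad = ByteBuffer.allocate((int) (RECORD_SIZE - size));
                while (pad.hasRemaining()) {
                    ch.write(pad, size + pad.position());
                }
            }
            // The mapping stays valid after the channel is closed.
            return new MappedPerson(ch.map(FileChannel.MapMode.READ_WRITE, 0, RECORD_SIZE));
        }
    }

    // GET/SET operate directly on the mapped region; nothing is serialized.
    int getAge() { return region.getInt(AGE_OFFSET); }
    void setAge(int age) { region.putInt(AGE_OFFSET, age); }

    long getBalance() { return region.getLong(BALANCE_OFFSET); }
    void setBalance(long balance) { region.putLong(BALANCE_OFFSET, balance); }

    // Forces changes out to the storage device backing the file.
    void flush() { region.force(); }
}
```

A caller would open a file, call setAge and setBalance, then flush; opening the same file again returns an object whose getters see the stored values without any deserialization. A real non-volatile object model also has to handle object graphs, allocation, and reclamation, which this sketch leaves out.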
>
>Major features of Mnemonic:
>* Provides an abstract level of viewpoint to utilize heterogeneous block/byte-addressable devices as a whole (e.g., DRAM, persistent memory, NVMe, SSD, HD, cloud network storage).
>* Provides seamless support for object-oriented design and programming without the added burden of transferring object data to a different form.
>* Avoids object data serialization/de-serialization for data retrieval, caching and storage.
>* Reduces the consumption of on-heap memory, which in turn reduces and stabilizes Java Garbage Collection (GC) pauses for latency-sensitive applications.
>* Overcomes current limitations of Java GC to manage much larger memory resources for massive dataset processing and computing.
>* Supports migrating the data usage model from traditional NVMe/SSD/HD to non-volatile memory with ease.
>* Uses a lazy loading mechanism to avoid unnecessary memory consumption when some data is not needed for computing immediately.
>* Bypasses JNI calls for the interaction between a Java runtime application and its native code.
>* Provides an allocation-aware auto-reclaim mechanism to prevent external memory resource leaks.
>
>
>=== Background ===
>Big Data and Cloud applications increasingly require both high throughput and low latency processing. Java-based applications targeting the Big Data and Cloud space should be tuned for better throughput, lower latency, and more predictable response time.
>Typically, there are some issues that impact Big Data applications' performance and scalability:
>
>1) The Complexity of Data Transformation/Organization: In most cases, during data processing, applications use their own complicated data caching mechanisms for SerDes of data objects, spilling to different storage, and evicting large amounts of data. Some data objects contain complex values and structures that make data organization much more difficult.
>To load and then parse/decode datasets from storage consumes substantial system resources and computation power.
>
>2) Lack of Caching, Burst Temporary Object Creation/Destruction Causes Frequent Long GC Pauses: Big Data computing/syntax generates a large number of temporary objects during processing, e.g. lambda, SerDes, copying, etc. This triggers frequent long Java GC pauses to scan references, update reference lists, and copy live objects from one memory location to another blindly.
>
>3) The Unpredictable GC Pause: Latency-sensitive applications, such as databases, search engines, web queries, and real-time/streaming computing, require latency/request-response to be kept under control. But current Java GC does not provide predictable GC activity with large on-heap memory management.
>
>4) High JNI Invocation Cost: JNI calls are expensive, but high-performance applications usually try to leverage native code to improve performance; however, JNI calls need to convert Java objects into something that C/C++ can understand. In addition, some comprehensive native code needs to communicate with Java-based applications, which will cause frequent JNI calls along with