Re: Batch updates in Ignite B+ tree.
Hi Pavel, As far as I know, batch tree updates are already being developed. Alex, could you please elaborate? On Tue, Mar 5, 2019 at 5:05 PM Pavel Pereslegin wrote: > Hi Igniters! > > I am working on implementing batch updates in PageMemory [1] to > improve the performance of the preloader, datastreamer and putAll. > > This task consists of two major related improvements: > 1. Batch writing to PageMemory via FreeList - store several values at > once in a single memory page. > 2. Batch updates in BPlusTree (for introducing an invokeAll operation). > > I started to investigate batch updates in the B+ tree, and > it seems that the concurrent top-down balancing algorithm (TD) > described in this paper [2] may be suitable for batch insertion of > keys into the Ignite B+ tree. > This algorithm uses a top-down balancing approach and allows inserting > a batch of keys belonging to leaves that share the same parent. The > downside of the top-down approach is that the parent node > is locked while performing insertion/splitting in child nodes. > > WDYT? Do you know other approaches for implementing batch updates in > the Ignite B+ tree? > > [1] https://issues.apache.org/jira/browse/IGNITE-7935 > [2] > https://aaltodoc.aalto.fi/bitstream/handle/123456789/2168/isbn9512258951.pdf >
[jira] [Created] (IGNITE-11487) Document IGNITE_SQL_MERGE_TABLE_MAX_SIZE property
Evgenii Zhuravlev created IGNITE-11487: -- Summary: Document IGNITE_SQL_MERGE_TABLE_MAX_SIZE property Key: IGNITE-11487 URL: https://issues.apache.org/jira/browse/IGNITE-11487 Project: Ignite Issue Type: Improvement Components: documentation Reporter: Evgenii Zhuravlev Assignee: Prachi Garg -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [DISCUSSION] System cache persistence.
Hi, Andrey! >> 6. ServiceGrid. >> We can use Metastore and drop old-services later. As you mentioned, the new Service Grid does not use the system cache. The legacy implementation (GridServiceProcessor) has used the system cache WITHOUT persistence since the 2.3 release and does not restore services' state at node restart [1]. [1] https://issues.apache.org/jira/browse/IGNITE-6629 On Tue, Mar 5, 2019 at 4:21 PM Andrey Mashenkov wrote: > > Hi Igniters, > > I'd like to start a discussion about avoiding system cache usage with persistence. > > The system cache is used in a number of component internals. No one cares > that the system cache can have stale data after grid restart, as that wasn't possible > before 2.1. > Since Ignite 2.1 it can be persistent, which may affect > components' behavior: Compute, IGFS, ServiceGrid, DataStructures. > > What's wrong? > 1. The system cache is persistent only if the default region is configured as > persistent (and vice versa). > This is non-obvious and can cause unpredictable issues. > > 2. Any change in the system cache requires a distributed transaction that may > cause a deadlock. > We already avoid its usage in BinaryMarshaller and (almost) in the recently > reworked ServiceGrid due to the "deadlock" reason. > > > What has been affected, and what can we do? > 3. IGFS > AFAIK, IGFS support is going to be discontinued. There is nothing to do if > IGFS is removed in 3.0. > > 4. DataStructures > They look broken (maybe partially), as I see CacheDataStructuresManager > uses on-heap maps for id->structure mapping for some structures. > It looks safe to deprecate persistence for data structures for now > and rework them separately. > Also, from a user perspective, I'd expect data structure persistence to be > configured in some separate place or in the data structure configuration. > > 5. Compute > Let's rework this to use the Metastore. > > 6. ServiceGrid. > We can use the Metastore and drop old services later. > > 7. Some 3rd-party plugins may be affected. 
> Of course, there is no compatibility guarantee if someone uses internal > components, but issue #1 can make users frustrated. > We can prevent the system cache from being persistent. > > Do we really ever need the system cache with persistence enabled? > Thoughts? > > I've created a ticket for this [1]. > > [1] https://issues.apache.org/jira/browse/IGNITE-11483 > > -- > Best regards, > Andrey V. Mashenkov -- Best Regards, Vyacheslav D.
Re: Tests for ML using binary builds
Hi, Alexey! >> If we can use multi JVM test with >> different classpaths I will use them - such approach is more convenient >> from TC point of view. There is no such ability at the moment; you can only specify additional JVM arguments in 'GridAbstractTest#additionalRemoteJvmArgs'. But it is not very hard to implement if needed, see 'IgniteNodeRunner'. We use such an approach in our Compatibility Framework. BTW, it is possible to use the framework for your goals: prepare and install the artifacts into the local Maven repository (mvn install), then call 'startGrid(name, ver)' with your prepared version, e.g. "2.8-SNAPSHOT". On Tue, Mar 5, 2019 at 2:48 PM Alexey Platonov wrote: > > Ivan, > Thanks for your answer. I want to use binary builds explicitly because they > don't share jars of client code. If we can use multi-JVM tests with > different classpaths, I will use them - such an approach is more convenient > from the TC point of view. > > P.S. I use Docker in my prototype just because it is easy for me and for > test cluster management - I can create a docker image with all configs and > scripts and run an Ignite cluster in a separate network. > > On Tue, Mar 5, 2019 at 12:28 PM Павлухин Иван wrote: > > > Alexey, > > > > If problems arise in environments different from the one where the usual > > Ignite tests run, then it is definitely a good idea to cover them. And > > testing other build kinds and in other environments is a good idea as > > well. But the particular problem with serialization and peer class > > loading is not clear to me. Why are binary builds and Docker needed > > there? Why can't multi-JVM tests from the Ignite testing framework reveal > > the mentioned problems? > > > > Ideally I think we should aggregate all failure reporting in a common > > place. And for me the TC bot is the best choice. Consequently it should be > > TeamCity most likely. > > > > But all in all I think we can give it a try according to your proposal > > and see how things go. > > > > Tue, Mar 5, 
2019 at 11:09, dmitrievanthony: > > > > > > Hi Alexey, > > > > > > I think it's a great idea. Travis + Docker is a very good and cheap > > > solution, so we could start with it. Regarding the statistics, Travis > allows > > > checking the last build status using a badge, so it also shouldn't be a > > > problem. > > > > > > Best regards, > > > Anton Dmitriev. > > > > > > > > > > > > -- > > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ > > > > > > > > -- > > Best regards, > > Ivan Pavlukhin > > -- Best Regards, Vyacheslav D.
[jira] [Created] (IGNITE-11486) Support Automatic modules for ignite-zookeeper: Resolve issues with logging packages conflict
Dmitriy Pavlov created IGNITE-11486: --- Summary: Support Automatic modules for ignite-zookeeper: Resolve issues with logging packages conflict Key: IGNITE-11486 URL: https://issues.apache.org/jira/browse/IGNITE-11486 Project: Ignite Issue Type: Sub-task Reporter: Dmitriy Pavlov Usage of the Ignite Zookeeper module in a modular environment fails: {noformat} error: the unnamed module reads package org.apache.log4j from both slf4j.log4j12 and log4j {noformat} The slf4j version is updated by the build system when Ignite Zookeeper is used. {noformat} +--- org.slf4j:slf4j-api:1.7.7 -> 1.7.25 +--- org.slf4j:slf4j-log4j12:1.7.7 -> 1.7.25 +--- org.slf4j:slf4j-api:1.7.25 \--- log4j:log4j:1.2.17 {noformat}
[jira] [Created] (IGNITE-11485) Support Automatic modules for ignite-hibernate: Package interference with hibernate core and hibernate for a particular version
Dmitriy Pavlov created IGNITE-11485: --- Summary: Support Automatic modules for ignite-hibernate: Package interference with hibernate core and hibernate for a particular version Key: IGNITE-11485 URL: https://issues.apache.org/jira/browse/IGNITE-11485 Project: Ignite Issue Type: Sub-task Reporter: Dmitriy Pavlov Hibernate 5.3: {noformat} error: the unnamed module reads package org.apache.ignite.cache.hibernate from both ignite.hibernate.5.3 and ignite.hibernate.core {noformat} Hibernate 5.1: {noformat} error: the unnamed module reads package org.apache.ignite.cache.hibernate from both ignite.hibernate.core and ignite.hibernate.5.1 {noformat} Hibernate 4.2: {noformat} error: the unnamed module reads package org.apache.ignite.cache.hibernate from both ignite.hibernate.core and ignite.hibernate.4.2 {noformat} Probably we should move the classes from the hibernate-core module to the org.apache.ignite.cache.hibernate.core package, but this may affect the public API. The following classes will be moved in case we change the core package: - HibernateAccessStrategyAdapter - HibernateAccessStrategyFactory - HibernateCacheProxy - HibernateExceptionConverter - HibernateKeyTransformer - HibernateNonStrictAccessStrategy - HibernateReadOnlyAccessStrategy - HibernateReadWriteAccessStrategy - HibernateTransactionalAccessStrategy Alternative solution: the Hibernate 5.3 module is not yet released, so we could move the implementation for the newest version to its own subpackage. Formally it would not be a breaking change.
Re: [DISCUSSION] Channel communication between nodes
Maxim, My humble opinion. If there is no convenient means to implement partition file sending today, then we should introduce something. And keeping such a facility private is much easier, because introducing a new public API is a significantly more complex task. Fri, Mar 1, 2019 at 19:44, Maxim Muzafarov: > > Igniters, > > Apache Ignite has a very convenient messaging interface [1] for > topic-based communication between nodes (or a specific group of nodes > within a cluster). The messaging functionality in Ignite is provided > via the IgniteMessaging interface. It allows you to: > - send a message to a certain topic > - register local/remote listeners > > I really like this feature, but the disadvantage is that when the user > wants to transfer a large amount of binary data (e.g. files) between > nodes, he must create complex logic to wrap it into messages. I think > Ignite could have an interface, e.g. IgniteChannels, which would allow you to: > - register local/remote listeners for channel created/destroyed events > - create a channel connection (a wrapped socket channel) to a certain > node/group of nodes and the desired topic > > Another case where such a feature can be applied is > internal usage for Apache Ignite's own needs. I can mention here the task of > cluster rebalancing by sending cache partition files between nodes. > I've posted a small description of it on the IEP-28 page [2]. > > > WDYT about it? 
> > --- > > API (assumed) > > IgniteChannels chnls = ignite0.channels(); > chnls.remoteListen(TOPIC.MY_TOPIC, new RemoteListener()); > > IgniteSocketChannel ch0 = chnls.channel(node, TOPIC.MY_TOPIC); > ch0.writeInt(bigFile.size()); > ch0.transferTo(FileChannel.open(bigFile.path(), StandardOpenOption.READ)) > > > /** */ > > private class RemoteListener > implements IgniteBiPredicate { > > @IgniteInstanceResource > private Ignite ignite; > > @Override public boolean apply( > UUID nodeId, > IgniteSocketChannel ch > ) { > int size = ch.readInt(); > ignite.fileSystem("base") > .create("bigfile.mpg") > .transferFrom(ch, size); > return true; > } > } > > > [1] https://apacheignite.readme.io/docs/messaging > [2] > https://cwiki.apache.org/confluence/display/IGNITE/IEP-28%3A+Cluster+peer-2-peer+balancing#IEP-28:Clusterpeer-2-peerbalancing-CommunicationSpi -- Best regards, Ivan Pavlukhin
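The zero-copy transfer that such a channel facility would wrap already exists in plain Java NIO. Below is a sketch of that mechanism; the `Pipe` stands in for a node-to-node socket, and the class and method names are illustrative only, since `IgniteChannels`/`IgniteSocketChannel` are a proposal above, not an existing API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.Pipe;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {
    /** Sends all bytes of a file into a writable channel using zero-copy transferTo. */
    static long send(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size(), sent = 0;
            while (sent < size)
                sent += src.transferTo(sent, size - sent, out); // may transfer partially
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("partition", ".bin");
        Files.write(f, "partition-data".getBytes(StandardCharsets.UTF_8));

        Pipe pipe = Pipe.open(); // stands in for the socket channel between nodes
        long sent = send(f, pipe.sink());
        pipe.sink().close();

        // Receiver side: drain everything that arrived over the "channel".
        ByteBuffer buf = ByteBuffer.allocate(64);
        while (pipe.source().read(buf) > 0) { /* keep reading until EOF */ }
        buf.flip();
        System.out.println(sent + " bytes: " + StandardCharsets.UTF_8.decode(buf));
        Files.delete(f);
    }
}
```

On a real socket, `FileChannel.transferTo` can use sendfile(2) and avoid copying file bytes through user space, which is exactly what makes a dedicated channel API attractive for partition-file rebalancing.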
Batch updates in Ignite B+ tree.
Hi Igniters! I am working on implementing batch updates in PageMemory [1] to improve the performance of the preloader, datastreamer and putAll. This task consists of two major related improvements: 1. Batch writing to PageMemory via FreeList - store several values at once in a single memory page. 2. Batch updates in BPlusTree (for introducing an invokeAll operation). I started to investigate batch updates in the B+ tree, and it seems that the concurrent top-down balancing algorithm (TD) described in this paper [2] may be suitable for batch insertion of keys into the Ignite B+ tree. This algorithm uses a top-down balancing approach and allows inserting a batch of keys belonging to leaves that share the same parent. The downside of the top-down approach is that the parent node is locked while performing insertion/splitting in child nodes. WDYT? Do you know other approaches for implementing batch updates in the Ignite B+ tree? [1] https://issues.apache.org/jira/browse/IGNITE-7935 [2] https://aaltodoc.aalto.fi/bitstream/handle/123456789/2168/isbn9512258951.pdf
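The grouping step that TD-style batch insertion relies on - splitting a sorted batch into runs of keys that land in the same leaf, so each leaf is latched once per run - can be sketched in plain Java. This is an illustration only; `groupByLeaf` and the integer keys are hypothetical and not part of Ignite's actual BPlusTree code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchLeafGrouping {
    /**
     * Splits a sorted batch of keys into consecutive runs that target the same
     * leaf, given the sorted separator keys stored in the parent node. Each run
     * can then be applied to its leaf under a single latch, which is the core
     * idea behind batch insertion with top-down balancing.
     */
    public static List<List<Integer>> groupByLeaf(int[] sortedKeys, int[] separators) {
        List<List<Integer>> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedKeys.length) {
            int child = childIndex(sortedKeys[i], separators);
            List<Integer> run = new ArrayList<>();
            // Collect all consecutive keys routed to the same child leaf.
            while (i < sortedKeys.length && childIndex(sortedKeys[i], separators) == child)
                run.add(sortedKeys[i++]);
            runs.add(run);
        }
        return runs;
    }

    /** Child index = number of separators less than or equal to the key. */
    private static int childIndex(int key, int[] separators) {
        int idx = Arrays.binarySearch(separators, key);
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    public static void main(String[] args) {
        // Separators {10, 20} route keys 1,2 to leaf 0; 12 to leaf 1; 25,30 to leaf 2.
        System.out.println(groupByLeaf(new int[]{1, 2, 12, 25, 30}, new int[]{10, 20}));
        // prints [[1, 2], [12], [25, 30]]
    }
}
```

With such grouping, a batch of N keys touching K leaves needs K leaf latches instead of N, at the cost of holding the parent lock while the child runs are applied.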
[jira] [Created] (IGNITE-11484) Get rid of ForkJoinPool#commonPool usage for system-critical tasks
Ivan Rakov created IGNITE-11484: --- Summary: Get rid of ForkJoinPool#commonPool usage for system-critical tasks Key: IGNITE-11484 URL: https://issues.apache.org/jira/browse/IGNITE-11484 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Assignee: Ivan Rakov Fix For: 2.8 We use ForkJoinPool#commonPool for sorting checkpoint pages. This may backfire if the common pool is already utilized in the current JVM: a checkpoint may wait for sorting for a long time, which in turn will cause a drop in user load.
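The fix direction described in the ticket can be sketched with a standard JDK idiom: running a parallel operation from inside a `submit()` to a dedicated `ForkJoinPool` keeps its worker tasks off the common pool. `DedicatedSortPool` and `sortPages` are assumed names for illustration, not Ignite code:

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;

public class DedicatedSortPool {
    // A dedicated pool for checkpoint page sorting (name assumed). The point:
    // Arrays.parallelSort and parallel streams otherwise silently use
    // ForkJoinPool.commonPool(), which user code may already saturate.
    static final ForkJoinPool CHECKPOINT_POOL = new ForkJoinPool(2);

    static int[] sortPages(int[] pageIds) {
        // ForkJoinTask.invoke() from a worker thread runs in that thread's own
        // pool, so wrapping the sort in submit() isolates it from the common pool.
        return CHECKPOINT_POOL.submit(() -> {
            int[] copy = pageIds.clone();
            Arrays.parallelSort(copy);
            return copy;
        }).join();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortPages(new int[]{42, 7, 19})));
        // prints [7, 19, 42]
    }
}
```

A checkpoint using its own pool this way cannot be starved by user tasks queued in the common pool, at the cost of a few extra threads.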
[DISCUSSION] System cache persistence.
Hi Igniters, I'd like to start a discussion about avoiding system cache usage with persistence. The system cache is used in a number of component internals. No one cares that the system cache can have stale data after grid restart, as that wasn't possible before 2.1. Since Ignite 2.1 it can be persistent, which may affect components' behavior: Compute, IGFS, ServiceGrid, DataStructures. What's wrong? 1. The system cache is persistent only if the default region is configured as persistent (and vice versa). This is non-obvious and can cause unpredictable issues. 2. Any change in the system cache requires a distributed transaction that may cause a deadlock. We already avoid its usage in BinaryMarshaller and (almost) in the recently reworked ServiceGrid due to the "deadlock" reason. What has been affected, and what can we do? 3. IGFS AFAIK, IGFS support is going to be discontinued. There is nothing to do if IGFS is removed in 3.0. 4. DataStructures They look broken (maybe partially), as I see CacheDataStructuresManager uses on-heap maps for id->structure mapping for some structures. It looks safe to deprecate persistence for data structures for now and rework them separately. Also, from a user perspective, I'd expect data structure persistence to be configured in some separate place or in the data structure configuration. 5. Compute Let's rework this to use the Metastore. 6. ServiceGrid. We can use the Metastore and drop old services later. 7. Some 3rd-party plugins may be affected. Of course, there is no compatibility guarantee if someone uses internal components, but issue #1 can make users frustrated. We can prevent the system cache from being persistent. Do we really ever need the system cache with persistence enabled? Thoughts? I've created a ticket for this [1]. [1] https://issues.apache.org/jira/browse/IGNITE-11483 -- Best regards, Andrey V. Mashenkov
[jira] [Created] (IGNITE-11483) Make system cache non-persistent and deprecate.
Andrew Mashenkov created IGNITE-11483: - Summary: Make system cache non-persistent and deprecate. Key: IGNITE-11483 URL: https://issues.apache.org/jira/browse/IGNITE-11483 Project: Ignite Issue Type: Bug Components: cache, compute, igfs, managed services Reporter: Andrew Mashenkov For now, a persistent default region makes the system cache persistent as well (the same holds for a non-persistent region). This behavior is non-obvious and may cause unpredictable issues. We have a number of components that use the system cache; some of them don't need the system cache to be persistent, while others are OK with it: * DataStructures - data structure persistence should be configured in its own configuration. Moreover, some structures look broken as CacheDataStructuresManager uses in-memory maps. * Compute - most likely persistence is not needed. * Services - the metastore can be used instead. * Igfs - a candidate for removal in 3.0
[jira] [Created] (IGNITE-11482) MVCC: Error on TxLog initialization.
Roman Kondakov created IGNITE-11482: --- Summary: MVCC: Error on TxLog initialization. Key: IGNITE-11482 URL: https://issues.apache.org/jira/browse/IGNITE-11482 Project: Ignite Issue Type: Bug Components: mvcc Reporter: Roman Kondakov Fix For: 2.8 Some [tests remained flaky|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&buildTypeId=&tab=testDetails&testNameId=-935846982857542309&order=TEST_STATUS_DESC&itemsCount=50&branch_IgniteTests24Java8=__all_branches__] even after IGNITE-10582 has been fixed. It should be investigated again. {noformat} [21:44:14] (err) Failed to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=TransformCollectionView [true, false, false, false]]class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process. at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.createExchangeException(GridDhtPartitionsExchangeFuture.java:3209) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendExchangeFailureMessage(GridDhtPartitionsExchangeFuture.java:3237) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3323) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3304) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1519) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:852) at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2920) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2769) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to initialize exchange locally [locNodeId=140a9253-f646-4691-9947-2b211a90] at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1254) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:782) ... 4 more Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted) at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:85) at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:97) at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.init(PagesList.java:181) at org.apache.ignite.internal.processors.cache.persistence.tree.reuse.ReuseListImpl.(ReuseListImpl.java:57) at org.apache.ignite.internal.processors.cache.mvcc.txlog.TxLog.init(TxLog.java:161) at org.apache.ignite.internal.processors.cache.mvcc.txlog.TxLog.(TxLog.java:87) at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.ensureStarted(MvccProcessorImpl.java:302) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.createCacheContext(GridCacheProcessor.java:1552) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheContext(GridCacheProcessor.java:2325) at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$null$6a5b31b9$1(GridCacheProcessor.java:2164) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCachesIfPossible$6(GridCacheProcessor.java:2104) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCaches$926b6886$1(GridCacheProcessor.java:2161) at org.apache.ignite.internal.util.IgniteUtils.lambda$null$1(IgniteUtils.java:10833) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.
[jira] [Created] (IGNITE-11481) [ML] Prototype of DatasetRow for Vectorizer
Alexey Platonov created IGNITE-11481: Summary: [ML] Prototype of DatasetRow for Vectorizer Key: IGNITE-11481 URL: https://issues.apache.org/jira/browse/IGNITE-11481 Project: Ignite Issue Type: Improvement Components: ml Reporter: Alexey Platonov Assignee: Alexey Platonov Vectorizer should produce a DatasetRow object that can contain columns with different types (double, string, etc.). It is needed for preprocessors working with non-double values.
[jira] [Created] (IGNITE-11480) [ML] Use only Vectorizer API in DatasetTrainer API
Alexey Platonov created IGNITE-11480: Summary: [ML] Use only Vectorizer API in DatasetTrainer API Key: IGNITE-11480 URL: https://issues.apache.org/jira/browse/IGNITE-11480 Project: Ignite Issue Type: Improvement Components: ml Reporter: Alexey Platonov Assignee: Alexey Platonov Use only the Vectorizer API in the DatasetTrainer API to avoid problems with user class serialization.
[jira] [Created] (IGNITE-11479) [ML] Use new vectorizer API in PartitionDatasetBuilders
Alexey Platonov created IGNITE-11479: Summary: [ML] Use new vectorizer API in PartitionDatasetBuilders Key: IGNITE-11479 URL: https://issues.apache.org/jira/browse/IGNITE-11479 Project: Ignite Issue Type: Improvement Components: ml Reporter: Alexey Platonov Assignee: Alexey Platonov We need to exclude the current feature extractors from the partition building API and replace the old extractors with the new vectorizer.
Re: Tests for ML using binary builds
Ivan, Thanks for your answer. I want to use binary builds explicitly because they don't share jars of client code. If we can use multi-JVM tests with different classpaths, I will use them - such an approach is more convenient from the TC point of view. P.S. I use Docker in my prototype just because it is easy for me and for test cluster management - I can create a docker image with all configs and scripts and run an Ignite cluster in a separate network. On Tue, Mar 5, 2019 at 12:28 PM Павлухин Иван wrote: > Alexey, > > If problems arise in environments different from the one where the usual > Ignite tests run, then it is definitely a good idea to cover them. And > testing other build kinds and in other environments is a good idea as > well. But the particular problem with serialization and peer class > loading is not clear to me. Why are binary builds and Docker needed > there? Why can't multi-JVM tests from the Ignite testing framework reveal > the mentioned problems? > > Ideally I think we should aggregate all failure reporting in a common > place. And for me the TC bot is the best choice. Consequently it should be > TeamCity most likely. > > But all in all I think we can give it a try according to your proposal > and see how things go. > > Tue, Mar 5, 2019 at 11:09, dmitrievanthony: > > > > Hi Alexey, > > > > I think it's a great idea. Travis + Docker is a very good and cheap > > solution, so we could start with it. Regarding the statistics, Travis > allows > > checking the last build status using a badge, so it also shouldn't be a > > problem. > > > > Best regards, > > Anton Dmitriev. > > > > > > > > -- > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ > > > > -- > Best regards, > Ivan Pavlukhin >
[jira] [Created] (IGNITE-11478) [ML] Use new vectorizer API in Trainers
Alexey Platonov created IGNITE-11478: Summary: [ML] Use new vectorizer API in Trainers Key: IGNITE-11478 URL: https://issues.apache.org/jira/browse/IGNITE-11478 Project: Ignite Issue Type: Improvement Components: ml Reporter: Alexey Platonov Assignee: Alexey Platonov We should rewrite the current trainers: exclude all "free" feature/label extractors from the APIs and use the new vectorizer in them.
[jira] [Created] (IGNITE-11476) [ML] Use new feature extraction API in examples
Alexey Platonov created IGNITE-11476: Summary: [ML] Use new feature extraction API in examples Key: IGNITE-11476 URL: https://issues.apache.org/jira/browse/IGNITE-11476 Project: Ignite Issue Type: Improvement Components: ml Reporter: Alexey Platonov Assignee: Alexey Platonov Introduce the new feature/label extraction API in all examples. These examples should work on binary builds without adding extra jars to the libs directory (except the ml jar).
[jira] [Created] (IGNITE-11477) [ML] Create tests for ML algorithms stability check against binary builds
Alexey Platonov created IGNITE-11477: Summary: [ML] Create tests for ML algorithms stability check against binary builds Key: IGNITE-11477 URL: https://issues.apache.org/jira/browse/IGNITE-11477 Project: Ignite Issue Type: Improvement Components: ml Reporter: Alexey Platonov Assignee: Alexey Platonov After the new feature API is created, we should create tests that check ML algorithm stability against binary builds (or on other JVMs without a common classpath). All new algorithms should be delivered with such a test.
[jira] [Created] (IGNITE-11475) [ML] Vectorizer API prototype with POC
Alexey Platonov created IGNITE-11475: Summary: [ML] Vectorizer API prototype with POC Key: IGNITE-11475 URL: https://issues.apache.org/jira/browse/IGNITE-11475 Project: Ignite Issue Type: Improvement Components: ml Reporter: Alexey Platonov Assignee: Alexey Platonov We need to create a prototype of the API for feature/label extraction and introduce it in one or two existing examples. This prototype should show that the new API works on binary builds.
[jira] [Created] (IGNITE-11474) Add possibility to run idle_verify in a non-idle cluster
Vladislav Pyatkov created IGNITE-11474: -- Summary: Add possibility to run idle_verify in a non-idle cluster Key: IGNITE-11474 URL: https://issues.apache.org/jira/browse/IGNITE-11474 Project: Ignite Issue Type: Improvement Reporter: Vladislav Pyatkov We are able to make a sort of READ_ONLY mode that blocks all data load. Using this mode, we should add a specific parameter for idle_verify which excludes data load and, after the cluster is switched to READ_ONLY, continues the task.
Re: Storing short/empty strings in Ignite
Hello! If you can modify your code to store nulls instead of empty strings, nulls seem to be much more compact. Regards, -- Ilya Kasnacheev Tue, Mar 5, 2019 at 10:12, Valentin Kulichenko < valentin.kuliche...@gmail.com>: > Hey folks, > > While working with Ignite users, I keep seeing data models where a single > object (row) might contain many fields (100, 200, more...), and most of > them are strings. > > Correct me if I'm wrong, but per my understanding, for every such field we > store an integer value to represent its length. This is significant > overhead - with 200 fields we spend 800 bytes only for this. > > Now here is the catch: the vast majority of those strings are actually empty or > very short (several chars), therefore we don't really need 4 bytes for their > length. > > My suggestion is to introduce another data type, e.g. STRING_SHORT, use it > for all strings that are 255 chars or less, and therefore use a single byte > to encode the length. We can go even further and also introduce STRING_EMPTY, > which obviously doesn't need any length information at all. > > What do you guys think? > > -Val >
[jira] [Created] (IGNITE-11473) SQL: check convert to ENUM type by functions CAST, CONVERT throws sane exception
Taras Ledkov created IGNITE-11473: - Summary: SQL: check convert to ENUM type by functions CAST, CONVERT throws sane exception Key: IGNITE-11473 URL: https://issues.apache.org/jira/browse/IGNITE-11473 Project: Ignite Issue Type: Improvement Components: sql Affects Versions: 2.7 Reporter: Taras Ledkov The CAST and CONVERT functions have a bug in H2. It is fixed in H2 1.4.198. We have to check that these functions throw a sane exception after H2 is upgraded.
[jira] [Created] (IGNITE-11472) SQL: throw sane exception for unsupported features
Taras Ledkov created IGNITE-11472: - Summary: SQL: throw sane exception for unsupported features Key: IGNITE-11472 URL: https://issues.apache.org/jira/browse/IGNITE-11472 Project: Ignite Issue Type: Improvement Components: sql Affects Versions: 2.7 Reporter: Taras Ledkov
|| Feature || Issue || Comments ||
| WITH RECURSIVE | IGNITE-7664 | can be fixed immediately |
| DEFAULT value in INSERT / MERGE | IGNITE-7664 | can be fixed immediately |
| MEMORY, TEMPORARY, HIDDEN table types for CREATE TABLE | IGNITE-7664 | can be fixed immediately |
| FIRST column position for ALTER TABLE ADD COLUMN | IGNITE-7664 | can be fixed immediately |
| HELP / SHOW commands | IGNITE-7664 | can be fixed immediately |
| GRANT / REVOKE commands | IGNITE-7664 | can be fixed immediately |
| TIMESTAMP WITH TIME ZONE unsupported type | IGNITE-7664 | can be fixed immediately |
| ENUM unsupported type | IGNITE-7664 | partially fixed; CAST and CONVERT functions have a bug in H2, fixed in 1.4.198 |
| MERGE USING | IGNITE-11444 | cannot be fixed without a patch to H2 |
Re: Tests for ML using binary builds
Alexey, If problems arise in environments different from the one where the usual Ignite tests run, then it is definitely a good idea to cover them. And testing other build kinds and in other environments is a good idea as well. But the particular problem with serialization and peer class loading is not clear to me. Why are binary builds and Docker needed there? Why can't multi-JVM tests from the Ignite testing framework reveal the mentioned problems? Ideally I think we should aggregate all failure reporting in a common place. And for me the TC bot is the best choice. Consequently it should be TeamCity most likely. But all in all I think we can give it a try according to your proposal and see how things go. Tue, Mar 5, 2019 at 11:09, dmitrievanthony: > > Hi Alexey, > > I think it's a great idea. Travis + Docker is a very good and cheap > solution, so we could start with it. Regarding the statistics, Travis allows > checking the last build status using a badge, so it also shouldn't be a > problem. > > Best regards, > Anton Dmitriev. > > > > -- > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ -- Best regards, Ivan Pavlukhin
Re: Tests for ML using binary builds
Hi Alexey, I think it's a great idea. Travis + Docker is a very good and cheap solution, so we could start with it. Regarding the statistics, Travis allows checking the last build status using a badge, so it also shouldn't be a problem. Best regards, Anton Dmitriev. -- Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
Re: Storing short/empty strings in Ignite
Hi Val, I would say that we do not need the string length at all, because it can be derived from the object footer (next field offset MINUS current field offset). It is not a very good idea to implement the proposed change in Apache Ignite 2.x because it is breaking and will add unnecessary complexity to the already very complex binary infrastructure. Instead, it is better to review the binary format in 3.0 and remove lengths not only from Strings, but from other variable-length data types as well (arrays, decimals). On Tue, Mar 5, 2019 at 10:12 AM Valentin Kulichenko < valentin.kuliche...@gmail.com> wrote: > Hey folks, > > While working with Ignite users, I keep seeing data models where a single > object (row) might contain many fields (100, 200, more...), and most of > them are strings. > > Correct me if I'm wrong, but per my understanding, for every such field we > store an integer value to represent its length. This is significant > overhead - with 200 fields we spend 800 bytes only for this. > > Now here is the catch: the vast majority of those strings are actually empty or > very short (several chars), therefore we don't really need 4 bytes for their > length. > > My suggestion is to introduce another data type, e.g. STRING_SHORT, use it > for all strings that are 255 chars or less, and therefore use a single byte > to encode the length. We can go even further and also introduce STRING_EMPTY, > which obviously doesn't need any length information at all. > > What do you guys think? > > -Val >
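The trade-off discussed in this thread can be sketched with a tiny codec. The STRING_EMPTY/STRING_SHORT tags and the class below are hypothetical, not Ignite's actual binary wire format; they only show the 4-bytes-vs-1-byte-vs-0-bytes length encoding Val proposes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class CompactStringCodec {
    // Hypothetical type tags for illustration only.
    static final byte STRING_EMPTY = 0, STRING_SHORT = 1, STRING = 2;

    static void write(DataOutputStream out, String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (b.length == 0)
            out.writeByte(STRING_EMPTY);      // 1 byte total, no length at all
        else if (b.length <= 255) {
            out.writeByte(STRING_SHORT);
            out.writeByte(b.length);          // 1-byte length
            out.write(b);
        }
        else {
            out.writeByte(STRING);
            out.writeInt(b.length);           // classic 4-byte length
            out.write(b);
        }
    }

    static String read(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        if (tag == STRING_EMPTY)
            return "";
        int len = tag == STRING_SHORT ? in.readUnsignedByte() : in.readInt();
        byte[] b = new byte[len];
        in.readFully(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        write(out, "");   // 1 byte instead of a 4-byte length
        write(out, "ok"); // 4 bytes total instead of 6
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(read(in) + "|" + read(in) + "|" + bos.size() + " bytes");
        // prints |ok|5 bytes
    }
}
```

For a row of 200 mostly empty or short string fields, this encoding saves roughly 3 bytes of length metadata per field, while the offset-based footer approach mentioned above would remove the per-field length entirely.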