Re: [HACKERS] There's random access and then there's random access
On Wed, Dec 05, 2007 at 01:49:20AM +, Gregory Stark wrote:
> Regardless of what mechanism is used and who is responsible for doing it someone is going to have to figure out which blocks are specifically interesting to prefetch. Bitmap index scans happen to be the easiest since we've already built up a list of blocks we plan to read. Somehow that information has to be pushed to the storage manager to be acted upon. Normal index scans are an even more interesting case but I'm not sure how hard it would be to get that information. It may only be convenient to get the blocks from the last leaf page we looked at, for example.

I guess it depends on how you're looking at things... I'm thinking more in terms of telling the OS to fetch stuff we're pretty sure we're going to need while we get on with other work. There's a lot of cases where you know that besides just a bitmap scan (though perhaps code-wise bitmap scan is easier to implement...)

For a seqscan, we'd want to be reading some number of blocks ahead of where we're at right now. Ditto for index pages on an index scan. In addition, when we're scanning the index, we'd definitely want to issue heap page requests asynchronously, since that gives the filesystem, etc. a better shot at re-ordering the reads to improve performance.

--
Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] There's random access and then there's random access
Gregory Stark [EMAIL PROTECTED] writes:
> I think this will be easiest to do for bitmap index scans. Since we gather up all the pages we'll need before starting the heap scan we can easily skim through them, issue posix_fadvises for at least a certain number ahead of the actual read point and then proceed with the rest of the scan unchanged.

I've written up a simple test implementation of prefetching using posix_fadvise(). Here are some nice results on a query accessing 1,000 records from a 10G table with 300 million records:

postgres=# set preread_pages=0; explain analyze select (select count(*) from h where h = any (x)) from (select random_array(1000,1,3) as x)x;
SET
                                  QUERY PLAN
------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..115.69 rows=1 width=32) (actual time=6069.505..6069.509 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.061 rows=1 loops=1)
   SubPlan
     ->  Aggregate  (cost=115.66..115.67 rows=1 width=0) (actual time=6069.425..6069.426 rows=1 loops=1)
           ->  Bitmap Heap Scan on h  (cost=75.49..115.63 rows=10 width=0) (actual time=3543.107..6068.335 rows=1000 loops=1)
                 Recheck Cond: (h = ANY ($0))
                 ->  Bitmap Index Scan on hi  (cost=0.00..75.49 rows=10 width=0) (actual time=3542.220..3542.220 rows=1000 loops=1)
                       Index Cond: (h = ANY ($0))
 Total runtime: 6069.632 ms
(9 rows)

postgres=# set preread_pages=300; explain analyze select (select count(*) from h where h = any (x)) from (select random_array(1000,1,3) as x)x;
SET
                                  QUERY PLAN
------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..115.69 rows=1 width=32) (actual time=3945.602..3945.607 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.060..0.064 rows=1 loops=1)
   SubPlan
     ->  Aggregate  (cost=115.66..115.67 rows=1 width=0) (actual time=3945.520..3945.521 rows=1 loops=1)
           ->  Bitmap Heap Scan on h  (cost=75.49..115.63 rows=10 width=0) (actual time=3505.546..3944.817 rows=1000 loops=1)
                 Recheck Cond: (h = ANY ($0))
                 ->  Bitmap Index Scan on hi  (cost=0.00..75.49 rows=10 width=0) (actual time=3452.759..3452.759 rows=1000 loops=1)
                       Index Cond: (h = ANY ($0))
 Total runtime: 3945.730 ms
(9 rows)

Note that while the query itself is only 50% faster, the bitmap heap scan specifically is about 5.75 times as fast as without readahead (roughly 2,525 ms versus 439 ms once the startup time, which covers the bitmap index scan, is subtracted from the heap scan's total).

It would be nice to optimize the bitmap index scan as well but that will be a bit trickier and it probably won't be able to cover as many pages. As a result it probably won't be a 5x speedup like the heap scan.

Also, this is with a fairly aggressive readahead which only makes sense for queries that look a lot like this and will read all the tuples. For a more general solution I think it would make sense to water down the performance a bit in exchange for some protection against doing unnecessary I/O in cases where the query isn't actually going to read all the tuples.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
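One way to get the "protection against unnecessary I/O" mentioned in the last paragraph would be to ramp the readahead distance up gradually rather than starting at the full setting. The following is a minimal sketch only, not the test patch described above: the function name `next_prefetch_distance` and the doubling policy are assumptions made for illustration, and `max_distance` stands in for the `preread_pages` setting.

```c
/*
 * Hypothetical ramp-up policy (illustration only): start with a distance
 * of one page and double it after each page actually read, capped at
 * max_distance. A scan that stops after a few tuples has then issued
 * only a few speculative reads.
 */
unsigned next_prefetch_distance(unsigned current, unsigned max_distance)
{
    if (max_distance == 0)
        return 0;                   /* prefetching disabled */
    if (current == 0)
        return 1;                   /* first step: look one page ahead */
    if (current > max_distance / 2)
        return max_distance;        /* doubling would pass the cap */
    return current * 2;
}
```

With a cap of 300 the distance would go 1, 2, 4, ... 256, 300, so a query that reads all 1,000 pages reaches full readahead within the first few hundred pages, while a query that stops early wastes little I/O.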
Re: [HACKERS] There's random access and then there's random access
Gregory Stark wrote:
> I could swear this has been discussed in the past too. I seem to recall Luke disparaging Postgres on the same basis but proposing an immensely complicated solution. posix_fadvise or using libaio in a simplistic fashion as a kind of fadvise would be a fairly lightweight way to get most of the benefit of the more complex solutions.

It has been on the TODO list for a long time:

	* Do async I/O for faster random read-ahead of data

	  Async I/O allows multiple I/O requests to be sent to the disk with results coming back asynchronously.

	  http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php

I have added your thread URL to this.

--
  Bruce Momjian [EMAIL PROTECTED] http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com
  + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] There's random access and then there's random access
Gregory Stark [EMAIL PROTECTED] writes:
> The two interfaces I'm aware of for this are posix_fadvise() and libaio. I've run tests with a synthetic benchmark which generates a large file then reads a random selection of blocks from within it using either synchronous reads like we do now or either of those interfaces. I saw impressive speed gains on a machine with only three drives in a raid array. I did this a while ago so I don't have the results handy. I'll rerun the tests again and post them.

Here's the results of running the synthetic test program on a 3-drive raid array. Note that the results *exceeded* the 3x speedup I expected, even for ordered blocks. Either the drive (or the OS) is capable of reordering the block requests better than the offset into the file would appear or some other effect is kicking in.

The test is with an 8GB file, picking 8,192 random 8k blocks from within it. The pink diamonds represent the bandwidth obtained if the random blocks are sorted before fetching (like a bitmap indexscan) and the blue if they're unsorted.
[inline image: test-pfa-results.png]

for i in 1 2 3 4 5 6 7 8 16 24 32 64 96 128 192 256 384 512 768 1024 2048 4096 8192 ; do ./a.out pfa2 /mnt/data/test.data 8388608 8192 $i 8192 false ; done > test-pfa-results
for i in 1 2 3 4 5 6 7 8 16 24 32 64 96 128 192 256 384 512 768 1024 2048 4096 8192 ; do ./a.out pfa2 /mnt/data/test.data 8388608 8192 $i 8192 true ; done >> test-pfa-results

[attachment: test-pfa-results, binary data]

#define _XOPEN_SOURCE 600
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#define __EXTENSIONS__

#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <time.h>
#include <sys/fcntl.h>
#include <errno.h>
#include <aio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#if LINUX
#define HAVE_POSIX_MEMALIGN
#else
#include <malloc.h>
#endif

#if defined(POSIX_FADV_DONTNEED) && defined(POSIX_FADV_WILLNEED)
#define HAVE_PFA
#endif

#if defined(DIRECTIO_ON) && defined(DIRECTIO_OFF)
#define HAVE_DIRECTIO
#define WITH_DIO "w/directio"
#elif defined(O_DIRECT)
#define WITH_DIO "w/O_DIRECT"
#define PLUS_DIO "+O_DIRECT"
#else
#define WITH_DIO "with buffered i/o"
#define PLUS_DIO
#endif

enum method { seek, pfa, pfa2, aio} method;

static unsigned work_set_size, block_size;

static void seek_scan(int fd, off_t *offset_list, unsigned noffsets);
#ifdef HAVE_PFA
static void pfa_scan(int fd, off_t *offset_list, unsigned noffsets);
static void pfa_scan2(int fd, off_t *offset_list, unsigned noffsets);
#endif
static void aio_scan(int fd, off_t *offset_list, unsigned noffsets);
static void gen_buf(off_t offset, char *buf);
static void check_buf(off_t offset, const char *read_buf);

/* qsort helper */
static int cmp(const void *arg1, const void *arg2)
{
    off_t a = *(off_t*)arg1;
    off_t b = *(off_t*)arg2;
    if (a < b)
        return -1;
    else if (a > b)
        return 1;
    else
        return 0;
}

int main(int argc, char *argv[])
{
    off_t file_size, sample_size, *offset_list, existing_size;
    unsigned noffsets, sorted_offsets;
    const char *file_name;
    int fd;
    struct timeval before, after;
    unsigned i;
    double elapsed;

    /* argv[1]: access method, defaulting to plain lseek+read */
    if (argc <= 1)
        method = seek;
#ifdef HAVE_PFA
    else if (!strcmp(argv[1], "pfa"))
        method = pfa;
    else if (!strcmp(argv[1], "pfa2"))
        method = pfa2;
#endif
    else if (!strcmp(argv[1], "aio"))
        method = aio;
    else if (!strcmp(argv[1], "seek"))
        method = seek;
    else {
        fprintf(stderr, "usage: ./a.out [seek|pfa|pfa2|aio] [filename] [file kB] [sample blocks] [concurrent blocks] [block bytes]\n");
        exit(1);
    }

    /* argv[2..6]: each falls back to a default when omitted */
    if (argc <= 2)
        file_name = "test.data";
    else
        file_name = argv[2];
    if (argc <= 3)
        file_size = 1024*1024;
    else
        file_size = (off_t)1024*atoi(argv[3]);
    if (argc <= 4)
        sample_size = 1;
    else
        sample_size = atoi(argv[4]);
    if (argc <= 5)
        work_set_size = 128;
    else
        work_set_size = atoi(argv[5]);
    if (argc <= 6)
        block_size = 8192;
    else
        block_size = atoi(argv[6]);

    /* argv[7]: optional sorted flag ("true" or nonzero); argv[7] only exists when argc > 7 */
    sorted_offsets = 0;
    if (argc > 7)
        if (*argv[7] == 't' || atoi(argv[7]))
            sorted_offsets = 1;

    if (block_size <= 0) {
        fprintf(stderr, "bad block size %u\n", block_size);
        exit(1);
    }
    /* round the file size down to a whole number of blocks */
    file_size = file_size/block_size*block_size;
    if (file_size <= 0 || sample_size <= 0) {
        fprintf(stderr, "bad file/sample size %llu/%llu\n",
                (long long unsigned)file_size, (long long unsigned)sample_size);
        exit(1);
    }

    fprintf(stderr, "reading random %lu %s %uk blocks out of %luM using %s (working set %u)\n",
            (unsigned long) sample_size,
            sorted_offsets ? "sorted" : "unordered",
            block_size,
            (unsigned long) (file_size/1024/1024),
            (method == seek ? "lseek only" :
             method == pfa ? "posix_fadvise" :
             method == pfa2 ? "posix_fadvise v2" :
             method == aio ? "aio_read " WITH_DIO : "???"),
            work_set_size);

    fd = open(file_name, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        exit(1);
    }

    existing_size = lseek(fd, 0, SEEK_END);
    if (existing_size ==
Re: [HACKERS] There's random access and then there's random access
Gregory Stark wrote:
> Gregory Stark [EMAIL PROTECTED] writes
>> The two interfaces I'm aware of for this are posix_fadvise() and libaio. I've run tests with a synthetic benchmark which generates a large file then reads a random selection of blocks from within it using either synchronous reads like we do now or either of those interfaces. I saw impressive speed gains on a machine with only three drives in a raid array. I did this a while ago so I don't have the results handy. I'll rerun the tests again and post them.
>
> Here's the results of running the synthetic test program on a 3-drive raid array. Note that the results *exceeded* the 3x speedup I expected, even for ordered blocks. Either the drive (or the OS) is capable of reordering the block requests better than the offset into the file would appear or some other effect is kicking in.
>
> The test is with an 8GB file, picking 8,192 random 8k blocks from within it. The pink diamonds represent the bandwidth obtained if the random blocks are sorted before fetching (like a bitmap indexscan) and the blue if they're unsorted.

I didn't see "exceeded 3X" in the graph. But I certainly see 2X+ for most of the graph, and ~3X for very small reads. Possibly, it is avoiding unnecessary read-ahead at the drive or OS levels?

I think we expected to see raw reads significantly faster for the single process case. I thought your simulation was going to involve a tweak to PostgreSQL on a real query to see what overall effect it would have on typical queries and on special queries like Matthew's. Are you able to tweak the index scan and bitmap scan methods to do posix_fadvise() before running? Even if it doesn't do anything more intelligent such as you described in another post?

Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] There's random access and then there's random access
Mark Mielke [EMAIL PROTECTED] writes:
> I didn't see "exceeded 3X" in the graph. But I certainly see 2X+ for most of the graph, and ~3X for very small reads. Possibly, it is avoiding unnecessary read-ahead at the drive or OS levels?

Then you're misreading the graph -- which would be my fault, my picture was only worth 500 words then. Ordered scans (simulating a bitmap index scan) are getting 3.8 MB/s at a prefetch of 1 (effectively no prefetch) and 14.1 MB/s with a prefetch of 64. That's a factor of 3.7! Unordered scans have an even larger effect (unsurprisingly) going from 1.6MB/s to 8.9MB/s or a factor of 5.6.

Another surprising bit is that prefetching 8192 blocks, ie, the whole set, doesn't erase the advantage of the presorting. I would have expected that when prefetching all the blocks it would make little difference what order we feed them to posix_fadvise. I guess that since all the blocks which have had i/o initiated on them haven't been read in yet when we reach the first real read(), some blocks are forced to be read out of order. I'm surprised it makes nearly a 2x speed difference though.

> I think we expected to see raw reads significantly faster for the single process case. I thought your simulation was going to involve a tweak to PostgreSQL on a real query to see what overall effect it would have on typical queries and on special queries like Matthew's. Are you able to tweak the index scan and bitmap scan methods to do posix_fadvise() before running? Even if it doesn't do anything more intelligent such as you described in another post?

That's the next step. I'm debating between two ways to structure the code right now. Do I put the logic to peek ahead in nodeBitmapHeapScan to read ahead and remember the info seen, or in tidbitmap with a new API function which is only really useful for this one use case?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
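The two factors quoted in the message above follow directly from the bandwidth figures. As a quick check, here is a small sketch; the helper names `speedup` and `print_speedups` are made up for this illustration, and the numbers are the ones quoted in the message.

```c
#include <stdio.h>

/* Ratio of bandwidth with prefetching to bandwidth without it. */
double speedup(double baseline_mb_s, double prefetch_mb_s)
{
    return prefetch_mb_s / baseline_mb_s;
}

/* Print the two factors using the figures quoted in the message. */
void print_speedups(void)
{
    /* ordered blocks: prefetch of 1 versus prefetch of 64 */
    printf("ordered:   %.1fx\n", speedup(3.8, 14.1));
    /* unordered blocks */
    printf("unordered: %.1fx\n", speedup(1.6, 8.9));
}
```

Both ratios are larger than the 3x one would naively expect from a 3-drive array, which is the point being made in the reply.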
Re: [HACKERS] There's random access and then there's random access
Mark Mielke [EMAIL PROTECTED] writes:
> I didn't see "exceeded 3X" in the graph. But I certainly see 2X+ for most of the graph, and ~3X for very small reads. Possibly, it is avoiding unnecessary read-ahead at the drive or OS levels?

Ahh! I think I see how you're misreading it now. You're comparing the pink with the blue. That's not what's going on.

The X axis (which is logarithmic) is the degree of prefetch. So 1 means it's prefetching one block then immediately reading it -- effectively not prefetching at all. The right-hand end of the axis (the last data point is 8192) is completely prefetching the whole data set.

The two data sets are the same tests run with ordered (ie, like a bitmap scan) or unordered (ie, like a regular index scan) blocks. Unsurprisingly ordered sets read faster with low levels of prefetch and both get faster the more blocks you prefetch. What's surprising to me is that the advantage of the ordered blocks doesn't diminish with prefetching.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
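The "ordered" data set corresponds to sorting the block offsets before any request is issued, in the same way as the qsort helper in the test program posted earlier in the thread. A minimal sketch of that step follows; `cmp_offset` and `sort_offsets` are names chosen for this illustration.

```c
#include <stdlib.h>
#include <sys/types.h>

/* qsort comparator for file offsets; compares rather than subtracts
 * so that large off_t values cannot overflow an int */
static int cmp_offset(const void *arg1, const void *arg2)
{
    off_t a = *(const off_t *) arg1;
    off_t b = *(const off_t *) arg2;

    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* Sort offsets ascending, the order in which a bitmap scan visits heap blocks. */
void sort_offsets(off_t *offsets, size_t n)
{
    qsort(offsets, n, sizeof(off_t), cmp_offset);
}
```

Leaving the list unsorted gives the "unordered" case, which models a regular index scan that visits heap blocks in index order.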
Re: [HACKERS] There's random access and then there's random access
On Dec 4, 2007, at 1:42 PM, Gregory Stark wrote:
> I'm debating between two ways to structure the code right now. Do I put the logic to peek ahead in nodeBitmapHeapScan to read ahead and remember the info seen, or in tidbitmap with a new API function which is only really useful for this one use case?

There has been discussion of having a bg_reader, similar to the bg_writer. Perhaps that would be better than creating something that's specific to bitmap scans?

Also, I would expect to see a speed improvement even on single drives if the OS is actually issuing multiple requests to the drive. Doing so allows the drive to optimally order all of the reads.

--
Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] There's random access and then there's random access
Decibel! [EMAIL PROTECTED] writes:
> On Dec 4, 2007, at 1:42 PM, Gregory Stark wrote:
>> I'm debating between two ways to structure the code right now. Do I put the logic to peek ahead in nodeBitmapHeapScan to read ahead and remember the info seen, or in tidbitmap with a new API function which is only really useful for this one use case?
>
> There has been discussion of having a bg_reader, similar to the bg_writer. Perhaps that would be better than creating something that's specific to bitmap scans?

Has there? AFAICT a bg_reader only makes sense if we move to a direct-i/o situation where we're responsible for read-ahead and have to read into shared buffers any blocks we decide are interesting to readahead.

Regardless of what mechanism is used and who is responsible for doing it someone is going to have to figure out which blocks are specifically interesting to prefetch. Bitmap index scans happen to be the easiest since we've already built up a list of blocks we plan to read. Somehow that information has to be pushed to the storage manager to be acted upon.

Normal index scans are an even more interesting case but I'm not sure how hard it would be to get that information. It may only be convenient to get the blocks from the last leaf page we looked at, for example.

> Also, I would expect to see a speed improvement even on single drives if the OS is actually issuing multiple requests to the drive. Doing so allows the drive to optimally order all of the reads.

Sure, but a 2x speed improvement? That's way more than I was expecting.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
[HACKERS] There's random access and then there's random access
Recently there was a post on -performance about a particular case where Postgres doesn't make very good use of the I/O system. This is when you try to fetch many records spread throughout a table in random order. http://archives.postgresql.org/pgsql-performance/2007-12/msg5.php Currently Postgres reads each record as needed and processes it. This means even if you have a large raid array you get no benefit from it since you're limited by the latency of each request. The raid array might let you run more queries simultaneously but not improve the response time of a single query. But in most cases, as in the use case in the email message above, we can do substantially better. We can arrange to issue all the read requests without blocking, then process the blocks either as they come in or in the order we want blocking until they're actually satisfied. Handling them as they come in is in theory more efficient but either way I would expect to see more or less a speedup nearly equal to the number of drives in the array. Even on a single drive it should slightly improve performance as it allows us to do some CPU work while the I/O requests are pending. The two interfaces I'm aware of for this are posix_fadvise() and libaio. I've run tests with a synthetic benchmark which generates a large file then reads a random selection of blocks from within it using either synchronous reads like we do now or either of those interfaces. I saw impressive speed gains on a machine with only three drives in a raid array. I did this a while ago so I don't have the results handy. I'll rerun the tests again and post them. I think this will be easiest to do for bitmap index scans. Since we gather up all the pages we'll need before starting the heap scan we can easily skim through them, issue posix_fadvises for at least a certain number ahead of the actual read point and then proceed with the rest of the scan unchanged. 
For regular index scans I'm not sure how easy it will be to beat them into doing this but I suspect it might not be too hard to at least prefetch the tuples in the page-at-a-time buffer. That's probably safer too since for such scans we're more likely to not actually read all the results anyways; there could be a limit or something else above which will stop us.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
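The approach proposed in this message (hint a certain number of blocks ahead of the read point, then read as usual) could look roughly like the sketch below. It is an illustration under stated assumptions, not code from any patch: the function `read_with_prefetch` and its parameters are hypothetical, and only `posix_fadvise()` with `POSIX_FADV_WILLNEED` and `pread()` are real POSIX calls.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* for posix_fadvise() and pread() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read nblocks blocks of block_size bytes at the given offsets, keeping up
 * to `window` advisory requests issued ahead of the block being read.
 * buf must hold block_size bytes and is overwritten by each read.
 * Returns the number of blocks fully read, or -1 on a read error.
 */
long read_with_prefetch(int fd, const off_t *offsets, size_t nblocks,
                        size_t block_size, size_t window, char *buf)
{
    size_t  advised = 0;    /* offsets[0 .. advised-1] have been hinted */
    size_t  i;
    long    nread = 0;
    ssize_t got;

    for (i = 0; i < nblocks; i++)
    {
        /* top up the window: hint blocks i .. i+window-1 */
        while (window > 0 && advised < nblocks && advised < i + window)
        {
#ifdef POSIX_FADV_WILLNEED
            /* advisory only; a failed hint does not affect correctness */
            (void) posix_fadvise(fd, offsets[advised], (off_t) block_size,
                                 POSIX_FADV_WILLNEED);
#endif
            advised++;
        }

        /* the real read waits only if the hinted I/O has not finished yet */
        got = pread(fd, buf, block_size, offsets[i]);
        if (got < 0)
            return -1;
        if ((size_t) got == block_size)
            nread++;
    }
    return nread;
}
```

A window of 1 hints each block immediately before reading it, which is effectively no prefetch. A window equal to nblocks hints the whole set up front.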
Re: [HACKERS] There's random access and then there's random access
On 12/2/07, Gregory Stark [EMAIL PROTECTED] wrote:
> The two interfaces I'm aware of for this are posix_fadvise() and libaio. I've run tests with a synthetic benchmark which generates a large file then reads a random selection of blocks from within it using either synchronous reads like we do now or either of those interfaces. I saw impressive speed gains on a machine with only three drives in a raid array. I did this a while ago so I don't have the results handy. I'll rerun the tests again and post them.

The issue I've always seen raised with asynchronous I/O is portability--apparently some platforms PG runs on don't support it (or not well). AIUI Linux actually still has a fairly crappy implementation of AIO--glibc starts threads to do the I/O and then tracks when they finish. Not absolutely horrible, but a nice way to suddenly have a threaded backend when you're not expecting one.

-Doug
Re: [HACKERS] There's random access and then there's random access
Douglas McNaught [EMAIL PROTECTED] writes:
> On 12/2/07, Gregory Stark [EMAIL PROTECTED] wrote:
>> The two interfaces I'm aware of for this are posix_fadvise() and libaio. I've run tests with a synthetic benchmark which generates a large file then reads a random selection of blocks from within it using either synchronous reads like we do now or either of those interfaces. I saw impressive speed gains on a machine with only three drives in a raid array. I did this a while ago so I don't have the results handy. I'll rerun the tests again and post them.
>
> The issue I've always seen raised with asynchronous I/O is portability--apparently some platforms PG runs on don't support it (or not well). AIUI Linux actually still has a fairly crappy implementation of AIO--glibc starts threads to do the I/O and then tracks when they finish. Not absolutely horrible, but a nice way to suddenly have a threaded backend when you're not expecting one.

In the tests I ran Linux's posix_fadvise worked well and that's the simpler interface for us to adapt to anyways. On Solaris there was no posix_fadvise but libaio worked instead.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
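Since the hint is purely advisory, the portability concern could be contained in one small wrapper that compiles to a no-op where posix_fadvise() is missing. The sketch below assumes that approach; `prefetch_hint` is a hypothetical name, and the compile-time check mirrors the `POSIX_FADV_WILLNEED` test used in the benchmark program posted in this thread.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* for posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hint that the byte range [offset, offset + len) will be read soon.
 * Returns 1 if a hint was issued, 0 if the platform has no usable
 * posix_fadvise() or the call failed. Callers go on to issue their
 * normal reads either way.
 */
int prefetch_hint(int fd, off_t offset, off_t len)
{
#if defined(POSIX_FADV_WILLNEED)
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED) == 0;
#else
    (void) fd;
    (void) offset;
    (void) len;
    return 0;       /* no-op; an aio-based variant could be slotted in here */
#endif
}
```

Because a return of 0 only means "no hint was given", callers never need a platform-specific code path of their own.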
Re: [HACKERS] There's random access and then there's random access
Gregory Stark [EMAIL PROTECTED] writes:
> Recently there was a post on -performance about a particular case where Postgres doesn't make very good use of the I/O system. This is when you try to fetch many records spread throughout a table in random order.
>
> http://archives.postgresql.org/pgsql-performance/2007-12/msg5.php

Since the OP in that thread has still supplied zero information (no EXPLAIN, let alone ANALYZE, and no version info), it's pure guesswork as to what his problem is.

			regards, tom lane
Re: [HACKERS] There's random access and then there's random access
Tom Lane wrote:
> Gregory Stark [EMAIL PROTECTED] writes:
>> Recently there was a post on -performance about a particular case where Postgres doesn't make very good use of the I/O system. This is when you try to fetch many records spread throughout a table in random order.
>>
>> http://archives.postgresql.org/pgsql-performance/2007-12/msg5.php
>
> Since the OP in that thread has still supplied zero information (no EXPLAIN, let alone ANALYZE, and no version info), it's pure guesswork as to what his problem is.

Nonetheless, asynchronous IO will reap performance improvements. Whether a specific case would indeed benefit from it is imho irrelevant, if other cases can indeed be found where performance would be improved significantly.

I experimented with a raid of 8 solid state devices, and found that the blocks/second for random access improved significantly with the number of processes doing the access. I actually wanted to use said raid as a tablespace for postgresql, and alas, the speedup did not depend on the number of drives in the raid, which is very unfortunate. I still got the lower solid-state latency, but the raid did not help.

Regards,
Jens-Wolfhard Schicke
Re: [HACKERS] There's random access and then there's random access
Tom Lane [EMAIL PROTECTED] writes:
> Gregory Stark [EMAIL PROTECTED] writes:
>> Recently there was a post on -performance about a particular case where Postgres doesn't make very good use of the I/O system. This is when you try to fetch many records spread throughout a table in random order.
>>
>> http://archives.postgresql.org/pgsql-performance/2007-12/msg5.php
>
> Since the OP in that thread has still supplied zero information (no EXPLAIN, let alone ANALYZE, and no version info), it's pure guesswork as to what his problem is.

Sure, consider it a hypothetical which needs further experimentation. That's part of why I ran (and will rerun) those synthetic benchmarks to test whether posix_fadvise() actually speeds up subsequent reads on a few operating systems. Surely any proposed patch will have to prove itself on empirical grounds too.

I could swear this has been discussed in the past too. I seem to recall Luke disparaging Postgres on the same basis but proposing an immensely complicated solution. posix_fadvise or using libaio in a simplistic fashion as a kind of fadvise would be a fairly lightweight way to get most of the benefit of the more complex solutions.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!