Re: soliciting user stories of picolisp
Tomas, you have to read http://picolisp.com/5000/-2-I.html if you want to
understand completely how it works. And the problem is of course that it's
slow (regardless of where or what), and I don't really have the time to
fix it :-)

On Wed, Jul 21, 2010 at 9:40 AM, Alexander Burger wrote:
> Hi Tomas,
>
>> > Such numbers are very variable, and difficult to predict.
>>
>> I'm not sure what you mean. How long does a simple grep over the
>> article blob files take? That should serve as a rough indicator about
>> worst case behaviour.
>
> I'm not talking about the timings of 'grep', but of the database.
>
> 'grep' is also subject to cache effects, but not as much as the PicoLisp
> database, where each process caches all objects once they have been
> accessed. The whole query context is also cached, and related searches
> continue in the same context.
>
> The timings are also difficult to predict because they depend very much
> on the distribution of keys within the indexes, and which keys are
> queried from each index in which combination. For example, if you ask
> for a key combination that contains one or several keys that occur
> _seldom_ in the db, the matching results are found almost immediately.
> On the opposite end, searching for a combination of _common_ keys may
> take relatively long to find the exact hits.
>
> Cheers,
> - Alex
Re: soliciting user stories of picolisp
Hi Tomas,

>> Such numbers are very variable, and difficult to predict.
>
> I'm not sure what you mean. How long does a simple grep over the
> article blob files take? That should serve as a rough indicator about
> worst case behaviour.

I'm not talking about the timings of 'grep', but of the database.

'grep' is also subject to cache effects, but not as much as the PicoLisp
database, where each process caches all objects once they have been
accessed. The whole query context is also cached, and related searches
continue in the same context.

The timings are also difficult to predict because they depend very much
on the distribution of keys within the indexes, and which keys are
queried from each index in which combination. For example, if you ask
for a key combination that contains one or several keys that occur
_seldom_ in the db, the matching results are found almost immediately.
On the opposite end, searching for a combination of _common_ keys may
take relatively long to find the exact hits.

Cheers,
- Alex
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
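Alex's point about rare vs. common keys can be made concrete outside PicoLisp. The following is a hypothetical Python sketch (not PicoLisp internals) of intersecting index posting lists: starting from the shortest list keeps the candidate set small, which is why a combination containing one rare key answers almost immediately, while all-common keys force a large scan.

```python
# Illustration only: intersect posting lists, rarest key first.
def intersect(postings):
    postings = sorted(postings, key=len)   # shortest (rarest) list first
    result = set(postings[0])
    for p in postings[1:]:
        result &= set(p)                   # candidates can only shrink
    return result

rare = [3]                                 # a key occurring seldom in the db
common = list(range(10000))                # a key occurring everywhere
print(intersect([common, rare]))  # -> {3}
```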
Re: soliciting user stories of picolisp
Hi Henrik,

> 1.) This is what each "remote" looks like by way of E/R:
>
> (class +WordCount +Entity)
> (rel article   (+Ref +Number))
> (rel word      (+Aux +Ref +Number) (article))
> (rel count     (+Number))
> (rel picoStamp (+Ref +Number))
>
> (dbs
>    (4 +WordCount)
>    (3 (+WordCount word article picoStamp)))

I can't see how this works. The search index I implemented was like this:

("picolisp" (5 . "file1") (4 . "file2") ...)
("google" (3 . "file1") (2 . "file3") ...)
...

In your schema, I don't see how words are represented.

> The bottleneck lies somewhere else than the actual lookup,

So what is the problem then? ;-)

> search since it returns the maximum 50 where picolisp only returns 8.

Those are very long times considering there are so few results.

> So the bottleneck is not the search itself but rather badly optimized
> code that goes to work on the results later.

Hard to say from what I know.

>> a way of extracting and specifying the interesting content from the
>> harvested feeds and links their articles point to
>
> Well, the links you should be able to see in a per feed/category link map
> (I noticed it was broken; hopefully it will work from now on). As for
> specifying content through an XPath, what is it that you hope to gain by
> that? Give me a specific example please.

Most feeds don't contain the actual text which I'm interested in, but
only a link. That means I have to click around too much. For example,
the BBC News http://www.bbc.co.uk/news/ feed
http://feeds.bbci.co.uk/news/rss.xml gets me only a short line and a
link. I would like to see the linked content directly without clicking,
and I also don't want to see the whole page with all that redundant
junk, but only the text of the article. That text is inside a particular
element, so I could specify the XPath
/html/body/div[2]/div[2]/div[2]/div/div[2]/div[2]/div[2]/div and the
feed reader would automatically display just the portion of the page I
am interested in.
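To make the XPath idea concrete: a Python sketch (not part of VizReader; the markup and the path are made up) that pulls just the article text out of a page, using the limited XPath subset in the standard library.

```python
import xml.etree.ElementTree as ET

# A toy page standing in for a fetched article (hypothetical markup).
page = """<html><body>
<div>navigation junk</div>
<div><div>sidebar</div><div>The actual article text.</div></div>
</body></html>"""

root = ET.fromstring(page)
# ElementTree supports a subset of XPath; 'div[2]' selects the 2nd div child.
node = root.find("body/div[2]/div[2]")
print(node.text)  # -> The actual article text.
```

A full XPath expression like the one above would need a complete XPath engine, but the principle is the same: the reader applies the path and displays only the matching subtree.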
> The main imperative for me to create the reader is the fact that the
> Google Reader's GUI is horrible IMO and I'm happy with that part of
> VizReader. That and I thought it would be an easy thing to start out
> with in PL, but there is more to a feed reader than meets the
> eye... If I had thought about making the application distributed right
> from the start I would've been even happier.

Sure, you have a different motivation and way of reading news, which
doesn't match mine. That's why I also suggested exporting a personal
feed of the collected feeds, or sending that stuff by email.

> In the beginning I also had an algorithm that compared articles for
> automatic recommendations of similar content, that worked for a short

That could be interesting, but not something crucial I would need.

Cheers,

Tomas
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: soliciting user stories of picolisp
Hi Alex,

>> if I understand it well, you have all the articles locally on one
>> machine. I wonder how long a simple grep over the article blobs would
>> take? 22 seconds seems very long for any serious use. Have you
>
> Such numbers are very variable, and difficult to predict.

I'm not sure what you mean. How long does a simple grep over the
article blob files take? That should serve as a rough indicator about
worst case behaviour.

> For example, in the system mentioned in my previous mail, with
> information about millions of files distributed across several hosts,
> searching for a given combination of e.g. file name pattern and
> meta-information like access times, sizes or md5 keys might take a
> few seconds at the first access, but subsequent accesses
> (i.e. continuing the search by scrolling down the list) showed almost
> no delay at all.

Hmm, I know too little about the actual system you talk about, so it's
hard to form an educated opinion on this ;-)

Cheers,

Tomas
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: soliciting user stories of picolisp
Hi Tomas,

> if I understand it well, you have all the articles locally on one
> machine. I wonder how long a simple grep over the article blobs would
> take? 22 seconds seems very long for any serious use. Have you

Such numbers are very variable, and difficult to predict.

For example, in the system mentioned in my previous mail, with
information about millions of files distributed across several hosts,
searching for a given combination of e.g. file name pattern and
meta-information like access times, sizes or md5 keys might take a
few seconds at the first access, but subsequent accesses
(i.e. continuing the search by scrolling down the list) showed almost
no delay at all.

Cheers,
- Alex
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
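The "slow first access, instant follow-up" pattern Alex describes is ordinary caching; a toy Python illustration of the same effect (nothing here is PicoLisp code, and the 0.05 s sleep merely stands in for a disk scan):

```python
import time
from functools import lru_cache

# Toy model: once a process has fetched an object, repeats are free.
@lru_cache(maxsize=None)
def fetch(key):
    time.sleep(0.05)              # stands in for scanning the db files
    return key.upper()

t0 = time.perf_counter(); fetch("md5:abc"); first = time.perf_counter() - t0
t0 = time.perf_counter(); fetch("md5:abc"); second = time.perf_counter() - t0
print(first > second)  # -> True: the first access pays, later ones don't
```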
Re: soliciting user stories of picolisp
Hi Tomas.

1.) This is what each "remote" looks like by way of E/R:

(class +WordCount +Entity)
(rel article (+Ref +Number))
(rel word (+Aux +Ref +Number) (article))
(rel count (+Number))
(rel picoStamp (+Ref +Number))

(dbs
   (4 +WordCount)
   (3 (+WordCount word article picoStamp)))

The bottleneck lies somewhere else than the actual lookup; here are some
results I just got, probably while using the application all by myself:

"picolisp"    => 1.97 s
"google"      => 7.22 s
"obama"       => 1.64 s (cached from a prior search in RAM maybe?)
"afghanistan" => 7.2 s

Note the difference between google and picolisp, even though the search
is performed in exactly the same way. The only difference is that the
system needs to do post work after the results have been fetched, and
that is more work with the google search, since it returns the maximum
50 where picolisp only returns 8. So the bottleneck is not the search
itself but rather badly optimized code that goes to work on the results
later.

> a way of extracting and specifying the interesting content from the
> harvested feeds and links their articles point to

Well, the links you should be able to see in a per feed/category link
map (I noticed it was broken; hopefully it will work from now on). As
for specifying content through an XPath, what is it that you hope to
gain by that? Give me a specific example please.

The main imperative for me to create the reader is the fact that the
Google Reader's GUI is horrible IMO, and I'm happy with that part of
VizReader. That, and I thought it would be an easy thing to start out
with in PL, but there is more to a feed reader than meets the eye... If
I had thought about making the application distributed right from the
start I would've been even happier.

In the beginning I also had an algorithm that compared articles for
automatic recommendations of similar content; that worked for a short
time. If I were to currently apply it, it would take roughly one year to
compare all articles with each other.
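A hypothetical way to confirm Henrik's diagnosis would be to time the index lookup and the post-processing of the hits separately. A Python sketch with made-up stand-ins for both phases (the real code is PicoLisp; these lambdas are placeholders):

```python
import time

# Hypothetical harness: print how long each phase of a search takes.
def timed(label, f, *args):
    t0 = time.perf_counter()
    result = f(*args)
    print("%s: %.2f s" % (label, time.perf_counter() - t0))
    return result

# Placeholder lookup returning 50 hits, placeholder post-work on them:
hits = timed("lookup", lambda w: [w] * 50, "google")
page = timed("post-work", lambda hs: [h.upper() for h in hs], hits)
```

If "post-work" dominates for the 50-hit "google" query but not for the 8-hit "picolisp" query, the per-result processing is indeed the bottleneck.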
At one point I only let it compare a random subset, but that resulted in
(predictably) random quality too :-) Also, a lot of the finesse of the
application is lost if you're not a Twitter user. The majority of the
time I spend in it is simply checking my flow from time to time, where
most of the flow consists of Twitter posts, since few "normal" feeds
have implemented the pubsub protocol yet.

Cheers,
Henrik Sarvell

On Tue, Jul 20, 2010 at 7:45 PM, Tomas Hlavaty wrote:
> Hi Henrik,
>
>> Currently vizreader.com contains roughly 350 000 articles with a full
>> word index (not partial).
>>
>> The word index is spread out on "virtual remotes", i.e. they are not
>> really on remote machines; it's more a way to split up the physical
>> database files on disk (I've written on how that is done on
>> picolisp.com). I have no way of knowing how many words are mapped to
>> their articles like this, but most of the database is occupied by these
>> indexes and it currently occupies some 30 GB all in all.
>>
>> A search for the word "Google" just took 22 seconds.
>
> if I understand it well, you have all the articles locally on one
> machine. I wonder how long a simple grep over the article blobs would
> take? 22 seconds seems very long for any serious use. Have you
> considered some state-of-the-art full text search engine, e.g. Lucene?
>
> Just curious, how did you create the word index? I implemented a simple
> search functionality and word index for LogandCMS, which you can try at
> http://demo.cms.logand.com/search.html?s=sheep, and I even keep the
> count of every word in each page for ranking purposes, but I haven't
> had a chance to run into scaling problems like that.
>
>> No other part of the application is lagging significantly except for
>> when listing new articles in my news category, due to the fact that
>> there are so many articles in that category.
>> However, the fetching
>> method is highly inefficient, as I first fetch all feeds in a category
>> and then all their articles, and then take (tail) on them to get e.g.
>> the 50 newest. Walking and then only loading the wanted articles into
>> memory would of course be the best way, and something I will look
>> into.
>>
>> Why don't you try out the application yourself now that you know how
>> big the database is and so on? If you use Google Reader you can just
>> export your subscriptions as an OPML and import it into VizReader.
>
> I tried it and it looks interesting. What feature I would actually want
> from such a system is a way of extracting and specifying the
> interesting content from the harvested feeds and the links their
> articles point to, e.g. using an XPath expression. Then, either
> publishing it as a per-user feed or sending it as email(s), so I could
> use my usual mail client to read the news.
>
> Cheers,
>
> Tomas
Re: soliciting user stories of picolisp
Hi Henrik,

> Currently vizreader.com contains roughly 350 000 articles with a full
> word index (not partial).
>
> The word index is spread out on "virtual remotes", i.e. they are not
> really on remote machines; it's more a way to split up the physical
> database files on disk (I've written on how that is done on
> picolisp.com). I have no way of knowing how many words are mapped to
> their articles like this, but most of the database is occupied by these
> indexes and it currently occupies some 30 GB all in all.
>
> A search for the word "Google" just took 22 seconds.

If I understand it well, you have all the articles locally on one
machine. I wonder how long a simple grep over the article blobs would
take? 22 seconds seems very long for any serious use. Have you
considered some state-of-the-art full text search engine, e.g. Lucene?

Just curious, how did you create the word index? I implemented a simple
search functionality and word index for LogandCMS, which you can try at
http://demo.cms.logand.com/search.html?s=sheep, and I even keep the
count of every word in each page for ranking purposes, but I haven't had
a chance to run into scaling problems like that.

> No other part of the application is lagging significantly except for
> when listing new articles in my news category, due to the fact that
> there are so many articles in that category. However, the fetching
> method is highly inefficient, as I first fetch all feeds in a category
> and then all their articles, and then take (tail) on them to get e.g.
> the 50 newest. Walking and then only loading the wanted articles into
> memory would of course be the best way, and something I will look into.
>
> Why don't you try out the application yourself now that you know how
> big the database is and so on? If you use Google Reader you can just
> export your subscriptions as an OPML and import it into VizReader.

I tried it and it looks interesting.
What feature I would actually want from such a system is a way of
extracting and specifying the interesting content from the harvested
feeds and the links their articles point to, e.g. using an XPath
expression. Then, either publishing it as a per-user feed or sending it
as email(s), so I could use my usual mail client to read the news.

Cheers,

Tomas
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
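Tomas's word index with per-page counts, as he describes it for LogandCMS, amounts to an inverted index. A minimal Python sketch of the same structure (the real index is PicoLisp; the tokenization here is an assumption):

```python
from collections import defaultdict, Counter
import re

# Sketch only: word -> [(count, page), ...], highest count first.
def build_index(pages):
    index = defaultdict(list)
    for name, text in pages.items():
        for word, n in Counter(re.findall(r"[a-z]+", text.lower())).items():
            index[word].append((n, name))
    for hits in index.values():
        hits.sort(reverse=True)        # rank pages by word count
    return index

pages = {"file1": "sheep sheep dog", "file2": "dog"}
idx = build_index(pages)
print(idx["sheep"])   # -> [(2, 'file1')]
```

This mirrors the `("picolisp" (5 . "file1") (4 . "file2") ...)` shape mentioned later in the thread: each word maps to (count . file) pairs, pre-sorted for ranking.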
Re: soliciting user stories of picolisp
On 19.07.2010 18:46, Alexander Burger wrote:
> On Mon, Jul 19, 2010 at 04:39:08PM +0200, Mateusz Jan Przybylski wrote:
> > ``So this Lisp is a newfangled language, quite like Ruby, right?''
>
> I'm deeply shocked!

I'm not surprised. In 2010, people like wrapping yet another library in
yet another framework, until "the solution(tm)" is about 47 MB (= mega
bloat) big - minimum. RAM and disk are cheap nowadays... Programmers are
admired for more LoC, not for less.

Another point may be the orientation of educational entities towards
certain "industry standards" and the vendors' "academic pricing".

Peter

P.S.: even fewer people have heard of Forth.
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: soliciting user stories of picolisp
On Mon, 19 Jul 2010 18:46:55 +0200, Alexander Burger wrote:
> On Mon, Jul 19, 2010 at 04:39:08PM +0200, Mateusz Jan Przybylski
> wrote:
> > The lecturer never heard of Lisp before; after listening to my
> > explanations he wrapped it up with:
> > ``So this Lisp is a newfangled language, quite like Ruby, right?''
> > Geez...
>
> I'm deeply shocked!

Lisp never gets old
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: soliciting user stories of picolisp
Hi Alex,

thank you for these. very comforting. and thank you for picolisp!

my thanks also to Mateusz and Henrik. guys, keep em coming! i am deeply
enjoying this. i hope the rest are too.

On Tue, Jul 20, 2010 at 12:46 AM, Alexander Burger wrote:
> Hi Edwin,
>
>> if anybody would be so kind to share how they have experienced running
>> picolisp in production. fine, not just stories, but also numbers. how
>
> We have been using PicoLisp in production since 1986, so I could
> perhaps tell a lot if I could remember it all. Concerning numbers, we
> have several customers running it for many years. Our oldest customer
> using the current system has had it running since January 2001 without
> interruption. The database of that customer is not very big, though
> (430 megabytes, 277723 objects).
>
>> big have your databases grown? how fast has the picolisp appserver
>
> The biggest databases were for another project, systems for indexing
> and classifying the filer systems of big customers (I should not tell
> names here). There we had distributed databases (up to 70
> interconnected databases) with nearly one billion objects. The larger
> databases within such a system were around 100-200 GB; more typical
> was around 20-80 GB.
>
>> delivered your queries? did you ever get to see the picolisp database
>
> I have never directly measured that speed; that wasn't an issue, as
> all those apps were not oriented towards especially many clients. In
> this context perhaps the results of the database contest in the German
> c't magazine (http://www.heise.de/kiosk/archiv/ct/2006/13/190) are
> relevant, where PicoLisp took second prize.
>
>> recover from unforeseen system errors like crashes from the operating
>> system and so?
>
> Fortunately, not yet. We tested such situations, however (pulling the
> plug), and normal power outages happened from time to time without any
> data loss so far.
>
>> can you please share your stories? would love to hear them.
> > I'm afraid I'm not a good story-teller, so I hope the above fragments > are useful ;-) > > Cheers, > - Alex > -- > UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe > -- UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: soliciting user stories of picolisp
On Mon, Jul 19, 2010 at 10:39 PM, Mateusz Jan Przybylski wrote:
> However, a (quick'n'dirty) HTML & HTTP application in PicoLisp got me a very
> good grade for `Programming languages & paradigms' course at Uni.
>
> The lecturer never heard of Lisp before; after listening to my explanations he
> wrapped it up with:
>   ``So this Lisp is a newfangled language, quite like Ruby, right?''
> Geez...

i really hope you were kidding.

> --
> Mateusz Jan Przybylski
>
> ``One can't proceed from the informal to the formal by formal means.''

--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: soliciting user stories of picolisp
Hi Edwin,

> if anybody would be so kind to share how they have experienced running
> picolisp in production. fine, not just stories, but also numbers. how

We have been using PicoLisp in production since 1986, so I could perhaps
tell a lot if I could remember it all. Concerning numbers, we have
several customers running it for many years. Our oldest customer using
the current system has had it running since January 2001 without
interruption. The database of that customer is not very big, though (430
megabytes, 277723 objects).

> big have your databases grown? how fast has the picolisp appserver

The biggest databases were for another project, systems for indexing and
classifying the filer systems of big customers (I should not tell names
here). There we had distributed databases (up to 70 interconnected
databases) with nearly one billion objects. The larger databases within
such a system were around 100-200 GB; more typical was around 20-80 GB.

> delivered your queries? did you ever get to see the picolisp database

I have never directly measured that speed; that wasn't an issue, as all
those apps were not oriented towards especially many clients. In this
context perhaps the results of the database contest in the German c't
magazine (http://www.heise.de/kiosk/archiv/ct/2006/13/190) are relevant,
where PicoLisp took second prize.

> recover from unforeseen system errors like crashes from the operating
> system and so?

Fortunately, not yet. We tested such situations, however (pulling the
plug), and normal power outages happened from time to time without any
data loss so far.

> can you please share your stories? would love to hear them.

I'm afraid I'm not a good story-teller, so I hope the above fragments
are useful ;-)

Cheers,
- Alex
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: soliciting user stories of picolisp
On Mon, Jul 19, 2010 at 04:39:08PM +0200, Mateusz Jan Przybylski wrote: > The lecturer never heard of Lisp before; after listening to my explanations > he > wrapped it up with: > ``So this Lisp is a newfangled language, quite like Ruby, right?'' > Geez... I'm deeply shocked! -- UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: soliciting user stories of picolisp
Currently vizreader.com contains roughly 350 000 articles with a full
word index (not partial).

The word index is spread out on "virtual remotes", i.e. they are not
really on remote machines; it's more a way to split up the physical
database files on disk (I've written on how that is done on
picolisp.com). I have no way of knowing how many words are mapped to
their articles like this, but most of the database is occupied by these
indexes and it currently occupies some 30 GB all in all.

A search for the word "Google" just took 22 seconds.

No other part of the application is lagging significantly except for
when listing new articles in my news category, due to the fact that
there are so many articles in that category. However, the fetching
method is highly inefficient, as I first fetch all feeds in a category
and then all their articles, and then take (tail) on them to get e.g.
the 50 newest. Walking and then only loading the wanted articles into
memory would of course be the best way, and something I will look into.

Why don't you try out the application yourself now that you know how big
the database is and so on? If you use Google Reader you can just export
your subscriptions as an OPML and import it into VizReader.

Cheers,
Henrik Sarvell

On Mon, Jul 19, 2010 at 4:39 PM, Mateusz Jan Przybylski wrote:
> On Monday 19 July 2010 16:23:27 you wrote:
>> if anybody would be so kind to share how they have experienced running
>> picolisp in production.
>
> None yet, unfortunately.
>
> However, a (quick'n'dirty) HTML & HTTP application in PicoLisp got me a very
> good grade for `Programming languages & paradigms' course at Uni.
>
> The lecturer never heard of Lisp before; after listening to my explanations he
> wrapped it up with:
>   ``So this Lisp is a newfangled language, quite like Ruby, right?''
> Geez...
> --
> Mateusz Jan Przybylski
>
> ``One can't proceed from the informal to the formal by formal means.''

--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
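Henrik's fetch-everything-then-(tail) pattern has a standard streaming fix: keep only the k newest items while walking, instead of materializing every article first. A Python sketch (timestamped tuples stand in for articles; the real fix would use the PicoLisp database's own index walking):

```python
import heapq

def newest(articles, k=50):
    """Return the k newest articles without keeping them all in memory."""
    # heapq.nlargest streams the iterable, retains only k items at a time,
    # and returns them in descending order (newest first).
    return heapq.nlargest(k, articles)

arts = ((t, "article-%d" % t) for t in range(1000))  # oldest to newest
print(newest(arts, 3))
# -> [(999, 'article-999'), (998, 'article-998'), (997, 'article-997')]
```

Memory use is O(k) rather than proportional to the category size, which is exactly what matters for a news category with very many articles.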
Re: soliciting user stories of picolisp
On Monday 19 July 2010 16:23:27 you wrote: > if anybody would be so kind to share how they have experienced running > picolisp in production. None yet, unfortunately. However, a (quick'n'dirty) HTML & HTTP application in PicoLisp got me a very good grade for `Programming languages & paradigms' course at Uni. The lecturer never heard of Lisp before; after listening to my explanations he wrapped it up with: ``So this Lisp is a newfangled language, quite like Ruby, right?'' Geez... -- Mateusz Jan Przybylski ``One can't proceed from the informal to the formal by formal means.'' -- UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe