Re: soliciting user stories of picolisp

2010-07-21 Thread Tomas Hlavaty
Hi Alex,


 if I understand it well, you have all the articles locally on one
 machine.  I wonder how long a simple grep over the article blobs would
 take?  22 seconds seems very long for any serious use.  Have you

 Such numbers are very variable, and difficult to predict.

I'm not sure what you mean.  How long does a simple grep over the
article blob files take?  That should serve as a rough indicator about
worst case behaviour.
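
To make that comparison concrete, something like the following could be
timed from a PicoLisp REPL (the "blob/" directory name is only a
placeholder, and 'bench' is the little timing helper from "@lib/misc.l"):

   (load "@lib/misc.l")                          # for 'bench'
   (bench (call "grep" "-rl" "Google" "blob/"))

grep prints the matching file names; the time that 'bench' reports would be
the rough worst-case figure I mean.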

 For example, in the system mentioned in my previous mail, with
 informations about millions of files distributed across several hosts,
 searching for a given combination of e.g. file name pattern and
 meta-informations like access times, sizes or md5 keys might take a
 few seconds at the first access, but subsequent accesses
 (i.e. continuing the search by scrolling down the list) showed almost
 no delay at all.

Hmm, I know too little about the actual system you talk about, so it's
hard to form an educated opinion on this ;-)

Cheers,

Tomas


Re: soliciting user stories of picolisp

2010-07-21 Thread Alexander Burger
Hi Tomas,

  Such numbers are very variable, and difficult to predict.
 
 I'm not sure what you mean.  How long does a simple grep over the
 article blob files take?  That should serve as a rough indicator about
 worst case behaviour.

I'm not talking about the timings of 'grep', but of the database.

'grep' is also subject to cache effects, but not as much as the picoLisp
database, where each process caches all objects once they have been
accessed. The whole query context is also cached, and related searches
continue in the same context.

The timings are also difficult to predict because they depend very much
on the distribution of keys within the indexes, and which keys are
queried from each index in which combination. For example, if you ask
for a key combination that contains one or several keys that occur
_seldom_ in the db, the matching results are found almost immediately.
On the opposite end, searching for a combination of _common_ keys may
take relatively long to find the exact hits.
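
A rough illustration of the effect (invented class and key names, nothing
from that system): when one of the keys is rare, it pays to fetch the
candidates through that key's index first and only post-filter the common
attribute on the few objects found, e.g.

   (filter
      '((Obj) (= "common value" (; Obj commonKey)))   # cheap test on few objects
      (collect 'rareKey '+SomeClass "rare value") )   # small result set via index

With a combination of only common keys there is no such small candidate set
to start from, so many more objects have to be visited before the exact
hits are known.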

Cheers,
- Alex


Re: soliciting user stories of picolisp

2010-07-21 Thread Tomas Hlavaty
Hi Henrik,

 1.) This is what each remote looks like by way of E/R:

 (class +WordCount +Entity)
 (rel article   (+Ref +Number))
 (rel word  (+Aux +Ref +Number) (article))
 (rel count (+Number))
 (rel picoStamp (+Ref +Number))

 (dbs
   (4 +WordCount)
   (3 (+WordCount word article picoStamp)))


I can't see how this works.  The search index I implemented was like
this:

   (picolisp (5 . file1) (4 . file2) ...)
   (google (3 . file1) (2 . file3) ...)
   ...
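
Just to make that concrete, a minimal sketch of how such an association
list could be maintained (the names index-word and *WordIdx are made up;
this is not the actual LogandCMS code):

   (off *WordIdx)

   (de index-word (Word File)
      (let E (assoc Word *WordIdx)
         (unless E
            (push '*WordIdx (setq E (cons Word))) )
         (let C (find '((X) (= File (cdr X))) (cdr E))
            (if C
               (inc C)  # word seen again in this file: bump its count
               (con E (cons (cons 1 File) (cdr E))) ) ) ) )  # first occurrence

Calling (index-word 'picolisp 'file1) once per occurrence builds entries of
the shape shown above, and (cdr (assoc 'picolisp *WordIdx)) returns the
(count . file) pairs for ranking.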

In your schema, I don't see how words are represented.

 The bottleneck lies somewhere else than the actual lookup,

So what is the problem then? ;-)

 search since it returns the maximum 50 where picolisp only returns 8.

Those are very long times considering there are so few results.

 So the bottleneck is not the search itself but rather badly optimized
 code that goes to work on the results later.

Hard to say from what I know.

 a way of extracting and specifying the interesting content from the
 harvested feeds and links their articles point to

 Well, the links you should be able to see in a per feed/category link map
 (I noticed it was broken; hopefully it will work from now on). As for
 specifying content through an XPath, what is it that you hope to gain by
 that? Give me a specific example please.

Most feeds don't contain the actual text I'm interested in but only a
link.  That means I have to click around too much.  For example, the BBC
News http://www.bbc.co.uk/news/ feed
http://feeds.bbci.co.uk/news/rss.xml gets me only a short line and a link.
I would like to see the linked content directly without clicking, and also
I don't want to see the whole page with all that redundant junk but only
the text of the article.  That text is inside of div class=story-body, so
I could specify the xpath
/html/body/div[2]/div[2]/div[2]/div/div[2]/div[2]/div[2]/div and the
feed reader would automatically display just the portion of the page I am
interested in.
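
Just to sketch the idea (this is not a proposal for the VizReader
internals): assuming the page is parsed into the usual PicoLisp list shape
(tag (attribute pairs) children ..) and that attribute keys are plain
symbols, a small recursive search could pick out the story-body node by its
class attribute instead of the positional path above, and the reader would
render only that subtree:

   (de find-class (Nd Cls)  # depth-first search for a node with class Cls
      (when (pair Nd)
         (if (= Cls (cdr (assoc 'class (cadr Nd))))
            Nd
            (pick '((C) (find-class C Cls)) (cddr Nd)) ) ) )

   # e.g. (find-class Page "story-body")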

 The main imperative for me to create the reader is the fact that the
 Google Reader's GUI is horrible IMO and I'm happy with that part of
 VizReader. That and I thought it would be an easy thing to start out
 with in PL, but there is more to a feed reader than meets the
 eye... If I had thought about making the application distributed right
 from the start I would've been even happier.

Sure, you have a different motivation and way of reading news which
doesn't match mine.  That's why I also suggested exporting a
personal feed of collected feeds or sending that stuff by email.

 In the beginning I also had an algorithm that compared articles for
 automatic recommendations of similar content, that worked for a short

That could be interesting but not something crucial I would need.

Cheers,

Tomas


Re: soliciting user stories of picolisp

2010-07-21 Thread Henrik Sarvell
Tomas, you have to read http://picolisp.com/5000/-2-I.html if you want to
understand how it works completely.

And the problem is of course that it's slow (regardless of where or what)
and I don't really have the time to fix it :-)


On Wed, Jul 21, 2010 at 9:40 AM, Alexander Burger a...@software-lab.de wrote:

 Hi Tomas,

   Such numbers are very variable, and difficult to predict.
 
  I'm not sure what you mean.  How long does a simple grep over the
  article blob files take?  That should serve as a rough indicator about
  worst case behaviour.

 I'm not talking about the timings of 'grep', but of the database.

 'grep' is also subject to cache effects, but not as much as the picoLisp
 database, where each process caches all objects once they have been
 accessed. The whole query context is also cached, and related searches
 continue in the same context.

 The timings are also difficult to predict because they depend very much
 on the distribution of keys within the indexes, and which keys are
 queried from each index in which combination. For example, if you ask
 for a key combination that contains one or several keys that occur
 _seldom_ in the db, the matching results are found almost immediately.
 On the opposite end, searching for a combination of _common_ keys may
 require relatively long to find the exact hits.

 Cheers,
 - Alex



Re: soliciting user stories of picolisp

2010-07-20 Thread Tomas Hlavaty
Hi Henrik,

 Currently vizreader.com contains roughly 350 000 articles with a full
 word index (not partial).

 The word index is spread out on virtual remotes ie they are not
 really on remote machines, it's more a way to split up the physical
 database files on disk (I've written on how that is done on
 picolisp.com). I have no way of knowing how many words are mapped to
 their articles like this but most of the database is occupied by these
 indexes and it currently occupies some 30GB all in all.

 A search for the word Google just took 22 seconds.

if I understand it well, you have all the articles locally on one
machine.  I wonder how long a simple grep over the article blobs would
take?  22 seconds seems very long for any serious use.  Have you
considered some state-of-the-art full text search engine, e.g. Lucene?

Just curious, how did you create the word index?  I implemented a simple
search functionality and word index for LogandCMS which you can try as
http://demo.cms.logand.com/search.html?s=sheep and I even keep the count
of every word in each page for ranking purposes but I haven't had a
chance to run into scaling problems like that.

 No other part of the application is lagging significantly except for
 when listing new articles in my news category due to the fact that
 there are so many articles in that category. However the fetching
 method is highly inefficient as I first fetch all feeds in a category
 and then all their articles and then take (tail) on them to get the 50
 newest for instance. Walking and then only loading the wanted articles
 to memory would of course be the best way and something I will look
 into.

 Why don't you try out the application yourself now that you know how
 big the database is and so on, if you use Google Reader you can just
 export your subscriptions as an OPML and import it into VizReader.

I tried it and it looks interesting.  The feature I would actually want
from such a system is a way of extracting and specifying the interesting
content from the harvested feeds and links their articles point to,
e.g. using an xpath expression.  Then, either publishing it as a per-user
feed or sending that as email(s) so I could use my usual mail client to
read the news.

Cheers,

Tomas


Re: soliciting user stories of picolisp

2010-07-20 Thread Henrik Sarvell
Hi Tomas.

1.) This is what each remote looks like by way of E/R:

(class +WordCount +Entity)
(rel article   (+Ref +Number))
(rel word  (+Aux +Ref +Number) (article))
(rel count (+Number))
(rel picoStamp (+Ref +Number))

(dbs
  (4 +WordCount)
  (3 (+WordCount word article picoStamp)))
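
For what it's worth, here is roughly how a lookup against that schema might
look (this is a sketch, not the actual VizReader code, and it assumes the
mapping from words to numbers is done elsewhere): fetch all +WordCount
objects for one word number through the 'word' index and rank them by their
stored count,

(flip                                       # highest count first
   (by '((W) (; W count)) sort
      (collect 'word '+WordCount 4711) ) )  # all entries for word number 4711

and then read the article numbers off each hit with (; W article).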

The bottleneck lies somewhere else than the actual lookup; here are some
results I just got while probably being the only user of the application:

picolisp = 1.97 s
google = 7.22 s
obama = 1.64 s (cached from prior search in RAM maybe?)
afghanistan = 7.2 s

Note the difference between google and picolisp: the search is performed
in exactly the same way, the only difference being that the system needs to
do post-processing work after the results have been fetched, and that is
more work with the google search since it returns the maximum 50 where
picolisp only returns 8. So the bottleneck is not the search itself but
rather badly optimized code that goes to work on the results later.

a way of extracting and specifying the interesting content from the
 harvested feeds and links their articles point to


Well, the links you should be able to see in a per feed/category link map
(I noticed it was broken; hopefully it will work from now on). As for
specifying content through an XPath, what is it that you hope to gain by
that? Give me a specific example please.

The main imperative for me to create the reader is the fact that the Google
Reader's GUI is horrible IMO and I'm happy with that part of VizReader. That
and I thought it would be an easy thing to start out with in PL, but there
is more to a feed reader than meets the eye... If I had thought about making
the application distributed right from the start I would've been even
happier.

In the beginning I also had an algorithm that compared articles for
automatic recommendations of similar content; that worked for a short time.
If I were to apply it currently, it would take roughly one year to compare
all articles with each other. At one point I only let it compare a random
subset but that resulted in (predictably) random quality too :-)
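
For scale: 350 000 articles give about 350 000 * 349 999 / 2, i.e. roughly
6.1e10 unique pairs, and a year is about 3.15e7 seconds, so the one-year
estimate corresponds to on the order of 2 000 pairwise comparisons per
second.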

Also, a lot of the finesse of the application is lost if you're not a
Twitter user. The majority of the time I spend in it is simply checking my
flow from time to time where most of the flow consists of Twitter posts
since few normal feeds have implemented the pubsub protocol yet.

Cheers,
Henrik Sarvell


On Tue, Jul 20, 2010 at 7:45 PM, Tomas Hlavaty t...@logand.com wrote:
 Hi Henrik,

 Currently vizreader.com contains roughly 350 000 articles with a full
 word index (not partial).

 The word index is spread out on virtual remotes ie they are not
 really on remote machines, it's more a way to split up the physical
 database files on disk (I've written on how that is done on
 picolisp.com). I have no way of knowing how many words are mapped to
 their articles like this but most of the database is occupied by these
 indexes and it currently occupies some 30GB all in all.

 A search for the word Google just took 22 seconds.

 if I understand it well, you have all the articles locally on one
 machine.  I wonder how long a simple grep over the article blobs would
 take?  22 seconds seems very long for any serious use.  Have you
 considered some state-of-the-art full text search engine, e.g. Lucene?

 Just curious, how did you create the word index?  I implemented a simple
 search functionality and word index for LogandCMS which you can try as
 http://demo.cms.logand.com/search.html?s=sheep and I even keep the count
 of every word in each page for ranking purposes but I haven't had a
 chance to run into scaling problems like that.

 No other part of the application is lagging significantly except for
 when listing new articles in my news category due to the fact that
 there are so many articles in that category. However the fetching
 method is highly inefficient as I first fetch all feeds in a category
 and then all their articles and then take (tail) on them to get the 50
 newest for instance. Walking and then only loading the wanted articles
 to memory would of course be the best way and something I will look
 into.

 Why don't you try out the application yourself now that you know how
 big the database is and so on, if you use Google Reader you can just
 export your subscriptions as an OPML and import it into VizReader.

 I tried it and it looks interesting.  What feature I would actually want
 from such a system is a way of extracting and specifying the interesting
 content from the harvested feeds and links their articles point to,
 e.g. using an xpath expression.  Then, either publishing it as per user
 feed or sending that as email(s) so I could use my usual mail client to
 read the news.

 Cheers,

 Tomas



Re: soliciting user stories of picolisp

2010-07-19 Thread Henrik Sarvell
Currently vizreader.com contains roughly 350 000 articles with a full
word index (not partial).

The word index is spread out on virtual remotes ie they are not
really on remote machines, it's more a way to split up the physical
database files on disk (I've written on how that is done on
picolisp.com). I have no way of knowing how many words are mapped to
their articles like this but most of the database is occupied by these
indexes and it currently occupies some 30GB all in all.

A search for the word Google just took 22 seconds.

No other part of the application is lagging significantly except for
when listing new articles in my news category due to the fact that
there are so many articles in that category. However the fetching
method is highly inefficient as I first fetch all feeds in a category
and then all their articles and then take (tail) on them to get the 50
newest for instance. Walking and then only loading the wanted articles
to memory would of course be the best way and something I will look
into.
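
One sketch of that direction (class and relation names invented here, this
is not VizReader code): if articles carried a (+Ref +Date) index, the fetch
could first be restricted to a recent window through that index and only
then sorted and cut down, instead of loading every article of every feed in
the category:

   (head 50                                   # keep only the newest 50
      (flip
         (by '((A) (; A date)) sort
            (collect 'date '+Article (- (date) 7) (date)) ) ) )  # last week only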

Why don't you try out the application yourself now that you know how
big the database is and so on, if you use Google Reader you can just
export your subscriptions as an OPML and import it into VizReader.

Cheers,
Henrik Sarvell



On Mon, Jul 19, 2010 at 4:39 PM, Mateusz Jan Przybylski
dexen.devr...@gmail.com wrote:
 On Monday 19 July 2010 16:23:27 you wrote:
 if anybody would be so kind to share how they have experienced running
 picolisp in production.

 None yet, unfortunately.

 However, a (quick'n'dirty) HTML & HTTP application in PicoLisp got me a very
 good grade for `Programming languages & paradigms' course at Uni.

 The lecturer never heard of Lisp before; after listening to my explanations
 he wrapped it up with:
  ``So this Lisp is a newfangled language, quite like Ruby, right?''
 Geez...


 --
 Mateusz Jan Przybylski


 ``One can't proceed from the informal to the formal by formal means.''



Re: soliciting user stories of picolisp

2010-07-19 Thread Alexander Burger
Hi Edwin,

 if anybody would be so kind to share how they have experienced running
 picolisp in production. fine, not just stories, but also numbers. how

We have been using PicoLisp in production since 1986, so I could perhaps
tell a lot if I could remember it all. Concerning numbers, we have several
customers whose systems have been running for many years. Our oldest
customer on the current system has had it running since January 2001
without interruption. The database of that customer is not very big,
though (430 Megabytes, 277723 objects).


 big have your databases grown? how fast has the picolisp appserver

The biggest databases we had were for another project, for systems indexing
and classifying the filer systems of big customers (I should not tell names
here). There we had distributed databases (up to 70 interconnected
databases) with nearly one billion objects. The larger databases within
such a system were around 100-200 GB; more typical was around 20-80 GB.


 delivered your queries? did you ever get to see the picolisp database

I have never directly measured that speed; it wasn't an issue, as all
those apps were not aimed at especially many clients. In this
context perhaps the results of the database contest in the German c't
magazine (http://www.heise.de/kiosk/archiv/ct/2006/13/190) are relevant,
where PicoLisp won the second prize.


 recover from unforeseen system errors like crashes from the operating
 system and so?

Fortunately, not yet. We tested such situations, however (pulling the
plug), and normal power outages happened from time to time without any
data loss so far.


 can you please share your stories? would love to hear them.

I'm afraid I'm not a good story-teller, so I hope the above fragments
are useful ;-)

Cheers,
- Alex


Re: soliciting user stories of picolisp

2010-07-19 Thread Edwin Eyan Moragas
On Mon, Jul 19, 2010 at 10:39 PM, Mateusz Jan Przybylski
dexen.devr...@gmail.com wrote:
 However, a (quick'n'dirty) HTML & HTTP application in PicoLisp got me a very
 good grade for `Programming languages & paradigms' course at Uni.

 The lecturer never heard of Lisp before; after listening to my explanations
 he wrapped it up with:
  ``So this Lisp is a newfangled language, quite like Ruby, right?''
 Geez...

I really hope you were kidding.



 --
 Mateusz Jan Przybylski


 ``One can't proceed from the informal to the formal by formal means.''



Re: soliciting user stories of picolisp

2010-07-19 Thread José Romero
On Mon, 19 Jul 2010 18:46:55 +0200
Alexander Burger a...@software-lab.de wrote:
 On Mon, Jul 19, 2010 at 04:39:08PM +0200, Mateusz Jan Przybylski
 wrote:
  The lecturer never heard of Lisp before; after listening to my
  explanations he wrapped it up with:
   ``So this Lisp is a newfangled language, quite like Ruby, right?''
  Geez...

 I'm deeply shocked!

Lisp Never gets old


Re: soliciting user stories of picolisp

2010-07-19 Thread Peter Fischer

On 19.07.2010 18:46, Alexander Burger wrote:

On Mon, Jul 19, 2010 at 04:39:08PM +0200, Mateusz Jan Przybylski wrote:
  ``So this Lisp is a newfangled language, quite like Ruby, right?''
   
I'm deeply shocked!
   
I'm not surprised. In 2010, people like wrapping yet another library in
yet another framework, until the solution(tm) is about 47 MB (= mega
bloat) big - minimum. RAM and disk are cheap nowadays... Programmers are
admired for more LoC, not for less.


Another point may be the orientation of educational institutions towards
certain industry standards and the vendors' academic pricing.


Peter

P.S.: even fewer people have heard of Forth.
