[General] Webboard: index only new pages

2017-04-12
Author: Alexander Barkov
Message:
> Unfortunately it did not work, but I found a working method.
> 
> I added 'Period 30y' before my 'Server' command in the config file and ran:
>  indexer --drop
>  indexer --create
>  indexer -a
> It ran forever. I killed it (ctrl-C) and it reported crawling over 
> 50 pages - there are about 16000 pages on the site.

It seems that 30y causes an integer overflow.
It should work with "Period 1y".
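
One possible explanation, not confirmed against the mnoGoSearch source: if the period is converted to seconds and added to the current Unix time in a signed 32-bit integer, then 30y would overflow while 1y would not. The sketch below only checks that arithmetic. The 365-day year, the 32-bit limit, and the names `check_period`, `NOW`, and `YEAR` are assumptions for illustration.

```shell
#!/bin/sh
# Assumption: the next-index time is computed as now + period, in seconds,
# and stored in a signed 32-bit integer. Not confirmed for mnoGoSearch.
INT32_MAX=2147483647
YEAR=$((365 * 24 * 3600))   # 31536000 seconds, assuming a 365-day year
NOW=1491955200              # 2017-04-12 00:00:00 UTC

# Print whether NOW plus the given number of years fits in 32 bits.
check_period() {
    next=$((NOW + $1 * YEAR))
    if [ "$next" -gt "$INT32_MAX" ]; then
        echo "$1y: $next overflows"
    else
        echo "$1y: $next fits"
    fi
}

check_period 30
check_period 1
```

Under these assumptions, any period longer than roughly 20 years from this date would exceed the limit.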

> 
> I removed the 'Period' command and reindexed the site. I then 
> added a new directory with the newest pages and did:
> 
>  indexer -ai -u 'https://domain.com/msgs/v117n014/%.html'
>  indexer --index

The above command will insert 'https://domain.com/msgs/v117n014/%.html' itself
into the database as a URL. This is probably not what you want.


It should be:

indexer -ai -u 'https://domain.com/msgs/v117n014/'
indexer --index
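
As a rough sketch, these two commands could be wrapped in a small script that is run each week with the name of the new directory. The function name `index_new_dir` and the `DRY_RUN` switch are hypothetical. Only the `indexer` invocations and the base URL come from this thread.

```shell
#!/bin/sh
# Base URL from the Server directive quoted in this thread.
BASE_URL='https://domain.com/msgs/'

# Hypothetical helper: insert the new directory's URL, then index.
# With DRY_RUN=1 (the default here) it only prints the commands.
index_new_dir() {
    url="${BASE_URL}$1/"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "indexer -ai -u '$url'"
        echo "indexer --index"
    else
        indexer -ai -u "$url" && indexer --index
    fi
}

index_new_dir v117n014
```

Running it with DRY_RUN=0 would execute the commands instead of printing them.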


> 
> This processed only the new pages and correctly added them to the 
> index.
> 
> Thanks,
> Jeff


___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: index only new pages

2017-04-12
Author: Jeff Dwork
Email: jeffdwor...@gmail.com
Message:
Unfortunately it did not work, but I found a working method.

I added 'Period 30y' before my 'Server' command in the config file and ran:
 indexer --drop
 indexer --create
 indexer -a
It ran forever. I killed it (ctrl-C) and it reported crawling over 
50 pages - there are about 16000 pages on the site.

I removed the 'Period' command and reindexed the site. I then 
added a new directory with the newest pages and did:

 indexer -ai -u 'https://domain.com/msgs/v117n014/%.html'
 indexer --index

This processed only the new pages and correctly added them to the 
index.

Thanks,
Jeff



[General] Webboard: index only new pages

2017-04-11
Author: Alexander Barkov
Message:
> I'm indexing a mailing list archive. Pages never change. Every 
> week a few pages are added in a new directory. The archive is on 
> the same machine as the index, so my server directive is
> 
> Server https://domain.com/msgs/ file:///var/www/domain/msgs/
> 
> I ran a full index (indexer --drop; indexer --create; indexer -a) 
> after creating the archive. The next week I add new messages in a 
> new directory (for example: /var/www/domain/msgs/v117n013/). I 
> cannot get the new pages indexed. I tried 'indexer' with no 
> options and several variations on
>  indexer -a -u '%/v117n013/%'
> all report 0 documents indexed.
> So I have to run another full index.
> 
> How can I get only the new pages indexed?

You need to re-crawl the index page:

indexer -am -u https://domain.com/msgs/

Then you can run it like this:

indexer -u '%/v117n013/%'


Btw, don't forget to set Period to some huge value.


> 
> Thanks,
> Jeff



[General] Webboard: index only new pages

2017-04-10
Author: Jeff Dwork
Email: jeffdwor...@gmail.com
Message:
I'm indexing a mailing list archive. Pages never change. Every 
week a few pages are added in a new directory. The archive is on 
the same machine as the index, so my server directive is

Server https://domain.com/msgs/ file:///var/www/domain/msgs/

I ran a full index (indexer --drop; indexer --create; indexer -a) 
after creating the archive. The next week I add new messages in a 
new directory (for example: /var/www/domain/msgs/v117n013/). I 
cannot get the new pages indexed. I tried 'indexer' with no 
options and several variations on
 indexer -a -u '%/v117n013/%'
all report 0 documents indexed.
So I have to run another full index.

How can I get only the new pages indexed?

Thanks,
Jeff
