[General] Webboard: tagging or categorizing without crawling again

2014-12-12 Thread bar
Author: bruno
Email: bruno.v...@gmail.com
Message:
Thank you, Alexander, that was exactly what I was looking for!
Kind regards,
Bruno

Reply: http://www.mnogosearch.org/board/message.php?id=21670

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general



2014-12-11 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> Actually, using tags or categories is perfect, but I don't want
> to crawl the whole site again because I didn't write my tagging rule
> correctly the first time.

This task consists of two parts:

a. Update what you have in the tables "server" and "srvinfo".
This is done automatically when you start crawling;
indexer -n0 will do this. Note that this alone is enough when you just need
to rename a tag to a new value.

Usually, however, this is not enough,
because you may want to redistribute documents between tags
(e.g. split a single tag into several, join several tags
into one, or do some more complex redistribution).
In these cases part b is also needed.


b. Update the table "url" so that its rows refer to the right rows in the table "server".
There is no special command for this. Normally, documents are
updated only when they are next crawled.
But there is a trick: temporarily use the skip option
to avoid actually downloading anything.


Suppose you want to split a section of your site
into two subsections and assign a different tag to each.

What you do is:

1. Change indexer.conf:

# Remove the old command
Tag doc
Server http://host/doc/


# And add two new commands instead
Tag doca
Server skip http://host/doc/a/

Tag docb
Server skip http://host/doc/b/


Notice the skip option in the new commands.


2. Run indexer -am -u 'http://host/doc/%'

This will, in a sense, crawl all matching documents, but without actually
downloading them. All it really does is execute a query like this
for every document:

UPDATE url SET status=200,next_index_time=1418965297, 
site_id=-1519382294,server_id=-1738492707 WHERE rec_id=259;
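
To make the effect of this step concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is only a toy model: the two tables are heavily simplified stand-ins for the real "server" and "url" tables, the rows, the rec_id values and the helper repoint_urls are made up for illustration, and plain prefix matching is an assumption rather than a description of how indexer actually picks a server.

```python
import sqlite3

# Toy stand-ins for the "server" and "url" tables. The real mnoGoSearch
# schema has more columns; the rows and ids below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE server (rec_id INTEGER PRIMARY KEY, url TEXT, tag TEXT);
    CREATE TABLE url (rec_id INTEGER PRIMARY KEY, url TEXT, server_id INTEGER);
    INSERT INTO server VALUES (2, 'http://host/doc/a/', 'doca');
    INSERT INTO server VALUES (3, 'http://host/doc/b/', 'docb');
    -- Both documents still point at the old, removed server row (id 1).
    INSERT INTO url VALUES (259, 'http://host/doc/a/intro.html', 1);
    INSERT INTO url VALUES (260, 'http://host/doc/b/setup.html', 1);
""")


def repoint_urls(conn):
    """Point every url row at the server row whose URL is a prefix of it.

    Hypothetical helper: plain prefix matching is an assumption made for
    this illustration, not a description of indexer's matching rules.
    """
    servers = conn.execute(
        "SELECT rec_id, url FROM server ORDER BY rec_id"
    ).fetchall()
    docs = conn.execute("SELECT rec_id, url FROM url").fetchall()
    for doc_id, doc_url in docs:
        for server_id, prefix in servers:
            if doc_url.startswith(prefix):
                # Only the reference is rewritten; nothing is downloaded.
                conn.execute(
                    "UPDATE url SET server_id = ? WHERE rec_id = ?",
                    (server_id, doc_id),
                )
                break


repoint_urls(conn)
rows = conn.execute(
    "SELECT rec_id, server_id FROM url ORDER BY rec_id"
).fetchall()
print(rows)
```

In this toy data each document ends up pointing at the server row for its own subsection, which is the same kind of change the UPDATE above makes to server_id.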


3. Do not forget to remove the skip option
from the new Server commands in indexer.conf afterwards.

4. Check that everything went well:
SELECT server.tag,url.url FROM url,server WHERE url.server_id=server.rec_id;
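
On a large site the row-by-row listing can be long, so a per-tag count may be easier to scan. The following is a minimal sketch in Python with an in-memory SQLite database. The tables are simplified stand-ins for the real "server" and "url" tables, and the sample rows and the helper docs_per_tag are hypothetical. The join condition is the same one used in the check query above.

```python
import sqlite3


def docs_per_tag(conn):
    """Return (tag, document count) pairs, ordered by tag."""
    return conn.execute(
        "SELECT server.tag, COUNT(*) FROM url, server "
        "WHERE url.server_id = server.rec_id "
        "GROUP BY server.tag ORDER BY server.tag"
    ).fetchall()


# Toy data only: simplified columns and made-up rows.
check_conn = sqlite3.connect(":memory:")
check_conn.executescript("""
    CREATE TABLE server (rec_id INTEGER PRIMARY KEY, url TEXT, tag TEXT);
    CREATE TABLE url (rec_id INTEGER PRIMARY KEY, url TEXT, server_id INTEGER);
    INSERT INTO server VALUES (2, 'http://host/doc/a/', 'doca');
    INSERT INTO server VALUES (3, 'http://host/doc/b/', 'docb');
    INSERT INTO url VALUES (259, 'http://host/doc/a/intro.html', 2);
    INSERT INTO url VALUES (260, 'http://host/doc/a/usage.html', 2);
    INSERT INTO url VALUES (261, 'http://host/doc/b/setup.html', 3);
""")

counts = docs_per_tag(check_conn)
print(counts)
```

If the old tag still shows up in such a count, or a new tag is missing, some documents were probably not re-pointed.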




Reply: http://www.mnogosearch.org/board/message.php?id=21669




2014-12-08 Thread bar
Author: bruno
Email: bruno.v...@gmail.com
Message:
Thanks for your reply,

It would be by using document properties.
Actually, using tags or categories is perfect, but I don't want
to crawl the whole site again because I didn't write my tagging rule
correctly the first time.

Many thanks!
Bruno

Reply: http://www.mnogosearch.org/board/message.php?id=21668




2014-12-06 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi Bruno,

> Hi Alexander, and big congrats on the amazing tool you've built.
> I intend to use it as an SEO tool, but I ran into an issue: I would like to
> tag or categorize the URLs after having already fetched the content, but I
> can't figure out how to do it.
> We sometimes get the structure wrong, and it's really a pain to have to
> crawl the whole site again to rebuild the categorization, as the URLs are
> already in the database.
>
> Many thanks for your help!
> Kind regards,
> Bruno

How would you like to tag? Manually? Or in some automated way,
using document properties (e.g. document words, URL, etc)?


Reply: http://www.mnogosearch.org/board/message.php?id=21667




2014-12-05 Thread bar
Author: bruno
Email: bruno.v...@gmail.com
Message:
Hi Alexander, and big congrats on the amazing tool you've built.
I intend to use it as an SEO tool, but I ran into an issue: I would like to
tag or categorize the URLs after having already fetched the content, but I
can't figure out how to do it.
We sometimes get the structure wrong, and it's really a pain to have to
crawl the whole site again to rebuild the categorization, as the URLs are
already in the database.

Many thanks for your help!
kind regards,
Bruno

Reply: http://www.mnogosearch.org/board/message.php?id=21666
