[General] Webboard: tagging or categorizing without crawling again
Author: bruno Email: bruno.v...@gmail.com Message: Thank you, Alexander, that was exactly what I was looking for! Kind regards, Bruno Reply: http://www.mnogosearch.org/board/message.php?id=21670 ___ General mailing list General@mnogosearch.org http://lists.mnogosearch.org/listinfo/general
[General] Webboard: tagging or categorizing without crawling again
Author: Alexander Barkov Email: b...@mnogosearch.org Message:

> Actually, the way of using tags or categories is perfect, but I don't want to crawl the whole site again because I didn't write my tagging rule correctly the first time.

This task consists of two parts:

a. Update what you have in the tables server and srvinfo. This is done automatically when you start crawling; indexer -n0 will do this. Note that this is enough when you just need to rename a tag to a new value. Usually, however, it is not enough, as you might want to redistribute documents between tags (i.e. split a single tag into multiple ones, join multiple tags into a single one, or do some more complex redistribution). In these cases part b is also needed.

b. Update the table url so that it refers to the table server properly. There is no special command for this. Normally, documents are updated properly only when they are crawled the next time. But there is a trick: use the Skip option temporarily to avoid real downloading.

Suppose you want to split a section of your site into two subsections and assign different tags to them. What you do is:

1. Change indexer.conf:

    # Remove the old commands
    Tag doc
    Server http://host/doc/

    # And add two new commands instead
    Tag doca
    Server skip http://host/doc/a/
    Tag docb
    Server skip http://host/doc/b/

   Notice the skip option in the new commands.

2. Run:

    indexer -am -u 'http://host/doc/%'

   It will kind of crawl all documents, but without real downloading. It will actually do nothing else but execute a query like this for every document:

    UPDATE url SET status=200, next_index_time=1418965297, site_id=-1519382294, server_id=-1738492707 WHERE rec_id=259;

3. Don't forget to remove the skip options from the new Server commands in indexer.conf afterwards.

4. Check that everything went well:

    SELECT server.tag, url.url FROM url, server WHERE url.server_id=server.rec_id;

Reply: http://www.mnogosearch.org/board/message.php?id=21669
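To illustrate what steps 2 and 4 amount to at the table level, here is a toy model in Python using an in-memory SQLite database. It is an illustration only, not part of mnoGoSearch: the server and url tables are hypothetical, heavily simplified stand-ins with only the columns mentioned in the thread (rec_id, url, tag, server_id), the IDs and URLs are made up, and a plain URL-prefix UPDATE stands in for what indexer does internally. It does not model status, next_index_time or site_id.

```python
import sqlite3


def retag_demo():
    """Toy model of splitting tag "doc" into "doca" and "docb".

    The tables are hypothetical simplifications of mnoGoSearch's
    "server" and "url" tables; all IDs and URLs are made up.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE server (rec_id INTEGER PRIMARY KEY, url TEXT, tag TEXT);
        CREATE TABLE url (rec_id INTEGER PRIMARY KEY, url TEXT, server_id INTEGER);
        """
    )

    # State after the first crawl: everything under /doc/ has tag "doc".
    conn.execute("INSERT INTO server VALUES (1, 'http://host/doc/', 'doc')")
    conn.executemany(
        "INSERT INTO url VALUES (?, ?, 1)",
        [
            (101, "http://host/doc/a/intro.html"),
            (102, "http://host/doc/a/setup.html"),
            (103, "http://host/doc/b/faq.html"),
        ],
    )

    # Part a (roughly what indexer -n0 does): register the new Server/Tag pairs.
    new_servers = [
        (2, "http://host/doc/a/", "doca"),
        (3, "http://host/doc/b/", "docb"),
    ]
    conn.executemany("INSERT INTO server VALUES (?, ?, ?)", new_servers)

    # Part b (stand-in for indexer -am with "skip"): repoint each document
    # at the new server row whose URL prefix it matches. Nothing is downloaded.
    for server_id, prefix, _tag in new_servers:
        conn.execute(
            "UPDATE url SET server_id = ? WHERE url LIKE ? || '%'",
            (server_id, prefix),
        )

    # Step 4: the verification query from the message above.
    rows = conn.execute(
        "SELECT server.tag, url.url FROM url, server "
        "WHERE url.server_id = server.rec_id ORDER BY url.rec_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for tag, doc_url in retag_demo():
        print(tag, doc_url)
```

Running it lists every document with its new tag (doca or docb); the old "doc" row in server is simply no longer referenced. The prefix-based UPDATE is a simplification chosen for brevity; with the skip trick, it is indexer that works out which Server command each document now belongs to.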
[General] Webboard: tagging or categorizing without crawling again
Author: bruno Email: bruno.v...@gmail.com Message: Thanks for your reply. It would be by using document properties. Actually, the way of using tags or categories is perfect, but I don't want to crawl the whole site again because I didn't write my tagging rule correctly the first time. Many thanks! Bruno Reply: http://www.mnogosearch.org/board/message.php?id=21668
[General] Webboard: tagging or categorizing without crawling again
Author: Alexander Barkov Email: b...@mnogosearch.org Message: Hi Bruno,

> Hi Alexander, and big congrats on the amazing tool you've built. I intend to use it as an SEO tool, but I ran into an issue: I would like to tag or categorize the URLs after having already fetched the content, but I can't figure out how to do it. We sometimes miss the correct structure, and it's really a pain to have to crawl the whole site again to rebuild the categorization when the URLs are already in the database. Many thanks for your help! Kind regards, Bruno

How would you like to tag? Manually? Or in some automated way, using document properties (e.g. document words, URL, etc.)?

Reply: http://www.mnogosearch.org/board/message.php?id=21667
[General] Webboard: tagging or categorizing without crawling again
Author: bruno Email: bruno.v...@gmail.com Message: Hi Alexander, and big congrats on the amazing tool you've built. I intend to use it as an SEO tool, but I ran into an issue: I would like to tag or categorize the URLs after having already fetched the content, but I can't figure out how to do it. We sometimes miss the correct structure, and it's really a pain to have to crawl the whole site again to rebuild the categorization when the URLs are already in the database. Many thanks for your help! Kind regards, Bruno Reply: http://www.mnogosearch.org/board/message.php?id=21666