Re: Excluding html files and following links

2011-06-20 Thread Karl Wright
Hi Erlend, the inclusions and exclusions are based solely on the URL, and they block the connector from fetching the file. Otherwise you would easily wind up fetching the entire web. However, this raises an interesting issue as to whether there's a way in the web connector to do what you are trying to do
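
To illustrate the behaviour described in this reply, here is a minimal sketch of a crawler whose exclusion filter is applied to the URL before the fetch. It is not the web connector's code: the PAGES table, the crawl function and the example.com URLs are all hypothetical, and the sketch assumes that seed URLs pass through the same filter. Because an excluded page is never fetched, the links inside it are never discovered.

```python
import re

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "http://example.com/index.html": [
        "http://example.com/a.pdf",
        "http://example.com/sub.html",
    ],
    "http://example.com/sub.html": ["http://example.com/b.pdf"],
    "http://example.com/a.pdf": [],
    "http://example.com/b.pdf": [],
}


def crawl(seeds, exclude_patterns):
    """Breadth-first crawl that filters on the URL alone, before fetching."""
    excluded = [re.compile(p) for p in exclude_patterns]
    fetched = []
    queue = list(seeds)
    seen = set(queue)
    while queue:
        url = queue.pop(0)
        if any(p.search(url) for p in excluded):
            # The page is never fetched, so its links are never seen.
            continue
        fetched.append(url)
        for link in PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched
```

With an exclusion pattern such as r"\.html$", the crawl in this sketch stops at the seed page and no PDF is ever reached, which matches a job that ends without finding any documents.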

Excluding html files and following links

2011-06-20 Thread Erlend Garåsen
I just realized that if I exclude HTML files for a job, links in these files will not be followed. Is this desirable behaviour? Should links be followed regardless of the exclude filter? I discovered this issue when I was going to crawl only PDFs and realized that the job ended without fin

Re: MySql DBInterface problem on getTableSchema

2011-06-20 Thread Karl Wright
Rather than change the database contract, which would have far-reaching effects, is there any way to simply implement getTableSchema to work properly with the abstraction? For example, read the result of the DESCRIBE within the getTableSchema method and translate it in whatever manner is needed.
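
As a rough sketch of the translation step suggested here, the function below reads rows shaped like MySQL's DESCRIBE output (columns Field, Type, Null, Key, Default, Extra) and converts them into a column-name-to-description mapping. The function name translate_describe and the shape of the returned dictionaries are assumptions made for illustration; the structure that getTableSchema is actually expected to return is not shown in this thread.

```python
def translate_describe(rows):
    """Turn DESCRIBE-style rows into a {column name: description} mapping.

    Each row is assumed to be a dict keyed by the DESCRIBE column headers.
    The output layout is hypothetical, not the real database contract.
    """
    schema = {}
    for row in rows:
        schema[row["Field"]] = {
            "type": row["Type"],
            "nullable": row["Null"] == "YES",
            "primary_key": row["Key"] == "PRI",
        }
    return schema


# Example rows in the shape MySQL returns for DESCRIBE <table>.
sample_rows = [
    {"Field": "id", "Type": "bigint(20)", "Null": "NO",
     "Key": "PRI", "Default": None, "Extra": ""},
    {"Field": "name", "Type": "varchar(255)", "Null": "YES",
     "Key": "", "Default": None, "Extra": ""},
]
schema = translate_describe(sample_rows)
```

Keeping the translation inside the method means callers see the same abstraction regardless of which database produced the rows.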

MySql DBInterface problem on getTableSchema

2011-06-20 Thread lelelomba...@libero.it
Hi, I'm working on the MySQL implementation of DBInterface, and I have found a problem in the current implementation of getTableSchema. I don't know if DESCRIBE is part of the SQL standard or not, but in MySQL after the execution of the query the result is a "virtual table" populated with the data t