RE: CrawlerCommons & ManifoldCF

2011-06-02 Thread Fuad Efendi
I mean the "join" button at http://code.google.com/p/crawler-commons/ I am quite familiar with BIXO and Droids; it will be hard to make minor changes in ManifoldCF... although it's possible (without the "crawler" part, only the "robots rules parser")... -Fuad -Original Message- From: Fuad Efendi [mail

RE: CrawlerCommons & ManifoldCF

2011-06-02 Thread Fuad Efendi
I'd like to join this project but can't find the "join" button :) Thanks! Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http://www.tokenizer.ca/ Data Mining, Vertical Search -Original Message- From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com] Sent: J

Re: CrawlerCommons & ManifoldCF

2011-06-02 Thread Karl Wright
I don't think it would be hard to peel out the robots parser, although obviously it would need refactoring to live in a more standard library environment. If you want to look at it, it is in: https://svn.apache.org/repos/asf/incubator/lcf/trunk/connectors/webcrawler/connector/src/main/java/org/ap

[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042861#comment-13042861 ] Karl Wright commented on CONNECTORS-110: r1130644 implements this for HSQLDB.

Re: CrawlerCommons & ManifoldCF

2011-06-02 Thread Julien Nioche
Hi Karl, Maybe a good start would be to identify which parts of your crawler could be shared and would not take too much effort to make generic. I haven't looked at the crawler code in great detail, but do you think the robots parser would be a good candidate? Julien On 2 June 2011 16:

Re: CrawlerCommons & ManifoldCF

2011-06-02 Thread Karl Wright
Absolutely! We're a bit thin on active committers at the moment, which will probably limit our ability to take any highly active roles in your development process. But we do have a pile of code which you might be able to leverage, and once there is common functionality available I think we'd all p

CrawlerCommons & ManifoldCF

2011-06-02 Thread Julien Nioche
Hi guys, I'd just like to mention Crawler Commons, which is an effort among the committers of various crawl-related projects (Nutch, Bixo or Heritrix) to share some basic functionality. We currently have mostly a top-level domain finder and a sitemap parser, but are definitely planning t

[jira] [Resolved] (CONNECTORS-205) Database DISTINCT ON abstraction needs to include ordering information in order to work for HSQLDB

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-205. Resolution: Fixed Fix Version/s: ManifoldCF 0.3 Assignee: Karl Wright r

[jira] [Created] (CONNECTORS-205) Database DISTINCT ON abstraction needs to include ordering information in order to work for HSQLDB

2011-06-02 Thread Karl Wright (JIRA)
Database DISTINCT ON abstraction needs to include ordering information in order to work for HSQLDB -- Key: CONNECTORS-205 URL: https://issues.apache.org/jira/browse/CO

[RESULT][VOTE] Adopt Java 1.5 as the minimum Java release for ManifoldCF

2011-06-02 Thread Karl Wright
Although it hasn't quite been the required 3 days, this vote isn't binding anyway, so I'm going to declare it closed and commit the code. Karl On Mon, May 30, 2011 at 7:32 PM, Karl Wright wrote: > Please have a look at CONNECTORS-203 and vote +1 if you think it's > time to move beyond Java 1.4 a

ManifoldCF now officially requires Java 1.5

2011-06-02 Thread Karl Wright
Hi everyone, I've checked in changes that move ManifoldCF from mostly the Java 1.4 world into the Java 1.5 world. This should introduce no compilation errors in user connector code, but most people will need to do a clean recompile to get a working system again. Please let me know ASAP if anyone

[jira] [Commented] (CONNECTORS-203) Consider porting ManifoldCF to Java 1.5 code standards

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042724#comment-13042724 ] Karl Wright commented on CONNECTORS-203: I've merged in all the major interfac

[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042669#comment-13042669 ] Karl Wright commented on CONNECTORS-110: Updated suggestion from Fred pertaini

[jira] [Created] (CONNECTORS-204) Now that HSQLDB functions with ManifoldCF, write a test-hsqldb ant target to test it

2011-06-02 Thread Karl Wright (JIRA)
Now that HSQLDB functions with ManifoldCF, write a test-hsqldb ant target to test it Key: CONNECTORS-204 URL: https://issues.apache.org/jira/browse/CONNECTORS-204 Pr

[jira] [Updated] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-110: --- Summary: Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042655#comment-13042655 ] Karl Wright commented on CONNECTORS-110: HSQLDB is now also in roughly the sam

Re: Incubator PMC/Board report for June 2011 (connectors-dev@incubator.apache.org)

2011-06-02 Thread Karl Wright
I've now edited the page accordingly. Let me know of any changes you'd like to see. Karl On Thu, Jun 2, 2011 at 4:18 AM, Tommaso Teofili wrote: > it sounds good to me, any others? > Tommaso > > 2011/6/1 Karl Wright > >> Here's my proposed text: >> >> ManifoldCF >> >> --Description-- >> >> Manif

Re: Incubator PMC/Board report for June 2011 (connectors-dev@incubator.apache.org)

2011-06-02 Thread Tommaso Teofili
it sounds good to me, any others? Tommaso 2011/6/1 Karl Wright > Here's my proposed text: > > ManifoldCF > > --Description-- > > ManifoldCF is an incremental crawler framework and set of connectors > designed to pull documents from various kinds of repositories into > search engine indexes or ot