I mean the "join" button at http://code.google.com/p/crawler-commons/
I am quite familiar with Bixo and Droids; it will be hard to make minor
changes in ManifoldCF... although it's possible (without the "crawler" part,
only the "robots rules parser")...
-Fuad
-Original Message-
From: Fuad Efendi [mail
I'd like to join this project but can't find the "join" button :)
Thanks!
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
-Original Message-
From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com]
Sent: J
I don't think it would be hard to peel out the robots parser, although
obviously it would need refactoring to live in a more standard library
environment. If you want to look at it, it is in:
https://svn.apache.org/repos/asf/incubator/lcf/trunk/connectors/webcrawler/connector/src/main/java/org/ap
[
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042861#comment-13042861
]
Karl Wright commented on CONNECTORS-110:
r1130644 implements this for HSQLDB.
Hi Karl,
Maybe a good start would be to identify which parts of your crawler could be
shared and would not take too much effort to make generic. I haven't
looked at the crawler code in great detail, but do you think the
robots parser would be a good candidate?
Julien
On 2 June 2011 16:
Absolutely!
We're a bit thin on active committers at the moment, which will
probably limit our ability to take any highly active roles in your
development process. But we do have a pile of code which you might be
able to leverage, and once there is common functionality available I
think we'd all p
Hi guys,
I'd just like to mention Crawler Commons, which is an effort between the
committers of various crawl-related projects (Nutch, Bixo or Heritrix) to
put some basic functionality in common. We currently have mostly a
top-level domain finder and a sitemap parser, but are definitely planning t
[
https://issues.apache.org/jira/browse/CONNECTORS-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karl Wright resolved CONNECTORS-205.
Resolution: Fixed
Fix Version/s: ManifoldCF 0.3
Assignee: Karl Wright
r
Database DISTINCT ON abstraction needs to include ordering information in order
to work for HSQLDB
--
Key: CONNECTORS-205
URL: https://issues.apache.org/jira/browse/CO
Although it hasn't quite been the required 3 days, this vote isn't
binding anyway, so I'm going to declare it closed and commit the code.
Karl
On Mon, May 30, 2011 at 7:32 PM, Karl Wright wrote:
> Please have a look at CONNECTORS-203 and vote +1 if you think it's
> time to move beyond Java 1.4 a
Hi everyone,
I've checked in changes that move ManifoldCF from mostly the Java 1.4
world into the Java 1.5 world. This should introduce no compilation
errors in user connector code, but most people will need to do a clean
recompile to get a working system again. Please let me know ASAP if
anyone
[
https://issues.apache.org/jira/browse/CONNECTORS-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042724#comment-13042724
]
Karl Wright commented on CONNECTORS-203:
I've merged in all the major interfac
[
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042669#comment-13042669
]
Karl Wright commented on CONNECTORS-110:
Updated suggestion from Fred pertaini
Now that HSQLDB functions with ManifoldCF, write a test-hsqldb ant target to
test it
Key: CONNECTORS-204
URL: https://issues.apache.org/jira/browse/CONNECTORS-204
Pr
[
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karl Wright updated CONNECTORS-110:
---
Summary: Max activity and Max bandwidth reports don't work properly under
Derby or HSQLDB
[
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042655#comment-13042655
]
Karl Wright commented on CONNECTORS-110:
HSQLDB is now also in roughly the sam
I've now edited the page accordingly. Let me know of any changes
you'd like to see.
Karl
On Thu, Jun 2, 2011 at 4:18 AM, Tommaso Teofili
wrote:
> it sounds good to me, any others?
> Tommaso
>
> 2011/6/1 Karl Wright
>
>> Here's my proposed text:
>>
>> ManifoldCF
>>
>> --Description--
>>
>> Manif
it sounds good to me, any others?
Tommaso
2011/6/1 Karl Wright
> Here's my proposed text:
>
> ManifoldCF
>
> --Description--
>
> ManifoldCF is an incremental crawler framework and set of connectors
> designed to pull documents from various kinds of repositories into
> search engine indexes or ot