[jira] [Commented] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains
[ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178695#comment-13178695 ]

Karl Wright commented on CONNECTORS-286:

Using the Warthog API as the standard ManifoldCF way of dealing with databases may not be practical, for the following reasons.

- A significant amount of the actual functionality of Warthog comes from Java methods you supply to it. This is fundamentally incompatible with using a standard database to do the same thing, because there are bound to be situations where the two implementations disagree.
- A full database implementation under Warthog entails using the database for table storage and (ordered) index access, with conditions applied to the index. Warthog would do the rest. But it is conceivable that this would not perform as well as native database queries.
- It is not clear how to construct a cache key in Warthog, so caching database results will require some thought. Caching at the interface to the underlying database is not practical at all, because only partial resultsets will be read from many of the queries.
- It's not even clear (yet) whether critical functionality is missing from Warthog that will be needed to implement ManifoldCF.

Nevertheless, the next step is to try to create an implementation of Warthog where WHTableStore, WHTable, and WHIndex are implemented by an underlying relational database. The difficulty in this, as stated above, occurs because the index (for example) is defined in terms of a WHComparator for each column being indexed, which is opaque Java code. Instead of merely performing the comparison, the code must, in addition, be in accordance with what the database is doing, AND also be capable of assisting in the generation of SQL code. Special SQL-consistent WHComparator implementations are therefore going to be necessary, which also implement another interface (SQLInspectable?). The WHIndex implementation can therefore use them to do what it needs, and complain if somebody tries to use incompatible comparator implementations.

Thus, each implementation of the Warthog API consists of:

- Implementations of WHTableStore and WHTable and WHIndex
- A body of comparators, filters, etc. that implement data types consistent with the SQL database

Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Key: CONNECTORS-286
URL: https://issues.apache.org/jira/browse/CONNECTORS-286
Project: ManifoldCF
Issue Type: New Feature
Components: Framework core
Reporter: Karl Wright
Assignee: Karl Wright
Fix For: ManifoldCF next

ManifoldCF's reliance on a relational database limits its throughput and scalability. I am now convinced it is possible to build all the structures we need within a distributed key-value store like Voldemort, which has the nice side effect of permitting massive scaling. I envision there will be several layers to this project, some of which may have broader utility in the open-source community at large:

(1) An atomic serialization layer, which adds serialization capabilities to a non-transactional substrate;
(2) A transaction layer, which uses atomic serialization to build a notion of light transactions;
(3) A table and index layer, which defines SQL-like concepts of tables and btree indexes on top of the transaction layer, via a Java API;
(4) A generic database abstraction layer, which is capable of representing both standard SQL databases as well as this NoSQL variant, so that ManifoldCF can support both models.

This is obviously a major development task, and as such is not envisioned to be completed by the next standard release. Work will indeed need to be done in a branch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
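
The CONNECTORS-286 comment proposes comparators that both compare values in Java and describe themselves to a SQL-backed index. The following is only a minimal sketch of that idea. The real Warthog WHComparator signature is not shown in the message, so `java.util.Comparator` stands in for it here, and `SQLInspectable`, its two methods, and `LongAscendingComparator` are hypothetical names invented for illustration.

```java
import java.util.Comparator;

public class SqlConsistentComparatorSketch {
    /** Hypothetical interface: lets an index ask a comparator how it maps onto SQL. */
    public interface SQLInspectable {
        String sqlColumnType();
        String sqlOrderBy(String columnName);
    }

    /**
     * Stand-in for a WHComparator over long values (assumption: the real
     * Warthog interface may look different). Its Java ordering is chosen to
     * agree with an ascending BIGINT column in the database.
     */
    public static final class LongAscendingComparator
            implements Comparator<Long>, SQLInspectable {
        @Override
        public int compare(Long a, Long b) {
            return Long.compare(a, b);
        }

        @Override
        public String sqlColumnType() {
            return "BIGINT";
        }

        @Override
        public String sqlOrderBy(String columnName) {
            return columnName + " ASC";
        }
    }

    /**
     * What a SQL-backed index might do: use the comparator to generate SQL,
     * and complain if the comparator cannot describe itself in SQL terms.
     */
    public static String orderByClause(Comparator<?> comparator, String columnName) {
        if (!(comparator instanceof SQLInspectable)) {
            throw new IllegalArgumentException(
                "Comparator is not SQL-consistent: " + comparator.getClass().getName());
        }
        return ((SQLInspectable) comparator).sqlOrderBy(columnName);
    }
}
```

The runtime check in `orderByClause` corresponds to the "complain if somebody tries to use incompatible comparator implementations" behavior described in the comment. A pure in-memory implementation could ignore `SQLInspectable` entirely.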
[jira] [Updated] (CONNECTORS-258) pom.xml refers to jars not available in public repositories
[ https://issues.apache.org/jira/browse/CONNECTORS-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-258:
---
Fix Version/s: (was: ManifoldCF next) ManifoldCF 0.3
Assignee: Karl Wright

pom.xml refers to jars not available in public repositories
---
Key: CONNECTORS-258
URL: https://issues.apache.org/jira/browse/CONNECTORS-258
Project: ManifoldCF
Issue Type: Bug
Components: Build
Affects Versions: ManifoldCF 0.4
Environment: all supported platforms
Reporter: Alex Ott
Assignee: Karl Wright
Priority: Minor
Labels: maven
Fix For: ManifoldCF 0.3
Attachments: mvn-bootstrap.sh

Maven's pom.xml files refer to jars that aren't available in public repositories such as Maven Central, the Apache repository, etc. This includes:

- com.bitmechanic:jdbcpool
- org.hsqldb:hsqldb:jar:2.2.5.6-9-2011 (at Maven Central only version 2.2.4 is available right now)

I think that ManifoldCF should adopt the same approach as other Apache projects, like Tika, where all needed jars are first promoted to public repositories, and only after that are they used as dependencies...
[jira] [Resolved] (CONNECTORS-258) pom.xml refers to jars not available in public repositories
[ https://issues.apache.org/jira/browse/CONNECTORS-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-258.

Resolution: Fixed

The mvn-bootstrap .sh/.bat scripts have been part of ManifoldCF since the 0.3 release.

pom.xml refers to jars not available in public repositories
---
Key: CONNECTORS-258
URL: https://issues.apache.org/jira/browse/CONNECTORS-258
Project: ManifoldCF
Issue Type: Bug
Components: Build
Affects Versions: ManifoldCF 0.4
Environment: all supported platforms
Reporter: Alex Ott
Assignee: Karl Wright
Priority: Minor
Labels: maven
Fix For: ManifoldCF 0.3
Attachments: mvn-bootstrap.sh

Maven's pom.xml files refer to jars that aren't available in public repositories such as Maven Central, the Apache repository, etc. This includes:

- com.bitmechanic:jdbcpool
- org.hsqldb:hsqldb:jar:2.2.5.6-9-2011 (at Maven Central only version 2.2.4 is available right now)

I think that ManifoldCF should adopt the same approach as other Apache projects, like Tika, where all needed jars are first promoted to public repositories, and only after that are they used as dependencies...
[jira] [Updated] (CONNECTORS-318) Make it easier to trace XML parsing errors
[ https://issues.apache.org/jira/browse/CONNECTORS-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-318:
---
Affects Version/s: ManifoldCF 0.5
Fix Version/s: ManifoldCF 0.5
Assignee: Karl Wright

Although it's far better to have the Solr connector handle its own diagnostics, this patch may still be helpful on occasion, so I recommend committing it.

Make it easier to trace XML parsing errors
--
Key: CONNECTORS-318
URL: https://issues.apache.org/jira/browse/CONNECTORS-318
Project: ManifoldCF
Issue Type: Improvement
Components: Framework core
Affects Versions: ManifoldCF 0.5
Reporter: Martin Goldhahn
Assignee: Karl Wright
Priority: Minor
Fix For: ManifoldCF 0.5
Attachments: XMLDoc.java.patch

I had a hard time tracking an erroneous response from Solr. All I got was something like this:

{{[Fatal Error] :112:120: The element type "HR" must be terminated by the matching end-tag "</HR>".}}

There was no indication of what the error was and which component issued it.
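
For readers wondering what "easier to trace" could look like in practice, here is a minimal sketch using only the standard JAXP API. It is not the attached XMLDoc.java.patch, whose contents are not shown in this message. The class name, method name, and the `component` label are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlErrorContextSketch {
    /**
     * Parses the given XML text. Returns null when it is well-formed;
     * otherwise returns a message that names the calling component and the
     * position of the failure, instead of a bare "[Fatal Error]" line.
     */
    public static String describeParseError(String component, String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors rather than printing them
            // to stderr, so the caller decides how the failure is reported.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            return component + ": XML parse error at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not parse errors.
            throw new IllegalStateException(
                "Unexpected failure while parsing XML in " + component, e);
        }
    }
}
```

A caller such as an output connector could pass its own name as `component`, so that a malformed HTML error page returned in place of XML is attributed to the right place in the log.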
[jira] [Updated] (CONNECTORS-309) On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
[ https://issues.apache.org/jira/browse/CONNECTORS-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-309:
---
Fix Version/s: (was: ManifoldCF next) ManifoldCF 0.5
Assignee: Karl Wright

As stated this looks straightforward and will probably fit in the 0.5 timeframe.

On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl

Key: CONNECTORS-309
URL: https://issues.apache.org/jira/browse/CONNECTORS-309
Project: ManifoldCF
Issue Type: Improvement
Components: Web connector
Affects Versions: ManifoldCF 0.4
Reporter: Michael J. Kelleher
Assignee: Karl Wright
Priority: Minor
Fix For: ManifoldCF 0.5

There was not a Component for a Job. Canonicalization is part of the Job definition.

I would like the ability to use a regex to transform a URL (not necessarily including the hostname and port). Specifically what I would like to use this for is to remove certain URL request parameters from the URL.
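
As an illustration of the kind of transform requested in CONNECTORS-309, the sketch below removes one named request parameter from a URL with `java.util.regex`. It is not the web connector's canonicalization code. The class name, method name, and the `sid` parameter in the usage note are made up for the example.

```java
import java.util.regex.Pattern;

public class UrlParamStripSketch {
    /**
     * Removes every "name=value" pair for the given parameter from the query
     * string, leaving the path, other parameters, and any fragment intact.
     */
    public static String stripParam(String url, String paramName) {
        // Match the parameter only when it directly follows '?' or '&', and
        // take one trailing '&' with it so no empty pair is left behind.
        String pair = "(?<=[?&])" + Pattern.quote(paramName) + "=[^&#]*&?";
        String stripped = url.replaceAll(pair, "");
        // Drop a '?' or '&' left dangling before the fragment or end of URL.
        return stripped.replaceAll("[?&](?=#|$)", "");
    }
}
```

For example, `stripParam("http://example.com/a?x=1&sid=abc&y=2", "sid")` yields `http://example.com/a?x=1&y=2`, which would let two URLs that differ only in a session identifier canonicalize to the same document.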
[jira] [Commented] (CONNECTORS-292) Problems compiling agents, pull-agent, connectors/filesystem, etc directly in Maven
[ https://issues.apache.org/jira/browse/CONNECTORS-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179064#comment-13179064 ]

Karl Wright commented on CONNECTORS-292:

Given the open-endedness of this ticket, I'm going to triage it into ManifoldCF-next.

Problems compiling agents, pull-agent, connectors/filesystem, etc directly in Maven
---
Key: CONNECTORS-292
URL: https://issues.apache.org/jira/browse/CONNECTORS-292
Project: ManifoldCF
Issue Type: Bug
Components: Build
Affects Versions: ManifoldCF 0.4
Environment: java 6, maven 3.0.3, ManifoldCF trunk version from http://svn.apache.org/repos/asf/incubator/lcf/trunk
Reporter: Luca Stancapiano
Priority: Minor
Fix For: ManifoldCF next

If I try to execute the command 'mvn install -Dmaven.test.skip' inside 'framework/agents' I get this error:

[ERROR] Failed to execute goal on project mcf-agents: Could not resolve dependencies for project org.apache.manifoldcf:mcf-agents:jar:0.4.0-SNAPSHOT: Failure to find org.apache.manifoldcf:mcf-core:jar:tests:0.4.0-SNAPSHOT in was cached in the local repository, resolution will not be reattempted until the update interval of sose-private has elapsed or updates are forced -> [Help 1]

In the pom.xml of the mcf-agents project there is a wrong dependency:

<dependency>
  <groupId>${project.groupId}</groupId>
  <artifactId>mcf-core</artifactId>
  <version>${project.version}</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
[jira] [Updated] (CONNECTORS-351) Alfresco Connector documentation must be updated
[ https://issues.apache.org/jira/browse/CONNECTORS-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-351:
---
Affects Version/s: ManifoldCF 0.5
Fix Version/s: ManifoldCF 0.5

Alfresco Connector documentation must be updated

Key: CONNECTORS-351
URL: https://issues.apache.org/jira/browse/CONNECTORS-351
Project: ManifoldCF
Issue Type: Bug
Components: Documentation
Affects Versions: ManifoldCF 0.5
Reporter: Piergiorgio Lucidi
Assignee: Piergiorgio Lucidi
Fix For: ManifoldCF 0.5
Original Estimate: 2h
Remaining Estimate: 2h

The Alfresco connector documentation must be updated with the new tenant domain parameter (text and screenshots).
[jira] [Updated] (CONNECTORS-345) Jetty Configuration Support
[ https://issues.apache.org/jira/browse/CONNECTORS-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-345:
---
Fix Version/s: ManifoldCF 0.5
Assignee: Karl Wright

Jetty Configuration Support
---
Key: CONNECTORS-345
URL: https://issues.apache.org/jira/browse/CONNECTORS-345
Project: ManifoldCF
Issue Type: Improvement
Components: Framework core
Affects Versions: ManifoldCF 0.4
Environment: Jetty Configuration
Reporter: Michael J. Kelleher
Assignee: Karl Wright
Fix For: ManifoldCF 0.5

Can the single process example be extended to support Jetty configuration?

1) jetty.xml
2) webdefault.xml
3) OPTIONS= along with their corresponding XML config files, most importantly the JMX option. Server, ajp, and setuid would be nice to have.
Announcing the availability of UI testing infrastructure
Folks,

When ManifoldCF was granted by MetaCarta, comprehensive tests existed for the main Crawler UI as well as the UI contributions of each connector. This testing was all done in Python, and was thus unavailable within JUnit, even though the MetaCarta test code itself had been granted, including the Python browser emulator (which I had written).

My original plans had been to port the browser emulator to Java. I kept starting to do this but other tasks continually interfered. Eventually in December I finally gave up on having enough of a block of time to do the port, and created infrastructure instead that invokes Python directly from within the JUnit test framework. So we now have limited but sufficient capability for testing connector UIs.

In order to use the tester, all you have to do is the following:

- Install Python 2.x on the computer you intend to test with.
- Make sure that typing the command python brings up the Python shell.
- Execute ant uitest.

Currently tests exist for the filesystem connector, the rss connector, and the web connector (which I'm currently completing). To write your own test, have a look at the code in tests/rss/src/test/java/org/apache/manifoldcf/rss_tests/NavigationDerbyUI.java. It should be pretty self-explanatory. Ask questions if it isn't.

I think we should have UI tests for all connectors before we ship 0.5, so if you own a connector please consider adding such a test.

Bear in mind that the UI tester is NOT going to emulate IE or Firefox, but is only capable of doing the basics. Thus, there are plenty of things you can do in Javascript in a browser that won't work in the tester. If you are trying to do something in your UI that the tester does not like, usually the best solution is to simply do it in a different way. If that can't be done, we can augment the tester as needed. Let me know if you run into this problem and I'd be happy to help.

The tester is also rigorous about properly formed HTML, which is good since most browsers silently accept crappy HTML and then break things in different ways.

Thanks!
Karl
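
For anyone curious how Java test code can hand work to a Python script, the sketch below shows the general technique with `ProcessBuilder`. It is not the ManifoldCF test infrastructure itself, which this message does not show. The class and method names are assumptions, and it relies on the same precondition listed above, namely that `python` is on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PythonUiTestRunnerSketch {
    /** Builds the command line: "python", then the script, then its arguments. */
    public static List<String> buildCommand(String scriptPath, String... args) {
        List<String> command = new ArrayList<String>();
        command.add("python"); // assumes "python" resolves on the PATH
        command.add(scriptPath);
        command.addAll(Arrays.asList(args));
        return command;
    }

    /**
     * Runs the script as a child process, forwards its output to the console
     * of the calling JVM, and returns the script's exit code.
     */
    public static int run(String scriptPath, String... args)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(scriptPath, args));
        builder.inheritIO();
        return builder.start().waitFor();
    }
}
```

A JUnit test method could then assert that `run(...)` returns 0 for a given script, treating any non-zero exit code from Python as a test failure.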