[jira] Created: (CONNECTORS-103) RSS connector: Have better initial default values for throttling
RSS connector: Have better initial default values for throttling Key: CONNECTORS-103 URL: https://issues.apache.org/jira/browse/CONNECTORS-103 Project: Apache Connectors Framework Issue Type: Improvement Components: RSS connector Reporter: Karl Wright Priority: Minor When you first create an RSS connector connection, the bandwidth tab should come prepopulated with the following values: Max connections per server: 2; Max KB per second per server: 64; Max fetches per minute per server: 12. Too many casual users of ACF have been crawling without any throttling, and that's going to give ACF a bad name in the long run. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
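As a rough illustration of what the proposed defaults imply in practice, the arithmetic can be sketched in Java. The class, constant, and method names below are hypothetical and are not taken from the ACF code base; only the three numeric values come from the ticket.

```java
public class ThrottleDefaultsSketch {
    // Default values proposed in the ticket for a newly created RSS connection.
    static final int MAX_CONNECTIONS_PER_SERVER = 2;
    static final int MAX_KB_PER_SECOND_PER_SERVER = 64;
    static final int MAX_FETCHES_PER_MINUTE_PER_SERVER = 12;

    /** Average spacing, in milliseconds, implied by a per-minute fetch limit. */
    static long millisBetweenFetches(int fetchesPerMinute) {
        return 60_000L / fetchesPerMinute;
    }

    /** Shortest time, in seconds, in which a document of the given size can arrive. */
    static double minSecondsToTransfer(long kilobytes, int kbPerSecond) {
        return (double) kilobytes / kbPerSecond;
    }

    public static void main(String[] args) {
        System.out.println("Connections per server: " + MAX_CONNECTIONS_PER_SERVER);
        System.out.println("Milliseconds between fetches: "
            + millisBetweenFetches(MAX_FETCHES_PER_MINUTE_PER_SERVER));
        System.out.println("Seconds for a 640 KB feed: "
            + minSecondsToTransfer(640, MAX_KB_PER_SECOND_PER_SERVER));
    }
}
```

With these defaults a single server would see, on average, at most one fetch every five seconds, which is the kind of politeness the ticket is asking for.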
[jira] Resolved: (CONNECTORS-101) File system connector would benefit by default crawling rules
[ https://issues.apache.org/jira/browse/CONNECTORS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-101. Fix Version/s: LCF Release 0.5 Resolution: Fixed r993551. By the way, the UI is really pretty bad for this connector also, so I may open a ticket to clean that up as well. File system connector would benefit by default crawling rules - Key: CONNECTORS-101 URL: https://issues.apache.org/jira/browse/CONNECTORS-101 Project: Apache Connectors Framework Issue Type: Improvement Components: File system connector Reporter: Karl Wright Assignee: Karl Wright Priority: Minor Fix For: LCF Release 0.5 When you add a path to a file system connector job, it should automatically put in rules that cause it to include all files and directories under that path. This makes it easier to use, and more easily demonstrable too.
[jira] Created: (CONNECTORS-105) File system connector UI no longer adheres to connector UI standards, needs to be updated
File system connector UI no longer adheres to connector UI standards, needs to be updated - Key: CONNECTORS-105 URL: https://issues.apache.org/jira/browse/CONNECTORS-105 Project: Apache Connectors Framework Issue Type: Improvement Components: File system connector Reporter: Karl Wright Priority: Minor The file system connector specification Paths tab no longer adheres to the prevailing connector standard, which suggests a table for rule list displays. The connector UI should be updated.
[jira] Assigned: (CONNECTORS-105) File system connector UI no longer adheres to connector UI standards, needs to be updated
[ https://issues.apache.org/jira/browse/CONNECTORS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-105: -- Assignee: Karl Wright File system connector UI no longer adheres to connector UI standards, needs to be updated - Key: CONNECTORS-105 URL: https://issues.apache.org/jira/browse/CONNECTORS-105 Project: Apache Connectors Framework Issue Type: Improvement Components: File system connector Reporter: Karl Wright Assignee: Karl Wright Priority: Minor Fix For: LCF Release 0.5
[jira] Resolved: (CONNECTORS-105) File system connector UI no longer adheres to connector UI standards, needs to be updated
[ https://issues.apache.org/jira/browse/CONNECTORS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-105. Fix Version/s: LCF Release 0.5 Resolution: Fixed r993565. File system connector UI no longer adheres to connector UI standards, needs to be updated - Key: CONNECTORS-105 URL: https://issues.apache.org/jira/browse/CONNECTORS-105 Project: Apache Connectors Framework Issue Type: Improvement Components: File system connector Reporter: Karl Wright Assignee: Karl Wright Priority: Minor Fix For: LCF Release 0.5
[jira] Commented: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.
[ https://issues.apache.org/jira/browse/CONNECTORS-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904582#action_12904582 ] Karl Wright commented on CONNECTORS-41: --- I looked at this in some detail yesterday. The prime implementation option is to add notification methods to IOutputConnector, so that job events get reported to the connector when the job is being terminated. The issue in this case is going to be how exactly to handle ServiceInterruption exceptions that occur at the time of the notification into the connector. This is not hypothetical because in the Solr case a notification may well fail, or it may take a very long time (many minutes). Usually when there is a possibility of extended interaction it argues for an additional state in the database. It looks like it will not be possible to delay the change of the job status, since that takes place in a transaction. If the notification fails, the job could otherwise be left in the running state, and a retry would naturally occur until the commit succeeded. But that doesn't look possible given the transaction structure. An alternative (non-notification) method of handling a commit request would require the commit to take place as part of the output connector's poll() method. This is a little better to work with because the poll() method will naturally retry in any case. The issue here is that there would be no *guarantee* of a commit taking place at all, since it isn't part of the connector contract that the connection must continue to exist for any period of time, which I think would violate the spirit of this ticket. If explicit notification takes place, we could just report any error, and forget about it, rather than keeping the job alive for a retry. That, too, would mean that a commit was not guaranteed to occur during the job's lifecycle. 
The final alternative, which would seemingly work, would involve there being two job shutdown states - one prior to notification, and the second after notification. The first state would be entered based on the current shutdown logic. The second state would be entered only after the notification had been successful. Thus, the notification *could* be called more than once, if there were errors, or if the crawler were shut down and restarted before the state transition was completed. The extra state would also allow the job's pre-notification status to be noted in the crawler UI. Because of the potential time delay of a commit, it is probably best for the first to second shutdown state transition to be handled by a separate thread, or family of threads. Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc. --- Key: CONNECTORS-41 URL: https://issues.apache.org/jira/browse/CONNECTORS-41 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Priority: Minor Currently there is no logic that informs an output connection of a job start, end, deletion, or other activity. While this would seem to have little to do with an output connector, this feature has been requested by Jack Krupansky as a potential way of deciding when to tell Solr to commit documents, rather than leave it up to Solr's configuration.
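The two-state shutdown described in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the state names, the Job class, and the JobEndListener interface are hypothetical and are not part of ACF. The real change would involve the job status stored in the database and a dedicated notification thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TwoPhaseShutdownSketch {
    /** Hypothetical job states; the framework's actual status values may differ. */
    enum JobState { RUNNING, SHUTTING_DOWN_PENDING_NOTIFY, INACTIVE }

    /** Hypothetical stand-in for an output connector's end-of-job hook. */
    interface JobEndListener {
        void noteJobComplete() throws Exception;
    }

    static final class Job {
        private JobState state = JobState.RUNNING;

        synchronized JobState getState() { return state; }

        /** First shutdown state: entered by the existing shutdown logic. */
        synchronized void beginShutdown() {
            if (state == JobState.RUNNING) {
                state = JobState.SHUTTING_DOWN_PENDING_NOTIFY;
            }
        }

        /** Second shutdown state: entered only after a successful notification. */
        synchronized void markNotified() {
            if (state == JobState.SHUTTING_DOWN_PENDING_NOTIFY) {
                state = JobState.INACTIVE;
            }
        }
    }

    /**
     * One pass of the notification thread. The job advances only if the
     * notification succeeds, so a failed attempt is simply retried later.
     * Returns true once the job has reached its final state.
     */
    static boolean tryNotify(Job job, JobEndListener listener) {
        if (job.getState() != JobState.SHUTTING_DOWN_PENDING_NOTIFY) {
            return job.getState() == JobState.INACTIVE;
        }
        try {
            listener.noteJobComplete();
        } catch (Exception e) {
            return false; // stay in the pre-notification state and retry
        }
        job.markNotified();
        return true;
    }

    public static void main(String[] args) {
        Job job = new Job();
        job.beginShutdown();
        AtomicInteger calls = new AtomicInteger();
        // A listener that fails twice before succeeding, like a slow commit.
        JobEndListener flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new Exception("output target not ready");
            }
        };
        while (!tryNotify(job, flaky)) {
            // A real thread would sleep between retries.
        }
        System.out.println(job.getState() + " after " + calls.get() + " attempts");
    }
}
```

Because the listener can be invoked more than once, any real notification method would need to be safe to repeat, which matches the caveat in the comment.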
[jira] Commented: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.
[ https://issues.apache.org/jira/browse/CONNECTORS-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904611#action_12904611 ] Karl Wright commented on CONNECTORS-41: --- Does it make sense to create an entirely new kind of connector just for notifications of this sort? So when you create a job you select THREE different kinds of connection (repository, output, and notification)? That seems like supreme overkill to me, and I can well argue that this kind of notification really is only useful to an output connection in any case. Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc. --- Key: CONNECTORS-41 URL: https://issues.apache.org/jira/browse/CONNECTORS-41 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Priority: Minor
[jira] Commented: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.
[ https://issues.apache.org/jira/browse/CONNECTORS-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904622#action_12904622 ] Karl Wright commented on CONNECTORS-41: --- I think we're discussing two entirely distinct features here. Feature 1: Let the output connector know that a job is finished, so that it can flush whatever internal buffering etc. it has been doing (e.g. tell Solr to commit). Feature 2: Provide some general way of monitoring the progress of jobs etc. Feature 2 is already met by the API, in my opinion. It's a polling-style fulfillment of the requirement, but it does exist. There doesn't seem to me to yet be a requirement that a notification-style API be provided also, despite the stated use case. Feature 1 is what I consider to be the use case for this current ticket. Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc. --- Key: CONNECTORS-41 URL: https://issues.apache.org/jira/browse/CONNECTORS-41 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Priority: Minor
[jira] Assigned: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.
[ https://issues.apache.org/jira/browse/CONNECTORS-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-41: - Assignee: Karl Wright Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc. --- Key: CONNECTORS-41 URL: https://issues.apache.org/jira/browse/CONNECTORS-41 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Assignee: Karl Wright Priority: Minor
[jira] Commented: (CONNECTORS-57) Solr output connector option to commit at end of job, by default
[ https://issues.apache.org/jira/browse/CONNECTORS-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904736#action_12904736 ] Karl Wright commented on CONNECTORS-57: --- I added unconditional commit support to the Solr connector as part of ticket CONNECTORS-41. The ability to turn it off and on cannot be done per job based on that implementation, but could readily be specified per Solr connection. This makes more sense to me anyway, since what will control whether you want this feature on or not is your Solr configuration, and that's not going to change per job. Solr output connector option to commit at end of job, by default Key: CONNECTORS-57 URL: https://issues.apache.org/jira/browse/CONNECTORS-57 Project: Apache Connectors Framework Issue Type: Sub-task Components: Lucene/SOLR connector Reporter: Jack Krupansky By default, Solr will eventually commit documents that have been submitted to the Solr Cell interface, but the time lag can confuse and annoy people. Although commit strategy is a difficult issue in general, an option in LCF to automatically commit at the end of a job, by default, would eliminate a lot of potential confusion and generally be close to what the user needs. The desired feature is that there be an option to commit for each job that uses the Solr output connector. This option would default to on (or a different setting based on some global configuration setting), but the user may turn it off if commit is only desired upon completion of some jobs.
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904205#action_12904205 ] Karl Wright commented on CONNECTORS-92: --- Jettro, If you are using maven to start jetty directly, it will not work. You are missing the jetty runner, which only starts jetty at the end of a number of steps, including creating the database properly and setting up the schema and registering the connectors. Then, the crawler itself is started as a separate thread. It took me many weeks to get everything to work properly using jetty. Changing all this stuff around does not seem either warranted or useful at this time. I strongly recommend that you concentrate on using maven to actually build the software, and not try to re-engineer the example right now. Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: maven-poms-problem-starting-jetty-and-derby.patch, move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png I am looking at the current project structure. If we want to make another build tool available I think we need to change the directory structure. I tried to place a suggestion in an image. Can you please have a look at it? If we agree that this is a good way to go, then I will continue to work on a patch. Which might be a bit hard with all these changing directories, but I'll do my best to at least get an idea whether it would be working. So I have three questions: - Do you want to move to maven or put maven next to ant? 
- Do you prefer another build mechanism [ant with ivy, gradle, maven3]? - Do you have an idea about the amount of scripts that need to be changed if we change the project structure? The image of a possible project layout (that is based on the maven standards) is attached to the issue.
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904209#action_12904209 ] Karl Wright commented on CONNECTORS-92: --- I've had a cursory glance at the pom files and they all look reasonable. I'm going to play around with this a bit locally to see how it behaves, and then if all seems OK I am happy to commit those. Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: maven-poms-problem-starting-jetty-and-derby.patch, move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904219#action_12904219 ] Karl Wright commented on CONNECTORS-92: --- bq. I am still thinking about why this is so hard. Would be nice to have something like a servlet or filter that initializes everything that you do in your special runner now. The issues have to do with these facts: - Embedded derby is single-process. You cannot run more than one process against a given database at a given time. - ACF supports both single-process and multi-process models, but IF you're going to use single-process, you need to have a main class that starts up all the threads that would otherwise be different processes. That's what jetty-runner does, in part. So, obviously, something like jetty-runner needs to exist if you are going to use derby. I don't think maven magic will suffice to replace the code that does that. Furthermore, I think trying to get maven to do this for us is overkill. I'm open to suggestions, but I still don't think you need to solve this problem in order to have ACF be built effectively by maven. What I think we need to build at the framework level are all the jars and wars (which it looks like you have pretty well specified), PLUS a start.jar (which I didn't see anywhere - did I miss it?). Then your example execution will not be a jetty instance per se, but will simply fire off the equivalent of java -jar start.jar. I can't believe there isn't a maven plugin for that. This, of course, must happen at the modules level, because no connectors will be available at the framework level. 
Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: maven-poms-problem-starting-jetty-and-derby.patch, move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904324#action_12904324 ] Karl Wright commented on CONNECTORS-92: --- Another way you can determine what's supposed to be a dependency is to look at the start.jar produced by the ant build: <attribute name="Class-Path" value="lib/commons-codec.jar lib/commons-collections.jar lib/commons-el.jar lib/commons-fileupload.jar lib/commons-httpclient-acf.jar lib/commons-io.jar lib/commons-logging.jar lib/derbyclient.jar lib/derby.jar lib/derbyLocale_cs.jar lib/derbyLocale_de_DE.jar lib/derbyLocale_es.jar lib/derbyLocale_fr.jar lib/derbyLocale_hu.jar lib/derbyLocale_it.jar lib/derbyLocale_ja_JP.jar lib/derbyLocale_ko_KR.jar lib/derbyLocale_pl.jar lib/derbyLocale_pt_BR.jar lib/derbyLocale_ru.jar lib/derbyLocale_zh_CN.jar lib/derbyLocale_zh_TW.jar lib/derbynet.jar lib/derbyrun.jar lib/derbytools.jar lib/eclipse-ecj.jar lib/jasper-6.0.24.jar lib/jasper-el-6.0.24.jar lib/jdbcpool-0.99.jar lib/jetty-6.1.22.jar lib/jetty-util-6.1.22.jar lib/jsp-api-2.1-glassfish-9.1.1.B60.25.p2.jar lib/json.jar lib/acf-agents.jar lib/acf-core.jar lib/acf-jetty-runner.jar lib/acf-pull-agent.jar lib/acf-ui-core.jar lib/log4j-1.2.jar lib/postgresql.jar lib/serializer.jar lib/servlet-api-2.5-20081211.jar lib/tomcat-juli-6.0.24.jar lib/xalan2.jar lib/xercesImpl-lcf.jar lib/xml-apis.jar"/> Note that commons-httpclient-acf.jar is our own version of commons-httpclient, and must therefore NOT be an external dependency. 
Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Assignee: Karl Wright Attachments: maven-poms-including-start-jar.patch, maven-poms-problem-starting-jetty-and-derby.patch, move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903356#action_12903356 ] Karl Wright commented on CONNECTORS-92: --- I am now ready to commit the connectors reorganization also, once I hear back. Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903465#action_12903465 ] Karl Wright commented on CONNECTORS-92: --- I should also clarify that, to me, servlet is not just a single class in any case, but a body of functionality responsible for fielding web requests. So I think the servlet label is quite accurate. Others, of course, doubtless have different definitions. ;-) Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903463#action_12903463 ] Karl Wright commented on CONNECTORS-92: --- bq. Wouldn't it be better to rename the *-servlet into something like war or web. There will probably be more things in there than a servlet. No, really, there's just the servlet. All that I did was break the authority service into a separate web application and jar file. Both of these were built before under the heading of authority-service, but since we're getting rigorous, I separated out the targets. Did the same thing for the api - there's now a servlet, and a service, one yields a jar, the other a war (which includes the jar). Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903477#action_12903477 ] Karl Wright commented on CONNECTORS-92: --- No, the directories ending in -service produce wars. Those ending in -servlet produce a jar. Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png
[jira] Commented: (CONNECTORS-98) API should be pure RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903556#action_12903556 ] Karl Wright commented on CONNECTORS-98: --- Jack, if you intend to work on this, can you give me an idea of roughly when I can expect to see something? It looks like there's going to be another renaming exercise, and I'd rather not step too hard on ongoing work, so please keep us apprised of your schedule/progress. API should be pure RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods - Key: CONNECTORS-98 URL: https://issues.apache.org/jira/browse/CONNECTORS-98 Project: Apache Connectors Framework Issue Type: Improvement Components: API Affects Versions: LCF Release 0.5 Reporter: Jack Krupansky Fix For: LCF Release 0.5 (This was originally a comment on CONNECTORS-56 dated 7/16/2010.) It has come to my attention that the API would be more purely RESTful if the API verb were represented using the HTTP GET/PUT/POST/DELETE methods and the input argument identifier represented in the context path. So, GET outputconnection/get \{connection_name:_connection_name_\} would be GET outputconnections/connection_name and GET outputconnection/delete \{connection_name:_connection_name_\} would be DELETE outputconnections/connection_name and GET outputconnection/list would be GET outputconnections and PUT outputconnection/save \{outputconnection:_output_connection_object_\} would be PUT outputconnections/connection_name \{outputconnection:_output_connection_object_\} What we have today is certainly workable, but just not as pure as some might desire. It would be better to take care of this before the initial release so that we never have to answer the question of why it wasn't done as a proper RESTful API. BTW, I did check to verify that an HttpServlet running under Jetty can process the DELETE and PUT methods (using the doDelete and doPut method overrides.) 
Also, POST should be usable as an alternative to PUT for API calls that have large volumes of data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
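The mapping proposed in the description above can be written down as a small lookup. This is only an illustrative sketch: the class name `RestMappingSketch`, the method `toRest`, and the connection name `my_connection` are hypothetical and not part of the ACF API; only the old and new paths are taken from the issue text.

```java
// Illustrative sketch only; not ACF code. It restates the four
// old-style -> REST-style translations listed in the issue description.
final class RestMappingSketch {

    // Translate an old verb-in-path call into "METHOD resource-path" form.
    static String toRest(String oldPath, String connectionName) {
        switch (oldPath) {
            case "outputconnection/get":
                return "GET outputconnections/" + connectionName;
            case "outputconnection/delete":
                return "DELETE outputconnections/" + connectionName;
            case "outputconnection/list":
                // Listing addresses the collection, so no name is needed.
                return "GET outputconnections";
            case "outputconnection/save":
                // The connection object itself would travel in the request body.
                return "PUT outputconnections/" + connectionName;
            default:
                throw new IllegalArgumentException("Unknown API call: " + oldPath);
        }
    }

    public static void main(String[] args) {
        String name = "my_connection"; // hypothetical connection name
        String[] oldPaths = {
            "outputconnection/get",
            "outputconnection/delete",
            "outputconnection/list",
            "outputconnection/save"
        };
        for (String oldPath : oldPaths) {
            System.out.println(oldPath + " -> " + toRest(oldPath, name));
        }
    }
}
```

In a servlet, each HTTP method would presumably be handled by the corresponding `doGet`, `doPut`, or `doDelete` override mentioned in the description.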
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902805#action_12902805 ] Karl Wright commented on CONNECTORS-92: --- It looks to me like you adopted the one-jar-per-maven-script approach, with no coalescing of jars, but instead introducing /src/main under each of the subtargets within framework. I'd really like instead to make our job easier by at least combining the framework main jars together into one target first, along the lines I described above. I'd also like to get a sense of the overall picture before proceeding, so can we discuss what individual maven targets there are that you are proposing, and what each of them is, before we undertake any changes of this kind? The individual connector ones are obvious, but I'm concerned about stuff like the integration tests and the quick-start jetty package. How do you cover those in a maven build? Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png I am looking at the current project structure. If we want to make another build tool available I think we need to change the directory structure. I tried to place a suggestion in an image. Can you please have a look at it. If we agree that this is a good way to go, than I will continue to work on a patch. Which might be a bit hard with all these changing directories, but I'll do my best to at least get an idea whether it would be working. So I have three questions: - Do you want to move to maven or put maven next to ant? 
- Do you prefer another build mechanism [ant with ivy, gradle, maven3] - Do you have an idea about the amount of scripts that need to be changed if we change the project structure The image of a possible project layout (that is based on the maven standards) is attached to the issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-97) Web connector session authentication fails for some sites due to cookies httpclient says are illegal, but browsers accept
[ https://issues.apache.org/jira/browse/CONNECTORS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902880#action_12902880 ] Karl Wright commented on CONNECTORS-97: --- It turns out that our version of httpclient does not allow this to be configured. The code in question can be found in the validate() methods in commons-httpclient-3x/src/java/org/apache/httpclient/cookie/: CookieSpecBase.java and RFC2965Spec.java Thus, fixing this problem will require adding a configuration parameter to our httpclient version, as well as changing the web connector to set this configuration parameter appropriately. Web connector session authentication fails for some sites due to cookies httpclient says are illegal, but browsers accept - Key: CONNECTORS-97 URL: https://issues.apache.org/jira/browse/CONNECTORS-97 Project: Apache Connectors Framework Issue Type: Bug Components: Web connector Reporter: Karl Wright While trying to set up session authentication for the site http://www.ppdm.org, I ran into authentication problems that resulted from httpclient rejecting cookies: Cookie rejected: ppdm_forum_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22userid%22%3Bi%3A-1%3B%7D. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_sid=338b5f5f0887ab4c2499948fc05daac8. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A32%3A%2266a33ac80119bdcf7a1129f78de857a1%22%3Bs%3A6%3A%22userid%22%3Bs%3A4%3A%221346%22%3B%7D. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_sid=3c36d20f96423b2de2d215a33b304e18. Illegal path attribute /forums. Path of origin: /ba/login/login And yet, FireFox and IE have no trouble with these. 
I suspect that there must be a configuration setting for httpclient that will fix this problem - and if there isn't, we need to add one and set it appropriately in the web connector code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
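The rejections logged above are consistent with a strict rule that the request path must begin with the cookie's path attribute. The following is a minimal sketch of that rule together with a hypothetical switch that relaxes it. It is not the httpclient code: the class `CookiePathSketch`, the method `accept`, and the `lenient` flag are all assumptions made for illustration.

```java
// Illustrative sketch only; not the httpclient implementation.
final class CookiePathSketch {

    // Strict rule: the path of origin must start with the cookie's path attribute.
    static boolean pathMatches(String cookiePath, String requestPath) {
        return requestPath.startsWith(cookiePath);
    }

    // "lenient" is a hypothetical configuration parameter that skips the path
    // check, mimicking browsers that accept such cookies.
    static boolean accept(String cookiePath, String requestPath, boolean lenient) {
        return lenient || pathMatches(cookiePath, requestPath);
    }

    public static void main(String[] args) {
        String cookiePath = "/forums";          // path attribute from the log
        String origin = "/ba/login/login";      // path of origin from the log
        System.out.println("strict:  " + accept(cookiePath, origin, false));
        System.out.println("lenient: " + accept(cookiePath, origin, true));
    }
}
```

With the values from the log, the strict check rejects the cookie and the lenient one accepts it, which is the behaviour difference the web connector would need to be able to configure.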
[jira] Resolved: (CONNECTORS-97) Web connector session authentication fails for some sites due to cookies httpclient says are illegal, but browsers accept
[ https://issues.apache.org/jira/browse/CONNECTORS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-97. --- Fix Version/s: LCF Release 0.5 Resolution: Fixed r989844-r989847 Web connector session authentication fails for some sites due to cookies httpclient says are illegal, but browsers accept - Key: CONNECTORS-97 URL: https://issues.apache.org/jira/browse/CONNECTORS-97 Project: Apache Connectors Framework Issue Type: Bug Components: Web connector Reporter: Karl Wright Assignee: Karl Wright Fix For: LCF Release 0.5 While trying to set up session authentication for the site http://www.ppdm.org, I ran into authentication problems that resulted from httpclient rejecting cookies: Cookie rejected: ppdm_forum_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22userid%22%3Bi%3A-1%3B%7D. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_sid=338b5f5f0887ab4c2499948fc05daac8. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A32%3A%2266a33ac80119bdcf7a1129f78de857a1%22%3Bs%3A6%3A%22userid%22%3Bs%3A4%3A%221346%22%3B%7D. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_sid=3c36d20f96423b2de2d215a33b304e18. Illegal path attribute /forums. Path of origin: /ba/login/login And yet, FireFox and IE have no trouble with these. I suspect that there must be a configuration setting for httpclient that will fix this problem - and if there isn't, we need to add one and set it appropriately in the web connector code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CONNECTORS-97) Web connector session authentication fails for some sites due to cookies httpclient says are illegal, but browsers accept
[ https://issues.apache.org/jira/browse/CONNECTORS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-97: - Assignee: Karl Wright Web connector session authentication fails for some sites due to cookies httpclient says are illegal, but browsers accept - Key: CONNECTORS-97 URL: https://issues.apache.org/jira/browse/CONNECTORS-97 Project: Apache Connectors Framework Issue Type: Bug Components: Web connector Reporter: Karl Wright Assignee: Karl Wright Fix For: LCF Release 0.5 While trying to set up session authentication for the site http://www.ppdm.org, I ran into authentication problems that resulted from httpclient rejecting cookies: Cookie rejected: ppdm_forum_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22userid%22%3Bi%3A-1%3B%7D. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_sid=338b5f5f0887ab4c2499948fc05daac8. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A32%3A%2266a33ac80119bdcf7a1129f78de857a1%22%3Bs%3A6%3A%22userid%22%3Bs%3A4%3A%221346%22%3B%7D. Illegal path attribute /forums. Path of origin: /ba/login/login Cookie rejected: ppdm_forum_sid=3c36d20f96423b2de2d215a33b304e18. Illegal path attribute /forums. Path of origin: /ba/login/login And yet, FireFox and IE have no trouble with these. I suspect that there must be a configuration setting for httpclient that will fix this problem - and if there isn't, we need to add one and set it appropriately in the web connector code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903151#action_12903151 ] Karl Wright commented on CONNECTORS-92: --- I rearranged the framework part of the tree to what I believe will satisfy maven. The rest of the tree I will cover in a subsequent check-in, provided I got this part right. Can you verify that the current tree is correct, and can you upload a new maven patch based on the new tree? Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png I am looking at the current project structure. If we want to make another build tool available I think we need to change the directory structure. I tried to place a suggestion in an image. Can you please have a look at it. If we agree that this is a good way to go, than I will continue to work on a patch. Which might be a bit hard with all these changing directories, but I'll do my best to at least get an idea whether it would be working. So I have three questions: - Do you want to move to maven or put maven next to ant? - Do you prefer another build mechanism [ant with ivy, gradle, maven3] - Do you have an idea about the amount of scripts that need to be changed if we change the project structure The image of a possible project layout (that is based on the maven standards) is attached to the issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-93) add contributors to CHANGES.txt
[ https://issues.apache.org/jira/browse/CONNECTORS-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902436#action_12902436 ] Karl Wright commented on CONNECTORS-93: --- So I hear that there has been a lot of recent discussion about our status change at gene...@incubator.apache.org, which I was unaware of. I was not subscribed to that list, and had accepted Grant's assessment of our status change. We'll have to see where it leads now. add contributors to CHANGES.txt --- Key: CONNECTORS-93 URL: https://issues.apache.org/jira/browse/CONNECTORS-93 Project: Apache Connectors Framework Issue Type: Task Components: Documentation Reporter: Robert Muir Attachments: CONNECTORS-93.patch As mentioned on the connectors-dev@ list (change the format of CHANGES.txt), I propose we modify CHANGES.txt to give credit to contributors who have given bug reports, comments, patches, etc. I'll volunteer to go thru the mail archives and jira issues that are marked 'resolved' and upload a patch here. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901777#action_12901777 ] Karl Wright commented on CONNECTORS-92: --- bq. As a response to the remark from Karl (1) Breaking up modules and putting pieces of that all over the place I do not think they are all over the place, maybe I am thinking wrong about the modules part, but for me modules is not really clear. At the moment we have documentation, modules and tests. I suggest a slightly more separated mode with: documentation, integration-tests, framework, connectors and environment. The only change is to move some stuff from modules into a new part environment and move the other parts of modules one level up. Each thing under modules is something you'd want to build separately, which is why I chose the arrangement in the first place. If I were deploying these on a debian system, each would be its own package. That is, each connector would necessarily be its own package, as would mod-authz-annotate, and java-environment. Indeed, java-environment was originally a debian package that was part of the LCF software grant and has not been modified even to build, because it in effect represented a Debian java deployment framework rather than actual code. Same thing with postgres-config, except that was for postgresql configuration under Debian. Furthermore, mod-authz-annotate is C, and probably cannot be built under maven (or do I have that wrong?) Therefore, for a maven build we should plan on building the following as SEPARATE maven deliverables/targets: - (1) Each connector - (2) The framework If there is a way to build C stuff under maven, then this too should be a maven deliverable/target: - (3) mod-authz-annotate These should exist in the tree but be ignored for now, since they are not applicable to maven at all: - (4) java-environment - (5) postgres-config bq. 
(2) Taking jetty-runner out of framework I do not think that Jetty is part of your framework, you create war files and give the option for an easy start using Jetty. But maybe I am wrong. I set the jetty example and runner up so that they do not have explicit dependencies on any individual connectors, and thus they're built as part of the framework, which they DO have a dependency on. A case could be made for having these be separated into their own module-level component, in which case they'd also be their own maven deliverable. bq. (3) Introducing a src directory under each of the framework components At the moment when running ant. You get a lot of folders of which it is not always easy to understand whether they are original source folders or not. That is why maven comes with a clear separation of src, generated-source and target for other generated content. To my opinion this makes it easier to see what is under version control and what is not. Check the maven page for more explanation. http://maven.apache.org/guides/introduction/introduction-to-the-standard-directory-layout.html I will read the page. It seems to me that we'd need to agree what the maven deliverables would be before we can decide where the src directory goes. If the framework is a component all by itself (and I think it should be), then naturally the structure would be modules/framework/src/... instead. Does maven allow multiple jars in a deliverable? That would be a necessary condition. bq. (4) Moving the tests so far away from the code they are related to I am not sure if I was clear enough on this. In the original code base a test folder is available next to modules. For unit tests I would keep them as close as possible to the source code. Therefore we have the src/main and src/test in the same module. The integration tests are another beast. 
Usually a lot of environmental setup needs to be done, they take longer, and you might want to store them in a different folder so you can run them all at once. Another option would be to add them next to the unit tests in a different folder [src/main, src/test/ and src/integration-test] or use a different naming scheme: **Test.java and **IntegrationTest.java. That way you can folder them out as well and use the maven lifecycle to decide whether to run unit tests or both unit and integration tests. As of right now, there are three kinds of tests in the system: Unit tests (which are checked in in the module they are to test), integration unit tests (which are checked in at the modules level), and full integration tests that are a legacy of the LCF code grant (which are checked in in the tests directory above the modules level). The full integration tests are not executed but were meant to furnish the rudiments of a test plan, as well as useful bits for manipulating repositories themselves during test processes, and thus must be considered reference material at this time. The
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901811#action_12901811 ] Karl Wright commented on CONNECTORS-92: --- bq. Web projects are no problem at all. You can even have dependencies between web projects. Although I would try to make dependencies on jars only. The question is, who would *want* to depend on any individual ACF war files? If there's a need, then fine, but I don't see one here. The only use case I can come up with for anybody depending on ACF is on the main framework jars, which could be consolidated into one jar quite readily. I would therefore propose breaking up modules/framework into two pieces: modules/framework-core, and modules/framework, or some such. framework-core would contain what's currently in framework/core, framework/ui-core, framework/agents, and framework/pull-agent, and would have both an ant build and a maven build that wraps it. framework would contain crawler-ui, api, authorityservice, and the jetty stuff, and would have a straight ant build and an ant-with-ivy build wrapping that. Each connector would have an ant build and a wrapping ant-ivy build also. Thoughts? Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: Screen shot 2010-08-23 at 16.31.07.png I am looking at the current project structure. If we want to make another build tool available I think we need to change the directory structure. I tried to place a suggestion in an image. Can you please have a look at it. If we agree that this is a good way to go, then I will continue to work on a patch. Which might be a bit hard with all these changing directories, but I'll do my best to at least get an idea whether it would be working. 
So I have three questions: - Do you want to move to maven or put maven next to ant? - Do you prefer another build mechanism [ant with ivy, gradle, maven3] - Do you have an idea about the amount of scripts that need to be changed if we change the project structure The image of a possible project layout (that is based on the maven standards) is attached to the issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-91) Making the initialization commands more useable
[ https://issues.apache.org/jira/browse/CONNECTORS-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901309#action_12901309 ] Karl Wright commented on CONNECTORS-91: --- This patch file worked properly. Since the automated tests do not exercise the commands, it would be good to set up a database instance from scratch using the changed code. If you have already done this, please let me know and I will go ahead and commit the changes. Making the initialization commands more useable --- Key: CONNECTORS-91 URL: https://issues.apache.org/jira/browse/CONNECTORS-91 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Jettro Coenradie Fix For: LCF Release 0.5 Attachments: change_commands.patch At the moment LCF comes with some classes that can be used to run command line to interact with the system. Examples are DBCreate, DBDrop and LockClean. I wanted to create a class that rebuilds my complete environment. So dropping a database, creating a database, cleaning the synch folder, registering agents, etc. Due to the structure of the classes with all the logic in the main method, I could not easily reuse these classes. In the patch I submit with issue I have refactored the current solution in a better reuseable solution that can still be called command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-91) Making the initialization commands more useable
[ https://issues.apache.org/jira/browse/CONNECTORS-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901312#action_12901312 ] Karl Wright commented on CONNECTORS-91: --- Another thing I had not noticed before is that this patch removes all stderr success confirmation messages for those folks who use the commands, and replaces them with log output. The log output is perfectly fine, but removing the feedback that the command was successful is, I think, not great. If the log were going to stderr typically that would be OK, but it typically is not, so I think you are going to want to do both. You would, obviously, want to do the stderr output within the main() method. Would it be possible to fix that up before I commit this? Making the initialization commands more useable --- Key: CONNECTORS-91 URL: https://issues.apache.org/jira/browse/CONNECTORS-91 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Jettro Coenradie Fix For: LCF Release 0.5 Attachments: change_commands.patch At the moment LCF comes with some classes that can be used to run command line to interact with the system. Examples are DBCreate, DBDrop and LockClean. I wanted to create a class that rebuilds my complete environment. So dropping a database, creating a database, cleaning the synch folder, registering agents, etc. Due to the structure of the classes with all the logic in the main method, I could not easily reuse these classes. In the patch I submit with issue I have refactored the current solution in a better reuseable solution that can still be called command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-91) Making the initialization commands more useable
[ https://issues.apache.org/jira/browse/CONNECTORS-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901336#action_12901336 ] Karl Wright commented on CONNECTORS-91: --- I looked at this. The patch seems correct for some classes, but for others it is clearly incorrect, e.g. SynchronizeAll: { System.err.println("Usage: SynchronizeAll"); System.exit(1); + System.err.println("Successfully synchronized all agents"); } Can you review your change for accuracy please? Also, responding to the logging change - the log settings are global, and we are trying for the least amount of setup work necessary to achieve a functional system. Clearly, sending all log messages to stderr is not going to be reasonable for people doing real crawls, so we'd need some way to segregate command output in order to direct it differently than everything else, which implies at the least you'd want a different logger, and then you'd also want to revise the documented log4j properties, if you think we should go that route. Re: testing. The testing you've done so far is the best we can do at the moment, unless you'd also like to write some unit tests. I don't think this would be terribly difficult, but once again it would be time consuming. ;-) Making the initialization commands more useable --- Key: CONNECTORS-91 URL: https://issues.apache.org/jira/browse/CONNECTORS-91 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Jettro Coenradie Fix For: LCF Release 0.5 Attachments: change_commands.patch, change_commands_with_system_err_println.patch At the moment LCF comes with some classes that can be run from the command line to interact with the system. Examples are DBCreate, DBDrop and LockClean. I wanted to create a class that rebuilds my complete environment. So dropping a database, creating a database, cleaning the synch folder, registering agents, etc. 
Due to the structure of the classes with all the logic in the main method, I could not easily reuse these classes. In the patch I submit with this issue I have refactored the current solution into a more reusable solution that can still be called from the command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
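The placement problem flagged in the comment above (a success message sitting inside the usage-error branch, after the exit call) can be avoided by keeping the reusable logic out of main() and printing to stderr only once the work has actually run. The sketch below is an assumption about how such a command could be structured, not the committed patch; `CommandSketch`, `Command`, and `run` are hypothetical names.

```java
import java.io.PrintStream;

// Illustrative sketch only; not the code committed for CONNECTORS-91.
final class CommandSketch {

    // Reusable unit of work, callable from other Java code without main().
    interface Command {
        void execute() throws Exception;
    }

    // Validates arguments, runs the command, and reports the outcome.
    // Returns the process exit status instead of calling System.exit(),
    // so the logic stays testable and reusable.
    static int run(String name, String[] args, Command command, PrintStream err) {
        if (args.length != 0) {
            err.println("Usage: " + name);
            return 1; // usage error: no success message on this path
        }
        try {
            command.execute();
        } catch (Exception e) {
            err.println(name + " failed: " + e.getMessage());
            return 1;
        }
        // Printed only after the command has completed.
        err.println("Successfully ran " + name);
        return 0;
    }

    public static void main(String[] args) {
        // The empty lambda stands in for the real synchronization logic.
        int status = run("SynchronizeAll", args, () -> { }, System.err);
        System.exit(status);
    }
}
```

Keeping the stderr confirmation in the thin main() wrapper lets the underlying logic log through the normal logger while command-line users still get direct feedback.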
[jira] Assigned: (CONNECTORS-91) Making the initialization commands more useable
[ https://issues.apache.org/jira/browse/CONNECTORS-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-91: - Assignee: Karl Wright Making the initialization commands more useable --- Key: CONNECTORS-91 URL: https://issues.apache.org/jira/browse/CONNECTORS-91 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Jettro Coenradie Assignee: Karl Wright Fix For: LCF Release 0.5 Attachments: change_commands.patch, change_commands_with_system_err_println.patch, change_commands_with_system_err_println_v2.patch At the moment LCF comes with some classes that can be used to run command line to interact with the system. Examples are DBCreate, DBDrop and LockClean. I wanted to create a class that rebuilds my complete environment. So dropping a database, creating a database, cleaning the synch folder, registering agents, etc. Due to the structure of the classes with all the logic in the main method, I could not easily reuse these classes. In the patch I submit with issue I have refactored the current solution in a better reuseable solution that can still be called command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-91) Making the initialization commands more useable
[ https://issues.apache.org/jira/browse/CONNECTORS-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-91. --- Resolution: Fixed Patch committed. r988101. Making the initialization commands more useable --- Key: CONNECTORS-91 URL: https://issues.apache.org/jira/browse/CONNECTORS-91 Project: Apache Connectors Framework Issue Type: Improvement Components: Framework core Reporter: Jettro Coenradie Assignee: Karl Wright Fix For: LCF Release 0.5 Attachments: change_commands.patch, change_commands_with_system_err_println.patch, change_commands_with_system_err_println_v2.patch At the moment LCF comes with some classes that can be used to run command line to interact with the system. Examples are DBCreate, DBDrop and LockClean. I wanted to create a class that rebuilds my complete environment. So dropping a database, creating a database, cleaning the synch folder, registering agents, etc. Due to the structure of the classes with all the logic in the main method, I could not easily reuse these classes. In the patch I submit with issue I have refactored the current solution in a better reuseable solution that can still be called command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901432#action_12901432 ] Karl Wright commented on CONNECTORS-92: --- This proposed change has a number of features I don't understand the reasons for: (1) Breaking up modules and putting pieces of that all over the place (2) Taking jetty-runner out of framework (3) Introducing a src directory under each of the framework components (4) Moving the tests so far away from the code they are related to Can you describe your logic for this reorganization? Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: Screen shot 2010-08-23 at 16.31.07.png I am looking at the current project structure. If we want to make another build tool available I think we need to change the directory structure. I tried to place a suggestion in an image. Can you please have a look at it. If we agree that this is a good way to go, than I will continue to work on a patch. Which might be a bit hard with all these changing directories, but I'll do my best to at least get an idea whether it would be working. So I have three questions: - Do you want to move to maven or put maven next to ant? - Do you prefer another build mechanism [ant with ivy, gradle, maven3] - Do you have an idea about the amount of scripts that need to be changed if we change the project structure The image of a possible project layout (that is based on the maven standards) is attached to the issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901436#action_12901436 ] Karl Wright commented on CONNECTORS-92: --- Re: build preferences Continuing to have an ant build is actually pretty important for some modes of delivery. I'm specifically thinking of debian and Ubuntu packaging here. Maven does not work well with these packaging schemes because it's too all-encompassing. We therefore need a way of doing builds locally, without pulling things down from a mirror. My original thought was that we'd have multiple layers - ant being the most basic, with a maven wrapper available to pull down what the ant build needed, and have the maven build call ant underneath. I don't know how realistic that is, but it does solve all the problems if it can be done that way. Move from ant to maven or other build system with decent library management --- Key: CONNECTORS-92 URL: https://issues.apache.org/jira/browse/CONNECTORS-92 Project: Apache Connectors Framework Issue Type: Wish Components: Build Reporter: Jettro Coenradie Attachments: Screen shot 2010-08-23 at 16.31.07.png
[jira] Commented: (CONNECTORS-91) Making the initialization commands more useable
[ https://issues.apache.org/jira/browse/CONNECTORS-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898920#action_12898920 ] Karl Wright commented on CONNECTORS-91: --- It looks like this is simply using class-inheritance to separate out common functionality. As such, I'm in favor of including this contribution. Are there any subtleties I am missing? Making the initialization commands more useable --- Key: CONNECTORS-91 URL: https://issues.apache.org/jira/browse/CONNECTORS-91 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Jettro Coenradie Fix For: LCF Release 0.5 Attachments: commandsPatch.patch At the moment LCF comes with some classes that can be run from the command line to interact with the system. Examples are DBCreate, DBDrop and LockClean. I wanted to create a class that rebuilds my complete environment: dropping a database, creating a database, cleaning the synch folder, registering agents, etc. Because these classes keep all their logic in the main method, I could not easily reuse them. In the patch I submit with this issue I have refactored the current solution into a more reusable one that can still be called from the command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
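For readers following the CONNECTORS-91 thread, here is a minimal Java sketch of the kind of refactoring discussed above: logic moved out of main() into an inherited instance method so that several commands can be chained. The class names (ReusableCommand, DropDatabaseStep, CreateDatabaseStep, RebuildEnvironment) are hypothetical and are not taken from commandsPatch.patch; the database work is replaced by placeholder log entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class (not from the patch): the work is an instance
// method, so callers do not have to go through main().
abstract class ReusableCommand {
    // Command-specific step, supplied by each subclass.
    protected abstract void doExecute(List<String> log);

    // Shared wrapper around every command; common setup would live here.
    public final void execute(List<String> log) {
        log.add("begin " + getClass().getSimpleName());
        doExecute(log);
        log.add("end " + getClass().getSimpleName());
    }
}

class DropDatabaseStep extends ReusableCommand {
    @Override
    protected void doExecute(List<String> log) {
        log.add("drop database (placeholder)");
    }

    // The command-line entry point is kept, but it only delegates.
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        new DropDatabaseStep().execute(log);
        log.forEach(System.out::println);
    }
}

class CreateDatabaseStep extends ReusableCommand {
    @Override
    protected void doExecute(List<String> log) {
        log.add("create database (placeholder)");
    }
}

// Composite "rebuild" command that reuses the individual steps in order.
class RebuildEnvironment {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        new DropDatabaseStep().execute(log);
        new CreateDatabaseStep().execute(log);
        return log;
    }
}
```

The point of the shape is only that main() becomes a thin wrapper, so a rebuild-everything class can call the same code paths that the command line uses.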
[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891533#action_12891533 ] Karl Wright commented on CONNECTORS-55: --- MVCC is the feature that suggests greater concurrency (and, hence, greater performance). Bundle database server with LCF packaged product Key: CONNECTORS-55 URL: https://issues.apache.org/jira/browse/CONNECTORS-55 Project: Lucene Connector Framework Issue Type: Sub-task Components: Installers Reporter: Jack Krupansky The current requirement that the user install and deploy a PostgreSQL server complicates the installation and deployment of LCF for the user. Installation and deployment of LCF should be as simple as Solr itself. QuickStart is great for the low-end and basic evaluation, but a comparable level of simplified installation and deployment is still needed for full-blown, high-end environments that need the full performance of a PostgreSQL-class database server. So, PostgreSQL should be bundled with the packaged release of LCF so that installation and deployment of LCF will automatically install and deploy a subset of the full PostgreSQL distribution that is sufficient for the needs of LCF. Starting LCF, with or without the LCF UI, should automatically start the database server. Shutting down LCF should also shut down the database server process. A typical use case would be for a non-developer who is comfortable with Solr and simply wants to crawl documents from, for example, a SharePoint repository and feed them into Solr. QuickStart should work well for the low end or in the early stages of evaluation, but the user would prefer to evaluate the real thing with something resembling a production crawl of thousands of documents. Such a user might not be a hard-core developer or be comfortable fiddling with a lot of software components simply to do one conceptually simple operation. 
It should still be possible for the user to supply database server settings to override the defaults, but the LCF package should have all of the best-practice settings deemed appropriate for use with LCF. One downside is that installation and deployment will be platform-specific since there are multiple processes and PostgreSQL itself requires a platform-specific installation. This proposal presumes that PostgreSQL is the best option for the foreseeable future, but nothing here is intended to preclude support for other database servers in future releases. This proposal should not have any impact on QuickStart packaging or deployment. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-76) Document Web Connector configuration/specification API pieces
Document Web Connector configuration/specification API pieces - Key: CONNECTORS-76 URL: https://issues.apache.org/jira/browse/CONNECTORS-76 Project: Lucene Connector Framework Issue Type: Improvement Components: Documentation Reporter: Karl Wright Priority: Minor Need to document web connector - specific API objects and commands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CONNECTORS-58) Mini-API to initially configure default connections and example jobs for file system and web crawl
[ https://issues.apache.org/jira/browse/CONNECTORS-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-58: -- Priority: Minor (was: Major) Component/s: Examples (was: Framework core) I'm going to put this in a new category called examples. Mini-API to initially configure default connections and example jobs for file system and web crawl - Key: CONNECTORS-58 URL: https://issues.apache.org/jira/browse/CONNECTORS-58 Project: Lucene Connector Framework Issue Type: Sub-task Components: Examples Reporter: Jack Krupansky Priority: Minor Creating a basic connection setup to do a relatively simple crawl for a file system or web can be a daunting task for someone new to LCF. So, it would be nice to have a scripting file that supports an abbreviated API (subset of the full API discussed in CONNECTORS-56) sufficient to create a default set of connections and example jobs that the new user can choose from. Beyond this initial need, this script format might be a useful form to dump all of the connections and jobs in the LCF database in a form that can be used to recreate an LCF configuration. Kind of a dump and reload capability. That in fact might be how the initial example script gets created. Those are two distinct use cases, but could utilize the same feature. The example script could have example jobs to crawl a subdirectory of LCF, crawl the LCF wiki, etc. There could be more than one script. There might be example scripts for each form of connector. This capability should be available for both QuickStart and the general release of LCF. As just one possibility, the script format might be a sequence of JSON expressions, each with an initial string analogous to a servlet path to specify the operation to be performed, followed by the JSON form of the connection or job or other LCF object. Or, some other format might be more suitable. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. 
[jira] Updated: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API
[ https://issues.apache.org/jira/browse/CONNECTORS-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-50: -- Component/s: (was: Framework core) Moving this out of core, since it's a planning ticket not a software issue. Proposal for initial two releases of LCF, including packaged product and full API - Key: CONNECTORS-50 URL: https://issues.apache.org/jira/browse/CONNECTORS-50 Project: Lucene Connector Framework Issue Type: New Feature Reporter: Jack Krupansky Currently, LCF has a relatively high bar for evaluation and use, requiring developer expertise. Also, although LCF has a comprehensive UI, it is not currently packaged for use as a crawling engine for advanced applications. A small set of individual feature requests are needed to address these issues. They are summarized briefly to show how they fit together for two initial releases of LCF, but will be broken out into individual LCF Jira issues. Goals: 1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as Solr is today) 2. LCF as a toolkit for developers needing customized crawling and repository access 3. An API-based crawling engine that can be integrated with applications (as Aperture is today) Larger goals: 1. Make it very easy for users to evaluate LCF. 2. Make it very easy for developers to customize LCF. 3. Make it very easy for applications to fully manage and control LCF in operation. Two phases: 1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it LCF 0.5. 2) API-based crawling engine for applications for which the UI might not be appropriate. Call it LCF 1.0. Phase 1 --- LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later. It would contain roughly the features that are currently in place or currently underway, plus a little more. Specifically, LCF 0.5 would contain these additional capabilities: 1. Plug-in architecture for connectors (CONNECTORS-40 - DONE) 2. 
Packaged app ready to run with embedded Jetty app server (CONNECTORS-59) 3. Bundled with database - PostgreSQL or Derby - ready to run without additional manual setup (CONNECTORS-55) 4. Mini-API to initially configure default connections and example jobs for file system and web crawl (CONNECTORS-58) 5. Agent process started automatically (CONNECTORS-60) 6. Solr output connector option to commit at end of job, by default (CONNECTORS-57) Installation and basic evaluation of LCF would be essentially as simple as Solr is today. The example connections and jobs would permit the user to initiate example crawls of a file system example directory and an example web on the LCF web site with just a couple of clicks (as opposed to the detailed manual setup required today to create repository and output connections and jobs). It is worth considering whether the SharePoint connector could also be included as part of the default package. Users could then add additional connectors and repositories and jobs as desired. Timeframe for release? Level of effort? Phase 2 --- The essence of Phase 2 is that LCF would be split to allow direct, full API access to LCF as a crawling engine, in addition to the full LCF UI. Call this LCF 1.0. Specifically, LCF 1.0 would contain these additional capabilities: 1. Full API for LCF as a crawling engine (CONNECTORS-56) 2. LCF can be bundled within an app (CONNECTORS-61) 3. LCF event and activity notification for full control by an application (CONNECTORS-41) Overall, LCF will offer roughly the same crawling capabilities as with LCF 0.5, plus whatever bug fixes and minor enhancements might also be added. Timeframe for release? Level of effort? - Issues: - Can we package PostgreSQL with LCF so LCF can set it up? - Or do we need Derby for that purpose? - Managing multiple processes (UI, database, agent, app processes) - What exactly would the API look like? (URL, XML, JSON, YAML?) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (CONNECTORS-60) Agent process should be started automatically
[ https://issues.apache.org/jira/browse/CONNECTORS-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-60: -- Priority: Minor (was: Major) Description: LCF as it exists today is a bit too complex to run for an average user, especially with a separate agent process for crawling. LCF should be as easy to run as Solr is today. QuickStart is a good move in this direction, but the same user-visible simplicity is needed for full LCF. The separate agent process is a reasonable design for execution, but a little too cumbersome for the average user to manage. Unfortunately, it is expected that starting up a multi-process application will require platform-specific scripting. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. KDW - this functionality is already present; however the documentation is not adequate to help people figure out how to do it. So I'm moving this to Documentation and treating it as a doc bug. was: LCF as it exists today is a bit too complex to run for an average user, especially with a separate agent process for crawling. LCF should be as easy to run as Solr is today. QuickStart is a good move in this direction, but the same user-visible simplicity is needed for full LCF. The separate agent process is a reasonable design for execution, but a little too cumbersome for the average user to manage. Unfortunately, it is expected that starting up a multi-process application will require platform-specific scripting. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. 
Component/s: Documentation (was: Framework agents process) Agent process should be started automatically - Key: CONNECTORS-60 URL: https://issues.apache.org/jira/browse/CONNECTORS-60 Project: Lucene Connector Framework Issue Type: Sub-task Components: Documentation Reporter: Jack Krupansky Priority: Minor LCF as it exists today is a bit too complex to run for an average user, especially with a separate agent process for crawling. LCF should be as easy to run as Solr is today. QuickStart is a good move in this direction, but the same user-visible simplicity is needed for full LCF. The separate agent process is a reasonable design for execution, but a little too cumbersome for the average user to manage. Unfortunately, it is expected that starting up a multi-process application will require platform-specific scripting. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. KDW - this functionality is already present; however the documentation is not adequate to help people figure out how to do it. So I'm moving this to Documentation and treating it as a doc bug. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-59) Packaged app ready to run with embedded Jetty app server
[ https://issues.apache.org/jira/browse/CONNECTORS-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-59. --- Resolution: Fixed I am unaware of any lingering issues with the QuickStart work. Packaged app ready to run with embedded Jetty app server - Key: CONNECTORS-59 URL: https://issues.apache.org/jira/browse/CONNECTORS-59 Project: Lucene Connector Framework Issue Type: Sub-task Components: Framework core Reporter: Jack Krupansky Many potential users of LCF are not necessarily sophisticated developers who are prepared to work with code, but are able to install packaged software, much as Solr is currently distributed. QuickStart for LCF is a good move in this direction, but similar packaging is needed for full LCF with a production database server. This issue focuses on assuring that full LCF is released as a packaged app suitable for download and immediate use without any additional software development expertise required. Database packaging has already been called out as a distinct issue (CONNECTORS-55), so this issue is more of a catch-all for any lingering work needed to address support for full LCF as a packaged app. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-61) Support bundling of LCF with an app
[ https://issues.apache.org/jira/browse/CONNECTORS-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887806#action_12887806 ] Karl Wright commented on CONNECTORS-61: --- I'm tempted to close this issue because (a) there is absolutely no reason anyone competent cannot bundle lcf with an app today, and (b) it is completely unclear what, if anything, the 'fix' would look like. A specific statement of an actual concrete problem is the only thing that will prevent me from closing this. --- original message --- From: ext Jack Krupansky (JIRA) j...@apache.org Subject: [jira] Created: (CONNECTORS-61) Support bundling of LCF with an app Date: July 12, 2010 Time: 2:48:11 PM Support bundling of LCF with an app --- Key: CONNECTORS-61 URL: https://issues.apache.org/jira/browse/CONNECTORS-61 Project: Lucene Connector Framework Issue Type: Sub-task Components: Framework core Reporter: Jack Krupansky It should be possible for an application developer to bundle LCF with an application to facilitate installation and deployment of the application in conjunction with LCF. This may (or may not) be as simple as providing appropriate jar files and documentation for how to use them, but there may be other components or scripts needed. There are two options: 1) include the LCF UI along with the other LCF processes, and 2) exclude the LCF UI and include only the other processes that can be controlled via the full API. The database server would be included. The web app server would be optional since the application may have its own choice of web app server. One use case is bundling LCF with Solr or a Solr-based application. Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886717#action_12886717 ] Karl Wright commented on CONNECTORS-55: --- Mark, If your concern is about installing LCF, read the Quick Start part of the build/deploy page. You check out, build, and run. Derby-based. Nothing else to install. Not hard really. Bundle database server with LCF packaged product Key: CONNECTORS-55 URL: https://issues.apache.org/jira/browse/CONNECTORS-55 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Jack Krupansky
[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886722#action_12886722 ] Karl Wright commented on CONNECTORS-55: --- forcing the user to pick the right/acceptable release of PostgreSQL to install is error prone and a support headache Yup. It is. Problem is that products/versions get security fixes, CVE's, end-of-life notices, etc. It is beyond the scope of LCF to try and control all that - we'd be buying a whole new level of support headache, believe me. Bundle database server with LCF packaged product Key: CONNECTORS-55 URL: https://issues.apache.org/jira/browse/CONNECTORS-55 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Jack Krupansky
[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886730#action_12886730 ] Karl Wright commented on CONNECTORS-55: --- The quick-start even takes care of connector registration for you, so executecommand is not needed even then. What you *don't* get to do is use the command-based API to control LCF; that's not going to work in the single-process model. By the way, hsqldb is apparently limited to a 16GB database (version 2.0). That's not very much. Bundle database server with LCF packaged product Key: CONNECTORS-55 URL: https://issues.apache.org/jira/browse/CONNECTORS-55 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Jack Krupansky
[jira] Commented: (CONNECTORS-38) There should be an LCF startup path that uses Jetty for running lcf-crawler-ui and lcf-authority-service
[ https://issues.apache.org/jira/browse/CONNECTORS-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885746#action_12885746 ] Karl Wright commented on CONNECTORS-38: --- Code complete. There's now a dist/example directory, and you run lcf with the command java -jar start.jar from that directory, just like Solr. Documentation needs updating, but otherwise this ticket is complete. There should be an LCF startup path that uses Jetty for running lcf-crawler-ui and lcf-authority-service Key: CONNECTORS-38 URL: https://issues.apache.org/jira/browse/CONNECTORS-38 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Integrating with Jetty would allow LCF to be deployed in simple cases without requiring Tomcat, which would simplify the setup in such cases. This of course should not be construed as removing the support for Tomcat-style web applications. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-38) There should be an LCF startup path that uses Jetty for running lcf-crawler-ui and lcf-authority-service
[ https://issues.apache.org/jira/browse/CONNECTORS-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884651#action_12884651 ] Karl Wright commented on CONNECTORS-38: --- I've started to look at what would be necessary to perform this work. If the quick-start implementation will be using embedded derby, then it must run in a single process (or derby is not happy at all). That would include the crawler ui, the authority service, and the crawler daemon. If jetty can be configured to run in such a way as to use system classes for all of its web applications, then in theory it should be possible to put together an LCF which, on startup, spawns the crawler daemon before starting up jetty within the same process. For the classloader issue, there seems to be a considerable degree of configuration flexibility, as described here: http://docs.codehaus.org/display/JETTY/Classloading The rest of the problem, i.e. starting and stopping jetty programmatically, may be doable based on this page: http://docs.codehaus.org/display/JETTY/Embedding+Jetty However, (1) it's really not clear what model I should be using. I basically need to be able to fire up two entire web applications, which don't need to be in wars necessarily, but which certainly need to contain JSPs, .css files, .jpg's, tld's, and other standard webish content. And (2), it's not clear if/how you properly perform Jetty shutdown using the chosen model. Any advice welcome. There should be an LCF startup path that uses Jetty for running lcf-crawler-ui and lcf-authority-service Key: CONNECTORS-38 URL: https://issues.apache.org/jira/browse/CONNECTORS-38 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Integrating with Jetty would allow LCF to be deployed in simple cases without requiring Tomcat, which would simplify the setup in such cases. 
This of course should not be construed as removing the support for Tomcat-style web applications. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
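The single-process startup order described in the CONNECTORS-38 comment (spawn the crawler daemon first, then start the web server in the same JVM, and shut down in reverse) can be sketched roughly as follows. This is not Jetty code: as a stand-in it uses the JDK's built-in com.sun.net.httpserver.HttpServer so the example has no external dependencies, and the class name SingleProcessLauncher, the placeholder crawler loop, and the placeholder response body are all assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical launcher: one JVM hosts both the crawler thread and the web server.
class SingleProcessLauncher {
    private final Thread crawler;
    private HttpServer server;
    private volatile boolean running = true;

    SingleProcessLauncher() {
        // Placeholder for the crawler daemon: it just idles until told to stop.
        crawler = new Thread(() -> {
            while (running) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    return; // interrupted by stop(): leave the loop
                }
            }
        }, "crawler-daemon");
        crawler.setDaemon(true);
    }

    // Start the crawler first, then the web server, in the same process.
    void start() {
        crawler.start();
        try {
            // Port 0 asks the OS for any free port on the loopback interface.
            server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/", exchange -> {
            byte[] body = "ui placeholder".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }

    int port() {
        return server.getAddress().getPort();
    }

    boolean crawlerAlive() {
        return crawler.isAlive();
    }

    // Shut down in reverse order: web server first, then the crawler thread.
    void stop() {
        server.stop(0);
        running = false;
        crawler.interrupt();
        try {
            crawler.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With Jetty the server object and its start and stop calls would differ, and the classloading questions raised in the comment are not addressed here; the sketch only shows the ordering within one process.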
[jira] Commented: (CONNECTORS-40) Classloader-based plug-in architecture would permit LCF to be prebuilt
[ https://issues.apache.org/jira/browse/CONNECTORS-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884285#action_12884285 ] Karl Wright commented on CONNECTORS-40: --- Classloader has been added, and the configuration file format is now XML. The wiki connector description pages have been updated. Next: - Change the build process and connector delivery model to take advantage of the classloader - Change the build process wiki document to reflect all changes Classloader-based plug-in architecture would permit LCF to be prebuilt -- Key: CONNECTORS-40 URL: https://issues.apache.org/jira/browse/CONNECTORS-40 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Assignee: Karl Wright The LCF architecture at this point requires interaction with the build script in order to add connectors. This is because the connector JSPs and jars need to be added to the appropriate war files. However, there is another architectural option that would eliminate this need, which is to use a custom classloader to pull components from jars that are placed in a specific directory or directories. In order for this to work, however, the UI components of every connector must become part of a jar. That implies that they will need to cease being JSPs, and become instead methods of each connector class. (There is no proscription against using something like Velocity for assembling the necessary output for a connector, however.) Limiting the backwards-compatibility impact of this change will be difficult, especially after a first release is made, so it seems clear that any change along these lines should be attempted before version 1.0 is released. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
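As an illustration of the idea in the CONNECTORS-40 description (pulling components from jars placed in a specific directory), here is a minimal Java sketch built on the standard java.net.URLClassLoader. It is not the classloader that was committed for this ticket: the class name ConnectorJarLoader and its methods are hypothetical, and details such as multiple directories or reloading are left out.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a classloader over every jar in one directory.
class ConnectorJarLoader {
    // Only files ending in ".jar" are treated as connector jars.
    static boolean isJar(String fileName) {
        return fileName.endsWith(".jar");
    }

    // Collect a URL for each jar directly under dir (empty if dir is missing).
    static URL[] jarUrls(File dir) {
        List<URL> urls = new ArrayList<>();
        File[] files = dir.listFiles((d, name) -> isJar(name));
        if (files != null) {
            Arrays.sort(files); // stable order, independent of the file system
            for (File f : files) {
                try {
                    urls.add(f.toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new IllegalStateException("Bad jar path: " + f, e);
                }
            }
        }
        return urls.toArray(new URL[0]);
    }

    // Classes not found by the parent loader are then looked up in the jars.
    static URLClassLoader forDirectory(File dir) {
        return new URLClassLoader(jarUrls(dir), ConnectorJarLoader.class.getClassLoader());
    }

    // Instantiate a class by name through the given loader, using its no-arg constructor.
    static Object instantiate(ClassLoader loader, String className) {
        try {
            return Class.forName(className, true, loader)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}
```

A caller would point forDirectory at the directory holding the connector jars and pass a connector class name read from configuration to instantiate; both the directory and the class name come from outside this sketch.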
[jira] Resolved: (CONNECTORS-40) Classloader-based plug-in architecture would permit LCF to be prebuilt
[ https://issues.apache.org/jira/browse/CONNECTORS-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-40. --- Resolution: Fixed All code committed. Related tickets (such as removing the need for connector-specific -D switches) still in progress. Classloader-based plug-in architecture would permit LCF to be prebuilt -- Key: CONNECTORS-40 URL: https://issues.apache.org/jira/browse/CONNECTORS-40 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Assignee: Karl Wright The LCF architecture at this point requires interaction with the build script in order to add connectors. This is because the connector JSPs and jars need to be added to the appropriate war files. However, there is another architectural option that would eliminate this need, which is to use a custom classloader to pull components from jars that are placed in a specific directory or directories. In order for this to work, however, the UI components of every connector must become part of a jar. That implies that they will need to cease being JSPs, and become instead methods of each connector class. (There is no proscription against using something like Velocity for assembling the necessary output for a connector, however.) Limiting the backwards-compatibility impact of this change will be difficult, especially after a first release is made, so it seems clear that any change along these lines should be attempted before version 1.0 is released. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-47) Framework UI seems to call connector post processing more than needed
[ https://issues.apache.org/jira/browse/CONNECTORS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-47. --- Assignee: Karl Wright Resolution: Fixed r959393. Refactor as needed to solidify the contract between edit pages and the execute.jsp post page. Framework UI seems to call connector post processing more than needed - Key: CONNECTORS-47 URL: https://issues.apache.org/jira/browse/CONNECTORS-47 Project: Lucene Connector Framework Issue Type: Bug Components: Framework crawler agent Reporter: Karl Wright Assignee: Karl Wright Priority: Minor Connector form post processing is currently invoked both in execute.jsp (which is the target of all form posts), as well as in individual edit pages (such as editconfig.jsp and editjob.jsp). Unless a reason can be found for why this is done, the individual edit page calls should be removed, since they are by definition superfluous. Possible reasons it was done this way were: (a) that code predates execute.jsp (b) some other functionality, e.g. copy or posting of certificates, needs it At any rate, this should be looked at after the bulk of CONNECTORS-40 related changes are committed to trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-51) Reduce the number of required -D defines by using System.setProperty() in the appropriate places
Reduce the number of required -D defines by using System.setProperty() in the appropriate places Key: CONNECTORS-51 URL: https://issues.apache.org/jira/browse/CONNECTORS-51 Project: Lucene Connector Framework Issue Type: Improvement Components: JCIFS connector Reporter: Karl Wright Priority: Minor The JCIFS connector requires a fair number of -D switches in the java startup in order to do the right things. This is largely because jcifs.jar is constructed this way. It may be possible, however, to eliminate these -D's by judicious static use of System.setProperty() within the appropriate connector class, provided we presume that jcifs classes will never be loaded prior to the jcifs connector classes being loaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
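The approach proposed in CONNECTORS-51 could look roughly like the sketch below. It is only an illustration: the class name JcifsDefaultsSketch, the helper methods, and the chosen property values are assumptions, and the ticket does not list which -D switches are actually involved. The static initializer only helps if this class is initialized before any jcifs class reads its configuration, which is the same presumption the ticket makes.

```java
// Hypothetical connector-side class; the name and the values are illustrative only.
class JcifsDefaultsSketch {
    static {
        // Runs once, when this class is initialized. This must happen before
        // any jcifs class reads its settings from the system properties.
        setIfAbsent("jcifs.smb.client.soTimeout", "150000");
        setIfAbsent("jcifs.smb.client.responseTimeout", "120000");
    }

    // Keep any value the operator already supplied with -D on the command line.
    static void setIfAbsent(String name, String value) {
        if (System.getProperty(name) == null) {
            System.setProperty(name, value);
        }
    }

    // Calling this static method forces the static initializer above to run.
    static void ensureLoaded() {
    }
}
```

Leaving an existing value untouched means a -D switch given at startup still overrides the built-in default, so the switches become optional rather than unsupported.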
[jira] Assigned: (CONNECTORS-40) Classloader-based plug-in architecture would permit LCF to be prebuilt
[ https://issues.apache.org/jira/browse/CONNECTORS-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-40: - Assignee: Karl Wright Classloader-based plug-in architecture would permit LCF to be prebuilt -- Key: CONNECTORS-40 URL: https://issues.apache.org/jira/browse/CONNECTORS-40 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Assignee: Karl Wright The LCF architecture at this point requires interaction with the build script in order to add connectors. This is because the connector JSPs and jars need to be added to the appropriate war files. However, there is another architectural option that would eliminate this need, which is to use a custom classloader to pull components from jars that are placed in a specific directory or directories. In order for this to work, however, the UI components of every connector must become part of a jar. That implies that they will need to cease being JSPs, and become instead methods of each connector class. (There is no proscription against using something like Velocity for assembling the necessary output for a connector, however.) Limiting the backwards-compatibility impact of this change will be difficult, especially after a first release is made, so it seems clear that any change along these lines should be attempted before version 1.0 is released. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-40) Classloader-based plug-in architecture would permit LCF to be prebuilt
[ https://issues.apache.org/jira/browse/CONNECTORS-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883595#action_12883595 ] Karl Wright commented on CONNECTORS-40: --- The UI changes have been made, largely hand-tested, and merged into trunk. Next steps for this ticket include: - Updating the wiki page on how to build a connector - Writing the classloader implementation that will actually allow for plugin loading Classloader-based plug-in architecture would permit LCF to be prebuilt -- Key: CONNECTORS-40 URL: https://issues.apache.org/jira/browse/CONNECTORS-40 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright The LCF architecture at this point requires interaction with the build script in order to add connectors. This is because the connector JSPs and jars need to be added to the appropriate war files. However, there is another architectural option that would eliminate this need, which is to use a custom classloader to pull components from jars that are placed in a specific directory or directories. In order for this to work, however, the UI components of every connector must become part of a jar. That implies that they will need to cease being JSPs, and become instead methods of each connector class. (There is no proscription against using something like Velocity for assembling the necessary output for a connector, however.) Limiting the backwards-compatibility impact of this change will be difficult, especially after a first release is made, so it seems clear that any change along these lines should be attempted before version 1.0 is released. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-49) Solr connector metadata and id field can collide, causing multiple id fields to be passed in
[ https://issues.apache.org/jira/browse/CONNECTORS-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-49. --- Resolution: Fixed r959167. Tested, except in the context of an actual crawl. Solr connector metadata and id field can collide, causing multiple id fields to be passed in Key: CONNECTORS-49 URL: https://issues.apache.org/jira/browse/CONNECTORS-49 Project: Lucene Connector Framework Issue Type: Bug Components: Lucene/SOLR connector Reporter: Karl Wright Assignee: Karl Wright If a document has a metadata field called id, or ID, or Id, or any such thing, the Solr connector will blithely send both the document id and the metadata id along to Solr, which will then crap out with an error. The solution is to map the metadata id field to something else, which should be determined by the solr connection definition. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-49) Solr connector metadata and id field can collide, causing multiple id fields to be passed in
[ https://issues.apache.org/jira/browse/CONNECTORS-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881604#action_12881604 ] Karl Wright commented on CONNECTORS-49: --- As per discussions in connectors-user, it's probably important to also provide a declaration of the name of the solr id field in the configuration, with a default value of id. Longer term, maybe Solr can learn to accept a generic notion of primary key, but that's as yet undecided. Solr connector metadata and id field can collide, causing multiple id fields to be passed in Key: CONNECTORS-49 URL: https://issues.apache.org/jira/browse/CONNECTORS-49 Project: Lucene Connector Framework Issue Type: Bug Components: Lucene/SOLR connector Reporter: Karl Wright Assignee: Karl Wright If a document has a metadata field called id, or ID, or Id, or any such thing, the Solr connector will blithely send both the document id and the metadata id along to Solr, which will then crap out with an error. The solution is to map the metadata id field to something else, which should be determined by the solr connection definition. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-49) Solr connector metadata and id field can collide, causing multiple id fields to be passed in
Solr connector metadata and id field can collide, causing multiple id fields to be passed in Key: CONNECTORS-49 URL: https://issues.apache.org/jira/browse/CONNECTORS-49 Project: Lucene Connector Framework Issue Type: Bug Components: Lucene/SOLR connector Reporter: Karl Wright If a document has a metadata field called id, or ID, or Id, or any such thing, the Solr connector will blithely send both the document id and the metadata id along to Solr, which will then crap out with an error. The solution is to map the metadata id field to something else, which should be determined by the solr connection definition. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
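The remapping described in CONNECTORS-49 might be sketched as follows. This is not the committed fix: the class IdFieldMapperSketch, the method remap, and the replacement field name used in the usage note are hypothetical, and metadata is simplified to one string value per field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Solr connector.
class IdFieldMapperSketch {
    // Returns a copy of the metadata in which any field whose name matches the
    // id field (ignoring case) is renamed, so only the real document id is sent
    // under the id field name. If several variants such as "id" and "ID" are
    // present, the last one wins in this simplified version.
    static Map<String, String> remap(Map<String, String> metadata,
                                     String idField, String renamedField) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            String name = e.getKey().equalsIgnoreCase(idField) ? renamedField : e.getKey();
            out.put(name, e.getValue());
        }
        return out;
    }
}
```

For example, remap(metadata, "id", "metadata_id") would pass a "title" field through unchanged and move a metadata field named "ID" to "metadata_id". Both names would come from the connection definition, as the ticket suggests.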
[jira] Assigned: (CONNECTORS-49) Solr connector metadata and id field can collide, causing multiple id fields to be passed in
[ https://issues.apache.org/jira/browse/CONNECTORS-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-49: - Assignee: Karl Wright Solr connector metadata and id field can collide, causing multiple id fields to be passed in Key: CONNECTORS-49 URL: https://issues.apache.org/jira/browse/CONNECTORS-49 Project: Lucene Connector Framework Issue Type: Bug Components: Lucene/SOLR connector Reporter: Karl Wright Assignee: Karl Wright If a document has a metadata field called id, or ID, or Id, or any such thing, the Solr connector will blithely send both the document id and the metadata id along to Solr, which will then crap out with an error. The solution is to map the metadata id field to something else, which should be determined by the solr connection definition. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-48) SharePoint rules description is incomplete
[ https://issues.apache.org/jira/browse/CONNECTORS-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-48. --- Assignee: Karl Wright Resolution: Fixed Added a section on rule matching and implied rules - hope this helps. SharePoint rules description is incomplete -- Key: CONNECTORS-48 URL: https://issues.apache.org/jira/browse/CONNECTORS-48 Project: Lucene Connector Framework Issue Type: Improvement Components: Documentation Reporter: Karl Wright Assignee: Karl Wright The description of how SharePoint inclusion and exclusion rules work is inadequate for an end user to be able to use the connector effectively. Specifically, it does not explain how the connector matches a rule. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-47) Framework UI seems to call connector post processing more than needed
Framework UI seems to call connector post processing more than needed - Key: CONNECTORS-47 URL: https://issues.apache.org/jira/browse/CONNECTORS-47 Project: Lucene Connector Framework Issue Type: Bug Components: Framework crawler agent Reporter: Karl Wright Priority: Minor Connector form post processing is currently invoked both in execute.jsp (which is the target of all form posts), as well as in individual edit pages (such as editconfig.jsp and editjob.jsp). Unless a reason can be found for why this is done, the individual edit page calls should be removed, since they are by definition superfluous. Possible reasons it was done this way were: (a) that code predates execute.jsp (b) some other functionality, e.g. copy or posting of certificates, needs it At any rate, this should be looked at after the bulk of CONNECTORS-40 related changes are committed to trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-45) Solr connector gives no way to specify the solr core name
Solr connector gives no way to specify the solr core name - Key: CONNECTORS-45 URL: https://issues.apache.org/jira/browse/CONNECTORS-45 Project: Lucene Connector Framework Issue Type: Bug Components: Lucene/SOLR connector Reporter: Karl Wright The Solr Connector allows you to specify everything about the Solr connection except the Solr Core name. A new configuration field should be added, which is optional and defaults to blank, to allow this field to be set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-40) Classloader-based plug-in architecture would permit LCF to be prebuilt
[ https://issues.apache.org/jira/browse/CONNECTORS-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879215#action_12879215 ] Karl Wright commented on CONNECTORS-40: --- The implementation strategy is as follows: (1) Add methods to the connector interfaces to support the UI. These correspond directly to the chunks of UI contributed by each connector that used to be performed by jsps, which used to be located by a naming technique. (Every connector had a family of jsps, e.g. output/connector_name/headerconfig.jsp, output/connector_name/editconfig.jsp, etc.) To do this in a way that will make it possible to easily replace the technology for the framework side of the UI later, I also introduced some interfaces so that there are no direct references to any JSP or servlet classes. (2) Change the framework UI to call the connector methods rather than the old jsp components. (3) Change all individual connectors to discard their JSPs and instead implement the connector methods. Once this preliminary work is done, it should be possible to write a class loader to allow a user (or an installer) to specify a set of paths in which to search for jars. This would make it possible for people to deliver connectors into the system without having to rebuild the war file, which currently is necessary. That, in turn, makes it feasible to prebuild all LCF components and deliver it much like Solr is delivered. The CONNECTORS-40 branch currently contains just the following: - UI method additions to the output connection interface only; - Changes to the framework UI code to call the new methods; - Changes to the GTS output connector to implement the new methods (and remove the old JSPs). The reason this has been checked in at this point is largely as a sanity check. It's a lot easier to change direction when one connector has been done than it would be to change 15 of them. Hope this helps. 
Classloader-based plug-in architecture would permit LCF to be prebuilt -- Key: CONNECTORS-40 URL: https://issues.apache.org/jira/browse/CONNECTORS-40 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright The LCF architecture at this point requires interaction with the build script in order to add connectors. This is because the connector JSPs and jars need to be added to the appropriate war files. However, there is another architectural option that would eliminate this need, which is to use a custom classloader to pull components from jars that are placed in a specific directory or directories. In order for this to work, however, the UI components of every connector must become part of a jar. That implies that they will need to cease being JSPs, and become instead methods of each connector class. (There is no proscription against using something like Velocity for assembling the necessary output for a connector, however.) Limiting the backwards-compatibility impact of this change will be difficult, especially after a first release is made, so it seems clear that any change along these lines should be attempted before version 1.0 is released. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-34) eRoom authority and connector
[ https://issues.apache.org/jira/browse/CONNECTORS-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877408#action_12877408 ] Karl Wright commented on CONNECTORS-34: --- It turns out that EMC has released a new version of eRoom that uses Documentum as an implementation platform. This would imply that no connector needs to be developed, except perhaps to support legacy eRoom installations. Can anyone confirm this story? eRoom authority and connector - Key: CONNECTORS-34 URL: https://issues.apache.org/jira/browse/CONNECTORS-34 Project: Lucene Connector Framework Issue Type: New Feature Reporter: Karl Wright eRoom has a SOAP API which looks like it has enough power to perhaps implement a connector and an authority. The eRoom API url is here (and yes, it is a chinese url, but is legit): https://eroom.abraxas.ch/eroomHelp/en/API_Help/Api.htm#home_api.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-44) Adding metadata support to JDBC connector
[ https://issues.apache.org/jira/browse/CONNECTORS-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877409#action_12877409 ] Karl Wright commented on CONNECTORS-44: --- I think this feature has merit in its own right. I'm a little leery about this becoming a Stellent connector, though, since: (a) it's hardly end-user friendly for users to have to learn the Stellent schema; (b) I'm sure Stellent has some kind of security, and this proposal would not address that. Adding metadata support to JDBC connector - Key: CONNECTORS-44 URL: https://issues.apache.org/jira/browse/CONNECTORS-44 Project: Lucene Connector Framework Issue Type: Improvement Components: JDBC connector Environment: Windows, Oracle 10g, Oracle Universal Content Management System Reporter: Rohan G Patil Priority: Critical Original Estimate: 0.02h Remaining Estimate: 0.02h The metadata for the documents checked in is stored in different fields of the Database. for example created date, Author,Title etc. The BLOB object contains only the text of the document. It would be very helpful if we could add support select Metadata fields (Columns in the database ) while querying the table. The above support would be helpful and make it a substitute for Oracle UCM (Stellent) Connector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CONNECTORS-44) Adding metadata support to JDBC connector
[ https://issues.apache.org/jira/browse/CONNECTORS-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-44: -- Original Estimate: 48h (was: 0.02h) Remaining Estimate: 48h (was: 0.02h) Assignee: Karl Wright Priority: Major (was: Critical) Adding metadata support to JDBC connector - Key: CONNECTORS-44 URL: https://issues.apache.org/jira/browse/CONNECTORS-44 Project: Lucene Connector Framework Issue Type: Improvement Components: JDBC connector Environment: Windows, Oracle 10g, Oracle Universal Content Management System Reporter: Rohan G Patil Assignee: Karl Wright Original Estimate: 48h Remaining Estimate: 48h The metadata for the documents checked in is stored in different fields of the Database. for example created date, Author,Title etc. The BLOB object contains only the text of the document. It would be very helpful if we could add support select Metadata fields (Columns in the database ) while querying the table. The above support would be helpful and make it a substitute for Oracle UCM (Stellent) Connector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-44) Adding metadata support to JDBC connector
[ https://issues.apache.org/jira/browse/CONNECTORS-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-44. --- Resolution: Fixed Committed fix in svn revision 953386. Adding metadata support to JDBC connector - Key: CONNECTORS-44 URL: https://issues.apache.org/jira/browse/CONNECTORS-44 Project: Lucene Connector Framework Issue Type: Improvement Components: JDBC connector Environment: Windows, Oracle 10g, Oracle Universal Content Management System Reporter: Rohan G Patil Assignee: Karl Wright Original Estimate: 48h Remaining Estimate: 48h The metadata for the documents checked in is stored in different fields of the Database. for example created date, Author,Title etc. The BLOB object contains only the text of the document. It would be very helpful if we could add support select Metadata fields (Columns in the database ) while querying the table. The above support would be helpful and make it a substitute for Oracle UCM (Stellent) Connector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CONNECTORS-43) Useless call to String.trim() in org.apache.lcf.ui.util.MultilineParser
[ https://issues.apache.org/jira/browse/CONNECTORS-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-43: - Assignee: Karl Wright Useless call to String.trim() in org.apache.lcf.ui.util.MultilineParser --- Key: CONNECTORS-43 URL: https://issues.apache.org/jira/browse/CONNECTORS-43 Project: Lucene Connector Framework Issue Type: Bug Reporter: Mark Miller Assignee: Karl Wright Priority: Trivial {code} nextString.trim(); {code} should likely be: {code} nextString = nextString.trim(); {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
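The reason the original call is useless is that Java strings are immutable: trim() returns a new string and leaves the receiver unchanged. A small self-contained illustration follows; the class and method names are made up for this note and are not taken from MultilineParser.

```java
// Illustrative only; not code from org.apache.lcf.ui.util.MultilineParser.
class TrimSketch {
    // Buggy form: the trimmed copy is created and immediately discarded.
    static String discardResult(String s) {
        s.trim();
        return s;
    }

    // Fixed form: the trimmed copy replaces the original reference.
    static String keepResult(String s) {
        s = s.trim();
        return s;
    }
}
```

With the input "  a  ", discardResult returns the string unchanged, while keepResult returns "a".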
[jira] Created: (CONNECTORS-40) Classloader-based plug-in architecture would permit LCF to be prebuilt
Classloader-based plug-in architecture would permit LCF to be prebuilt -- Key: CONNECTORS-40 URL: https://issues.apache.org/jira/browse/CONNECTORS-40 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright The LCF architecture at this point requires interaction with the build script in order to add connectors. This is because the connector JSPs and jars need to be added to the appropriate war files. However, there is another architectural option that would eliminate this need, which is to use a custom classloader to pull components from jars that are placed in a specific directory or directories. In order for this to work, however, the UI components of every connector must become part of a jar. That implies that they will need to cease being JSPs, and become instead methods of each connector class. (There is no proscription against using something like Velocity for assembling the necessary output for a connector, however.) Limiting the backwards-compatibility impact of this change will be difficult, especially after a first release is made, so it seems clear that any change along these lines should be attempted before version 1.0 is released. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
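A minimal sketch of the classloader idea from CONNECTORS-40, assuming connectors are delivered as jar files dropped into one or more directories. The class PluginLoaderSketch and its methods are hypothetical and say nothing about how the eventual LCF classloader was written; the sketch uses only the standard java.net.URLClassLoader.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the LCF implementation.
class PluginLoaderSketch {
    // Collects a URL for every *.jar file found directly in the given directories.
    static URL[] jarUrls(List<File> dirs) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        for (File dir : dirs) {
            File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
            if (jars == null) {
                continue; // not a directory, or not readable
            }
            for (File jar : jars) {
                urls.add(jar.toURI().toURL());
            }
        }
        return urls.toArray(new URL[0]);
    }

    // Builds a loader that searches the plug-in jars after the framework's own classes.
    static ClassLoader forDirectories(List<File> dirs) throws MalformedURLException {
        return new URLClassLoader(jarUrls(dirs), PluginLoaderSketch.class.getClassLoader());
    }
}
```

A connector class could then be looked up by name with Class.forName(className, true, loader), which is why the UI pieces of each connector have to live inside the jar rather than in separate JSPs.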
[jira] Created: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.
Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc. --- Key: CONNECTORS-41 URL: https://issues.apache.org/jira/browse/CONNECTORS-41 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Priority: Minor Currently there is no logic that informs an output connection of a job start, end, deletion, or other activity. While this would seem to have little to do with an output connector, this feature has been requested by Jack Krupansky as a potential way of deciding when to tell Solr to commit documents, rather than leave it up to Solr's configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-39) Database abstraction layer does not abstract from transactions
[ https://issues.apache.org/jira/browse/CONNECTORS-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-39. --- Resolution: Fixed Database abstraction layer does not abstract from transactions -- Key: CONNECTORS-39 URL: https://issues.apache.org/jira/browse/CONNECTORS-39 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Assignee: Karl Wright The database abstraction layer in LCF does not permit someone to abstract from transaction management. That responsibility is delegated to a different class, which presumes that transaction management is not database-type dependent. Unfortunately, this is not the case. A better code structure would involve creating an abstract base class that performed the transaction management and caching, and causing all database implementations to be derived from it. Then, abstract methods for transaction begin and end could be readily defined. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
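The structure suggested in CONNECTORS-39 could be sketched like this. All names are hypothetical, the caching mentioned in the ticket is left out, and the nesting rule (only the outermost begin and end reach the database) is an assumption made for the example, not a description of the committed code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: shared transaction bookkeeping lives here, while
// the database-specific begin/commit/rollback steps are abstract.
abstract class DatabaseBaseSketch {
    private int depth = 0;
    private boolean failed = false;

    final void beginTransaction() {
        if (depth == 0) {
            failed = false;
            doBegin();
        }
        depth++;
    }

    // Marks the current transaction so that it is rolled back when it ends.
    final void signalRollback() {
        failed = true;
    }

    final void endTransaction() {
        if (depth == 0) {
            throw new IllegalStateException("no transaction in progress");
        }
        depth--;
        if (depth == 0) {
            if (failed) {
                doRollback();
            } else {
                doCommit();
            }
        }
    }

    protected abstract void doBegin();
    protected abstract void doCommit();
    protected abstract void doRollback();
}

// Stand-in implementation that records what a real database class would execute.
class RecordingDatabaseSketch extends DatabaseBaseSketch {
    final List<String> log = new ArrayList<>();

    protected void doBegin() { log.add("BEGIN"); }
    protected void doCommit() { log.add("COMMIT"); }
    protected void doRollback() { log.add("ROLLBACK"); }
}
```

Each database implementation would then override only the three abstract methods, which is where database-specific transaction behavior can differ.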
[jira] Resolved: (CONNECTORS-35) Need a way to reset LCF when external conditions change
[ https://issues.apache.org/jira/browse/CONNECTORS-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-35. --- Resolution: Fixed Have decided that the current functionality is adequate, and no further work needs to be done. Need a way to reset LCF when external conditions change --- Key: CONNECTORS-35 URL: https://issues.apache.org/jira/browse/CONNECTORS-35 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework agents process, Framework core, Framework crawler agent Reporter: Karl Wright Assignee: Karl Wright When a change is made external to LCF, such as a Solr configuration change, LCF needs some way for a user to signal that that change took place. For example, a button or link on the view output connection page might signal some undefined global change in the target of that output connection. A similar button or link on the repository connection view page might signal a corresponding change to the underlying repository. Clicking the button would do the following things: (1) It would clear the current version string for all documents that passed through that connection. This would guarantee that the documents would be reingested if and when they were processed the next time. (2) It would reset the last job time value for all jobs affected by the connection to zero. This would guarantee that all documents belonging to that job would be rechecked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-37) LCF should use an XML configuration file, not the simple name/value config file it currently has
LCF should use an XML configuration file, not the simple name/value config file it currently has Key: CONNECTORS-37 URL: https://issues.apache.org/jira/browse/CONNECTORS-37 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright LCF's configuration file is limited in what it can specify, and XML configuration files seem to offer more flexibility and are the modern norm. Before backwards compatibility becomes an issue, it may therefore be worth converting the property file reader to use XML rather than name/value format. It would also be nice to be able to fold the logging configuration into the same file, if this seems possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
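For illustration, here is one way a name/value property file could be expressed and read as XML. The element and attribute names (configuration, property, name, value), the class XmlConfigSketch, and the property names in the usage note are all assumptions made for this sketch; the ticket itself does not specify a format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for an assumed XML layout of the form
// <configuration><property name="..." value="..."/></configuration>.
class XmlConfigSketch {
    // Parses the XML text and returns the properties in document order.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getDocumentElement().getElementsByTagName("property");
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            props.put(e.getAttribute("name"), e.getAttribute("value"));
        }
        return props;
    }
}
```

Given a document containing a property element with name "example.database.name" and value "lcfdb", parse would return a map with that single entry. The extra flexibility the ticket is after would come from allowing nested elements that a flat name/value file cannot express.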
[jira] Commented: (CONNECTORS-37) LCF should use an XML configuration file, not the simple name/value config file it currently has
[ https://issues.apache.org/jira/browse/CONNECTORS-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874000#action_12874000 ] Karl Wright commented on CONNECTORS-37: --- Which comes first, chicken or egg? The current properties file specifies quite a bit of stuff about database implementation and access, so obviously that can't go into the database. Also, the pointer to the logging configuration file, and any other file pointers, probably should stay out of the database, since these tend to be local instance configuration rather than global configuration. While I'm sure that there are still *some* configuration parameters that are legitimately global in nature, most of the serious configuration (like connections, authorities, jobs, etc.) are already in the database. So maybe this ticket should read, ... excluding all global configuration information, which should be moved to a database table... The driver behind this ticket, FWIW, is a complaint that configuring LCF requires repeated user interaction with the database - and that user prefers solr-style XML config files instead. I don't necessarily buy that view, but using XML instead of name/value pairs seemed like a wise precaution. ;-) LCF should use an XML configuration file, not the simple name/value config file it currently has Key: CONNECTORS-37 URL: https://issues.apache.org/jira/browse/CONNECTORS-37 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright LCF's configuration file is limited in what it can specify, and XML configuration files seem to offer more flexibility and are the modern norm. Before backwards compatibility becomes an issue, it may therefore be worth converting the property file reader to use XML rather than name/value format. It would also be nice to be able to fold the logging configuration into the same file, if this seems possible. -- This message is automatically generated by JIRA. 
[jira] Commented: (CONNECTORS-37) LCF should use an XML configuration file, not the simple name/value config file it currently has
[ https://issues.apache.org/jira/browse/CONNECTORS-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874035#action_12874035 ] Karl Wright commented on CONNECTORS-37: --- I am not happy with the idea of configuration living in both the database and in an XML file. The idea that you can somehow read the XML configuration just once the first time LCF is started seems rife with potential problems. Far from improving the user experience, I think that the proposed design would instead create enormous confusion. Perhaps the problem is that Mr. Krupansky is attempting to do too much with a single configuration file here. It would be perfectly reasonable to introduce a "read setup information" command that would read what is effectively a sequence of commands from an XML file. However, that command file would be an execute-once kind of affair - although it could be coded in such a way as to ignore the definition of entities that already exist in the database. Nevertheless, such a file would have a very different usage pattern than the configuration file as it exists today, so I'd have a lot of concern using the same configuration file for both purposes. LCF should use an XML configuration file, not the simple name/value config file it currently has Key: CONNECTORS-37 URL: https://issues.apache.org/jira/browse/CONNECTORS-37 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright LCF's configuration file is limited in what it can specify, and XML configuration files seem to offer more flexibility and are the modern norm. Before backwards compatibility becomes an issue, it may therefore be worth converting the property file reader to use XML rather than name/value format. It would also be nice to be able to fold the logging configuration into the same file, if this seems possible. -- This message is automatically generated by JIRA.
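To make the name/value versus XML distinction in CONNECTORS-37 concrete, here is a minimal sketch using only the JDK's java.util.Properties, which can read both forms. This is not LCF's configuration reader, and the property name example.database.name is a made-up placeholder; the XML layout shown is the JDK's own properties format, which may well differ from whatever format the ticket ends up adopting.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Illustration only: this is not LCF's configuration reader.
final class ConfigFormatsSketch {

    // Parse the classic name/value form ("key=value", one per line).
    static Properties fromNameValue(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Parse the JDK's XML properties form (<properties><entry key="...">).
    static Properties fromXml(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical property name, chosen only for this example.
        String flat = "example.database.name=lcfdb\n";
        String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
          + "<properties>\n"
          + "  <entry key=\"example.database.name\">lcfdb</entry>\n"
          + "</properties>\n";
        // Both forms carry the same single setting.
        System.out.println(fromNameValue(flat).equals(fromXml(xml)));
    }
}
```

Both readers yield the same key/value set here; the practical difference is that an XML form leaves room for nested or repeated elements later, which a flat name/value file cannot express.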
[jira] Assigned: (CONNECTORS-39) Database abstraction layer does not abstract from transactions
[ https://issues.apache.org/jira/browse/CONNECTORS-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-39: - Assignee: Karl Wright Database abstraction layer does not abstract from transactions -- Key: CONNECTORS-39 URL: https://issues.apache.org/jira/browse/CONNECTORS-39 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright Assignee: Karl Wright The database abstraction layer in LCF does not permit someone to abstract from transaction management. That responsibility is delegated to a different class, which presumes that transaction management is not database-type dependent. Unfortunately, this is not the case. A better code structure would involve creating an abstract base class that performed the transaction management and caching, and causing all database implementations to be derived from it. Then, abstract methods for transaction begin and end could be readily defined. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-39) Database abstraction layer does not abstract from transactions
Database abstraction layer does not abstract from transactions -- Key: CONNECTORS-39 URL: https://issues.apache.org/jira/browse/CONNECTORS-39 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Karl Wright The database abstraction layer in LCF does not permit someone to abstract from transaction management. That responsibility is delegated to a different class, which presumes that transaction management is not database-type dependent. Unfortunately, this is not the case. A better code structure would involve creating an abstract base class that performed the transaction management and caching, and causing all database implementations to be derived from it. Then, abstract methods for transaction begin and end could be readily defined. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
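The structure proposed in CONNECTORS-39 could look roughly like the sketch below. All class and method names here are hypothetical and are not taken from the LCF code base; caching is left out, and the nesting-depth bookkeeping is one assumed way the shared base class might manage transactions before delegating to the database-specific begin and end hooks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; not the actual LCF classes.
final class TransactionSketch {

    // Shared base class: owns transaction bookkeeping, delegates the
    // database-specific work to abstract hooks.
    abstract static class BaseDatabase {
        private int depth = 0;
        private boolean rollbackRequested = false;

        final void beginTransaction() {
            if (depth == 0) {
                rollbackRequested = false;
                startTransaction();          // only the outermost begin hits the database
            }
            depth++;
        }

        // Mark the current (outermost) transaction as failed.
        final void signalRollback() {
            rollbackRequested = true;
        }

        final void endTransaction() {
            if (depth == 0) {
                throw new IllegalStateException("no open transaction");
            }
            depth--;
            if (depth == 0) {
                finishTransaction(!rollbackRequested);
            }
        }

        // Database-type-dependent hooks.
        protected abstract void startTransaction();
        protected abstract void finishTransaction(boolean commit);
    }

    // Stand-in implementation that records what a real driver would execute.
    static final class RecordingDatabase extends BaseDatabase {
        final List<String> log = new ArrayList<>();

        @Override protected void startTransaction() { log.add("BEGIN"); }

        @Override protected void finishTransaction(boolean commit) {
            log.add(commit ? "COMMIT" : "ROLLBACK");
        }
    }

    public static void main(String[] args) {
        RecordingDatabase db = new RecordingDatabase();
        db.beginTransaction();
        db.beginTransaction();   // nested: no second BEGIN is issued
        db.endTransaction();
        db.endTransaction();
        System.out.println(db.log); // prints [BEGIN, COMMIT]
    }
}
```

A concrete database implementation would then override only the two hooks, so any database-specific transaction semantics stay in one place.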
[jira] Resolved: (CONNECTORS-36) The Solr connector's UI method of handling arguments is limited and non-intuitive
[ https://issues.apache.org/jira/browse/CONNECTORS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-36. --- Assignee: Karl Wright Resolution: Fixed Revised UI as stipulated. r946090. The Solr connector's UI method of handling arguments is limited and non-intuitive - Key: CONNECTORS-36 URL: https://issues.apache.org/jira/browse/CONNECTORS-36 Project: Lucene Connector Framework Issue Type: Improvement Components: Lucene/SOLR connector Reporter: Karl Wright Assignee: Karl Wright Priority: Minor The arguments are currently ordered by name, and are stored in a simple hash, meaning that they cannot be multivalued. Furthermore you cannot edit arguments; you can only delete and replace them. It would be better if: - Argument names were ordered, but values appeared in the order they were entered. - Each argument value appeared in a text box, so it could be edited directly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
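The data-structure change requested in CONNECTORS-36 (names ordered, values multivalued and kept in entry order, values editable in place) can be sketched as a sorted map of lists. This is an illustration under assumed names, not the code committed in r946090.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the argument list; not the connector's actual code.
final class SolrArgumentsSketch {
    // TreeMap keeps argument names sorted; each list keeps entry order.
    private final Map<String, List<String>> values = new TreeMap<>();

    // Add another value for a (possibly already present) argument name.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Edit one value in place instead of deleting and re-adding it.
    void edit(String name, int index, String newValue) {
        List<String> list = values.get(name);
        if (list == null) {
            throw new IllegalArgumentException("unknown argument: " + name);
        }
        list.set(index, newValue);
    }

    // Flatten to "name=value" strings in display order.
    List<String> toPairs() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : values.entrySet()) {
            for (String v : e.getValue()) {
                out.add(e.getKey() + "=" + v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SolrArgumentsSketch a = new SolrArgumentsSketch();
        a.add("fq", "b");
        a.add("df", "x");
        a.add("fq", "a");
        System.out.println(a.toPairs()); // prints [df=x, fq=b, fq=a]
    }
}
```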
[jira] Commented: (CONNECTORS-35) Need a way to reset LCF when external conditions change
[ https://issues.apache.org/jira/browse/CONNECTORS-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867513#action_12867513 ] Karl Wright commented on CONNECTORS-35: --- Added the ability to perform this reset from the view output connection screen. Still not sure if we really need a repository-connection equivalent; that's in any case much harder, because the ingeststatus table has no column at this time containing the repository connection name by itself. r944298 Need a way to reset LCF when external conditions change --- Key: CONNECTORS-35 URL: https://issues.apache.org/jira/browse/CONNECTORS-35 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework agents process, Framework core, Framework crawler agent Reporter: Karl Wright Assignee: Karl Wright When a change is made external to LCF, such as a Solr configuration change, LCF needs some way for a user to signal that that change took place. For example, a button or link on the view output connection page might signal some undefined global change in the target of that output connection. A similar button or link on the repository connection view page might signal a corresponding change to the underlying repository. Clicking the button would do the following things: (1) It would clear the current version string for all documents that passed through that connection. This would guarantee that the documents would be reingested if and when they were processed the next time. (2) It would reset the last job time value for all jobs affected by the connection to zero. This would guarantee that all documents belonging to that job would be rechecked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-35) Need a way to reset LCF when external conditions change
Need a way to reset LCF when external conditions change --- Key: CONNECTORS-35 URL: https://issues.apache.org/jira/browse/CONNECTORS-35 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework agents process, Framework core, Framework crawler agent Reporter: Karl Wright When a change is made external to LCF, such as a Solr configuration change, LCF needs some way for a user to signal that that change took place. For example, a button or link on the view output connection page might signal some undefined global change in the target of that output connection. A similar button or link on the repository connection view page might signal a corresponding change to the underlying repository. Clicking the button would do the following things: (1) It would clear the current version string for all documents that passed through that connection. This would guarantee that the documents would be reingested if and when they were processed the next time. (2) It would reset the last job time value for all jobs affected by the connection to zero. This would guarantee that all documents belonging to that job would be rechecked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
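The two reset steps listed in CONNECTORS-35 can be modeled in memory as follows. The record classes and field names are invented for this sketch (the real state lives in LCF's database tables), and representing a "cleared" version string as the empty string is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// In-memory stand-ins for the database rows; all names are hypothetical.
final class ConnectionResetSketch {

    static final class DocumentRecord {
        final String outputConnection;
        String versionString;

        DocumentRecord(String outputConnection, String versionString) {
            this.outputConnection = outputConnection;
            this.versionString = versionString;
        }
    }

    static final class JobRecord {
        final String outputConnection;
        long lastJobTime;

        JobRecord(String outputConnection, long lastJobTime) {
            this.outputConnection = outputConnection;
            this.lastJobTime = lastJobTime;
        }
    }

    static void resetOutputConnection(String connectionName,
                                      List<DocumentRecord> documents,
                                      List<JobRecord> jobs) {
        // (1) Clear version strings so the documents are reingested next time.
        for (DocumentRecord d : documents) {
            if (d.outputConnection.equals(connectionName)) {
                d.versionString = "";   // assumption: empty string means "cleared"
            }
        }
        // (2) Zero the last job time so every document of the job is rechecked.
        for (JobRecord j : jobs) {
            if (j.outputConnection.equals(connectionName)) {
                j.lastJobTime = 0L;
            }
        }
    }

    public static void main(String[] args) {
        List<DocumentRecord> docs = Arrays.asList(
            new DocumentRecord("solr-1", "v7"), new DocumentRecord("solr-2", "v3"));
        List<JobRecord> jobs = Arrays.asList(
            new JobRecord("solr-1", 1273000000000L), new JobRecord("solr-2", 1273000000000L));
        resetOutputConnection("solr-1", docs, jobs);
        System.out.println(docs.get(0).versionString.isEmpty() + " " + jobs.get(0).lastJobTime);
    }
}
```

Records belonging to other connections are left untouched, which is the property the ticket relies on when it scopes the reset to a single connection.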
[jira] Assigned: (CONNECTORS-35) Need a way to reset LCF when external conditions change
[ https://issues.apache.org/jira/browse/CONNECTORS-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-35: - Assignee: Karl Wright Need a way to reset LCF when external conditions change --- Key: CONNECTORS-35 URL: https://issues.apache.org/jira/browse/CONNECTORS-35 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework agents process, Framework core, Framework crawler agent Reporter: Karl Wright Assignee: Karl Wright When a change is made external to LCF, such as a Solr configuration change, LCF needs some way for a user to signal that that change took place. For example, a button or link on the view output connection page might signal some undefined global change in the target of that output connection. A similar button or link on the repository connection view page might signal a corresponding change to the underlying repository. Clicking the button would do the following things: (1) It would clear the current version string for all documents that passed through that connection. This would guarantee that the documents would be reingested if and when they were processed the next time. (2) It would reset the last job time value for all jobs affected by the connection to zero. This would guarantee that all documents belonging to that job would be rechecked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-33) Need a wiki page for people who want to operate LCF programmatically
[ https://issues.apache.org/jira/browse/CONNECTORS-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-33. --- Resolution: Fixed Here's the page: https://cwiki.apache.org/confluence/display/CONNECTORS/Programmatic+Operation+of+LCF Need a wiki page for people who want to operate LCF programmatically Key: CONNECTORS-33 URL: https://issues.apache.org/jira/browse/CONNECTORS-33 Project: Lucene Connector Framework Issue Type: Improvement Components: Documentation Reporter: Karl Wright The necessary commands are present, but we still need a wiki page to document how to manipulate LCF programmatically. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-34) eRoom authority and connector
[ https://issues.apache.org/jira/browse/CONNECTORS-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865868#action_12865868 ] Karl Wright commented on CONNECTORS-34: --- .ch/.cn - so close. ;-) eRoom authority and connector - Key: CONNECTORS-34 URL: https://issues.apache.org/jira/browse/CONNECTORS-34 Project: Lucene Connector Framework Issue Type: New Feature Reporter: Karl Wright eRoom has a SOAP API which looks like it has enough power to perhaps implement a connector and an authority. The eRoom API url is here (and yes, it is a chinese url, but is legit): https://eroom.abraxas.ch/eroomHelp/en/API_Help/Api.htm#home_api.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-33) Need a wiki page for people who want to operate LCF programmatically
Need a wiki page for people who want to operate LCF programmatically Key: CONNECTORS-33 URL: https://issues.apache.org/jira/browse/CONNECTORS-33 Project: Lucene Connector Framework Issue Type: Improvement Components: Documentation Reporter: Karl Wright The necessary commands are present, but we still need a wiki page to document how to manipulate LCF programmatically. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CONNECTORS-29) Credentials are not properly encoded when sent to JCIFS, making passwords with %'s or #'s not work properly
[ https://issues.apache.org/jira/browse/CONNECTORS-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-29: - Assignee: Karl Wright Credentials are not properly encoded when sent to JCIFS, making passwords with %'s or #'s not work properly --- Key: CONNECTORS-29 URL: https://issues.apache.org/jira/browse/CONNECTORS-29 Project: Lucene Connector Framework Issue Type: Bug Components: JCIFS connector Reporter: Karl Wright Assignee: Karl Wright The credentials assembled by the JCIFS connector do not properly encode usernames and passwords using %-encoding, as JCIFS expects. This leads to passwords with %'s or #'s in them not working properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
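The kind of encoding CONNECTORS-29 calls for can be sketched as below. This is a hand-rolled percent-encoder over the RFC 3986 unreserved set, not the fix that went into the connector, and the "domain;user:password" layout of the credential part is an assumption about how the connector assembles its smb:// URLs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the JCIFS connector's actual code.
final class SmbCredentialEncodingSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encode every byte outside the RFC 3986 "unreserved" set.
    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            boolean unreserved =
                (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                sb.append((char) c);
            } else {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0f]);
            }
        }
        return sb.toString();
    }

    // Assumed credential layout: domain;user:password (goes before '@' in the URL).
    static String userInfo(String domain, String user, String password) {
        return percentEncode(domain) + ";" + percentEncode(user)
            + ":" + percentEncode(password);
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("p%ss#1")); // prints p%25ss%231
    }
}
```

Encoding each component separately, before joining, is what keeps a '%', '#', '@' or ':' inside a password from being read as URL syntax.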
[jira] Resolved: (CONNECTORS-21) Authority service needed that knows how to obtain SIDs from a Kerberos principal
[ https://issues.apache.org/jira/browse/CONNECTORS-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-21. --- Resolution: Fixed I created a Java authority instead, using JNDI, so this is moot. Authority service needed that knows how to obtain SIDs from a Kerberos principal Key: CONNECTORS-21 URL: https://issues.apache.org/jira/browse/CONNECTORS-21 Project: Lucene Connector Framework Issue Type: Improvement Components: Mod-authz-annotate Reporter: Karl Wright The code that was granted to Apache from MetaCarta intentionally did not include an authority service that knows how to obtain SIDs from a Kerberos principal. This will invalidate the security enforcement for the FileNet, Meridio, and SharePoint connectors, since these use AD as their primary security model. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-31) For the Solr LCF security filter plugin, establish a concept of session to improve performance
For the Solr LCF security filter plugin, establish a concept of session to improve performance -- Key: CONNECTORS-31 URL: https://issues.apache.org/jira/browse/CONNECTORS-31 Project: Lucene Connector Framework Issue Type: Improvement Reporter: Karl Wright Instead of only allowing an authenticated user name to be passed to the LCFSecurityFilter SearchComponent, improve this to return a security token and optionally receive the security token as well. Then it will be possible for it to make the access tokens sticky, reducing load on the authority service in situations where multiple searches occur in each session. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
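The session idea in CONNECTORS-31 amounts to a small cache in front of the authority service. The sketch below is a guess at the shape, with invented names; it does not use the Solr SearchComponent API, the authority lookup is passed in as a plain function, and expiry of stale sessions is omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical session cache; not the LCFSecurityFilter implementation.
final class SecuritySessionSketch {

    static final class Result {
        final String sessionToken;
        final List<String> accessTokens;

        Result(String sessionToken, List<String> accessTokens) {
            this.sessionToken = sessionToken;
            this.accessTokens = accessTokens;
        }
    }

    private static final class Entry {
        final String userName;
        final List<String> accessTokens;

        Entry(String userName, List<String> accessTokens) {
            this.userName = userName;
            this.accessTokens = accessTokens;
        }
    }

    private final Function<String, List<String>> authority;
    private final Map<String, Entry> sessions = new ConcurrentHashMap<>();

    SecuritySessionSketch(Function<String, List<String>> authority) {
        this.authority = authority;
    }

    // sessionToken may be null on the first search of a session.
    Result resolve(String userName, String sessionToken) {
        if (sessionToken != null) {
            Entry cached = sessions.get(sessionToken);
            // Reuse only if the token belongs to the same user.
            if (cached != null && cached.userName.equals(userName)) {
                return new Result(sessionToken, cached.accessTokens);
            }
        }
        // Unknown or missing token: ask the authority and start a new session.
        List<String> tokens = authority.apply(userName);
        String newToken = UUID.randomUUID().toString();
        sessions.put(newToken, new Entry(userName, tokens));
        return new Result(newToken, tokens);
    }
}
```

A caller would pass null on the first search, keep the returned session token, and send it back with later searches; only the first call reaches the authority service.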
[jira] Commented: (CONNECTORS-27) Add support for observation to the crawler agent
[ https://issues.apache.org/jira/browse/CONNECTORS-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857861#action_12857861 ] Karl Wright commented on CONNECTORS-27: --- I understand what your proposed infrastructure does. What I don't understand is the use case. It seems to me like all you are doing is adding a poll method to a repository connector. But there already is one. Can you provide a case which demonstrates the need for this infrastructure? Add support for observation to the crawler agent Key: CONNECTORS-27 URL: https://issues.apache.org/jira/browse/CONNECTORS-27 Project: Lucene Connector Framework Issue Type: New Feature Components: Framework crawler agent Reporter: Ralph Benjamin Ruijs Priority: Minor Attachments: Added_observation_logic_to_the_crawler.patch When crawling a large repository, it could take a lot of time before changes are propagated to Solr. You can add an event listener to the repository, and be notified about changes. The crawler will ensure you have a complete copy in case of missed events. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (CONNECTORS-16) JCIFS connector's document fingerprinting feature is not general enough
[ https://issues.apache.org/jira/browse/CONNECTORS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-16: - Assignee: Karl Wright JCIFS connector's document fingerprinting feature is not general enough --- Key: CONNECTORS-16 URL: https://issues.apache.org/jira/browse/CONNECTORS-16 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework agents process, Framework crawler agent, GTS connector, JCIFS connector, LiveLink connector, Lucene/SOLR connector, Meridio connector, RSS connector, SharePoint connector, Web connector Reporter: Karl Wright Assignee: Karl Wright Priority: Minor The JCIFS connector has a feature, called fingerprinting, which allows it to classify documents according to ability of the back-end to index that content. Right at the moment, this fingerprinter is capable of recognizing PDFs, Microsoft Office files, and text files as being indexable. One could imagine, though, that different SOLR plugins, etc. might have more capability than that. Also, other connectors could potentially benefit from similar technology, specifically any connector that deals with binary documents. One approach to solving this problem would be to remove the feature entirely, and allow whatever pipeline exists in SOLR determine the indexability after the fact. The reason this feature was added at MetaCarta, however, is that it may be possible to exclude an un-useful document without having to fetch the whole thing, and (at least for MetaCarta clients) the number of unindexable files of gigantic size was a big concern. Another approach might be to tie the functionality in with the output connector interface, so that an output connector would (somehow) determine applicability of a document. This would require some care to make it possible to fingerprint without having to download the entire document, but would otherwise have the correct overall structure. 
[jira] Resolved: (CONNECTORS-24) SOLR connector needs the ability to ingest metadata
[ https://issues.apache.org/jira/browse/CONNECTORS-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-24. --- Resolution: Fixed Oops, I'd forgotten that this was actually already done. SOLR connector needs the ability to ingest metadata --- Key: CONNECTORS-24 URL: https://issues.apache.org/jira/browse/CONNECTORS-24 Project: Lucene Connector Framework Issue Type: Improvement Components: Lucene/SOLR connector Reporter: Karl Wright The SOLR connector is pretty bare-bones at the moment, and even lacks the ability to transmit metadata to SOLR. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CONNECTORS-23) Command documentation could benefit from usage information
[ https://issues.apache.org/jira/browse/CONNECTORS-23?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-23: - Assignee: Karl Wright Command documentation could benefit from usage information -- Key: CONNECTORS-23 URL: https://issues.apache.org/jira/browse/CONNECTORS-23 Project: Lucene Connector Framework Issue Type: Improvement Components: Documentation Reporter: Damien Mabin Assignee: Karl Wright Priority: Minor This concerns the page [Build Deploy|http://cwiki.apache.org/confluence/display/CONNECTORS/How+to+Build+and+Deploy+Lucene+Connectors+Framework]. In the paragraph about commands, each command should be accompanied by a usage example, something like this:
||Core Command Class||Function||
|org.apache.lcf.core.DBCreate|Create LCF database instance, e.g.: java org.apache.lcf.core.DBCreate UserName Password|
|org.apache.lcf.core.DBDrop|Drop LCF database instance, e.g.: java org.apache.lcf.core.DBDrop|
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-16) JCIFS connector's document fingerprinting feature is not general enough
JCIFS connector's document fingerprinting feature is not general enough --- Key: CONNECTORS-16 URL: https://issues.apache.org/jira/browse/CONNECTORS-16 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework agents process, Framework crawler agent, GTS connector, JCIFS connector, LiveLink connector, Lucene/SOLR connector, Meridio connector, RSS connector, SharePoint connector, Web connector Reporter: Karl Wright Priority: Minor The JCIFS connector has a feature, called fingerprinting, which allows it to classify documents according to ability of the back-end to index that content. Right at the moment, this fingerprinter is capable of recognizing PDFs, Microsoft Office files, and text files as being indexable. One could imagine, though, that different SOLR plugins, etc. might have more capability than that. Also, other connectors could potentially benefit from similar technology, specifically any connector that deals with binary documents. One approach to solving this problem would be to remove the feature entirely, and allow whatever pipeline exists in SOLR determine the indexability after the fact. The reason this feature was added at MetaCarta, however, is that it may be possible to exclude an un-useful document without having to fetch the whole thing, and (at least for MetaCarta clients) the number of unindexable files of gigantic size was a big concern. Another approach might be to tie the functionality in with the output connector interface, so that an output connector would (somehow) determine applicability of a document. This would require some care to make it possible to fingerprint without having to download the entire document, but would otherwise have the correct overall structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
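The point in CONNECTORS-16 about excluding a document without fetching the whole thing can be illustrated with a classifier that looks only at the first bytes of a file. The PDF and OLE2 (legacy Microsoft Office) signatures below are the standard ones; the plain-text heuristic and all names are assumptions of this sketch, which is not the fingerprinter shipped in the JCIFS connector.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical prefix-based classifier; not the connector's fingerprinter.
final class FingerprintSketch {

    enum Kind { PDF, OLE2_OFFICE, TEXT, UNKNOWN }

    private static final byte[] PDF_MAGIC =
        "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Signature of OLE2 compound files (legacy .doc/.xls/.ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Classify from a leading slice of the file; the rest is never read.
    static Kind classify(byte[] prefix) {
        if (startsWith(prefix, PDF_MAGIC)) return Kind.PDF;
        if (startsWith(prefix, OLE2_MAGIC)) return Kind.OLE2_OFFICE;
        if (prefix.length > 0 && looksLikeText(prefix)) return Kind.TEXT;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }

    // Assumed heuristic: only printable ASCII plus tab, LF and CR.
    private static boolean looksLikeText(byte[] data) {
        for (byte b : data) {
            int c = b & 0xff;
            boolean whitespace = c == '\t' || c == '\n' || c == '\r';
            boolean printable = c >= 0x20 && c < 0x7f;
            if (!whitespace && !printable) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(classify(head)); // prints PDF
    }
}
```

If this logic moved behind the output connector interface, as the second approach in the ticket suggests, the repository connector would only need to supply the leading bytes and let the output side decide.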
[jira] Updated: (CONNECTORS-15) Documentum Connector testing code references a not-present class
[ https://issues.apache.org/jira/browse/CONNECTORS-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-15: -- Component/s: Documentum connector Documentum Connector testing code references a not-present class Key: CONNECTORS-15 URL: https://issues.apache.org/jira/browse/CONNECTORS-15 Project: Lucene Connector Framework Issue Type: Test Components: Documentum connector Reporter: Karl Wright The documentum connector Java testing code references a class from TrinityTechnologies, which was not granted. This class reference should be removed and replaced by direct references to the appropriate DFC methods. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CONNECTORS-4) Submit other package changes supplied with software grant upstream to the proper projects
[ https://issues.apache.org/jira/browse/CONNECTORS-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-4: - Component/s: LiveLink connector Submit other package changes supplied with software grant upstream to the proper projects - Key: CONNECTORS-4 URL: https://issues.apache.org/jira/browse/CONNECTORS-4 Project: Lucene Connector Framework Issue Type: Task Components: Framework agents process, Framework crawler agent, LiveLink connector, Meridio connector, RSS connector, SharePoint connector, Web connector Reporter: Karl Wright Assignee: Karl Wright The code granted by MetaCarta depends on certain specific feature additions and changes MetaCarta made to some packages it depends upon, specifically jCIFS, commons-httpclient, and xerces-j. These changes should be percolated accordingly. They can be found in the tarball under the directory upstream-diffs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-4) Submit other package changes supplied with software grant upstream to the proper projects
[ https://issues.apache.org/jira/browse/CONNECTORS-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837779#action_12837779 ] Karl Wright commented on CONNECTORS-4: -- HttpClient team wants us to upgrade to their latest release, which is 4.1. They claim this fixes 2 of the 3 patches I submitted. For the record, the patches were submitted under tickets: HTTPCLIENT-917 HTTPCLIENT-918 HTTPCLIENT-919 The one they rejected outright was ticket HTTPCLIENT-919, on the grounds that they believed it violated Apache policy as pertaining to potential IP infringement, specifically because NTLM is a proprietary authentication and authorization scheme. There was no indication that they were aware of any specific patent issues, but that apparently is not the key point. If this reasoning stands, I intend to create two additional tickets - one for moving to HttpClient 4.1, and one for modifying the build scripts to obtain an appropriate NTLM implementation from some non-Apache open-source project. Submit other package changes supplied with software grant upstream to the proper projects - Key: CONNECTORS-4 URL: https://issues.apache.org/jira/browse/CONNECTORS-4 Project: Lucene Connector Framework Issue Type: Task Reporter: Karl Wright Assignee: Karl Wright The code granted by MetaCarta depends on certain specific feature additions and changes MetaCarta made to some packages it depends upon, specifically jCIFS, commons-httpclient, and xerces-j. These changes should be percolated accordingly. They can be found in the tarball under the directory upstream-diffs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CONNECTORS-12) Need to make use of tabs and spaces consistent in code base
[ https://issues.apache.org/jira/browse/CONNECTORS-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-12: -- Priority: Minor (was: Major) Description: Some java files have tabs, some have spaces. Any individual file has either one or the other, but not both. We should decide which one we prefer, or adopt the Apache standard if there is one, and convert accordingly. (The jsps are all consistent and use only tabs, which in my opinion should remain because their mixed nature makes spaces hard to work with in some editors, like scite.) was: Some java files have tabs, some have spaces. We should decide which one we prefer, or adopt the Apache standard if there is one, and convert accordingly. (The jsps are all consistent and use only tabs, which in my opinion should remain because their mixed nature makes spaces hard to work with in some editors, like scite.) Need to make use of tabs and spaces consistent in code base --- Key: CONNECTORS-12 URL: https://issues.apache.org/jira/browse/CONNECTORS-12 Project: Lucene Connector Framework Issue Type: Task Reporter: Karl Wright Priority: Minor Some java files have tabs, some have spaces. Any individual file has either one or the other, but not both. We should decide which one we prefer, or adopt the Apache standard if there is one, and convert accordingly. (The jsps are all consistent and use only tabs, which in my opinion should remain because their mixed nature makes spaces hard to work with in some editors, like scite.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-3) Ant build needs to be created for code base
[ https://issues.apache.org/jira/browse/CONNECTORS-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-3. -- Resolution: Fixed Ant builds are complete for the java part of the project. Still need builds for C part and for documentation, but will open separate tickets for those. Ant build needs to be created for code base --- Key: CONNECTORS-3 URL: https://issues.apache.org/jira/browse/CONNECTORS-3 Project: Lucene Connector Framework Issue Type: Task Reporter: Karl Wright The code granted by MetaCarta was built within a debian system. It would be much more consistent with Apache philosophy to make a self-contained ant build for the code base. In the future, if debian packages are again required, they could simply wrap the ant build. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-2) Revamp package names and paths to remove MetaCarta references
[ https://issues.apache.org/jira/browse/CONNECTORS-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-2. -- Resolution: Fixed Code reorganized as described. Revamp package names and paths to remove MetaCarta references - Key: CONNECTORS-2 URL: https://issues.apache.org/jira/browse/CONNECTORS-2 Project: Lucene Connector Framework Issue Type: Task Reporter: Karl Wright The software grant from MetaCarta will not be reorganized prior to the grant, so MetaCarta-specific package and class names will be present. The code needs to be appropriately rearranged to adhere to Apache package-name standards. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.