[jira] Issue Comment Edited: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920801#action_12920801 ]

Jack Krupansky edited comment on CONNECTORS-118 at 10/13/10 7:35 PM:

I have personally written unit tests that generated most of those formats, which Aperture then extracted. See: http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers

org.apache.tools.bzip2 - BZIP2 archives.
java.util.zip.GZIPInputStream - GZIP archives.
javax.mail - message/rfc822-style messages and mbox files.
org.apache.tools.tar - tar archives.

was (Author: jkrupan):
One of those VFS links points to all the Java packages used to access the list of archive formats I listed. I have personally written unit tests that generated most of those formats, which Aperture then extracted.

> Crawled archive files should be expanded into their constituent files
> ---------------------------------------------------------------------
>
> Key: CONNECTORS-118
> URL: https://issues.apache.org/jira/browse/CONNECTORS-118
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Framework crawler agent
> Reporter: Jack Krupansky
>
> Archive files such as zip, mbox, tar, etc. should be expanded into their
> constituent files during crawling of repositories so that any output
> connector would output the flattened archive.
> This could be an option, defaulted to ON, since someone may want to implement
> a "copy" connector that maintains crawled files as-is.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
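As a sketch of what expanding one of the formats above might look like during a crawl, the following uses java.util.zip.GZIPInputStream (named in the comment) to recover the single constituent file of a GZIP archive. The GzipExpander class and its expand method are hypothetical illustration names, not ManifoldCF or Aperture API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: expands a GZIP stream into its one constituent file. */
public class GzipExpander {
    public static byte[] expand(InputStream gzipped) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(gzipped);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            // Copy the decompressed bytes; a GZIP archive holds exactly one member,
            // so there is no entry enumeration as with zip or tar.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The other formats listed (bzip2, tar, mbox) would follow the same pattern with their respective libraries.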
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920805#action_12920805 ]

Jack Krupansky commented on CONNECTORS-118:

At least for file system crawls we can depend on the modification date to decide whether to re-crawl an archive file, can't we? I wouldn't rate efficient crawling of archive files over the web as a high priority.
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920787#action_12920787 ]

Jack Krupansky commented on CONNECTORS-118:

Aperture's approach was just a starting point for discussion of how to form an id for a file inside an archive file. As long as the MCF rules are functionally equivalent to the Apache VFS rules, we should be okay. In short, my proposal does not impose a requirement for what an id should look like, just a suggestion.
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920730#action_12920730 ]

Jack Krupansky commented on CONNECTORS-118:

Support within the file system connector is obviously the higher priority. Windows shares as well. And FTP/SFTP.
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920720#action_12920720 ]

Jack Krupansky commented on CONNECTORS-118:

Just to be clear, this subcrawling proposal does not depend on Apache VFS; like Aperture, it simply borrows the naming convention of representing the id for each file as a pseudo-URL, not a real URL. So, if somebody wants to dereference one of these pseudo-URLs they must:

1) Separate the prefix, parent-object-uri, and path from the pseudo-URL.
2) Fetch the file from the parent-object-uri.
3) Use an access library based on the prefix to extract the file at the path from within the fetched archive.
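Step 1 above can be sketched as a small parser. This is only an illustration of the proposed convention (the PseudoUrl class name and its field names are hypothetical); it assumes the prefix ends at the first colon and the internal path starts after the last "!/":

```java
/** Hypothetical sketch: splits a VFS-style pseudo-URL of the form
 *  prefix:parent-object-uri!/path into its three parts. */
public class PseudoUrl {
    public final String prefix;
    public final String parentUri;
    public final String path;

    public PseudoUrl(String prefix, String parentUri, String path) {
        this.prefix = prefix;
        this.parentUri = parentUri;
        this.path = path;
    }

    public static PseudoUrl parse(String pseudoUrl) {
        int colon = pseudoUrl.indexOf(':');      // the prefix ends at the first ':'
        int bang = pseudoUrl.lastIndexOf("!/");  // the path starts after the last "!/"
        if (colon < 0 || bang < colon) {
            throw new IllegalArgumentException("not a pseudo-URL: " + pseudoUrl);
        }
        return new PseudoUrl(
            pseudoUrl.substring(0, colon),
            pseudoUrl.substring(colon + 1, bang),
            pseudoUrl.substring(bang + 2));
    }
}
```

Steps 2 and 3 would then fetch parentUri through the appropriate repository connector and hand the result to an archive library chosen by prefix.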
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920711#action_12920711 ]

Jack Krupansky commented on CONNECTORS-118:

Subcrawling is based on the file type (zip, tar, gzip, bzip2, mbox, jar, etc.), not the type of repository that contains it. I can't speak about all repository types, but subcrawling would apply to web and SharePoint in addition to file system and share crawling. Basically, it applies to any repository type that returns files, as opposed to, say, the JDBC connector, which returns a row of data values rather than a file.
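Dispatching on file type rather than repository type might look like the following sketch, which maps file extensions to a subcrawler type string. The class, the type names, and the extension list are illustrative assumptions, not ManifoldCF API; a real implementation would likely also sniff MIME types or magic bytes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: picks a subcrawler by file type (extension),
 *  independent of which repository connector fetched the file. */
public class SubcrawlerRegistry {
    private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<>();
    static {
        // Compound extensions must be registered before their suffixes so
        // ".tar.gz" wins over ".gz".
        BY_EXTENSION.put(".tar.gz", "tar.gz");
        BY_EXTENSION.put(".tgz", "tar.gz");
        BY_EXTENSION.put(".tar.bz2", "tar.bz2");
        BY_EXTENSION.put(".tar", "tar");
        BY_EXTENSION.put(".zip", "zip");
        BY_EXTENSION.put(".jar", "zip");
        BY_EXTENSION.put(".gz", "gz");
        BY_EXTENSION.put(".bz2", "bz2");
        BY_EXTENSION.put(".mbox", "mbox");
    }

    /** Returns the subcrawler type for a file name, or null if not an archive. */
    public static String typeFor(String fileName) {
        String lower = fileName.toLowerCase();
        for (Map.Entry<String, String> e : BY_EXTENSION.entrySet()) {
            if (lower.endsWith(e.getKey())) return e.getValue();
        }
        return null;
    }
}
```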
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920704#action_12920704 ]

Jack Krupansky commented on CONNECTORS-118:

Karl correctly points out that "The key question here is how you describe the component of an archive. There must be a URL to describe it..."

I am basing my request on the subcrawling feature of Aperture, which bases its archive support on Apache Commons VFS. See: http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers

Which says:

The uris of the data objects found inside other data objects have a fixed form, consisting of three basic parts: <prefix>:<parent-uri>!/<path>

* <prefix> - the uri prefix, characteristic for a particular SubCrawler, returned by the SubCrawlerFactory.getUriPrefix() method
* <parent-uri> - the uri of the parent data object; it is obtained from the parentMetadata parameter to the subCrawl() method, by calling RDFContainer.getDescribedUri()
* <path> - an internal path of the 'child' data object inside the 'parent' data object

This scheme has been inspired by the apache commons VFS project, homepaged under http://commons.apache.org/vfs

See: http://commons.apache.org/vfs/filesystems.html

Which says:

Provides read-only access to the contents of Zip, Jar and Tar files.

URI Format

zip:// arch-file-uri [! absolute-path ]
jar:// arch-file-uri [! absolute-path ]
tar:// arch-file-uri [! absolute-path ]
tgz:// arch-file-uri [! absolute-path ]
tbz2:// arch-file-uri [! absolute-path ]

Where arch-file-uri refers to a file of any supported type, including other zip files. Note: if you would like to use the ! as a normal character it must be escaped using %21. tgz and tbz2 are convenience for tar:gz and tar:bz2.

Examples

jar:../lib/classes.jar!/META-INF/manifest.mf
zip:http://somehost/downloads/somefile.zip
jar:zip:outer.zip!/nested.jar!/somedir
jar:zip:outer.zip!/nested.jar!/some%21dir
tar:gz:http://anyhost/dir/mytar.tar.gz!/mytar.tar!/path/in/tar/README.txt
tgz:file://anyhost/dir/mytar.tgz!/somepath/somefile

Provides read-only access to the contents of gzip and bzip2 files.

URI Format

gz:// compressed-file-uri
bz2:// compressed-file-uri

Where compressed-file-uri refers to a file of any supported type. There is no need to add a ! part to the uri; if you read the content of the file you always will get the uncompressed version.

Examples

gz:/my/gz/file.gz
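To make the quoted convention concrete, the following sketch enumerates a zip archive with java.util.zip and emits a VFS-style child id (zip:parent-uri!/entry-path) for each constituent file. The ZipSubcrawler class name and childIds method are hypothetical, not Aperture, Commons VFS, or ManifoldCF API:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch: expands a zip archive into child ids following the
 *  VFS-style convention prefix:parent-uri!/path. */
public class ZipSubcrawler {
    public static List<String> childIds(String parentUri, byte[] zipBytes) throws IOException {
        List<String> ids = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Directories are containers, not documents, so only files get ids.
                if (!entry.isDirectory()) {
                    ids.add("zip:" + parentUri + "!/" + entry.getName());
                }
            }
        }
        return ids;
    }
}
```

Each returned id carries everything needed to re-fetch the parent archive and re-extract the entry later.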
[jira] Created: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
Crawled archive files should be expanded into their constituent files
----------------------------------------------------------------------

Key: CONNECTORS-118
URL: https://issues.apache.org/jira/browse/CONNECTORS-118
Project: ManifoldCF
Issue Type: New Feature
Components: Framework crawler agent
Reporter: Jack Krupansky

Archive files such as zip, mbox, tar, etc. should be expanded into their constituent files during crawling of repositories so that any output connector would output the flattened archive.

This could be an option, defaulted to ON, since someone may want to implement a "copy" connector that maintains crawled files as-is.
[jira] Commented: (CONNECTORS-116) Possibly remove memex connector depending upon legal resolution
[ https://issues.apache.org/jira/browse/CONNECTORS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920568#action_12920568 ]

Jack Krupansky commented on CONNECTORS-116:

It would be nice to see a comment about what would be required to add Memex support back. I note the following statement in the original incubation submission:

"It is unlikely that EMC, OpenText, Memex, or IBM would grant Apache-license-compatible use of these client libraries. Thus, the expectation is that users of these connectors obtain the necessary client libraries from the owners prior to building or using the corresponding connector. An alternative would be to undertake a clean-room implementation of the client API's, which may well yield suitable results in some cases (LiveLink, Memex, FileNet), while being out of reach in others (Documentum). Conditional compilation, for the short term, is thus likely to be a necessity."

Is it only the Memex connector that now has this problem? Do we need to do a clean-room implementation for Memex? For any of the others? FWIW, I don't see a Google connector for Memex.

> Possibly remove memex connector depending upon legal resolution
> ---------------------------------------------------------------
>
> Key: CONNECTORS-116
> URL: https://issues.apache.org/jira/browse/CONNECTORS-116
> Project: ManifoldCF
> Issue Type: Task
> Components: Memex connector
> Reporter: Robert Muir
> Assignee: Robert Muir
>
> Apparently there is an IP problem with the memex connector code.
> Depending upon what apache legal says, we will take any action under this
> issue publicly.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909036#action_12909036 ]

Jack Krupansky commented on CONNECTORS-98:

Looks good. This meets my expectations. Any further tweaks that might arise would be distinct Jira issues.

> API should be "pure" RESTful with the API verb represented using the HTTP
> GET/PUT/POST/DELETE methods
> -------------------------------------------------------------------------
>
> Key: CONNECTORS-98
> URL: https://issues.apache.org/jira/browse/CONNECTORS-98
> Project: Apache Connectors Framework
> Issue Type: Improvement
> Components: API
> Affects Versions: LCF Release 0.5
> Reporter: Jack Krupansky
> Fix For: LCF Release 0.5
>
> (This was originally a comment on CONNECTORS-56 dated 7/16/2010.)
> It has come to my attention that the API would be more "pure" RESTful if the
> API verb was represented using the HTTP GET/PUT/POST/DELETE methods and the
> input argument identifier represented in the context path.
> So, GET outputconnection/get \{"connection_name":__\} would
> be GET outputconnections/
> and GET outputconnection/delete \{"connection_name":__\}
> would be DELETE outputconnections/
> and GET outputconnection/list would be GET outputconnections
> and PUT outputconnection/save \{"outputconnection":__\} would be
> PUT outputconnections/ \{"outputconnection":__\}
> What we have today is certainly workable, but just not as "pure" as some
> might desire. It would be better to take care of this before the initial
> release so that we never have to answer the question of why it wasn't done as
> a "proper" RESTful API.
> BTW, I did check to verify that an HttpServlet running under Jetty can
> process the DELETE and PUT methods (using the doDelete and doPut method
> overrides.)
> Also, POST should be usable as an alternative to PUT for API calls that have
> large volumes of data.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908581#action_12908581 ]

Jack Krupansky commented on CONNECTORS-98:

Just to confirm, as requested, that I am comfortable sticking with connection name (and job name, etc.) in API paths as opposed to using a more abstract "id", since we seem to have an encoding convention to deal with slash so that an ACF object name can always be represented using a single HTTP path segment. Names clearly feel more natural and will be easier to use, both for app code using the ACF API and for CURL and other scripting tools.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908573#action_12908573 ]

Jack Krupansky commented on CONNECTORS-98:

re: Spaces in connection names...

A URL path sent by a client cannot have an unencoded space. Typically, a space is encoded as "+" or "%20". The final path retrieved by the server app will have the expanded spaces, but the path sent via HTTP from the client must be encoded, since a space is the delimiter between the path and the HTTP version as per IETF RFC 2616 Sec 5.1:

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html

The first upshot of this is that the client needs to encode spaces as "+" or "%20". Ditto for other reserved chars (described in an earlier comment.) A second upshot is that we can't use ".+" in the original path from the client to encode slash, since it would come through to the ACF server app as ".". So, either the client would have to write ".%2B" or we pick some other encoding. Lacking some more preferred choice, we could simply propose ".-" as our encoding for slash. Almost any (non-reserved) char will do.

Another proposed encoding for slash: double the slash when it is to be embedded in a name, and then the adjacent path segments will be merged with a single slash between. I don't like this since it does not encode the full name as a single path segment, but it may be the cleanest way of dealing with slash. An example, encoding the name "this updated/revised example connection 1.0":

GET info/outputconnections/this+updated//revised+example+connection+1.0/

Personally, I lean towards an encoding convention that can encode the name as a single path segment. With the ".." and ".-" encoding convention this example would be:

GET info/outputconnections/this+updated.-revised+example+connection+1..0/
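The single-segment convention described above (".." for a literal dot, ".-" for slash, "+" for space) can be sketched as an encode/decode pair. The NameCodec class is a hypothetical illustration, not ACF API, and for simplicity it assumes "+" does not otherwise occur in names; a full implementation would escape it and the other reserved characters too:

```java
/** Hypothetical sketch of the proposed escaping, where "." is the escape
 *  character: ".." stands for a literal dot, ".-" for slash, and "+" for
 *  space, so a connection name always fits in one path segment. */
public class NameCodec {
    public static String encode(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            switch (c) {
                case '.': sb.append(".."); break; // escape the escape character
                case '/': sb.append(".-"); break; // slash never reaches the path parser
                case ' ': sb.append('+');  break; // conventional space encoding
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static String decode(String segment) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (c == '.' && i + 1 < segment.length()) {
                char next = segment.charAt(++i);
                sb.append(next == '-' ? '/' : '.');
            } else if (c == '+') {
                sb.append(' ');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Round-tripping the example name from the comment reproduces the segment shown above.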
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908244#action_12908244 ]

Jack Krupansky commented on CONNECTORS-98:

And that last reference provides examples that illustrate the convention of using plurals. For example:

GET /customers/1234 HTTP/1.1

http://www.infoq.com/articles/rest-introduction

The goal here is to use a common style so that people approaching the ACF API will not be surprised and have to re-learn things to use this API.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908240#action_12908240 ]

Jack Krupansky commented on CONNECTORS-98:

On closer examination, all of the examples I have found use an "id" rather than a name in the path, typically a number or maybe alphanumeric with hyphens. So, we should consider revisiting that aspect of the API. That avoids the slash issue. So, presumably an app using the API would query the list of connections, the JSON would provide the id for each connection, and the app would use those ids for API calls.

Another reference: http://www.infoq.com/articles/rest-introduction - "Give every 'thing' an ID"
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908190#action_12908190 ]

Jack Krupansky commented on CONNECTORS-98:

I am reading IETF RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax, section 3.3, "Path", among other things. See: http://www.ietf.org/rfc/rfc3986.txt

No conclusion yet.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908148#action_12908148 ]

Jack Krupansky commented on CONNECTORS-98:

I am still pondering this embedded slash issue and checking into some things related to it. Maybe Monday I'll have something more concrete to say. For example, I want to make sure I understand the rules for what a path can have in it in a URI, and whether simply placing a name at the tail of the path means it can have slashes or other reserved characters in it. My model is that a name should occupy only a single path component.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907875#action_12907875 ] Jack Krupansky commented on CONNECTORS-98:

It makes sense that getPathInfo would have removed escapes from the URL. So, either we don't use % escaping, or we bypass getPathInfo and decode manually. Maybe we could use backslash for escaping; I'm not sure whether it would need to be %-escaped as well. This is only needed if the user has one of the reserved special characters in a name. It would be an issue if it were something users commonly needed, but it seems like an edge case rather than a common case. Encourage people to use alphanumerics, "-", and "_" for names and it won't be an issue for them. And the real point of the API is access from code; we can provide helper functions for working with names and building API paths.
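The "bypass getPathInfo and manually decode" idea can be sketched outside the servlet API (hypothetical helper, not ManifoldCF code). Because the container hands back getPathInfo() already decoded, an escaped slash becomes indistinguishable from a path separator; splitting the raw (still-encoded) path first and decoding each segment afterward preserves the distinction:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PathDecode {
    // Split the RAW (still %-encoded) path on '/', then decode each segment.
    // Decoding before splitting (as getPathInfo() effectively does) would
    // turn an encoded "%2F" inside a name into a real path separator.
    static String[] decodeSegments(String rawPath) throws UnsupportedEncodingException {
        String[] segments = rawPath.split("/");
        String[] out = new String[segments.length];
        for (int i = 0; i < segments.length; i++) {
            out[i] = URLDecoder.decode(segments[i], "UTF-8");
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // "my%2Fconn" is ONE segment whose decoded name contains a slash.
        String[] s = decodeSegments("outputconnections/my%2Fconn");
        System.out.println(s.length); // 2
        System.out.println(s[1]);     // my/conn
    }
}
```

In a servlet this raw path would come from getRequestURI() rather than getPathInfo(), since the former is left undecoded by the container.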
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907758#action_12907758 ] Jack Krupansky commented on CONNECTORS-98:

re: "the <connection name> cannot itself contain "/" characters, or it won't be uniquely parseable"

Elsewhere I noted that URI-reserved characters need to be encoded with the "%" notation, so this is not a fatal problem:

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
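A minimal sketch of such %-encoding with the JDK's own URLEncoder (the helper name is hypothetical). URLEncoder targets form encoding, so the "+" it emits for a space is rewritten to the path-safe "%20"; reserved characters such as "/" come out percent-escaped:

```java
import java.net.URLEncoder;

public class NameEscape {
    // Percent-encode a connection name so it fits in a single path segment.
    // URLEncoder escapes reserved characters ("/" -> "%2F" etc.) but encodes
    // spaces as "+", which is only valid in query strings, so fix that up.
    static String encodeSegment(String name) throws Exception {
        return URLEncoder.encode(name, "UTF-8").replace("+", "%20");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodeSegment("my connection/prod"));
        // my%20connection%2Fprod
    }
}
```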
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907736#action_12907736 ] Jack Krupansky commented on CONNECTORS-98:

re: "We could not pass (arguments) except as part of the path."

Sure, we could go that route and list the arguments as path elements, but I think a JSON object (an array/list of arguments) is acceptable. So, I'd go with the latter (JSON).
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907735#action_12907735 ] Jack Krupansky commented on CONNECTORS-98:

I think status is probably technically okay, since it is disambiguated by the number of path elements, but it could be moved to the end:

GET outputconnections/<connection name>/status () vs. GET outputconnections/status/<connection name> ()

Same for execute/request:

GET outputconnections/<connection name>/request/<command> (arguments) vs. GET outputconnections/request/<command>/<connection name> (arguments)

That way the connection name is always in the same position. So, I'd revise my counter-proposal that way.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907723#action_12907723 ] Jack Krupansky commented on CONNECTORS-98:

Karl asks how to "handle connection names that are non-7-bit-ascii". I believe that non-7-bit-ASCII characters, like URI-reserved characters, would simply be escaped using the "%" notation.
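Concretely, %-escaping a non-7-bit-ASCII name amounts to escaping its UTF-8 bytes, and the round trip is lossless. A quick check with the JDK (the name "café" is just an example):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class Utf8Name {
    public static void main(String[] args) throws Exception {
        // "café": the non-ASCII 'é' escapes as its two UTF-8 bytes (%C3%A9)
        // and decodes back to the original name unchanged.
        String encoded = URLEncoder.encode("caf\u00e9", "UTF-8");
        System.out.println(encoded);                             // caf%C3%A9
        System.out.println(URLDecoder.decode(encoded, "UTF-8")); // café
    }
}
```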
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907712#action_12907712 ] Jack Krupansky commented on CONNECTORS-98:

Some RESTful resource doc:

http://en.wikipedia.org/wiki/Representational_State_Transfer
http://www.xfront.com/REST-Web-Services.html
http://www.oracle.com/technetwork/articles/javase/table3-138001.html

The idea of using a plural is that it is the name of the collection; the qualifier (name or argument object) provides the specificity.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907702#action_12907702 ] Jack Krupansky commented on CONNECTORS-98:

Karl, I did a quick read of your suggestions and mostly they seem fine, including keeping the JSON usage as is, but to be more purely RESTful the connection_name should be part of the path in those cases where it would have been a standalone name, although for PUT it was simply redundant, as you noted. Another nuance is to consistently refer to outputconnections in the plural. My counter-proposal:

outputconnection/get (connection_name) -> GET outputconnections/<connection name> ()
outputconnection/save (output_connection_object) -> PUT outputconnections (output_connection_object)
outputconnection/delete (connection_name) -> DELETE outputconnections/<connection name> ()
outputconnection/list () -> GET outputconnections ()
outputconnection/checkstatus (connection_name) -> GET outputconnections/status/<connection name> ()
outputconnection/execute/<command> (connection_name, arguments) -> GET outputconnections/request/<command>/<connection name> (arguments)

Comments?
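The counter-proposal above can be read as a routing table on (HTTP method, path segment count, fixed verbs). A sketch as plain string dispatch (hypothetical, not ManifoldCF code; placeholder names assumed):

```java
public class RestDispatch {
    // Route (method, path) to the proposed operation names. Note one wrinkle
    // this makes visible: a connection literally named "status" or "request"
    // would collide with the verb segments unless names are %-escaped or the
    // verbs are disambiguated some other way.
    static String dispatch(String method, String path) {
        String[] seg = path.split("/");
        if (!seg[0].equals("outputconnections")) return "unknown";
        if (method.equals("GET") && seg.length == 1) return "list";
        if (method.equals("PUT") && seg.length == 1) return "save";
        if (method.equals("GET") && seg.length == 3 && seg[1].equals("status")) return "checkstatus";
        if (method.equals("GET") && seg.length == 4 && seg[1].equals("request")) return "execute";
        if (method.equals("GET") && seg.length == 2) return "get";
        if (method.equals("DELETE") && seg.length == 2) return "delete";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(dispatch("GET", "outputconnections"));               // list
        System.out.println(dispatch("DELETE", "outputconnections/myconn"));     // delete
        System.out.println(dispatch("GET", "outputconnections/status/myconn")); // checkstatus
    }
}
```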
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907617#action_12907617 ] Jack Krupansky commented on CONNECTORS-98:

It sounds reasonable that the connection name is not needed in the path when creating from a JSON object that already has the name in it. So, instead of:

PUT outputconnections/<connection name> {"outputconnection":<output connection object>}

we could have:

PUT outputconnections {"outputconnection":<output connection object>}

Further, I don't think we need the extra level of object, so that could be:

PUT outputconnections {<output connection object contents>}
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907614#action_12907614 ] Jack Krupansky commented on CONNECTORS-98:

I have looked at the code a bit but have not made any actual progress on a patch, so you can go ahead and take a crack at it. Yes, I'll do the transformation table. As far as updating the wiki goes, do I have privileges to do that?
[jira] Commented: (CONNECTORS-104) Make it easier to limit a web crawl to a single site
[ https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907201#action_12907201 ] Jack Krupansky commented on CONNECTORS-104:

Simple works best. This enhancement is primarily for the simple use case where a "novice" user tries to do what they think is obvious ("crawl the web pages at this URL") without considering all of the potential nuances or how to fully specify the details of their goal.

One nuance is whether subdomains are considered part of the domain. I would say "no" if a subdomain was specified by the user and "yes" if no subdomain was specified.

Another nuance is whether a "path" is specified to select a subset of a domain. It would be nice to handle that and (optionally) limit the crawl to that path (or sub-paths below it). An example would be crawling the news archive for a site.

> Make it easier to limit a web crawl to a single site
>
> Key: CONNECTORS-104
> URL: https://issues.apache.org/jira/browse/CONNECTORS-104
> Project: Apache Connectors Framework
> Issue Type: Improvement
> Components: Web connector
> Reporter: Jack Krupansky
> Priority: Minor
>
> Unless the user explicitly enters an include regex carefully, a web crawl can quickly get out of control and start crawling the entire web when all the user may really want is to crawl just a single web site or portion thereof. So, it would be preferable if either by default or with a simple button the crawl could be limited to the seed web site(s).

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
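One way the "limit to the seed site" behavior could work is to derive the include regex from the seed URL automatically. A minimal sketch (hypothetical helper, not the Web connector's actual rule syntax): anchor on the seed host plus an optional path prefix, so sub-paths are included, other paths on the host are excluded, and subdomains like www.example.com do not match a seed host of example.com:

```java
import java.util.regex.Pattern;

public class SeedScope {
    // Build an include regex from a seed host and optional path prefix
    // ("" for whole site). Pattern.quote guards against regex
    // metacharacters in the host or path.
    static Pattern includeFor(String host, String pathPrefix) {
        return Pattern.compile("^https?://" + Pattern.quote(host)
                + Pattern.quote(pathPrefix) + "(/.*)?$");
    }

    public static void main(String[] args) {
        Pattern p = includeFor("example.com", "/news");
        System.out.println(p.matcher("http://example.com/news/2010/01.html").matches()); // true
        System.out.println(p.matcher("http://example.com/about.html").matches());        // false
    }
}
```

Treating subdomains as in-scope when no subdomain was specified would need a looser host pattern; the sketch implements only the strict case.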
[jira] Created: (CONNECTORS-104) Make it easier to limit a web crawl to a single site
Make it easier to limit a web crawl to a single site

Key: CONNECTORS-104
URL: https://issues.apache.org/jira/browse/CONNECTORS-104
Project: Apache Connectors Framework
Issue Type: Improvement
Components: Web connector
Affects Versions: LCF Release 0.5
Reporter: Jack Krupansky
Priority: Minor
Fix For: LCF Release 0.5

Unless the user explicitly enters an include regex carefully, a web crawl can quickly get out of control and start crawling the entire web when all the user may really want is to crawl just a single web site or portion thereof. So, it would be preferable if either by default or with a simple button the crawl could be limited to the seed web site(s).

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-101) File system connector would benefit by default crawling rules
[ https://issues.apache.org/jira/browse/CONNECTORS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906969#action_12906969 ] Jack Krupansky commented on CONNECTORS-101: --- +1 > File system connector would benefit by default crawling rules > - > > Key: CONNECTORS-101 > URL: https://issues.apache.org/jira/browse/CONNECTORS-101 > Project: Apache Connectors Framework > Issue Type: Improvement > Components: File system connector >Reporter: Karl Wright >Priority: Minor > > When you add a path to a file system connector job, it should automatically > put in rules that cause it to include all files and directories under that > path. This makes it easier to use, and more easily demonstrable too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-62) Document the LCF API
[ https://issues.apache.org/jira/browse/CONNECTORS-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904768#action_12904768 ] Jack Krupansky commented on CONNECTORS-62: -- Just wanted to update the link to the doc after the LCF/ACF name change for people who search for this issue: https://cwiki.apache.org/confluence/display/CONNECTORS/Programmatic+Operation+of+ACF > Document the LCF API > > > Key: CONNECTORS-62 > URL: https://issues.apache.org/jira/browse/CONNECTORS-62 > Project: Apache Connectors Framework > Issue Type: Improvement > Components: Documentation >Reporter: Karl Wright >Assignee: Karl Wright > > Not only does the LCF API itself need documentation, but so do all the > connector configuration/specification objects, now that they are exposed. > This should probably become part of the developer documentation on the main > LCF website. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-57) Solr output connector option to commit at end of job, by default
[ https://issues.apache.org/jira/browse/CONNECTORS-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904746#action_12904746 ] Jack Krupansky commented on CONNECTORS-57:

This looks fine so far and should work for me. If I understand the code, the Connector.noteJobComplete method is called when the job completes or is aborted, and the SolrConnector.noteJobComplete implementation unconditionally does a commit. That's fine for my use case, but we probably still want a connection option to disable that commit if the user has some other commit strategy in mind.

> Solr output connector option to commit at end of job, by default
>
> Key: CONNECTORS-57
> URL: https://issues.apache.org/jira/browse/CONNECTORS-57
> Project: Apache Connectors Framework
> Issue Type: Sub-task
> Components: Lucene/SOLR connector
> Reporter: Jack Krupansky
>
> By default, Solr will eventually commit documents that have been submitted to the Solr Cell interface, but the time lag can confuse and annoy people. Although commit strategy is a difficult issue in general, an option in LCF to automatically commit at the end of a job, by default, would eliminate a lot of potential confusion and generally be close to what the user needs. The desired feature is that there be an option to commit for each job that uses the Solr output connector. This option would default to "on" (or a different setting based on some global configuration setting), but the user may turn it off if commit is only desired upon completion of some jobs.

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.
[ https://issues.apache.org/jira/browse/CONNECTORS-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904618#action_12904618 ] Jack Krupansky commented on CONNECTORS-41:

The notification is certainly "associated" with the output connection... or is it really associated with the job? Originally, my thinking had been that the notification URL would be specified as part of the output connection, but maybe it is an output-specific parameter that gets specified for the job. It could be either.

OTOH, maybe a "notification connector" makes more sense and is more general, rather than just something to use for Solr. That might also provide a way to implement an optional commit for Solr, as a simple notification connection, so that ACF core itself doesn't know or care about Solr or commits or any of that. I think the concept of a notification connector makes sense, but it is not essential for release 0.1 or 0.5.

I'm open to suggestions. We can do it real simple as a parameter for the Solr output connector or the job, or we could be more general. Tough call. If you feel up to doing the more general feature, fine, but the simple notification URL feature is all that is essential.

Also, to be clear about the use case, it is not just Solr commit: some external app might want to notify a user that their job has finished and do whatever other (beyond Solr commit) processing may be needed upon job completion.

> Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.
>
> Key: CONNECTORS-41
> URL: https://issues.apache.org/jira/browse/CONNECTORS-41
> Project: Apache Connectors Framework
> Issue Type: Improvement
> Components: Framework core
> Reporter: Karl Wright
> Priority: Minor
>
> Currently there is no logic that informs an output connection of a job start, end, deletion, or other activity. While this would seem to have little to do with an output connector, this feature has been requested by Jack Krupansky as a potential way of deciding when to tell Solr to commit documents, rather than leave it up to Solr's configuration.

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.
[ https://issues.apache.org/jira/browse/CONNECTORS-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904610#action_12904610 ] Jack Krupansky commented on CONNECTORS-41:

To be clear about the use case, the notification is not to an output connector or output connection per se, but to an external process that is trying to monitor the job status. Kind of a reverse API. The URL that job status notifications should be sent to might be in the same process as Solr, or in another process that is monitoring Solr. Further, this feature should be of value for any type of output connector, although Solr is my current main interest.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903559#action_12903559 ] Jack Krupansky commented on CONNECTORS-98: -- I'll be mostly looking through code and thinking it through and looking at the API string changes first, so I may not touch any code for another week, if not longer. Feel free to rename or refactor code at will. I'll probably let you know in advance of what changes I expect to make in the code. > API should be "pure" RESTful with the API verb represented using the HTTP > GET/PUT/POST/DELETE methods > - > > Key: CONNECTORS-98 > URL: https://issues.apache.org/jira/browse/CONNECTORS-98 > Project: Apache Connectors Framework > Issue Type: Improvement > Components: API >Affects Versions: LCF Release 0.5 >Reporter: Jack Krupansky > Fix For: LCF Release 0.5 > > > (This was originally a comment on CONNECTORS-56 dated 7/16/2010.) > It has come to my attention that the API would be more "pure" RESTful if the > API verb was represented using the HTTP GET/PUT/POST/DELETE methods and the > input argument identifier represented in the context path. > So, GET outputconnection/get \{"connection_name":__\} would > be GET outputconnections/ > and GET outputconnection/delete \{"connection_name":__\} > would be DELETE outputconnections/ > and GET outputconnection/list would be GET outputconnections > and PUT outputconnection/save > \{"outputconnection":__\} would be PUT > outputconnections/ > \{"outputconnection":__\} > What we have today is certainly workable, but just not as "pure" as some > might desire. It would be better to take care of this before the initial > release so that we never have to answer the question of why it wasn't done as > a "proper" RESTful API. > BTW, I did check to verify that an HttpServlet running under Jetty can > process the DELETE and PUT methods (using the doDelete and doPut method > overrides.) 
> Also, POST should be usable as an alternative to PUT for API calls that have > large volumes of data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902983#action_12902983 ] Jack Krupansky commented on CONNECTORS-98: -- Karl says "I await your patch." Point well made. There is a great starting point with the current code. A bit of refactoring required. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
[ https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902982#action_12902982 ] Jack Krupansky commented on CONNECTORS-98: -- Karl asks "what do you plan to do for the list and execute verbs?" List would be a GET and execute would be PUT. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods
API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods - Key: CONNECTORS-98 URL: https://issues.apache.org/jira/browse/CONNECTORS-98 Project: Apache Connectors Framework Issue Type: Improvement Components: API Affects Versions: LCF Release 0.5 Reporter: Jack Krupansky Fix For: LCF Release 0.5 (This was originally a comment on CONNECTORS-56 dated 7/16/2010.) It has come to my attention that the API would be more "pure" RESTful if the API verb was represented using the HTTP GET/PUT/POST/DELETE methods and the input argument identifier represented in the context path. So, GET outputconnection/get \{"connection_name":__\} would be GET outputconnections/ and GET outputconnection/delete \{"connection_name":__\} would be DELETE outputconnections/ and GET outputconnection/list would be GET outputconnections and PUT outputconnection/save \{"outputconnection":__\} would be PUT outputconnections/ \{"outputconnection":__\} What we have today is certainly workable, but just not as "pure" as some might desire. It would be better to take care of this before the initial release so that we never have to answer the question of why it wasn't done as a "proper" RESTful API. BTW, I did check to verify that an HttpServlet running under Jetty can process the DELETE and PUT methods (using the doDelete and doPut method overrides.) Also, POST should be usable as an alternative to PUT for API calls that have large volumes of data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
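The verb-to-method mapping proposed in the description can be written down as a small routing table. A purely illustrative sketch (the paths come from the examples above; the function itself is hypothetical, not framework code):

```java
// Illustrative translation from the current verb-in-path API calls to the
// proposed "pure" REST method + path pairs. Hypothetical helper, shown only
// to make the mapping in the issue description concrete.
public class RestMapping {

    // Return "METHOD path" for a given old-style verb path and connection name.
    public static String toRest(String verbPath, String connectionName) {
        switch (verbPath) {
            case "outputconnection/list":
                return "GET outputconnections";
            case "outputconnection/get":
                return "GET outputconnections/" + connectionName;
            case "outputconnection/delete":
                return "DELETE outputconnections/" + connectionName;
            case "outputconnection/save":
                // per the description, POST would also be accepted as an
                // alternative to PUT for large payloads
                return "PUT outputconnections/" + connectionName;
            default:
                throw new IllegalArgumentException("unknown API verb: " + verbPath);
        }
    }
}
```

On the server side, Jetty dispatches these methods to the `doGet`, `doPut`, `doPost`, and `doDelete` overrides of an `HttpServlet`, which is the check mentioned at the end of the description.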
[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891298#action_12891298 ] Jack Krupansky commented on CONNECTORS-55: -- I checked that H2 feature comparison table, but it did not suggest a great benefit of H2 for LCF. The footprint is a little smaller than Derby and of course a lot smaller than PostgreSQL. One area not in the table that could matter a lot is performance. Any quick thoughts on H2 performance relative to PostgreSQL and Derby? > Bundle database server with LCF packaged product > > > Key: CONNECTORS-55 > URL: https://issues.apache.org/jira/browse/CONNECTORS-55 > Project: Lucene Connector Framework > Issue Type: Sub-task > Components: Installers >Reporter: Jack Krupansky > > The current requirement that the user install and deploy a PostgreSQL server > complicates the installation and deployment of LCF for the user. Installation > and deployment of LCF should be as simple as Solr itself. QuickStart is great > for the low-end and basic evaluation, but a comparable level of simplified > installation and deployment is still needed for full-blown, high-end > environments that need the full performance of a PostgreSQL-class database > server. So, PostgreSQL should be bundled with the packaged release of LCF so > that installation and deployment of LCF will automatically install and deploy > a subset of the full PostgreSQL distribution that is sufficient for the needs > of LCF. Starting LCF, with or without the LCF UI, should automatically start > the database server. Shutting down LCF should also shut down the database > server process. > A typical use case would be for a non-developer who is comfortable with Solr > and simply wants to crawl documents from, for example, a SharePoint > repository and feed them into Solr. 
QuickStart should work well for the low > end or in the early stages of evaluation, but the user would prefer to > evaluate "the real thing" with something resembling a production crawl of > thousands of documents. Such a user might not be a hard-core developer or be > comfortable fiddling with a lot of software components simply to do one > conceptually simple operation. > It should still be possible for the user to supply database server settings > to override the defaults, but the LCF package should have all of the > best-practice settings deemed appropriate for use with LCF. > One downside is that installation and deployment will be platform-specific > since there are multiple processes and PostgreSQL itself requires a > platform-specific installation. > This proposal presumes that PostgreSQL is the best option for the foreseeable > future, but nothing here is intended to preclude support for other database > servers in futures releases. > This proposal should not have any impact on QuickStart packaging or > deployment. > Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-56) All features should be accessible through an API
[ https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889237#action_12889237 ] Jack Krupansky commented on CONNECTORS-56: -- It has come to my attention that the API would be more "pure" RESTful if the API verb was represented using the HTTP GET/PUT/DELETE verb and the input argument identifier represented in the context path. So, GET outputconnection/get \{"connection_name":__\} would be GET outputconnections/ and GET outputconnection/delete \{"connection_name":__\} would be DELETE outputconnections/ and GET outputconnection/list would be GET outputconnections and PUT outputconnection/save \{"outputconnection":__\} would be PUT outputconnections/ \{"outputconnection":__\} What we have today is certainly workable, but just not as "pure" as some might desire. I am not going to classify this as a required issue just yet, but this would be a great time to change things before the API gets cast in concrete. Comments? > All features should be accessible through an API > > > Key: CONNECTORS-56 > URL: https://issues.apache.org/jira/browse/CONNECTORS-56 > Project: Lucene Connector Framework > Issue Type: Sub-task > Components: Framework core >Reporter: Jack Krupansky >Assignee: Karl Wright > > LCF consists of a full-featured crawling engine and a full-featured user > interface to access the features of that engine, but some applications are > better served with a full API that lets the application control the crawling > engine, including creation and editing of connections and creation, editing, > and control of jobs. Put simply, everything that a user can accomplish via > the LCF UI should be doable through an LCF API. All LCF objects should be > queryable through the API. > A primary use case is Solr applications which currently use Aperture for > crawling, but would prefer the full-featured capabilities of LCF as a > crawling engine over Aperture. 
> I do not wish to over-specify the API in this initial description, but I > think the LCF API should probably be a traditional REST API, with some of > the API elements specified via the context path, some parameters via URL > query parameters, and complex, detailed structures as JSON (or similar). The > precise details of the API are beyond the scope of this initial description > and will be added incrementally once the high-level approach to the API > becomes reasonably settled. > A job status and event reporting scheme is also needed in conjunction with > the LCF API. That requirement has already been captured as CONNECTORS-41. > The intention for the API is to create, edit, access, and control all of the > objects managed by LCF. The main focus is on repositories, jobs, and status, > and less about document-specific crawling information, but there may be some > benefit to querying crawling status for individual documents as well. > Nothing in this proposal should in any way limit or constrain the features > that will be available in the LCF UI. The intent is that LCF should continue > to have a full-featured UI, in addition to a full-featured API. > Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-56) All features should be accessible through an API
[ https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888377#action_12888377 ] Jack Krupansky commented on CONNECTORS-56: -- Some cURL and/or Perl test scripts to illustrate use of the API would be helpful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
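In the same spirit as the requested cURL scripts, a tiny Java client could exercise the API. This is a sketch only: the base URL, port, and resource path are assumptions for illustration, not documented endpoints.

```java
// Minimal sketch of an API client, analogous to the requested cURL examples.
// The base URL is an assumption; the resource path follows the verb-in-path
// style the API used at the time of this discussion.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ApiClient {

    // Join an assumed base URL and a resource path, tolerating a trailing slash.
    public static String apiUrl(String base, String resource) {
        return base.endsWith("/") ? base + resource : base + "/" + resource;
    }

    // Issue a GET against the API and return the HTTP status code.
    public static int get(String base, String resource) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(apiUrl(base, resource)).openConnection();
        conn.setRequestMethod("GET");
        return conn.getResponseCode();
    }
}
```

The cURL equivalent of `get(base, "outputconnection/list")` would be a one-liner such as `curl <base>/outputconnection/list`, which is why simple scripts make good living documentation for the API.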
[jira] Commented: (CONNECTORS-60) Agent process should be started automatically
[ https://issues.apache.org/jira/browse/CONNECTORS-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888006#action_12888006 ] Jack Krupansky commented on CONNECTORS-60: -- Karl asks "Let me get this straight. There is a way you can deploy LCF that does everything you are currently asking for. But you are not willing to use it. Why?". The simple answer is that as good as QuickStart is for basic evaluation and low-end production, it is limited by the constraints of Derby, so the multi-process configuration of LCF with PostgreSQL is considered superior for production use. So, yes, QuickStart will be used, but the multi-process configuration will be used as well in other situations. Deployment in those higher-end situations should also be as easy as possible. > Agent process should be started automatically > - > > Key: CONNECTORS-60 > URL: https://issues.apache.org/jira/browse/CONNECTORS-60 > Project: Lucene Connector Framework > Issue Type: Sub-task >Reporter: Jack Krupansky > > LCF as it exists today is a bit too complex to run for an average user, > especially with a separate agent process for crawling. LCF should be as easy > to run as Solr is today. QuickStart is a good move in this direction, but the > same user-visible simplicity is needed for full LCF. The separate agent > process is a reasonable design for execution, but a little too cumbersome for > the average user to manage. > Unfortunately, it is expected that starting up a multi-process application > will require platform-specific scripting. > Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-60) Agent process should be started automatically
[ https://issues.apache.org/jira/browse/CONNECTORS-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888000#action_12888000 ] Jack Krupansky commented on CONNECTORS-60: -- Unless I am mistaken, the Jetty integration is for QuickStart (single process) only. The issue is for non-QuickStart, multi-process execution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-61) Support bundling of LCF with an app
[ https://issues.apache.org/jira/browse/CONNECTORS-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887996#action_12887996 ] Jack Krupansky commented on CONNECTORS-61: -- Some doc is needed on which files or subdirectories are needed to distribute LCF with an app, along with doc on what setup may be required to use LCF when distributed in that manner. > Support bundling of LCF with an app > --- > > Key: CONNECTORS-61 > URL: https://issues.apache.org/jira/browse/CONNECTORS-61 > Project: Lucene Connector Framework > Issue Type: Sub-task > Components: Framework core >Reporter: Jack Krupansky > > It should be possible for an application developer to bundle LCF with an > application to facilitate installation and deployment of the application in > conjunction with LCF. This may (or may not) be as simple as providing > appropriate jar files and documentation for how to use them, but there may be > other components or scripts needed. > There are two options: 1) include the LCF UI along with the other LCF > processes, and 2) exclude the LCF UI and include only the other processes > that can be controlled via the full API. > The database server would be included. > The web app server would be optional since the application may have its own > choice of web app server. > One use case is bundling LCF with Solr or a Solr-based application. > Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API
[ https://issues.apache.org/jira/browse/CONNECTORS-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated CONNECTORS-50: - Original Estimate: (was: 3360h) Remaining Estimate: (was: 3360h) Description: Currently, LCF has a relatively high bar for evaluation and use, requiring developer expertise. Also, although LCF has a comprehensive UI, it is not currently packaged for use as a crawling engine for advanced applications. A small set of individual feature requests is needed to address these issues. They are summarized briefly to show how they fit together for two initial releases of LCF, but will be broken out into individual LCF Jira issues. Goals: 1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as Solr is today) 2. LCF as a toolkit for developers needing customized crawling and repository access 3. An API-based crawling engine that can be integrated with applications (as Aperture is today) Larger goals: 1. Make it very easy for users to evaluate LCF. 2. Make it very easy for developers to customize LCF. 3. Make it very easy for applications to fully manage and control LCF in operation. Two phases: 1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it LCF 0.5. 2) API-based crawling engine for applications for which the UI might not be appropriate. Call it LCF 1.0. Phase 1 --- LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later. It would contain roughly the features that are currently in place or currently underway, plus a little more. Specifically, LCF 0.5 would contain these additional capabilities: 1. Plug-in architecture for connectors (CONNECTORS-40 - DONE) 2. Packaged app ready to run with embedded Jetty app server (CONNECTORS-59) 3. Bundled with database - PostgreSQL or Derby - ready to run without additional manual setup (CONNECTORS-55) 4. 
Mini-API to initially configure default connections and "example" jobs for file system and web crawl (CONNECTORS-58) 5. Agent process started automatically (CONNECTORS-60) 6. Solr output connector option to commit at end of job, by default (CONNECTORS-57) Installation and basic evaluation of LCF would be essentially as simple as Solr is today. The example connections and jobs would permit the user to initiate example crawls of a file system example directory and an example web on the LCF web site with just a couple of clicks (as opposed to the detailed manual setup required today to create repository and output connections and jobs). It is worth considering whether the SharePoint connector could also be included as part of the default package. Users could then add additional connectors and repositories and jobs as desired. Timeframe for release? Level of effort? Phase 2 --- The essence of Phase 2 is that LCF would be split to allow direct, full API access to LCF as a crawling "engine", in addition to the full LCF UI. Call this LCF 1.0. Specifically, LCF 1.0 would contain these additional capabilities: 1. Full API for LCF as a crawling engine (CONNECTORS-56) 2. LCF can be bundled within an app (CONNECTORS-61) 3. LCF event and activity notification for full control by an application (CONNECTORS-41) Overall, LCF will offer roughly the same crawling capabilities as with LCF 0.5, plus whatever bug fixes and minor enhancements might also be added. Timeframe for release? Level of effort? - Issues: - Can we package PostgreSQL with LCF so LCF can set it up? - Or do we need Derby for that purpose? - Managing multiple processes (UI, database, agent, app processes) - What exactly would the API look like? (URL, XML, JSON, YAML?) was: Currently, LCF has a relatively high-bar for evaluation and use, requiring developer expertise. Also, although LCF has a comprehensive UI, it is not currently packaged for use as a crawling engine for advanced applications. 
A small set of individual feature requests are needed to address these issues. They are summarized briefly to show how they fit together for two initial releases of LCF, but will be broken out into individual LCF Jira issues. Goals: 1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as Solr is today) 2. LCF as a toolkit for developers needing customized crawling and repository access 3. An API-based crawling engine that can be integrated with applications (as Aperture is today) Larger goals: 1. Make it very easy for users to evaluate LCF. 2. Make it very easy for developers to customize LCF. 3. Make it very easy for appplications to fully manage and control LCF in operation. Two phases: 1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it LCF 0.5. 2) API-based crawling engine for applications for which the UI might not be appropriate. Call it LCF 1.0. Phase 1 --- LCF 0.5 right out of
[jira] Created: (CONNECTORS-61) Support bundling of LCF with an app
Support bundling of LCF with an app --- Key: CONNECTORS-61 URL: https://issues.apache.org/jira/browse/CONNECTORS-61 Project: Lucene Connector Framework Issue Type: Sub-task Components: Framework core Reporter: Jack Krupansky It should be possible for an application developer to bundle LCF with an application to facilitate installation and deployment of the application in conjunction with LCF. This may (or may not) be as simple as providing appropriate jar files and documentation for how to use them, but there may be other components or scripts needed. There are two options: 1) include the LCF UI along with the other LCF processes, and 2) exclude the LCF UI and include only the other processes that can be controlled via the full API. The database server would be included. The web app server would be optional since the application may have its own choice of web app server. One use case is bundling LCF with Solr or a Solr-based application. Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CONNECTORS-60) Agent process should be started automatically
[ https://issues.apache.org/jira/browse/CONNECTORS-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated CONNECTORS-60: - Description: LCF as it exists today is a bit too complex to run for an average user, especially with a separate agent process for crawling. LCF should be as easy to run as Solr is today. QuickStart is a good move in this direction, but the same user-visible simplicity is needed for full LCF. The separate agent process is a reasonable design for execution, but a little too cumbersome for the average user to manage. Unfortunately, it is expected that starting up a multi-process application will require platform-specific scripting. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. was: LCF as it exists today is a bit too complex to run for an average user, especially with a separate agent process for crawling. LCF should be as easy to run as Solr is today. QuickStart is a good move in this direction, but the same user-visible simplicity is needed for LCF. The separate agent process is a reasonable design for execution, but a little too cumbersome for the average user to manage. Unfortunately, it is expected that starting up a multi-process application will require platform-specific scripting. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-60) Agent process should be started automatically
Agent process should be started automatically - Key: CONNECTORS-60 URL: https://issues.apache.org/jira/browse/CONNECTORS-60 Project: Lucene Connector Framework Issue Type: Sub-task Reporter: Jack Krupansky LCF as it exists today is a bit too complex to run for an average user, especially with a separate agent process for crawling. LCF should be as easy to run as Solr is today. QuickStart is a good move in this direction, but the same user-visible simplicity is needed for LCF. The separate agent process is a reasonable design for execution, but a little too cumbersome for the average user to manage. Unfortunately, it is expected that starting up a multi-process application will require platform-specific scripting. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CONNECTORS-56) All features should be accessible through an API
[ https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated CONNECTORS-56: - Parent: CONNECTORS-50 Issue Type: Sub-task (was: Improvement) > All features should be accessible through an API > > > Key: CONNECTORS-56 > URL: https://issues.apache.org/jira/browse/CONNECTORS-56 > Project: Lucene Connector Framework > Issue Type: Sub-task > Components: Framework core >Reporter: Jack Krupansky > > LCF consists of a full-featured crawling engine and a full-featured user > interface to access the features of that engine, but some applications are > better served with a full API that lets the application control the crawling > engine, including creation and editing of connections and creation, editing, > and control of jobs. Put simply, everything that a user can accomplish via > the LCF UI should be doable through an LCF API. All LCF objects should be > queryable through the API. > A primary use case is Solr applications which currently use Aperture for > crawling, but would prefer the full-featured capabilities of LCF as a > crawling engine over Aperture. > I do not wish to over-specify the API in this initial description, but I > think the LCF API should probably be a traditional REST API, with some of > the API elements specified via the context path, some parameters via URL > query parameters, and complex, detailed structures as JSON (or similar). The > precise details of the API are beyond the scope of this initial description > and will be added incrementally once the high-level approach to the API > becomes reasonably settled. > A job status and event reporting scheme is also needed in conjunction with > the LCF API. That requirement has already been captured as CONNECTORS-41. > The intention for the API is to create, edit, access, and control all of the > objects managed by LCF. 
The main focus is on repositories, jobs, and status, > and less about document-specific crawling information, but there may be some > benefit to querying crawling status for individual documents as well. > Nothing in this proposal should in any way limit or constrain the features > that will be available in the LCF UI. The intent is that LCF should continue > to have a full-featured UI, but in addition to a full-featured API. > Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
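The REST style proposed above (operation from the context path, simple options from URL query parameters, complex objects as JSON) can be sketched as a toy dispatcher. All paths and field names here are invented for illustration; nothing below is the actual LCF API:

```python
import json

# Hypothetical sketch of the proposed REST shape: the operation comes from
# the context path, simple options from query parameters, and the complex
# object (e.g. a job definition) arrives as a JSON body.
def handle(path, params, body_json):
    obj = json.loads(body_json)
    if path == "/jobs/create":
        return {"status": "created",
                "job": obj["name"],
                "priority": params.get("priority", "normal")}
    if path == "/jobs/status":
        return {"status": "ok", "job": params["name"]}
    return {"status": "error", "reason": "unknown path"}

resp = handle("/jobs/create",
              {"priority": "high"},
              '{"name": "crawl-docs", "repository": "fs"}')
print(resp["status"], resp["job"])  # prints: created crawl-docs
```

The point of the split is that the path and query parameters stay human-readable and cacheable, while arbitrarily nested structure lives in the JSON body.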
[jira] Updated: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated CONNECTORS-55: - Parent: CONNECTORS-50 Issue Type: Sub-task (was: Improvement) > Bundle database server with LCF packaged product > > > Key: CONNECTORS-55 > URL: https://issues.apache.org/jira/browse/CONNECTORS-55 > Project: Lucene Connector Framework > Issue Type: Sub-task > Components: Framework core >Reporter: Jack Krupansky > > The current requirement that the user install and deploy a PostgreSQL server > complicates the installation and deployment of LCF for the user. Installation > and deployment of LCF should be as simple as Solr itself. QuickStart is great > for the low-end and basic evaluation, but a comparable level of simplified > installation and deployment is still needed for full-blown, high-end > environments that need the full performance of a PostgreSQL-class database > server. So, PostgreSQL should be bundled with the packaged release of LCF so > that installation and deployment of LCF will automatically install and deploy > a subset of the full PostgreSQL distribution that is sufficient for the needs > of LCF. Starting LCF, with or without the LCF UI, should automatically start > the database server. Shutting down LCF should also shut down the database > server process. > A typical use case would be for a non-developer who is comfortable with Solr > and simply wants to crawl documents from, for example, a SharePoint > repository and feed them into Solr. QuickStart should work well for the low > end or in the early stages of evaluation, but the user would prefer to > evaluate "the real thing" with something resembling a production crawl of > thousands of documents. Such a user might not be a hard-core developer or be > comfortable fiddling with a lot of software components simply to do one > conceptually simple operation. 
> It should still be possible for the user to supply database server settings > to override the defaults, but the LCF package should have all of the > best-practice settings deemed appropriate for use with LCF. > One downside is that installation and deployment will be platform-specific > since there are multiple processes and PostgreSQL itself requires a > platform-specific installation. > This proposal presumes that PostgreSQL is the best option for the foreseeable > future, but nothing here is intended to preclude support for other database > servers in future releases. > This proposal should not have any impact on QuickStart packaging or > deployment. > Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-59) Packaged app ready to run with embedded Jetty app server
Packaged app ready to run with embedded Jetty app server - Key: CONNECTORS-59 URL: https://issues.apache.org/jira/browse/CONNECTORS-59 Project: Lucene Connector Framework Issue Type: Sub-task Components: Framework core Reporter: Jack Krupansky Many potential users of LCF are not necessarily sophisticated developers who are prepared to "work with code", but are able to install packaged software, much as Solr is currently distributed. QuickStart for LCF is a good move in this direction, but similar packaging is needed for full LCF with a production database server. This issue focuses on assuring that full LCF is released as a packaged app suitable for download and immediate use without any additional software development expertise required. Database packaging has already been called out as a distinct issue (CONNECTORS-55), so this issue is more of a catch-all for any lingering work needed to address support for full LCF as a packaged app. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-56) All features should be accessible through an API
[ https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886932#action_12886932 ] Jack Krupansky commented on CONNECTORS-56: -- Karl's suggested approach seems consistent with my own thoughts. More details and discussion to follow, but I'd be interested in more community feedback on the overall, high-level concept before we get too detailed. Also, just to remind people: my suggestion was that a full API would not be a requirement for the initial release. Better to get QuickStart and basic capabilities in people's hands, but some aspects of the API, such as the factoring needed to facilitate it, might well be better off being done sooner than a second release. So, maybe a portion or foundation of the API would be in the initial release. > All features should be accessible through an API > > > Key: CONNECTORS-56 > URL: https://issues.apache.org/jira/browse/CONNECTORS-56 > Project: Lucene Connector Framework > Issue Type: Improvement > Components: Framework core >Reporter: Jack Krupansky > > LCF consists of a full-featured crawling engine and a full-featured user > interface to access the features of that engine, but some applications are > better served with a full API that lets the application control the crawling > engine, including creation and editing of connections and creation, editing, > and control of jobs. Put simply, everything that a user can accomplish via > the LCF UI should be doable through an LCF API. All LCF objects should be > queryable through the API. > A primary use case is Solr applications which currently use Aperture for > crawling, but would prefer the full-featured capabilities of LCF as a > crawling engine over Aperture. 
> I do not wish to over-specify the API in this initial description, but I > think the LCF API should probably be a traditional REST API, with some of > the API elements specified via the context path, some parameters via URL > query parameters, and complex, detailed structures as JSON (or similar). The > precise details of the API are beyond the scope of this initial description > and will be added incrementally once the high-level approach to the API > becomes reasonably settled. > A job status and event reporting scheme is also needed in conjunction with > the LCF API. That requirement has already been captured as CONNECTORS-41. > The intention for the API is to create, edit, access, and control all of the > objects managed by LCF. The main focus is on repositories, jobs, and status, > and less about document-specific crawling information, but there may be some > benefit to querying crawling status for individual documents as well. > Nothing in this proposal should in any way limit or constrain the features > that will be available in the LCF UI. > The intent is that LCF should continue > to have a full-featured UI, but in addition to a full-featured API. > Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-58) Mini-API to initially configure default connections and "example" jobs for file system and web crawl
Mini-API to initially configure default connections and "example" jobs for file system and web crawl - Key: CONNECTORS-58 URL: https://issues.apache.org/jira/browse/CONNECTORS-58 Project: Lucene Connector Framework Issue Type: Sub-task Components: Framework core Reporter: Jack Krupansky Creating a basic connection setup to do a relatively simple crawl for a file system or web can be a daunting task for someone new to LCF. So, it would be nice to have a scripting file that supports an abbreviated API (subset of the full API discussed in CONNECTORS-56) sufficient to create a default set of connections and example jobs that the new user can choose from. Beyond this initial need, this script format might be a useful form to "dump" all of the connections and jobs in the LCF database in a form that can be used to recreate an LCF configuration. Kind of a "dump and reload" capability. That in fact might be how the initial example script gets created. Those are two distinct use cases, but could utilize the same feature. The example script could have example jobs to crawl a subdirectory of LCF, crawl the LCF wiki, etc. There could be more than one script. There might be example scripts for each form of connector. This capability should be available for both QuickStart and the general release of LCF. As just one possibility, the script format might be a sequence of JSON expressions, each with an initial string analogous to a servlet path to specify the operation to be performed, followed by the JSON form of the connection or job or other LCF object. Or, some other format might be more suitable. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
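The script format suggested above ("a sequence of JSON expressions, each with an initial string analogous to a servlet path") can be sketched as follows. The operations and object fields are invented examples, not a real LCF format:

```python
import json

# Hypothetical sketch of the mini-API script format: each line pairs an
# operation string (analogous to a servlet path) with the JSON form of the
# connection or job to create. A "dump" of an existing configuration could
# emit exactly this format for later reload.
script = """\
/connections/create {"name": "local-files", "type": "filesystem"}
/jobs/create {"name": "example-crawl", "connection": "local-files"}
"""

created = []
for line in script.splitlines():
    op, _, payload = line.partition(" ")  # split path from JSON body
    created.append((op, json.loads(payload)["name"]))

for op, name in created:
    print(op, name)
```

This also illustrates why the two use cases (bootstrap examples and dump/reload) can share one format: both are just ordered create operations.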
[jira] Created: (CONNECTORS-57) Solr output connector option to commit at end of job, by default
Solr output connector option to commit at end of job, by default Key: CONNECTORS-57 URL: https://issues.apache.org/jira/browse/CONNECTORS-57 Project: Lucene Connector Framework Issue Type: Sub-task Components: Lucene/SOLR connector Reporter: Jack Krupansky By default, Solr will eventually commit documents that have been submitted to the Solr Cell interface, but the time lag can confuse and annoy people. Although commit strategy is a difficult issue in general, an option in LCF to automatically commit at the end of a job, by default, would eliminate a lot of potential confusion and generally be close to what the user needs. The desired feature is that there be an option to commit for each job that uses the Solr output connector. This option would default to "on" (or a different setting based on some global configuration setting), but the user may turn it off if commit is only desired upon completion of some jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
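The behavior requested above amounts to one explicit commit request after the last document of a job is posted. A minimal sketch: the URL shape below follows Solr's standard update endpoint, while the base URL and the option flag are illustrative assumptions:

```python
from urllib.parse import urlencode

# Hypothetical sketch of "commit at end of job": when the per-job option is
# on, the output connector issues a single commit to Solr's update handler
# after the job finishes, instead of waiting for Solr's own (possibly slow)
# autocommit. The commit_at_end flag stands in for the proposed job option.
def commit_url(solr_base, commit_at_end=True):
    if not commit_at_end:
        return None  # option off: rely on Solr's own commit policy
    return solr_base.rstrip("/") + "/update?" + urlencode({"commit": "true"})

print(commit_url("http://localhost:8983/solr/collection1/"))
# prints: http://localhost:8983/solr/collection1/update?commit=true
```

Issuing the commit once per job, rather than per document, keeps indexing fast while still making results visible promptly when the job ends.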
[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886724#action_12886724 ] Jack Krupansky commented on CONNECTORS-55: -- When Karl says "It *does* limit your ability to use other commands simultaneously" (referring to use of embedded Derby), he is referring to commands executed using the "executecommand" shell script, such as registering and unregistering connectors, which is something typically done once before starting the UI or once every blue moon when you want to support a new type of repository, but not done on as regular a basis as editing connections and jobs and running jobs. The Java classes to execute those commands would be, by definition, outside of the LCF process. > Bundle database server with LCF packaged product > > > Key: CONNECTORS-55 > URL: https://issues.apache.org/jira/browse/CONNECTORS-55 > Project: Lucene Connector Framework > Issue Type: Improvement > Components: Framework core >Reporter: Jack Krupansky > > The current requirement that the user install and deploy a PostgreSQL server > complicates the installation and deployment of LCF for the user. Installation > and deployment of LCF should be as simple as Solr itself. QuickStart is great > for the low-end and basic evaluation, but a comparable level of simplified > installation and deployment is still needed for full-blown, high-end > environments that need the full performance of a PostgreSQL-class database > server. So, PostgreSQL should be bundled with the packaged release of LCF so > that installation and deployment of LCF will automatically install and deploy > a subset of the full PostgreSQL distribution that is sufficient for the needs > of LCF. Starting LCF, with or without the LCF UI, should automatically start > the database server. Shutting down LCF should also shut down the database > server process. 

> A typical use case would be for a non-developer who is comfortable with Solr > and simply wants to crawl documents from, for example, a SharePoint > repository and feed them into Solr. QuickStart should work well for the low > end or in the early stages of evaluation, but the user would prefer to > evaluate "the real thing" with something resembling a production crawl of > thousands of documents. Such a user might not be a hard-core developer or be > comfortable fiddling with a lot of software components simply to do one > conceptually simple operation. > It should still be possible for the user to supply database server settings > to override the defaults, but the LCF package should have all of the > best-practice settings deemed appropriate for use with LCF. > One downside is that installation and deployment will be platform-specific > since there are multiple processes and PostgreSQL itself requires a > platform-specific installation. > This proposal presumes that PostgreSQL is the best option for the foreseeable > future, but nothing here is intended to preclude support for other database > servers in future releases. > This proposal should not have any impact on QuickStart packaging or > deployment. > Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886720#action_12886720 ] Jack Krupansky commented on CONNECTORS-55: -- Karl notes that "we've had to mess with the stuffer query on pretty near every point release of Postgresql". Letting/forcing the user to pick the right/acceptable release of PostgreSQL to install is error-prone and a support headache. I would argue that it is better for the LCF team to bundle the right/best release of PostgreSQL with LCF. > Bundle database server with LCF packaged product > > > Key: CONNECTORS-55 > URL: https://issues.apache.org/jira/browse/CONNECTORS-55 > Project: Lucene Connector Framework > Issue Type: Improvement > Components: Framework core >Reporter: Jack Krupansky > > The current requirement that the user install and deploy a PostgreSQL server > complicates the installation and deployment of LCF for the user. Installation > and deployment of LCF should be as simple as Solr itself. QuickStart is great > for the low-end and basic evaluation, but a comparable level of simplified > installation and deployment is still needed for full-blown, high-end > environments that need the full performance of a PostgreSQL-class database > server. So, PostgreSQL should be bundled with the packaged release of LCF so > that installation and deployment of LCF will automatically install and deploy > a subset of the full PostgreSQL distribution that is sufficient for the needs > of LCF. Starting LCF, with or without the LCF UI, should automatically start > the database server. Shutting down LCF should also shut down the database > server process. > A typical use case would be for a non-developer who is comfortable with Solr > and simply wants to crawl documents from, for example, a SharePoint > repository and feed them into Solr. 
QuickStart should work well for the low > end or in the early stages of evaluation, but the user would prefer to > evaluate "the real thing" with something resembling a production crawl of > thousands of documents. Such a user might not be a hard-core developer or be > comfortable fiddling with a lot of software components simply to do one > conceptually simple operation. > It should still be possible for the user to supply database server settings > to override the defaults, but the LCF package should have all of the > best-practice settings deemed appropriate for use with LCF. > One downside is that installation and deployment will be platform-specific > since there are multiple processes and PostgreSQL itself requires a > platform-specific installation. > This proposal presumes that PostgreSQL is the best option for the foreseeable > future, but nothing here is intended to preclude support for other database > servers in future releases. > This proposal should not have any impact on QuickStart packaging or > deployment. > Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-56) All features should be accessible through an API
All features should be accessible through an API Key: CONNECTORS-56 URL: https://issues.apache.org/jira/browse/CONNECTORS-56 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Jack Krupansky LCF consists of a full-featured crawling engine and a full-featured user interface to access the features of that engine, but some applications are better served with a full API that lets the application control the crawling engine, including creation and editing of connections and creation, editing, and control of jobs. Put simply, everything that a user can accomplish via the LCF UI should be doable through an LCF API. All LCF objects should be queryable through the API. A primary use case is Solr applications which currently use Aperture for crawling, but would prefer the full-featured capabilities of LCF as a crawling engine over Aperture. I do not wish to over-specify the API in this initial description, but I think the LCF API should probably be a traditional REST API, with some of the API elements specified via the context path, some parameters via URL query parameters, and complex, detailed structures as JSON (or similar). The precise details of the API are beyond the scope of this initial description and will be added incrementally once the high-level approach to the API becomes reasonably settled. A job status and event reporting scheme is also needed in conjunction with the LCF API. That requirement has already been captured as CONNECTORS-41. The intention for the API is to create, edit, access, and control all of the objects managed by LCF. The main focus is on repositories, jobs, and status, and less about document-specific crawling information, but there may be some benefit to querying crawling status for individual documents as well. Nothing in this proposal should in any way limit or constrain the features that will be available in the LCF UI. 
The intent is that LCF should continue to have a full-featured UI, but in addition to a full-featured API. Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product
[ https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886490#action_12886490 ] Jack Krupansky commented on CONNECTORS-55: -- I was using the term "install" loosely, not so much the way a typical package has a GUI wizard and lots of stuff going on, but more in the sense of raw Solr where you download, unzip, and files are in subdirectories right where they need to be. In that sense, the theory is that a subset of PostgreSQL could be in a subdirectory. Some enterprising vendor, such as Lucid Imagination, might want to have a fancy GUI install, but that would be beyond the scope of what I intended here. > Bundle database server with LCF packaged product > > > Key: CONNECTORS-55 > URL: https://issues.apache.org/jira/browse/CONNECTORS-55 > Project: Lucene Connector Framework > Issue Type: Improvement > Components: Framework core >Reporter: Jack Krupansky > > The current requirement that the user install and deploy a PostgreSQL server > complicates the installation and deployment of LCF for the user. Installation > and deployment of LCF should be as simple as Solr itself. QuickStart is great > for the low-end and basic evaluation, but a comparable level of simplified > installation and deployment is still needed for full-blown, high-end > environments that need the full performance of a PostgreSQL-class database > server. So, PostgreSQL should be bundled with the packaged release of LCF so > that installation and deployment of LCF will automatically install and deploy > a subset of the full PostgreSQL distribution that is sufficient for the needs > of LCF. Starting LCF, with or without the LCF UI, should automatically start > the database server. Shutting down LCF should also shut down the database > server process. 
> A typical use case would be for a non-developer who is comfortable with Solr > and simply wants to crawl documents from, for example, a SharePoint > repository and feed them into Solr. QuickStart should work well for the low > end or in the early stages of evaluation, but the user would prefer to > evaluate "the real thing" with something resembling a production crawl of > thousands of documents. Such a user might not be a hard-core developer or be > comfortable fiddling with a lot of software components simply to do one > conceptually simple operation. > It should still be possible for the user to supply database server settings > to override the defaults, but the LCF package should have all of the > best-practice settings deemed appropriate for use with LCF. > One downside is that installation and deployment will be platform-specific > since there are multiple processes and PostgreSQL itself requires a > platform-specific installation. > This proposal presumes that PostgreSQL is the best option for the foreseeable > future, but nothing here is intended to preclude support for other database > servers in futures releases. > This proposal should not have any impact on QuickStart packaging or > deployment. > Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-55) Bundle database server with LCF packaged product
Bundle database server with LCF packaged product Key: CONNECTORS-55 URL: https://issues.apache.org/jira/browse/CONNECTORS-55 Project: Lucene Connector Framework Issue Type: Improvement Components: Framework core Reporter: Jack Krupansky The current requirement that the user install and deploy a PostgreSQL server complicates the installation and deployment of LCF for the user. Installation and deployment of LCF should be as simple as Solr itself. QuickStart is great for the low-end and basic evaluation, but a comparable level of simplified installation and deployment is still needed for full-blown, high-end environments that need the full performance of a PostgreSQL-class database server. So, PostgreSQL should be bundled with the packaged release of LCF so that installation and deployment of LCF will automatically install and deploy a subset of the full PostgreSQL distribution that is sufficient for the needs of LCF. Starting LCF, with or without the LCF UI, should automatically start the database server. Shutting down LCF should also shut down the database server process. A typical use case would be for a non-developer who is comfortable with Solr and simply wants to crawl documents from, for example, a SharePoint repository and feed them into Solr. QuickStart should work well for the low end or in the early stages of evaluation, but the user would prefer to evaluate "the real thing" with something resembling a production crawl of thousands of documents. Such a user might not be a hard-core developer or be comfortable fiddling with a lot of software components simply to do one conceptually simple operation. It should still be possible for the user to supply database server settings to override the defaults, but the LCF package should have all of the best-practice settings deemed appropriate for use with LCF. 
One downside is that installation and deployment will be platform-specific since there are multiple processes and PostgreSQL itself requires a platform-specific installation. This proposal presumes that PostgreSQL is the best option for the foreseeable future, but nothing here is intended to preclude support for other database servers in future releases. This proposal should not have any impact on QuickStart packaging or deployment. Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
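The proposed lifecycle (LCF starts the bundled database server, and shutting LCF down stops it) can be sketched as follows. This is Python rather than LCF's actual Java code, and DB_CMD is a stand-in; a real build would invoke the bundled PostgreSQL control program with LCF's best-practice settings:

```python
import atexit
import subprocess
import sys

# Hypothetical sketch: starting LCF launches the bundled database server
# as a child process, and an exit hook guarantees it is stopped again when
# LCF shuts down, so the user never manages the server directly.
DB_CMD = [sys.executable, "-c", "print('db server started')"]

def start_database():
    proc = subprocess.Popen(DB_CMD, stdout=subprocess.PIPE, text=True)
    atexit.register(proc.terminate)  # stop the server when this process exits
    return proc

db = start_database()
out, _ = db.communicate(timeout=30)
print(out.strip())  # prints: db server started
```

Tying the server's lifetime to the parent process via an exit hook is what makes the bundled database invisible to the user, at the cost of the platform-specific startup noted above.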
[jira] Commented: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API
[ https://issues.apache.org/jira/browse/CONNECTORS-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883932#action_12883932 ] Jack Krupansky commented on CONNECTORS-50: -- I expect to be able to address all of Karl's points... > I don't think much of "umbrella tickets"... Can you break this up into more > specific work items... I'll be doing that over the coming week or so. I'll keep this umbrella ticket not for details, but just to show how all of the individual tickets fit together. The discussion on this ticket is more for the overall proposal for two separate releases and roughly what they are. > I'm also still looking for much greater specificity as to the use cases. I'll provide some of that for each individual ticket. I'll try to keep the use cases as simple and minimalist as possible, but I'll address specific questions or issues that arise. > the word "API" is so unspecific as to be essentially meaningless... I'm > interested in how you intend to interact with it. Initially I'll be relatively light on detail to permit others to have some input on what they expect from a full API, but eventually all of the API issues will need to be fleshed out and detailed to some extent. > There are still a number of points ... we have discussed in the past which > remain but whose controversy goes unacknowledged. Yes, with the proposed commit feature as an example. The specific ticket for each feature should address such concerns. > I've discussed the limitations of using Derby as the prime database for LCF - > that should be captured somewhere. Yes. There might be several database tickets. One for alternate databases. Another for bundling the database with LCF. 
> Proposal for initial two releases of LCF, including packaged product and full > API > - > > Key: CONNECTORS-50 > URL: https://issues.apache.org/jira/browse/CONNECTORS-50 > Project: Lucene Connector Framework > Issue Type: New Feature > Components: Framework core >Reporter: Jack Krupansky > Original Estimate: 3360h > Remaining Estimate: 3360h > > Currently, LCF has a relatively high bar for evaluation and use, requiring > developer expertise. Also, although LCF has a comprehensive UI, it is not > currently packaged for use as a crawling engine for advanced applications. > A small set of individual feature requests is needed to address these > issues. They are summarized briefly to show how they fit together for two > initial releases of LCF, but will be broken out into individual LCF Jira > issues. > Goals: > 1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as > Solr is today) > 2. LCF as a toolkit for developers needing customized crawling and repository > access > 3. An API-based crawling engine that can be integrated with applications (as > Aperture is today) > Larger goals: > 1. Make it very easy for users to evaluate LCF. > 2. Make it very easy for developers to customize LCF. > 3. Make it very easy for applications to fully manage and control LCF in > operation. > Two phases: > 1) Standalone, packaged app that is super-easy to evaluate and deploy. Call > it LCF 0.5. > 2) API-based crawling engine for applications for which the UI might not be > appropriate. Call it LCF 1.0. > Phase 1 > --- > LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later. > It would contain roughly the features that are currently in place or > currently underway, plus a little more. > Specifically, LCF 0.5 would contain these additional capabilities: > 1. Plug-in architecture for connectors (already underway) > 2. Packaged app ready to run with embedded Jetty app server (I think this has > been agreed to) > 3. 
Bundled with database - PostgreSQL or Derby - ready to run without > additional manual setup > 4. Mini-API to initially configure default connections and "example" jobs for > file system and web crawl > 5. Agent process started automatically (platform-specific startup required) > 6. Solr output connector option to commit at end of job, by default > Installation and basic evaluation of LCF would be essentially as simple as > Solr is today. The example > connections and jobs would permit the user to initiate example crawls of a > file system example > directory and an example web on the LCF web site with just a couple of clicks > (as opposed to the > detailed manual setup required today to create repository and output > connections and jobs). > It is worth considering whether the SharePoint connector could also be > included as part of the default package. > Users could then add additional connectors and repositories and jobs as > desired. > Timeframe for release? Level of effort? > Phase 2 > --- > The essence of Phase 2 is
[jira] Updated: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API
[ https://issues.apache.org/jira/browse/CONNECTORS-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jack Krupansky updated CONNECTORS-50:
-

Original Estimate: 3360h (was: 0.08h)
Remaining Estimate: 3360h (was: 0.08h)

Description:
Currently, LCF has a relatively high bar for evaluation and use, requiring developer expertise. Also, although LCF has a comprehensive UI, it is not currently packaged for use as a crawling engine for advanced applications.

A small set of individual feature requests is needed to address these issues. They are summarized briefly to show how they fit together for two initial releases of LCF, but will be broken out into individual LCF Jira issues.

Goals:
1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository access
3. An API-based crawling engine that can be integrated with applications (as Aperture is today)

Larger goals:
1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in operation.

Two phases:
1) A standalone, packaged app that is super-easy to evaluate and deploy. Call it LCF 0.5.
2) An API-based crawling engine for applications for which the UI might not be appropriate. Call it LCF 1.0.

Phase 1
---
LCF 0.5, right out of the box, would interface loosely with Solr 1.4 or later. It would contain roughly the features that are currently in place or currently underway, plus a little more.

Specifically, LCF 0.5 would contain these additional capabilities:
1. Plug-in architecture for connectors (already underway)
2. Packaged app ready to run with an embedded Jetty app server (I think this has been agreed to)
3. Bundled with a database - PostgreSQL or Derby - ready to run without additional manual setup
4.
Mini-API to initially configure default connections and "example" jobs for file system and web crawl
5. Agent process started automatically (platform-specific startup required)
6. Solr output connector option to commit at end of job, by default

Installation and basic evaluation of LCF would be essentially as simple as Solr is today. The example connections and jobs would permit the user to initiate example crawls of a file system example directory and an example web on the LCF web site with just a couple of clicks (as opposed to the detailed manual setup required today to create repository and output connections and jobs).

It is worth considering whether the SharePoint connector could also be included as part of the default package.

Users could then add additional connectors and repositories and jobs as desired.

Timeframe for release? Level of effort?

Phase 2
---
The essence of Phase 2 is that LCF would be split to allow direct, full API access to LCF as a crawling "engine", in addition to the full LCF UI. Call this LCF 1.0.

Specifically, LCF 1.0 would contain these additional capabilities:
1. Full API for LCF as a crawling engine
2. LCF can be bundled within an app (such as the default LCF package itself with its UI)
3. LCF event and activity notification for full control by an application (already a Jira request)

Overall, LCF will offer roughly the same crawling capabilities as LCF 0.5, plus whatever bug fixes and minor enhancements might also be added.

Timeframe for release? Level of effort?
-
Issues:
- Can we package PostgreSQL with LCF so LCF can set it up?
- Or do we need Derby for that purpose?
- Managing multiple processes (UI, database, agent, app processes)
- What exactly would the API look like? (URL, XML, JSON, YAML?)

was:
Currently, LCF has a relatively high bar for evaluation and use, requiring developer expertise. Also, although LCF has a comprehensive UI, it is not currently packaged for use as a crawling engine for advanced applications.
A small set of individual feature requests is needed to address these issues. They are summarized briefly to show how they fit together for two initial releases of LCF, but will be broken out into individual LCF Jira issues.

Goals:
1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository access
3. An API-based crawling engine that can be integrated with applications (as Aperture is today)

Larger goals:
1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in operation.

Two phases:
1) A standalone, packaged app that is super-easy to evaluate and deploy. Call it LCF 0.5.
2) An API-based crawling engine for applications for which the UI might not be appropriate. Call it LCF 1.0.

Phase 1
---
[jira] Created: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API
Proposal for initial two releases of LCF, including packaged product and full API
-

Key: CONNECTORS-50
URL: https://issues.apache.org/jira/browse/CONNECTORS-50
Project: Lucene Connector Framework
Issue Type: New Feature
Components: Framework core
Reporter: Jack Krupansky

Currently, LCF has a relatively high bar for evaluation and use, requiring developer expertise. Also, although LCF has a comprehensive UI, it is not currently packaged for use as a crawling engine for advanced applications.

A small set of individual feature requests is needed to address these issues. They are summarized briefly to show how they fit together for two initial releases of LCF, but will be broken out into individual LCF Jira issues.

Goals:
1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository access
3. An API-based crawling engine that can be integrated with applications (as Aperture is today)

Larger goals:
1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in operation.

Two phases:
1) A standalone, packaged app that is super-easy to evaluate and deploy. Call it LCF 0.5.
2) An API-based crawling engine for applications for which the UI might not be appropriate. Call it LCF 1.0.

Phase 1
---
LCF 0.5, right out of the box, would interface loosely with Solr 1.4 or later. It would contain roughly the features that are currently in place or currently underway, plus a little more.

Specifically, LCF 0.5 would contain these additional capabilities:
1. Plug-in architecture for connectors (already underway)
2. Packaged app ready to run with an embedded Jetty app server (I think this has been agreed to)
3. Bundled with a database - PostgreSQL or Derby - ready to run without additional manual setup
4.
Mini-API to initially configure default connections and "example" jobs for file system and web crawl
5. Agent process started automatically (platform-specific startup required)
6. Solr output connector option to commit at end of job, by default

Installation and basic evaluation of LCF would be essentially as simple as Solr is today. The example connections and jobs would permit the user to initiate example crawls of a file system example directory and an example web on the LCF web site with just a couple of clicks (as opposed to the detailed manual setup required today to create repository and output connections and jobs).

It is worth considering whether the SharePoint connector could also be included as part of the default package.

Users could then add additional connectors and repositories and jobs as desired.

Timeframe for release? Level of effort?

Phase 2
---
The essence of Phase 2 is that LCF would be split to allow direct, full API access to LCF as a crawling "engine", in addition to the full LCF UI. Call this LCF 1.0.

Specifically, LCF 1.0 would contain these additional capabilities:
1. Full API for LCF as a crawling engine
2. LCF can be bundled within an app (such as the default LCF package itself with its UI)
3. LCF event and activity notification for full control by an application (already a Jira request)

Overall, LCF will offer roughly the same crawling capabilities as LCF 0.5, plus whatever bug fixes and minor enhancements might also be added.

Timeframe for release? Level of effort?
-
Issues:
- Can we package PostgreSQL with LCF so LCF can set it up?
- Or do we need Derby for that purpose?
- Managing multiple processes (UI, database, agent, app processes)
- What exactly would the API look like? (URL, XML, JSON, YAML?)
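The open question above - what would the API look like (URL, XML, JSON, YAML?) - can be made concrete with a sketch. Everything below is invented for illustration; none of these field names appear in the proposal. It only shows the kind of structured job definition a JSON flavor of the API might carry.

```python
import json

# Hypothetical job definition such as an LCF 1.0 crawling-engine API might
# accept; every field name here is invented for illustration.
job = {
    "name": "Example file system crawl",
    "repository_connection": "local-files",
    "output_connection": "solr-output",
    "paths": ["/var/lcf/example-docs"],
    "commit_at_end": True,
}

payload = json.dumps(job, indent=2)
print(payload)
```

An XML flavor could carry exactly the same structure; the choice between XML, JSON, and YAML mostly affects client tooling, not capability.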
[jira] Commented: (CONNECTORS-37) LCF should use an XML configuration file, not the simple name/value config file it currently has
[ https://issues.apache.org/jira/browse/CONNECTORS-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874029#action_12874029 ]

Jack Krupansky commented on CONNECTORS-37:
--

I'll defer to the community on the logging issue, other than to say that it should be as "standard" as possible and reasonably compatible with how Solr does logging, so that it will not surprise people.

I don't have a problem with the LCF .properties file per se, other than the fact that, since it is restricted to strictly keyword/value pairs, it cannot contain more complex, structured configuration information.

The main thing I'd like to see is that the current "executecommand" configuration setup, such as which output connectors and crawlers to register, be done using descriptions in a config file rather than discrete shell commands to execute manually. The default config file from svn checkout should have a default set of connectors, crawlers, etc., and commented-out entries for other connectors that people can un-comment and edit as desired.

A key advantage of having such a config file is that when people report problems here, we can ask them to provide their config file rather than ask them to remember and re-type whatever commands they intended to run.

Whether connections and jobs can be initially created from a config file is a larger discussion. The main point here is simply that it should be easy to get LCF initialized and configured for the really basic stuff needed for a typical initial evaluation (comparable to what occurs in a Solr tutorial) - the proverbial "zero-hour" experience.
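As a minimal sketch of the kind of declarative config file described above - properties plus connector registration with commented-out alternatives - it might look something like this. Every element and class name below is invented for illustration, not taken from LCF:

```xml
<!-- Hypothetical LCF configuration file: all names below are invented. -->
<lcf-configuration>
  <!-- Simple keyword/value properties still fit naturally in XML. -->
  <property name="database.implementation" value="postgresql"/>

  <!-- Connector registration as declarations instead of shell commands. -->
  <repositoryconnectors>
    <connector name="File Connector"
               class="org.apache.lcf.crawler.connectors.filesystem.FileConnector"/>
    <!-- Un-comment to register the web crawler:
    <connector name="Web Connector"
               class="org.apache.lcf.crawler.connectors.webcrawler.WebcrawlerConnector"/>
    -->
  </repositoryconnectors>

  <outputconnectors>
    <connector name="Solr Connector"
               class="org.apache.lcf.agents.output.solr.SolrConnector"/>
  </outputconnectors>
</lcf-configuration>
```

A file like this would also give users something concrete to attach to a bug report, as the comment suggests.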
> LCF should use an XML configuration file, not the simple name/value config file it currently has
> -
>
> Key: CONNECTORS-37
> URL: https://issues.apache.org/jira/browse/CONNECTORS-37
> Project: Lucene Connector Framework
> Issue Type: Improvement
> Components: Framework core
> Reporter: Karl Wright
>
> LCF's configuration file is limited in what it can specify, and XML configuration files seem to offer more flexibility and are the modern norm. Before backwards compatibility becomes an issue, it may therefore be worth converting the property file reader to use XML rather than name/value format.
> It would also be nice to be able to fold the logging configuration into the same file, if this seems possible.