[jira] Issue Comment Edited: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920801#action_12920801
 ] 

Jack Krupansky edited comment on CONNECTORS-118 at 10/13/10 7:35 PM:
-

I have personally written unit tests that generated most of those formats, 
which Aperture then extracted.

See:
http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers

org.apache.tools.bzip2 - BZIP2 archives.
java.util.zip.GZIPInputStream - GZIP archives.
javax.mail - message/rfc822-style messages and mbox files.
org.apache.tools.tar - tar archives.
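
For illustration, a minimal sketch of the GZIP case using 
java.util.zip.GZIPInputStream (a JDK class); the file names are hypothetical, 
and a real connector would stream from the repository rather than local disk:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

public class GunzipExample {
    public static void main(String[] args) throws Exception {
        // Decompress doc.txt.gz to recover the constituent file doc.txt.
        try (InputStream in = new GZIPInputStream(new FileInputStream("doc.txt.gz"));
             OutputStream out = new FileOutputStream("doc.txt")) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}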



  was (Author: jkrupan):
One of those VFS links points to all the Java packages used to access the 
list of archive formats I listed. I have personally written unit tests that 
generated most of those formats which Aperture then extracted.

  
> Crawled archive files should be expanded into their constituent files
> -
>
> Key: CONNECTORS-118
> URL: https://issues.apache.org/jira/browse/CONNECTORS-118
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Framework crawler agent
>Reporter: Jack Krupansky
>
> Archive files such as zip, mbox, tar, etc. should be expanded into their 
> constituent files during crawling of repositories so that any output 
> connector would output the flattened archive.
> This could be an option, defaulted to ON, since someone may want to implement 
> a "copy" connector that maintains crawled files as-is.




[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920805#action_12920805
 ] 

Jack Krupansky commented on CONNECTORS-118:
---

At least for file system crawls we can depend on modification date to decide 
whether to re-crawl an archive file, can't we?

I wouldn't rate efficient crawling of archive files over the web as too high a 
priority.





[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920801#action_12920801
 ] 

Jack Krupansky commented on CONNECTORS-118:
---

One of those VFS links points to all the Java packages used to access the list 
of archive formats I listed. I have personally written unit tests that 
generated most of those formats which Aperture then extracted.





[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920787#action_12920787
 ] 

Jack Krupansky commented on CONNECTORS-118:
---

Aperture's approach was just a starting point for discussion of how to form an 
id for a file inside an archive file. As long as the MCF rules are functionally 
equivalent to the Apache VFS rules, we should be okay.

In short, my proposal does not have a requirement for what an id should look 
like, just a suggestion.





[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920730#action_12920730
 ] 

Jack Krupansky commented on CONNECTORS-118:
---

Support within the file system connector is obviously the higher priority. 
Windows shares as well. And FTP/SFTP.





[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920720#action_12920720
 ] 

Jack Krupansky commented on CONNECTORS-118:
---

Just to be clear, this subcrawling proposal does not depend on Apache VFS; like 
Aperture, it simply borrows the VFS naming convention for representing the id 
of each file as a pseudo-URL, not a real URL.

So, if somebody wants to de-reference one of these pseudo-URLs, they must (a 
parsing sketch follows the steps):

1) Separate the prefix, parent-object-uri, and path from the pseudo-URL.
2) Fetch the file from the parent-object-uri.
3) Use an access library based on the prefix to extract the file at the path 
from within the fetched archive.
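
A minimal parsing sketch for step 1, assuming the Aperture/VFS-style form 
<prefix>:<parent-object-uri>!/<path> discussed elsewhere in this thread; the 
class name is hypothetical, and for brevity it ignores '!' characters escaped 
as %21:

public class PseudoUrl {
    public final String prefix;    // e.g. "zip"
    public final String parentUri; // uri of the containing archive
    public final String path;      // entry path inside the archive

    public PseudoUrl(String pseudoUrl) {
        int colon = pseudoUrl.indexOf(':');      // end of the prefix
        int bang = pseudoUrl.lastIndexOf("!/");  // start of the internal path
        if (colon < 0 || bang < 0 || bang < colon) {
            throw new IllegalArgumentException("not a pseudo-URL: " + pseudoUrl);
        }
        prefix = pseudoUrl.substring(0, colon);
        parentUri = pseudoUrl.substring(colon + 1, bang);
        path = pseudoUrl.substring(bang + 2);
    }
}

For example, "zip:http://somehost/downloads/somefile.zip!/docs/readme.txt" 
parses into prefix "zip", parent-object-uri 
"http://somehost/downloads/somefile.zip", and path "docs/readme.txt".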





[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920711#action_12920711
 ] 

Jack Krupansky commented on CONNECTORS-118:
---

Subcrawling is based on the file type (zip, tar, gzip, bzip2, mbox, jar, etc.), 
not the type of repository that contains it. I can't speak to all repository 
types, but subcrawling would apply to web and SharePoint crawling in addition 
to file system and share crawling. Basically, it applies to any repository type 
that returns files, as opposed to, say, the JDBC connector, which returns a row 
of data values rather than a file.





[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920704#action_12920704
 ] 

Jack Krupansky commented on CONNECTORS-118:
---

Karl correctly points out that "The key question here is how you describe the 
component of an archive. There must be a URL to describe it..." I am basing my 
request on the subcrawling feature of Aperture, which bases its archive support 
on the Apache Commons VFS naming scheme.

See:
http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers

Which says:

The uris of the data objects found inside other data objects have a fixed form, 
consisting of three basic parts:

<prefix>:<parent-uri>!/<path>

* <prefix> - the uri prefix, characteristic for a particular SubCrawler, 
returned by the SubCrawlerFactory.getUriPrefix() method
* <parent-uri> - the uri of the parent data object; it is obtained from 
the parentMetadata parameter to the subCrawl() method, by calling 
RDFContainer.getDescribedUri()
* <path> - an internal path of the 'child' data object inside the 'parent' data 
object

This scheme has been inspired by the apache commons VFS project, homepaged 
under http://commons.apache.org/vfs

See:
http://commons.apache.org/vfs/filesystems.html

Which says:

Provides read-only access to the contents of Zip, Jar and Tar files.

URI Format

zip:// arch-file-uri [! absolute-path ]
jar:// arch-file-uri [! absolute-path ]
tar:// arch-file-uri [! absolute-path ]
tgz:// arch-file-uri [! absolute-path ]
tbz2:// arch-file-uri [! absolute-path ]

Where arch-file-uri refers to a file of any supported type, including other zip 
files. Note: if you would like to use the ! as normal character it must be 
escaped using %21.
tgz and tbz2 are convenience for tar:gz and tar:bz2.

Examples

jar:../lib/classes.jar!/META-INF/manifest.mf
zip:http://somehost/downloads/somefile.zip
jar:zip:outer.zip!/nested.jar!/somedir
jar:zip:outer.zip!/nested.jar!/some%21dir
tar:gz:http://anyhost/dir/mytar.tar.gz!/mytar.tar!/path/in/tar/README.txt
tgz:file://anyhost/dir/mytar.tgz!/somepath/somefile



Provides read-only access to the contents of gzip and bzip2 files.

URI Format

gz:// compressed-file-uri
bz2:// compressed-file-uri

Where compressed-file-uri refers to a file of any supported type. There is no 
need to add a ! part to the uri if you read the content of the file you always 
will get the uncompressed version.

Examples

gz:/my/gz/file.gz
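
As a sketch of how a connector might mint ids under this convention, the 
following enumerates a local zip archive with java.util.zip.ZipFile (a JDK 
class) and prints a VFS-style id per entry; the archive path and parent uri 
are hypothetical:

import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipIdExample {
    public static void main(String[] args) throws Exception {
        String parentUri = "file:///data/archives/sample.zip"; // hypothetical
        try (ZipFile zip = new ZipFile("/data/archives/sample.zip")) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    // e.g. zip:file:///data/archives/sample.zip!/docs/readme.txt
                    System.out.println("zip:" + parentUri + "!/" + entry.getName());
                }
            }
        }
    }
}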





[jira] Created: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

2010-10-13 Thread Jack Krupansky (JIRA)
Crawled archive files should be expanded into their constituent files
-

 Key: CONNECTORS-118
 URL: https://issues.apache.org/jira/browse/CONNECTORS-118
 Project: ManifoldCF
  Issue Type: New Feature
  Components: Framework crawler agent
Reporter: Jack Krupansky


Archive files such as zip, mbox, tar, etc. should be expanded into their 
constituent files during crawling of repositories so that any output connector 
would output the flattened archive.

This could be an option, defaulted to ON, since someone may want to implement a 
"copy" connector that maintains crawled files as-is.





[jira] Commented: (CONNECTORS-116) Possibly remove memex connector depending upon legal resolution

2010-10-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920568#action_12920568
 ] 

Jack Krupansky commented on CONNECTORS-116:
---

It would be nice to see a comment about what would be required to add Memex 
support back.

I note the following statement in the original incubation submission:

"It is unlikely that EMC, OpenText, Memex, or IBM would grant 
Apache-license-compatible use of these client libraries. Thus, the expectation 
is that users of these connectors obtain the necessary client libraries from 
the owners prior to building or using the corresponding connector. An 
alternative would be to undertake a clean-room implementation of the client 
API's, which may well yield suitable results in some cases (LiveLink, Memex, 
FileNet), while being out of reach in others (Documentum). Conditional 
compilation, for the short term, is thus likely to be a necessity."

Is it only the Memex connector that now has this problem?

Do we need to do a clean-room implementation for Memex? For any of the others?

FWIW, I don't see a Google Connector for Memex.


> Possibly remove memex connector depending upon legal resolution
> ---
>
> Key: CONNECTORS-116
> URL: https://issues.apache.org/jira/browse/CONNECTORS-116
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Memex connector
>Reporter: Robert Muir
>Assignee: Robert Muir
>
> Apparently there is an IP problem with the memex connector code.
> Depending upon what apache legal says, we will take any action under this 
> issue publicly.




[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909036#action_12909036
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

Looks good. This meets my expectations. Any further tweaks that might arise 
would be distinct Jira issues.

> API should be "pure" RESTful with the API verb represented using the HTTP 
> GET/PUT/POST/DELETE methods
> -
>
> Key: CONNECTORS-98
> URL: https://issues.apache.org/jira/browse/CONNECTORS-98
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: API
>Affects Versions: LCF Release 0.5
>Reporter: Jack Krupansky
> Fix For: LCF Release 0.5
>
>
> (This was originally a comment on CONNECTORS-56 dated 7/16/2010.)
> It has come to my attention that the API would be more "pure" RESTful if the 
> API verb was represented using the HTTP GET/PUT/POST/DELETE methods and the 
> input argument identifier represented in the context path.
> So, GET outputconnection/get {"connection_name":<connection name>} would 
> be GET outputconnections/<encoded connection name>
> and GET outputconnection/delete {"connection_name":<connection name>} 
> would be DELETE outputconnections/<encoded connection name>
> and GET outputconnection/list would be GET outputconnections
> and PUT outputconnection/save 
> {"outputconnection":<output connection object>} would be PUT 
> outputconnections/<encoded connection name> 
> {"outputconnection":<output connection object>}
> What we have today is certainly workable, but just not as "pure" as some 
> might desire. It would be better to take care of this before the initial 
> release so that we never have to answer the question of why it wasn't done as 
> a "proper" RESTful API.
> BTW, I did check to verify that an HttpServlet running under Jetty can 
> process the DELETE and PUT methods (using the doDelete and doPut method 
> overrides.)
> Also, POST should be usable as an alternative to PUT for API calls that have 
> large volumes of data.
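
Since the quoted description mentions overriding doPut and doDelete under 
Jetty, here is a minimal sketch of that servlet shape (the overrides are 
standard javax.servlet.http.HttpServlet methods; the routing comments are 
placeholders, not the actual ACF implementation):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ApiServlet extends HttpServlet {
    @Override
    protected void doPut(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // e.g. PUT /outputconnections with a JSON body to save a connection
        String path = req.getPathInfo(); // dispatch on the decoded path here
        resp.setStatus(HttpServletResponse.SC_OK);
    }

    @Override
    protected void doDelete(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // e.g. DELETE /outputconnections/<encoded connection name>
        String path = req.getPathInfo(); // dispatch on the decoded path here
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}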




[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-12 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908581#action_12908581
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

Just to confirm, as requested: I am comfortable sticking with the connection 
name (and job name, etc.) in API paths, as opposed to using a more abstract 
"id", since we seem to have an encoding convention for slash so that an ACF 
object name can always be represented as a single HTTP path segment. Names 
clearly feel more natural and will be easier to use, both for app code using 
the ACF API and for curl and other scripting tools.







[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-12 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908573#action_12908573
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

re: Spaces in connection names...

A URL path sent by a client cannot contain an unencoded space. Typically, a 
space is encoded as "+" or "%20". The final path retrieved by the server app 
will have the spaces expanded, but the path sent via HTTP from the client must 
be encoded, since a space is the delimiter between the path and the HTTP 
version, per IETF RFC 2616 Sec 5.1:

Request-Line   = Method SP Request-URI SP HTTP-Version CRLF

See:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html

The first upshot of this is that the client needs to encode spaces as "+" or 
"%20". Ditto for other reserved chars (described in an earlier comment.)

A second upshot is that we can't use ".+" in the original path from the client 
to encode slash, since the "+" would decode to a space and it would come 
through to the ACF server app as ". " (dot, space). So, either the client would 
have to write ".%2B" or we pick some other encoding. Lacking some more 
preferred choice, we could simply propose ".-" as our encoding for slash. 
Almost any (non-reserved) char will do.

Another proposed encoding for slash: double the slash when it is to be embedded 
in a name, and then the adjacent path segments will be merged with a single 
slash between them. I don't like this since it does not encode the full name as 
a single path segment, but it may be the cleanest way of dealing with slash. An 
example, encoding the name "this updated/revised example connection 1.0":

GET  
info/outputconnections/this+updated//revised+example+connection+1.0/

Personally, I lean towards an encoding convention that can result in encoding 
the name as a single path segment. With the ".." and ".-" encoding convention 
this example would be:

GET  
info/outputconnections/this+updated.-revised+example+connection+1..0/
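
A minimal sketch of this single-segment convention, assuming "." as the escape 
character, ".." for a literal dot, and ".-" for slash; the class and method 
names are hypothetical:

public class SegmentCodec {
    // Encode: "." -> "..", "/" -> ".-"; escape dots first so order is safe.
    public static String encode(String name) {
        return name.replace(".", "..").replace("/", ".-");
    }

    // Decode: scan left to right so ".." and ".-" are unambiguous.
    public static String decode(String segment) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (c == '.' && i + 1 < segment.length()) {
                char next = segment.charAt(++i);
                out.append(next == '-' ? '/' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: this updated.-revised example connection 1..0
        System.out.println(encode("this updated/revised example connection 1.0"));
    }
}

(Space-to-"+" encoding would still be applied separately by normal URL 
encoding, as in the example above.)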





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-10 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908244#action_12908244
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

And that last reference provides examples that illustrate the convention of 
using plurals. For example:

GET /customers/1234 HTTP/1.1

http://www.infoq.com/articles/rest-introduction

The goal here is to use a common style so that people approaching the ACF API 
will not be surprised and have to re-learn things to use this API.





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-10 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908240#action_12908240
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

On closer examination, all of the examples I have found use an "id" rather than 
a name in the path, typically a number or alphanumeric with hyphens. So, we 
should consider revisiting that aspect of the API. That would avoid the slash 
issue.

So, presumably an app using the API would query the list of connections, the 
JSON would provide the id for each connection, and the app would use those ids 
for subsequent API calls.

Another reference:

http://www.infoq.com/articles/rest-introduction
"Give every 'thing' an ID"




[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-10 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908190#action_12908190
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

I am reading IETF RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax, 
section 3.3, "Path", among other things.

See:
http://www.ietf.org/rfc/rfc3986.txt

No conclusion yet.





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-10 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908148#action_12908148
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

I am still pondering this embedded-slash issue and checking into some things 
related to it. Maybe Monday I'll have something more concrete to say.

For example, I want to make sure I understand the rules for what a path in a 
URI can contain, and whether simply placing a name at the tail of the path 
means it can have slashes or other reserved characters in it. My model is that 
a name should occupy only a single path component.





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907875#action_12907875
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

It makes sense that getPathInfo() would have removed escapes from the URL. So, 
either we don't use % escaping, or we bypass getPathInfo() and manually decode.

Maybe we could use backslash for escaping. I'm not sure whether it would need 
to be % escaped as well.

This is only needed if the user has one of the reserved special characters in a 
name. It would be an issue if it were something that users commonly needed, but 
it seems like more of an edge case than a common case.

Encourage people to use alphanumeric, "-", and "_" for names and it won't be an 
issue for them.

And, the real point of the API is access from code. We can provide helper 
functions for working with names and building API paths.






[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907758#action_12907758
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

re: "the <connection name> cannot itself contain "/" characters, or it won't be 
uniquely parseable"

Elsewhere I noted that URI-reserved characters need to be encoded with the "%" 
notation, so this is not a fatal problem.


  reserved= ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907736#action_12907736
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

re: "We could not pass (arguments) except as part of the path."

Sure, we could go that route, and list the arguments as path elements, but I 
think a JSON object (array list of arguments) is acceptable.

So, I'd go with the latter (JSON.)





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907735#action_12907735
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

I think status is probably technically okay since it is disambiguated by the 
number of path elements, but it could be moved to the end:

 GET outputconnections/<encoded connection name>/status ()

vs.

 GET outputconnections/status/<encoded connection name> ()

Same for execute/request:

GET outputconnections/<encoded connection name>/request/<command> (arguments)

vs.

GET outputconnections/request/<command>/<encoded connection name> (arguments)


That way the connection name is always in the same position.

So, I'd revise my counter-proposal that way.





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907723#action_12907723
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

Karl asks how to "handle connection names that are non-7-bit-ascii".

I believe that non-7-bit-ASCII and URI-reserved chars would simply be escaped 
using the "%" notation.
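
A minimal sketch of such escaping using the JDK's URLEncoder; note that 
URLEncoder targets form encoding (it emits "+" for space), so the sketch 
converts "+" to "%20" for use in a path segment. The class name is 
hypothetical:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class NameEncoder {
    // Percent-encode a connection name for use as a single path segment.
    public static String encodeSegment(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // prints: my%20connection%2F1.0
        System.out.println(encodeSegment("my connection/1.0"));
    }
}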






[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907712#action_12907712
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

Some RESTful resource doc:

http://en.wikipedia.org/wiki/Representational_State_Transfer

http://www.xfront.com/REST-Web-Services.html

http://www.oracle.com/technetwork/articles/javase/table3-138001.html

The idea of using a plural is that it is the name of the collection and the 
qualifier (name or argument object) provides the specificity.





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907702#action_12907702
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

Karl, I did a quick read of your suggestions and mostly they seem fine, 
including keeping the JSON usage as is, but to be more purely RESTful the 
connection_name should be part of the path in those cases where it would have 
been a standalone name, although for PUT it was simply redundant as you noted. 
Another nuance is to consistently refer to outputconnections as plural.

My counter-proposal:

outputconnection/get (connection_name) -> GET 
outputconnections/<encoded connection name> ()
outputconnection/save (output_connection_object) -> PUT outputconnections 
(output_connection_object)
outputconnection/delete (connection_name) -> DELETE 
outputconnections/<encoded connection name> ()
outputconnection/list () -> GET outputconnections ()
outputconnection/checkstatus (connection_name) -> GET 
outputconnections/status/<encoded connection name> ()
outputconnection/execute/<command> (connection_name, arguments) -> GET 
outputconnections/request/<command>/<encoded connection name> (arguments)

Comments?





[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907617#action_12907617
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

It sounds reasonable that the connection name is not needed in the path when 
creating from a JSON object that already has the name in it.

So, instead of:

PUT outputconnections/<encoded connection name> 
{"outputconnection":<output connection object>}

we could have:

PUT outputconnections {"outputconnection":<output connection object>}

Further, I don't think we need the extra level of object, so that could be:

PUT outputconnections {<output connection object>}
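
A minimal sketch of issuing such a PUT from Java with HttpURLConnection (a JDK 
class); the endpoint URL and JSON field names are hypothetical, not the actual 
ACF API:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PutConnectionExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical API root and resource path.
        URL url = new URL("http://localhost:8345/api/json/outputconnections");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        String body = "{\"name\":\"my connection\",\"description\":\"example\"}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}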






[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-09-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907614#action_12907614
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

I have looked at the code a bit but not made any actual progress at a patch, so 
you can go ahead and take a crack at it. Yes, I'll do the transformation table. 
As far as updating the wiki, do I have privileges to do that?





[jira] Commented: (CONNECTORS-104) Make it easier to limit a web crawl to a single site

2010-09-08 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907201#action_12907201
 ] 

Jack Krupansky commented on CONNECTORS-104:
---

Simple works best. This enhancement is primarily for the simple use case where 
a "novice" user tries to do what they think is obvious ("crawl the web pages at 
this URL"), but without considering all of the potential nuances or how to 
fully specify the details of their goal.

One nuance is whether subdomains are considered part of the domain. I would say 
"no" if a subdomain was specified by the user and "yes" if no subdomain was 
specified.

Another nuance is whether a "path" is specified to select a subset of a domain. 
It would be nice to handle that and (optionally) limit the crawl to that path 
(or sub-paths below it). An example would be to crawl the news archive for a 
site.
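
As a rough sketch of the rule generation this implies (hypothetical helper,
not the web connector's actual logic): given a seed URL, derive an include
regex anchored to the seed's host and, optionally, its path prefix.

    import java.net.URL;
    import java.util.regex.Pattern;

    // Sketch only: build an include regex from a seed URL. A variant could
    // allow subdomains (e.g. prepend ([^/]+\.)? before the host) when the
    // user specified only a bare domain, per the comment above.
    public class SeedRegex {
      public static String includeRegexFor(String seed, boolean limitToPath)
          throws Exception {
        URL u = new URL(seed);
        String regex = "^https?://" + Pattern.quote(u.getHost());
        String path = u.getPath();
        if (limitToPath && path.length() > 1) {
          regex += Pattern.quote(path);  // restrict to this path and below
        }
        return regex + ".*";
      }

      public static void main(String[] args) throws Exception {
        // Prints: ^https?://\Qexample.com\E\Q/news/archive\E.*
        System.out.println(includeRegexFor("http://example.com/news/archive", true));
      }
    }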


> Make it easier to limit a web crawl to a single site
> 
>
> Key: CONNECTORS-104
> URL: https://issues.apache.org/jira/browse/CONNECTORS-104
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: Web connector
>Reporter: Jack Krupansky
>Priority: Minor
>
> Unless the user carefully enters an explicit include regex, a web crawl can 
> quickly get out of control and start crawling the entire web when all the 
> user may really want is to crawl just a single web site or portion thereof. 
> So, it would be preferable if either by default or with a simple button the 
> crawl could be limited to the seed web site(s).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-104) Make it easier to limit a web crawl to a single site

2010-09-07 Thread Jack Krupansky (JIRA)
Make it easier to limit a web crawl to a single site


 Key: CONNECTORS-104
 URL: https://issues.apache.org/jira/browse/CONNECTORS-104
 Project: Apache Connectors Framework
  Issue Type: Improvement
  Components: Web connector
Affects Versions: LCF Release 0.5
Reporter: Jack Krupansky
Priority: Minor
 Fix For: LCF Release 0.5


Unless the user carefully enters an explicit include regex, a web crawl can 
quickly get out of control and start crawling the entire web when all the user 
may really want is to crawl just a single web site or portion thereof. So, it 
would be preferable if either by default or with a simple button the crawl 
could be limited to the seed web site(s).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-101) File system connector would benefit by default crawling rules

2010-09-07 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906969#action_12906969
 ] 

Jack Krupansky commented on CONNECTORS-101:
---

+1


> File system connector would benefit by default crawling rules
> -
>
> Key: CONNECTORS-101
> URL: https://issues.apache.org/jira/browse/CONNECTORS-101
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: File system connector
>Reporter: Karl Wright
>Priority: Minor
>
> When you add a path to a file system connector job, it should automatically 
> put in rules that cause it to include all files and directories under that 
> path.  This makes it easier to use, and more easily demonstrable too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-62) Document the LCF API

2010-08-31 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904768#action_12904768
 ] 

Jack Krupansky commented on CONNECTORS-62:
--

Just wanted to update the link to the doc after the LCF/ACF name change for 
people who search for this issue:

https://cwiki.apache.org/confluence/display/CONNECTORS/Programmatic+Operation+of+ACF


> Document the LCF API
> 
>
> Key: CONNECTORS-62
> URL: https://issues.apache.org/jira/browse/CONNECTORS-62
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Karl Wright
>Assignee: Karl Wright
>
> Not only does the LCF API itself need documentation, but so do all the 
> connector configuration/specification objects, now that they are exposed.  
> This should probably become part of the developer documentation on the main 
> LCF website.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-57) Solr output connector option to commit at end of job, by default

2010-08-31 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904746#action_12904746
 ] 

Jack Krupansky commented on CONNECTORS-57:
--

This looks fine so far and should work for me.

If I understand the code, the Connector.noteJobComplete method is called when 
the job completes or is aborted and the SolrConnector.noteJobComplete 
implementation method unconditionally does a commit. That's fine for my use 
case, but we probably still want a connection option to disable that commit if 
the user has some other commit strategy in mind.
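
A sketch of what that option might look like (hypothetical names and
signatures; the real Connector/SolrConnector interfaces may differ):

    // Sketch only: a Solr output connector whose end-of-job commit can be
    // disabled by a connection parameter, defaulting to true.
    public class SolrOutputConnectorSketch {
      private final boolean commitOnJobEnd;  // hypothetical connection option
      private final SolrPoster poster;       // hypothetical Solr HTTP wrapper

      public SolrOutputConnectorSketch(boolean commitOnJobEnd, SolrPoster poster) {
        this.commitOnJobEnd = commitOnJobEnd;
        this.poster = poster;
      }

      // Called by the framework when a job completes or is aborted.
      public void noteJobComplete() {
        if (commitOnJobEnd) {
          poster.commit();  // e.g. send a commit request to Solr
        }
      }

      // Minimal stand-in for the Solr client used by the connector.
      public interface SolrPoster {
        void commit();
      }
    }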

> Solr output connector option to commit at end of job, by default
> 
>
> Key: CONNECTORS-57
> URL: https://issues.apache.org/jira/browse/CONNECTORS-57
> Project: Apache Connectors Framework
>  Issue Type: Sub-task
>  Components: Lucene/SOLR connector
>Reporter: Jack Krupansky
>
> By default, Solr will eventually commit documents that have been submitted to 
> the Solr Cell interface, but the time lag can confuse and annoy people. 
> Although commit strategy is a difficult issue in general, an option in LCF to 
> automatically commit at the end of a job, by default, would eliminate a lot 
> of potential confusion and generally be close to what the user needs.
> The desired feature is that there be an option to commit for each job that 
> uses the Solr output connector. This option would default to "on" (or a 
> different setting based on some global configuration setting), but the user 
> may turn it off if commit is only desired upon completion of some jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.

2010-08-31 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904618#action_12904618
 ] 

Jack Krupansky commented on CONNECTORS-41:
--

The notification is certainly "associated" with the output connection... or is 
it really associated with the job?

Originally, my thinking had been that the notification URL would be specified 
as part of the output connection, but maybe it is an output-specific parameter 
that gets specified for the job. It could be either.

OTOH, maybe a "notification connector" makes more sense and is more general 
rather than just something to use for Solr. Also, that might provide the way to 
implement an optional commit for Solr, as a simple notification connection, so 
that ACF core itself doesn't know or care about Solr or commits or any of that. 
I think the concept of a notification connector makes sense, but is not 
essential for release 0.1 or 0.5.

I'm open to suggestions. We can do it real simple as a parameter for the Solr 
output connector or the job, or we could be more general. Tough call. If you 
feel up to doing the more general feature, fine, but the simple notification 
URL feature is all that is essential.

Also, to be clear about the use case, it is not just Solr commit, but some 
external app might just want to notify a user that their job has finished and 
to do whatever other (beyond Solr commit) processing may be needed upon job 
completion.
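
A minimal sketch of the simple variant (hypothetical names; whether this
lives in the framework, the job, or a notification connector is exactly the
open question above): on job completion, POST a small status document to the
configured notification URL, and let the listener decide what to do (commit
Solr, email a user, etc.).

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Sketch only: fire a job-completion notification to a configured URL.
    public class JobNotifier {
      public static void notifyJobEnd(String notificationUrl, String jobId,
                                      String status) throws Exception {
        String body = "{\"job\":\"" + jobId + "\",\"status\":\"" + status + "\"}";
        HttpURLConnection conn =
            (HttpURLConnection) new URL(notificationUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream os = conn.getOutputStream()) {
          os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int rc = conn.getResponseCode();  // listener may commit on receipt
        conn.disconnect();
        if (rc >= 300) throw new Exception("Notification failed: HTTP " + rc);
      }
    }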

> Add hooks to output connectors for receiving event notifications, 
> specifically job start, job end, etc.
> ---
>
> Key: CONNECTORS-41
> URL: https://issues.apache.org/jira/browse/CONNECTORS-41
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: Framework core
>Reporter: Karl Wright
>Priority: Minor
>
> Currently there is no logic that informs an output connection of a job start, 
> end, deletion, or other activity.  While this would seem to have little to do 
> with an output connector, this feature has been requested by Jack Krupansky 
> as a potential way of deciding when to tell Solr to commit documents, rather 
> than leave it up to Solr's configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-41) Add hooks to output connectors for receiving event notifications, specifically job start, job end, etc.

2010-08-31 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904610#action_12904610
 ] 

Jack Krupansky commented on CONNECTORS-41:
--

To be clear about the use case, the notification is not to an output connector 
or output connection per se, but to an external process that is trying to 
monitor the job status. Kind of a reverse API. The URL that job status 
notifications should be sent to might be in the same process as Solr or another 
process that is monitoring Solr. Further, this feature should be of value for 
any type of output connector, although Solr is my current main interest.


> Add hooks to output connectors for receiving event notifications, 
> specifically job start, job end, etc.
> ---
>
> Key: CONNECTORS-41
> URL: https://issues.apache.org/jira/browse/CONNECTORS-41
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: Framework core
>Reporter: Karl Wright
>Priority: Minor
>
> Currently there is no logic that informs an output connection of a job start, 
> end, deletion, or other activity.  While this would seem to have little to do 
> with an output connector, this feature has been requested by Jack Krupansky 
> as a potential way of deciding when to tell Solr to commit documents, rather 
> than leave it up to Solr's configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-08-27 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903559#action_12903559
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

I'll mostly be looking through the code, thinking it through, and looking at the 
API string changes first, so I may not touch any code for another week, if not 
longer. Feel free to rename or refactor code at will. I'll probably let you 
know in advance of what changes I expect to make in the code.

> API should be "pure" RESTful with the API verb represented using the HTTP 
> GET/PUT/POST/DELETE methods
> -
>
> Key: CONNECTORS-98
> URL: https://issues.apache.org/jira/browse/CONNECTORS-98
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: API
>Affects Versions: LCF Release 0.5
>Reporter: Jack Krupansky
> Fix For: LCF Release 0.5
>
>
> (This was originally a comment on CONNECTORS-56 dated 7/16/2010.)
> It has come to my attention that the API would be more "pure" RESTful if the 
> API verb was represented using the HTTP GET/PUT/POST/DELETE methods and the 
> input argument identifier represented in the context path.
> So,  GET outputconnection/get \{"connection_name":__\} would 
> be GET outputconnections/
> and GET outputconnection/delete \{"connection_name":__\} 
> would be DELETE outputconnections/
> and GET outputconnection/list would be GET outputconnections
> and PUT outputconnection/save 
> \{"outputconnection":__\} would be PUT 
> outputconnections/ 
> \{"outputconnection":__\}
> What we have today is certainly workable, but just not as "pure" as some 
> might desire. It would be better to take care of this before the initial 
> release so that we never have to answer the question of why it wasn't done as 
> a "proper" RESTful API.
> BTW, I did check to verify that an HttpServlet running under Jetty can 
> process the DELETE and PUT methods (using the doDelete and doPut method 
> overrides.)
> Also, POST should be usable as an alternative to PUT for API calls that have 
> large volumes of data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-08-26 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902983#action_12902983
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

Karl says "I await your patch."

Point well made. There is a great starting point with the current code. A bit 
of refactoring required.


> API should be "pure" RESTful with the API verb represented using the HTTP 
> GET/PUT/POST/DELETE methods
> -
>
> Key: CONNECTORS-98
> URL: https://issues.apache.org/jira/browse/CONNECTORS-98
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: API
>Affects Versions: LCF Release 0.5
>Reporter: Jack Krupansky
> Fix For: LCF Release 0.5
>
>
> (This was originally a comment on CONNECTORS-56 dated 7/16/2010.)
> It has come to my attention that the API would be more "pure" RESTful if the 
> API verb was represented using the HTTP GET/PUT/POST/DELETE methods and the 
> input argument identifier represented in the context path.
> So,  GET outputconnection/get \{"connection_name":__\} would 
> be GET outputconnections/
> and GET outputconnection/delete \{"connection_name":__\} 
> would be DELETE outputconnections/
> and GET outputconnection/list would be GET outputconnections
> and PUT outputconnection/save 
> \{"outputconnection":__\} would be PUT 
> outputconnections/ 
> \{"outputconnection":__\}
> What we have today is certainly workable, but just not as "pure" as some 
> might desire. It would be better to take care of this before the initial 
> release so that we never have to answer the question of why it wasn't done as 
> a "proper" RESTful API.
> BTW, I did check to verify that an HttpServlet running under Jetty can 
> process the DELETE and PUT methods (using the doDelete and doPut method 
> overrides.)
> Also, POST should be usable as an alternative to PUT for API calls that have 
> large volumes of data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-08-26 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902982#action_12902982
 ] 

Jack Krupansky commented on CONNECTORS-98:
--

Karl asks "what do you plan to do for the list and execute verbs?"

List would be a GET and execute would be PUT.


> API should be "pure" RESTful with the API verb represented using the HTTP 
> GET/PUT/POST/DELETE methods
> -
>
> Key: CONNECTORS-98
> URL: https://issues.apache.org/jira/browse/CONNECTORS-98
> Project: Apache Connectors Framework
>  Issue Type: Improvement
>  Components: API
>Affects Versions: LCF Release 0.5
>Reporter: Jack Krupansky
> Fix For: LCF Release 0.5
>
>
> (This was originally a comment on CONNECTORS-56 dated 7/16/2010.)
> It has come to my attention that the API would be more "pure" RESTful if the 
> API verb was represented using the HTTP GET/PUT/POST/DELETE methods and the 
> input argument identifier represented in the context path.
> So,  GET outputconnection/get \{"connection_name":__\} would 
> be GET outputconnections/
> and GET outputconnection/delete \{"connection_name":__\} 
> would be DELETE outputconnections/
> and GET outputconnection/list would be GET outputconnections
> and PUT outputconnection/save 
> \{"outputconnection":__\} would be PUT 
> outputconnections/ 
> \{"outputconnection":__\}
> What we have today is certainly workable, but just not as "pure" as some 
> might desire. It would be better to take care of this before the initial 
> release so that we never have to answer the question of why it wasn't done as 
> a "proper" RESTful API.
> BTW, I did check to verify that an HttpServlet running under Jetty can 
> process the DELETE and PUT methods (using the doDelete and doPut method 
> overrides.)
> Also, POST should be usable as an alternative to PUT for API calls that have 
> large volumes of data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-98) API should be "pure" RESTful with the API verb represented using the HTTP GET/PUT/POST/DELETE methods

2010-08-26 Thread Jack Krupansky (JIRA)
API should be "pure" RESTful with the API verb represented using the HTTP 
GET/PUT/POST/DELETE methods
-

 Key: CONNECTORS-98
 URL: https://issues.apache.org/jira/browse/CONNECTORS-98
 Project: Apache Connectors Framework
  Issue Type: Improvement
  Components: API
Affects Versions: LCF Release 0.5
Reporter: Jack Krupansky
 Fix For: LCF Release 0.5


(This was originally a comment on CONNECTORS-56 dated 7/16/2010.)

It has come to my attention that the API would be more "pure" RESTful if the 
API verb was represented using the HTTP GET/PUT/POST/DELETE methods and the 
input argument identifier represented in the context path.

So,  GET outputconnection/get \{"connection_name":__\} would 
be GET outputconnections/

and GET outputconnection/delete \{"connection_name":__\} would 
be DELETE outputconnections/

and GET outputconnection/list would be GET outputconnections

and PUT outputconnection/save 
\{"outputconnection":__\} would be PUT 
outputconnections/ 
\{"outputconnection":__\}

What we have today is certainly workable, but just not as "pure" as some might 
desire. It would be better to take care of this before the initial release so 
that we never have to answer the question of why it wasn't done as a "proper" 
RESTful API.

BTW, I did check to verify that an HttpServlet running under Jetty can process 
the DELETE and PUT methods (using the doDelete and doPut method overrides.)

Also, POST should be usable as an alternative to PUT for API calls that have 
large volumes of data.
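
For reference, a minimal embedded-Jetty sketch (hypothetical class names;
Jetty's API has varied across versions, this follows the Jetty 9-style
servlet API) that registers such a servlet so that PUT and DELETE requests
actually reach the doPut and doDelete overrides:

    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.servlet.ServletContextHandler;
    import org.eclipse.jetty.servlet.ServletHolder;

    // Sketch only: serve the hypothetical OutputConnectionServlet (sketched
    // in an earlier comment) under /outputconnections/*.
    public class ApiServer {
      public static void main(String[] args) throws Exception {
        Server server = new Server(8345);  // hypothetical port
        ServletContextHandler ctx = new ServletContextHandler();
        ctx.setContextPath("/");
        ctx.addServlet(new ServletHolder(new OutputConnectionServlet()),
                       "/outputconnections/*");
        server.setHandler(ctx);
        server.start();
        server.join();
      }
    }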


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product

2010-07-22 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891298#action_12891298
 ] 

Jack Krupansky commented on CONNECTORS-55:
--

I checked the H2 feature comparison table, but it did not suggest a great 
benefit of H2 for LCF. The footprint is a little smaller than Derby and of 
course a lot smaller than PostgreSQL. One area not in the table that could 
matter a lot is performance. Any quick thoughts on H2 performance relative to 
PostgreSQL and Derby?

> Bundle database server with LCF packaged product
> 
>
> Key: CONNECTORS-55
> URL: https://issues.apache.org/jira/browse/CONNECTORS-55
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>  Components: Installers
>Reporter: Jack Krupansky
>
> The current requirement that the user install and deploy a PostgreSQL server 
> complicates the installation and deployment of LCF for the user. Installation 
> and deployment of LCF should be as simple as Solr itself. QuickStart is great 
> for the low-end and basic evaluation, but a comparable level of simplified 
> installation and deployment is still needed for full-blown, high-end 
> environments that need the full performance of a PostgreSQL-class database 
> server. So, PostgreSQL should be bundled with the packaged release of LCF so 
> that installation and deployment of LCF will automatically install and deploy 
> a subset of the full PostgreSQL distribution that is sufficient for the needs 
> of LCF. Starting LCF, with or without the LCF UI, should automatically start 
> the database server. Shutting down LCF should also shutdown the database 
> server process.
> A typical use case would be for a non-developer who is comfortable with Solr 
> and simply wants to crawl documents from, for example, a SharePoint 
> repository and feed them into Solr. QuickStart should work well for the low 
> end or in the early stages of evaluation, but the user would prefer to 
> evaluate "the real thing" with something resembling a production crawl of 
> thousands of documents. Such a user might not be a hard-core developer or be 
> comfortable fiddling with a lot of software components simply to do one 
> conceptually simple operation.
> It should still be possible for the user to supply database server settings 
> to override the defaults, but the LCF package should have all of the 
> best-practice settings deemed appropriate for use with LCF.
> One downside is that installation and deployment will be platform-specific 
> since there are multiple processes and PostgreSQL itself requires a 
> platform-specific installation.
> This proposal presumes that PostgreSQL is the best option for the foreseeable 
> future, but nothing here is intended to preclude support for other database 
> servers in future releases.
> This proposal should not have any impact on QuickStart packaging or 
> deployment.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-56) All features should be accessible through an API

2010-07-16 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889237#action_12889237
 ] 

Jack Krupansky commented on CONNECTORS-56:
--

It has come to my attention that the API would be more "pure" RESTful if the 
API verb was represented using the HTTP GET/PUT/DELETE verbs and the input 
argument identifier represented in the context path.

So,  GET outputconnection/get \{"connection_name":__\} would 
be GET outputconnections/

and GET outputconnection/delete \{"connection_name":__\} would 
be DELETE outputconnections/

and GET outputconnection/list would be GET outputconnections

and PUT outputconnection/save 
\{"outputconnection":__\} would be PUT 
outputconnections/ 
\{"outputconnection":__\}

What we have today is certainly workable, but just not as "pure" as some might 
desire.

I am not going to classify this as a required issue just yet, but this would be 
a great time to change things before the API gets cast in concrete.

Comments?


> All features should be accessible through an API
> 
>
> Key: CONNECTORS-56
> URL: https://issues.apache.org/jira/browse/CONNECTORS-56
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>  Components: Framework core
>Reporter: Jack Krupansky
>Assignee: Karl Wright
>
> LCF consists of a full-featured crawling engine and a full-featured user 
> interface to access the features of that engine, but some applications are 
> better served with a full API that lets the application control the crawling 
> engine, including creation and editing of connections and creation, editing, 
> and control of jobs. Put simply, everything that a user can accomplish via 
> the LCF UI should be doable through an LCF API. All LCF objects should be 
> queryable through the API.
> A primary use case is Solr applications which currently use Aperture for 
> crawling, but would prefer the full-featured capabilities of LCF as a 
> crawling engine over Aperture.
> I do not wish to over-specify the API in this initial description, but I 
> think the LCF API should probably be a traditional REST API, with some of 
> the API elements specified via the context path, some parameters via URL 
> query parameters, and complex, detailed structures as JSON (or similar). The 
> precise details of the API are beyond the scope of this initial description 
> and will be added incrementally once the high-level approach to the API 
> becomes reasonably settled.
> A job status and event reporting scheme is also needed in conjunction with 
> the LCF API. That requirement has already been captured as CONNECTORS-41.
> The intention for the API is to create, edit, access, and control all of the 
> objects managed by LCF. The main focus is on repositories, jobs, and status, 
> and less about document-specific crawling information, but there may be some 
> benefit to querying crawling status for individual documents as well.
> Nothing in this proposal should in any way limit or constrain the features 
> that will be available in the LCF UI. The intent is that LCF should continue 
> to have a full-featured UI, but in addition to a full-featured API.
> Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-56) All features should be accessible through an API

2010-07-14 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888377#action_12888377
 ] 

Jack Krupansky commented on CONNECTORS-56:
--

Some cURL and/or Perl test scripts to illustrate use of the API would be 
helpful.
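
In the meantime, a rough Java equivalent of such a test (hypothetical URL,
port, and connection name, against the RESTful API shape proposed in
CONNECTORS-98) might be:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Sketch only: save a connection with PUT, then read it back with GET.
    public class ApiSmokeTest {
      public static void main(String[] args) throws Exception {
        String base = "http://localhost:8345/outputconnections/myconn";
        String json = "{\"outputconnection\":{\"name\":\"myconn\"}}";

        HttpURLConnection put = (HttpURLConnection) new URL(base).openConnection();
        put.setRequestMethod("PUT");
        put.setDoOutput(true);
        put.setRequestProperty("Content-Type", "application/json");
        try (OutputStream os = put.getOutputStream()) {
          os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("PUT -> " + put.getResponseCode());

        HttpURLConnection get = (HttpURLConnection) new URL(base).openConnection();
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(get.getInputStream(), StandardCharsets.UTF_8))) {
          System.out.println("GET -> " + in.readLine());
        }
      }
    }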

> All features should be accessible through an API
> 
>
> Key: CONNECTORS-56
> URL: https://issues.apache.org/jira/browse/CONNECTORS-56
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>  Components: Framework core
>Reporter: Jack Krupansky
>
> LCF consists of a full-featured crawling engine and a full-featured user 
> interface to access the features of that engine, but some applications are 
> better served with a full API that lets the application control the crawling 
> engine, including creation and editing of connections and creation, editing, 
> and control of jobs. Put simply, everything that a user can accomplish via 
> the LCF UI should be doable through an LCF API. All LCF objects should be 
> queryable through the API.
> A primary use case is Solr applications which currently use Aperture for 
> crawling, but would prefer the full-featured capabilities of LCF as a 
> crawling engine over Aperture.
> I do not wish to over-specify the API in this initial description, but I 
> think the LCF API should probably be a traditional REST API, with some of 
> the API elements specified via the context path, some parameters via URL 
> query parameters, and complex, detailed structures as JSON (or similar). The 
> precise details of the API are beyond the scope of this initial description 
> and will be added incrementally once the high-level approach to the API 
> becomes reasonably settled.
> A job status and event reporting scheme is also needed in conjunction with 
> the LCF API. That requirement has already been captured as CONNECTORS-41.
> The intention for the API is to create, edit, access, and control all of the 
> objects managed by LCF. The main focus is on repositories, jobs, and status, 
> and less about document-specific crawling information, but there may be some 
> benefit to querying crawling status for individual documents as well.
> Nothing in this proposal should in any way limit or constrain the features 
> that will be available in the LCF UI. The intent is that LCF should continue 
> to have a full-featured UI, but in addition to a full-featured API.
> Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-60) Agent process should be started automatically

2010-07-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888006#action_12888006
 ] 

Jack Krupansky commented on CONNECTORS-60:
--

Karl asks "Let me get this straight.  There is a way you can deploy LCF that 
does everything you are currently asking for.  But you are not willing to use 
it.  Why?".

The simple answer is that as good as QuickStart is for basic evaluation and 
low-end production, it is limited by the constraints of Derby, so the 
multi-process configuration of LCF with PostgreSQL is considered superior for 
production use.

So, yes, QuickStart will be used, but the multi-process configuration will be 
used as well in other situations. Deployment in those higher-end situations 
should also be as easy as possible.

> Agent process should be started automatically
> -
>
> Key: CONNECTORS-60
> URL: https://issues.apache.org/jira/browse/CONNECTORS-60
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>Reporter: Jack Krupansky
>
> LCF as it exists today is a bit too complex to run for an average user, 
> especially with a separate agent process for crawling. LCF should be as easy 
> to run as Solr is today. QuickStart is a good move in this direction, but the 
> same user-visible simplicity is needed for full LCF. The separate agent 
> process is a reasonable design for execution, but a little too cumbersome for 
> the average user to manage.
> Unfortunately, it is expected that starting up a multi-process application 
> will require platform-specific scripting.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-60) Agent process should be started automatically

2010-07-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888000#action_12888000
 ] 

Jack Krupansky commented on CONNECTORS-60:
--

Unless I am mistaken, the Jetty integration is for QuickStart (single process) 
only. The issue is for non-QuickStart, multi-process execution.


> Agent process should be started automatically
> -
>
> Key: CONNECTORS-60
> URL: https://issues.apache.org/jira/browse/CONNECTORS-60
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>Reporter: Jack Krupansky
>
> LCF as it exists today is a bit too complex to run for an average user, 
> especially with a separate agent process for crawling. LCF should be as easy 
> to run as Solr is today. QuickStart is a good move in this direction, but the 
> same user-visible simplicity is needed for full LCF. The separate agent 
> process is a reasonable design for execution, but a little too cumbersome for 
> the average user to manage.
> Unfortunately, it is expected that starting up a multi-process application 
> will require platform-specific scripting.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-61) Support bundling of LCF with an app

2010-07-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887996#action_12887996
 ] 

Jack Krupansky commented on CONNECTORS-61:
--

Some documentation is needed on which files or subdirectories are required to 
distribute LCF with an app, along with what setup may be needed to use LCF when 
distributed in that manner.

> Support bundling of LCF with an app
> ---
>
> Key: CONNECTORS-61
> URL: https://issues.apache.org/jira/browse/CONNECTORS-61
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>  Components: Framework core
>Reporter: Jack Krupansky
>
> It should be possible for an application developer to bundle LCF with an 
> application to facilitate installation and deployment of the application in 
> conjunction with LCF. This may (or may not) be as simple as providing 
> appropriate jar files and documentation for how to use them, but there may be 
> other components or scripts needed.
> There are two options: 1) include the LCF UI along with the other LCF 
> processes, and 2) exclude the LCF UI and include only the other processes 
> that can be controlled via the full API.
> The database server would be included.
> The web app server would be optional since the application may have its own 
> choice of web app server.
> One use case is bundling LCF with Solr or a Solr-based application.
> Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API

2010-07-12 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated CONNECTORS-50:
-

 Original Estimate: (was: 3360h)
Remaining Estimate: (was: 3360h)
   Description: 
Currently, LCF has a relatively high bar for evaluation and use, requiring 
developer expertise. Also, although LCF has a comprehensive UI, it is not 
currently packaged for use as a crawling engine for advanced applications.

A small set of individual feature requests are needed to address these issues. 
They are summarized briefly to show how they fit together for two initial 
releases of LCF, but will be broken out into individual LCF Jira issues.

Goals:

1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as 
Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository 
access
3. An API-based crawling engine that can be integrated with applications (as 
Aperture is today)

Larger goals:

1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in 
operation.

Two phases:

1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it 
LCF 0.5.
2) API-based crawling engine for applications for which the UI might not be 
appropriate. Call it LCF 1.0.


Phase 1
---

LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later.
It would contain roughly the features that are currently in place or currently 
underway, plus a little more.

Specifically, LCF 0.5 would contain these additional capabilities:

1. Plug-in architecture for connectors (CONNECTORS-40 - DONE)
2. Packaged app ready to run with embedded Jetty app server (CONNECTORS-59)
3. Bundled with database - PostgreSQL or Derby - ready to run without 
additional manual setup (CONNECTORS-55)
4. Mini-API to initially configure default connections and "example" jobs for 
file system and web crawl (CONNECTORS-58)
5. Agent process started automatically (CONNECTORS-60)
6. Solr output connector option to commit at end of job, by default 
(CONNECTORS-57)

Installation and basic evaluation of LCF would be essentially as simple as Solr 
is today. The example connections and jobs would permit the user to initiate 
example crawls of a file system example directory and an example web on the LCF 
web site with just a couple of clicks (as opposed to the detailed manual setup 
required today to create repository and output connections and jobs).

It is worth considering whether the SharePoint connector could also be included 
as part of the default package.

Users could then add additional connectors and repositories and jobs as desired.

Timeframe for release? Level of effort?

Phase 2
---

The essence of Phase 2 is that LCF would be split to allow direct, full API 
access to LCF as a crawling "engine", in addition to the full LCF UI. Call this 
LCF 1.0.

Specifically, LCF 1.0 would contain these additional capabilities:

1. Full API for LCF as a crawling engine (CONNECTORS-56)
2. LCF can be bundled within an app (CONNECTORS-61)
3. LCF event and activity notification for full control by an application 
(CONNECTORS-41)

Overall, LCF will offer roughly the same crawling capabilities as with LCF 0.5, 
plus whatever bug fixes and minor enhancements might also be added.

Timeframe for release? Level of effort?

-

Issues:

- Can we package PostgreSQL with LCF so LCF can set it up?
  - Or do we need Derby for that purpose?
- Managing multiple processes (UI, database, agent, app processes)
- What exactly would the API look like? (URL, XML, JSON, YAML?)


  was:
Currently, LCF has a relatively high bar for evaluation and use, requiring 
developer expertise. Also, although LCF has a comprehensive UI, it is not 
currently packaged for use as a crawling engine for advanced applications.

A small set of individual feature requests are needed to address these issues. 
They are summarized briefly to show how they fit together for two initial 
releases of LCF, but will be broken out into individual LCF Jira issues.

Goals:

1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as 
Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository 
access
3. An API-based crawling engine that can be integrated with applications (as 
Aperture is today)

Larger goals:

1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in 
operation.

Two phases:

1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it 
LCF 0.5.
2) API-based crawling engine for applications for which the UI might not be 
appropriate. Call it LCF 1.0.


Phase 1
---

LCF 0.5 right out of 

[jira] Created: (CONNECTORS-61) Support bundling of LCF with an app

2010-07-12 Thread Jack Krupansky (JIRA)
Support bundling of LCF with an app
---

 Key: CONNECTORS-61
 URL: https://issues.apache.org/jira/browse/CONNECTORS-61
 Project: Lucene Connector Framework
  Issue Type: Sub-task
  Components: Framework core
Reporter: Jack Krupansky


It should be possible for an application developer to bundle LCF with an 
application to facilitate installation and deployment of the application in 
conjunction with LCF. This may (or may not) be as simple as providing 
appropriate jar files and documentation for how to use them, but there may be 
other components or scripts needed.

There are two options: 1) include the LCF UI along with the other LCF 
processes, and 2) exclude the LCF UI and include only the other processes that 
can be controlled via the full API.

The database server would be included.

The web app server would be optional since the application may have its own 
choice of web app server.

One use case is bundling LCF with Solr or a Solr-based application.

Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CONNECTORS-60) Agent process should be started automatically

2010-07-12 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated CONNECTORS-60:
-

Description: 
LCF as it exists today is a bit too complex to run for an average user, 
especially with a separate agent process for crawling. LCF should be as easy to 
run as Solr is today. QuickStart is a good move in this direction, but the same 
user-visible simplicity is needed for full LCF. The separate agent process is a 
reasonable design for execution, but a little too cumbersome for the average 
user to manage.

Unfortunately, it is expected that starting up a multi-process application will 
require platform-specific scripting.

Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.


  was:
LCF as it exists today is a bit too complex to run for an average user, 
especially with a separate agent process for crawling. LCF should be as easy to 
run as Solr is today. QuickStart is a good move in this direction, but the same 
user-visible simplicity is needed for LCF. The separate agent process is a 
reasonable design for execution, but a little too cumbersome for the average 
user to manage.

Unfortunately, it is expected that starting up a multi-process application will 
require platform-specific scripting.

Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.



> Agent process should be started automatically
> -
>
> Key: CONNECTORS-60
> URL: https://issues.apache.org/jira/browse/CONNECTORS-60
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>Reporter: Jack Krupansky
>
> LCF as it exists today is a bit too complex to run for an average user, 
> especially with a separate agent process for crawling. LCF should be as easy 
> to run as Solr is today. QuickStart is a good move in this direction, but the 
> same user-visible simplicity is needed for full LCF. The separate agent 
> process is a reasonable design for execution, but a little too cumbersome for 
> the average user to manage.
> Unfortunately, it is expected that starting up a multi-process application 
> will require platform-specific scripting.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-60) Agent process should be started automatically

2010-07-12 Thread Jack Krupansky (JIRA)
Agent process should be started automatically
-

 Key: CONNECTORS-60
 URL: https://issues.apache.org/jira/browse/CONNECTORS-60
 Project: Lucene Connector Framework
  Issue Type: Sub-task
Reporter: Jack Krupansky


LCF as it exists today is a bit too complex to run for an average user, 
especially with a separate agent process for crawling. LCF should be as easy to 
run as Solr is today. QuickStart is a good move in this direction, but the same 
user-visible simplicity is needed for LCF. The separate agent process is a 
reasonable design for execution, but a little too cumbersome for the average 
user to manage.

Unfortunately, it is expected that starting up a multi-process application will 
require platform-specific scripting.

Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CONNECTORS-56) All features should be accessible through an API

2010-07-12 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated CONNECTORS-56:
-

Parent: CONNECTORS-50
Issue Type: Sub-task  (was: Improvement)

> All features should be accessible through an API
> 
>
> Key: CONNECTORS-56
> URL: https://issues.apache.org/jira/browse/CONNECTORS-56
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>  Components: Framework core
>Reporter: Jack Krupansky
>
> LCF consists of a full-featured crawling engine and a full-featured user 
> interface to access the features of that engine, but some applications are 
> better served with a full API that lets the application control the crawling 
> engine, including creation and editing of connections and creation, editing, 
> and control of jobs. Put simply, everything that a user can accomplish via 
> the LCF UI should be doable through an LCF API. All LCF objects should be 
> queryable through the API.
> A primary use case is Solr applications which currently use Aperture for 
> crawling, but would prefer the full-featured capabilities of LCF as a 
> crawling engine over Aperture.
> I do not wish to over-specify the API in this initial description, but I 
> think the LCF API should probably be a traditional REST API, with some of 
> the API elements specified via the context path, some parameters via URL 
> query parameters, and complex, detailed structures as JSON (or similar). The 
> precise details of the API are beyond the scope of this initial description 
> and will be added incrementally once the high-level approach to the API 
> becomes reasonably settled.
> A job status and event reporting scheme is also needed in conjunction with 
> the LCF API. That requirement has already been captured as CONNECTORS-41.
> The intention for the API is to create, edit, access, and control all of the 
> objects managed by LCF. The main focus is on repositories, jobs, and status, 
> and less about document-specific crawling information, but there may be some 
> benefit to querying crawling status for individual documents as well.
> Nothing in this proposal should in any way limit or constrain the features 
> that will be available in the LCF UI. The intent is that LCF should continue 
> to have a full-featured UI, but in addition to a full-featured API.
> Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CONNECTORS-55) Bundle database server with LCF packaged product

2010-07-12 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated CONNECTORS-55:
-

Parent: CONNECTORS-50
Issue Type: Sub-task  (was: Improvement)

> Bundle database server with LCF packaged product
> 
>
> Key: CONNECTORS-55
> URL: https://issues.apache.org/jira/browse/CONNECTORS-55
> Project: Lucene Connector Framework
>  Issue Type: Sub-task
>  Components: Framework core
>Reporter: Jack Krupansky
>
> The current requirement that the user install and deploy a PostgreSQL server 
> complicates the installation and deployment of LCF for the user. Installation 
> and deployment of LCF should be as simple as Solr itself. QuickStart is great 
> for the low-end and basic evaluation, but a comparable level of simplified 
> installation and deployment is still needed for full-blown, high-end 
> environments that need the full performance of a PostgreSQL-class database 
> server. So, PostgreSQL should be bundled with the packaged release of LCF so 
> that installation and deployment of LCF will automatically install and deploy 
> a subset of the full PostgreSQL distribution that is sufficient for the needs 
> of LCF. Starting LCF, with or without the LCF UI, should automatically start 
> the database server. Shutting down LCF should also shutdown the database 
> server process.
> A typical use case would be for a non-developer who is comfortable with Solr 
> and simply wants to crawl documents from, for example, a SharePoint 
> repository and feed them into Solr. QuickStart should work well for the low 
> end or in the early stages of evaluation, but the user would prefer to 
> evaluate "the real thing" with something resembling a production crawl of 
> thousands of documents. Such a user might not be a hard-core developer or be 
> comfortable fiddling with a lot of software components simply to do one 
> conceptually simple operation.
> It should still be possible for the user to supply database server settings 
> to override the defaults, but the LCF package should have all of the 
> best-practice settings deemed appropriate for use with LCF.
> One downside is that installation and deployment will be platform-specific 
> since there are multiple processes and PostgreSQL itself requires a 
> platform-specific installation.
> This proposal presumes that PostgreSQL is the best option for the foreseeable 
> future, but nothing here is intended to preclude support for other database 
> servers in future releases.
> This proposal should not have any impact on QuickStart packaging or 
> deployment.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-59) Packaged app ready to run with embedded Jetty app server

2010-07-12 Thread Jack Krupansky (JIRA)
Packaged app ready to run with embedded Jetty app server 
-

 Key: CONNECTORS-59
 URL: https://issues.apache.org/jira/browse/CONNECTORS-59
 Project: Lucene Connector Framework
  Issue Type: Sub-task
  Components: Framework core
Reporter: Jack Krupansky


Many potential users of LCF are not necessarily sophisticated developers who 
are prepared to "work with code", but are able to install packaged software, 
much as Solr is currently distributed. QuickStart for LCF is a good move in 
this direction, but similar packaging is needed for full LCF with a production 
database server. This issue focuses on assuring that full LCF is released as a 
packaged app suitable for download and immediate use without any additional 
software development expertise required.

Database packaging has already been called out as a distinct issue 
(CONNECTORS-55), so this issue is more of a catch-all for any lingering work 
needed to address support for full LCF as a packaged app.

Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-56) All features should be accessible through an API

2010-07-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886932#action_12886932
 ] 

Jack Krupansky commented on CONNECTORS-56:
--

Karl's suggested approach seems consistent with my own thoughts.

More details and discussion to follow, but I'd be interested in more community 
feedback of the overall, high-level concept before we get too detailed.

Also, just to remind people, my suggestion was that a full API would not be a 
requirement for the initial release. Better to get QuickStart and basic 
capabilities in people's hands, but some aspects of the API, such as the 
factoring needed to facilitate the API, might well be better off being done 
sooner than a second release. So, maybe a portion or foundation of the API 
would be in the initial release.


> All features should be accessible through an API
> 
>
> Key: CONNECTORS-56
> URL: https://issues.apache.org/jira/browse/CONNECTORS-56
> Project: Lucene Connector Framework
>  Issue Type: Improvement
>  Components: Framework core
>Reporter: Jack Krupansky
>
> LCF consists of a full-featured crawling engine and a full-featured user 
> interface to access the features of that engine, but some applications are 
> better served with a full API that lets the application control the crawling 
> engine, including creation and editing of connections and creation, editing, 
> and control of jobs. Put simply, everything that a user can accomplish via 
> the LCF UI should be doable through an LCF API. All LCF objects should be 
> queryable through the API.
> A primary use case is Solr applications which currently use Aperture for 
> crawling, but would prefer the full-featured capabilities of LCF as a 
> crawling engine over Aperture.
> I do not wish to over-specify the API in this initial description, but I 
> think the LCF API should probably be a traditional REST API, with some of 
> the API elements specified via the context path, some parameters via URL 
> query parameters, and complex, detailed structures as JSON (or similar). The 
> precise details of the API are beyond the scope of this initial description 
> and will be added incrementally once the high-level approach to the API 
> becomes reasonably settled.
> A job status and event reporting scheme is also needed in conjunction with 
> the LCF API. That requirement has already been captured as CONNECTORS-41.
> The intention for the API is to create, edit, access, and control all of the 
> objects managed by LCF. The main focus is on repositories, jobs, and status, 
> and less about document-specific crawling information, but there may be some 
> benefit to querying crawling status for individual documents as well.
> Nothing in this proposal should in any way limit or constrain the features 
> that will be available in the LCF UI. The intent is that LCF should continue 
> to have a full-featured UI, but in addition to a full-featured API.
> Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-58) Mini-API to initially configure default connections and "example" jobs for file system and web crawl

2010-07-09 Thread Jack Krupansky (JIRA)
Mini-API to initially configure default connections and "example" jobs for file 
system and web crawl 
-

 Key: CONNECTORS-58
 URL: https://issues.apache.org/jira/browse/CONNECTORS-58
 Project: Lucene Connector Framework
  Issue Type: Sub-task
  Components: Framework core
Reporter: Jack Krupansky


Creating a basic connection setup to do a relatively simple file system or web 
crawl can be a daunting task for someone new to LCF. So, it would be 
nice to have a scripting file that supports an abbreviated API (subset of the 
full API discussed in CONNECTORS-56) sufficient to create a default set of 
connections and example jobs that the new user can choose from.

Beyond this initial need, this script format might be a useful form to "dump" 
all of the connections and jobs in the LCF database in a form that can be used 
to recreate an LCF configuration. Kind of a "dump and reload" capability. That 
in fact might be how the initial example script gets created.

Those are two distinct use cases, but could utilize the same feature.

The example script could have example jobs to crawl a subdirectory of LCF, 
crawl the LCF wiki, etc.

There could be more than one script. There might be example scripts for each 
form of connector.

This capability should be available for both QuickStart and the general release 
of LCF.

As just one possibility, the script format might be a sequence of JSON 
expressions, each with an initial string analogous to a servlet path to specify 
the operation to be performed, followed by the JSON form of the connection or 
job or other LCF object. Or, some other format might be more suitable.

Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.
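
For concreteness, such a script (purely hypothetical operation names, field 
names, and values, following the "servlet path plus JSON object" possibility 
above) might look like:

    outputconnection/save {"outputconnection": {"name": "Solr", "solrURL": "http://localhost:8983/solr"}}
    repositoryconnection/save {"repositoryconnection": {"name": "LocalFiles", "type": "filesystem"}}
    job/save {"job": {"name": "Example file crawl", "repository": "LocalFiles", "output": "Solr", "paths": ["example/docs"]}}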


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-57) Solr output connector option to commit at end of job, by default

2010-07-09 Thread Jack Krupansky (JIRA)
Solr output connector option to commit at end of job, by default


 Key: CONNECTORS-57
 URL: https://issues.apache.org/jira/browse/CONNECTORS-57
 Project: Lucene Connector Framework
  Issue Type: Sub-task
  Components: Lucene/SOLR connector
Reporter: Jack Krupansky


By default, Solr will eventually commit documents that have been submitted to 
the Solr Cell interface, but the time lag can confuse and annoy people. 
Although commit strategy is a difficult issue in general, an option in LCF to 
automatically commit at the end of a job, by default, would eliminate a lot of 
potential confusion and generally be close to what the user needs.

The desired feature is an option, for each job that uses the Solr output 
connector, to commit when the job completes. This option would default to "on" 
(or to a different setting based on some global configuration setting), but the 
user may turn it off if a commit is desired only upon completion of certain 
jobs.
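
For concreteness, a minimal sketch of the kind of request such an option would 
issue when a job completes, assuming Solr's standard XML update handler at the 
usual example URL. This is illustrative plumbing only, not the actual LCF 
output connector code:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class SolrCommitSketch {

        // Post a <commit/> message to Solr's XML update handler.
        static void commitAtEndOfJob(String updateUrl) throws Exception {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(updateUrl).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write("<commit/>".getBytes("UTF-8"));
            }
            int status = conn.getResponseCode();
            if (status != 200) {
                throw new Exception("Solr commit failed: HTTP " + status);
            }
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical URL; in practice this would come from the Solr
            // output connection's configuration.
            commitAtEndOfJob("http://localhost:8983/solr/update");
        }
    }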

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product

2010-07-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886724#action_12886724
 ] 

Jack Krupansky commented on CONNECTORS-55:
--

When Karl says "It *does* limit your ability to use other commands 
simultaneously" (referring to use of embedded Derby), he is referring to 
commands executed via the "executecommand" shell script, such as registering 
and unregistering connectors. That is typically done once before starting the 
UI, or once in a blue moon when you want to support a new type of repository, 
but not on as regular a basis as editing connections and jobs and running jobs. 
The Java classes that execute those commands would be, by definition, outside 
of the LCF process.

> Bundle database server with LCF packaged product
> 
>
> Key: CONNECTORS-55
> URL: https://issues.apache.org/jira/browse/CONNECTORS-55
> Project: Lucene Connector Framework
>  Issue Type: Improvement
>  Components: Framework core
>Reporter: Jack Krupansky
>
> The current requirement that the user install and deploy a PostgreSQL server 
> complicates the installation and deployment of LCF for the user. Installation 
> and deployment of LCF should be as simple as Solr itself. QuickStart is great 
> for the low-end and basic evaluation, but a comparable level of simplified 
> installation and deployment is still needed for full-blown, high-end 
> environments that need the full performance of a PostgreSQL-class database 
> server. So, PostgreSQL should be bundled with the packaged release of LCF so 
> that installation and deployment of LCF will automatically install and deploy 
> a subset of the full PostgreSQL distribution that is sufficient for the needs 
> of LCF. Starting LCF, with or without the LCF UI, should automatically start 
> the database server. Shutting down LCF should also shut down the database 
> server process.
> A typical use case would be for a non-developer who is comfortable with Solr 
> and simply wants to crawl documents from, for example, a SharePoint 
> repository and feed them into Solr. QuickStart should work well for the low 
> end or in the early stages of evaluation, but the user would prefer to 
> evaluate "the real thing" with something resembling a production crawl of 
> thousands of documents. Such a user might not be a hard-core developer or be 
> comfortable fiddling with a lot of software components simply to do one 
> conceptually simple operation.
> It should still be possible for the user to supply database server settings 
> to override the defaults, but the LCF package should have all of the 
> best-practice settings deemed appropriate for use with LCF.
> One downside is that installation and deployment will be platform-specific 
> since there are multiple processes and PostgreSQL itself requires a 
> platform-specific installation.
> This proposal presumes that PostgreSQL is the best option for the foreseeable 
> future, but nothing here is intended to preclude support for other database 
> servers in future releases.
> This proposal should not have any impact on QuickStart packaging or 
> deployment.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product

2010-07-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886720#action_12886720
 ] 

Jack Krupansky commented on CONNECTORS-55:
--

Karl notes that "we've had to mess with the stuffer query on pretty near every 
point release of Postgresql". Letting/forcing the user to pick the 
right/acceptable release of PostgreSQL to install is error-prone and a support 
headache. I would argue that it is better for the LCF team to bundle the 
right/best release of PostgreSQL with LCF.

> Bundle database server with LCF packaged product
> 
>
> Key: CONNECTORS-55
> URL: https://issues.apache.org/jira/browse/CONNECTORS-55
> Project: Lucene Connector Framework
>  Issue Type: Improvement
>  Components: Framework core
>Reporter: Jack Krupansky
>
> The current requirement that the user install and deploy a PostgreSQL server 
> complicates the installation and deployment of LCF for the user. Installation 
> and deployment of LCF should be as simple as Solr itself. QuickStart is great 
> for the low-end and basic evaluation, but a comparable level of simplified 
> installation and deployment is still needed for full-blown, high-end 
> environments that need the full performance of a PostgreSQL-class database 
> server. So, PostgreSQL should be bundled with the packaged release of LCF so 
> that installation and deployment of LCF will automatically install and deploy 
> a subset of the full PostgreSQL distribution that is sufficient for the needs 
> of LCF. Starting LCF, with or without the LCF UI, should automatically start 
> the database server. Shutting down LCF should also shut down the database 
> server process.
> A typical use case would be for a non-developer who is comfortable with Solr 
> and simply wants to crawl documents from, for example, a SharePoint 
> repository and feed them into Solr. QuickStart should work well for the low 
> end or in the early stages of evaluation, but the user would prefer to 
> evaluate "the real thing" with something resembling a production crawl of 
> thousands of documents. Such a user might not be a hard-core developer or be 
> comfortable fiddling with a lot of software components simply to do one 
> conceptually simple operation.
> It should still be possible for the user to supply database server settings 
> to override the defaults, but the LCF package should have all of the 
> best-practice settings deemed appropriate for use with LCF.
> One downside is that installation and deployment will be platform-specific 
> since there are multiple processes and PostgreSQL itself requires a 
> platform-specific installation.
> This proposal presumes that PostgreSQL is the best option for the foreseeable 
> future, but nothing here is intended to preclude support for other database 
> servers in future releases.
> This proposal should not have any impact on QuickStart packaging or 
> deployment.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-56) All features should be accessible through an API

2010-07-08 Thread Jack Krupansky (JIRA)
All features should be accessible through an API


 Key: CONNECTORS-56
 URL: https://issues.apache.org/jira/browse/CONNECTORS-56
 Project: Lucene Connector Framework
  Issue Type: Improvement
  Components: Framework core
Reporter: Jack Krupansky


LCF consists of a full-featured crawling engine and a full-featured user 
interface to access the features of that engine, but some applications are 
better served with a full API that lets the application control the crawling 
engine, including creation and editing of connections and creation, editing, 
and control of jobs. Put simply, everything that a user can accomplish via the 
LCF UI should be doable through an LCF API. All LCF objects should be queryable 
through the API.

A primary use case is Solr applications which currently use Aperture for 
crawling, but would prefer the full-featured capabilities of LCF as a crawling 
engine over Aperture.

I do not wish to over-specify the API in this initial description, but I think 
the LCF API should probably be a traditional REST API, with some of the API 
elements specified via the context path, some parameters via URL query 
parameters, and complex, detailed structures as JSON (or similar). The precise 
details of the API are beyond the scope of this initial description and will be 
added incrementally once the high-level approach to the API becomes reasonably 
settled.
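
To anchor the discussion, a hedged sketch of what a few such endpoints might 
look like. Every path, verb, and payload below is hypothetical, offered only as 
a starting point, not a committed design:

    GET  /lcf/api/repositoryconnections      list repository connections (JSON)
    POST /lcf/api/repositoryconnections      create a connection from a JSON body
    GET  /lcf/api/jobs/{id}                  fetch a job definition
    PUT  /lcf/api/jobs/{id}                  edit a job definition
    POST /lcf/api/jobs/{id}/start            start a job
    GET  /lcf/api/jobs/{id}/status           query job status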

A job status and event reporting scheme is also needed in conjunction with the 
LCF API. That requirement has already been captured as CONNECTORS-41.

The intention for the API is to create, edit, access, and control all of the 
objects managed by LCF. The main focus is on repositories, jobs, and status, 
and less about document-specific crawling information, but there may be some 
benefit to querying crawling status for individual documents as well.

Nothing in this proposal should in any way limit or constrain the features that 
will be available in the LCF UI. The intent is that LCF should continue to have 
a full-featured UI, in addition to a full-featured API.

Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-55) Bundle database server with LCF packaged product

2010-07-08 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886490#action_12886490
 ] 

Jack Krupansky commented on CONNECTORS-55:
--

I was using the term "install" loosely, not so much in the way a typical package 
has a GUI wizard and lots of stuff going on, but more in the sense of raw Solr, 
where you download and unzip, and the files are in subdirectories right where 
they need to be. In that sense, the theory is that a subset of PostgreSQL could 
live in a subdirectory.

Some enterprising vendor, such as Lucid Imagination, might want to have a fancy 
GUI install, but that would be beyond the scope of what I intended here.


> Bundle database server with LCF packaged product
> 
>
> Key: CONNECTORS-55
> URL: https://issues.apache.org/jira/browse/CONNECTORS-55
> Project: Lucene Connector Framework
>  Issue Type: Improvement
>  Components: Framework core
>Reporter: Jack Krupansky
>
> The current requirement that the user install and deploy a PostgreSQL server 
> complicates the installation and deployment of LCF for the user. Installation 
> and deployment of LCF should be as simple as Solr itself. QuickStart is great 
> for the low-end and basic evaluation, but a comparable level of simplified 
> installation and deployment is still needed for full-blown, high-end 
> environments that need the full performance of a PostgreSQL-class database 
> server. So, PostgreSQL should be bundled with the packaged release of LCF so 
> that installation and deployment of LCF will automatically install and deploy 
> a subset of the full PostgreSQL distribution that is sufficient for the needs 
> of LCF. Starting LCF, with or without the LCF UI, should automatically start 
> the database server. Shutting down LCF should also shut down the database 
> server process.
> A typical use case would be for a non-developer who is comfortable with Solr 
> and simply wants to crawl documents from, for example, a SharePoint 
> repository and feed them into Solr. QuickStart should work well for the low 
> end or in the early stages of evaluation, but the user would prefer to 
> evaluate "the real thing" with something resembling a production crawl of 
> thousands of documents. Such a user might not be a hard-core developer or be 
> comfortable fiddling with a lot of software components simply to do one 
> conceptually simple operation.
> It should still be possible for the user to supply database server settings 
> to override the defaults, but the LCF package should have all of the 
> best-practice settings deemed appropriate for use with LCF.
> One downside is that installation and deployment will be platform-specific 
> since there are multiple processes and PostgreSQL itself requires a 
> platform-specific installation.
> This proposal presumes that PostgreSQL is the best option for the foreseeable 
> future, but nothing here is intended to preclude support for other database 
> servers in future releases.
> This proposal should not have any impact on QuickStart packaging or 
> deployment.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-55) Bundle database server with LCF packaged product

2010-07-08 Thread Jack Krupansky (JIRA)
Bundle database server with LCF packaged product


 Key: CONNECTORS-55
 URL: https://issues.apache.org/jira/browse/CONNECTORS-55
 Project: Lucene Connector Framework
  Issue Type: Improvement
  Components: Framework core
Reporter: Jack Krupansky


The current requirement that the user install and deploy a PostgreSQL server 
complicates the installation and deployment of LCF for the user. Installation 
and deployment of LCF should be as simple as Solr itself. QuickStart is great 
for the low-end and basic evaluation, but a comparable level of simplified 
installation and deployment is still needed for full-blown, high-end 
environments that need the full performance of a PostgreSQL-class database 
server. So, PostgreSQL should be bundled with the packaged release of LCF so 
that installation and deployment of LCF will automatically install and deploy a 
subset of the full PostgreSQL distribution that is sufficient for the needs of 
LCF. Starting LCF, with or without the LCF UI, should automatically start the 
database server. Shutting down LCF should also shut down the database server 
process.
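
As a rough illustration of the start/stop plumbing, a sketch of how a launcher 
might drive a bundled PostgreSQL through the standard pg_ctl utility. The 
relative paths are assumptions about how such a bundle might be laid out:

    public class BundledPostgresSketch {

        // Assumed layout: a pruned PostgreSQL distribution unpacked alongside LCF.
        private static final String PG_CTL = "postgresql/bin/pg_ctl";
        private static final String DATA_DIR = "postgresql/data";
        private static final String LOG_FILE = "postgresql/postgres.log";

        // Called when LCF starts; -w waits until the server accepts connections.
        static void start() throws Exception {
            run(PG_CTL, "-D", DATA_DIR, "-l", LOG_FILE, "-w", "start");
        }

        // Called when LCF shuts down; "fast" mode disconnects clients cleanly.
        static void stop() throws Exception {
            run(PG_CTL, "-D", DATA_DIR, "-m", "fast", "stop");
        }

        private static void run(String... cmd) throws Exception {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new Exception("pg_ctl exited with an error: "
                        + String.join(" ", cmd));
            }
        }
    }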

A typical use case would be for a non-developer who is comfortable with Solr 
and simply wants to crawl documents from, for example, a SharePoint repository 
and feed them into Solr. QuickStart should work well for the low end or in the 
early stages of evaluation, but the user would prefer to evaluate "the real 
thing" with something resembling a production crawl of thousands of documents. 
Such a user might not be a hard-core developer or be comfortable fiddling with 
a lot of software components simply to do one conceptually simple operation.

It should still be possible for the user to supply database server settings to 
override the defaults, but the LCF package should have all of the best-practice 
settings deemed appropriate for use with LCF.

One downside is that installation and deployment will be platform-specific 
since there are multiple processes and PostgreSQL itself requires a 
platform-specific installation.

This proposal presumes that PostgreSQL is the best option for the foreseeable 
future, but nothing here is intended to preclude support for other database 
servers in future releases.

This proposal should not have any impact on QuickStart packaging or deployment.

Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API

2010-06-30 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883932#action_12883932
 ] 

Jack Krupansky commented on CONNECTORS-50:
--

I expect to be able to address all of Karl's points...

> I don't think much of "umbrella tickets"... Can you break this up into more 
> specific work items...

I'll be doing that over the coming week or so. I'll keep this umbrella ticket 
not for details but to show how all of the individual tickets fit together. 
The discussion on this ticket is more about the overall proposal for two 
separate releases and roughly what they are.

> I'm also still looking for much greater specificity as to the use cases.

I'll provide some of that for each individual ticket. I'll try to keep the use 
cases as simple and minimalist as possible, but I'll address specific questions 
or issues that arise.

> the word "API" is so unspecific as to be essentially meaningless... I'm 
> interested in how you intend to interact with it.

Initially I'll be relatively light on detail to permit others to have some 
input on what they expect from a full API, but eventually all of the API issues 
will need to be fleshed out and detailed to some extent.

> There are still a number of points ... we have discussed in the past which 
> remain but whose controversy goes unacknowledged.

Yes, with the proposed commit feature as an example. The specific ticket for 
each feature should address such concerns.

> I've discussed the limitations of using Derby as the prime database for LCF - 
> that should be captured somewhere.

Yes. There might be several database tickets. One for alternate databases. 
Another for bundling the database with LCF.


> Proposal for initial two releases of LCF, including packaged product and full 
> API
> -
>
> Key: CONNECTORS-50
> URL: https://issues.apache.org/jira/browse/CONNECTORS-50
> Project: Lucene Connector Framework
>  Issue Type: New Feature
>  Components: Framework core
>Reporter: Jack Krupansky
>   Original Estimate: 3360h
>  Remaining Estimate: 3360h
>
> Currently, LCF has a relatively high bar for evaluation and use, requiring 
> developer expertise. Also, although LCF has a comprehensive UI, it is not 
> currently packaged for use as a crawling engine for advanced applications.
> A small set of individual feature requests are needed to address these 
> issues. They are summarized briefly to show how they fit together for two 
> initial releases of LCF, but will be broken out into individual LCF Jira 
> issues.
> Goals:
> 1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as 
> Solr is today)
> 2. LCF as a toolkit for developers needing customized crawling and repository 
> access
> 3. An API-based crawling engine that can be integrated with applications (as 
> Aperture is today)
> Larger goals:
> 1. Make it very easy for users to evaluate LCF.
> 2. Make it very easy for developers to customize LCF.
> 3. Make it very easy for applications to fully manage and control LCF in 
> operation.
> Two phases:
> 1) Standalone, packaged app that is super-easy to evaluate and deploy. Call 
> it LCF 0.5.
> 2) API-based crawling engine for applications for which the UI might not be 
> appropriate. Call it LCF 1.0.
> Phase 1
> ---
> LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later.
> It would contain roughly the features that are currently in place or 
> currently underway, plus a little more.
> Specifically, LCF 0.5 would contain these additional capabilities:
> 1. Plug-in architecture for connectors (already underway)
> 2. Packaged app ready to run with embedded Jetty app server (I think this has 
> been agreed to)
> 3. Bundled with database - PostgreSQL or Derby - ready to run without 
> additional manual setup
> 4. Mini-API to initially configure default connections and "example" jobs for 
> file system and web crawl
> 5. Agent process started automatically (platform-specific startup required)
> 6. Solr output connector option to commit at end of job, by default
> Installation and basic evaluation of LCF would be essentially as simple as 
> Solr is today. The example
> connections and jobs would permit the user to initiate example crawls of a 
> file system example
> directory and an example web on the LCF web site with just a couple of clicks 
> (as opposed to the
> detailed manual setup required today to create repository and output 
> connections and jobs).
> It is worth considering whether the SharePoint connector could also be 
> included as part of the default package.
> Users could then add additional connectors and repositories and jobs as 
> desired.
> Timeframe for release? Level of effort?
> Phase 2
> ---
> The essence of Phase 2 is that LCF would be split to allow direct, full API 
> access to LCF as a crawling "engine", in addition to the full LCF UI.

[jira] Updated: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API

2010-06-30 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated CONNECTORS-50:
-

 Original Estimate: 3360h  (was: 0.08h)
Remaining Estimate: 3360h  (was: 0.08h)
   Description: 
Currently, LCF has a relatively high bar for evaluation and use, requiring 
developer expertise. Also, although LCF has a comprehensive UI, it is not 
currently packaged for use as a crawling engine for advanced applications.

A small set of individual feature requests are needed to address these issues. 
They are summarized briefly to show how they fit together for two initial 
releases of LCF, but will be broken out into individual LCF Jira issues.

Goals:

1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as 
Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository 
access
3. An API-based crawling engine that can be integrated with applications (as 
Aperture is today)

Larger goals:

1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in 
operation.

Two phases:

1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it 
LCF 0.5.
2) API-based crawling engine for applications for which the UI might not be 
appropriate. Call it LCF 1.0.


Phase 1
---

LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later.
It would contain roughly the features that are currently in place or currently 
underway, plus a little more.

Specifically, LCF 0.5 would contain these additional capabilities:

1. Plug-in architecture for connectors (already underway)
2. Packaged app ready to run with embedded Jetty app server (I think this has 
been agreed to)
3. Bundled with database - PostgreSQL or Derby - ready to run without 
additional manual setup
4. Mini-API to initially configure default connections and "example" jobs for 
file system and web crawl
5. Agent process started automatically (platform-specific startup required)
6. Solr output connector option to commit at end of job, by default

Installation and basic evaluation of LCF would be essentially as simple as Solr 
is today. The example connections and jobs would permit the user to initiate 
example crawls of a file system example directory and an example web on the LCF 
web site with just a couple of clicks (as opposed to the detailed manual setup 
required today to create repository and output connections and jobs).

It is worth considering whether the SharePoint connector could also be included 
as part of the default package.

Users could then add additional connectors and repositories and jobs as desired.

Timeframe for release? Level of effort?

Phase 2
---

The essence of Phase 2 is that LCF would be split to allow direct, full API 
access to LCF as a crawling "engine", in addition to the full LCF UI. Call this 
LCF 1.0.

Specifically, LCF 1.0 would contain these additional capabilities:

1. Full API for LCF as a crawling engine
2. LCF can be bundled within an app (such as the default LCF package itself 
with its UI)
3. LCF event and activity notification for full control by an application 
(already a Jira request)

Overall, LCF will offer roughly the same crawling capabilities as with LCF 0.5, 
plus whatever bug
fixes and minor enhancements might also be added.

Timeframe for release? Level of effort?

-

Issues:

- Can we package PostgreSQL with LCF so LCF can set it up?
  - Or do we need Derby for that purpose?
- Managing multiple processes (UI, database, agent, app processes)
- What exactly would the API look like? (URL, XML, JSON, YAML?)


  was:
Currently, LCF has a relatively high bar for evaluation and use, requiring 
developer expertise. Also, although LCF has a comprehensive UI, it is not 
currently packaged for use as a crawling engine for advanced applications.

A small set of individual feature requests are needed to address these issues. 
They are summarized briefly to show how they fit together for two initial 
releases of LCF, but will be broken out into individual LCF Jira issues.

Goals:

1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as 
Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository 
access
3. An API-based crawling engine that can be integrated with applications (as 
Aperture is today)

Larger goals:

1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in 
operation.

Two phases:

1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it 
LCF 0.5.
2) API-based crawling engine for applications for which the UI might not be 
appropriate. Call it LCF 1.0.


Phase 1
---

[jira] Created: (CONNECTORS-50) Proposal for initial two releases of LCF, including packaged product and full API

2010-06-30 Thread Jack Krupansky (JIRA)
Proposal for initial two releases of LCF, including packaged product and full 
API
-

 Key: CONNECTORS-50
 URL: https://issues.apache.org/jira/browse/CONNECTORS-50
 Project: Lucene Connector Framework
  Issue Type: New Feature
  Components: Framework core
Reporter: Jack Krupansky


Currently, LCF has a relatively high bar for evaluation and use, requiring 
developer expertise. Also, although LCF has a comprehensive UI, it is not 
currently packaged for use as a crawling engine for advanced applications.

A small set of individual feature requests are needed to address these issues. 
They are summarized briefly to show how they fit together for two initial 
releases of LCF, but will be broken out into individual LCF Jira issues.

Goals:

1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as 
Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository 
access
3. An API-based crawling engine that can be integrated with applications (as 
Aperture is today)

Larger goals:

1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in 
operation.

Two phases:

1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it 
LCF 0.5.
2) API-based crawling engine for applications for which the UI might not be 
appropriate. Call it LCF 1.0.


Phase 1
---

LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later.
It would contain roughly the features that are currently in place or currently 
underway, plus a little more.

Specifically, LCF 0.5 would contain these additional capabilities:

1. Plug-in architecture for connectors (already underway)
2. Packaged app ready to run with embedded Jetty app server (I think this has 
been agreed to)
3. Bundled with database - PostgreSQL or Derby - ready to run without 
additional manual setup
4. Mini-API to initially configure default connections and "example" jobs for 
file system and web crawl
5. Agent process started automatically (platform-specific startup required)
6. Solr output connector option to commit at end of job, by default

Installation and basic evaluation of LCF would be essentially as simple as Solr 
is today. The example connections and jobs would permit the user to initiate 
example crawls of a file system example directory and an example web on the LCF 
web site with just a couple of clicks (as opposed to the detailed manual setup 
required today to create repository and output connections and jobs).

It is worth considering whether the SharePoint connector could also be included 
as part of the default package.

Users could then add additional connectors and repositories and jobs as desired.

Timeframe for release? Level of effort?

Phase 2
---

The essence of Phase 2 is that LCF would be split to allow direct, full API 
access to LCF as a crawling "engine", in addition to the full LCF UI. Call this 
LCF 1.0.

Specifically, LCF 1.0 would contain these additional capabilities:

1. Full API for LCF as a crawling engine
2. LCF can be bundled within an app (such as the default LCF package itself 
with its UI)
3. LCF event and activity notification for full control by an application 
(already a Jira request)

Overall, LCF will offer roughly the same crawling capabilities as with LCF 0.5, 
plus whatever bug
fixes and minor enhancements might also be added.

Timeframe for release? Level of effort?

-

Issues:

- Can we package PostgreSQL with LCF so LCF can set it up?
  - Or do we need Derby for that purpose?
- Managing multiple processes (UI, database, agent, app processes)
- What exactly would the API look like? (URL, XML, JSON, YAML?)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-37) LCF should use an XML configuration file, not the simple name/value config file it currently has

2010-06-01 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874029#action_12874029
 ] 

Jack Krupansky commented on CONNECTORS-37:
--

I'll defer to the community on the logging issue, other than to simply say that 
it should be as "standard" as possible and relatively compatible with how Solr 
does logging so that it will not surprise people.

I don't have a problem with the LCF .properties file per se, other than the 
fact that, being restricted to strictly keyword/value pairs, it cannot contain 
more complex, structured configuration information.

The main thing I'd like to see is that the current "executecommand" 
configuration setup, such as which output connectors and crawlers to register, 
be done using descriptions in a config file rather than discrete shell commands 
to manually execute. The default config file from svn checkout should have a 
default set of connectors, crawlers, etc., and have commented-out entries for 
other connectors that people can un-comment and edit as desired.
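
As a purely illustrative sketch, such a registration section might look like 
the fragment below. The element names and connector class names are invented 
for the example, not an actual LCF schema:

    <connectors>
      <repositoryconnector name="File system"
          class="org.apache.lcf.crawler.connectors.filesystem.FileConnector"/>
      <outputconnector name="Solr"
          class="org.apache.lcf.agents.output.solr.SolrConnector"/>
      <!-- Un-comment to register additional connector types:
      <repositoryconnector name="Web"
          class="org.apache.lcf.crawler.connectors.webcrawler.WebcrawlerConnector"/>
      -->
    </connectors>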

A key advantage of having such a config file is that when people do report 
problems here we can ask them to provide their config file rather than ask them 
to try to remember and re-type whatever commands they might remember that they 
intended to type.

Whether connections and jobs can be initially created from a config file is a 
larger discussion. The main point here is simply that it be easy to get LCF 
initialized and configured for the really basic stuff needed for a typical 
initial evaluation (comparable to what occurs in a Solr tutorial). The 
proverbial "zero-hour" experience.


> LCF should use an XML configuration file, not the simple name/value config 
> file it currently has
> 
>
> Key: CONNECTORS-37
> URL: https://issues.apache.org/jira/browse/CONNECTORS-37
> Project: Lucene Connector Framework
>  Issue Type: Improvement
>  Components: Framework core
>Reporter: Karl Wright
>
> LCF's configuration file is limited in what it can specify, and XML 
> configuration files seem to offer more flexibility and are the modern norm.  
> Before backwards compatibility becomes an issue, it may therefore be worth 
> converting the property file reader to use XML rather than name/value format. 
>  It would also be nice to be able to fold the logging configuration into the 
> same file, if this seems possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.