Hi,

Thanks to Grant and Chris for the feedback, and to Jukka for your comments.

2008/11/29 Jukka Zitting <[EMAIL PROTECTED]>

> > 1) release naming: should probably be apache-tika-0.2-src.jar i seem to
> > recall someone somewhere saying that was important for apache releases
> > (and it's more consistent with the 0.1 release)
>
> Good point, we probably should do that. Dave, can you take care of this?

I can sort this out.

> > 2) release file format: the 0.1 release seems to have been a tar.gz ...
> > was a conscious choice made by the community to switch to distributing
> > as a src jar? otherwise you may want to publish both, or stick with
> > tar.gz for consistency (the docs on the website refer to the tarball
> > when giving examples of downloading and verifying)
>
> At least I was pretty vocal about switching to the jar format for our
> source releases, see most notably
> http://markmail.org/message/mwi4w2odztsxlcgi and
> http://markmail.org/message/jnthn2q4pghqxjlc. Unless the PMC prefers a
> tarball, at least I would rather fix the documentation than change the
> packaging format.

I agree with Jukka, but I am happy to add a tarball if required.

> > 3) incubator refs: as mentioned before, there are a lot of references to
> > the incubator that should be switched to point to lucene...
> >
> > [EMAIL PROTECTED]:~/tmp/tika-release/rc1/tika-0.2$ grep -lir incubator .
> > ./pom.xml
> > ./src/site/apt/download.apt
> > ./src/site/apt/index.apt
> > ./README.txt
>
> Fair point, and it goes with my statement above about getting the
> release out as soon as possible after graduation. In Tika trunk we've
> now updated all Incubator references, so any new release will have
> this issue fixed. Given the PMC pushback; perhaps we should just scrap
> the 0.2 release and go directly to 0.3 based on the current trunk?

If we are happy to release 0.3 from the current trunk, which I certainly
am, I think that would be best. If not, I can just update the 0.2 branch
in line with trunk.

> > 4) user docs: (I think grant may have already mentioned this) The
> > README.txt file talks about building Tika, but there doesn't seem to be
> > anything in the release that describes how to use Tika ... has any
> > thought been given to including more docs in the release itself? --
> > gettingstarted.html perhaps? ... at the very least a paragraph should be
> > added to the README referring to the gettingstarted.html page.
> >
> > Personally, i think including documentation.html and formats.html in the
> > release are also important -- they're going to change between releases,
> > probably more than the "getting started" type info, and should be
> > "versioned" so moving forward people with older versions won't get
> > misled by the docs on the site.
>
> The available documentation is already included in the source release
> in src/site and can be generated with "mvn site". The fact that the
> documentation isn't complete (e.g. the Getting Started guide didn't
> yet exist in 0.2 release candidate) shouldn't IMHO be a blocker for a
> release (especially for a 0.x one). In any case it's an area where we
> are clearly getting better during the 0.x release cycle.
>
> The README could mention "mvn site" as the command to generate the
> official documentation for that release and we could include a static
> snapshot of that in http://lucene.apache.org/tika/ for reference. This
> is something we should look at.

In the future we could update our Maven build to generate this
documentation and include it in the binary and source releases, but for
now I think this is a good approach.

> > 5) artifacts missing: i tried following along with the
> > gettingstarted.html (my first time using maven BTW so i may have messed
> > something up) and ran into a snag...
> > "mvn install" download a bunch of dependencies (i think
> > they were maven's own dependencies since i'd never used it before), ran
> > some test (these definitely had tika in the name) then downloaded some
> > more things, then told me it was installing tika-0.2.jar in my ~/.m2
> > directory. When i looked at the next section "Build artifacts" it
> > referred to 3 jars in my target directory -- but i only have one...
> >
> > [EMAIL PROTECTED]:~/tmp/tika-release/rc1/tika-0.2$ find target -name \*jar
> > target/tika-0.2.jar
> >
> > ...is the gettingstarted.html wrong, or did the build not run correctly?

As Jukka states, the 0.2 release was only meant to contain the single
release artifact.

> > 6) RAT: Apache RAT noticed the following files missing license info...
> >
> > !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/site/resources/tika.svg
> > !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/site/resources/tikaNoText.svg
> > !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testHTML.html
> > !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testHTML_utf8.html
> > !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testRTF.rtf
> > !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testTXT.txt
> > !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testXHTML.html
> > !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testXML.xml
> >
> > ...I don't know if i've ever heard an opinion on needing to include the
> > ASL header in *.svg files (they are xml, but they are also clearly
> > generated by inkscape), but I do remember someone pointing out that test
> > data files in formats that are capable of containing comments in them
> > (ie: xml, html, etc...) should include the ASL header, such as...
> >
> > http://svn.apache.org/repos/asf/lucene/solr/trunk/example/exampledocs/hd.xml
>
> I think that having the license header in such test files disrupts the
> main purpose of the test cases (i.e. you want to check whether the
> extracted text contains some specific test phrase, not necessarily the
> Apache license header), so at least I prefer to not include the
> license header in those test files. See also
> http://markmail.org/message/m7jmgl3qncsffygb for related discussion on
> [EMAIL PROTECTED]
>
> However, if the PMC so wishes, I don't see any big problem in us
> adding the license headers in these test files. Note that in some
> future test files this might be troublesome, but for existing tests I
> don't see problems with this.

I have already added the headers. That was maybe a bit quick given the
issues Jukka is raising, but as he points out, the existing test cases are
fine. We probably want to settle this one for future test files.

> > 7) javadocs: maybe this is something that is obvious to maven users, and
> > as a non-maven user i just don't know the magic incantation, but i
> > couldn't find any generated javadocs in the release (or in the "target"
> > directory after running "mvn install") ... since Tika is primarily a
> > library people will use in java apps, this seems kind of important. If
> > there is a magic maven incantation to build these, let's include the
> > instructions somewhere (since the gettingstarted guide suggests that
> > maven is necessary to build tika, but not to use it (per the Artifacts
> > and Ant sections)
>
> Good point. The README could point out "mvn site" as the way to
> produce a browseable version of all documentation associated with the
> release, and as an added service we could (should?) publish specific
> per-version documentation also on the Tika web site.
>
> On the other hand, I don't see documentation as being a valid blocker
> for any 0.x release.

As with the documentation, we can improve our Maven build to generate a
javadoc jar as part of the build. In the meantime, anyone who wants a
javadoc jar can use the following Maven command, or the approach mentioned
by Jukka:

    mvn javadoc:jar

Cheers,
Dave