Hi,

Thanks to Grant and Chris for the feedback, and to Jukka for your comments.

2008/11/29 Jukka Zitting <[EMAIL PROTECTED]>

>
> > 1) release naming: should probably be apache-tika-0.2-src.jar  i seem to
> > recall someone somewhere saying that was important for apache releases
> > (and it's more consistent with the 0.1 release)
>
> Good point, we probably should do that. Dave, can you take care of this?
>

I can sort this out.


> > 2) release file format: the 0.1 release seems to have been a tar.gz ...
> > was a conscious choice made by the community to switch to distributing as
> > a src jar? otherwise you may want to publish both, or stick with tar.gz for
> > consistency (the docs on the website refer to the tarball when giving
> > examples of downloading and verifying)
>
> At least I was pretty vocal about switching to the jar format for our
> source releases, see most notably
> http://markmail.org/message/mwi4w2odztsxlcgi and
> http://markmail.org/message/jnthn2q4pghqxjlc. Unless the PMC prefers a
> tarball, at least I would rather fix the documentation than change the
> packaging format.
>

I agree with Jukka, but I am happy to add a tarball if required.


> > 3) incubator refs: as mentioned before, there are a lot of references to
> > the incubator that should be switched to point to lucene...
> >
> > [EMAIL PROTECTED]:~/tmp/tika-release/rc1/tika-0.2$ grep -lir incubator .
> > ./pom.xml
> > ./src/site/apt/download.apt
> > ./src/site/apt/index.apt
> > ./README.txt
>
> Fair point, and it goes with my statement above about getting the
> release out as soon as possible after graduation. In Tika trunk we've
> now updated all Incubator references, so any new release will have
> this issue fixed. Given the PMC pushback, perhaps we should just scrap
> the 0.2 release and go directly to 0.3 based on the current trunk?
>

If everyone is happy to release 0.3 from the current trunk, which I certainly
am, I think that would be best. If not, I can just update the 0.2 branch in
line with trunk.

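Either way, a quick check along the lines of Chris's grep could confirm that
no Incubator references remain before the next release candidate is cut. This
is only a minimal sketch: the function name check_incubator_refs is made up
for illustration, and it assumes it is pointed at the root of the unpacked
source tree.

```shell
# Hypothetical helper (not part of the Tika build): list files under a
# directory that still mention "incubator", and fail if any are found.
check_incubator_refs() {
    dir="${1:-.}"
    # -l: print file names only, -i: ignore case, -r: recurse.
    # grep exits non-zero when nothing matches, so that status is ignored here.
    matches=$(grep -lir incubator "$dir" || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
        return 1
    fi
    return 0
}
```

Running check_incubator_refs . from the release directory prints the
offending files and returns non-zero if any are left, so it could be used as
a gate in a release script.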

> > 4) user docs: (I think grant may have already mentioned this) The
> > README.txt file talks about building Tika, but there doesn't seem to be
> > anything in the release that describes how to use Tika ... has any
> > thought been given to including more docs in the release itself? --
> > gettingstarted.html perhaps? ... at the very least a paragraph should be
> > added to the README referring to the gettingstarted.html page.
> >
> > Personally, i think including documentation.html and formats.html in the
> > release are also important -- they're going to change between releases,
> > probably more than the "getting started" type info, and should be
> > "versioned" so moving forward people with older versions won't get
> > misled by the docs on the site.
>
> The available documentation is already included in the source release
> in src/site and can be generated with "mvn site". The fact that the
> documentation isn't complete (e.g. the Getting Started guide didn't
> yet exist in 0.2 release candidate) shouldn't IMHO be a blocker for a
> release (especially for a 0.x one). In any case it's an area where we
> are clearly getting better during the 0.x release cycle.
>
> The README could mention "mvn site" as the command to generate the
> official documentation for that release and we could include a static
> snapshot of that in http://lucene.apache.org/tika/ for reference. This
> is something we should look at.
>

In the future we could update our Maven build to generate this documentation
and include it in the binary and source releases, but for now I think this
is a good approach.


> > 5) artifacts missing: i tried following along with the gettingstarted.html
> > (my first time using maven BTW so i may have messed something up) and ran
> > into a snag... "mvn install" downloaded a bunch of dependencies (i think
> > they were maven's own dependencies since i'd never used it before), ran
> > some tests (these definitely had tika in the name) then downloaded some
> > more things, then told me it was installing tika-0.2.jar in my ~/.m2
> > directory.  When i looked at the next section "Build artifacts" it
> > referred to 3 jars in my target directory -- but i only have one...
> >
> > [EMAIL PROTECTED]:~/tmp/tika-release/rc1/tika-0.2$ find target -name \*jar
> > target/tika-0.2.jar
> >
> > ...is the gettingstarted.html wrong, or did the build not run correctly?
>
>

As Jukka states, the 0.2 release was only meant to contain the single
release artifact.


> > 6) RAT: Apache RAT noticed the following files missing license info...
> >
> >  !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/site/resources/tika.svg
> >  !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/site/resources/tikaNoText.svg
> >  !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testHTML.html
> >  !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testHTML_utf8.html
> >  !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testRTF.rtf
> >  !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testTXT.txt
> >  !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testXHTML.html
> >  !????? /home/hossman/tmp/tika-release/rc1/tika-0.2/src/test/resources/test-documents/testXML.xml
> >
> > ...I don't know if i've ever heard an opinion on needing to include the
> > ASL header in *.svg files (they are xml, but they are also clearly
> > generated by inkscape), but I do remember someone pointing out that test
> > data files in formats that are capable of containing comments in them
> > (ie: xml, html, etc...) should include the ASL header, such as...
> >
> > http://svn.apache.org/repos/asf/lucene/solr/trunk/example/exampledocs/hd.xml
>
> I think that having the license header in such test files disrupts the
> main purpose of the test cases (i.e. you want to check whether the
> extracted text contains some specific test phrase, not necessarily the
> Apache license header), so at least I prefer to not include the
> license header in those test files. See also
> http://markmail.org/message/m7jmgl3qncsffygb for related discussion on
> [EMAIL PROTECTED]
>
> However, if the PMC so wishes, I don't see any big problem in us
> adding the license headers in these test files. Note that in some
> future test files this might be troublesome, but for existing tests I
> don't see problems with this.
>

I have already added the headers. That was maybe a bit quick given the issues
Jukka is raising, but as he points out, the existing test cases are fine. We
probably want to settle this question for future test files.


> > 7) javadocs: maybe this is something that is obvious to maven users, and
> > as a non-maven user i just don't know the magic incantation, but i
> > couldn't find any generated javadocs in the release (or in the "target"
> > directory after running "mvn install") ... since Tika is primarily a
> > library people will use in java apps, this seems kind of important.  If
> > there is a magic maven incantation to build these, let's include the
> > instructions somewhere (since the gettingstarted guide suggests that
> > maven is necessary to build tika, but not to use it (per the Artifacts
> > and Ant sections))
>
> Good point. The README could point out "mvn site" as the way to
> produce a browseable version of all documentation associated with the
> release, and as an added service we could (should?) publish specific
> per-version documentation also on the Tika web site.
>
> On the other hand, I don't see documentation as being a valid blocker
> for any 0.x release.


As with the documentation, we can improve our Maven build to generate a
javadoc jar as part of the build. In the meantime, anyone who wants a javadoc
jar can use the following Maven command, or the approach mentioned by Jukka:

mvn javadoc:jar
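
To check that the jar was actually produced, something like the following
could be run from the top of the source tree. This is only a sketch: the
function name build_javadoc_jar is made up for illustration, and it assumes
the javadoc plugin writes a file ending in -javadoc.jar under target/.

```shell
# Hypothetical helper (not part of the Tika build): build the javadoc jar
# and print the path of whatever *-javadoc.jar ends up under target/.
build_javadoc_jar() {
    # Stop early if the Maven build itself fails.
    mvn javadoc:jar || return 1
    # Assumption: the jar lands in target/ with a -javadoc.jar suffix.
    find target -name '*-javadoc.jar'
}
```

If nothing is printed, the build did not leave a javadoc jar where this
sketch expects it.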

Cheers,
Dave
