That sounds awesome Nico. Thanks

On Tue, Jul 24, 2012 at 9:51 AM, Nicolaas Matthijs
<nicolaas.matth...@caret.cam.ac.uk> wrote:
> Looks like this has been hanging around on list for a while now, and we
> should probably try to move it forwards.
>
> The maintainability criterion can only be determined by a code review, which
> is standard practice. However, as this is proving to be such a critical
> feature in production, I'd suggest that we do a separate bugbash to evaluate
> its performance, ease of setup (running from a separate machine) and most
> importantly functional equivalence.
>
> When doing this, Kent can give his assessment of the ease of setup and the
> bugbashers can determine functional equivalence. We should also try to have
> it re-process the dummy content we usually bugbash with.
>
> If this all sounds good, I'd like to go ahead with this as soon as possible
> and run a bugbash straight after the 1.4.0 release with all of this set up.
> If the implementation survives the bugbash, it can be reviewed and merged.
>
> Does that sound reasonable?
>
> Thanks,
> Nicolaas
>
>
>
> On 23 Jul 2012, at 07:42, Carl Hall wrote:
>
> Lance, I think the work is already split the way you suggest given what I
> know about what Erik has done (rewrite in Java) and what's left (add JMS).
> Adding message queue capabilities should not hold back reviewing the
> proposed changes.
>
> I would say that it needs to meet these opening criteria for my general
> acceptance:
>
> * Be functionally equal with the current solution
> * A combination of performance and maintainability
>    * Performance can be no worse overall. There might be different hotspots in
> the java version than the current ruby solution but there shouldn't be
> anything exponentially worse. Overall, the java version has to perform at
> least as well and hopefully better. Memory usage, overall processing time,
> resource usage (iops, disc reads, caching) should all be considered.
>    * Be more maintainable than the Ruby solution. The current code has had
> very little cleaning and is not very readable. This includes using
> externally available libraries where possible. We shouldn't be maintaining
> plumbing not inherent to our domain.
> * Easier to set up. Though our current setup for the ruby PP is known to be
> problematic, we at least are accustomed to it. The proposed solution has got
> to be more straightforward and less fragile.
>
> The numbers I've seen from some preliminary testing showed the Java impl to
> take exponentially *less* time to process pdfs and was faster than the ruby
> PP in every test. It's an OSGi bundle and written in Java like the rest of
> our project which makes it easier to set up and maintain as we write far more
> java code than ruby. I believe there's also already a setup available to run
> the java PP as a standalone server.
> The Java version introduces a topia term extractor bundle which is a port
> from the Python version. This is a point of maintenance to consider but the
> python code hasn't changed in years. It's a common impl for other languages to
> port but there wasn't a java version around. I would like to see this code
> find a permanent home in a relevant OSS project. At the very least it should
> be maintained apart from OAE core to make it available to a broader
> audience.
>
> +1 to getting this code wrapped up and reviewed.
>
> On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings
> <vueringschrist...@gmail.com> wrote:
>>
>> I'm not sure whether this is already part of the criteria list or not, but
>> what about CPU/Memory usage?
>> Is there a way we can measure that and compare it to the current ruby
>> based PP?
>> When I currently run the ruby PP locally, it's usually one of the
>> processes that uses the most resources.
>>
>> One other thing I'm curious about is how well it will compress/handle the
>> different file formats (png/jpg/gif/psd)
>>
>> These are just 2 things that I'm interested in since they (can) have an
>> impact on the overall performance.
>>
>>
>> - Christian
>>
>> On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote:
>>
>> Does anyone have an opinion about adopting the new java based PP?
>> Specifically can you articulate acceptance criteria for such an adoption?
>> e.g.
>>
>> Must support same preview behaviors as existing ruby-based PP.
>> Must pass QA with all blocker and critical items resolved.
>> Must start automatically OOTB to support the tire-kicking, web-start uses.
>> Must leverage as much 3rd party code as possible to minimize ownership
>> costs.
>> Must pass code review.
>> Unit test code coverage.
>> Basic config and deployment documentation.
>>
>>
>> What is missing?  Anything?  Thanks, L
>>
>>
>>
>> On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote:
>>
>> Is there any way to break this work down into chunks?  e.g.
>>
>> 1. Adopt java PP as default PP moving forward. What are the acceptance
>> criteria?
>> 2. Enhance new java PP with message queue abilities.
>>
>> WDYT?  Thanks, L
>>
>> On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>
>> Each app server could run its own queues but that wouldn't support
>> building a farm of PP processors unless we also teach them to talk to
>> multiple JMS servers. Maybe something like DNS round-robin would suffice?
>>
>> On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese <erik.fro...@gmail.com>
>> wrote:
>>>
>>> Do we need to cluster activemq? Can't each app server service its own
>>> queues?
>>> Erik
>>>
>>> On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>> > What Erik describes has been on the dev wish list for a little while now.
>>> > Moving to an event-driven model would allow us to build out concurrency
>>> > but there also comes the question of clustering ActiveMQ.
>>> >
>>> >
>>> > On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese <erik.fro...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hey David,
>>> >>
>>> >> The code is not clustered.
>>> >>
>>> >> You'd need to write an event listener that would fire when new content
>>> >> is uploaded. It would put the content ids on a JMS queue. Then
>>> >> implement a ContentFetcher that grabs a message off of the queue and
>>> >> wire that into the PPI. Events and Messages are not clustered in OAE
>>> >> (AFAIK) so this would have to be run on each app server.
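
The flow described above (listener puts content ids on a queue, a fetcher takes them off for the processor) could be sketched roughly as follows. This is only an illustration under stated assumptions: an in-process BlockingQueue stands in for the JMS queue so the example runs without a broker, and the names UploadListener, QueueContentFetcher and drain are hypothetical, not taken from the OAE code or the ticket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PreviewQueueSketch {

    /** Hypothetical listener: fires when new content is uploaded. */
    static class UploadListener {
        private final BlockingQueue<String> queue;

        UploadListener(BlockingQueue<String> queue) {
            this.queue = queue;
        }

        void onContentUploaded(String contentId) {
            // Stand-in for sending a message to a JMS queue.
            queue.offer(contentId);
        }
    }

    /** Hypothetical fetcher: hands the processor the next content id. */
    static class QueueContentFetcher {
        private final BlockingQueue<String> queue;

        QueueContentFetcher(BlockingQueue<String> queue) {
            this.queue = queue;
        }

        /** Returns the next queued content id, or null if the queue is empty. */
        String nextContentId() {
            return queue.poll();
        }
    }

    /** Takes ids off the queue until it is empty and returns them in order. */
    static List<String> drain(QueueContentFetcher fetcher) {
        List<String> processed = new ArrayList<>();
        String id;
        while ((id = fetcher.nextContentId()) != null) {
            processed.add(id); // a real processor would generate the preview here
        }
        return processed;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        UploadListener listener = new UploadListener(queue);
        listener.onContentUploaded("content-1");
        listener.onContentUploaded("content-2");
        System.out.println(drain(new QueueContentFetcher(queue))); // [content-1, content-2]
    }
}
```

Because the queue here lives inside one JVM, the sketch mirrors the "run on each app server" setup; sharing work between servers would need a real broker that all of them can reach.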
>>> >>
>>> >> While we're in event-land it'd be nice to be able to regenerate a
>>> >> preview when a content body is updated. I'm not sure if this is
>>> >> possible yet.
>>> >>
>>> >> I'm not sure how we'd limit the CPU usage yet either. You could manage
>>> >> the quartz schedule that runs the PPI.
>>> >>
>>> >> We can also disable concurrent executions of the job.
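
As a rough illustration of what "no concurrent executions" means for a scheduled job, here is a plain-Java sketch that does not depend on Quartz. Quartz 2.x offers the @DisallowConcurrentExecution annotation for this purpose; which Quartz version and mechanism the PPI job actually uses is not stated in this thread. The class and method names below are hypothetical, and the sketch simply skips an overlapping trigger, which is a simplification and not necessarily how Quartz handles it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class NonConcurrentJobSketch {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final AtomicInteger completedRuns = new AtomicInteger();

    /** Runs the work unless a previous run is still active; returns true if it ran. */
    boolean runIfIdle(Runnable work) {
        if (!running.compareAndSet(false, true)) {
            return false; // a run is already active: skip this trigger
        }
        try {
            work.run();
            completedRuns.incrementAndGet();
            return true;
        } finally {
            running.set(false);
        }
    }

    int completedRuns() {
        return completedRuns.get();
    }

    public static void main(String[] args) {
        NonConcurrentJobSketch job = new NonConcurrentJobSketch();
        // The nested call simulates a trigger firing while a run is still active.
        boolean outer = job.runIfIdle(() ->
                System.out.println("overlapping trigger ran: " + job.runIfIdle(() -> { })));
        System.out.println("outer ran: " + outer + ", completed runs: " + job.completedRuns());
    }
}
```

Keeping a single active run per server bounds how much of the machine the processor can occupy at once, though it does not by itself cap CPU or memory for that one run.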
>>> >>
>>> >> Erik
>>> >>
>>> >> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au> wrote:
>>> >> > Awesome news Erik!
>>> >> >
>>> >> > Our Ops guys will be stoked when we can get this in.. A couple of
>>> >> > questions from someone who hasn't looked at the code or read too
>>> >> > deeply....
>>> >> > - Does it support clustering?
>>> >> >         - e.g. can we just run it side by side on each of our app
>>> >> >           servers and they will play nice sharing out processing jobs?
>>> >> >         - Will it affect performance of the app servers much? Can we
>>> >> >           limit the preview processor to say 10% CPU and 500 MB RAM or
>>> >> >           low priority threads or limit the number of items to process
>>> >> >           or something? This would make for a nice simple deployment
>>> >> >           that wouldn't threaten the app server stability.
>>> >> >
>>> >> > Cheers,
>>> >> > Dave.
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: oae-dev-boun...@collab.sakaiproject.org
>>> >> > [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf Of Erik
>>> >> > Froese
>>> >> > Sent: Thursday, 12 July 2012 2:37 AM
>>> >> > To: Carl Hall
>>> >> > Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason
>>> >> > Subject: Re: [oae-dev] Moving the preview processor to java
>>> >> >
>>> >> > Hey everyone,
>>> >> >
>>> >> > It's been a few months but I actually implemented the Java preview
>>> >> > processor as an OSGi bundle. I filed a ticket for it [1]
>>> >> >
>>> >> > I'm not sure where to go from here. Is this something that could be
>>> >> > included POST 1.4.0?
>>> >> > Should I open a PR so we can review the code? If so, PR against which
>>> >> > branch?
>>> >> >
>>> >> > Either way, have a look, give it a go. We'll probably wind up using it
>>> >> > at rSmart.
>>> >> >
>>> >> > Erik
>>> >> >
>>> >> > [1] https://jira.sakaiproject.org/browse/KERN-3021
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall <c...@hallwaytech.com>
>>> >> > wrote:
>>> >> >> I totally agree that we should ally ourselves with other communities.
>>> >> >> I see where we get docsplit from DocumentCloud[1] and we use several
>>> >> >> other libraries for processing that they've most likely contributed
>>> >> >> to. The Java approach is very little custom code compared to the
>>> >> >> libraries we're getting from Apache (tika, sanselan, commons, pdfbox),
>>> >> >> so we would still be building on the shoulders of our friendly
>>> >> >> community giants.
>>> >> >>
>>> >> >> 1 https://github.com/documentcloud/docsplit
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Sat, Apr 14, 2012 at 5:43 AM, John Norman <j...@caret.cam.ac.uk>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> My recollection (perhaps wrong) is that we got this from Document
>>> >> >>> Cloud and I /think/ Chris Roby found it. Document Cloud seems a very
>>> >> >>> relevant and valuable project. If we were able to help them while
>>> >> >>> helping ourselves, other good things could come from the
>>> >> >>> relationship. My general point is that we are thin on resources and
>>> >> >>> so, in principle, symbiotic relationships are helpful.
>>> >> >>>
>>> >> >>> http://www.documentcloud.org/home
>>> >> >>>
>>> >> >>> John
>>> >> >>>
>>> >> >>> Sent from my iPad
>>> >> >>>
>>> >> >>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com> wrote:
>>> >> >>>
>>> >> >>> I agree with Daniel that our modifications to the preview processor
>>> >> >>> have put its ownership square on us. Was there a community that this
>>> >> >>> script was borrowed from? I thought it was original development that
>>> >> >>> uses various external libraries to do the actual work. This is the
>>> >> >>> approach that Erik is taking with the rewrite using things like Tika
>>> >> >>> (text extraction), Sanselan (images) and a Java port of the python
>>> >> >>> topia.termextract library.
>>> >> >>>
>>> >> >>> I certainly don't deny the speed of development that was realized in
>>> >> >>> creating the PP but the current state of the code is a mess at best.
>>> >> >>> Reuse of libraries in Java is showing a fast rewrite with very little
>>> >> >>> managed code on our part.
>>> >> >>>
>>> >> >>>
>>> >> >>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry
>>> >> >>> <dan...@caret.cam.ac.uk>
>>> >> >>> wrote:
>>> >> >>>>
>>> >> >>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason wrote:
>>> >> >>>> > I think this response is at best orthogonal to the point John's
>>> >> >>>> > trying to raise, though I gather this kind of reaction must come
>>> >> >>>> > from a buildup of some real frustration around the PP, which I
>>> >> >>>> > don't mean to discount. I also think John was pretty clear about
>>> >> >>>> > what he was suggesting: that there be a conversation with the
>>> >> >>>> > community we got the PP from, if the conversation hasn't happened
>>> >> >>>> > already, to see if there might still be a way to work together
>>> >> >>>> > before we decide to just own it ourselves.
>>> >> >>>>
>>> >> >>>> I'd suggest the way that the preview processor was being extended
>>> >> >>>> (initially a python server add on, followed by a ruby rewrite for tag
>>> >> >>>> extraction) and the variety of ruby versions that deployers were
>>> >> >>>> using and the methods used to deploy it were indicative of a) the OAE
>>> >> >>>> community already 'owning' the PP and b) as has already been pointed
>>> >> >>>> out, some standardization needed restoring and additional
>>> >> >>>> functionality added for deployers. Hence, the list was pinged[0] a
>>> >> >>>> while back to ask about standardizing and extending in java. I'm not
>>> >> >>>> sure of any other way to contact the original PP community or if such
>>> >> >>>> a community separate to OAE even still exists?
>>> >> >>>>
>>> >> >>>> Best wishes,
>>> >> >>>>
>>> >> >>>> Daniel
>>> >> >>>>
>>> >> >>>> [0] http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html
>>> >> >>>>
>>> >> >>>> --
>>> >> >>>> --| Daniel Parry: dan...@caret.cam.ac.uk. www.caret.cam.ac.uk/
>>> >> >>>> |--
>>> >> >>>> "Of all the things a leader should fear, complacency should
>>> >> >>>>  head the list." [John C. Maxwell]
>>> >> >>>> _______________________________________________
>>> >> >>>> oae-dev mailing list
>>> >> >>>> oae-dev@collab.sakaiproject.org
>>> >> >>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
>>> >> >>>
>>> >
>>> >
>>