Looks like this has been hanging around on the list for a while now, and we should probably try to move it forwards.

The maintainability criterion can only be determined by a code review, which is standard practice. However, as this is proving to be such a critical feature in production, I'd suggest that we do a separate bugbash to evaluate its performance, ease of setup (running from a separate machine) and most importantly functional equivalence.

When doing this, Kent can give his assessment of the ease of setup and the bugbashers can determine functional equivalence. We should also try to have it re-process the dummy content we usually bugbash with.

If this all sounds good, I'd like to go ahead with this as soon as possible and run a bugbash straight after the 1.4.0 release with all of this set up. If the implementation survives the bugbash, it can be reviewed and merged.

Does that sound reasonable?

Thanks,
Nicolaas



On 23 Jul 2012, at 07:42, Carl Hall wrote:

Lance, I think the work is already split the way you suggest, given what I know about what Erik has done (the rewrite in Java) and what's left (adding JMS). Adding message queue capabilities should not hold back reviewing the proposed changes.

I would say that it needs to meet these opening criteria for my general acceptance:

* Be functionally equal with the current solution
* A combination of performance and maintainability:
  * Performance can be no worse overall. There might be different hotspots in the Java version than in the current Ruby solution, but there shouldn't be anything exponentially worse. Overall, the Java version has to perform at least as well and hopefully better. Memory usage, overall processing time and resource usage (IOPS, disc reads, caching) should all be considered.
  * Be more maintainable than the Ruby solution. The current code has had very little cleaning and is not very readable. This includes using externally available libraries where possible. We shouldn't be maintaining plumbing not inherent to our domain.
  * Easier to set up. Though our current setup for the Ruby PP is known to be problematic, we are at least accustomed to it. The proposed solution has to be more straightforward and less fragile.

The numbers I've seen from some preliminary testing showed the Java impl taking exponentially *less* time to process PDFs, and it was faster than the Ruby PP in every test. It's an OSGi bundle written in Java like the rest of our project, which makes it easier to set up and maintain, as we write far more Java code than Ruby. I believe there's also already a setup available to run the Java PP as a standalone server. The Java version introduces a topia term extractor bundle, which is a port of the Python version. This is a point of maintenance to consider, but the Python code hasn't changed in years. It's a common impl for other languages to port, but there wasn't a Java version around. I would like to see this code find a permanent home in a related OSS project. At the very least it should be maintained apart from OAE core to make it available to a broader audience.

+1 to getting this code wrapped up and reviewed.
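Carl's criteria mention memory usage, overall processing time and resource usage. As a rough starting point for the in-JVM side of that comparison, here is a minimal sketch (not part of the Java PP) that times one processing run and records the calling thread's CPU time and the change in heap usage through the JDK's `java.lang.management` API. The class names `RunStats` and `PreviewBenchmark` are hypothetical. It only sees work done on the calling thread, the heap figure is noisy because of garbage collection, and disk I/O is not covered; the Ruby PP runs as a separate process, so that side would need OS-level measurement instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Rough resource figures for one run (hypothetical helper, not part of the Java PP). */
final class RunStats {
    final long wallNanos;
    final long cpuNanos;       // -1 when per-thread CPU time is unavailable
    final long heapDeltaBytes; // may be negative if a GC ran during the task

    RunStats(long wallNanos, long cpuNanos, long heapDeltaBytes) {
        this.wallNanos = wallNanos;
        this.cpuNanos = cpuNanos;
        this.heapDeltaBytes = heapDeltaBytes;
    }

    @Override
    public String toString() {
        return String.format("wall=%.1f ms, cpu=%s, heapDelta=%d KiB",
                wallNanos / 1e6,
                cpuNanos < 0 ? "n/a" : String.format("%.1f ms", cpuNanos / 1e6),
                heapDeltaBytes / 1024);
    }
}

final class PreviewBenchmark {
    private PreviewBenchmark() {}

    /** Runs the task once on the calling thread and records wall time, CPU time and heap change. */
    static RunStats measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();
        Runtime rt = Runtime.getRuntime();

        long heapBefore = rt.totalMemory() - rt.freeMemory();
        long cpuBefore = cpuAvailable ? threads.getCurrentThreadCpuTime() : 0L;
        long start = System.nanoTime();

        task.run();

        long wall = System.nanoTime() - start;
        long cpu = cpuAvailable ? threads.getCurrentThreadCpuTime() - cpuBefore : -1L;
        long heapAfter = rt.totalMemory() - rt.freeMemory();
        return new RunStats(wall, cpu, heapAfter - heapBefore);
    }
}
```

Usage, with a hypothetical `processOne(file)` standing in for a single preview run: `System.out.println(PreviewBenchmark.measure(() -> processOne(file)));`. Repeating the run several times and discarding the first few results would reduce JIT warm-up effects.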

On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings <vueringschrist...@gmail.com> wrote:

I'm not sure whether this is already part of the criteria list or not, but what about CPU/memory usage? Is there a way we can measure that and compare it to the current Ruby-based PP? When I run the Ruby PP locally, it's usually one of the processes that uses the most resources.

One other thing I'm curious about is how well it will compress/handle the different file formats (png/jpg/gif/psd).

These are just two things that I'm interested in, since they (can) have an impact on the overall performance.


- Christian
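On the file-format question, one quick baseline is to check which formats the JDK can decode on its own. The sketch below (hypothetical class `FormatSupport`) asks `javax.imageio.ImageIO` whether any registered reader claims each format name. It says nothing about what the Java PP itself does for images (the thread mentions Sanselan) and nothing about compression. On a stock JDK, PSD has no built-in reader, so PSD support would depend on a third-party decoder.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.imageio.ImageIO;

/** Reports which image formats the JDK's built-in ImageIO can decode without extra plugins. */
final class FormatSupport {
    private FormatSupport() {}

    static Map<String, Boolean> readable(List<String> formatNames) {
        Map<String, Boolean> result = new LinkedHashMap<>(); // keeps the caller's order
        for (String name : formatNames) {
            // A format counts as readable if at least one registered ImageReader claims the name.
            result.put(name, ImageIO.getImageReadersByFormatName(name).hasNext());
        }
        return result;
    }
}
```

For example, `FormatSupport.readable(List.of("png", "jpg", "gif", "psd"))` on a plain JDK reports the first three as readable and `psd` as not.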

On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote:

Does anyone have an opinion about adopting the new Java-based PP? Specifically, can you articulate acceptance criteria for such an adoption? e.g.

Must support same preview behaviors as existing ruby-based PP.
Must pass QA with all blocker and critical items resolved.
Must start automatically OOTB to support the tire-kicking, web-start uses.
Must leverage as much 3rd party code as possible to minimize ownership costs.
Must pass code review.
Unit test code coverage.
Basic config and deployment documentation.

What is missing? Anything?

Thanks, L



On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote:

Is there any way to break this work down into chunks?  e.g.

1. Adopt the Java PP as the default PP moving forward. What are the acceptance criteria?
2. Enhance the new Java PP with message queue abilities.

WDYT?

Thanks, L

On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote:

Each app server could run its own queues, but that wouldn't support building a farm of PP processors unless we also teach them to talk to multiple JMS servers. Maybe something like DNS round-robin would suffice?
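DNS round-robin would spread connections across brokers at name-resolution time. A client-side equivalent, sketched here as an assumption rather than anything in the Java PP, is to rotate through a configured list of broker endpoints. `RoundRobin` and the broker URLs in the example are hypothetical. ActiveMQ's own `failover:` transport URI, which accepts a list of brokers, may make hand-rolled rotation unnecessary and is worth checking first.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hands out endpoints in strict rotation; safe to share between threads. */
final class RoundRobin<T> {
    private final List<T> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<T> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("need at least one endpoint");
        }
        this.endpoints = List.copyOf(endpoints); // immutable snapshot
    }

    T pick() {
        // floorMod keeps the index valid even after the counter overflows to negative values.
        int i = Math.floorMod(next.getAndIncrement(), endpoints.size());
        return endpoints.get(i);
    }
}
```

For example, `new RoundRobin<>(List.of("tcp://mq1:61616", "tcp://mq2:61616"))` returns mq1, mq2, mq1, ... from successive `pick()` calls. This only spreads connections; it does not replicate queue contents between brokers.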

On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese <erik.fro...@gmail.com> wrote:
Do we need to cluster ActiveMQ? Can't each app server service its own queues?
Erik
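Erik's 12 July message, quoted further down, outlines the per-app-server design: an event listener that puts content ids on a JMS queue when content is uploaded, and a ContentFetcher that takes them off the queue and feeds the PPI. The sketch below shows that flow with an in-memory `java.util.concurrent.BlockingQueue` standing in for the JMS queue. `UploadListener`, `QueueContentFetcher` and the `fetch(int)` signature on `ContentFetcher` are hypothetical; the real interface in the Java PP bundle may look different. Because the queue lives in one JVM, this matches running it on each app server; sharing work across a farm is what a real broker would add.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical shape of the fetcher the preview processor pulls work from. */
interface ContentFetcher {
    /** Returns up to max waiting content ids, oldest first; never blocks. */
    List<String> fetch(int max);
}

/** Stand-in for the event listener: fires on upload and enqueues the content id. */
final class UploadListener {
    private final BlockingQueue<String> queue;

    UploadListener(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    void onContentUploaded(String contentId) {
        // If the queue is unbounded (e.g. a default LinkedBlockingQueue), offer always succeeds.
        queue.offer(contentId);
    }
}

/** ContentFetcher reading the same queue a JMS consumer would read in the real design. */
final class QueueContentFetcher implements ContentFetcher {
    private final BlockingQueue<String> queue;

    QueueContentFetcher(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public List<String> fetch(int max) {
        List<String> batch = new ArrayList<>();
        if (max > 0) {
            queue.drainTo(batch, max); // removes at most max ids in FIFO order
        }
        return batch;
    }
}
```

Wiring both ends to one `new LinkedBlockingQueue<String>()`, a call to `onContentUploaded("abc")` followed by `fetch(10)` yields `[abc]`, and the next `fetch` returns an empty list.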

On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall <c...@hallwaytech.com> wrote:
> What Erik describes has been on the dev wish list for a little while now.
> Moving to an event-driven model would allow us to build out concurrency, but
> there also comes the question of clustering ActiveMQ.
>
>
> On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese <erik.fro...@gmail.com> wrote:
>>
>> Hey David,
>>
>> The code is not clustered.
>>
>> You'd need to write an event listener that would fire when new content
>> is uploaded. It would put the content ids on a JMS queue. Then
>> implement a ContentFetcher that grabs a message off of the queue and
>> wire that into the PPI. Events and Messages are not clustered in OAE
>> (AFAIK) so this would have to be run on each app server.
>>
>> While we're in event-land it'd be nice to be able to regenerate a
>> preview when a content body is updated. I'm not sure if this is
>> possible yet.
>>
>> I'm not sure how we'd limit the CPU usage yet either. You could manage
>> the quartz schedule that runs the PPI.
>>
>> We can also disable concurrent executions of the job.
>>
>> Erik
>>
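Erik suggests managing the Quartz schedule that runs the PPI and disabling concurrent executions of the job, and Dave (quoted next) asks about low-priority threads and limiting the number of items per run. The sketch below shows the same ideas with the JDK's `ScheduledExecutorService` instead of Quartz (Quartz itself offers a `@DisallowConcurrentExecution` annotation for the no-overlap part): `scheduleWithFixedDelay` starts the next run only after the previous one finishes, a single minimum-priority thread does the work, and each run handles at most `maxPerRun` items. `ThrottledPreviewScheduler` and its parameters are hypothetical. Thread priority is only a hint that some platforms ignore, so this is not a hard 10% CPU cap, and a hard memory cap would need a separate JVM with its own `-Xmx`.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.IntFunction;

/** Hypothetical throttled runner for preview jobs; a sketch, not the PPI's scheduler. */
final class ThrottledPreviewScheduler {
    private final ScheduledExecutorService executor;

    ThrottledPreviewScheduler() {
        ThreadFactory lowPriority = runnable -> {
            Thread t = new Thread(runnable, "preview-processor");
            t.setPriority(Thread.MIN_PRIORITY); // scheduling hint only, not a CPU quota
            t.setDaemon(true);                  // don't keep the JVM alive on shutdown
            return t;
        };
        // One worker thread: at most one batch is ever in progress.
        this.executor = Executors.newSingleThreadScheduledExecutor(lowPriority);
    }

    /**
     * Each run asks fetch for up to maxPerRun content ids and processes them in order.
     * The next run starts delayMillis after the previous one finishes, so runs never overlap.
     */
    void start(IntFunction<List<String>> fetch, Consumer<String> process,
               int maxPerRun, long delayMillis) {
        executor.scheduleWithFixedDelay(() -> {
            List<String> batch;
            try {
                batch = fetch.apply(maxPerRun);
            } catch (RuntimeException e) {
                // An exception escaping the task would silently cancel all later runs.
                System.err.println("fetch failed: " + e);
                return;
            }
            if (batch == null) {
                return;
            }
            for (String id : batch) {
                try {
                    process.accept(id);
                } catch (RuntimeException e) {
                    // Skip the failing item but keep the batch and the schedule alive.
                    System.err.println("preview failed for " + id + ": " + e);
                }
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        executor.shutdownNow();
    }
}
```

With `maxPerRun = 2` and three ids waiting, the first run processes two of them and the next run, one delay later, picks up the third.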
>> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au> wrote:
>> > Awesome news Erik!
>> >
>> > Our Ops guys will be stoked when we can get this in. A couple of
>> > questions from someone who hasn't looked at the code or read too deeply:
>> > - Does it support clustering?
>> >   - e.g. can we just run it side by side on each of our app servers
>> >     and they will play nice sharing out processing jobs?
>> >   - Will it affect performance of the app servers much? Can we
>> >     limit the preview processor to, say, 10% CPU and 500 MB RAM, or use
>> >     low-priority threads, or limit the number of items to process, or
>> >     something? This would make for a nice simple deployment that
>> >     wouldn't threaten app server stability.
>> >
>> > Cheers,
>> > Dave.
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: oae-dev-boun...@collab.sakaiproject.org
>> > [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf Of Erik Froese
>> > Sent: Thursday, 12 July 2012 2:37 AM
>> > To: Carl Hall
>> > Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason
>> > Subject: Re: [oae-dev] Moving the preview processor to java
>> >
>> > Hey everyone,
>> >
>> > It's been a few months, but I actually implemented the Java preview
>> > processor as an OSGi bundle. I filed a ticket for it [1]
>> >
>> > I'm not sure where to go from here. Is this something that could be
>> > included POST 1.4.0?
>> > Should I open a PR so we can review the code? If so, PR against which
>> > branch?
>> >
>> > Either way, have a look, give it a go. We'll probably wind up using it
>> > at rSmart.
>> >
>> > Erik
>> >
>> > [1] https://jira.sakaiproject.org/browse/KERN-3021
>> >
>> >
>> >
>> > On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall <c...@hallwaytech.com> wrote:
>> >> I totally agree that we should ally ourselves with other communities. I
>> >> see where we get docsplit from DocumentCloud[1], and we use several other
>> >> libraries for processing that they've most likely contributed to.
>> >> The Java approach is very little custom code compared to the libraries
>> >> we're getting from Apache (tika, sanselan, commons, pdfbox), so we would
>> >> still be building on the shoulders of our friendly community giants.
>> >>
>> >> [1] https://github.com/documentcloud/docsplit
>> >>
>> >>
>> >>
>> >> On Sat, Apr 14, 2012 at 5:43 AM, John Norman <j...@caret.cam.ac.uk> wrote:
>> >>>
>> >>> My recollection (perhaps wrong) is that we got this from Document Cloud
>> >>> and I /think/ Chris Roby found it. Document Cloud seems a very relevant
>> >>> and valuable project. If we were able to help them while helping
>> >>> ourselves, other good things could come from the relationship. My
>> >>> general point is that we are thin on resources and so, in principle,
>> >>> symbiotic relationships are helpful.
>> >>>
>> >>> http://www.documentcloud.org/home
>> >>>
>> >>> John
>> >>>
>> >>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com> wrote:
>> >>>
>> >>> I agree with Daniel that our modifications to the preview processor have
>> >>> put its ownership squarely on us. Was there a community that this script
>> >>> was borrowed from? I thought it was original development that uses
>> >>> various external libraries to do the actual work. This is the approach
>> >>> that Erik is taking with the rewrite, using things like Tika (text
>> >>> extraction), Sanselan (images) and a Java port of the Python
>> >>> topia.termextract library.
>> >>>
>> >>> I certainly don't deny the speed of development that was realized in
>> >>> creating the PP, but the current state of the code is a mess at best.
>> >>> Reuse of libraries in Java is showing a fast rewrite with very little
>> >>> managed code on our part.
>> >>>
>> >>>
>> >>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry <dan...@caret.cam.ac.uk> wrote:
>> >>>>
>> >>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason wrote:
>> >>>> > I think this response is at best orthogonal to the point John's trying
>> >>>> > to raise, though I gather this kind of reaction must come from a
>> >>>> > buildup of some real frustration around the PP, which I don't mean to
>> >>>> > discount. I also think John was pretty clear about what he was
>> >>>> > suggesting: that there be a conversation with the community we got
>> >>>> > the PP from, if the conversation hasn't happened already, to see if
>> >>>> > there might still be a way to work together before we decide to just
>> >>>> > own it ourselves.
>> >>>>
>> >>>> I'd suggest the way that the preview processor was being extended
>> >>>> (initially a Python server add-on, followed by a Ruby rewrite for tag
>> >>>> extraction) and the variety of Ruby versions that deployers were using
>> >>>> and the methods used to deploy it were indicative of a) the OAE
>> >>>> community already 'owning' the PP and b) as has already been pointed
>> >>>> out, some standardization needed restoring and additional functionality
>> >>>> added for deployers. Hence, the list was pinged[0] a while back to ask
>> >>>> about standardizing and extending in Java. I'm not sure of any other
>> >>>> way to contact the original PP community, or if such a community
>> >>>> separate to OAE even still exists?
>> >>>>
>> >>>> Best wishes,
>> >>>>
>> >>>> Daniel
>> >>>>
>> >>>> [0] http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html
>> >>>>
>> >>>> _______________________________________________
>> >>>> oae-dev mailing list
>> >>>> oae-dev@collab.sakaiproject.org
>> >>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
>> >>>
>> >>>
>
>
