Hi Erik,

How about trying to turn this work into an official sprint deliverable
(co-ordinated with Kent)? Do you think you can free up some time for
the next sprint (August 16 - August 30), and will that give us enough
time to do the remaining implementation, some testing and bug fixing?

Thanks,
Nicolaas



On 25 Jul 2012, at 00:18, Erik Froese wrote:

> I'm happy to let the Java PP testing slide to 1.5.0
>
> There are some recent improvements in the ruby PP that I need to  
> implement.
> * sakaidocs - (easy, call out to wkhtmltopdf)
> * image previews in the same format as the original
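A minimal sketch of what "call out to wkhtmltopdf" could look like from Java. It assumes the wkhtmltopdf binary is on the PATH and that the Sakai doc is already available as an HTML file or URL; the class and method names are made up for illustration and are not taken from either PP.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch only: hypothetical helper, not code from the Ruby or Java PP.
final class SakaiDocPdf {

    /** Builds the argument list: wkhtmltopdf <input URL or file> <output file>. */
    static List<String> command(String source, String outputPdf) {
        return Arrays.asList("wkhtmltopdf", source, outputPdf);
    }

    /** Runs wkhtmltopdf and returns its exit code; needs the binary on the PATH. */
    static int render(String source, String outputPdf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(source, outputPdf))
                .inheritIO() // show wkhtmltopdf's own output for debugging
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command, so the demo runs without wkhtmltopdf installed.
        System.out.println(String.join(" ", command("page.html", "page.pdf")));
    }
}
```

A non-zero exit code from render() would be the signal to mark the preview as failed rather than retrying forever.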
>
> Erik
>
> On Tue, Jul 24, 2012 at 10:18 AM, Kent Fitzgerald  
> <kentf...@umich.edu> wrote:
>> Several questions/comments.
>> There has already been a 1.4.1 release proposed for immediately
>> following 1.4.0 that would be isolated to code reformatting. Which
>> would take precedence?
>>
>> We should definitely do a bug bash. One of the dangers of doing a  
>> bug bash
>> focused on the preview processor is that we'll likely have people  
>> uploading
>> hundreds of files each. Subjectively, this could give the  
>> impression of
>> decreased performance just because we're hitting it much harder.
>>
>> More importantly, in addition to the bug bash, we need to do
>> controlled tests on processing time on different data types. I'd
>> like to break it down by file type and have truly controlled tests.
>> In addition to different file types, we'll need files of varying
>> sizes to compare performance not just on quantity but on complexity.
>> This needs to be compared to the performance of the current
>> implementation.
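The controlled comparison Kent describes could be driven by a small timing harness along these lines. This is a sketch under assumptions: the processor is passed in as a plain callback, the file names are placeholders, and none of the names come from the actual PP code. The same harness would be pointed at the Ruby PP and the Java PP in turn, with the same sample files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Sketch only: hypothetical harness for per-file-type timing comparisons.
final class PreviewBenchmark {

    /** One measurement: file type, file size and elapsed processing time. */
    static final class Sample {
        final String type;
        final long sizeBytes; // kept so results can later be grouped by size too
        final long nanos;

        Sample(String type, long sizeBytes, long nanos) {
            this.type = type;
            this.sizeBytes = sizeBytes;
            this.nanos = nanos;
        }
    }

    /** Lower-cased extension of a file name, or "" if it has none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Times a single call to the processor under test. */
    static Sample time(String fileName, long sizeBytes, Consumer<String> processor) {
        long start = System.nanoTime();
        processor.accept(fileName);
        return new Sample(extensionOf(fileName), sizeBytes, System.nanoTime() - start);
    }

    /** Mean elapsed milliseconds per file type. */
    static Map<String, Double> meanMillisByType(List<Sample> samples) {
        Map<String, long[]> totals = new TreeMap<>(); // value: {total nanos, count}
        for (Sample s : samples) {
            long[] t = totals.computeIfAbsent(s.type, k -> new long[2]);
            t[0] += s.nanos;
            t[1]++;
        }
        Map<String, Double> means = new TreeMap<>();
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            long[] t = e.getValue();
            means.put(e.getKey(), t[0] / 1_000_000.0 / t[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        // Placeholder processor; a real run would invoke the PP under test here.
        Consumer<String> fakeProcessor = name -> { };
        List<Sample> samples = new ArrayList<>();
        samples.add(time("small.pdf", 10_000, fakeProcessor));
        samples.add(time("large.pdf", 5_000_000, fakeProcessor));
        samples.add(time("photo.png", 200_000, fakeProcessor));
        System.out.println(meanMillisByType(samples));
    }
}
```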
>>
>> I think we all agree that this is an important feature that we  
>> shouldn't try
>> to rush out the door.
>>
>> I have to read back through the thread, but is there set-up
>> documentation? Currently we have a section on the OAE Configuration
>> and Deployment page [1] for the preview processor. It contains
>> multiple supporting external links that have proven confusing for
>> many people trying to get the preview processor running locally.
>> We'll need to make sure we have adequate documentation.
>>
>> As a side note, I will be out of the office starting this Friday  
>> through
>> next week.
>>
>>
>> [1]
>> https://confluence.sakaiproject.org/display/3AK/OAE+Configuration+and+Deployment
>>
>>
>>
>> --
>> Kent Fitzgerald
>>
>> On Tuesday, July 24, 2012 at 9:51 AM, Nicolaas Matthijs wrote:
>>
>> Looks like this has been hanging around on list for a while now,  
>> and we
>> should probably try to move it forwards.
>>
>> The maintainability criterion can only be determined by a code  
>> review, which
>> is standard practice. However, as this is proving to be such a  
>> critical
>> feature in production, I'd suggest that we do a separate bugbash to  
>> evaluate
>> its performance, ease of setup (running from a separate machine)  
>> and most
>> importantly functional equivalence.
>>
>> When doing this, Kent can give his assessment of the ease of setup  
>> and the
>> bugbashers can determine functional equivalence. We should also try  
>> to have
>> it re-process the dummy content we usually bugbash with.
>>
>> If this all sounds good, I'd like to go ahead with this as soon as  
>> possible
>> and run a bugbash straight after the 1.4.0 release with all of this  
>> set up.
>> If the implementation survives the bugbash, it can be reviewed and  
>> merged.
>>
>> Does that sound reasonable?
>>
>> Thanks,
>> Nicolaas
>>
>>
>>
>> On 23 Jul 2012, at 07:42, Carl Hall wrote:
>>
>> Lance, I think the work is already split the way you suggest given  
>> what I
>> know about what Erik has done (rewrite in Java) and what's left  
>> (add JMS).
>> Adding message queue capabilities should not hold back reviewing the
>> proposed changes.
>>
>> I would say that it needs to meet these opening criteria for my  
>> general
>> acceptance:
>>
>> * Be functionally equal with the current solution
>> * A combination of performance and maintainability
>>   * Performance can be no worse overall. There might be different
>> hotspots in the Java version than in the current Ruby solution, but
>> there shouldn't be anything exponentially worse. Overall, the Java
>> version has to perform at least as well and hopefully better. Memory
>> usage, overall processing time, and resource usage (iops, disc reads,
>> caching) should all be considered.
>>   * Be more maintainable than the Ruby solution. The current code  
>> has had
>> very little cleaning and is not very readable. This includes using
>> externally available libraries where possible. We shouldn't be  
>> maintaining
>> plumbing not inherent to our domain.
>> * Easier to set up. Though our current setup for the Ruby PP is
>> known to be problematic, we are at least accustomed to it. The
>> proposed solution has got to be more straightforward and less fragile.
>>
>> The numbers I've seen from some preliminary testing showed the Java
>> impl to take exponentially *less* time to process PDFs, and it was
>> faster than the Ruby PP in every test. It's an OSGi bundle and written
>> in Java like the rest of our project, which makes it easier to set up
>> and maintain as we write far more Java code than Ruby. I believe
>> there's also already a setup available to run the Java PP as a
>> standalone server.
>> The Java version introduces a topia term extractor bundle, which is
>> a port of the Python version. This is a point of maintenance to
>> consider, but the Python code has not changed in years. It's a common
>> impl for other languages to port, but there wasn't a Java version
>> around. I would like to see this code find a permanent home in a
>> relevant OSS project. At the very least it should be maintained
>> apart from OAE core to make it available to a broader audience.
>>
>> +1 to getting this code wrapped up and reviewed.
>>
>> On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings
>> <vueringschrist...@gmail.com> wrote:
>>
>> I'm not sure whether this is already part of the criteria list or  
>> not, but
>> what about CPU/Memory usage?
>> Is there a way we can measure that and compare it to the current  
>> ruby based
>> PP?
>> When I currently run the ruby PP locally, it's usually one of the  
>> processes
>> that uses the most resources.
>>
>> One other thing I'm curious about is how well it will
>> compress/handle the different file formats (png/jpg/gif/psd).
>>
>> These are just 2 things that I'm interested in since they (can)  
>> have an
>> impact on the overall performance.
>>
>>
>> - Christian
>>
>> On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote:
>>
>> Does anyone have an opinion about adopting the new java based PP?
>> Specifically can you articulate acceptance criteria for such an  
>> adoption?
>> e.g.
>>
>> Must support same preview behaviors as existing ruby-based PP.
>> Must pass QA with all blocker and critical items resolved.
>> Must start automatically OOTB to support the tire-kicking,
>> web-start uses.
>> Must leverage as much 3rd party code as possible to minimize
>> ownership costs.
>> Must pass code review.
>> Unit test code coverage.
>> Basic config and deployment documentation.
>>
>>
>> What is missing?  Anything?  Thanks, L
>>
>>
>>
>> On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote:
>>
>> Is there any way to break this work down into chunks?  e.g.
>>
>> 1. Adopt java PP as default PP moving forward. What are the  
>> acceptance
>> criteria?
>> 2. Enhance new java PP with message queue abilities.
>>
>> WDYT?  Thanks, L
>>
>> On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>
>> Each app server could run its own queues, but that wouldn't support
>> building a farm of PP processors unless we also teach them to talk to
>> multiple JMS servers. Maybe something like DNS round-robin would
>> suffice?
>>
>> On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese  
>> <erik.fro...@gmail.com> wrote:
>>
>> Do we need to cluster activemq? Can't each app server service its own
>> queues?
>> Erik
>>
>> On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall <c...@hallwaytech.com>  
>> wrote:
>>> What Erik describes has been on the dev wish list for a little
>>> while now. Moving to an event-driven model would allow us to build
>>> out concurrency, but there also comes the question of clustering
>>> ActiveMQ.
>>>
>>>
>>> On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese <erik.fro...@gmail.com>
>>> wrote:
>>>>
>>>> Hey David,
>>>>
>>>> The code is not clustered.
>>>>
>>>> You'd need to write an event listener that would fire when new  
>>>> content
>>>> is uploaded. It would put the content ids on a JMS queue. Then
>>>> implement a ContentFetcher that grabs a message off of the queue  
>>>> and
>>>> wire that into the PPI. Events and Messages are not clustered in  
>>>> OAE
>>>> (AFAIK) so this would have to be run on each app server.
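The flow Erik outlines above (upload event, content id onto a queue, fetcher pulls from the queue and feeds the PPI) can be sketched as follows. This is an illustration under stated assumptions, not the Java PP's code: an in-memory BlockingQueue stands in for the JMS queue so the example runs on its own, and the ContentFetcher method shown here is a guess at the shape of that interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch only: an in-memory queue stands in for JMS, and every type below
// is hypothetical rather than the Java PP's real API.
final class PreviewQueueSketch {

    /** Hypothetical contract; the real ContentFetcher may look different. */
    interface ContentFetcher {
        /** Returns up to max content ids that are waiting for a preview. */
        List<String> fetchPending(int max);
    }

    /** Stand-in for the upload event listener: enqueues each new content id. */
    static final class UploadListener {
        private final BlockingQueue<String> queue;

        UploadListener(BlockingQueue<String> queue) {
            this.queue = queue;
        }

        void onContentUploaded(String contentId) {
            queue.add(contentId);
        }
    }

    /** Fetcher that takes ids off the queue instead of searching for them. */
    static final class QueueContentFetcher implements ContentFetcher {
        private final BlockingQueue<String> queue;
        private final long waitMillis;

        QueueContentFetcher(BlockingQueue<String> queue, long waitMillis) {
            this.queue = queue;
            this.waitMillis = waitMillis;
        }

        @Override
        public List<String> fetchPending(int max) {
            List<String> ids = new ArrayList<>();
            if (max <= 0) {
                return ids;
            }
            try {
                // Wait briefly for the first id, then drain whatever else is ready.
                String first = queue.poll(waitMillis, TimeUnit.MILLISECONDS);
                if (first != null) {
                    ids.add(first);
                    queue.drainTo(ids, max - 1);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt for the caller
            }
            return ids;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        UploadListener listener = new UploadListener(queue);
        ContentFetcher fetcher = new QueueContentFetcher(queue, 100);

        listener.onContentUploaded("content-1");
        listener.onContentUploaded("content-2");
        for (String id : fetcher.fetchPending(10)) {
            System.out.println("would generate a preview for " + id);
        }
    }
}
```

Because the queue here lives inside one JVM, the sketch has the same limitation Erik notes: it would have to run on each app server.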
>>>>
>>>> While we're in event-land it'd be nice to be able to regenerate a
>>>> preview when a content body is updated. I'm not sure if this is
>>>> possible yet.
>>>>
>>>> I'm not sure how we'd limit the CPU usage yet either. You could  
>>>> manage
>>>> the quartz schedule that runs the PPI.
>>>>
>>>> We can also disable concurrent executions of the job.
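Erik's two suggestions above (space out the schedule that runs the PPI, and never let two runs overlap) can be illustrated with plain JDK scheduling. This is only a sketch: it uses ScheduledExecutorService as a stand-in for Quartz, it throttles by pausing between passes rather than by capping CPU directly, and the class and method names are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: JDK scheduler used in place of Quartz; names are hypothetical.
final class ThrottledPreviewJob {
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    /** One processing pass; the sleep simulates preview generation work. */
    void runOnce() {
        int now = running.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.decrementAndGet();
            completed.incrementAndGet();
        }
    }

    int maxConcurrentRuns() {
        return maxObserved.get();
    }

    int completedRuns() {
        return completed.get();
    }

    /** Schedules passes for roughly durationMillis with a pause between them. */
    static ThrottledPreviewJob runFor(long durationMillis, long delayMillis) {
        ThrottledPreviewJob job = new ThrottledPreviewJob();
        // A single worker thread plus fixed *delay* scheduling means the next
        // pass starts only after the previous one has finished.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(job::runOnce, 0, delayMillis, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(durationMillis);
            scheduler.shutdown();
            scheduler.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            scheduler.shutdownNow();
        }
        return job;
    }

    public static void main(String[] args) {
        ThrottledPreviewJob job = runFor(300, 50);
        System.out.println("passes: " + job.completedRuns()
                + ", max concurrent: " + job.maxConcurrentRuns());
    }
}
```

Raising delayMillis trades preview latency for lower load on the app server, which is the knob David was asking about.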
>>>>
>>>> Erik
>>>>
>>>> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au>  
>>>> wrote:
>>>>> Awesome news Erik!
>>>>>
>>>>> Our Ops guys will be stoked when we can get this in.. A couple of
>>>>> questions from someone who hasn't looked at the code or read too
>>>>> deeply....
>>>>> - Does it support clustering
>>>>>        -e.g. can we just run it side by side on each of our app
>>>>> servers
>>>>> and they will play nice sharing out processing jobs?
>>>>>        -will it affect performance of the app servers much? Can we
>>>>> limit the preview processor to say 10%cpu and 500mb ram or low  
>>>>> priority
>>>>> threads or limit the number of items to process or something? This
>>>>> would
>>>>> make for a nice simple deployment that wouldn't threaten the app  
>>>>> server
>>>>> stability.
>>>>>
>>>>> Cheers,
>>>>> Dave.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: oae-dev-boun...@collab.sakaiproject.org
>>>>> [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf Of Erik
>>>>> Froese
>>>>> Sent: Thursday, 12 July 2012 2:37 AM
>>>>> To: Carl Hall
>>>>> Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason
>>>>> Subject: Re: [oae-dev] Moving the preview processor to java
>>>>>
>>>>> Hey everyone,
>>>>>
>>>>> It's been a few months, but I actually implemented the Java preview
>>>>> processor as an OSGi bundle. I filed a ticket for it [1].
>>>>>
>>>>> I'm not sure where to go from here. Is this something that could  
>>>>> be
>>>>> included POST 1.4.0?
>>>>> Should I open a PR so we can review the code? If so, PR against  
>>>>> which
>>>>> branch?
>>>>>
>>>>> Either way, have a look, give it a go. We'll probably wind up  
>>>>> using it
>>>>> at rSmart.
>>>>>
>>>>> Erik
>>>>>
>>>>> [1] https://jira.sakaiproject.org/browse/KERN-3021
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall <c...@hallwaytech.com>
>>>>> wrote:
>>>>>> I totally agree that we should ally ourselves with other
>>>>>> communities. I see where we get docsplit from DocumentCloud [1],
>>>>>> and we use several other libraries for processing that they've
>>>>>> most likely contributed to. The Java approach is very little
>>>>>> custom code compared to the libraries we're getting from Apache
>>>>>> (tika, sanselan, commons, pdfbox), so we would still be building
>>>>>> on the shoulders of our friendly community giants.
>>>>>>
>>>>>> [1] https://github.com/documentcloud/docsplit
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Apr 14, 2012 at 5:43 AM, John Norman <j...@caret.cam.ac.uk>
>>>>>> wrote:
>>>>>>>
>>>>>>> My recollection (perhaps wrong) is that we got this from Document
>>>>>>> Cloud, and I /think/ Chris Roby found it. Document Cloud seems a
>>>>>>> very relevant and valuable project. If we were able to help them
>>>>>>> while helping ourselves, other good things could come from the
>>>>>>> relationship. My general point is that we are thin on resources
>>>>>>> and so, in principle, symbiotic relationships are helpful.
>>>>>>>
>>>>>>> http://www.documentcloud.org/home
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com>  
>>>>>>> wrote:
>>>>>>>
>>>>>>> I agree with Daniel that our modifications to the preview
>>>>>>> processor have put its ownership squarely on us. Was there a
>>>>>>> community that this script was borrowed from? I thought it was
>>>>>>> original development that uses various external libraries to do
>>>>>>> the actual work. This is the approach that Erik is taking with
>>>>>>> the rewrite, using things like Tika (text extraction), Sanselan
>>>>>>> (images) and a Java port of the python topia.termextract library.
>>>>>>>
>>>>>>> I certainly don't deny the speed of development that was realized
>>>>>>> in creating the PP, but the current state of the code is a mess
>>>>>>> at best. Reuse of libraries in Java is showing a fast rewrite
>>>>>>> with very little managed code on our part.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry
>>>>>>> <dan...@caret.cam.ac.uk>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason wrote:
>>>>>>>>> I think this response is at best orthogonal to the point John's
>>>>>>>>> trying to raise, though I gather this kind of reaction must
>>>>>>>>> come from a buildup of some real frustration around the PP,
>>>>>>>>> which I don't mean to discount. I also think John was pretty
>>>>>>>>> clear about what he was suggesting: that there be a
>>>>>>>>> conversation with the community we got the PP from, if the
>>>>>>>>> conversation hasn't happened already, to see if there might
>>>>>>>>> still be a way to work together before we decide to just own it
>>>>>>>>> ourselves.
>>>>>>>>
>>>>>>>> I'd suggest that the way the preview processor was being
>>>>>>>> extended (initially a python server add-on, followed by a ruby
>>>>>>>> rewrite for tag extraction), the variety of ruby versions that
>>>>>>>> deployers were using, and the methods used to deploy it were
>>>>>>>> indicative of a) the OAE community already 'owning' the PP and
>>>>>>>> b) as has already been pointed out, some standardization needing
>>>>>>>> to be restored and additional functionality added for deployers.
>>>>>>>> Hence, the list was pinged [0] a while back to ask about
>>>>>>>> standardizing and extending in java. I'm not sure of any other
>>>>>>>> way to contact the original PP community, or whether such a
>>>>>>>> community separate from OAE even still exists.
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>> [0] http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html
>>>>>>>>
>>>>>>>> --
>>>>>>>> --| Daniel Parry: dan...@caret.cam.ac.uk.  
>>>>>>>> www.caret.cam.ac.uk/ |--
>>>>>>>> "Of all the things a leader should fear, complacency should
>>>>>>>> head the list." [John C. Maxwell]
>>>>>>>> _______________________________________________
>>>>>>>> oae-dev mailing list
>>>>>>>> oae-dev@collab.sakaiproject.org
>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
>>>>>>>
>>>>>>>
>>>
>>>
>>
>>

