Using Hypersonic, Hadoop or Mongo for such a use case is "over-engineering"
the requirement and will end up in a much more complicated solution, IMO.
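
As a rough illustration of the simpler direction, here is a minimal sketch in
plain Java, outside of Camel. It assumes the job is only to prefix every line
with one character while keeping the original order. The class name
OrderedLineMarker, the method markLines, the marker character and the 128 KB
buffer size are all made up for the example and are not taken from the route
quoted below.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class OrderedLineMarker {

    // Hypothetical helper: copies lines in their original order and
    // prefixes each one with `marker`. Returns the number of lines written.
    // The caller owns (and closes) the underlying reader and writer.
    static int markLines(Reader source, Writer target, char marker) {
        int lines = 0;
        try {
            BufferedReader reader = new BufferedReader(source);
            // Assumed buffer size; writes reach the target in large chunks
            // instead of once per line.
            BufferedWriter writer = new BufferedWriter(target, 128 * 1024);
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(marker);
                writer.write(line);
                writer.write('\n');
                lines++;
            }
            // Push whatever is still buffered to the target.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        int count = markLines(new StringReader("first\nsecond\n"), out, '1');
        System.out.println(count + " lines written");
        System.out.print(out);
    }
}
```

In a real run the Reader and Writer would wrap file streams (Windows-1252 in,
UTF-8 out), so the file is never held in memory as a whole and the line order
is preserved by construction.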

Best,
Christian

On Fri, Nov 9, 2012 at 6:57 PM, <[email protected]> wrote:

> You may also want to check out Hadoop and MapReduce
>
> http://camel.apache.org/hdfs.html
>
> with respect to points a and b.
>
> You can have an index on the record, and the “reduce” job can serialize on
> the index.
>
> *From:* Gonzalo Vasquez [mailto:[email protected]]
> *Sent:* Friday, November 09, 2012 10:16 PM
> *To:* [email protected]
> *Subject:* Re: Camel performance tuning
>
> Thanks for your answer. My comments:
>
> a) A 5 MB file could be loaded into memory, but I have streaming enabled
> because file sizes can be in the range of GB. Nevertheless, I'll check what
> Hypersonic and Mongo are, as I'm not familiar with them.
>
> b) Parallel processing is set to false because records must preserve their
> order in the output file.
>
> c) I don't see the point here.
>
> d) See a).
>
> e) What about async processing? There's no "long running process" here.
>
> Thanks again.
>
> *Gonzalo Vásquez Sáez*
> *Research and Development (R&D) Manager*
> *Altiuz* Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes
> (56-2) 335 2461
> *[email protected] <[email protected]>*
> *http://www.altiuz.cl*
>
> On 09-11-2012, at 13:12, <[email protected]> wrote:
>
> I am really new to Camel, but here are some options you can try:
>
> a) Can you load the 5 MB file into memory before splitting it? That way
> IO will not be a problem. You could probably put it in something like
> Hypersonic or Mongo.
>
> b) Why is parallel processing false? Are the records related to each
> other? If it were set to true, you could take advantage of multiple cores.
>
> c) Is it possible to first split the files into chunks and then process
> the chunks independently?
>
> d) Can you write into memory and flush at once?
>
> e) Sync/Async: http://camel.apache.org/async.html
>
> *From:* Gonzalo Vasquez [mailto:[email protected]]
> *Sent:* Friday, November 09, 2012 8:32 PM
> *To:* [email protected]
> *Subject:* Camel performance tuning
>
> I'm running a route that basically adds a character per line to a plain
> text file, but it's taking too long, and it seems to be due to some kind
> of buffering issue when reading from or writing to disk.
>
> I'm processing a 5 MB file (attached as DC_FACCL132_0000
> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
> template (also attached).
>
> It's taking forever to process such a file. I understand that I'm tokenizing
> on line breaks, which could be the source of the problem as there are many
> lines in the file (48198 exactly), but when running jvisualvm (see attached
> images/snapshot) I can see that the write operation is invoked 20386 times,
> which seems unrelated to the line count. Is there an output buffer size
> that I can configure? Or something like that?
>
> This is the route:
>
> <camel:route id="pager" autoStartup="true">
>   <camel:from
>     uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
>   <camel:split streaming="true" parallelProcessing="false">
>     <camel:tokenize token="\n" />
>     <camel:to uri="bean:pager" />
>     <camel:to
>       uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
>   </camel:split>
> </camel:route>
>
> This is the referenced bean:
>
> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
>   <property name="xsltPath"
>     value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
>   <property name="param" value="C.*PAG.* 1" />
> </bean>
>
> The Camel version is 2.10.1, and this happens on both OS X and MS Windows,
> so I think it isn't a platform-dependent problem but a configuration one.
>
> Any ideas? Anything else that I should send?
>
> Thanks!
>
>        This e-mail and any files transmitted with it are for the sole use
> of the intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful.
>


