Hi, I think there have been some threads in the past talking about this, i.e. how to speed up such a use case. Not sure how easy it is to search for those; you could for example use Nabble or MarkMail to search the archives.
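As a rough, untested sketch of Christian's second idea below (let the aggregator flush every ~1000 lines instead of holding the whole file), a route could look something along these lines. The query, the #hiveDataSource bean, the column names and the file names are just placeholders, and outputType=StreamList needs camel-sql 2.18 or later:

import java.util.Map;

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class ChunkedCsvExportRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Concatenates the CSV lines of one chunk into a single body.
        AggregationStrategy joinLines = (oldExchange, newExchange) -> {
            if (oldExchange == null) {
                return newExchange;
            }
            oldExchange.getIn().setBody(
                    oldExchange.getIn().getBody(String.class)
                            + newExchange.getIn().getBody(String.class));
            return oldExchange;
        };

        from("timer:export?repeatCount=1")
            // outputType=StreamList hands the splitter an iterator instead of
            // materializing the whole List<Map<String, Object>> in memory
            .to("sql:select * from big_table?dataSource=#hiveDataSource&outputType=StreamList")
            .split(body()).streaming()
                // turn one row (a Map) into one CSV line; column names are made up
                .process(exchange -> {
                    Map<?, ?> row = exchange.getIn().getBody(Map.class);
                    exchange.getIn().setBody(row.get("col_a") + ";" + row.get("col_b") + "\n");
                })
                .to("direct:collect")
            .end();

        // Write to disk every 1000 lines (the timeout flushes the last, smaller chunk),
        // so only one chunk is ever held on the heap.
        from("direct:collect")
            .aggregate(constant(true), joinLines)
                .completionSize(1000)
                .completionTimeout(5000)
            .to("file:target/export?fileName=result.csv&fileExists=Append");
    }
}

That keeps the memory footprint at roughly one chunk of 1000 lines, no matter how many rows the query returns.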
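And if your database happens to support something like PostgreSQL's COPY, the CopyManager approach Zoran describes below avoids going through JDBC result sets entirely. Roughly (again just a sketch; the connection details, query and file name are made up):

import java.io.FileWriter;
import java.io.Writer;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyOutExport {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
             Writer out = new FileWriter("export.csv")) {

            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();

            // COPY ... TO STDOUT streams the rows straight into the Writer,
            // so the result set never has to fit on the JVM heap.
            long rows = copy.copyOut(
                    "COPY (SELECT * FROM big_table) TO STDOUT WITH (FORMAT csv)", out);

            System.out.println("Exported " + rows + " rows");
        }
    }
}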
On Fri, Nov 11, 2016 at 8:20 AM, Zoran Regvart <zo...@regvart.com> wrote:
> Hi Christian,
> I was solving the exact same problem a few years back; here is what I
> did: I created a custom @Handler that performs the JDBC query, the
> purpose of which was to return an Iterator over the records. The
> implementation of the handler used springjdbc-iterable[1] to stream
> the rows as they were consumed by another @Handler that took the
> Iterator from the body and wrote it out line item by line item using BeanIO.
>
> On a more recent project I had PostgreSQL as the database and could
> use the CopyManager[2], which proved to be very performant; perhaps your
> database has the same functionality you can use.
>
> So basically I custom coded the solution.
>
> zoran
>
> [1] https://github.com/apache/cxf
> [2] https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html
>
> On Thu, Nov 10, 2016 at 10:01 PM, Christian Jacob <cjaco...@aol.com> wrote:
>> Hi there,
>> my task is to execute a JDBC query against a Hive database and produce
>> rows in csv files. The catch is that, depending on the query criteria,
>> the number of rows can range from a few dozen to several million. My
>> first solution was something like this:
>>
>> from("...")
>>     .to("sql:...") // produces a List<Map<String, Object>>
>>     .split(body())
>>     .process(myProcessor) // produces a single row for the csv file
>>     .to("file:destination?fileExists=Append");
>>
>> This was awfully slow because the file producer opens the file, appends
>> one single row, and closes it again. I found some posts on how to use an
>> Aggregator before sending the content to the file producer. This really
>> was the desired solution, and the performance was satisfying. In this
>> solution, the aggregator holds the total content of the csv file to be
>> produced. Unfortunately, the files can be so large that I get stuck in
>> "java gc overhead limit exceeded" exceptions. No matter how high I set
>> the heap space, I have no chance to avoid this. Now I'm looking for a way
>> out of this, and I don't know how. My ideas are:
>>
>> - Use a splitter that produces sublists - I don't know how I could do it
>> - Use an aggregator that does not produce the total content of the files
>>   to be created, but only for example 1000 lines, and then collects the
>>   next block - I don't know how to do that either
>>
>> Or maybe someone has a better idea...
>>
>> Kind regards,
>> Christian
>>
>>
>>
>> --
>> View this message in context:
>> http://camel.465427.n5.nabble.com/Processing-VERY-large-result-sets-tp5790018.html
>> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>
>
> --
> Zoran Regvart



--
Claus Ibsen
-----------------
http://davsclaus.com
@davsclaus
Camel in Action 2: https://www.manning.com/ibsen2