Re: Best Strategy to process a large number of rows in File

2016-04-27 Thread Brad Johnson
"When I log a message that contains 1000 lines I see line\rline2\rline3\r...line999 The split works when token is \\r. Why?" I'm not positive but it appears that something in the marshaling is escaping the backslash on the \r so when it writes it out you get \\r but I'm not sure. You could try

Re: Best Strategy to process a large number of rows in File

2016-04-26 Thread Michele
Hi Brad, as you suggested, I split the big files into chunks of 1000 lines, changing my route like this:
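For reference, a chunked split of this kind in the Camel Java DSL looks roughly like the sketch below; the endpoint names are placeholders, since the original XML route was stripped by the archive.

    import org.apache.camel.builder.RouteBuilder;

    public class ChunkedFileRoute extends RouteBuilder {
        @Override
        public void configure() {
            from("file:inbound?move=.done")                  // hypothetical directories
                .split().tokenize("\n", 1000).streaming()    // group 1000 lines per exchange
                    // the body of each exchange is now one String of up to 1000 lines
                    .to("seda:processChunk");
        }
    }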

Re: Best Strategy to process a large number of rows in File

2016-04-19 Thread Brad Johnson
You may want to tell it to chunk the file then. That's something you can add to the splitter/tokenizer.

Re: Best Strategy to process a large number of rows in File

2016-04-18 Thread Michele
Hi, I have different mediations. Each mediation handles an incoming file type (csv, txt delimited or fixed length) during the processing. In some csv and txt files the Tokenizer works fine with \n or \r\n or \r. A few minutes ago I found a solution for a txt delimited file by adding charset=iso-8859-1
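A minimal sketch of that charset fix on the file endpoint, assuming the Java DSL inside a RouteBuilder's configure(); the directory and queue names are invented for illustration.

    // Read the delimited file as ISO-8859-1 and split it line by line.
    from("file:inbound/txt?charset=iso-8859-1")
        .split().tokenize("\r\n").streaming()
        .to("seda:processLine");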

Re: Best Strategy to process a large number of rows in File

2016-04-18 Thread Brad Johnson
If that doesn't work for you, there is another way I use when I have to read complex files that aren't simply one line = one record, but it isn't really necessary if you are simply reading in a CSV or fixed-width file with only single lines. I don't know what your record format or BeanIO looks
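For illustration only, a streaming split feeding a BeanIO unmarshal might look like this; the mapping file name and stream name are placeholders, not the actual configuration from the thread.

    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.dataformat.beanio.BeanIODataFormat;

    public class BeanIoRoute extends RouteBuilder {
        @Override
        public void configure() {
            // "mappings.xml" and "records" stand in for the real BeanIO mapping
            // file and stream name, which are not shown in the thread.
            BeanIODataFormat beanio = new BeanIODataFormat("mappings.xml", "records");

            from("file:inbound?move=.done")
                .split().tokenize("\n").streaming()
                    .unmarshal(beanio)          // BeanIO turns the text into mapped record objects
                    .to("seda:processRecord");
        }
    }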

Re: Best Strategy to process a large number of rows in File

2016-04-18 Thread Brad Johnson
The tokenization may require a different line ending - \r or \n or \r\n, for example. The file reader has to understand what it is parsing. I take it that when you used that splitter with the tokenization, it was reading the whole file in one big slurp and never finding a line ending, and you ended up

Re: Best Strategy to process a large number of rows in File

2016-04-18 Thread Michele
Hi Brad, first of all thank you very much for the time you have dedicated to me. "Are you getting the entire file in memory?" I think so. I thought BeanIO worked in lazy mode... A question: I noticed that in some files the Splitter doesn't work with but it required a convertBodyTo

Re: Best Strategy to process a large number of rows in File

2016-04-16 Thread Brad Johnson
At this point in your code: Are you getting the entire file in memory? That's going to be big so maybe not what you want. It may be livable however. ActiveMQ can handle a lot of throughput but remember that if you are using persistent queues you necessarily lose several orders of magnitude
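If the durability guarantee can be relaxed, most of that persistence penalty can be avoided on the Camel JMS endpoint itself; a hedged sketch with an invented queue name:

    // Trade durability for speed: with deliveryPersistent=false the broker no longer
    // writes every message to its store, which is where most of the throughput is lost.
    from("seda:processChunk")
        .to("activemq:queue:rows?deliveryPersistent=false");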

Re: Best Strategy to process a large number of rows in File

2016-04-16 Thread Michele
Hi, as Ranx suggested, I tried chaining routes and attached a screenshot of memory usage (seda-memory-usage.png). It is evident that there is a clear improvement, with on average 600 MB of memory usage. The chain of routes is configured
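The configured chain is not preserved in the archive; a rough sketch of the pattern, with illustrative endpoint names, might look like this:

    import org.apache.camel.builder.RouteBuilder;

    public class ChainedRoutes extends RouteBuilder {
        @Override
        public void configure() {
            // Stage 1: stream the file, one line per exchange, into an in-memory queue.
            from("file:inbound?move=.done")
                .split().tokenize("\n").streaming()
                .to("seda:rows");

            // Stage 2: hand the rows to a broker queue for the downstream mediation.
            from("seda:rows")
                .to("activemq:queue:rows");

            // Stage 3: consume at its own pace and do the per-row work.
            from("activemq:queue:rows")
                .to("bean:rowProcessor");   // hypothetical processing bean
        }
    }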

Re: Best Strategy to process a large number of rows in File

2016-04-15 Thread Brad Johnson
I suspect the biggest problem you may be having here is that the file isn't truly streaming but getting slurped into memory as a whole. I don't know that for certain but that test of two queues should show you that. That might be throwaway code or it might be the basis for a next implementation.

Re: Best Strategy to process a large number of rows in File

2016-04-15 Thread Michele
"How big is that object on the queue?" As an optimization, I changed the generic LinkedHashMap into a SerialNumber bean that has these properties:

    public class SerialNumber implements Serializable {
        private static final long serialVersionUID = -9067015746823050409L;
        private Date arg1;

Re: Best Strategy to process a large number of rows in File

2016-04-15 Thread Brad Johnson
The easiest way to think about SEDA queues is to recognize why they were created, and I think that will help you decompose such problems. Staged Event Driven Architecture: initially they were invented for automatic management of threads on queues, where one stage that is getting too few threads could

Re: Best Strategy to process a large number of rows in File

2016-04-15 Thread Jens Breitenstein
Hi Michele, reading a CSV with 40k lines using Camel in streaming mode takes a few seconds. As you limit the queue size to avoid OOM, the overall performance depends on how fast you can empty the queue. How long does processing of ONE message take on average? To me it looks like approximately 1.6 secs

RE: Best Strategy to process a large number of rows in File

2016-04-15 Thread Michele
Hi, I spent a bit of time reading different topics on this issue, and I changed my route like this, reducing the memory usage by about 300 MB: ${date:now:MMdd-HHmmss} CBIKIT_INBOUND_${in.header.ImportDateTime}

RE: Best Strategy to process a large number of rows in File

2016-04-13 Thread Michele
Hi, with Memory Analyzer, but also with VisualVM, a problem suspect was found: the thread java.lang.Thread @ 0x783481d90 Camel (IF_CBIKIT-Inbound-Context) thread #28 - seda://processAndStoreInQueue keeps local variables with a total size of 54.615.096 (11,93%) bytes; 1.212 instances of "byte[]", loaded by

RE: Best Strategy to process a large number of rows in File

2016-04-13 Thread Siano, Stephan
Hi, the camel-csv component converts the file into a List. Your 20 MB file is already taking up 40 MB as a String (Java strings are UTF-16, so each character costs two bytes), and the List will take up a little more than another 40 MB (so you will need about 80 MB during the conversion). This list lives throughout your split process... What you could do to

RE: Best Strategy to process a large number of rows in File

2016-04-13 Thread Michele
Hi, in this case the file size is 20.424 KB and it contains about 4 lines. I used the convertTo to avoid a rollback on renaming the file (GenericFileOperationFailedException - only on Windows). The file is a csv, so I defined the CSV data format like this (it transforms each line into a key-value map)

RE: Best Strategy to process a large number of rows in File

2016-04-13 Thread Siano, Stephan
Hi, how big is your document? You are converting that thing into a String, which is generally inadvisable for large content (as it will consume twice the file size in memory). The second question is what your ReaderDataFormat does with the data. Does it copy it again or does it support

Re: Best Strategy to process a large number of rows in File

2016-04-13 Thread Michele
Hi, I'm here again because I haven't resolved my problem. After several checks, I noticed this: memory-usage.png. Why does the thread related to seda://processAndStoreInQueue consume so much memory? How can I optimize memory usage? This

Re: Best Strategy to process a large number of rows in File

2016-03-31 Thread Ranx
Cool. It's good to see you're able to chip away at the problem. This is another section you may want to look at as well; Christian has a good explanation of it: https://dzone.com/articles/activemq-understanding-memory This is why, when I'm starting on proofs of concept, spikes or protos, I tend

Re: Best Strategy to process a large number of rows in File

2016-03-31 Thread Michele
Hi, yesterday I worked on the JMS configuration and, following your suggestion, I changed it like this, avoiding the OOM
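The actual change was stripped from the archive; a sketch of the kind of pooled, bounded JMS setup being discussed might look like this (all names and values are illustrative, and camelContext is assumed to be in scope).

    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.activemq.camel.component.ActiveMQComponent;
    import org.apache.activemq.pool.PooledConnectionFactory;
    import org.apache.camel.component.jms.JmsConfiguration;

    // Illustrative only: pool the broker connections and use a small, fixed number
    // of concurrent consumers so the queue is drained steadily without flooding memory.
    ActiveMQConnectionFactory amq =
            new ActiveMQConnectionFactory("tcp://localhost:61616");

    PooledConnectionFactory pooled = new PooledConnectionFactory();
    pooled.setConnectionFactory(amq);
    pooled.setMaxConnections(8);

    JmsConfiguration jmsConfig = new JmsConfiguration(pooled);
    jmsConfig.setConcurrentConsumers(5);

    ActiveMQComponent activemq = new ActiveMQComponent();
    activemq.setConfiguration(jmsConfig);
    camelContext.addComponent("activemq", activemq);   // camelContext assumed available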

Re: Best Strategy to process a large number of rows in File

2016-03-30 Thread Ranx
Are you eventually going to separate these queues into different locations? One thing that's happening when you use JMS this way is that you read the records in, then immediately write them back out over a socket and then read them back in. Those all involve duplication of memory. Since you

Re: Best Strategy to process a large number of rows in File

2016-03-30 Thread Michele
Hi, yes... it is clear ;)! I changed the Heap and Perm configuration progressively, as you suggested, like this: -server -Xms1024M -Xmx2048M -XX:PermSize=256M -XX:MaxPermSize=512M -Xss512M -XX:+HeapDumpOnOutOfMemoryError -verbose:gc -Xloggc:gc.log ... The number of rows processed has increased,

Re: Best Strategy to process a large number of rows in File

2016-03-30 Thread mailingl...@j-b-s.de
Hi Michele! Your mem settings look odd to me. > -server -Xms256M -Xss512M -Xmx512M -XX:+UnlockDiagnosticVMOptions > -XX:+UnsyncloadClass -XX:PermSize=512M -XX:MaxPermSize=1024M Your Perm space is greater than your heap? Sure? I doubt you need 1G of Perm; set your heap to 1G and Perm to 256m as a

Re: Best Strategy to process a large number of rows in File

2016-03-30 Thread Michele
Hi all, first of all thanks so much for your support. I configured the routing with the SEDA component configured like this: I declared a bean with queueSize and then 5

Re: Best Strategy to process a large number of rows in File

2016-03-30 Thread Michele
I forgot this link http://activemq.apache.org/what-is-the-prefetch-limit-for.html and so I changed the consumer endpoint like this. Thanks again. Regards Michele
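The modified endpoint is not shown in the archive, but one common way to lower the prefetch is on the ActiveMQ connection URL; a minimal, illustrative sketch:

    import org.apache.activemq.ActiveMQConnectionFactory;

    // Cap the consumer prefetch so the broker hands this consumer only a few messages
    // at a time instead of pushing thousands of pending rows into its memory.
    ActiveMQConnectionFactory amq = new ActiveMQConnectionFactory(
            "tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=10");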

Re: Best Strategy to process a large number of rows in File

2016-03-29 Thread Ranx
I think you're hitting a lot of good points there. I'm not used to CSVs with 100 columns of data, but I can see why that could get huge. If she starts with SEDA and just sets a queue size of something like 100 or 200 and then sets blockWhenFull to true, her streaming will halt until the queue can
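A sketch of that suggestion, assuming a RouteBuilder's configure() and illustrative endpoint names:

    // Bound the in-memory queue and block the file-splitting producer when it fills
    // up, so streaming pauses instead of piling rows up in memory.
    from("file:inbound")
        .split().tokenize("\n").streaming()
        .to("seda:rows?size=200&blockWhenFull=true");

    from("seda:rows?size=200&concurrentConsumers=5")
        .to("bean:rowProcessor");   // hypothetical consumer stage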

Re: Best Strategy to process a large number of rows in File

2016-03-29 Thread Jens Breitenstein
Sorry Ranx, I missed your previous post. From our experience: we ran into OOM troubles with CSV files when converting a row to a Map, for example. Depending on the number of columns (we have approx. 100+), this can quite easily eat up the entire memory (sure, you can always provide

Re: Best Strategy to process a large number of rows in File

2016-03-29 Thread Ranx
Jens, That's why I suggested setting the limit on the queue size. She has streaming turned on already so I believe that will block when the queue (SEDA or JMS) gets full. But 50,000 objects isn't usually that much memory so there may be something else in the JMS settings that is actually

Re: Best Strategy to process a large number of rows in File

2016-03-29 Thread Jens Breitenstein
To me it looks like the entire CSV file is converted to ActiveMQ messages, which ultimately causes the OOM. The same problem exists with Camel SEDA queues, as the messages need to be kept in memory until handled. I guess you need to set a queue size limit which blocks your csv sending process if no

Re: Best Strategy to process a large number of rows in File

2016-03-29 Thread Ranx
There are a number of answers to that question, but this should be relatively easy to fix. Since you are running out of memory you should probably bump the max memory in the Karaf startup batch file. But you will also want to limit the number of rows that you bring into memory. You already have

Re: Best Strategy to process a large number of rows in File

2016-03-29 Thread Michele
Hi, thanks a lot for your reply. I changed my route, introducing the Throttler pattern, like this: . 5
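The throttle definition itself was stripped by the archive; in the Java DSL the pattern looks roughly like this (the limit of 5 matches the residue of the original message, while the time period and endpoints are assumptions).

    // Let at most 5 exchanges per second through to the downstream REST call.
    from("seda:rows")
        .throttle(5).timePeriodMillis(1000)          // illustrative period
        .to("http4://example.org/api/serials");      // hypothetical REST endpoint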

Re: Best Strategy to process a large number of rows in File

2016-03-28 Thread Ranx
Michele, there are a number of ways you can do that, and it will depend on what is constraining your REST API. Is it limited in the number of concurrent connections? Is it limited in the number of transactions per minute? There are at least two components you'll want after the JMS queue. One will