[jira] [Commented] (KAFKA-4335) FileStreamSource Connector not working for large files (~ 1GB)
[ https://issues.apache.org/jira/browse/KAFKA-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15604522#comment-15604522 ]

Rahul Shukla commented on KAFKA-4335:
-------------------------------------

Yes, I got this exception on the producer console:

java.lang.OutOfMemoryError: Java heap space
    at org.apache.kafka.connect.file.FileStreamSourceTask.poll(FileStreamSourceTask.java:135)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:155)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

> FileStreamSource Connector not working for large files (~ 1GB)
> --------------------------------------------------------------
>
>                 Key: KAFKA-4335
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4335
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.0.0
>            Reporter: Rahul Shukla
>            Assignee: Ewen Cheslack-Postava
>
> I was trying to sink a large file (about 1 GB). The FileStreamSource connector
> is not working for it; it works fine for small files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-4335) FileStreamSource Connector not working for large files (~ 1GB)
[ https://issues.apache.org/jira/browse/KAFKA-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15604115#comment-15604115 ]

Rahul Shukla commented on KAFKA-4335:
-------------------------------------

It did not throw any exception, but it is not producing content to the topic either. I looked into the source code and found that it tries to read the file into memory and then produce the records. I believe it is difficult to hold the entire file in memory. Below is the source code snippet that tries to do this ...

    int nread = 0;
    while (readerCopy.ready()) {
        nread = readerCopy.read(buffer, offset, buffer.length - offset);
        log.trace("Read {} bytes from {}", nread, logFilename());

        if (nread > 0) {
            offset += nread;
            if (offset == buffer.length) {
                char[] newbuf = new char[buffer.length * 2];
                System.arraycopy(buffer, 0, newbuf, 0, buffer.length);
                buffer = newbuf;
            }

            String line;
            do {
                line = extractLine();
                if (line != null) {
                    log.trace("Read a line from {}", logFilename());
                    if (records == null)
                        records = new ArrayList<>();
                    records.add(new SourceRecord(offsetKey(filename), offsetValue(streamOffset), topic, VALUE_SCHEMA, line));
                }
            } while (line != null);
        }
    }
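
The loop quoted above keeps appending to `records` for as long as the reader has data ready, so for a large file a single call can accumulate a very large list. As a rough illustration only, the sketch below shows one way to cap how many lines a single call returns. `BoundedLineBatcher`, its `poll()` method, and `maxBatchSize` are hypothetical names made up for this example; this is not the connector's code and not necessarily how the issue was resolved. It also uses a blocking `BufferedReader.readLine()` rather than the non-blocking `ready()` check in the original.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Kafka Connect: hands out lines from a
 * reader in batches of at most maxBatchSize, so the caller holds no more
 * than one batch of lines at a time.
 */
class BoundedLineBatcher {
    private final BufferedReader reader;
    private final int maxBatchSize;

    BoundedLineBatcher(Reader source, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.reader = new BufferedReader(source);
        this.maxBatchSize = maxBatchSize;
    }

    /** Returns up to maxBatchSize lines; an empty list means end of input. */
    List<String> poll() throws IOException {
        List<String> batch = new ArrayList<>();
        String line;
        // The size check comes first, so no line is read and then dropped
        // once the batch is already full.
        while (batch.size() < maxBatchSize && (line = reader.readLine()) != null) {
            batch.add(line);
        }
        return batch;
    }

    public static void main(String[] args) throws IOException {
        BoundedLineBatcher batcher =
                new BoundedLineBatcher(new StringReader("a\nb\nc\n"), 2);
        List<String> batch;
        while (!(batch = batcher.poll()).isEmpty()) {
            System.out.println(batch); // prints [a, b] and then [c]
        }
    }
}
```

With this shape, memory use per call is bounded by the batch size times the line length. A single extremely long line would still have to fit in memory, so this only addresses the unbounded list growth, not every possible cause of the heap error.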
[jira] [Commented] (KAFKA-4335) FileStreamSource Connector not working for large files (~ 1GB)
[ https://issues.apache.org/jira/browse/KAFKA-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602851#comment-15602851 ]

Ewen Cheslack-Postava commented on KAFKA-4335:
----------------------------------------------

Can you be more specific about what isn't working? Does it throw an exception or some other error?