Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written
On Tue, 2016-06-28 at 16:42 +, Rajendran, Prabaharan wrote:
> Please suggest me the best way to index (multithreaded) if your
> input format is text/csv (file).

Last I tried, it was pretty straightforward: split your CSV into chunks
and start about as many separate uploads as you have (real) CPU cores.

- Toke Eskildsen, State and University Library, Denmark
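Toke's recipe can be sketched with standard shell tools (GNU coreutils assumed for `split` and `nproc`). The file name, chunk size, and collection URL are illustrative, and a small generated stand-in replaces the real multi-GB input so the sketch runs end to end; the `java` command is only echoed here, so drop the `echo` to post against a live Solr:

```shell
# Stand-in for the real multi-GB TSV, so the sketch is self-contained.
seq 1 100000 > largefile.txt

# Split on line boundaries so no CSV row is cut in half; tune -l so each
# chunk stays well below the 2GB limit.
split -l 25000 largefile.txt chunk_

# One upload per CPU core; "echo" only prints the commands in this sketch.
ls chunk_* | xargs -P "$(nproc)" -I {} \
  echo java -Dtype=text/csv -Dparams="separator=%09" \
       -Durl=http://localhost:8983/solr/mycollection/update \
       -jar post.jar {}
```

Line-based splitting matters for CSV: a byte-based split would cut a row in half at every chunk boundary.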
RE: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written
Thanks Toke, now I am splitting the file before indexing.

Shalin, thanks for the details. Even though this is fixed in 5.5 and 6.0,
is there any threshold value? Please suggest me the best way to index
(multithreaded) if your input format is text/csv (file).

Thanks,
Prabaharan

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: 28 June 2016 16:06
To: solr-user@lucene.apache.org; Toke Eskildsen
Subject: Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

This was fixed in 5.5 and 6.0. You can upload files larger than 2GB with
the simple post tool; however, I don't recommend it because it uses a
single indexing thread.

On Tue, Jun 28, 2016 at 3:55 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> [...]

--
Regards,
Shalin Shekhar Mangar.
RE: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written
Thanks Erick, for your response. Now I am splitting the file before indexing.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 28 June 2016 11:01
To: solr-user
Subject: Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

You're most likely not getting _near_ 4.2G written to Solr; the transport
protocol is probably cutting it off, as indicated by the "early EOF"
exception.

It's really hard to justify trying to index 4.2G as a _single_ file.
First, you won't even be able to receive it in Solr when you've given it
only 1G of memory, even if you get the transport stuff worked out.
Second, searching it is totally useless in most cases, as it will
probably match _everything_. Third, even if it does match something, how
are you going to return it to a user? If it's multiple documents in a
huge uber-doc, you can break it up at ingestion and only send individual
docs to Solr rather than the whole thing.

IOW, I think this is a waste of your time. I understand that you're
trying to see the limits, but this limit is not a reasonable one to hope
to cross.

Best,
Erick

On Mon, Jun 27, 2016 at 6:24 AM, Rajendran, Prabaharan <rajendra...@dnb.com> wrote:
> Hi,
>
> I am trying to index a text file about 4.2 GB in size. This is a kind
> of POC to understand Solr capacity on indexing & searching.
>
> Here is my Solr configuration:
> -Xms1024m -Xmx1024m -Xss256k
>
> java -Dtype=text/csv -Dparams="separator=%09"
> -Durl=http://localhost:8983/solr/mycollection/update -jar
> ..\example\exampledocs\post.jar ..\example\exampledocs\largefile.txt
>
> While indexing I got an error like the one below:
> SimplePostTool: FATAL: IOException while posting data:
> java.io.IOException: too many bytes written
>
> Kindly let me know if I need to change any Solr configuration
> (increase memory) to handle this.
> [...]
Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written
This was fixed in 5.5 and 6.0. You can upload files larger than 2GB with
the simple post tool; however, I don't recommend it because it uses a
single indexing thread.

On Tue, Jun 28, 2016 at 3:55 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> On Mon, 2016-06-27 at 13:24 +, Rajendran, Prabaharan wrote:
> > I am trying to index a text file about 4.2 GB in size. [...]
> >
> > SimplePostTool: FATAL: IOException while posting data:
> > java.io.IOException: too many bytes written
>
> SimplePostTool uses
> HttpUrlConnection.setFixedLengthStreamingMode(file_size),
> where file_size is an integer.
>
> Unfortunately there is no check for overflow (which happens with files > 2GB),
> so there is no sane error message up front and you only get the error
> you pasted after some bytes have been sent. With a 4.2GB input file,
> I would guess after about 200MB (4.2GB % 2GB).
>
> Long story short: Keep your posts below 2GB.
>
> - Toke Eskildsen, State and University Library, Denmark

--
Regards,
Shalin Shekhar Mangar.
Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written
On Mon, 2016-06-27 at 13:24 +, Rajendran, Prabaharan wrote:
> I am trying to index a text file about 4.2 GB in size. [...]
>
> SimplePostTool: FATAL: IOException while posting data:
> java.io.IOException: too many bytes written

SimplePostTool uses HttpUrlConnection.setFixedLengthStreamingMode(file_size),
where file_size is an integer.

Unfortunately there is no check for overflow (which happens with
files > 2GB), so there is no sane error message up front and you only
get the error you pasted after some bytes have been sent. With a 4.2GB
input file, I would guess after about 200MB (4.2GB % 2GB).

Long story short: Keep your posts below 2GB.

- Toke Eskildsen, State and University Library, Denmark
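Toke's arithmetic can be checked directly in bash. The file size is silently truncated to a 32-bit signed int, so the connection is told to expect far fewer bytes than will actually be written, and it aborts with "too many bytes written" once the real stream exceeds that figure. The byte count below is an illustrative ~4.2 GiB, not the OP's exact file size:

```shell
# ~4.2 GiB expressed in bytes (illustrative figure)
FILE_SIZE=4509715660

# Simulate what (int) fileSize does in Java: keep the low 32 bits,
# interpreted as a signed value.
WRAPPED=$(( (FILE_SIZE + 2**31) % 2**32 - 2**31 ))

echo "$WRAPPED"   # 214748364 bytes, i.e. roughly the ~200MB Toke estimated
```

So the connection expects only about 205 MB of body, which is why the failure appears a couple of hundred megabytes into a 4.2 GB upload rather than up front.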
Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written
You're most likely not getting _near_ 4.2G written to Solr; the transport
protocol is probably cutting it off, as indicated by the "early EOF"
exception.

It's really hard to justify trying to index 4.2G as a _single_ file.
First, you won't even be able to receive it in Solr when you've given it
only 1G of memory, even if you get the transport stuff worked out.
Second, searching it is totally useless in most cases, as it will
probably match _everything_. Third, even if it does match something, how
are you going to return it to a user? If it's multiple documents in a
huge uber-doc, you can break it up at ingestion and only send individual
docs to Solr rather than the whole thing.

IOW, I think this is a waste of your time. I understand that you're
trying to see the limits, but this limit is not a reasonable one to hope
to cross.

Best,
Erick

On Mon, Jun 27, 2016 at 6:24 AM, Rajendran, Prabaharan <rajendra...@dnb.com> wrote:
> Hi,
>
> I am trying to index a text file about 4.2 GB in size. This is a kind
> of POC to understand Solr capacity on indexing & searching.
>
> Here is my Solr configuration:
> -Xms1024m -Xmx1024m -Xss256k
>
> java -Dtype=text/csv -Dparams="separator=%09"
> -Durl=http://localhost:8983/solr/mycollection/update -jar
> ..\example\exampledocs\post.jar ..\example\exampledocs\largefile.txt
>
> While indexing I got an error like the one below:
> SimplePostTool: FATAL: IOException while posting data:
> java.io.IOException: too many bytes written
>
> Kindly let me know if I need to change any Solr configuration
> (increase memory) to handle this.
> [...]
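Erick's "break it up at ingestion" suggestion can be sketched with GNU awk. The record delimiter and file names here are hypothetical, since the thread never shows the uber-doc's real structure; substitute whatever actually separates records in your file:

```shell
# Toy uber-doc: three documents separated by a "---" line (hypothetical
# delimiter, stand-in for the real record boundary).
printf 'doc one\n---\ndoc two\n---\ndoc three\n' > uberdoc.txt

# Emit one file per record instead of posting the whole blob; each
# doc_N.txt can then be sent to Solr as its own document.
awk -v RS='---\n' 'NF { printf "%s", $0 > ("doc_" NR ".txt") }' uberdoc.txt
```

Multi-character `RS` is a GNU awk extension (also supported by mawk); POSIX awk only honors a single-character record separator.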
SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written
Hi,

I am trying to index a text file about 4.2 GB in size. This is a kind of
POC to understand Solr capacity on indexing & searching.

Here is my Solr configuration:
-Xms1024m -Xmx1024m -Xss256k

java -Dtype=text/csv -Dparams="separator=%09" -Durl=http://localhost:8983/solr/mycollection/update -jar ..\example\exampledocs\post.jar ..\example\exampledocs\largefile.txt

While indexing I got an error like the one below:
SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Kindly let me know if I need to change any Solr configuration (increase memory) to handle this.

Here is my log file entry:

ERROR (qtp297811323-14) [ x:collection2] o.a.s.c.SolrCore org.apache.solr.common.SolrException: CSVLoader: input=null, line=2815040, can't read line: 2815040 values={NO LINES AVAILABLE}
        at org.apache.solr.handler.loader.CSVLoaderBase.input_err(CSVLoaderBase.java:317)
        at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:356)
        at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:499)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jetty.io.EofException: Early EOF
        at org.eclipse.jetty.server.HttpInput$3.noContent(HttpInput.java:506)
        at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:124)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:154)
        at java.io.BufferedReader.read(BufferedReader.java:175)
        at org.apache.solr.internal.csv.ExtendedBufferedReader.read(ExtendedBufferedReader.java:82)
        at org.apache.solr.internal.csv.CSVParser.simpleTokenLexer(CSVParser.java:421)
        at org.apache.solr.internal.csv.CSVParser.nextToken(CSVParser.java:371)
        at org.apache.solr.internal.csv.CSVParser.getLine(CSVParser.java:231)
        at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoade