Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.
Hello. The server has much more memory; I have given the application server a minimum of 8 GB. The Java opts of interest are: -server -Xms8192m -Xmx16384m -XX:MaxPermSize=8192m

Even after giving the server this much memory, how am I hitting OOM exceptions? No other activity is being performed on the server apart from this. Checking from JConsole, the maximum heap usage during indexing was close to 1.2 GB, whereas the memory allocated is as mentioned above. I did mention 128 MB earlier, but that is only when I start the server on a normal Windows machine.

Isn't there some property/configuration in Lucene which I should set in order to index large files, say about 30 MB? I read something about mergeFactor etc. but was not able to set any value for it, and I don't even know whether doing that will help the cause.

On 8/29/2013 7:04 PM, Ian Lea wrote:
Well, I use neither Eclipse nor your application server and can offer no advice on any differences in behaviour between the two. Maybe you should try Eclipse or app server forums.

If you are going to index the complete contents of a file as one field, you are likely to hit OOM exceptions. How big is the largest file you are ever going to index?

The server may have 8GB, but how much memory are you allowing the JVM? What are the command line flags? I think you mentioned 128Mb in an earlier email. That isn't much.

--
Ian.

On Thu, Aug 29, 2013 at 2:14 PM, Ankit Murarka wrote:
Hello, I get the exception only when the code is fired from Eclipse. When it is deployed on an application server, I get no exception at all. This forced me to invoke the same code from Eclipse and check what the issue is. I ran the code on a server with 8 GB memory; even then no exception occurred, and only write.lock was created. Removing the contents field is not desirable, as it is needed for search to work properly.

On 8/29/2013 6:17 PM, Ian Lea wrote:
So you do get an exception after all, OOM. Try it without this line:

    doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8"))));

I think that will slurp the whole file in one go, which will obviously need more memory on larger files than on smaller ones. Or just run the program with more memory.

--
Ian.

On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka wrote:
Yes, I know that Lucene should not have any document size limits. All I get is a lock file inside my index folder; there is no other file inside the index folder. Then I get an OOM exception. Please provide some guidance.
Here is the example:

    package com.issue;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.LongField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.LiveIndexWriterConfig;
    import org.apache.lucene.index.LogByteSizeMergePolicy;
    import org.apache.lucene.index.MergePolicy;
    import org.apache.lucene.index.SerialMergeScheduler;
    import org.apache.lucene.index.MergePolicy.OneMerge;
    import org.apache.lucene.index.MergeScheduler;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.LineNumberReader;
    import java.util.Date;

    public class D {

        /** Index all text files under a directory. */
        static String[] filenames;

        public static void main(String[] args) {
            //String indexPath = args[0];
            String indexPath = "D:\\Issue"; // place where indexes will be created
            String docsPath = "Issue";      // place where the files are kept
            boolean create = true;
            String ch = "OverAll";
            final File docDir = new File(docsPath);
            if (!docDir.exists() || !docDir.canRead()) {
                System.out.println("Document directory '" + docDir.getAbsolutePath()
                        + "' does not exist or is not readable, please check the path");
                System.exit(1);
            }
            Date start = new Date();
            try {
                Directory dir = FSDirectory.open(new File(indexPath));
                Analyzer analyzer = new com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
                IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
                iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
                IndexWriter writer = new IndexWriter(dir, iwc);
                if (ch.equalsIgnoreCase("OverAll")) {
                    indexDocs(writer, docDir, true);
                } else
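One way to keep per-document memory bounded, wherever the buffering happens, is to index a large file in fixed-size chunks rather than as a single huge field value. A minimal sketch against the Lucene 4.4 API; the chunk size, field names and helper class below are illustrative, not taken from the code above:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;

    public class ChunkedIndexer {

        /** Index one large file as a sequence of bounded-size documents. */
        static void indexInChunks(IndexWriter writer, File file) throws IOException {
            final int CHUNK_CHARS = 1 << 20; // ~1M chars per document; tune to taste
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), "UTF-8"));
            try {
                char[] buf = new char[CHUNK_CHARS];
                int read;
                int chunkNo = 0;
                while ((read = in.read(buf, 0, buf.length)) != -1) {
                    Document doc = new Document();
                    doc.add(new StringField("path", file.getPath(), Field.Store.YES));
                    doc.add(new StringField("chunk", Integer.toString(chunkNo++), Field.Store.YES));
                    // A String of at most CHUNK_CHARS keeps per-document memory bounded.
                    doc.add(new TextField("contents", new String(buf, 0, read), Field.Store.NO));
                    writer.addDocument(doc);
                }
            } finally {
                in.close();
            }
        }
    }

Chunk boundaries can split a token in two; overlapping consecutive chunks by a few hundred characters is one way to soften that, and searches then aggregate hits by the path field.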
Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.
Hello,

The following exception is being printed on the server console when trying to index. As usual, the indexes are not getting created.

    java.lang.OutOfMemoryError: Java heap space
            at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
            at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
            at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
            at org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
            at org.apache.lucene.document.Field.tokenStream(Field.java:541)
            at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
            at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
            at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
            at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
            at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
            at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
            at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
            at com.rancore.MainClass1.indexDocs(MainClass1.java:197)
            at com.rancore.MainClass1.indexDocs(MainClass1.java:153)
            at com.rancore.MainClass1.main(MainClass1.java:95)
    java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
            at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
            at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
            at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
            at com.rancore.MainClass1.main(MainClass1.java:122)
    Indexing to directory ...

Any guidance will be highly appreciated. The server opts are: -server -Xms8192m -Xmx16384m -XX:MaxPermSize=512m

On 8/30/2013 3:13 PM, Ankit Murarka wrote:
Hello. The server has much more memory. I have given minimum 8 GB to Application Server. ...
Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.
Can someone please suggest a possible resolution for the issue mentioned in the trailing mail?

Also, after changing some settings on IndexWriterConfig and LiveIndexWriterConfig, I now get the following exception:

    java.lang.OutOfMemoryError: Java heap space
            at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8WithHash(UnicodeUtil.java:136)
            at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.fillBytesRef(CharTermAttributeImpl.java:91)
            at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
            at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:165)
            at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
            at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
            at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
            at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
            at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
            at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
            at com.rancore.MainClass1.indexDocs(MainClass1.java:220)
            at com.rancore.MainClass1.indexDocs(MainClass1.java:167)
            at com.rancore.MainClass1.main(MainClass1.java:110)
    java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
            at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
            at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
            at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
            at com.rancore.MainClass1.main(MainClass1.java:136)

Can anyone please guide? There has to be some way a file of, say, 20 MB can be properly indexed. Any guidance is highly appreciated.

On 8/30/2013 6:49 PM, Ankit Murarka wrote:
Hello, The following exception is being printed on the server console when trying to index. As usual, indexes are not getting created. ...
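For anyone trying the same settings: mergeFactor is not a property of IndexWriterConfig itself but of the merge policy, and the RAM buffer is set on the config before the writer is built. A minimal sketch for Lucene 4.4; the analyzer and the values are illustrative starting points, not recommendations from this thread:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.LogByteSizeMergePolicy;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class WriterSetup {
        static IndexWriter openWriter(File indexDir) throws Exception {
            Directory dir = FSDirectory.open(indexDir);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
                    new StandardAnalyzer(Version.LUCENE_44));
            // Flush buffered documents once ~48 MB of heap is used by the writer.
            iwc.setRAMBufferSizeMB(48.0);
            // mergeFactor lives on the merge policy, not on the config.
            LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
            mp.setMergeFactor(10);
            iwc.setMergePolicy(mp);
            return new IndexWriter(dir, iwc);
        }
    }

Note that none of these settings shrinks the memory needed to tokenize a single huge field value; they only shape flushing and merging, which is why splitting the input (as in the earlier chunking sketch) is the more direct fix for this OOM.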
Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.
The exact point at which a Java program hits OOM is pretty random, and it isn't always fair to blame the chunk of code triggering the exception. Consider:

    callMethodThatAllocatesNearlyAllMemory();
    callMethodThatAllocatesABitOfMemory();

If the second call hits OOM, it is not the one to blame.

Anyway, your base problem seems to be that you are trying to index a 20Mb chunk of text in one go and, despite appearing to have plenty of memory, failing.

What happens with different -Xmx values? Are you sure they are taking effect? What does code like this:

    Runtime rt = Runtime.getRuntime();
    long maxMemory = rt.maxMemory() / 1024 / 1024;
    long totalMemory = rt.totalMemory() / 1024 / 1024;
    long freeMemory = rt.freeMemory() / 1024 / 1024;

say before/after loading a 10Mb file, a 15Mb file, a 20Mb file, etc.?

Have you run it with a memory profiler? With verbose GC? Googled for advice on diagnosing Java memory problems?

What happens when you don't index the 20Mb in one chunk? When you only index the 20Mb in one chunk, not the individual lines? Does it work when you are adding a doc but not when you are updating? The other way round? Does it fail on all 20Mb files or just some? If some, what's the difference between them?

Good luck. Have a nice weekend.

--
Ian.

On Fri, Aug 30, 2013 at 4:14 PM, Ankit Murarka wrote:
> Can someone please suggest what might be the possible resolution for the
> issue mentioned in trailing mail ...
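To make Ian's before/after measurements easy to drop into the indexing loop, here is a small runnable helper built around the same Runtime calls (the method name and label argument are arbitrary):

    static void logMemory(String label) {
        Runtime rt = Runtime.getRuntime();
        long maxMemory = rt.maxMemory() / 1024 / 1024;     // heap ceiling (-Xmx)
        long totalMemory = rt.totalMemory() / 1024 / 1024; // heap currently reserved
        long freeMemory = rt.freeMemory() / 1024 / 1024;   // unused part of the reserved heap
        System.out.println(label + ": max=" + maxMemory + "MB, total="
                + totalMemory + "MB, free=" + freeMemory + "MB");
    }

    // e.g. around the suspect call:
    //   logMemory("before 20MB file");
    //   writer.addDocument(doc);
    //   logMemory("after 20MB file");

If max comes out near 128 rather than the 16384 set via -Xmx, the flag is not reaching the JVM that actually runs the indexing code, which would be consistent with the symptoms in this thread.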
Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.
Ankit,

The stack traces you are showing only say that there was an out-of-memory error. In such cases the stack trace is unfortunately not always helpful, since the allocation may fail on a small object because other objects are taking up all the memory of the JVM.

Can you come up with a small piece of code that reproduces the error you are encountering? This would help us see if there is something wrong in the indexing code, and to try to debug it otherwise.

--
Adrien
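A self-contained reproducer along the lines Adrien asks for might look like the sketch below: it synthesizes a roughly 20 MB text and indexes it as one TextField. Running it with a deliberately small -Xmx shows where it falls over; RAMDirectory, the analyzer and the size constant are arbitrary choices for the sketch:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class Repro {
        public static void main(String[] args) throws Exception {
            // Build a ~20 MB synthetic document out of short distinct "words".
            final int TARGET_CHARS = 20 * 1024 * 1024;
            StringBuilder sb = new StringBuilder(TARGET_CHARS + 16);
            int i = 0;
            while (sb.length() < TARGET_CHARS) {
                sb.append("word").append(i++ % 100000).append(' ');
            }
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
                    new StandardAnalyzer(Version.LUCENE_44));
            IndexWriter writer = new IndexWriter(new RAMDirectory(), iwc);
            Document doc = new Document();
            doc.add(new TextField("contents", sb.toString(), Field.Store.NO));
            writer.addDocument(doc); // if the OOM reproduces, it happens here
            writer.commit();
            writer.close();
            System.out.println("indexed one ~20MB document without OOM");
        }
    }

Note that RAMDirectory itself holds the index on the heap; switching to FSDirectory isolates the analysis-side memory use from the storage-side.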
RE: Optimize Lucene 4.4 for CPU usage
I've noticed that processes that were previously IO-bound (in 3.5) are now CPU-bound (in 4.4), and I expect it is due to the compression/decompression of term vector fields in 4.4. It would be nice if users of 4.4 could turn the compression off entirely.

-----Original Message-----
From: Ivan Krišto [mailto:ivan.kri...@gmail.com]
Sent: Wednesday, August 21, 2013 12:45 PM
To: java-user@lucene.apache.org
Subject: Re: Optimize Lucene 4.4 for CPU usage

On 08/20/2013 07:53 PM, Mirko Sertic wrote:
> I am using Lucene 4.4, and I am hitting CPU usage limitations on my Core i7
> Windows 7 64-bit box. The IO system (SSD) seems to still have capacity, but
> when running 8 threads searching on the index in parallel, all logical CPU
> cores are at 100% usage.
>
> Is there a common way to optimize query throughput and lower CPU usage? I
> am thinking index compression could be disabled, for instance, as index
> size is not the problem.

Have you tried profiling the code to rule out options? Plain JVisualVM (the free Java profiler that comes with the JDK) should do the trick. Just run the profiler against Lucene and check which methods take most of the CPU time. Maybe some serialization outside Lucene takes most of the CPU time.

Regards,
Ivan Krišto
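There is no single switch to disable stored-field/term-vector compression in 4.4, but a custom codec can delegate to the default codec while substituting the older, uncompressed 4.0 formats. A sketch of that idea, untested and assuming the 4.0 write paths are still usable in your 4.4 distribution:

    import org.apache.lucene.codecs.FilterCodec;
    import org.apache.lucene.codecs.StoredFieldsFormat;
    import org.apache.lucene.codecs.TermVectorsFormat;
    import org.apache.lucene.codecs.lucene40.Lucene40StoredFieldsFormat;
    import org.apache.lucene.codecs.lucene40.Lucene40TermVectorsFormat;
    import org.apache.lucene.codecs.lucene42.Lucene42Codec;

    // Same as the 4.4 default codec, except stored fields and term
    // vectors use the uncompressed Lucene 4.0 formats.
    public class UncompressedCodec extends FilterCodec {
        public UncompressedCodec() {
            super("UncompressedCodec", new Lucene42Codec());
        }
        @Override
        public StoredFieldsFormat storedFieldsFormat() {
            return new Lucene40StoredFieldsFormat();
        }
        @Override
        public TermVectorsFormat termVectorsFormat() {
            return new Lucene40TermVectorsFormat();
        }
    }

    // usage: indexWriterConfig.setCodec(new UncompressedCodec());

The codec name must also be registered through SPI (a META-INF/services/org.apache.lucene.codecs.Codec file naming the class) so that readers can resolve "UncompressedCodec" when the index is opened later.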