Re: Merging of index in Solr

2017-11-27 Thread Zheng Lin Edwin Yeo
Hi, In IndexMergeTool.java, we found this line, which sets maxNumSegments to 1: writer.forceMerge(1); Does this mean that there will always be only 1 segment after the merging? Is there any way we can allow the merge to produce multiple segments,

Re: Merging of index in Solr

2017-11-23 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for the info. We will most likely shard when we migrate to Solr 7.1.0 and re-index the data. But as Solr 7.1.0 is not yet ready to index EML files due to this JIRA, https://issues.apache.org/jira/browse/SOLR-11622, we have to make do with our current Solr

Re: Merging of index in Solr

2017-11-22 Thread Shawn Heisey
On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote: I'm doing the merging on the SSD drive, the speed should be ok? The speed of virtually all modern disks will have almost no influence on the speed of the merge. The bottleneck isn't disk transfer speed, it's the operation of the merge code

Re: Merging of index in Solr

2017-11-22 Thread Zheng Lin Edwin Yeo
Hi Erick, Yes, we are planning to shard when we upgrade to the newer Solr 7.1.0, and will probably re-index everything. But we are currently waiting for certain issues with indexing EML files in Solr 7.1.0 to be addressed first, such as this JIRA,

Re: Merging of index in Solr

2017-11-22 Thread Erick Erickson
Sure, sharding can give you accurate faceting, although do note there are nuances: JSON faceting can occasionally be inexact, though there are JIRAs being worked on to correct this. "Traditional" faceting has a refinement phase that gets accurate counts. But the net-net is that I believe

Re: Merging of index in Solr

2017-11-22 Thread Zheng Lin Edwin Yeo
I'm doing the merging on the SSD drive, the speed should be ok? We need to merge because the data are indexed in two different collections, and we need them to be under the same collection, so that we can do things like faceting more accurately. Will sharding alone achieve this? Or do we have to

Re: Merging of index in Solr

2017-11-22 Thread Erick Erickson
Really, let's back up here though. This sure seems like an XY problem. You're merging indexes that will eventually be something on the order of 3.5TB. I claim that an index of that size is very difficult to work with effectively. _Why_ do you want to do this? Do you have any evidence that you'll

Re: Merging of index in Solr

2017-11-22 Thread Shawn Heisey
On 11/21/2017 9:10 AM, Zheng Lin Edwin Yeo wrote: > I am using the IndexMergeTool from Solr, from the command below: > > java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar > org.apache.lucene.misc.IndexMergeTool > > The heap size is 32GB. There are more than 20 million documents in the

Re: Merging of index in Solr

2017-11-22 Thread Zheng Lin Edwin Yeo
Hi Emir, Yes, I am running the merging on a Windows machine. The disk is an SSD formatted with the NTFS file system. Regards, Edwin On 22 November 2017 at 16:50, Emir Arnautović wrote: > Hi Edwin, > Quick googling suggests that this is the issue of NTFS related to large

Re: Merging of index in Solr

2017-11-22 Thread Emir Arnautović
Hi Edwin, Quick googling suggests that this is an NTFS issue related to a large number of file fragments caused by a large number of files in one directory of huge files. Are you running this merging on a Windows machine? Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
Hi, I encountered this error while merging the 3.5TB of index. What could have caused it? Exception in thread "main" Exception in thread "Lucene Merge Thread #8" java.io.IOException: background merge hit exception: _6f(6.5.1):C7256757 _6e(6.5.1):C646 2072

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
I am using the IndexMergeTool from Solr, from the command below: java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar org.apache.lucene.misc.IndexMergeTool The heap size is 32GB. There are more than 20 million documents in the two cores. Regards, Edwin On 21 November 2017 at 21:54,

Re: Merging of index in Solr

2017-11-21 Thread Shawn Heisey
On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote: Does anyone know how long merging in Solr usually takes? I am currently merging about 3.5TB of data, and it has been running for more than 28 hours and has not completed yet. The merging is running on an SSD. The following will

Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
Hi Edwin, I’ll let somebody with more knowledge about merging comment on the merge aspects. What do you use to merge those cores: the merge tool, or do you run it using Solr’s core API? What is the heap size? How many documents are in those two cores? Regards, Emir -- Monitoring - Log Management - Alerting -

Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
Hi Emir, Thanks for your reply. There is only 1 host, 1 node and 1 shard for these 3.5TB. The merging has already written the additional 3.5TB to another segment. However, it is still not a single segment, and the size of the folder where the merged index is supposed to be is now 4.6TB, This

Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
Hi Edwin, How many hosts/nodes/shards hold those 3.5TB? I am not familiar with the merge code, but I am trying to think about what it might include, so don’t take any of the following as ground truth. Merging will certainly include rewriting segments, so you had better have an additional 3.5TB if you are merging it to a

Merging of index in Solr

2017-11-20 Thread Zheng Lin Edwin Yeo
Hi, Does anyone know how long merging in Solr usually takes? I am currently merging about 3.5TB of data, and it has been running for more than 28 hours and has not completed yet. The merging is running on an SSD. I am using Solr 6.5.1. Regards, Edwin
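
While a long merge like this one runs, one rough way to see whether it is still making progress is to watch the output index directory: its total size on disk and the number of distinct segments in it. The following is a minimal sketch that uses only the Java standard library and nothing from Solr or Lucene. It assumes the usual Lucene file naming, in which every per-segment file starts with an underscore followed by the segment name (for example _6f.cfs or _6f_Lucene50_0.doc). The class name IndexDirStats and its methods are hypothetical, made up for this illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Solr or Lucene. */
public class IndexDirStats {

    /**
     * Returns the segment name for a per-segment file such as "_6f.cfs" or
     * "_6f_Lucene50_0.doc", or null for files that are not per-segment
     * (for example "segments_5" or "write.lock").
     * Assumption: the segment name ends at the first '.' or second '_'.
     */
    public static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.length();
        int dot = fileName.indexOf('.', 1);
        int underscore = fileName.indexOf('_', 1);
        if (dot >= 0) {
            end = Math.min(end, dot);
        }
        if (underscore >= 0) {
            end = Math.min(end, underscore);
        }
        return end > 1 ? fileName.substring(0, end) : null;
    }

    /** Collects the distinct segment names found among the given file names. */
    public static Set<String> segmentNames(Iterable<String> fileNames) {
        Set<String> names = new TreeSet<>();
        for (String fileName : fileNames) {
            String segment = segmentName(fileName);
            if (segment != null) {
                names.add(segment);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long totalBytes = 0;
        Set<String> fileNames = new TreeSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                try {
                    if (Files.isRegularFile(p)) {
                        totalBytes += Files.size(p);
                        fileNames.add(p.getFileName().toString());
                    }
                } catch (IOException e) {
                    // A running merge may delete files while we list them; skip those.
                }
            }
        }
        Set<String> segments = segmentNames(fileNames);
        System.out.printf("%s: %d files, %d segments, %.1f GB%n",
                dir, fileNames.size(), segments.size(), totalBytes / 1e9);
    }
}
```

Running it every so often against the merged-index folder (java IndexDirStats followed by the folder path) gives a snapshot to compare over time. It only reads file names and sizes, so it says nothing about how much work the merge has left.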