Re: Help needed in breaking large solr index file into smaller ones

2017-01-09 Thread Erick Erickson
Why? What do you think this will accomplish? I'm wondering if this is an XY problem. Best, Erick On Mon, Jan 9, 2017 at 7:48 AM, Manan Sheth wrote: > Hi All, > > I have a problem similar to this one, where the indexes in multiple solr > shards have created large

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Erick Erickson
Why do you have a requirement that the indexes be < 4G? If it's arbitrarily imposed, why bother? Or is it a non-negotiable requirement imposed by the platform you're on? Because just splitting the files into a smaller set won't help you; if you then start to index into it, the merge process will

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Mikhail Khludnev
Perhaps you can copy this index into a separate location, remove the odd-numbered docs from one copy and the even-numbered docs from the other, and then force merge to a single segment in both locations separately. Perhaps shard splitting in SolrCloud does something like that. On Mon, Jan 9, 2017 at 1:12 PM,
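The approach above can be sketched with Lucene's IndexWriter. This is a rough illustration, not a tested procedure: the paths, the id range, and the assumption of a string "id" field are all hypothetical, and it requires the Lucene jars on the classpath.

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;

// After copying the index to copyA and copyB, delete alternating ids from
// each copy, then force merge each copy down to a single segment.
public class OddEvenSplit {
    static void dropIds(String path, long firstId, long maxId) throws Exception {
        try (IndexWriter w = new IndexWriter(FSDirectory.open(Paths.get(path)),
                new IndexWriterConfig(new StandardAnalyzer()))) {
            for (long id = firstId; id <= maxId; id += 2) {
                w.deleteDocuments(new Term("id", Long.toString(id))); // assumes string ids
            }
            w.forceMerge(1); // compact each half to one segment, as suggested
            w.commit();
        }
    }

    public static void main(String[] args) throws Exception {
        dropIds("/indexes/copyA", 0, 1_000_000); // copyA keeps the odd ids
        dropIds("/indexes/copyB", 1, 1_000_000); // copyB keeps the even ids
    }
}
```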

Re: term frequency solrj

2017-01-09 Thread Mikhail Khludnev
Hello Huda, Try to check this https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/test/org/apache/solr/client/solrj/response/TermsResponseTest.java On Mon, Jan 9, 2017 at 4:31 PM, huda barakat wrote: > Hi, > Can anybody help me, I need to get term

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Anshum Gupta
Can you provide more information about: - Are you using Solr in standalone or SolrCloud mode? What version of Solr? - Why do you want this? Lack of disk space? Uneven distribution of data on shards? - Do you want this data together i.e. as part of a single collection? You can check out the
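In SolrCloud mode, the built-in way to divide an oversized shard is the Collections API SPLITSHARD action, which splits one shard into two sub-shards, each holding roughly half the documents. A minimal sketch using only the HTTP API (the host, collection, and shard names here are placeholders):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Calls the Collections API SPLITSHARD action; "mycollection" and "shard1"
// are hypothetical names for this sketch.
public class SplitShardCall {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8983/solr/admin/collections"
                + "?action=SPLITSHARD&collection=mycollection&shard=shard1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            in.lines().forEach(System.out::println); // echo the status response
        }
    }
}
```

The parent shard stays intact until the sub-shards are active, so extra disk space is needed during the split.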

Re: Solr Index upgradation Merging issue observed

2017-01-09 Thread Shawn Heisey
On 1/8/2017 11:21 PM, Manan Sheth wrote: > Currently, We are in process of upgrading existing Solr indexes from Solr 4.x > to Solr 6.2.1. In order to upgrade existing indexes we are planning to use > IndexUpgrader class in sequential manner from Solr 4.x to Solr 5.x and Solr > 5.x to Solr
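For reference, the sequential upgrade described above uses Lucene's IndexUpgrader once per major-version hop: run it compiled against Lucene 5.x first, then again against Lucene 6.x. A minimal sketch (the index path is a placeholder, and the matching lucene-core jar must be on the classpath for each hop):

```java
import java.nio.file.Paths;

import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.FSDirectory;

// Equivalent to the CLI form:
//   java -cp lucene-core-5.x.jar org.apache.lucene.index.IndexUpgrader /path/to/index
public class Upgrade {
    public static void main(String[] args) throws Exception {
        // Rewrites all segments of the index at args[0] into the current format.
        new IndexUpgrader(FSDirectory.open(Paths.get(args[0]))).upgrade();
    }
}
```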

Re: term frequency solrj

2017-01-09 Thread Shawn Heisey
On 1/9/2017 6:31 AM, huda barakat wrote: > Can anybody help me, I need to get term frequency for a specific > filed, I use the techproduct example and I use this code: The variable "terms" is null on line 29, which is why you are getting NullPointerException. > query.setRequestHandler("terms");
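A sketch of the fix being described: the terms response is only populated when terms=true is set and the request is routed to the terms handler. This assumes the techproducts example running locally and SolrJ 6.x on the classpath:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermFreq {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/techproducts")) {
            SolrQuery q = new SolrQuery();
            q.setRequestHandler("/terms"); // route to the TermsComponent handler
            q.setTerms(true);              // without terms=true, getTermsResponse() is null
            q.addTermsField("name");       // field whose term frequencies we want
            QueryResponse rsp = client.query(q);
            TermsResponse terms = rsp.getTermsResponse();
            for (TermsResponse.Term t : terms.getTerms("name")) {
                System.out.println(t.getTerm() + " -> " + t.getFrequency());
            }
        }
    }
}
```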

Re: CDCR logging is Needlessly verbose, fills up the file system fast

2017-01-09 Thread Shawn Heisey
On 12/22/2016 8:10 AM, Webster Homer wrote: > While testing CDCR I found that it is writing tons of log messages per > second. Example: > 2016-12-21 23:24:41.652 INFO (qtp110456297-13) [c:sial-catalog-material > s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1] > o.a.s.c.S.Request

Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2017-01-09 Thread Shawn Heisey
On 11/28/2016 11:06 AM, Walter Underwood wrote: > Worst case: > 1. Disable merging. > 2. Delete all the documents. > 3. Add all the documents. > 4. Enable merging. > > After step 3, you have two copies of everything, one deleted copy and one new > copy. > The merge makes a third copy. Just

Re: CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties

2017-01-09 Thread Joel Bernstein
Currently these are not settable. It's easy enough to add setters for these values. What types of behavior have you run into when CloudSolrClient is having timeout issues? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Jan 9, 2017 at 10:06 AM, Yago Riveiro wrote:

Re: Loading Third party libraries along with Solr

2017-01-09 Thread Shawn Heisey
On 1/9/2017 11:35 AM, Shashank Pedamallu wrote: > I’m Shashank. I’m new to Solr and was trying to use amazon-aws sdk > along with Solr. I added amazon-aws.jar and its third party > dependencies under /solr-6.3.0/server/solr/lib folder. Even after I > add all required dependencies, I keep getting

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Manan Sheth
Hi Erick, It's due to some past issues observed with joins on Solr 4, which hit OOM when joining large indexes after optimization/compaction; if those are stored as smaller files, they fit into memory and operations are performed appropriately. Also, there are slow write/commit/updates

ICUFoldingFilter with swedish characters, and tokens with the keyword attribute?

2017-01-09 Thread jimi.hullegard
Hi, I wasn't happy with how our current solr configuration handled diacritics (like 'é') in the text and in search queries, since it simply considered a letter with a diacritic as a distinct letter. I.e. 'é' didn't match 'e', and vice versa. Except for a handful of rare words where the

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Manan Sheth
Additionally, to answer Anshum's queries: We are currently using Solr 4.10 and planning to upgrade to Solr 6.2.1, and the upgrade process is creating the current problem. We are using it in SolrCloud with 8-10 shards split across different nodes, each having segment size ~30 GB for some collection

Re: CDCR How to recover from Corrupted transaction log

2017-01-09 Thread Webster Homer
The root cause was the aggressive logging filling up the file system. Our admins have the logs on the same file system as the data, so when the filesystem got full, Solr couldn't write to the transaction logs, which corrupted them. Thank you for the tips on recovery; I will forward them to our

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread billnbell
Can you set the Solr segments config to a higher number? Don't optimize, and you will get smaller files after a new index is created. Can you reindex? Bill Bell Sent from mobile > On Jan 9, 2017, at 7:15 AM, Narsimha Reddy CHALLA > wrote: > > No, it does not work by

Re: Question about Lucene FieldCache

2017-01-09 Thread billnbell
Try disabling and perf may get better Bill Bell Sent from mobile > On Jan 9, 2017, at 6:41 AM, Yago Riveiro wrote: > > The documentation says that the only caches configurable are: > > - filterCache > - queryResultCache > - documentCache > - user defined caches > >

Available

2017-01-09 Thread billnbell
I am available for consulting projects if your project needs help. Been doing Solr work for 6 years... Bill Bell Sent from mobile

Facet date range without start and end date

2017-01-09 Thread nabil Kouici
Hi All, Is it possible to have a facet date range without specifying the start and end of the range? Otherwise, is it possible in the same request to set start to the min value and end to the max value? Thank you. Regards, NKI.

Re: SolrCloud and LVM

2017-01-09 Thread billnbell
Yeah we normally take the number of GB on a machine for the index size on disk and then double it for memory... For example we have 28gb on disk and we see great perf at 64gb ram. If you can do that you will probably get good results. Remember to not give Java much memory. We set it at 12gb.
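The sizing rule of thumb above can be written out as simple arithmetic. This is only a sketch of the heuristic from the message (double the on-disk index size for total RAM, keep the JVM heap small so the rest goes to the OS page cache), not an official guideline:

```java
// Heuristic from the thread: total RAM ~= 2x the index size on disk,
// with the JVM heap capped well below that (e.g. 12 GB on a 64 GB box).
public class Sizing {
    static long recommendedRamGb(long indexSizeOnDiskGb) {
        return 2 * indexSizeOnDiskGb;
    }

    public static void main(String[] args) {
        // 28 GB index -> ~56 GB, so a 64 GB machine gives good headroom.
        System.out.println(recommendedRamGb(28));
    }
}
```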

can we customize SOLR search for IBM Filenet 5.2?

2017-01-09 Thread puneetmishra2555
can we customize SOLR search for IBM Filenet 5.2? -- View this message in context: http://lucene.472066.n3.nabble.com/can-we-customize-SOLR-search-for-IBM-Filenet-5-2-tp4313091.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Moenieb Davids
Hi, Apologies for my response; I did not read the question properly. I was speaking about splitting files for import. -Original Message- From: billnb...@gmail.com [mailto:billnb...@gmail.com] Sent: 09 January 2017 05:45 PM To: solr-user@lucene.apache.org Subject: Re: Help needed in

Re: Help needed in breaking large solr index file into smaller ones

2017-01-09 Thread Manan Sheth
Hi All, I have a problem similar to this one, where the indexes in multiple solr shards have created large index files (~10 GB each), and I want to split these large files on each shard into smaller files. Please provide some guidelines. Thanks, Manan Sheth

Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
The documentation says that the only configurable caches are: - filterCache - queryResultCache - documentCache - user defined caches There is no entry for fieldValueCache, and in my case all of the caches listed in the documentation are disabled ... -- /Yago Riveiro On 9 Jan 2017 13:20 +, Mikhail

RE: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Moenieb Davids
Hi, Try split on Linux or Unix: split -l 100 originalfile.csv This will split a file into chunks of 100 lines each. See other options for how to split, e.g. by size. -Original Message- From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] Sent: 09 January 2017 12:12 PM To:
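The chunking that `split -l` performs can be sketched in plain Java for platforms without the Unix tool. This is an in-memory illustration of the same idea; each chunk would then be written out (e.g. with Files.write) and posted to Solr separately:

```java
import java.util.ArrayList;
import java.util.List;

// In-memory equivalent of `split -l`: break a list of CSV lines into
// chunks of at most linesPerChunk lines each.
public class SplitLines {
    public static List<List<String>> split(List<String> lines, int linesPerChunk) {
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerChunk) {
            int end = Math.min(start + linesPerChunk, lines.size());
            chunks.add(new ArrayList<>(lines.subList(start, end)));
        }
        return chunks;
    }
}
```

For example, 250 input lines with linesPerChunk=100 yield three chunks of 100, 100, and 50 lines.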

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Narsimha Reddy CHALLA
No, it does not work by splitting. First of all, lucene index files are not text files. There is a segment_NN file which refers to the index files in a commit. So, when we split a large index file into smaller ones, the corresponding segment_NN file also needs to be updated with the new index files OR a

Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
This probably says why https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrConfig.java#L258 On Mon, Jan 9, 2017 at 4:41 PM, Yago Riveiro wrote: > The documentation says that the only caches configurable are: > > - filterCache > -

Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Thanks for the reply Mikhail, Do you know if the 1 value is configurable? My insert rate is so high (5000 docs/s) that the cache is quite useless. In the case of the Lucene field cache, is it possible to "clean" it in some way? Some cache is eating my memory heap. - Best regards /Yago

OnError CSV upload

2017-01-09 Thread Moenieb Davids
Hi All, Background: I have a mainframe file that I want to upload, and the data is pipe delimited. Some of the records, however, have a few fewer fields than others within the same file, and when I try to import the file, Solr has an issue with the number of columns vs the number of values, which is

Help needed in breaking large index file into smaller ones

2017-01-09 Thread Narsimha Reddy CHALLA
Hi All, My solr server has a few large index files (say ~10G). I am looking for some help on breaking them into smaller ones (each < 4G) to satisfy my application requirements. Are there any such tools available? Appreciate your help. Thanks NRC

Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
Hello, Yago. "size": "1", "showItems": "-1", "initialSize": "10", "name": "fieldValueCache" These are Solr's UnInvertedFields, not Lucene's FieldCache. That 1 is for all fields of the collection schema. Collection reload or commit drops all entries from this cache. On Mon, Jan 9, 2017

Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL

2017-01-09 Thread Gethin James
For NOT NULL, I had some success using: WHERE field_name <> '' (greater or less than empty quotes) Best regards, Gethin. From: Joel Bernstein Sent: 05 January 2017 20:12:19 To: solr-user@lucene.apache.org Subject: Re: Regarding /sql --

Help needed in breaking large solr index file into smaller ones

2017-01-09 Thread Narsimha Reddy CHALLA
Hi All, My solr server has a few large index files (say ~10G). I am looking for some help on breaking them into smaller ones (each < 4G) to satisfy my application requirements. Basically, I am not looking for any optimization of the index here (ex: optimize, expungeDeletes etc.). Are there

Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Hi, After some reading of the documentation, supposedly the Lucene FieldCache is the only one that is not possible to disable. Fetching the config for a collection through the REST API, I found an entry like this: "query": { "useFilterForSortedQuery": true, "queryResultWindowSize": 1,

Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro wrote: > Thanks for re reply Mikhail, > > Do you know if the 1 value is configurable? yes. in solrconfig.xml https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
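Per the linked wiki page, the cache is tuned in the `<query>` section of solrconfig.xml. A sketch of what that entry could look like; the size values here are purely illustrative, not recommendations:

```xml
<!-- solrconfig.xml: shrink fieldValueCache when a high update rate
     makes cached entries short-lived. -->
<query>
  <fieldValueCache class="solr.FastLRUCache"
                   size="64"
                   autowarmCount="0"
                   showItems="32"/>
</query>
```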

How to integrate SOLR in ibm filenet 5.2.1?

2017-01-09 Thread puneetmishra2555
How we can integrate SOLR in IBM filenet 5.2? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-integrate-SOLR-in-ibm-filenet-5-2-1-tp4313090.html Sent from the Solr - User mailing list archive at Nabble.com.

term frequency solrj

2017-01-09 Thread huda barakat
Hi, Can anybody help me, I need to get term frequency for a specific filed, I use the techproduct example and I use this code: // import java.util.List; import org.apache.solr.client.solrj.SolrClient; import

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Manan Sheth
Does this really work for lucene index files? Thanks, Manan Sheth From: Moenieb Davids Sent: Monday, January 9, 2017 7:36 PM To: solr-user@lucene.apache.org Subject: RE: Help needed in breaking large index file into smaller ones

RE: How to integrate SOLR in ibm filenet 5.2.1?

2017-01-09 Thread Markus Jelsma
Apache ManifoldCF is probably your friend here: http://manifoldcf.apache.org/en_US/index.html -Original message- > From:puneetmishra2555 > Sent: Monday 9th January 2017 14:37 > To: solr-user@lucene.apache.org > Subject: How to integrate SOLR in ibm filenet 5.2.1? > >

IndexWriter.forceMerge not working as desired

2017-01-09 Thread Manan Sheth
Hi All, While doing index merging through the IndexWriter.forceMerge method in solr 6.2.1, I am passing 30 as the argument, but it is still merging all the data (the earlier collection used to have 10 segments) into a single segment. Please provide some information to help in understanding this behaviour.
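For context, forceMerge(maxNumSegments) only guarantees an upper bound: it merges until at most that many segments remain and never splits an index into more segments. A minimal sketch of the call as documented (the index path is a placeholder, and the Lucene jars must be on the classpath):

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class ForceMergeSketch {
    public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/path/to/index")),
                new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.forceMerge(30); // merge until <= 30 segments remain; an index
                                   // that already has 10 segments satisfies this
            writer.commit();
        }
    }
}
```

So if the observed result is a single segment, the merge down to one segment is likely coming from somewhere else (e.g. a forceMerge(1) or optimize elsewhere in the pipeline), not from this call.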

Re: SolrCloud and LVM

2017-01-09 Thread Chris Ulicny
That's good to hear. I didn't think there would be any reason that using lvm would impact solr's performance but wanted to see if there was anything I've missed. As far as other performance goes, we use pcie and sata solid state drives since the indexes are mostly too large to cache entirely in

CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties

2017-01-09 Thread Yago Riveiro
Hi, Using CloudSolrStream, is it possible to define the setZkConnectTimeout and setZkClientTimeout of the internal CloudSolrClient? The default negotiation timeout is set to 10 seconds. Regards, /Yago - Best regards /Yago -- View this message in context:

Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Yago Riveiro
You can try to reindex your data to another collection with more shards -- /Yago Riveiro On 9 Jan 2017 14:15 +, Narsimha Reddy CHALLA , wrote: > No, it does not work by splitting. First of all lucene index files are not > text files. There is a segment_NN file which

Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Ok, then I need to configure to reduce the size of the cache. Thanks for the help Mikhail. -- /Yago Riveiro On 9 Jan 2017 17:01 +, Mikhail Khludnev , wrote: > This probably says why >

Solr URL entity

2017-01-09 Thread fabigol
Hi, I made a Solr project with multiple entities. I want to launch the indexing of one entity with a URL. How can I choose the entity that I want in my URL? Thanks for your help -- View this message in context: http://lucene.472066.n3.nabble.com/Soir-Ulr-entity-tp4313172.html Sent from the Solr - User

Loading Third party libraries along with Solr

2017-01-09 Thread Shashank Pedamallu
Hi, I’m Shashank. I’m new to Solr and was trying to use amazon-aws sdk along with Solr. I added amazon-aws.jar and its third party dependencies under /solr-6.3.0/server/solr/lib folder. Even after I add all required dependencies, I keep getting NoClassDefinitionError and NoSuchMethod Errors. I

Re: SolrCloud different score for same document on different replicas.

2017-01-09 Thread Morten Bøgeskov
On Fri, 6 Jan 2017 10:45:02 -0600 Webster Homer wrote: > I was seeing something like this, and it turned out to be a problem with > our autoCommit and autoSoftCommit settings. We had overly aggressive > settings that eventually started failing with errors around too many