Re: solr cell: write entire file content binary to index along with metadata

2018-04-25 Thread Shawn Heisey
On 4/25/2018 4:02 AM, Lee Carroll wrote: *We don't recommend using solr-cell for production indexing.* Ok. Are the reasons for: Performance. I think we have rather modest index requirement (1000 a day... on a busy day) Security. The index workflow is, upload files to public facing s

Re: solr cell: write entire file content binary to index along with metadata

2018-04-25 Thread Lee Carroll
> *That's not usually the kind of information you want to have in a Solr index. Most of the time, there will be an entry in the Solr index that tells the system making queries how to locate the actual data -- a filename, a URL, a database lookup key, etc.*

Re: solr cell: write entire file content binary to index along with metadata

2018-04-24 Thread Shawn Heisey
On 4/24/2018 10:26 AM, Lee Carroll wrote: > Does the solr cell contrib give access to the files raw content along with > the extracted metadata? That's not usually the kind of information you want to have in a Solr index. Most of the time, there will be an entry in the Solr index

Re: IndexFetcher cannot download index file

2018-04-24 Thread Shawn Heisey
On 4/24/2018 1:53 PM, Markus Jelsma wrote: > I don't see stack traces for most WARNs, for example the checksum > warning on recovery (other thread), or the Trie* deprecations. I just tried it on 7.3.0.  Added a line to CoreContainer.java to log an exception at warn when Solr is starting:     log

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
Inline. -Original message- > From:Shawn Heisey > Sent: Tuesday 24th April 2018 21:18 > To: solr-user@lucene.apache.org > Subject: Re: IndexFetcher cannot download index file > > On 4/24/2018 12:36 PM, Markus Jelsma wrote: > > I should be more precise, i said the

Re: IndexFetcher cannot download index file

2018-04-24 Thread Shawn Heisey
On 4/24/2018 12:36 PM, Markus Jelsma wrote: > I should be more precise, i said the stack traces of WARN are not shown, only > the messages are visible. The 'low disk space' line was hidden in the stack > trace of the WARN, as you can see in the pasted example, thus invisible in > the GUI with de

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
E: IndexFetcher cannot download index file > > Hello Shawn, > > I should be more precise, i said the stack traces of WARN are not shown, only > the messages are visible. The 'low disk space' line was hidden in the stack > trace of the WARN, as you can see in the

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
ds, Markus -Original message- > From:Shawn Heisey > Sent: Tuesday 24th April 2018 19:12 > To: solr-user@lucene.apache.org > Subject: Re: IndexFetcher cannot download index file > > On 4/24/2018 9:46 AM, Markus Jelsma wrote: > > Disk space was WARN level. It seem

Re: IndexFetcher cannot download index file

2018-04-24 Thread Shawn Heisey
disk space so that all the indexes can double in size temporarily.  To be absolutely certain you won't run out, it should be enough space so they can triple in size temporarily -- there is a certain indexing scenario where this can happen in the wild.  Replication can also create an ent

solr cell: write entire file content binary to index along with metadata

2018-04-24 Thread Lee Carroll
Does the solr cell contrib give access to the files raw content along with the extracted metadata? cheers Lee C

Re: IndexFetcher cannot download index file

2018-04-24 Thread Charlie Hull
On 24/04/2018 16:44, Walter Underwood wrote: In Ultraseek, we checked free disk space before starting a merge or replication. If there wasn’t enough space, it emailed an error to the admin and disabled merging or replication, respectively. Checking free disk space on Windows was a pain. On a

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
> To: solr-user@lucene.apache.org > Subject: Re: IndexFetcher cannot download index file > > On 4/24/2018 6:52 AM, Markus Jelsma wrote: > > Forget about it, recovery got a java.io.IOException: No space left on > > device but it wasn't clear until i inspected the real logs. >

Re: IndexFetcher cannot download index file

2018-04-24 Thread Walter Underwood
In Ultraseek, we checked free disk space before starting a merge or replication. If there wasn’t enough space, it emailed an error to the admin and disabled merging or replication, respectively. Checking free disk space on Windows was a pain. wunder Walter Underwood wun...@wunderwood.org http:/
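
The pre-flight check described in this message (verify free disk space before a merge or replication) can be sketched roughly as follows. This is only an illustration, not how Ultraseek or Solr implement it: the function names are hypothetical, and the default factor of 3 is taken from the "triple in size temporarily" advice elsewhere in this thread.

```python
import shutil


def has_headroom(index_bytes: int, free_bytes: int, factor: int = 3) -> bool:
    """Return True if free space can absorb the index growing to `factor` times its size.

    Growing to `factor` times the current size needs (factor - 1) extra copies.
    """
    extra_needed = index_bytes * (factor - 1)
    return free_bytes >= extra_needed


def check_path(path: str, index_bytes: int, factor: int = 3) -> bool:
    """Hypothetical helper: look up free space on the volume holding `path`."""
    free = shutil.disk_usage(path).free
    return has_headroom(index_bytes, free, factor)
```

A caller would skip or postpone the merge or replication, and alert an admin, whenever `check_path` returns False.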

Re: IndexFetcher cannot download index file

2018-04-24 Thread Shawn Heisey
On 4/24/2018 6:52 AM, Markus Jelsma wrote: > Forget about it, recovery got a java.io.IOException: No space left on device but it wasn't clear until I inspected the real logs. The logs in the web admin didn't show the disk space exception, even when I expand the log line. Maybe that could be chang

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
--Original message- > From:Markus Jelsma > Sent: Tuesday 24th April 2018 14:39 > To: Solr-user > Subject: IndexFetcher cannot download index file > > Hello, > > Slightly different questions/problem, what is going on here on 7.2.1? During > the recovery, none of thi

IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
that? Also, why is fetching the full index so slow? It should take just a few minutes to transfer 50 GB between those nodes. While recovering, CPU utilization is normal/low. Many thanks, Markus Error fetching file, doing one retry...:org.apache.solr.common.SolrException: Unable to download _l5

Re: How to index and search (integer or float) vector.

2018-04-13 Thread Alexandre Rafalovitch
you do get it to work, a follow-up summary email would be fantastic resources for others searching this kind of information later. On 12 April 2018 at 20:44, Jason wrote: > Hi,I have specific documents that consist of integer vector with fixed > length.But I have no idea how to index in

Re: How to index and search (integer or float) vector.

2018-04-13 Thread Rick Leir
th fixed > length. But I have no idea how to index integer vector and search > similar > vector. Which fieldType should I use to solve this problem? And can I get > any > example for how to search? > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

How to index and search (integer or float) vector.

2018-04-12 Thread Jason
Hi, I have specific documents that consist of a fixed-length integer vector. But I have no idea how to index an integer vector and search for similar vectors. Which fieldType should I use to solve this problem? And can I get an example of how to search? -- Sent from: http://lucene.472066.n3.nabble.com

Re: Default Index config

2018-04-11 Thread mganeshs
ns same. Pls check the JVM snapshot <https://lh3.googleusercontent.com/-MHYsop5Vovo/Ws3V8KrT2VI/ALI/GXmcWA4OtPQwwpF09scU5riJ1VHFOUtHQCL0BGAYYCw/h1080/2018-04-11.png> when we index using 6.2.1 Following is the snapshot <https://lh3.googleusercontent.com/

Re: Default Index config

2018-04-09 Thread mganeshs
Hi Shawn, Thanks for the reply. Yes we use only one solr client. Though collection name is passed in the function, we are using same client for now. Regarding merge config, after reading lot of forums and listening to presentation of revolution 2017, idea is to reduce the merge frequency, so th

Re: Default Index config

2018-04-09 Thread Shawn Heisey
On 4/9/2018 4:04 AM, mganeshs wrote: > Regarding the high CPU, while troubleshooting we found that merge threads keep running and take most of the CPU time (as per Visual JVM). With a one second autoSoftCommit, nearly constant indexing will produce a lot of very small index seg

Re: Default Index config

2018-04-09 Thread mganeshs
ut couldn't see any much change in the behaviour. In same solr node, we have multiple index / collection. In that case, whether TieredMergePolicyFactory will be right option or for multiple collection in same node we should go for other merge policy ( like LogByte etc ) Can you throw some li

Re: how to reset the index in solr

2018-04-02 Thread delk
If you want to delete all items from the Solr index, send a delete with the query *:* - Development Center Toronto -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
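
For illustration, a delete-by-query with `*:*` can be sent to Solr's JSON update handler. The sketch below only builds the request. The base URL and collection name are placeholders, and `build_delete_all_request` is a hypothetical helper, not part of any Solr client library.

```python
import json
import urllib.request


def build_delete_all_request(base_url: str, collection: str) -> urllib.request.Request:
    """Build a POST that deletes every document in `collection` and commits."""
    url = f"{base_url.rstrip('/')}/{collection}/update?commit=true"
    # The match-all query *:* selects every document in the index.
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it would look like this (assumes a Solr node is reachable at that address):
# urllib.request.urlopen(build_delete_all_request("http://localhost:8983/solr", "mycollection"))
```

This is irreversible once committed, so it is worth trying against a scratch collection first.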

Re: Default Index config

2018-03-28 Thread Shawn Heisey
tps://docs.google.com/document/d/1SaKPbGAKEPP8bSbdvfX52gaLsYWnQfDqfmV802hWIiQ/edit?usp=sharing> The VIRT memory should be about equivalent to the RES size plus the size of all the index data on the system.  So that looks about right.  The actual amount of memory allocated by Java for the heap and other memory structures

Re: Default Index config

2018-03-28 Thread mganeshs
, most of time simple queries with fq. Regarding index, during peak hours, we index around 100 documents in a second in a average. I also shared the CPU utilization in the same doc <https://docs.google.com/document/d/1SaKPbGAKEPP8bSbdvfX52gaLsYWnQfDqfmV802hWIiQ/edit#heading=h.ahsgapiu4829>

Re: Default Index config

2018-03-27 Thread Shawn Heisey
that leaves about 22GB for everything else.  If Solr is the only thing running on the machine, and your numbers mean that each server has about 30GB of index data, then that means you can get about two thirds of the index into the OS disk cache.   Usually this is enough for decent performance, but t

Re: Default Index config

2018-03-27 Thread mganeshs
much of RAM and CPU usages. CPU is always 80 to 90% even if we are trying to index or update some 50 docs at one shot and RAM it occupies whatever we give. We started with 8GB of Heap. But its always 8GB. Initially we were using CMS GC and tried with G1 GC. Only difference is that, In case of

Re: Default Index config

2018-03-26 Thread Shawn Heisey
On 3/26/2018 10:45 AM, mganeshs wrote: > I haven't changed the solr config wrt index config, which means it's all > commented in the solrconfig.xml. > > It's something like what I pasted before. But I would like to know whats the > default value of each of this. D

Default Index config

2018-03-26 Thread mganeshs
Hi, I haven't changed the solr config wrt index config, which means it's all commented in the solrconfig.xml. It's something like what I pasted before. But I would like to know whats the default value of each of this. Coz.. after loading to 6.5.1 and our document size also cros

Re: /var/solr/data has lots of index* directories

2018-03-05 Thread Tom Peters
/var/solr/data >> solr2-b: 29G /var/solr/data >> solr2-c: 6.6G /var/solr/data >> solr2-d: 9.7G /var/solr/data >> solr2-e: 19G /var/solr/data >> >> The leader is currently "solr2-a" >> >> Here's the actual index size:

Re: /var/solr/data has lots of index* directories

2018-03-05 Thread Shalin Shekhar Mangar
ata > solr2-b: 29G /var/solr/data > solr2-c: 6.6G /var/solr/data > solr2-d: 9.7G /var/solr/data > solr2-e: 19G /var/solr/data > > The leader is currently "solr2-a" > > Here's the actual index size: > > Master (Searching) > 1520273

/var/solr/data has lots of index* directories

2018-03-05 Thread Tom Peters
ta solr2-c: 6.6G /var/solr/data solr2-d: 9.7G /var/solr/data solr2-e: 19G /var/solr/data The leader is currently "solr2-a" Here's the actual index size: Master (Searching) 1520273178244 # version 73034 # gen 3.66 GB # size When I look inside

Re: Configuring Solr Data and Index directories

2018-03-02 Thread Shawn Heisey
On 3/2/2018 2:15 AM, YELESWARAPU, VENKATA BHAN wrote: While deploying Solr I just see one parameter where we provide solr_home path. For ex: -Dsolr.solr.home=/usr/local/clo/ven/solr_home 1) Is there any path where we can configure data and index directories. 2) Can we separate data

index mail with MailEntityProcessor

2018-03-02 Thread Dimitris Kardarakos
Hello everyone. I have created a collection and indexed mails from a gmail mailbox. Nevertheless, only plain text is indexed. Neither html formatted nor attachments' indexing works. To index mails, I have included the below libs to solrconfig: regex=".*\.jar" /> regex=&q

Configuring Solr Data and Index directories

2018-03-02 Thread YELESWARAPU, VENKATA BHAN
Dear Team, While deploying Solr I just see one parameter where we provide solr_home path. For ex: -Dsolr.solr.home=/usr/local/clo/ven/solr_home 1) Is there any path where we can configure data and index directories. 2) Can we separate data

Re: storing large text fields in a database? (instead of inside index)

2018-02-21 Thread Roman Chyla
Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 20 Feb 2018, at 20:39, Roman Chyla wrote: > > > > Say there is a high load and I'd like to bring a new machine and let it > > replicate the index, if 100gb and more can be shaved, i

Re: storing large text fields in a database? (instead of inside index)

2018-02-21 Thread Emir Arnautović
toring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 20 Feb 2018, at 20:39, Roman Chyla wrote: > > Say there is a high load and I'd like to bring a new machine and let it > replicate the inde

Re: storing large text fields in a database? (instead of inside index)

2018-02-20 Thread Roman Chyla
Say there is a high load and I'd like to bring a new machine and let it replicate the index, if 100gb and more can be shaved, it will have a significant impact on how quickly the new searcher is ready and added to the cluster. Impact on the search speed is likely minimal. we are investig

Re: storing large text fields in a database? (instead of inside index)

2018-02-20 Thread David Hastings
Really depends on what you consider too large, and why the size is a big issue, since most replication will go at about 100 MB/second give or take, and replicating a 300GB index is only an hour or two. What I do for this purpose is store my text in a separate index altogether, and call on that
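
As a rough sanity check on the estimate in this message, the transfer time follows directly from index size and throughput. The sketch below assumes decimal units (1 GB = 1000 MB) and a steady rate, which real replication will not quite achieve.

```python
def transfer_minutes(index_gb: float, rate_mb_per_s: float) -> float:
    """Minutes needed to copy `index_gb` gigabytes at `rate_mb_per_s` megabytes per second."""
    seconds = index_gb * 1000 / rate_mb_per_s
    return seconds / 60


# 300 GB at 100 MB/s: 3000 seconds of pure transfer.
print(transfer_minutes(300, 100))  # 50.0
```

At a sustained 100 MB/s the copy alone takes about 50 minutes, so "an hour or two" leaves room for slower links and post-transfer work.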

storing large text fields in a database? (instead of inside index)

2018-02-20 Thread Roman Chyla
Hello, We have a use case of a very large index (slave-master; for unrelated reasons the search cannot work in the cloud mode) - one of the fields is a very large text, stored mostly for highlighting. To cut down the index size (for purposes of replication/scaling) I thought I could try to save

Sitecore Analytics Index

2018-02-20 Thread rojerick luna
Hi, For those who have a Sitecore website app with multiple sites (but only one Sitecore code base), have you separated the index for each site? How were you able to manage it? Also, do you have archiving, since analytics data keeps growing? Thanks Best Regards, Jeck

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-19 Thread Alessandro Benedetti
gathered so far. Your situation caught our attention and definitely changing the order of the documents in input shouldn't affect the index size (by such a large factor). The fact that the optimize didn't change anything is even more suspicious. It may be an indicator that in some edge cases o

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-18 Thread Howe, David
new Solr server from scratch, I'm pretty confident that the defaults haven't changed between test runs as when we create the Solr index, Solr doesn't know what order the data in the database table is in. I did try removing the geo location field to see if that made a difference,

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread Gora Mohanty
On 18 February 2018 at 08:18, @Nandan@ wrote: > Thanks Rick. > Is it possible to get some demo learning video link or web links from > where I can get overview with real example? > By which I can able to know in more details. > Searching Google for "Solr index data database&

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread @Nandan@
> >Hi David, > >Thanks for your reply. > >My few questions are: > >1) I have to denormalize my MySQL data manually or some process is > >there. > >2) is it like when Data will insert into my MySQL , it has to auto > >index > >into solr ? > >

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread Rick Leir
s for your reply. >My few questions are :- >1) I have to denormalize my MySQL data manually or some process is >there. >2) is it like when Data will insert into my MySQL , it has to auto >index >into solr ? > >Please explain these . >Thanks > >On Feb 18, 2018 1:51

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread @Nandan@
Hi David, Thanks for your reply. My questions are: 1) Do I have to denormalize my MySQL data manually, or is there some process for it? 2) When data is inserted into MySQL, does it have to be indexed into Solr automatically? Please explain these. Thanks On Feb 18, 2018 1:51 AM, "David Hastings"

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread David Hastings
Your first step is to denormalize your data into a flat data structure. Then index that into your solr instance. Then you’re done. On Feb 17, 2018, at 12:16 PM, @Nandan@ <nandanpriyadarshi...@gmail.com> wrote: Hi Team, I am working on one e-commerce project in which my data is s
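
The "denormalize into a flat data structure" step can be pictured as resolving joins before indexing, so that each Solr document carries everything needed for search. The sketch below is illustrative only: the `products` and `categories` tables and all of their column names are made up, since the thread does not show the actual schema.

```python
def denormalize(products, categories):
    """Flatten product rows by copying the joined category name into each document."""
    category_by_id = {c["id"]: c for c in categories}
    docs = []
    for p in products:
        category = category_by_id.get(p["category_id"], {})
        docs.append(
            {
                "id": str(p["id"]),  # Solr unique keys are typically strings
                "name": p["name"],
                "category_name": category.get("name"),  # None if the category is missing
            }
        )
    return docs


# Hypothetical rows as they might come back from two MySQL queries.
products = [{"id": 1, "name": "Kettle", "category_id": 10}]
categories = [{"id": 10, "name": "Kitchen"}]
print(denormalize(products, categories))
```

The resulting list of flat dictionaries is the shape Solr's JSON update handler accepts; the same flattening can also be done in SQL with a JOIN in the import query.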

Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread @Nandan@
Hi Team, I am working on an e-commerce project in which my data is stored in a MySQL DB. We currently use MySQL search but are planning to implement Solr search to give our customers more features. For development purposes, I am experimenting on localhost. Please guide

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Erick Erickson
> The only difference is that I added docValues=false to all of the fields that > are indexed=false and stored=true in the run that is smaller. I had tested > this previously with the data in the order that makes the index larger and it > only made a minor difference (see one of my

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
I haven't accidentally changed anything, and I have done this. The only difference is that I added docValues=false to all of the fields that are indexed=false and stored=true in the run that is smaller. I had tested this previously with the data in the order that makes the index larger and

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Erick Erickson
tored Best, Erick On Fri, Feb 16, 2018 at 11:37 AM, Howe, David wrote: > > Hi Erick, > > Below is the file listing for when the index is loaded with the table ordered > in a way that produces the smaller index. > > I have checked the console, and we have no deleted docs and

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, Thinking some more about the differences between the two sort orders has suggested another possibility. We also have a geo spatial field defined in the index: echo "$(date) Creating geoLocation field" curl -X POST -H 'Content-type:application/json' --dat

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, Below is the file listing for when the index is loaded with the table ordered in a way that produces the smaller index. I have checked the console, and we have no deleted docs and we have the same number of docs in the index as there are rows in the staging table that we load from

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Alessandro, There are 14,061,990 records in the staging table and that is how many documents that we end up with in Solr. I would be surprised if we have a problem with the id, as we use the primary key of the table as the id in Solr so it must be unique. The primary key of the staging ta

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Alessandro Benedetti
It's a silly thing, but to confirm the direction that Erick is suggesting : How many rows in the DB ? If updates are happening on Solr ( causing the deletes), I would expect a greater number of documents in the DB than in the Solr index. Is the DB primary key ( if any) the same of the uniq

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Emir, We have no copy field definitions. To keep things simple, we have a one to one mapping between the columns in our staging table and the fields in our Solr index. Regards, David David Howe Java Domain Architect Postal Systems Level 16, 111 Bourke Street Melbourne VIC 3000 T

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Emir Arnautović
Hi David, I skimmed through thread and don’t see if already eliminated, so will ask: Can you check if there are some copyField rules that are triggered when new field is added. You mentioned that ordering fixed the size of the index, but might be worth checking. Emir -- Monitoring - Log

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Erick Erickson
This isn't terribly useful without a similar dump of "the other" index directory. The point is to compare the different extensions some segment where the sum of all the files in that segment is roughly equal. So if you have a listing of the old index around, that would help. bq: We

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David
Hi Erick, I have the full dump of the Solr index file sizes as well if that is of any help. I have attached it below this message. We don't have any deleted docs in our index, as we always build it from a brand new virtual machine with a brand new installation of Solr. The orderi

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Erick Erickson
ter checking that these ratios are true for a single like-sized segment in both cases 1> the LukeRequestHandler can tell you information about exactly how the index is defined, and using Luke itself can provide you a much more detailed look at what's actually _in_ your index. You could al

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Pratik Patel
gotten me closer to what > the issue is. When I run the version of the index that is working > correctly against my database table that has the extra field in it, the > index suddenly increases in size. This is even though the data importer is > running the same SELECT as before (wh

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David
Hi Alessandro, Some interesting testing today that seems to have gotten me closer to what the issue is. When I run the version of the index that is working correctly against my database table that has the extra field in it, the index suddenly increases in size. This is even though the data

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Alessandro Benedetti
@Pratik: you should have investigated. I understand that solved your issue, but in case you needed norms, it doesn't make sense that they cause your index to grow by a factor of 30. You must have faced a nasty bug if it was just the norms. @Howe : *Compound File* .cfs, .cfe An opt

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Howe, David
I have re-run both scenarios and captured the total size of each type of index file. The MB (1) column is for the baseline scenario which has the smaller index and acceptable performance. The MB(2) column is after I have added the extra field to the index. Ext MB (1) MB (2
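
Per-extension totals like the ones reported here can be collected by summing file sizes in the index directory, grouped by suffix. This is a sketch of one way to do it, not the script used in the thread; any path passed in is the caller's own.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir: str) -> dict:
    """Sum file sizes in bytes under `index_dir` (non-recursive), keyed by file extension."""
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if entry.is_file():
            # Files without a suffix (e.g. segments_N) are keyed by their full name.
            ext = os.path.splitext(entry.name)[1] or entry.name
            totals[ext] += entry.stat().st_size
    return dict(totals)
```

Running it against the baseline and the enlarged index and comparing the two dictionaries shows which file types account for the growth.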

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel
Feb 14, 2018 at 1:01 PM, Alessandro Benedetti wrote: > Hi pratik, > how is it possible that just the norms for a single field were causing such > a massive index size increment in your case ? > > In your case I think it was for a field type used by multiple fields, but > it&

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Alessandro Benedetti
Hi Pratik, how is it possible that just the norms for a single field were causing such a massive index size increase in your case? In your case I think it was for a field type used by multiple fields, but it's still suspicious in my opinion; norms shouldn't be that big. If I remember correct

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Erick Erickson
or sort on the field. This _will_ increase the index size on disk, but it's almost always a good tradeoff, here's why: To facet, group or sort you need to "uninvert" the field. If you have docValues=false, this uninversion is done at run-time into Java's heap. If you have do

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel
I had a similar issue with index size after upgrading to version 6.4.1 from 5.x. The issue for me was that the field which caused index size to be increased disproportionately had a field type("text_general") for which default value of omitNorms was not true. Turning it on explicitl

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
I have set docValues=false on all of the string fields in our index that have indexed=false and stored=true. This gave a small improvement in the index size from 13.3GB to 12.82GB. I have also tried running an optimize, which then reduced the index to 12.6GB. Next step is to dump the sizes
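
A change like the one described here (docValues=false on a stored-only string field) can be expressed as a Schema API `replace-field` command. The sketch below only builds the JSON payload. The field name `address_extra` is hypothetical, and the `string` type is assumed from the message's mention of string fields.

```python
import json


def build_replace_field(name: str, field_type: str = "string") -> dict:
    """Build a Schema API payload for a stored-only field with docValues disabled."""
    return {
        "replace-field": {
            "name": name,
            "type": field_type,
            "indexed": False,    # not searchable
            "stored": True,      # value can still be returned in results
            "docValues": False,  # no column-oriented copy on disk
        }
    }


# `address_extra` is a made-up field name used only for illustration.
print(json.dumps(build_replace_field("address_extra"), indent=2))
```

The payload would be POSTed to the collection's `/schema` endpoint. A schema change of this kind only affects documents indexed afterwards, which fits the rebuild-from-scratch workflow described in this thread.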

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Thanks Hoss. I will try setting docValues to false, as we only ever want to be able to retrieve the value of this field. Regards, David David Howe Java Domain Architect Postal Systems Level 16, 111 Bourke Street Melbourne VIC 3000 T 0391067904 M 0424036591 E david.h...@auspost.com.au W

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Erick, Thanks for responding. You are correct that we don't have any deleted docs. When we want to re-index (once a fortnight), we build a brand new installation of Solr from scratch and re-import the new data into an empty index. I will try setting docValues to false and see if

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Alessandro, The docker image is like a disk image of the entire server, so it includes the operating system, the Solr installation and the data. Because we run in the cloud and our index isn't that big, this is an easy and fast way for us to scale our Solr cluster without havi

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread David Hastings
To piggy back on this, what would be the right scenarios to use docvalues='true'? On Tue, Feb 13, 2018 at 1:10 PM, Chris Hostetter wrote: > > : We are using Solr 7.1.0 to index a database of addresses. We have found > : that our index size increases massively when we add

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Chris Hostetter
: We are using Solr 7.1.0 to index a database of addresses. We have found : that our index size increases massively when we add one extra field to : the index, even though that field is stored and not indexed, and doesn’t what about docValues? : When we run an index load without the

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Erick Erickson
David: Right, Optimize Is Evil. Well, actually in your case it's not. In your specific case you can optimize every time you build your index and be OK, gory details here: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ But that's just for backgroun

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Alessandro Benedetti
Hi David, given the fact that you are actually building a new index from scratch, my shot in the dark didn't hit any target. When you say : "Once the import finishes we save the docker image in the AWS docker repository. We then build our cluster using that image as the base" D

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Alessandro, Thanks for responding. We rebuild the index every time starting from a fresh installation of Solr. Because we are running at AWS, we have automated our deployment so we start with the base docker image, configure Solr and then import our data every time the data changes (it

Re: solr spell check index dictionary build failed issue

2018-02-13 Thread Alessandro Benedetti
Shooting in the dark it seems that 2 processes are trying to write the same disk directory. Is this directory shared by different Solr cores or Solr instances ? If you contribute the configuration from the solrconfig we may be able to help. - --- Alessandro Benedetti Search Cons

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Alessandro Benedetti
I assume you re-index in full right ? My shot in the dark is that this increment is temporary. You re-index, so effectively delete and add all documents ( this means that even if the new field is just stored, you re-build the entire index for all the fields). Create new segments and the old docs

Index size increases disproportionately to size of added field when indexed=false

2018-02-12 Thread Howe, David
Hi, We are using Solr 7.1.0 to index a database of addresses. We have found that our index size increases massively when we add one extra field to the index, even though that field is stored and not indexed, and doesn’t contain a lot of data. When this occurs, we also observe a significant

Null Pointer exception after upgrading lucene index from 6.1 to 7.2

2018-02-12 Thread Webster Homer
We ran the org.apache.lucene.index.IndexUpgrader as part of upgrading from 6.1 to 7.2.0 After the upgrade, one of our collections threw a NullPointerException on a query of *:* We didn't observe errors in the logs. All of our other collections appear to be fine. Re-indexing the collection seems

solr spell check index dictionary build failed issue

2018-02-12 Thread Krishna Kumar Sharma
Hello, I have an issue building the spell check index dictionary; it shows an error like: ERROR undefined SpellCheckComponent Exception in building spell check index for spellchecker: indexD org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /var/lib/solr/data

Re: can you migrate solr index files from osx to linux

2018-02-07 Thread Jeff Dyke
I forgot to report back on this. For anyone that runs into it, you need the entire data directory not just the index directory, at least that's what made it work for me. On Thu, Feb 1, 2018 at 9:52 PM, Erick Erickson wrote: > I think SCP will be fine. Shawn's comment is proba

Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Erick Erickson
ried that the gzip caused issues, but as i mentioned no >> errors on start up, and i thought i would see some. @Erick, how would you >> recommend. This is going to be less of an issue b/c i need to build the >> index programmatically anyway, but would be nice to know if only for

Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Shawn Heisey
This is going to be less of an issue b/c i need to build the > index programmatically anyway, but would be nice to know if only for > curiosity. Perhaps making a replication backup and then restoring on the > new server would be better. In the middle of other things now, will try a >

Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Jeff Dyke
d the index programmatically anyway, but would be nice to know if only for curiosity. Perhaps making a replication backup and then restoring on the new server would be better. In the middle of other things now, will try a few of those, plus some other ideas. On Thu, Feb 1, 2018 at 4:49 PM, Erick Eri

Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Erick Erickson
or install on Ubuntu. I >> didn't think a point minor point release would matter. >> >> solr@stagingsolr01:~/data/issuers/data$ ls -1 >> 981552 >> index >> _mg8.dii >> _mg8.dim >> _mg8.fdt >> _mg8.fdx >> _mg8.fnm >>

Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Shawn Heisey
/data/issuers/data$ ls -1 > 981552 > index > _mg8.dii > _mg8.dim > _mg8.fdt > _mg8.fdx > _mg8.fnm > _mg8_Lucene50_0.doc > _mg8_Lucene50_0.pos > _mg8_Lucene50_0.tim > _mg8_Lucene50_0.tip > _mg8_Lucene70_0.dvd > _mg8_Lucene70_0.dvm > _mg8.nvd > _mg8.nvm >

Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Jeff Dyke
That's exactly what I thought as well. The only difference (and I can try to downgrade) is that OSX is on 7.2, and I grabbed 7.2.1 to install on Ubuntu. I didn't think a minor point release would matter. solr@stagingsolr01:~/data/issuers/data$ ls -1 981552 index _mg8.dii _mg8.dim _mg8.fd

Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Shawn Heisey
index), but no documents are seen. Nor are any errors thrown in the logs or at startup. Given the case sensitivity differences between OSX and Linux, could that be a problem? Are there further steps required, or is it just not possible? Granted, I'm going to programmatically rebuild the index, but

can you migrate solr index files from osx to linux

2018-02-01 Thread Jeff Dyke
e any errors thrown in the logs or at startup. Given the case sensitivity differences between OSX and Linux, could that be a problem? Are there further steps required, or is it just not possible? Granted, I'm going to programmatically rebuild the index, but wanted to start here. Thanks!

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread alessandro.benedetti
think the index time strategy will be much easier and it will just require a re-index and few small changes at query time configuration. Another possibility may be to use payloads and the related query parser, but also in this case you would need to re-index so it is unlikely that this option would be

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
-user@lucene.apache.org Subject: Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index) Hi Alessandro, Thanks for making it more clear. As I mentioned I do not want to change my index (mentioned in subject) for the feature I requested. search query will have

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Muhammad Zahid Iqbal
Hi Alessandro, Thanks for making it more clear. As I mentioned I do not want to change my index (mentioned in subject) for the feature I requested. search query will have to look for first 100 characters indexed in same XYZ field. " How can I achieve this without changing index? I wa

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread alessandro.benedetti
This seems different from what you initially asked (and what Diego responded to): "One is simple, search query will look for whole content indexed in XYZ field Other one is, search query will have to look for first 100 characters indexed in same XYZ field. " This is still doable at Indexing time using a

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Emir Arnautović
Hi Muhammad, If the limit(s) are static, you can still do it at index time: Assuming you send “content” field, you index it fully (and store if needed), and you use copy field to copy to content_limitted field where you use limit token count filter to index only first X tokens: https
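The idea in the snippet above (index the full "content" field, and copy it into a second field that keeps only the first X tokens) can be sketched roughly as follows. This is an illustration of the concept only, not Solr configuration: `limit_tokens` and `build_doc` are hypothetical helpers, and plain whitespace splitting is an assumption standing in for whatever analyzer the real field type uses.

```python
def limit_tokens(text, max_tokens):
    """Keep only the first max_tokens tokens of text.

    Whitespace tokenization is an assumption made for illustration;
    a real analyzer chain will tokenize differently.
    """
    return text.split()[:max_tokens]


def build_doc(content, max_tokens):
    """Mimic the copy-field idea: one full field, one token-limited copy."""
    return {
        "content": content,  # indexed in full
        "content_limitted": limit_tokens(content, max_tokens),  # first X tokens only
    }


doc = build_doc("alpha beta gamma delta epsilon", 3)
```

A query restricted to the start of the text would then target the limited field, while ordinary queries keep using the full one. The limit is fixed when the document is indexed, which is why this approach needs a re-index.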

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Muhammad Zahid Iqbal
our update chain, > here's the base definition: > > > > trunc > 5 > > > > This _can_ be configured to operate on "all StrField", or "all > TextFields" as well, see the Javadocs. > > This is static, that is the fi

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-27 Thread Erick Erickson
Sure, use TruncateFieldUpdateProcessorFactory in your update chain, here's the base definition: trunc 5 This _can_ be configured to operate on "all StrField", or "all TextFields" as well, see the Javadocs. This is static, that is the field i
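The XML definition in the snippet above lost its tags in the archive, so only the values "trunc" and "5" survive. As a rough picture of what truncating a field at update time amounts to, here is a minimal sketch. It is not Solr's API: `truncate_field` and the sample document are hypothetical, and the field name and length are simply borrowed from the surviving values.

```python
def truncate_field(doc, field_name, max_length):
    """Return a copy of doc with the string in field_name cut to max_length characters.

    Non-string values and missing fields are left untouched.
    """
    result = dict(doc)
    value = result.get(field_name)
    if isinstance(value, str):
        result[field_name] = value[:max_length]
    return result


original = {"id": "1", "trunc": "abcdefghij"}
truncated = truncate_field(original, "trunc", 5)
```

As the message notes, this kind of truncation is static: it is applied when the document is indexed, so the stored value cannot be "un-truncated" at query time.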
